Performance test your assumptions

02 Nov 2011

This article was originaly published on http://www.spilgames.com/performance-test-your-assumptions/.

Every programming language is different, even between versions of the same language. And yet we base many of our solutions on knowledge gathered in the past. As I've found when I started working with Javascript years ago, many of the solutions from other languages equally apply to Javascript. However, you are making assumptions even when they make a lot of sense on the face of it. As I will show in this article, these assumptions need to be either proven or disproven.

My team, Team Panda, is responsible for maintaining the platform to facilitate games on hyves.nl/games and apps.facebook.com/zapapagames. The platform depends heavily on Javascript, and for a specific use case, we did a comparative performance test between two approaches for implementing regular expressions.

With regard to regular expressions, most developers assume that re-using one instance of a fixed regular expression will perform much better when compared to generating multiple regular expressions on the fly. The reasoning behind this mostly stems from regular expressions being expensive to compile, while relatively cheap to use. We had within our team a use-case where these two opposing concepts could both be applied.

The use-case involves a relatively simple templating approach. We have a text-blob which contains ‘tags’ that are to be replaced by content supplied as a set of key-value pairs. The tags in our case are similar to self-closing html-tags only we use different delimiters so as not get a conflict with the html-tags in our templates.

The easy to understand approach is to loop over the set of keys, search and replace all instances of each key for its matching value. The opposite is to search the text-blob for all ‘tags’ and see if a key exists to match that tag, then replacing it with the matching value. In pseudo-code it would look like this:

Solution 1:

for each ‘key’ in ‘key-value set’
  search and replace all instances of ‘key’ with ‘value’

Solution 2:

for each ‘tag’ in ‘text-blob’
  get ‘value’ for ‘key’ which matches ‘tag’
  replace ‘tag’ with ‘value’

The spec for JavaScript only contains ‘String.replace(RegExp search, String replace)’. This would mean that the first solution will generate as many regular expressions as there are keys in the set. The second solution would only need a single regular expression. At the time we started discussing this we had example code for both solutions, which looked more or less like the following:

Solution 1:

for (var i in data) {
  template = template.replace(new RegExp('{' + i + '}', 'g'), data[i]);
}

Solution 2:

var re = /{([^}]+)}/gi;

var m;
while (m = re.exec(o)) {
  o = RegExp.leftContext;
  o += data[m[1]];
  o += RegExp.rightContext;
  re.test("");
}

Our team assumed the second solution would perform much better at the cost of adding some complexity. We created a performance test at jsperf.com to put this assumption to the test. If the performance gain would be high enough, we would choose the second solution and its added complexity. As the title of this post may suggest, we were wrong :)

Check the numbers yourself at http://jsperf.com/regex-templating where it turns out Solution 1 was performing much better. For the largest target audience our assumptions appears to be wrong. FireFox and Opera give us the expected outcome, here the static regular expression by far outperformed the use of multiple dynamically created expressions.

Performance results

Even after putting some effort into tweaking the single Regular Expression solution for better performance, it still cannot out-perform the quick and easy solution, at least not for the majority of our target audience (IE / Chrome). http://jsperf.com/regex-templating/9 is a final cleanup with the most viable code still pointing towards the more easily understood Solution 1.

Revised performance results

Concluding, I would point out that the only arguments that can truly stand up to scrutiny are based on facts, not assumptions. So put your code to the test and the results will either argue your case for you, or proof that you must change your way of coding.