SLB: Benchmarking Performance Optimizations

By Sol in Lab

As Simple Lightbox grows (read: does more), it’s important to make sure that the additional functionality does not degrade performance. Ideally, the software should run faster despite the additional functionality. I spent some time during the current beta phase testing my assumptions and comparing the performance of different approaches to new features currently in development.

Regular Expressions

I love regular expressions.

It’s as simple as that really.  You can create a very complex text search using a relatively simple regular expression.  Anything from validating email addresses and phone numbers, to parsing the links in a document can be easily accomplished with regular expressions.

There is, of course, a downside to this convenience: performance.

In exchange for simplifying the task of matching a complex bit of text, regular expressions are slower than simple string operations.  How much slower greatly depends on the complexity of the search pattern and the size of the text being searched.

One of SLB’s new features currently in development is the option to group gallery links in a separate slideshow from the rest of the links in a post. To do this, SLB extracts the gallery links and processes them separately from the rest of a post’s links. This is a relatively simple task for regular expressions, but since several posts, each containing several galleries, may be processed in a single request, I wanted to see just how much of a performance hit there was when using regular expressions as opposed to a string-based approach.

Both approaches do the same thing:

  1. Find a gallery in the post’s content
  2. Save gallery links and replace with a placeholder in the post’s content
  3. Repeat steps #1 & #2 until all galleries have been found
  4. Process remaining links in post
  5. Reinsert processed gallery links back into post

Here is the code for each approach:

String-based

$content = $c_temp;
$w = $this->group_get_wrapper();
$groups = array();
$g_idx = 0;
$g_end_idx = 0;
//Iterate through galleries
//Compare strictly against false so a wrapper found at position 0 still counts as a match
while ( ($g_start_idx = strpos($content, $w->open, $g_end_idx)) !== false
		&& ($g_end_idx = strpos($content, $w->close, $g_start_idx)) !== false ) {
	$g_start_idx += strlen($w->open);
	//Extract gallery content & save for processing
	$g_len = $g_end_idx - $g_start_idx;
	$groups[$g_idx] = substr($content, $g_start_idx, $g_len);
	//Replace content with placeholder
	$g_ph = sprintf($g_ph_f, $g_idx);
	$content = substr_replace($content, $g_ph, $g_start_idx, $g_len);
	//Increment gallery count
	$g_idx++;
	//Resume searching right after the placeholder (at the closing wrapper)
	$g_end_idx = $g_start_idx + strlen($g_ph);
}

Regex-based

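$content = $c_temp;
$w = $this->group_get_wrapper();
//Build the pattern after $w is set; pass the delimiter so any "|" in the wrappers is escaped
$rgx = "|" . preg_quote($w->open, "|") . "(.+?)" . preg_quote($w->close, "|") . "|is";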
$groups = array();
$g_idx = 0;
$matches = array();
$o = 0;
while ( preg_match($rgx, $content, $matches, PREG_OFFSET_CAPTURE, $o) ) {
	//Extract gallery content
	$groups[$g_idx] = $matches[1][0];
	//Replace content
	$g_ph = $w->open . sprintf($g_ph_f, $g_idx) . $w->close;
	$content = substr_replace($content, $g_ph, $matches[0][1], strlen($matches[0][0]));
	//Update offsets
	$o = $matches[0][1] + strlen($w->open);
	$g_idx++;
}
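
Neither snippet shows step 5, reinserting the processed gallery links. A minimal sketch of how that step could look, assuming the same sprintf()-style placeholder format held in $g_ph_f (the format string, the [g] wrappers, and the reinsert_groups() helper below are hypothetical, not SLB's actual code):

```php
<?php
// Hypothetical helper: swap each placeholder back for its processed gallery content.
function reinsert_groups( $content, array $groups, $g_ph_f ) {
	$pairs = array();
	foreach ( $groups as $g_idx => $g_content ) {
		// Rebuild the same placeholder that was written during extraction
		$pairs[sprintf($g_ph_f, $g_idx)] = $g_content;
	}
	// strtr() with an array replaces all pairs in one pass and never rescans replaced text
	return empty($pairs) ? $content : strtr($content, $pairs);
}

// Assumed placeholder format and wrappers, for illustration only
$g_ph_f = '{{slb_group_%d}}';
$content = '<p>Intro</p>[g]{{slb_group_0}}[/g]<p>Text</p>[g]{{slb_group_1}}[/g]';
$groups = array('<a href="a.jpg">A</a>', '<a href="b.jpg">B</a>');
echo reinsert_groups($content, $groups, $g_ph_f);
```

Doing all replacements in a single strtr() call keeps this step to one pass over the content, regardless of how many galleries were extracted.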

Results

Each implementation was run in a loop (to normalize the execution times) on a post with 2 galleries and a few normal image links.
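
For reference, a timing loop of this kind can be sketched as follows. This is not the exact harness used for the numbers below; the bench() helper and the two stand-in workloads are assumptions made for illustration:

```php
<?php
// Hypothetical timing helper: calls $fn $iterations times and returns the elapsed seconds.
function bench( $fn, $iterations = 10000 ) {
	$start = microtime(true);
	for ( $i = 0; $i < $iterations; $i++ ) {
		call_user_func($fn);
	}
	return microtime(true) - $start;
}

// Stand-in workloads: find a wrapped gallery in some sample content
$text = str_repeat('<p>Lorem ipsum</p>', 50) . '[gallery]<a href="a.jpg">A</a>[/gallery]';
$t_str = bench(function () use ($text) { return strpos($text, '[gallery]'); });
$t_rgx = bench(function () use ($text) { return preg_match('|\[gallery\](.+?)\[/gallery\]|is', $text); });
printf("String-based: %f\nRegex-based: %f\n", $t_str, $t_rgx);
```

Running many iterations smooths out the noise of any single run, which matters when one pass takes only a few microseconds.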

Iterations: 10,000
String-based: 0.108232975006
Regex-based: 0.957661867142
String-based faster than Regex-based by 8.85x (785%)
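
The two figures in that last line follow directly from the raw timings: the ratio is the slower time divided by the faster one, and the percentage is that ratio minus one. A small sketch (the speedup() helper is a hypothetical name):

```php
<?php
// Hypothetical helper: how many times faster the $fast timing is than the $slow one,
// plus the same gap expressed as a percentage.
function speedup( $slow, $fast ) {
	$ratio = $slow / $fast;
	return array(
		'times'   => round($ratio, 2),
		'percent' => round(($ratio - 1) * 100),
	);
}

$s = speedup(0.957661867142, 0.108232975006);
printf("%.2fx (%d%%)\n", $s['times'], $s['percent']); // 8.85x (785%)
```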

I was expecting the string-based approach to be faster, but I was still surprised by how wide the gap between the two approaches was. Robustness and ease of maintenance must also be accounted for when deciding on an implementation, but in this case, the performance difference was too large to ignore. Fortunately, the string-based approach is not that unwieldy, so moving forward with it was a simple decision.

Retrieving Media Properties

Another new feature currently in development is enhanced skin functionality, which provides access to all of an image’s properties (URL, dimensions, title, description, etc.).  The previous release of SLB (v1.5.6) enabled image descriptions to be displayed in the lightbox, but now that skins would have access to all properties, it was important that the implementation of this functionality was as efficient and speedy as possible.

Until now, SLB fetched certain data on a per-link basis.  This effectively means that there is a database query for every single link being processed.  I will be the first to say that even though retrieving a single variable (using $wpdb->get_var()) from the DB is quite fast, it is not as efficient as retrieving multiple records from the DB in a single query.

As an example, here is the code for the different approaches used to retrieve the attachment ID of the internal image links in a request (an archive page with 10 posts):

Per-link

$pids = array();
foreach ( $links as $link ) {
	$pid_temp = $wpdb->get_var($wpdb->prepare("SELECT post_id FROM $wpdb->postmeta WHERE `meta_key` = %s AND `meta_value` = %s LIMIT 1", '_wp_attached_file', basename($link)));
	if ( is_numeric($pid_temp) )
		$pids[$link] = intval($pid_temp);
}

Bulk Query

$pids = array();
$l_names = array();
foreach ( $links as $link ) {
	$l_names[] = basename($link);
}
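//Bind each lowercased name as its own placeholder instead of interpolating raw values into the SQL
$l_names = array_map('strtolower', $l_names);
$pids_temp = array();
if ( !empty($l_names) ) {
	$l_ph = implode(',', array_fill(0, count($l_names), '%s'));
	$q = $wpdb->prepare("SELECT post_id, meta_value FROM $wpdb->postmeta WHERE `meta_key` = %s AND LOWER(`meta_value`) IN ($l_ph)", array_merge(array('_wp_attached_file'), $l_names));
	$pids_temp = $wpdb->get_results($q);
}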
foreach ( $pids_temp as $pd ) {
	if ( is_numeric($pd->post_id) )
		$pids[$pd->meta_value] = intval($pd->post_id);
}
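
Note that the bulk snippet keys its results by the stored filename (meta_value), while the per-link snippet keys them by link. If a per-link lookup is needed afterwards, one possible way to re-key the bulk results is sketched below. The map_ids_to_links() helper and the sample URLs are hypothetical, and matching on the lowercased basename is an assumption carried over from the query above:

```php
<?php
// Hypothetical helper: re-key bulk results (filename => attachment ID) by link URL.
function map_ids_to_links( array $links, array $pids ) {
	// Compare names case-insensitively, as the bulk query does
	$by_name = array_change_key_case($pids, CASE_LOWER);
	$out = array();
	foreach ( $links as $link ) {
		$name = strtolower(basename($link));
		if ( isset($by_name[$name]) ) {
			$out[$link] = $by_name[$name];
		}
	}
	return $out;
}

// Sample data for illustration only
$links = array(
	'http://example.com/wp-content/uploads/Photo.JPG',
	'http://example.com/wp-content/uploads/missing.png',
);
$pids = array('photo.jpg' => 42);
print_r(map_ids_to_links($links, $pids));
```

Links with no matching attachment are simply left out of the result, mirroring the is_numeric() check in the snippets above.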

Results

As in the previous example, both implementations were run in a loop to normalize results, but also to provide some insight into how a larger request with thousands of links might be affected by the approach.

Iterations: 10,000
Per Link: 48.4570000172
Bulk Query: 8.63826394081
Bulk Query faster than Per Link by 5.61x (461%)

Retrieving post data in as few queries as possible is clearly faster. As with regular expressions in the previous example, this test was more of a confirmation of my assumptions, but it also provided valuable data regarding the extent to which performance can be affected based on the approach.

Making the necessary changes to allow all links in a request to be processed took some doing, but the end result was much more modular (thus, more flexible) code, so it was well worth the time and effort.

It’s Worth It

I honestly did not want to do all of this benchmarking because it would delay Simple Lightbox’s next release. However, seeing how much performance can be gained (or lost) depending on a feature’s implementation reminded me that testing and comparing is an important part of developing fast and efficient plugins. The result for everyone is that the next version of SLB will be faster than its predecessor, despite having more features!