Monday, March 17, 2008

PHP Benchmarking: Arrays and Iteration

As a programming department, it is always our goal to code in such a manner that makes use of methods that provide, on average, the fastest execution times. In today’s discussion, we’ll be testing the many different ways to process and interact with arrays in PHP in the first of three main areas: reading, modifying, and reconstructing.

We’ve got some ground work to establish today, so modifying and reconstructing will be saved for later. Also, as time goes on, we’ll be talking about many different areas of PHP (like functions versus objects and static methods). But, for today, arrays have been chosen as our first point of discussion because, at least in my experience, they are more than ubiquitous when it comes to developing dynamic or content-driven applications, and dealing with large amounts of data (e.g.: user lists, pages, resources of all kinds, etc).

Preface

As PHP is an interpreted language, and all machines/OS’s/configurations/processors are not created equal, please note that results will vary from machine to machine. My configuration for these tests are as follows: AMD Athlon 64 X2 Dual core 6000+, 2 GB Kingston PC-6400, Windows XP sp2.

Getting started

When benchmarking, the goal is to determine the comparable time percent difference for a given group of methods; however, it is equally important to consider the data that being used in these test, and how it is applied. It’s best to use data that would reflect a real, or as close to real-world application as possible. We’re going to be dealing with a decent size array with 500 elements, each containing a string of 1024 bytes. In the real world, this could reflect a list of pages that we’ve chosen to search through via PHP, or, more believably, a list of comments to a blog post such as this one.

I’ve created a few generic classes to help me out with the nitty-gritty of the benchmarking process. At the end of this series on PHP-specific benchmarks, I’ll walk everyone through how to use these classes for their own tests. For today though, I’m just going to gloss over this groundwork and get on to the specific tests. Here’s how I’ve setup my test data and iteration control.


define('ITERATIONS', 10000);
define('ELEMENTS' , 5000);
define('VALUE_SIZE', 1024);

$ag = new ArrayGenerator(ELEMENTS, VALUE_SIZE);

$tc = new TimeCompare();
// The associative array run tests on
$a_assoc = $ag->generate(AG_TYPE_ASSOC);
// The indexed array to run tests on
$a_index = $ag->generate(AG_TYPE_INDEX);

As an extra benefit to those who download the source code, ArrayGenerator and all forthcoming Generators provide a small, mostly trivial lesson in PHP 5’s class inheritance and interface models.

Alright, lets get on with it.

Reading arrays

Right off the top of my head, whether they be associative or indexed, I can think of four (4) different iterative structures that can be used to read through an array in PHP: for, foreach, list/each, and current/next.

For-loops have several variants. First, we will examine the traditional form, and then we will look at more optimized versions. In our test, we’re simply going to concatenate all of our array values into a separate variable. This test exemplifies the scenario where there are several pieces of data stored in array that need to be output to the page (sans formatting for us). Here’s the code for both indexed and associative arrays.


Indexed

$t = '';
for ( $i=0; $i<count($a_index); $i++ ) {
$t .= $a_index[$i];
}

Associative

$t = '';
$keys = array_keys($a_assoc);
for ( $i=0; $i<count($a_assoc); $i++ ) {
$t .= $a_assoc[$keys[$i]];
}

Indexed vs. associative arrays
















TestTime (%)Total time (ms)
Traditional (index)1000.000000126
Traditional (assoc)1050.000000147

The lesson here is to use index arrays when possible; however, this may only be applicable to for and other loops that use index counters. This may not apply to loops that use keys or other means of iteration; associative may be faster, if may be the same — we’ll find out later.

Now, lets be smart about our resources and move the count() statement outside of the loop-body and eliminate those extra n function calls. In general, we would write something like this.

Limit counting outside of the loop body

$limit = count($some_array);
for ( $i=0; $i<$limit; $i++ ) {
// … do something …
}

Now, applying this method to the previous two loops, lets retest to see what, if any improvement we get. I theory, depending on how count() behaves, this could be quite substantial.

Pre-counting the array size outside of the loop















TestTime (%)Total time (ms)
Traditional, pre-count (index)1000.000000110
Traditional, pre-count (assoc)1060.000000117
Pre-counted vs. counting as part of the loop

























TestTime (%)Total time (ms)
Traditional (index)1150.000000126
Traditional (assoc)1340.000000147
Traditional, pre-count (index)1000.000000110
Traditional, pre-count (assoc)1060.000000117

Furthermore, if we switch from the post-increment operator, to the pre-increment operator, we should also see some time improvements. The difference is that in using the post-increment operator (a++), a copy of the variable is made, the original value is incremented, and then that copy is returned. With pre-increment operator (++a), the value is incremented and then returned, cutting the process down one less step. In general we would write something like this.

Pre-incrementing the loop counter

$limit = count($some_array);
for ( $i=0; $i<$limit; ++$i ) {
// … do something …
}

This gives us the following comparison.

Post vs. pre-incrementing


























TestTime (%)Total time (ms)
Traditional, pre-count (index)1100.000000111
Traditional, pre-count (assoc)1190.000000121
Traditional, pre-count, pre-increment * (index)1000.000000101
Traditional, pre-count, pre-increment * (assoc)1120.000000114
* We’ll refer to this last version of the for-loop as an optimized for-loop (limit pre-counting, pre-incrementing) from now on.

So now that we have everyone’s favorite for-loop thoroughly smashed (like an Idaho potato), lets move on to the other three methods for iterating over an array. In these final tests, we’ll be upping our test parameters a tad.


define('ITERATIONS', 1000000);
define('ELEMENTS' , 5000);
define('VALUE_SIZE', 1024);

The beauty of these final tests is that since the subscripts aren’t used, we can use the same code and just change the array names where necessary, and since we’ve spent so much time already, lets just put all the code out there straight away.

Foreach indexed/associative

$t = '';
foreach ( $some_array as $value ) {
$t .= $value;
}

List/each indexed/associative

$t = '';
while ( list(,$value) = each($some_array) ) {
$t .= $value;
}

Current/next indexed/associative

$t = '';
$value = current($some_array);
while ( $value !== false ) {
$t .= $value;
$value = next($some_array);
}

So now that we have our code, lets take a look at our results.










































TestTime (%)Total time (ms)
For, optimized (index)1000.0000000610
For, optimized (assoc)1130.0000000691
Foreach (index)1040.0000000637
Foreach (assoc)1040.0000000633
List/each (index)1130.0000000692
List/each (assoc)1110.0000000680
Current/next (index)1090.0000000666
Current/next (assoc)1080.0000000659
Interestingly enough, it appears that in the last three methods, associative arrays seems to out-perform indexed array by a slight amount, and although we’re talking only one to two percent differences, in the grand scheme of things, it all adds up.

So, overall, it appears that for is overall the winner of this match-up. with foreach, current/next and list/each following successively. For associative arrays, foreach appears to be the definite way to go being at least 4% faster than its closest rival, current/next. This is good news, since foreach is used so commonly in almost every piece of code in PHP that I’ve ever seen. It’s easy, and the good news is: its fast; but better than that, it appears, at least from this benchmark, that it is the right choice when dealing with associative arrays.

No comments: