Decreasing Perl Script Response Times with Persistent Perl

February, 2008
by Rob Harrigan

About the Author
Rob Harrigan

Rob has been CMG Headquarters' IT Specialist, Programmer & Webmaster since 2002. Very familiar with the LAMP stack (Linux, Apache, MySQL, Perl/PHP), he has developed a number of in-house solutions, including the Paper Submission System, the Online Agenda and Conference Scheduler. In his spare time, Rob plays both lead and bass guitar proficiently, and dabbles with keyboard & drums, electronics, graphics, web design and DIY home repair. He can be contacted at .

Introduction

In a typical Perl script-based web application, the web server reads a script into memory and launches a perl interpreter process, which executes the script and then unloads it, freeing the memory for the next request. This is very wasteful for large programs that are used repeatedly: the same script has to be read and executed for each request, often by several processes at once. It would be nice to run the script as a daemon or in a client-server model to avoid the repetitive startup work. While this is possible, it is rarely practical because it usually requires a major code overhaul.

Persistent Perl (PP) aims to solve this problem by allowing your perl programs to stay in memory after initial execution so they can be re-executed for subsequent requests. This greatly reduces the per-request overhead of launching processes, such as re-reading the interpreter and script into memory, and it limits costly operations like database and file handle creation. The net effect is lower response times, happier users and a better performing web server.

Getting Started

To use PP you must first install it from source (http://daemoninc.com/PersistentPerl/download.html), as a binary package for your distribution or from CPAN (http://search.cpan.org/~horrocks/PersistentPerl-2.22/lib/PersistentPerl.pm). I prefer CPAN, since any required modules can also be installed automatically. Installing from CPAN is easy and generally involves the following steps:

  1. Launch CPAN
  2. Install Module
  3. Exit.

Please note: you may need to have root access to use CPAN.

The command:

perl -MCPAN -e shell

will launch the CPAN shell, which may be followed by a series of configuration questions if this is your first time using it. Eventually you should be greeted with the CPAN prompt at which you type the following:

cpan> install PersistentPerl

You'll be asked if you would like to compile mod_persistentperl. You can answer 'no' for now and save this for another day, since it involves changing the Apache configuration, which is beyond the scope of this article. After a few screens' worth of output you should have a working PersistentPerl module and a perperl executable, usually installed as /usr/bin/perperl. You can now exit the CPAN shell.
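
A quick way to confirm the installation is to ask perl whether it can load the module. The following is a minimal sketch, not part of the PP distribution: the helper name module_available is made up for illustration, and it assumes the module loads under the standard perl interpreter as well as under perperl.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this perl, 0 otherwise.
# (module_available is a hypothetical helper, not part of PersistentPerl.)
sub module_available {
    my ($module) = @_;

    # Turn Some::Module into Some/Module.pm, the form require() expects
    # when it is given a string instead of a bareword.
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(PersistentPerl DBI CGI)) {
    printf "%-15s %s\n", $module,
      module_available($module) ? 'installed' : 'missing';
}
```

If PersistentPerl is reported as missing, re-run the install step from the CPAN shell before changing any scripts.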

Once you have PP installed, it's time to start using it. There are several ways to get your scripts to run using PP. For instance, you could configure a new file-type handler in your web server configuration, or set up all scripts in a particular directory to use the PP interpreter rather than the standard perl interpreter. More details for advanced configurations can be found on the PP site (http://daemoninc.com/PersistentPerl/). However, the simplest way to get started is to change the interpreter in the first line of a standard perl script.

#!/usr/bin/perl

becomes

#!/usr/bin/perperl

This line tells Apache to use the persistent perl interpreter instead of the standard perl executable to interpret this script. This is technically all you "need" to do to use PP; however, the results may be less than desirable without a few more tweaks to existing code.

The first stumbling block is global variables. Variables defined in the top-level scope (that is, outside of any function) are available to all subroutines. This comes in handy in one-shot script execution but can lead to a twisted mess when using PP. For instance, it's common practice when using CGI.pm to declare a CGI object in the global scope and access any of the parameters as needed in the subroutines. When using PP, though, variables declared in this way are visible not only to all subroutines, but to the subroutines of all the running invocations of the script. In this way it is possible for User A to update a variable being used by User B. The result could be as benign as a changed page number in a result set parameter or as severe as a swapped session or user id. To protect against this, the only variables declared globally should be 'constants' in the sense that we do not expect them to be changed, although perl has no way to enforce this. Things that fall into this category are configuration options, as well as file and database handles that need to be initialized before they can be used by all invocations.
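
The effect can be reproduced without a web server. The sketch below is an illustration under stated assumptions, not code from the application described here: it simulates two requests served one after another by the same long-lived process, and the handler names and the page parameter are made up.

```perl
use strict;
use warnings;

# A package global: in a long-lived process it keeps its value after
# the request that set it has finished.
our $page;

# Hypothetical handler that only sets $page when the request supplies one.
sub handle_request_leaky {
    my (%param) = @_;
    $page = $param{page} if defined $param{page};
    return defined $page ? $page : 1;
}

# Safer version: the state lives in a lexical created for each request.
sub handle_request_safe {
    my (%param) = @_;
    my $page = defined $param{page} ? $param{page} : 1;
    return $page;
}

# Two "requests" handled in sequence by the same process.
print "leaky, user A: ", handle_request_leaky( page => 7 ), "\n";   # 7
print "leaky, user B: ", handle_request_leaky(), "\n";              # 7, left over from user A
print "safe,  user A: ", handle_request_safe( page => 7 ), "\n";    # 7
print "safe,  user B: ", handle_request_safe(), "\n";               # 1, the default
```

User B never asked for page 7, yet the leaky handler serves it because the global still holds User A's value.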

Let's examine the case of a database handle. Opening a database connection is a notoriously expensive operation. Some languages, like PHP, attempt to reuse database handles whenever possible; Perl does not have this feature. To compensate, we declare a global database handle so that it may be re-used by subsequent requests. The code is as follows:

use vars qw($DBH);

unless ( defined($DBH) && $DBH->ping) {
    $DBH = &connectToDB();
}

The first line declares $DBH as a package global, which keeps 'use strict' from complaining when we use the variable without an explicit 'my'. The unless block checks whether the database handle has already been defined and whether the database is still alive, using DBI's ping method. On the first invocation of the script these tests fail, so the handle is created; on subsequent requests they succeed, which prevents the handle from being re-created. The ping also allows the handle to be re-established if the connection has gone away for some reason.
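
To watch this logic work without a database, the sketch below swaps the real handle for a stand-in. FakeHandle, connect_to_db and get_handle are hypothetical names used only for this illustration; the stand-in implements nothing but ping, and it assumes a real DBI handle would behave the same way for the purposes of this check.

```perl
use strict;
use warnings;

# Stand-in for a DBI handle (hypothetical); it only implements ping().
package FakeHandle;
sub new  { my ($class) = @_; return bless { alive => 1 }, $class }
sub ping { my ($self)  = @_; return $self->{alive} }

package main;

our $DBH;             # persists between requests, like $DBH above
our $connects = 0;    # counts how often we really "connect"

sub connect_to_db {
    $connects++;
    return FakeHandle->new;
}

# Return the cached handle, reconnecting only if it is missing or dead.
sub get_handle {
    unless ( defined $DBH && $DBH->ping ) {
        $DBH = connect_to_db();
    }
    return $DBH;
}

get_handle() for 1 .. 3;    # three requests share one connection
$DBH->{alive} = 0;          # simulate the server dropping the connection
get_handle();               # ping fails, so we reconnect
print "connections made: $connects\n";    # prints "connections made: 2"
```

Four requests cost two connections: one at startup and one after the simulated outage.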

The final technique to convert our code is to encapsulate any instance-related code in a function. This prevents CGI data from being accessed by other invocations. One way to accomplish this is to structure your code in the following manner:

# Beginning of program
...
# End Global Variables

start();

sub start{
    my $query = new CGI;
    &doSomething($query);
}

# Other subs
...
# End of program

Now that we have isolated our instance variables leaving only constants at the global scope, most code will run just like it did before, with the exception that you should see a performance increase. For a full working example, please see the Appendix.

Making the case

To see just how much we've gained in terms of performance, I created a user load of 20 concurrent connections, each making a series of 5 requests, for a total of 100 requests. I did this using the same code, once with the standard perl interpreter and once with the PP interpreter. To keep network delays from affecting the outcome, the tests were performed over a LAN.

To perform the tests I used a benchmark script that makes use of the LWP module for network access and the Time::HiRes module to record the elapsed time between request and response with sub-millisecond resolution. This script is also available in the Appendix.
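
The timing technique itself is small enough to show in isolation. This is a minimal sketch rather than the benchmark script: time_call is a made-up helper, and a short sleep stands in for the HTTP request.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference once and return the elapsed wall-clock time in
# seconds, as a floating-point number with microsecond resolution.
# (time_call is a hypothetical helper for this illustration.)
sub time_call {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval( $t0, [gettimeofday] );
}

# Stand-in for an HTTP request: sleep for roughly 50 ms.
my $elapsed = time_call( sub { select( undef, undef, undef, 0.05 ) } );
printf "elapsed: %.1f ms\n", $elapsed * 1000;
```

Note that tv_interval returns seconds, so the value is multiplied by 1000 when milliseconds are wanted.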

In Figure 1 we see that the response times when using the PP interpreter are lower on average and more consistent, with less fluctuation. The standard perl interpreter, by contrast, has larger response times on average and much greater fluctuation, jumping from near 0 to 7-8 seconds. Imagine the frustrated user whose surfing experience is delayed for 7 seconds while waiting for an advertising banner to load.

Figure 1

Why is it so expensive to use the standard perl interpreter? The web server has to fork a separate perl interpreter process for each request, whereas PP can get by with a small number of backend processes. To be fair, there are other ways around this. One is to use mod_perl, which embeds a perl interpreter in the Apache server. The drawback is that this configuration is more complicated and usually requires an entire directory of scripts to be executed by mod_perl. The problem is that mod_perl scripts suffer from the same global variable problem described earlier, which means that to use it reliably, a lot of code has to be rewritten and tested all at once, a dangerous proposition. With PP, you can choose to implement it on a script-by-script basis.

The second reason the standard perl interpreter is so expensive is the creation of the database handle. The database server, MySQL in this case, is a daemon sitting around waiting for work to do. So it makes sense to treat it as such and re-use the connection rather than repeat the authentication process for each request. The same would hold true if you had to read a large file, a dictionary for example, into memory at the beginning of each script. With PP you could load it into memory once and use it as necessary. The true potential for PP now becomes apparent: traditionally expensive operations that have been omitted because of their performance impact can now be re-examined for possible inclusion.
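
The dictionary case might look like the sketch below. It is an illustration only: the variable and function names are made up, and a three-word in-memory string stands in for a large file on disk.

```perl
use strict;
use warnings;

our %DICTIONARY;    # filled on first use, then reused by later requests
our $loads = 0;     # how many times the word list was actually read

# Stands in for a large dictionary file (hypothetical data).
my $WORDS = "apple\nbanana\ncherry\n";

# Return a reference to the dictionary, reading the word list only if
# the hash is still empty.
sub dictionary {
    unless (%DICTIONARY) {
        $loads++;
        open( my $fh, '<', \$WORDS ) or die "Cannot open word list: $!";
        while ( my $word = <$fh> ) {
            chomp $word;
            $DICTIONARY{ lc $word } = 1;
        }
        close $fh;
    }
    return \%DICTIONARY;
}

sub is_word {
    my ($word) = @_;
    return exists dictionary()->{ lc $word } ? 1 : 0;
}

# Three lookups, as if from three separate requests.
is_word($_) for qw(apple pear banana);
print "loads: $loads\n";    # prints "loads: 1"
```

Under the standard interpreter every request would pay the loading cost; here only the first lookup does.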

Caveats

There are just a few downsides. Other than the previously discussed global variable issue, the one other thing to keep in mind is libraries and "require"d files. These are loaded only once, at initial invocation; if they are modified by developers, they are not automatically reloaded. This can be a bit frustrating until you get used to it. You may scratch your head a few times thinking, "but I changed that function, it should work" until you realize the function you changed is in a library whose new version hasn't been loaded into memory yet. The solution is to update the modification time of the calling script (not the library). This will force a reload of the script, all libraries and "require"d files. To do this, simply use the 'touch' command as in:

touch someprogram.pl
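
If you would rather do this from a Perl deployment script, the built-in utime function can update the modification time. The following is a minimal sketch: touch_file and the scratch file name are made up for illustration, and unlike the touch command it does not create a missing file.

```perl
use strict;
use warnings;

# Set a file's access and modification times to "now", as touch does
# for an existing file. Returns true on success.
# (touch_file is a hypothetical helper for this illustration.)
sub touch_file {
    my ($path) = @_;
    return utime( undef, undef, $path ) == 1;
}

# Demonstrate on a scratch file rather than a real script.
my $script = "touch-demo-$$.pl";    # hypothetical file name
open( my $fh, '>', $script ) or die "Cannot create $script: $!";
close $fh;
utime( 0, 0, $script );             # pretend it was last modified in 1970
touch_file($script) or die "Cannot touch $script: $!";
printf "modified %d second(s) ago\n", time - ( stat $script )[9];
unlink $script;
```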

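You can also have PP shut down its persistent interpreter after it has gone a set amount of time without requests, so that the next request loads the program afresh. I believe the default is an hour, but if you have an exceptionally buggy or stable program you may want to alter this value. You can simply add a flag, with the time given in seconds, to the first line of your program like so: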

#!/usr/bin/perperl -- -t300

There are other configuration options and special methods to help you code PP enabled scripts. Documentation is available on the project website (http://daemoninc.com/PersistentPerl/).

Conclusion

Hopefully you will give PersistentPerl a try and see for yourself the performance gains it can offer. It has the potential to turn ordinary perl scripts into daemon-like applications, reducing initialization overhead as well as memory usage, and thus allowing your web server to perform better and your users to surf faster.

Appendix

Banner-demo.pl

#!/usr/bin/perperl
##########
#
# Above, instead of the default perl interpreter, we use the persistent
# perl interpreter, 'perperl'
#
use strict;
use CGI;
use DBI;
use POSIX qw(strftime);
use vars qw($DBH);

###########
#
# Globals
#
# Any variables created in this scope will be truly global in the
# sense that they will be shared among all the running invocations of this script.
# Therefore keep only variables that should not be changed at this scope;
# user and session specific variables in particular do not belong here.

# Re-use Database handle.
unless ( defined($DBH) && $DBH->ping ) {
    # returns a handle to Database
    $DBH = connectToDB();
}

my $THIS    = '/cgi-bin/banner-demo.pl';
my $LOGFILE = 'banner-log.tab';

##########
#
# Begin Application
#
start();

##########
#
# Check arguments and proceed
#
sub start {
    # Creating a query object here prevents it from being shared
    # among all invocations of script
    my $query = CGI->new;
    my $go    = $query->param('go');

    # Check that redirect is valid
    if ( defined $go && $go =~ m/^http/ ) {
        clickThru($query);
    }
    else {
        printBanner($query);
    }
}

##########
#
# When user clicks on banner link, log details, then redirect to $go URL
#
sub clickThru {
    # Retrieve CGI object argument
    my $query = shift;
    my $ref   = $query->referer();
    my $go    = $query->param('go');
    my $ip    = $ENV{'REMOTE_ADDR'};
    my $ua    = $ENV{'HTTP_USER_AGENT'};
    my $now   = strftime( "%Y-%m-%d %H:%M:%S", localtime(time) );

    # Log Link, IP address, Day/Time to file
    open( my $log, '>>', $LOGFILE ) or die "Unable to open $LOGFILE: $!\n";
    flock( $log, 2 );    # 2 = LOCK_EX (exclusive lock)
    print $log "$now\t$ref\t$go\t$ip\t$ua\n";
    close $log;

    print $query->redirect( -uri => $go );
}

##########
#
# Intended to be called by Apache as a Server Side Include
#
sub printBanner {
    # Pull an active random banner from database
    my $statement = qq(select * from banners where active=1 order by RAND() limit 1);
    my $sth = $DBH->prepare($statement);
    $sth->execute or die "Unable to execute query: " . $DBH->errstr . "\n";
    my $row = $sth->fetchrow_hashref;
    $sth->finish;

    # Print banner code
    print "Content-type: text/html\n\n";
    print qq(<a href="$THIS?go=$row->{'link'}"><img src="$row->{'src'}" alt="$row->{'alt'}"></a>);
}

##########
#
# Database connection
#
sub connectToDB {
    my $database    = "db_name";
    my $username    = "db_username";
    my $password    = "db_password";
    my $socket      = '/tmp/mysql.sock';
    my $data_source = "DBI:mysql:$database:mysql_socket=$socket";
    my $dbh = DBI->connect( $data_source, $username, $password )
      or die "Can't connect to $data_source\n";
    return $dbh;
}

__END__

##########
#
# Anything below __END__ is ignored, use these queries to create and populate a test
# database.
#

##########
#
# Database table creation schema
#
create table banners (
    id int auto_increment PRIMARY KEY,
    link varchar(128),
    src varchar(128),
    alt varchar(64),
    active char
);

##########
#
# Database sample data
#
insert into banners values(NULL,'http://slashdot.org','/banners/slash.jpg','Slashdot',1);
insert into banners values(NULL,'http://digg.com','/banners/digg.gif','Digg',1);
insert into banners values(NULL,'http://linux.com','/banners/linux.gif','Linux.com',1);

Benchmark.pl

#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use Time::HiRes qw(gettimeofday tv_interval);
use Getopt::Std;

pipe( README, WRITEME );

my %opts = ();
getopts( 'u:c:r:', \%opts )
  or die("Usage: $0 -c <Number of Clients> -r <Number of Requests per Client> -u <URL>\n");

my $url       = $opts{'u'};
my $num_tries = $opts{'r'};
my $maxkids   = $opts{'c'};
my @results;
my @childs;
my $errs = 0;

# Setup browser
my $browser = LWP::UserAgent->new;

# Set agent name
$browser->agent('Cruncher');

# Allow redirection of Post requests
push @{ $browser->requests_redirectable }, 'POST';

# Fork children
for ( my $j = 0 ; $j < $maxkids ; $j++ ) {
    my $pid = fork();
    if ($pid) {    # parent
        push( @childs, $pid );
    }
    elsif ( defined $pid ) {    # child (fork returned 0)
        close(README);
        doRequest($j);
        exit(0);
    }
    else {         # fork returned undef
        die "couldn't fork: $!\n";
    }
}

# Save results. Close the parent's copy of the write end so that reading
# README reaches end-of-file once every child has exited.
close(WRITEME);
while ( my $string = <README> ) {
    chomp($string);
    my ( $elapsed, $error ) = split( "\t", $string );
    $errs += $error;
    push( @results, $elapsed );
}
close(README);

# Reap the children
foreach (@childs) {
    waitpid( $_, 0 );
}

print STDERR "Results for: $url\n";
print STDERR "Sum: " . array_sum(@results) . "\n";
print STDERR "Average: " . array_average(@results) . "\n";
print STDERR "Errors: " . $errs . "\n";
print STDERR "Printing results to STDOUT\n";
foreach my $res (@results) {
    print STDOUT "$res\n";
}

sub doRequest {
    my $j = shift;
    my ( $elapsed, $error, $response, $t0, $t1 );

    # Initialize browser to avoid startup costs being counted below.
    $browser->get($url);

    for ( my $i = 0 ; $i < $num_tries ; $i++ ) {
        $t0       = [gettimeofday];
        $response = $browser->get($url);
        $t1       = [gettimeofday];
        $elapsed  = tv_interval( $t0, $t1 );    # seconds, as a float
        $error    = $response->is_error ? 1 : 0;
        printf STDERR ( "Try %d for Child %d, of %d reporting\n", $i + 1, $j + 1, $maxkids );
        print WRITEME $elapsed . "\t" . $error . "\n";
    }
}

sub array_sum {
    my $sum = 0;
    foreach my $value (@_) {
        $sum += $value;
    }
    return $sum;
}

sub array_average {
    my $array_count = array_count(@_);
    if ( $array_count != 0 ) {
        return ( array_sum(@_) / $array_count );
    }
    else {
        return 'NaN';
    }
}

sub array_count {
    return scalar @_;
}