Open Source Technology: July 2008

Thursday, July 31, 2008

Define: HTTP_LOAD in details

Http_load is another cool webserver performance tester that gives simple stats on how your webapp is performing.

How to install in OS X

1. Download from http://www.acme.com/software/http_load/
2. Open a terminal, cd to the directory where the archive is, and extract it
$ tar -zxvf http_load-12mar2006.tar.gz
3. Move to that directory
$ cd http_load-12mar2006
4. Run
$ make
5. Run
$ make install

Once installed, using http_load for quick benchmarking is really quite straightforward. You call the program,
tell it how many requests to make concurrently, and how long to run (either in number of seconds, or total fetches),
and finally pass in a file full of URLs to request.

Testing

http_load requires at least 3 parameters:

* One start specifier, either -parallel or -rate
-parallel tells http_load to make the specified number of concurrent requests.
-rate tells http_load to start the specified number of new connections each second. If you use the -rate start
specifier, you can specify a -jitter flag parameter that tells http_load to vary the rate randomly by about 10%.
* One end specifier, either -fetches or -seconds
-fetches tells http_load to quit when the specified number of fetches have been completed.
-seconds tells http_load to quit after the specified number of seconds have elapsed.
* A file containing a list of URLs to fetch
The urls parameter specifies a text file containing a list of URLs, one per line. The requested URLs are
chosen randomly from this file.
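To make the three required pieces concrete, here is a minimal sketch that writes a URL file and wraps a -rate run in a small shell function. The localhost URLs, the file name urls.txt and the function name run_rate_test are placeholders of mine, not anything http_load requires.

```shell
# Write the URL file: one URL per line, no blank lines.
# The localhost URLs are placeholders; use pages on a server you control.
url_file="urls.txt"
printf '%s\n' "http://localhost/" "http://localhost/index.php" > "$url_file"

# Start specifier (-rate plus optional -jitter), end specifier (-seconds),
# then the URL file, in the order given by the usage text.
# $1 = new connections per second, $2 = seconds to run, $3 = URL file
run_rate_test() {
  http_load -rate "$1" -jitter -seconds "$2" "$3"
}

# Example call, commented out so the snippet does not hit any server by itself:
# run_rate_test 10 30 "$url_file"
```

Swapping -rate for -parallel, or -seconds for -fetches, only changes the flags inside the function; the URL file stays the last argument.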

You’re ready! Open a text editor, write down the URL of the website you want to test (preferably your own),
and save it as a .txt file. Then cd to the directory where the .txt file is.

To see how your server copes with 100 fetches, run:

$ http_load -parallel 5 -fetches 100 name_of_file.txt
which means open 5 concurrent connections and fetch the webpage 100 times.

You’ll get something like this:

100 fetches, 5 max parallel, 1.34237e+07 bytes, in 15.842 seconds
134237 mean bytes/connection
6.31234 fetches/sec, 847351 bytes/sec
msecs/connect: 28.9069 mean, 75.011 max, 14.865 min
msecs/first-response: 435.84 mean, 2484.28 max, 96.082 min
93 bad byte counts
HTTP response codes:
code 200 -- 100

At the moment, with 5 concurrent connections, the webserver handles about 6 requests per second and
has a mean initial latency of about 436 milliseconds.

The numbers you’ll want to look at in more detail are “fetches/sec” and “msecs/first-response”.
These are critical in terms of really understanding what your site is doing.
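If you run the benchmark often, it can help to pull those two numbers out of the report automatically. The following is only a sketch: it assumes the report has the same layout as the sample output above and that you saved it to a file, which I call report.txt here.

```shell
# Sample report, copied from the run shown above. In practice you would save
# http_load's output with something like: http_load ... > report.txt
cat > report.txt <<'EOF'
100 fetches, 5 max parallel, 1.34237e+07 bytes, in 15.842 seconds
134237 mean bytes/connection
6.31234 fetches/sec, 847351 bytes/sec
msecs/connect: 28.9069 mean, 75.011 max, 14.865 min
msecs/first-response: 435.84 mean, 2484.28 max, 96.082 min
EOF

# Throughput: first field of the line that mentions fetches/sec.
fetches_per_sec=$(awk '/fetches\/sec/ { print $1 }' report.txt)

# Latency: second field of the msecs/first-response line (the mean).
first_response_ms=$(awk '/^msecs\/first-response:/ { print $2 }' report.txt)

echo "throughput: $fetches_per_sec fetches/sec"
echo "mean first-response: $first_response_ms ms"
```

Keeping these two values from run to run gives you a simple before/after comparison when you start tuning.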

It’s important to note the difference between “benchmarking” and “profiling”. What we’re doing here with http_load
is the former: we’re getting a feel for a specific page’s overall performance. We know that it serves X pages per
second, and generally takes about Y milliseconds to respond. What we don’t know yet is why either of these is the
case. You’ll have to dig in more detail into your PHP code and server configuration to determine what to tweak to
bring up your site’s performance to an acceptable level. http_load doesn’t, and can’t, do that for you.

Http_load tells you how your webapp is currently performing and lets you test it under different conditions;
basically it’s a benchmarking tool just like httperf. The next step is optimization. Have a look at
the 1st part of Getting Rich with PHP 5 (what a crappy title) by Rasmus Lerdorf for tools you can use to profile
your code and some tips on optimization. In the example shown he goes from 17 reqs/sec to 1100 reqs/sec.

-------------------------------------------------------------------------------------------------------
$ http_load --h

usage: http_load [-checksum] [-throttle] [-proxy host:port] [-verbose] [-timeout secs] [-sip sip_file]
-parallel N | -rate N [-jitter]
-fetches N | -seconds N
url_file
One start specifier, either -parallel or -rate, is required.
One end specifier, either -fetches or -seconds, is required.

--------------------------------------------------------------------------------------------------------
$ man http_load

NAME
http_load - multiprocessing http test client

SYNOPSIS
http_load [-checksum] [-throttle] [-proxy host:port] [-verbose] [-timeout secs] [-sip sip_file]
[-cipher str] ( -parallel N | -rate N [-jitter] ) ( -fetches N | -seconds N ) url_file

DESCRIPTION
http_load runs multiple http fetches in parallel, to test the throughput of a web server. However unlike
most such test clients, it runs in a single process, so it doesn’t bog down the client machine. It can be
configured to do https fetches as well.

The -checksum flag tells http_load to do checksums on the files fetched, to make sure they came across ok.
The checksums are computed the first time each URL gets fetched, and then recomputed and compared on each
subsequent fetch. Without the -checksum flag only the byte count is checked.

The -throttle flag tells http_load to throttle its consumption of data to 33.6Kbps, to simulate access by
modem users.

The -proxy flag lets you run http_load through a web proxy.

The -verbose flag tells http_load to put out progress reports every minute on stderr.

The -timeout flag specifies how long to wait on idle connections before giving up. The default is 60 seconds.

The -sip flag lets you specify a file containing numeric IP addresses (not hostnames), one per line.
These get used randomly as the *source* address of connections. They must be real routable addresses
on your machine, created with ifconfig, in order for this to work. The advantage of using this
option is you can make one client machine look like a whole bank of machines, as far as the server knows.

The -cipher flag is only available if you have SSL support compiled in. It specifies a cipher set to use.
By default, http_load will negotiate the highest security that the server has available, which is
often higher (and slower) than typical browsers will negotiate. An example of a cipher set might be
"RC4-MD5" - this will run considerably faster than the default. In addition to specifying a raw cipher
string, there are three built-in cipher sets accessible by keywords:
* fastsec - fast security - RC4-MD5
* highsec - high security - DES-CBC3-SHA
* paranoid - ultra high security - AES256-SHA
Of course, not all servers are guaranteed to implement these combinations.

One start specifier, either -parallel or -rate, is required. -parallel tells http_load to keep that
many parallel fetches going simultaneously. -rate tells http_load to start that many new connections each
second. If you use the -rate start specifier, you can also give the -jitter flag, telling http_load to
vary the rate randomly by about 10%.

One end specifier, either -fetches or -seconds, is required. -fetches tells http_load to quit when that
many fetches have been completed. -seconds tells http_load to quit after that many seconds have elapsed.

The url_file is just a list of URLs, one per line. The URLs that get fetched are chosen randomly from this
file.

All flags may be abbreviated to a single letter.

Note that while the end specifier is obeyed precisely, the start specifier is only approximate. If you
use the -rate flag, http_load will make its best effort to start connections at that rate, but may not
succeed. And if you use the -parallel flag, http_load will attempt to keep that many simultaneous connections
going, but may fail to keep up if the server is very fast.

--------------------------------------------------------------------------------------------------------------------

* Note that when you provide a file with a list of URLs, make sure it contains no empty lines.
If it does, the utility won't work and complains:

./http_load: unknown protocol -

* Basic errors
- byte count wrong
- timeout
- parallel may be at most 1021

Solutions:

To get rid of the "byte count wrong" error, download this patch:
http://www.lighttpd.net/assets/2007/3/5/http_load-12mar2006-timeout.diff
Apply it with the command below, then run make and make install again:
patch -p 1 < http_load-12mar2006-timeout.diff
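
Back to the empty-line problem from the first note: one way to avoid it is to filter the URL file before handing it to http_load. A minimal sketch, where urls_raw.txt and urls.txt are made-up file names:

```shell
# Hypothetical input with an empty line and a whitespace-only line.
printf 'http://localhost/\n\nhttp://localhost/about\n   \n' > urls_raw.txt

# Keep only lines that contain at least one non-whitespace character.
grep -v '^[[:space:]]*$' urls_raw.txt > urls.txt

cat urls.txt
```

The filtered urls.txt is the file you then pass as the last argument to http_load.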

----------------------------------------------------------------------------------------------------------------
Comments from other blogs/forums:

1) http_load does not replicate heavy load; it replicates a DOS attack.

2) It generates N requests every second without waiting for the previous N requests to complete. Actually, it never waits for requests to complete. It kills them, so the http server has nowhere to send data.

3) However, this is just HTTP load. If you have JS running and making database calls after page load, I don't think this will help your testing.

---------------------------------------------------------------------------------------------------------------
Note: This is my personal experience, so if you have any suggestions or new test cases, please drop me a comment.

Thursday, July 24, 2008

ALTER TABLE, SELECT AND INNODB

Let's assume you have a 512MB table, and you decide to alter the table to add an index to make queries faster.

How long would you expect this alter to take? Hours? Days?

Even with slow 7200 RPM disks, the alter should have finished in less than half an hour.

I ran across an alter that had been running for 4 days on a 512MB table. It ran so long because a SELECT was still running, which prevented MySQL from performing the "rename table" step, the last leg of the ALTER TABLE process.

Killing that SELECT released the shared lock, allowing the alter to finish.
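
To spot such a SELECT, you can filter the process list. The helper below is a sketch of mine, not part of MySQL: it assumes the tab-separated output of the mysql client in batch mode, with the usual column order Id, User, Host, db, Command, Time, State, Info. The function name long_selects and the one-hour threshold are arbitrary choices.

```shell
# Print the Id of every SELECT that has been running longer than $1 seconds.
# Reads tab-separated SHOW FULL PROCESSLIST output (with header row) on stdin.
# Column order assumed: Id, User, Host, db, Command, Time, State, Info.
long_selects() {
  awk -F '\t' -v min="$1" '
    NR > 1 && $5 == "Query" && $6 + 0 > min && toupper($8) ~ /^[ ]*SELECT/ { print $1 }
  '
}

# Typical use (needs a running server and suitable privileges):
# mysql -B -e "SHOW FULL PROCESSLIST" | long_selects 3600
# Each Id printed can then be ended with: mysql -e "KILL <Id>"
```

Check what each Id is actually doing before you kill it; the filter only looks at the statement text and the elapsed time.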

Do NOT kill the ALTER while it is stuck in this shared lock state, and do NOT then remove the temporary tablespace file "#sql-320f_106f99a2.*".

What will happen if you do remove the #sql* file by hand?

Well, for one, InnoDB will crash the mysql instance, saying it could not find the temporary tablespace because it failed to open it. Then, on recovery, the original table gets unlinked from the filesystem and you have just lost all data for that tablespace.

Why?

Here is roughly the order of events for an alter:

1. Lock all writes to the table
2. Make a temporary table (the #sql file)
3. Copy all data from the old file to the new file
4. Do a quick consistency check between the two files
5. Unlink the old file
6. Rename the temp file to the old file name

Each step operates on the data dictionary pointers for the two tables. Removing the temporary file with a filesystem rm just before the unlink step causes InnoDB to crash and, on recovery, to unlink the old file and then, of course, fail on the rename.

Wednesday, July 16, 2008

Basic SVN Commands

Let's get started.

How to get help with svn?
svn help
This makes svn list all the available subcommands. To get the reference for one of them, let's say checkout:

svn help checkout
The same goes for the other svn-related commands, such as svnadmin:
svnadmin help

How to create a svn repository?
First of all, what is a repository? It is the core store for svn, or you can call it a centralized svn backup database.
Once created, it is just a directory with its files.
IMPORTANT! Do NOT try to modify or add anything inside the repository directory by hand, unless you know what you are doing.

To create a svn repo, let's say I want a repo to store all my programming code, I do this:
svnadmin create /home/mysurface/repo/programming_repo

Remember to use absolute paths for everything; sometimes a relative path is not going to work.

How to import my existing directories into the new repo?
svn import /home/mysurface/programming file:///home/mysurface/repo/programming_repo -m "Initial import"

-m stands for log message; the first revision is created with the log “Initial import”. You need to specify a URL for the repo,
since a URL is the standard argument for svn. Therefore, for a local repository, you need to specify it with file://
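
The path-to-URL step is easy to get wrong (note the three slashes), so here is a small sketch that wraps it. The function names repo_url and create_and_import are my own; the svnadmin and svn commands inside are the same ones used above.

```shell
# Turn an absolute filesystem path into the file:// URL that svn expects.
repo_url() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;
    *)  echo "repo_url: need an absolute path, got: $1" >&2; return 1 ;;
  esac
}

# Create a repository and import an existing directory as revision 1.
# $1 = repository path (absolute), $2 = directory to import
create_and_import() {
  svnadmin create "$1" &&
  svn import "$2" "$(repo_url "$1")" -m "Initial import"
}

repo_url /home/mysurface/repo/programming_repo
# Example call, commented out so nothing is created by accident:
# create_and_import /home/mysurface/repo/programming_repo /home/mysurface/programming
```

The uncommented repo_url call prints file:///home/mysurface/repo/programming_repo, which is the URL used throughout the rest of this post.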

How to see what is inside the repo?
svn list file:///home/mysurface/repo/programming_repo

Another way is to list all the files and folders in a tree view with svnlook:

svnlook tree /home/mysurface/repo/programming_repo

The difference between svn list and svnlook tree is that the first expects a URL while the second expects a local filesystem path to the repository.

How to checkout files from svn repo?
This is the most critical part of svn and also the most commonly used svn command. Lots of open source development
projects provide a way for users to check out their latest code over the internet.

You need to check out in order to commit changes to the svn repo later. Referring back to the import step above, where I imported the entire
directory /home/mysurface/programming into programming_repo: I am going to check out to the same folder. If you are skeptical about
doing this, you may want to back up the directory first.

mv programming programming-bk

Now check out to programming. mkdir is not needed, as svn will create the directory for you if it doesn’t exist.

svn co file:///home/mysurface/repo/programming_repo programming
co is the short form of checkout.

Okay, let's just compare both folders with diff and store the result in a file, comp.diff

diff programming programming-bk > comp.diff
diff lists the folders in common, and also the differences. Check comp.diff: it reports the additional folder .svn
that only exists in programming/. Again, do NOT modify or delete this folder.

Are you convinced enough to remove programming-bk/? As long as you keep the repo safe, you can check out the same data
anytime, anywhere.

You can also check out only a specific folder from your repo, e.g.

svn co file:///home/mysurface/repo/programming_repo/c/curses

This checks out only that folder, into a curses directory under the current directory.

A single file can’t be checked out the way a directory can, but you can extract one from the repository with svn export:

svn export file:///home/mysurface/repo/programming_repo/c/curses/matrix.cc

How to track the changes before commit to repo?
First of all, check which files have changed:

svn status

It lists the files that have changed, with a status code in front of each filename. Common codes are M, ? and A: M is modified,
A is newly added (how to add is covered in a later section), and ? means the file exists in the local directory but has not been added to the repo.
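
For a large working copy, a quick count per status code can be handier than the full listing. The sketch below assumes the status code is the first character of each line, as in the codes just described; summarize_status is a name I made up.

```shell
# Count entries per status code from `svn status` output read on stdin.
# The status code is the first character of each line.
summarize_status() {
  awk '
    { code = substr($0, 1, 1); count[code]++ }
    END {
      printf "modified=%d added=%d unversioned=%d\n", count["M"], count["A"], count["?"]
    }
  '
}

# Typical use inside a working copy:
# svn status | summarize_status
```

A non-zero unversioned count before a commit is a good hint that you forgot an svn add.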

Secondly, you want to see the differences between the previous revision and the working copy. Let's assume color.c has changed:

svn diff color.c

I really don’t like svn diff’s output. Fortunately, I found a simple bash script that makes vimdiff the compare tool.
The script was written by Erik C. Thauvin.

I named it svndiff, placed it in /usr/bin, and made it executable:

chmod +x /usr/bin/svndiff

Now, I can simply do this,

svndiff color.c

To close the vimdiff, type :qa
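
If you can't find that script, the idea is easy to approximate. What follows is my own minimal sketch, not the script mentioned above: it writes the unmodified copy of the file to a temporary file with svn cat and opens both in vimdiff. The function name svndiff_sketch is made up.

```shell
# Show local changes to one file in vimdiff.
# `svn cat FILE` prints the unmodified (BASE) copy of a working-copy file.
svndiff_sketch() {
  file=$1
  base=$(mktemp) || return 1
  svn cat "$file" > "$base" || { rm -f "$base"; return 1; }
  vimdiff "$base" "$file"
  rm -f "$base"
}

# Usage inside a working copy:
# svndiff_sketch color.c
```

The left pane is the pristine copy and the right pane is your working file, so edits made on the right are saved to the real file.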

How to commit the changes?
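You can commit with -m to give your log message if it is short. But if it is long, I suggest you make use of your default
editor. I am a vim user, so I add a line to my ~/.bashrc (the export is needed so that svn, which runs as a child process of the shell, actually sees the variable):

export EDITOR=vim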

Now I can commit with this:

svn ci

ci is the short form of commit. Write the log message, then save and close vim with :x, and I am done. You can also choose
to commit just one file or folder by naming it on the command line.

How to add or delete file to or from repo?
A new file won’t be committed if you don’t add it to the repo. Therefore you need to add it manually if you want it to go into
your repo. Let's say you want to add a new file color2.cc:

svn add color2.cc

Delete works the same way: if you only delete the file in your working directory, the change won’t be reflected in the repo. Use svn delete instead.

How to check the logs for each revision?
The simplest way is doing just,

svn log

It lists all the logs, starting from the latest revision. That is really irritating! You can limit it to the 3 latest revisions by
doing this:

svn log --limit 3

If you want to check a specific revision, specify it with -r:

svn log -r 3

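I found something awkward. Let's say I did svn delete in revision 3 (the latest), and revision 2 changed the file that revision 3
deleted. When I run svn log, by rights it should show all 3 logs, but it only shows revision 1. My first reading was that svn
log only shows entries for files that still exist. A more likely explanation is that svn log without arguments reports history only up to the working copy's own revision, and a commit does not advance that; running svn update first usually brings the missing entries back. Bear that in mind.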

How to update the working directory to the latest revision?

svn update
Update to a specific revision?
svn update -r 3

I think that's all for normal use of svn commands; for further reading, see http://svnbook.red-bean.com/.
