Wednesday, September 10, 2008

Make AdSense For Revenue

AdSense Tip #1: Find your keywords

Do some research to find the best keywords.

AdSense Tip #2: Improve your keywords

Make sure your keyword density is between 3% and 10%.

AdSense Tip #3: Keep your website focused on a theme

Do not mix your content. Focus on your main theme and content.

AdSense Tip #4: Write a new page every day

Always update your site or blog. If you can, update it daily.

AdSense Tip #5: Choose the right AdSense format

Choose the best ad format for your site. The top ad formats suggested by Google are:

  1. 336×280 large rectangle
  2. 300×250 medium rectangle
  3. 160×600 wide skyscraper

AdSense Tip #6: Color tips

AdSense Tip #7: Position tips

AdSense Tip #8: Increase the number of ads

Maximize the use of your ad units.

AdSense Tip #9: Use images

Combine images with your ads.

AdSense Tip #10: Blend

Trace, Debug & Profile PHP scripts with xdebug & KCachegrind


Step 1: Install xdebug

xdebug is an open source debugger available for PHP. xdebug can be used to display more information in error traces. It can also be used to collect detailed code coverage & profiling information.

Installation

You need to install the following packages to prepare the environment for installing the xdebug PECL module.

sudo apt-get -y install php-pear php5-dev build-essential

Now we install xdebug using pecl.

sudo pecl install xdebug

The above command will download, compile & install xdebug on your system.

Configuration

Open /usr/local/lib/php.ini and append the following line.

zend_extension=/usr/lib/php5/20060613/xdebug.so

Be careful about the number part in the path, as it could be different for you. Now restart Apache by issuing the following command.

sudo apache2ctl restart

Now view the output of your phpinfo() function. If you find the word xdebug in it, you have successfully installed xdebug.

Stacktrace

xdebug.collect_params accepts values from 1 to 4, where 1 is the least verbose and 4 the most verbose when displaying stack traces. The easiest way to set this option is with PHP's ini_set() function:

ini_set("xdebug.collect_params", 4);

Get to know more stack trace options of xdebug.

var_dump

Once xdebug is installed, whenever you call var_dump you are actually calling xdebug's version of var_dump instead of the built-in one. By default, xdebug's version of var_dump gives you friendlier information about the variables you are trying to inspect.

See more options on controlling display of var_dump information.

Debugging

xdebug uses the DBGp protocol to communicate with external debuggers. Vim already has a plugin that speaks DBGp, so you can debug PHP scripts from within vim.

Download php debugger plugin for vim.
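For the vim plugin (or any other DBGp client) to attach, remote debugging also has to be switched on in php.ini. A minimal sketch for a local setup with xdebug 2.x (the host and port below are the usual defaults; adjust for your environment):

xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_host=localhost
xdebug.remote_port=9000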

Step 2: Enable Profiling

You need to add the following line to /usr/local/lib/php.ini to enable profiling of PHP scripts.

xdebug.profiler_enable=1

Now restart the Apache server by issuing the following command.

sudo apache2ctl restart

Whenever you access a PHP page through Apache, xdebug will create a file named something like cachegrind.out.15093. By default xdebug uses the /tmp directory to dump the files that contain profiling information. You can change this target directory with the xdebug option xdebug.profiler_output_dir, and you can change the result file name with the option xdebug.profiler_output_name.

See more xdebug profiling options.

Sometimes you don't want to profile all requests. xdebug provides a selective mechanism to trigger profiling of specific requests. To enable it, add the following configuration option to php.ini.

xdebug.profiler_enable_trigger=On

Now you can trigger profiling of a specific request by adding XDEBUG_PROFILE to the query string of the GET request.
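For example (the directory, file-name pattern, and script name are only illustrations), you could point the profiler output somewhere else in php.ini and then request a page with the trigger appended to the query string:

xdebug.profiler_output_dir=/var/tmp/xdebug
xdebug.profiler_output_name=cachegrind.out.%p

http://localhost/test.php?XDEBUG_PROFILE=1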

Analyzing Profiling results

We can use kcachegrind to analyse the profiling results. Install it by issuing the following command.

sudo apt-get -y install kcachegrind

Now open your profile result file in kcachegrind and you can visually inspect which parts of your script are eating CPU time. The call map & call graph views provide easy-to-understand visualizations for finding bottlenecks in your script.
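For instance, reusing the example file name from above:

kcachegrind /tmp/cachegrind.out.15093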

Monday, August 4, 2008

SQL Joins by venn diagram

If you have tried to understand how joins work and constantly get confused about which join to use, you just need to keep a simple picture in mind (I like pictures). I will be explaining joins by referencing a Venn diagram. Have no fear - I won't get into any of the set theory/math involved. This is just a basic overview to give you an idea of the data a particular join will return to you. This is not a technical discussion - just concepts.

We will start with just an empty diagram:
[Venn diagram: two overlapping circles labelled T1 and T2]

The T1 circle represents all the records in table 1. The T2 circle represents all the records in table 2. Notice how there is a bit of overlap of the 2 circles in the middle. Simple right?

I will use red to signify the records that will be returned by a particular join.

INNER JOIN
An inner join only returns those records that have “matches” in both tables. So for every record returned in T1 - you will also get the record linked by the foreign key in T2. In programming logic - think in terms of AND.

[Venn diagram: only the overlapping region of T1 and T2 is shaded]
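As a concrete sketch (the table and column names table1, table2, t1_id, and id are made up), the inner join above could be written as:

SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2 ON t2.t1_id = t1.id;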

OUTER JOIN
An outer join is the inverse of the inner join. It only returns those records in T1 or T2 that do not have a match in the other table. "Give me the records that DON'T have a match." In programming logic - think in terms of NOT AND.

[Venn diagram: both circles shaded except the overlapping region]

LEFT JOIN
A left join returns all the records in the “left” table (T1) whether they have a match in the right table or not.

If, however, they do have a match in the right table - give me the “matching” data from the right table as well. If not - fill in the holes with null.

[Venn diagram: the entire T1 circle is shaded, including the overlap with T2]
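The same made-up tables written as a left join; rows from table1 with no match come back with NULLs in the t2 columns:

SELECT t1.*, t2.*
FROM table1 t1
LEFT JOIN table2 t2 ON t2.t1_id = t1.id;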

It should be noted that the same thing is possible with a right join - most people just use a left one.

LEFT OUTER JOIN
A left outer join combines the ideas behind a left join and an outer join. Basically - if you use a left outer join you will get the records in the left table that DO NOT have a match in the right table.

[Venn diagram: only the part of T1 that does not overlap T2 is shaded]
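A note of caution: in standard SQL, LEFT JOIN and LEFT OUTER JOIN are synonyms; the "unmatched rows only" result described here is usually obtained by adding an IS NULL filter, roughly like this (same made-up tables):

SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t2.t1_id = t1.id
WHERE t2.id IS NULL;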

Again it is noted that the same thing is possible with a right outer join - most people just use a left one.

Theta JOIN
A theta join is the Cartesian product of the 2 tables and not normally what people are looking for - but what they sometimes get by mistake. How many of us have written a join similar to this, only to get way more than we were ever expecting?

SELECT t1.*, t2.*
FROM table1 t1, table2 t2
WHERE t1.id = 5;

[Venn diagram: both circles fully shaded]

So there you have the basic concepts of joins. Next time you need to use a join and have no clue how to get the data you need from the database, draw a picture. It may help you figure out which join to use. At least it helps me.

Thursday, July 31, 2008

Define: HTTP_LOAD in details

Http_load is another cool webserver performance tester that gives simple stats on how your webapp is performing.

How to install in OS X

1. Download from http://www.acme.com/software/http_load/
2. Open terminal, cd to the directory where the archive is and unzip
$ tar -zxvf http_load-12mar2006.tar.gz
3. Move to that directory
$ cd http_load-12mar2006
4. Run
$ make
5. Run
$ make install

Once installed, using http_load for quick benchmarking is really quite straightforward. You call the program,
tell it how many requests to make concurrently, and how long to run (either in number of seconds, or total fetches),
and finally pass in a file full of URLs to request.

Testing

http_load requires at least 3 parameters:

* One start specifier, either -parallel or -rate
-parallel tells http_load to make the specified number of concurrent requests.
-rate tells http_load to start the specified number of new connections each second. If you use the -rate start
specifier, you can specify a -jitter flag parameter that tells http_load to vary the rate randomly by about 10%.
* One end specifier, either -fetches or -seconds
-fetches tells http_load to quit when the specified number of fetches have been completed.
-seconds tells http_load to quit after the specified number of seconds have elapsed.
* A file containing a list of URLs to fetch
The urls parameter specifies a text file containing a list of URLs, one per line. The requested URLs are
chosen randomly from this file.
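For example (the file name urls.txt and the host are just placeholders), a rate-based run for 30 seconds could look like this:

$ cat urls.txt
http://www.example.com/
http://www.example.com/page2.html

$ http_load -rate 10 -jitter -seconds 30 urls.txt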


You’re ready! Open up a text editor and write down the website’s url you want to test (your own preferably),
then cd to the directory where the .txt is and run

To see how many requests your server can take care of over 100 fetches

Run:

$ http_load -parallel 5 -fetches 100 name_of_file.txt
which means open 5 concurrent connections and fetch the webpage 100 times.

You’ll get something like this:

100 fetches, 5 max parallel, 1.34237e+07 bytes, in 15.842 seconds
134237 mean bytes/connection
6.31234 fetches/sec, 847351 bytes/sec
msecs/connect: 28.9069 mean, 75.011 max, 14.865 min
msecs/first-response: 435.84 mean, 2484.28 max, 96.082 min
93 bad byte counts
HTTP response codes:
code 200 — 100

The important bits: at the moment the webserver is capable of handling about 6 requests per second and
has a mean initial latency of roughly 435 milliseconds.

The numbers you’ll want to look at in more detail are “fetches/sec” and “msecs/first-response”.
These are critical in terms of really understanding what your site is doing.

It’s important to note the difference between “benchmarking” and “profiling”. What we’re doing here with http_load
is the former: we’re getting a feel for a specific page’s overall performance. We know that it serves X pages per
second, and generally takes about Y milliseconds to respond. What we don’t know yet is why either of these is the
case. You’ll have to dig in more detail into your PHP code and server configuration to determine what to tweak to
bring up your site’s performance to an acceptable level. http_load doesn’t, and can’t, do that for you.

http_load tells you how your webapp is currently performing and lets you test it under different conditions;
basically it's a benchmarking tool, just like httperf. The next step is optimization. Have a look at
the first part of Getting Rich with PHP 5 (what a crappy title) by Rasmus Lerdorf for tools you can use to profile
your code and some tips on optimization. In the example shown he goes from 17 reqs/sec to 1100 reqs/sec.

-------------------------------------------------------------------------------------------------------
$ http_load --h

usage: http_load [-checksum] [-throttle] [-proxy host:port] [-verbose] [-timeout secs] [-sip sip_file]
-parallel N | -rate N [-jitter]
-fetches N | -seconds N
url_file
One start specifier, either -parallel or -rate, is required.
One end specifier, either -fetches or -seconds, is required.

--------------------------------------------------------------------------------------------------------
$ man http_load

NAME
http_load - multiprocessing http test client

SYNOPSIS
http_load [-checksum] [-throttle] [-proxy host:port] [-verbose] [-timeout secs] [-sip sip_file]
[-cipher str] ( -parallel N | -rate N [-jitter] ) ( -fetches N | -seconds N ) url_file

DESCRIPTION
http_load runs multiple http fetches in parallel, to test the throughput of a web server. However unlike
most such test clients, it runs in a single process, so it doesn’t bog down the client machine. It can be
configured to do https fetches as well.

The -checksum flag tells http_load to do checksums on the files fetched, to make sure they came across ok.
The checksums are computed the first time each URL gets fetched, and then recomputed and compared on each
subsequent fetch. Without the -checksum flag only the byte count is checked.

The -throttle flag tells http_load to throttle its consumption of data to 33.6Kbps, to simulate access by
modem users.

The -proxy flag lets you run http_load through a web proxy.

The -verbose flag tells http_load to put out progress reports every minute on stderr.

The -timeout flag specifies how long to wait on idle connections before giving up. The default is 60 seconds.

The -sip flag lets you specify a file containing numeric IP addresses (not hostnames), one per line.
These get used randomly as the *source* address of connections. They must be real routable addresses
on your machine, created with ifconfig, in order for this to work. The advantage of using this
option is you can make one client machine look like a whole bank of machines, as far as the server knows.

The -cipher flag is only available if you have SSL support compiled in. It specifies a cipher set to use.
By default, http_load will negotiate the highest security that the server has available, which is
often higher (and slower) than typical browsers will negotiate. An example of a cipher set might be
"RC4-MD5" - this will run considerably faster than the default. In addition to specifying a raw cipher
string, there are three built-in cipher sets accessible by keywords:
* fastsec - fast security - RC4-MD5
* highsec - high security - DES-CBC3-SHA
* paranoid - ultra high security - AES256-SHA
Of course, not all servers are guaranteed to implement these combinations.

One start specifier, either -parallel or -rate, is required. -parallel tells http_load to keep that
many parallel fetches going simultaneously. -rate tells http_load to start that many new connections each
second. If you use the -rate start specifier, you can also give the -jitter flag, telling http_load to
vary the rate randomly by about 10%.

One end specifier, either -fetches or -seconds, is required. -fetches tells http_load to quit when that
many fetches have been completed. -seconds tells http_load to quit after that many seconds have elapsed.

The url_file is just a list of URLs, one per line. The URLs that get fetched are chosen randomly from this
file.

All flags may be abbreviated to a single letter.

Note that while the end specifier is obeyed precisely, the start specifier is only approximate. If you
use the -rate flag, http_load will make its best effort to start connections at that rate, but may not
succeed. And if you use the -parallel flag, http_load will attempt to keep that many simultaneous connections
going, but may fail to keep up if the server is very fast.

--------------------------------------------------------------------------------------------------------------------

* Note that when you provide a file with a list of URLs, make sure that you don't have empty lines in it.
If you do, the utility won't work and will complain:

./http_load: unknown protocol -

* Basic errors
- byte count wrong
- timeout
- parallel may be at most 1021

Solutions:

To remove "byte count wrong" error
install patch and run, to download click here
http://www.lighttpd.net/assets/2007/3/5/http_load-12mar2006-timeout.diff
and to install here is a command
patch -p 1 < http_load-12mar2006-timeout.diff

----------------------------------------------------------------------------------------------------------------
Observations from other blogs/forums:

1) http_load does not replicate heavy load; it replicates a DOS attack.

2) It generates N requests every second without waiting for the previous N requests to complete. Actually, it never waits for requests to complete. It kills them, so the HTTP server has nowhere to send data.

3) However, this is just HTTP load; if you have JavaScript running and making database calls after page load, I don't think this will help your testing methods.

---------------------------------------------------------------------------------------------------------------
Note: this is my personal experience, so if you have any suggestions or new test cases, please drop them in the comments.

Thursday, July 24, 2008

ALTER TABLE, SELECT AND INNODB

Let's assume you have a 512MB table, and you decide to alter the table to add an index to make queries faster.

How long would you expect this alter to take? Hours? Days?

Even with slow 7200 RPM disks the alter should have finished in less than half an hour.

I ran across an alter that had been running for 4 days on a 512MB data size. The reason it ran so long is that a SELECT was running which prevented MySQL from performing the "rename table" step, the last leg of the ALTER TABLE process.

Killing that SELECT released the shared lock allowing the alter to finish.
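A sketch of how you might spot and clear such a blocker (the thread id 1234 below is hypothetical): look for the long-running SELECT on that table while the ALTER waits on the rename step, then kill that query.

mysql> SHOW FULL PROCESSLIST;
mysql> KILL QUERY 1234;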


Do not KILL THE ALTER while it is stuck in this SHARED LOCK STATE, and do NOT then remove the temporary tablespace file "#sql-320f_106f99a2.*" by hand.

What will happen if you do remove the #sql* file by hand?

Well, for one, InnoDB will crash the MySQL instance, saying it could not find (and failed to open) the temporary tablespace. Then, on recovery, the original table gets unlinked from the filesystem and you have just lost all data for that tablespace.

Why?

Here, roughly, is the order of events for an alter:

  1. Lock all writes to the table
  2. Create a temporary #sql-* file
  3. Copy all data from the old file to the new file
  4. Do a quick consistency check between the two files
  5. Unlink the old file
  6. Rename the temp file to the old file name

Each step operates on the data dictionary pointers for the two tables. Issuing a filesystem rm on the temporary file right before the unlink step will cause InnoDB to crash, and on recovery it will unlink the old file and, of course, fail on the rename.

Wednesday, July 16, 2008

Basic SVN Commands

Let's get it started…

How to get help with svn?
svn help
This will make svn list all the available subcommands; to get the reference for a particular one, let's say checkout:

svn help checkout
The same goes for other svn-related commands, such as svnadmin:
svnadmin help

How to create a svn repository?
First of all, what is a repository? It is the core data store for svn; you can think of it as a centralized svn database.
Once created, it is just a directory with its files.
IMPORTANT! Do NOT modify or add anything inside the repository directory by hand, unless you know what you are doing.

To create an svn repo - let's say I want a repo to store all my programming code - I do this:
svnadmin create /home/mysurface/repo/programming_repo

Remember to use absolute paths for everything; sometimes relative paths will not work.

How to import my existing directories into the new repo?
svn import /home/mysurface/programming file:///home/mysurface/repo/programming_repo -m "Initial import"

-m stands for the log message; the first revision is created with the log "Initial import". You need to specify a URL for the repo,
since URLs are the standard arguments for svn. For a local repository, use the file:// scheme.

How to see what is inside the repo?
svn list file:///home/mysurface/repo/programming_repo

Another way of listing all the files and folders in a tree view is svnlook:

svnlook tree programming_repo

The difference between svn list and svnlook tree is that one expects a URL and the other does not.

How to checkout files from svn repo?
This is the most critical part of svn and also the most commonly used svn command. A lot of open source
projects provide a way for users to check out their latest code over the internet.

You need a checkout in order to commit changes to the svn repo later. Referring back to the earlier step, where I imported the entire
directory /home/mysurface/programming into programming_repo, I am going to check out into the same folder. If you are skeptical about
doing this, you may want to back up the directory first.

mv programming programming-bk

Now check out into programming; mkdir is not needed, as svn will create the directory for you if it doesn't exist.

svn co file:///home/mysurface/repo/programming_repo programming
co is short for checkout.

Okay, let's just compare both folders with diff and store the result in a file, comp.diff:

diff programming programming-bk > comp.diff
diff will list what the folders have in common, and also the differences. Check comp.diff; it shows the additional .svn folder
that only exists in programming/. Again, do NOT modify or delete this folder.

Are you convinced enough to remove programming-bk/? As long as you keep the repo safe, you can check out the same data
anytime, anywhere.

You can even check out only a specific folder from your repo, e.g.

svn co file:///home/mysurface/repo/programming_repo/c/curses

This will check out just that folder into the current directory.

A single file can't be checked out the way directories can, but you can extract it from the repository with svn export:

svn export file:///home/mysurface/repo/programming_repo/c/curses/matrix.cc

How to track the changes before committing to the repo?
First of all, check which files have changed:

svn status

It will list the files that have changed, with a status letter beside each filename. Common letters are M, ?, and A: M means modified,
A means newly added (how to add is covered in a later section), and ? means the file exists in the local directory but has not been added to the repo.

Secondly, you want to see the differences between the previous revision and the working copy. Let's assume color.c has changed:

svn diff color.c

I really don't like svn diff's output. Fortunately, I found a simple bash script (written by Erik C. Thauvin) that uses vimdiff as the compare tool.
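That script isn't reproduced here, but a minimal sketch of such a wrapper (assuming svn and vimdiff are installed, and ignoring multiple file arguments) could look like this:

#!/bin/bash
# svndiff - compare the working copy of a file against its pristine BASE revision in vimdiff
FILE="$1"
BASE=$(mktemp)
svn cat -r BASE "$FILE" > "$BASE"   # grab the unmodified BASE copy from the repository
vimdiff "$BASE" "$FILE"
rm -f "$BASE"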

I name it svndiff, place it in /usr/bin, and make it executable:

chmod +x /usr/bin/svndiff

Now, I can simply do this,

svndiff color.c

To close the vimdiff, type :qa

How to commit the changes?
You can commit with -m to supply your log message if it is short. But if it is long, I suggest you make use of your default
editor. I am a vim user, so I add a line to my ~/.bashrc:

EDITOR=vim

Now I can commit with this:

svn ci

ci is short for commit. Write the log message, save and close vim with :x, and I am done. As with checkout, you can choose
to commit a single file or any folder.

How to add or delete files to or from the repo?
A file won't be committed if you haven't added it to the repo. Therefore you need to add it manually if you want it to go into
your repo. Let's say you want to add a new file, color2.cc:

svn add color2.cc

Delete works the same way: if you only delete a file in your working directory, the change won't be reflected in the repo; use svn delete instead.

How to check the logs for each revision?
The simplest way is just:

svn log

It will list all logs, starting from the latest revision. That can be overwhelming! You can limit it to the 3 latest revision logs
like this:

svn log --limit 3

If you want to check a specific revision, specify it with -r:

svn log -r 3

I find something awkward: let's say I did an svn delete at revision 3 (the latest), and revision 2 contains the changes to the file
deleted at revision 3. When I run svn log, you would expect it to show all 3 logs, but it only shows revision 1. In other words, svn
log only shows logs for files that still exist - bear that in mind.

How to update the working directory to the latest revision?

svn update
Update to a specific revision?
svn update -r 3

I think that's all for normal use of svn commands; further reading at http://svnbook.red-bean.com/.


Tuesday, June 24, 2008

Prevent form post request from another domain in PHP

An HTTP POST request from an outside domain is one way of attacking your website. An intruder can use JavaScript on another domain, or on localhost, to send repetitive POST requests to a web page containing your PHP script. We should prevent this kind of cross-domain form posting, which can be harmful to our website.

Example of form post a spam

Let's suppose that we have a contact form on our website and we're posting the details of the form to a "contact.php" file. An intruder can use JavaScript on another domain and send repetitive post requests by placing "http://our-site/contact.php" in the action field of their code, spamming our website.

How to check the form being posted from another domain

We can use the HTTP_REFERER server variable to detect cross-domain form post requests. The example PHP code below checks whether the POST request came from the same domain or a different one.

//process only if example.com appears in HTTP_REFERER
if(isset($_SERVER['HTTP_REFERER']) && strpos($_SERVER['HTTP_REFERER'], 'example.com') !== false)
{
    //only process the form here
}

The HTTP_REFERER variable is used here to check where the post request came from. With PHP's strpos() function we check whether HTTP_REFERER contains our domain as the referring website (note the strict !== false comparison, since strpos() returns 0 when the match is at the very start of the string). Only if the post request is from our domain do we execute the remaining code of the page.

Articles for optimizing MySQL queries

Optimization is a complex task because ultimately it requires understanding of the entire system to be optimized. Although it may be possible to perform some local optimizations with little knowledge of your system or application, the more optimal you want your system to become, the more you must know about it.

The following 10 articles explain and give some examples of different ways to optimize MySQL:

  1. Identifying slow MySQL queries
  2. MySQL & PHP Performance Optimization Tips
  3. MySQL Presentations: Optimizing MySQL
  4. MySQL 5 Optimization and Tuning Guide
  5. Survive heavy traffic with your Web server
  6. Tools to use for MySQL Performance - Peter Zaitsev
  7. MySQL Queries Optimization (video) - Peter Zaitsev
  8. Table Design and MySQL Index Details
  9. MySQL Reference Manual: Optimization Overview
  10. MySQL Reference Manual: The MySQL Query Cache

Tuesday, May 13, 2008

Create User Account Example for MySQL

There are three different methods that can be used to create user accounts and they are fairly straightforward.

mysql> use mysql;
Database changed

Method 1: (create user, password - no privileges)
mysql> CREATE USER 'esoft'@'localhost' IDENTIFIED BY '12wer56hi';
Query OK, 0 rows affected (0.16 sec)

--grant certain privileges
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,DROP ON *.* TO 'esoft'@'localhost';
Query OK, 0 rows affected (0.00 sec)

--grant all privileges
mysql> GRANT ALL ON *.* TO 'esoft'@'localhost';
Query OK, 0 rows affected (0.00 sec)


Method 2: (create user/password and grant all privileges at one fell swoop)
mysql> GRANT ALL ON *.* TO 'topblog'@'localhost' IDENTIFIED BY 'topblog123';
Query OK, 0 rows affected (0.00 sec)


Method 3: (Insert user, password and privileges in table)
mysql> INSERT INTO user (Host,User,Password)
-> VALUES('localhost','freeads',PASSWORD('adlists123'));
Query OK, 1 row affected, 3 warnings (0.01 sec)

mysql> INSERT INTO user (Host,User,Password,Select_priv,Insert_priv)
-> VALUES('localhost','softhub',PASSWORD('softhub126'),'Y','Y');
Query OK, 1 row affected, 3 warnings (0.00 sec)

mysql> INSERT INTO user (Host,User,Password,Select_priv,Insert_priv,Update_priv,Delete_priv,
-> Create_priv,Drop_priv)
-> VALUES('localhost','freebies',PASSWORD('afreeb456'), 'Y','Y','Y','Y','Y','Y');
Query OK, 1 row affected, 3 warnings (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

Note additional *_priv:
Select_priv
Insert_priv
Update_priv
Delete_priv
Create_priv
Drop_priv
Reload_priv
Shutdown_priv
Process_priv
File_priv
Grant_priv
References_priv
Index_priv
Alter_priv
Show_db_priv
Super_priv
Create_tmp_table_priv
Lock_tables_priv
Execute_priv
Repl_slave_priv
Repl_client_priv
Create_view_priv
Show_view_priv
Create_routine_priv
Alter_routine_priv
Create_user_priv
Event_priv
Trigger_priv

Reset User Account Password Example for MySQL

MySQL makes this a fairly straightforward task. Here is the syntax and an example.

mysql> use mysql;
Database Changed
mysql> set password for 'elforum'@'localhost' = password('98m4@9er');
Query OK, 0 rows affected (0.00 sec)

Wednesday, May 7, 2008

Auto Reconnect In Mysql

What is auto reconnect in MySQL?
The MySQL client library can perform an automatic reconnect to the server if it finds that the connection is down when you attempt to send a statement to the server to be executed. In this case, the library tries once to reconnect to the server and send the statement again.
Automatic reconnection can be convenient because you need not implement your own reconnect code, but if a reconnection does occur, several aspects of the connection state are reset and your application will not know about it. This is extremely useful if you have long-running persistent connections to the database. Also, if you run a lot of SQL queries or use a connection pool, auto reconnect is a handy feature. Let's look at how to enable / disable auto reconnect in MySQL and what the side-effects of enabling it are.

How to enable MySQL client auto reconnect (C API)
my_bool reconnect = 1;
mysql_options(&mysql, MYSQL_OPT_RECONNECT, &reconnect);

What are the side-effects of MySQL auto reconnect?
* Any active transactions are rolled back and autocommit mode is reset.
* All table locks are released.
* All TEMPORARY tables are closed (and dropped).
* Session variables are reinitialized to the values of the corresponding variables. This also affects variables that are set implicitly by statements such as SET NAMES.
* User variable settings are lost.
* Prepared statements are released.
* HANDLER variables are closed.
* The value of LAST_INSERT_ID() is reset to 0.
* Locks acquired with GET_LOCK() are released.
* mysql_ping() does not attempt a reconnection if the connection is down. It returns an error instead.

How to disable MySQL client auto reconnect
In view of the side-effects you may want to disable auto reconnect. In MySQL version 5.1 and above auto reconnect is disabled by default. In any version you can disable auto reconnect with the following C API code:
my_bool reconnect = 0;
mysql_options(&mysql, MYSQL_OPT_RECONNECT, &reconnect);

Tuesday, May 6, 2008

Optimising MYSQL

DO NOT CHANGE your settings without understanding what you are doing.
Make a backup of your my.cnf before editing /etc/my.cnf.
MAKE THESE CHANGES AT YOUR OWN RISK.


This thread is only meant to help you better understand how to tune MySQL.

Optimising MySQL is very well covered on the net, and you'll find a huge amount of information on how to do it. There are never "best parameters"; the best parameters are the ones that fit your needs, your hardware, and your MySQL usage.
So I'll not give you the best parameters, but rather show how to work them out. Make some tests, and you'll quickly find your own parameters.

I’ll give you at the end of this post some web pointers which may help you.

There are a lot of available parameters, but only a few are really important for tuning your MySQL box.


The most important variables are (for me, and it is not exhaustive):
- max_connections
- wait_timeout
- thread_cache_size
- table_cache
- key_buffer_size
- query_cache_size
- tmp_table_size

First of all, how do you find your variables and your MySQL usage?

*VARIABLES
from mysql :
show variables;

or from command line :
mysqladmin variables

*PROCESS / STATUS
from Mysql :
show status;

or from command line
mysqladmin -i10 processlist extended-status

*SOME USEFUL COMMANDS FOR YOUR BOX USAGE
> top

> ps -axfu

> vmstat 1

* OPTIMISING MYSQL

To obtain the statistics of your MySQL server since it was started, run mysqladmin processlist extended-status as mentioned above.

1 - The two most important variables: table_cache and key_buffer_size

* If Opened_tables is big, then your table_cache variable is probably
too small.

table_cache 64
Open_tables 64
Opened_tables 544468

This is the first serious problem. "The table_cache is the number of open
tables for all threads. MySQL, being multi-threaded, may be running many
queries on the table at one time, and each of these will open a table."
Therefore, even though we only have a few tables, we will need many more
open_tables.

The Opened_tables value is high and shows the number of
cache misses. Getting the table_cache size correct is one of the two best
things you can do to improve performance.



* If Key_reads is big, then your key_buffer_size variable is probably
too small. The cache hit rate can be calculated with
Key_reads/Key_read_requests.

key_buffer_size 16M
Key_read_requests 2973620399
Key_reads 8490571
(cache hit rate = 0.0028)

“The key_buffer_size affects the size of the index buffers and the speed
of index handling, particularly reading.” The MySQL manual (and other sources) say that
the Key_reads/Key_read_requests ratio should normally be less than 0.01.

2 - Other important settings: wait_timeout, max_connections, thread_cache

A little explanation:
Generally you have a lot of MySQL processes that are sleeping because wait_timeout is not set low. So I make sure that wait_timeout is set to a very low value: 15 seconds (for me). That means MySQL will close any connection that has been idle for more than 15 seconds.

The problem is that you also have to increase max_connections (mine is set to 300) to be sure there aren't a lot of idle clients holding connections and blocking new clients from connecting and getting real work done.
The downside is that the box then has to create new threads (MySQL is a multi-threaded server) at a very high rate. That may suck up a measurable amount of CPU time.

So the solution is to use the thread_cache (from the MySQL doc):
“How many threads we should keep in a cache for reuse. When a client disconnects, the client's threads are put in the cache if there aren't more than thread_cache_size threads from before. All new threads are first taken from the cache, and only when the cache is empty is a new thread created. This variable can be increased to improve performance if you have a lot of new connections. (Normally this doesn't give a notable performance improvement if you have a good thread implementation.) By examining the difference between the Connections and Threads_created you can see how efficient the current thread cache is for you.”


* If Threads_created is big, you may want to increase the
thread_cache_size variable. The cache hit rate can be calculated with
Threads_created/Connections.

thread_cache_size 0
Threads_created 150022
Connections 150023

This is the second problem that should be fixed. A cache size of zero is the default for my-medium.cnf but the recommended size in my-large.cnf is 8.

you may try this formula: table_cache = Opened_tables / Max_used_connections


3 - Finally, you may also have a look at : tmp_table_size and Handler_read_rnd / Handler_read_rnd_next

* If Created_tmp_disk_tables is big, you may want to increase the
tmp_table_size variable to get the temporary tables memory-based instead
of disk based.


tmp_table_size 32M
Created_tmp_disk_tables 3227
Created_tmp_tables 159832
Created_tmp_files 4444

Created_tmp_disk_tables are the "number of implicit temporary tables on
disk created while executing statements" and Created_tmp_tables are
memory-based. Obviously it is bad if you have to go to disk instead of
memory. About 2% of temp tables go to disk, which doesn't seem too bad
but increasing the tmp_table_size probably couldn't hurt either.


* If Handler_read_rnd is big, then you probably have a lot of queries
that require MySQL to scan whole tables or you have joins that don't use
keys properly.

Handler_read_rnd 27712353
Handler_read_rnd_next 283536234

These values are high, so we could probably stand to improve
the indexes and queries.


I hope this will help some of you better understand how to optimise MySQL to fit your needs, your hardware, and your current MySQL usage.

I forgot to tell you two or three important things:

Used MySQL memory = key_buffer + max_connections * (join_buffer + record_buffer + sort_buffer + thread_stack + tmp_table_size)
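As a rough worked example (the numbers are purely illustrative): with key_buffer = 64M, max_connections = 100, and per-connection buffers (join_buffer + record_buffer + sort_buffer + thread_stack + tmp_table_size) totalling about 6M, the worst case is roughly 64M + 100 * 6M = 664M of RAM.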

Notice max_connections and the multiplier:
as connections increase, memory usage increases too.

Notice key_buffer:
for a given amount of memory, the more you give to the key buffer, the less is left for connections;
the smaller the key buffer, the more room there is for connections.

If you set one of these settings to a high value, your system may swap.
If your system swaps, try decreasing these values.


Also, about table_cache:
Increasing the size of the table cache may really help you.
But you must be careful not to make the value too large. All operating systems have a limit on the number of open file descriptors a single process may have.
If MySQL tries to open too many files, the OS may refuse, and MySQL will generate error messages in the error log.

Mysql: HowTo

How MySQL Uses Indexes

Indexes are used to find rows with specific column values fast. Without an index, MySQL has to start with the first record and then read through the whole table to find the relevant rows. The bigger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1,000 rows, it is faster to read sequentially, because that minimizes disk seeks.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial column types use R-trees, and MEMORY (HEAP) tables support hash indexes.

Strings are automatically prefix- and end-space compressed.

In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described at the end of this section.

  • To quickly find the rows that match a WHERE clause.

  • To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.

  • To retrieve rows from other tables when performing joins.

  • To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_# = constant on all key parts that occur before key_col in the index. In this case, MySQL will do a single key lookup for each MIN() or MAX() expression and replace it with a constant. If all expressions are replaced with constants, the query will return at once. For example:
    SELECT MIN(key_part2),MAX(key_part2)
    FROM tbl_name WHERE key_part1=10;
  • To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. See Section 6.2.9, "How MySQL Optimizes ORDER BY."

  • In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed:
    SELECT key_part3 FROM tbl_name WHERE key_part1=1

Suppose that you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL can't use a partial index if the columns don't form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first of the preceding queries uses the index. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
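One simple way to check which index (if any) the optimizer picks, using the same placeholder table and column names as above, is EXPLAIN; the key column of its output names the chosen index:

mysql> EXPLAIN SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
mysql> EXPLAIN SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;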

An index is used for columns that you compare with the =, >, >=, <, <=, or BETWEEN operators.

MySQL also uses indexes for LIKE comparisons if the argument to LIKE is a constant string that doesn't start with a wildcard character. For example, the following SELECT statements use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';

In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.

The following SELECT statements will not use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

MySQL 4.0 and up performs an additional LIKE optimization. If you use ... LIKE '%string%' and string is longer than three characters, MySQL will use the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then use this pattern to perform the search quicker.

Searching using col_name IS NULL will use indexes if col_name is indexed.

Any index that doesn't span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.

The following WHERE clauses use indexes:

... WHERE index_part1=1 AND index_part2=2 AND other_column=3
/* index = 1 OR index = 2 */
... WHERE index=1 OR A=10 AND index=2
/* optimized like "index_part1='hello'" */
... WHERE index_part1='hello' AND index_part3=5
/* Can use index on index1 but not on index2 or index3 */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;

These WHERE clauses do not use indexes:

  /* index_part1 is not used */
... WHERE index_part2=1 AND index_part3=2
/* Index is not used in both AND parts */
... WHERE index=1 OR A=10
/* No index spans all rows */
... WHERE index_part1=1 OR index_part2=10

Sometimes MySQL will not use an index, even if one is available. One way this occurs is when the optimizer estimates that using the index would require MySQL to access a large percentage of the rows in the table. (In this case, a table scan is probably much faster, because it will require many fewer seeks.) However, if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, because it can much more quickly find the few rows to return in the result.

Hash indexes have somewhat different characteristics than those just discussed:

  • They are used only for = or <=> comparisons (but are very fast).

  • The optimizer cannot use a hash index to speed up ORDER BY operations. (This type of index cannot be used to search for the next entry in order.)

  • MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a MyISAM table to a hash-indexed MEMORY table.

  • Only whole keys can be used to search for a row. (With a B-tree index, any prefix of the key can be used to find rows.)

The MyISAM Key Cache

To minimize disk I/O, the MyISAM storage engine employs a strategy that is used by many database management systems. It exploits a cache mechanism to keep the most frequently accessed table blocks in memory:

  • For index blocks, a special structure called the key cache (key buffer) is maintained. The structure contains a number of block buffers where the most-used index blocks are placed.

  • For data blocks, MySQL uses no special cache. Instead it relies on the native operating system filesystem cache.

This section first describes the basic operation of the MyISAM key cache. Then it discusses changes made in MySQL 4.1 that improve key cache performance and that enable you to better control cache operation:

  • Access to the key cache no longer is serialized among threads. Multiple threads can access the cache concurrently.

  • You can set up multiple key caches and assign table indexes to specific caches.
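As a sketch of that second point (hot_cache and the table names t1 and t2 are just examples), a named cache is first sized, indexes are assigned to it, and they can optionally be preloaded:

mysql> SET GLOBAL hot_cache.key_buffer_size=128*1024;
mysql> CACHE INDEX t1, t2 IN hot_cache;
mysql> LOAD INDEX INTO CACHE t1;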

The key cache mechanism also is used for ISAM tables. However, the significance of this fact is on the wane. ISAM table use has been decreasing since MySQL 3.23 when MyISAM was introduced. MySQL 4.1 carries this trend further; the ISAM storage engine is disabled by default.

You can control the size of the key cache by means of the key_buffer_size system variable. If this variable is set equal to zero, no key cache is used. The key cache also is not used if the key_buffer_size value is too small to allocate the minimal number of block buffers (8).

When the key cache is not operational, index files are accessed using only the native filesystem buffering provided by the operating system. (In other words, table index blocks are accessed using the same strategy as that employed for table data blocks.)

An index block is a contiguous unit of access to the MyISAM index files. Usually the size of an index block is equal to the size of nodes of the index B-tree. (Indexes are represented on disk using a B-tree data structure. Nodes at the bottom of the tree are leaf nodes. Nodes above the leaf nodes are non-leaf nodes.)

All block buffers in a key cache structure are the same size. This size can be equal to, greater than, or less than the size of a table index block. Usually one of these two values is a multiple of the other.

When data from any table index block must be accessed, the server first checks whether it is available in some block buffer of the key cache. If it is, the server accesses data in the key cache rather than on disk. That is, it reads from the cache or writes into it rather than reading from or writing to disk. Otherwise, the server chooses a cache block buffer containing a different table index block (or blocks) and replaces the data there by a copy of required table index block. As soon as the new index block is in the cache, the index data can be accessed.

If it happens that a block selected for replacement has been modified, the block is considered "dirty." In this case, before being replaced, its contents are flushed to the table index from which it came.

Usually the server follows an LRU (Least Recently Used) strategy: When choosing a block for replacement, it selects the least recently used index block. To be able to make such a choice easy, the key cache module maintains a special queue (LRU chain) of all used blocks. When a block is accessed, it is placed at the end of the queue. When blocks need to be replaced, blocks at the beginning of the queue are the least recently used and become the first candidates for eviction.


Key Cache Block Size

MySQL 4.1 introduces a new key_cache_block_size variable on a per-key cache basis. This variable specifies the size of the block buffers for a key cache. It is intended to allow tuning of the performance of I/O operations for index files.

The best performance for I/O operations is achieved when the size of read buffers is equal to the size of the native operating system I/O buffers. But setting the size of key nodes equal to the size of the I/O buffer does not always ensure the best overall performance. When reading the big leaf nodes, the server pulls in a lot of unnecessary data, effectively preventing reading other leaf nodes.

Currently, you cannot control the size of the index blocks in a table. This size is set by the server when the .MYI index file is created, depending on the size of the keys in the indexes present in the table definition. In most cases, it is set equal to the I/O buffer size. In the future, this will be changed and then key_cache_block_size variable will be fully employed.


Restructuring a Key Cache

A key cache can be restructured at any time by updating its parameter values. For example:

mysql> SET GLOBAL cold_cache.key_buffer_size=4*1024*1024;

If you assign to either the key_buffer_size or key_cache_block_size key cache component a value that differs from the component's current value, the server destroys the cache's old structure and creates a new one based on the new values. If the cache contains any dirty blocks, the server saves them to disk before destroying and re-creating the cache. Restructuring does not occur if you set other key cache parameters.

When restructuring a key cache, the server first flushes the contents of any dirty buffers to disk. After that, the cache contents become unavailable. However, restructuring does not block queries that need to use indexes assigned to the cache. Instead, the server directly accesses the table indexes using native filesystem caching. Filesystem caching is not as efficient as using a key cache, so although queries will execute, a slowdown can be anticipated. Once the cache has been restructured, it becomes available again for caching indexes assigned to it, and the use of filesystem caching for the indexes ceases.

How MySQL Counts Open Tables

When you execute a mysqladmin status command, you'll see something like this:

Uptime: 426 Running threads: 1 Questions: 11082
Reloads: 1 Open tables: 12

The Open tables value of 12 can be somewhat puzzling if you have only six tables.

MySQL is multi-threaded, so there may be many clients issuing queries for a given table simultaneously. To minimize the problem with multiple client threads having different states on the same file, the table is opened independently by each concurrent thread. This takes some memory but normally increases performance. With MyISAM tables, one extra file descriptor is required for the data file for each client that has the table open. (By contrast, the index file descriptor is shared between all threads.) The ISAM storage engine shares this behavior.

You can read more about this topic in the next section. See Section 6.4.8, "How MySQL Opens and Closes Tables."

How MySQL Opens and Closes Tables

The table_cache, max_connections, and max_tmp_tables system variables affect the maximum number of files the server keeps open. If you increase one or more of these values, you may run up against a limit imposed by your operating system on the per-process number of open file descriptors. Many operating systems allow you to increase the open-files limit, although the method varies widely from system to system. Consult your operating system documentation to determine whether it is possible to increase the limit and how to do so.

table_cache is related to max_connections. For example, for 200 concurrent running connections, you should have a table cache size of at least 200 * N, where N is the maximum number of tables in a join. You also need to reserve some extra file descriptors for temporary tables and files.

Make sure that your operating system can handle the number of open file descriptors implied by the table_cache setting. If table_cache is set too high, MySQL may run out of file descriptors and refuse connections, fail to perform queries, and be very unreliable. You also have to take into account that the MyISAM storage engine needs two file descriptors for each unique open table. You can increase the number of file descriptors available for MySQL with the --open-files-limit startup option to mysqld_safe. See Section A.2.17, "File Not Found."

The cache of open tables will be kept at a level of table_cache entries. The default value is 64; this can be changed with the --table_cache option to mysqld. Note that MySQL may temporarily open even more tables to be able to execute queries.

An unused table is closed and removed from the table cache under the following circumstances:

  • When the cache is full and a thread tries to open a table that is not in the cache.

  • When the cache contains more than table_cache entries and a thread is no longer using a table.

  • When a table flushing operation occurs. This happens when someone issues a FLUSH TABLES statement or executes a mysqladmin flush-tables or mysqladmin refresh command.

When the table cache fills up, the server uses the following procedure to locate a cache entry to use:

  • Tables that are not currently in use are released, in least recently used order.

  • If a new table needs to be opened, but the cache is full and no tables can be released, the cache is temporarily extended as necessary.

When the cache is in a temporarily extended state and a table goes from a used to unused state, the table is closed and released from the cache.

A table is opened for each concurrent access. This means the table needs to be opened twice if two threads access the same table or if a thread accesses the table twice in the same query (for example, by joining the table to itself). Each concurrent open requires an entry in the table cache. The first open of any table takes two file descriptors: one for the data file and one for the index file. Each additional use of the table takes only one file descriptor, for the data file. The index file descriptor is shared among all threads.

If you are opening a table with the HANDLER tbl_name OPEN statement, a dedicated table object is allocated for the thread. This table object is not shared by other threads and is not closed until the thread calls HANDLER tbl_name CLOSE or the thread terminates. When this happens, the table is put back in the table cache (if the cache isn't full).

You can determine whether your table cache is too small by checking the mysqld status variable Opened_tables:

mysql> SHOW STATUS LIKE 'Opened_tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Opened_tables | 2741  |
+---------------+-------+

If the value is quite big, even when you haven't issued a lot of FLUSH TABLES statements, you should increase your table cache size. See Section 4.2.3, "Server System Variables," and Section 4.2.4, "Server Status Variables."
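If you prefer to watch this from PHP rather than the mysql client, here is a minimal sketch using the old mysql_* functions, in keeping with the other PHP snippets on this blog. The connection details and the 10x threshold are placeholder assumptions, not values from the manual.

<?php
// Sketch: compare the Opened_tables counter with the table_cache size.
// Connection details are placeholders; the 10x ratio is just a rough cue.
$link = mysql_connect('localhost', 'monitor', 'secret')
    or die(mysql_error());

$row = mysql_fetch_row(mysql_query("SHOW STATUS LIKE 'Opened_tables'", $link));
$opened = (int) $row[1];

$row = mysql_fetch_row(mysql_query("SHOW VARIABLES LIKE 'table_cache'", $link));
$cache = (int) $row[1];

echo "Opened_tables: $opened, table_cache: $cache\n";
if ($opened > 10 * $cache) {
    echo "Opened_tables has grown well past the cache size; "
       . "consider raising table_cache.\n";
}
mysql_close($link);
?>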

Drawbacks to Creating Many Tables in the Same Database

If you have many MyISAM or ISAM tables in a database directory, open, close, and create operations will be slow. If you execute SELECT statements on many different tables, there will be a little overhead when the table cache is full, because for every table that has to be opened, another must be closed. You can reduce this overhead by making the table cache larger.

Wednesday, April 2, 2008

GET or POST

From Stephen Chapman


When you use Ajax to access the server without reloading the web page you have two choices on how to pass the information for the request to the server. These two options are to use GET or POST.

These are the same two options that you have when passing requests to the server to load a new page, with two differences. The first difference is that you are only requesting a small piece of information instead of an entire web page. The second, and most noticeable, difference is that since the Ajax request doesn't appear in the address bar, there is nothing your visitor will see on their screen when the request is made. Calls made using GET will not expose the fields and their values anywhere that calls made using POST do not also expose when the call is made from Ajax.

So how should we choose which of these two alternatives to use?

A mistake that some beginners might make is to use GET for most of their calls simply because it is the easier of the two to code.

The most noticeable difference between GET and POST calls in Ajax is that GET calls still have the same limit on the amount of data that can be passed as applies when requesting a new page load. Because an Ajax request should only be processing a small amount of data (or at least that's how you should use it), you are far less likely to run into this length limit than you are with complete web page loads. A beginner may reserve POST requests for the few instances where they do need to pass more information than the GET method allows.

The best solution when you have lots of data to pass like that is to make multiple Ajax calls, passing a few pieces of information at a time. If you are going to pass huge amounts of data all in one Ajax call, then you would probably be better off simply reloading the entire page, since there will be no significant difference in the processing time when huge amounts of data are involved.

So if the amount of data to be passed isn't a good reason for choosing between GET and POST, what should we use to decide? These two methods were in fact set up for entirely different purposes, and the differences between how they work are in part due to the difference in what they are intended to be used for. This applies not only to using GET and POST from Ajax but to these methods wherever they are used.

The purpose of GET is, as its name implies, to GET information. It is intended to be used when you are reading information to display on the page. Browsers will cache the result from a GET request, and if the same GET request is made again they will display the cached result rather than rerunning the entire request. This is not a flaw in the browser processing; it is deliberately designed to work that way so as to make GET calls more efficient when they are used for their intended purpose. A GET call retrieves data to display in the page, data is not expected to be changed on the server by such a call, and so re-requesting the same data should be expected to obtain the same result.

The POST method is intended to be used where you are updating information on the server. Such a call is expected to make changes to the data stored on the server, and the results returned from two identical POST calls may very well be completely different from one another, since the initial values before the second POST call will be different from the initial values before the first call, the first call having updated at least some of those values. A POST call will therefore always obtain the response from the server rather than keeping a cached copy of the prior response.

So rather than choosing between GET and POST based on the amount of data that you are passing in your Ajax call, you should select between them based on what the Ajax call is actually doing. If the call is to retrieve data from the server then use GET. If the value to be retrieved is expected to vary over time as a result of other processes updating it then add a current time parameter to what you are passing in your GET call so that the later calls will not use an earlier cached copy of the result that is no longer correct. If your call is going to write any data at all to the server then use POST.

In fact, you should not only use this criterion for selecting between GET and POST for your Ajax calls; you should use it as the criterion for choosing between GET and POST when processing forms on your web page as well.
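To make that rule concrete, here is a minimal PHP sketch of a server-side Ajax endpoint that reads on GET and writes on POST. The items table, its columns, and the connection details are invented for illustration and are not part of the original article.

<?php
// Sketch of an Ajax endpoint: GET only reads, POST only writes.
// Table, column, and connection details below are placeholders.
$link = mysql_connect('localhost', 'webuser', 'secret') or die(mysql_error());
mysql_select_db('mydb', $link);

if ($_SERVER['REQUEST_METHOD'] == 'GET') {
    // Read-only request: nothing on the server changes.
    $id  = mysql_real_escape_string($_GET['id']);
    $row = mysql_fetch_assoc(
        mysql_query("SELECT status FROM items WHERE id = '$id'")
    );
    echo $row ? $row['status'] : 'not found';
} elseif ($_SERVER['REQUEST_METHOD'] == 'POST') {
    // Write request: repeating it may well give a different result,
    // so it should never be served from a cached response.
    $id     = mysql_real_escape_string($_POST['id']);
    $status = mysql_real_escape_string($_POST['status']);
    mysql_query("UPDATE items SET status = '$status' WHERE id = '$id'");
    echo 'OK';
}
?>

On the client side, the article's tip still applies: if the value behind the GET request can change over time, tack a current-time parameter onto the GET URL so the browser does not hand back a stale cached copy.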

Monday, March 17, 2008

9 Practical Ways to Enhance your Web Development Using the Firefox Web Developer Extension

Whether you’re a front-end graphics designer or a back-end web programmer, if you’ve worked long enough in the field of creating web-based solutions, you’ve no doubt heard about an extension for the Mozilla Firefox web browser called (simply enough) the Web Developer extension. If you have no clue what I’m talking about, here’s a brief overview from Webs Tips to get you familiarized with this wonderful tool.

A screenshot of the Mozilla Firefox Web Developer tool

This article lists some practical, everyday uses of the Web Developer extension to help improve your web-building methods. I've tried to stay away from the more common and basic uses of the extension, like troubleshooting layout issues with the Information > Display Div Order option, because I feel these have been discussed quite enough in other places. New users, don't run away quite yet; I think this guide will help you get a rapid jump start on applying this tool to your daily development routine.

So without further ado, here are nine highly pragmatic uses of the Web Developer extension for Firefox.

1) Change XHTML on-the-fly without changing your web files.

Unfortunately for many developers, we don't all have the luxury of testing servers and sandbox environments. I, for one, confess to developing on live websites even during peak web traffic times.

If you’d like to lessen customer support requests due to an inadvertent display:none; property assignment on the log-in box — use the Web Developer extension to effortlessly check your XHTML modifications before you commit them to the server.

Here's an (extreme) example of how I was able to change a bit of reddit's XHTML markup.

The original front page:

Screenshot of reddit's front page before editing XHTML markup.

And here’s the modified version:

Screenshots of reddit's front page after changing some XHTML markup.

As you can see in the above picture, I changed the top three stories (to something else I'd much rather read about) and modified the background color to pink (I have an odd affection for hot pink for some reason).

You can achieve the same results by using the Miscellaneous > Edit HTML Markup option which will open up the Edit HTML tab panel displaying the XHTML of the web page. Unfortunately, the window isn’t color-coded and the Search HTML function doesn’t quite work properly (yet).

A screenshot of the Edit HTML Panel, Displayed on the left of the page.

Tip: You can change the position of the Edit HTML panel by clicking on the Position icon (right next to the Edit HTML tab on the above screenshot).

To change the CSS styles of the page, use the CSS > Edit CSS option, which will allow you to edit the styles used on the web page.

2) Measure things quickly with the Ruler Tool.

Raise your hand if you’ve ever print-screen’ed, and then copy-&-paste’d the screenshot onto Photoshop just to determine dimensions of certain page objects (like the width of an image) with the selection tool. *Raises hand in shame*

With the Ruler Tool (enable it via Miscellaneous > Display Ruler Tool), you can speedily size-up objects inside the web browser. It’s a great tool in conjunction with outline options such as Information > Display Div Order option or Information > Display Block Size option, allowing you to detect the amount of padding and margin between elements.

Screenshot of the Mozilla Firefox Web Developer extension Ruler Tool.

3) See how web pages look on a non-traditional web browser.

Nowadays, tons of people have mobile devices that let them view web pages in non-traditional ways. Determine whether your pages render correctly (or close enough) on portable device screens by using the Miscellaneous > Small Screen Rendering option. This saves you from going out and purchasing a BlackBerry or a Treo with an Internet data plan just for cross-browser checking.

How the Gamespot website looks on normal browsers:

A screenshot of Gamespot.com viewed through Mozilla Firefox web browser.

What it will look like on a Small Screen Rendering device…

A screenshot of Gamespot.com rendered in a Small Screen Rendering Device as simulated by Mozilla Firefox Web Developer extension.

4) Find out how optimized your page is.

Use the Tools > View Speed Report option to automatically send your page to WebSiteOptimization.com, a site that provides a plethora of information about your web page, such as how quickly it loads and how many HTTP connections it uses, among a ton of other things.

There are built-in tools in Adobe Dreamweaver and Flash (if you even have access to them) that simulate download speeds, but nothing beats a free, comprehensive, and genuinely live speed report.

Screenshot of the result of Six Revision's front page speed report from Web Optimizer

5) Populate web form fields instantly.

Don't you hate it when you have to fill in your custom-built web form for the nth time because you're testing it? You can quit tabbing and entering junk information into your form fields and switch to using the Forms > Populate Form Fields option in the Web Developer extension.

In the example below, you can see that it populates most web forms somewhat intelligently: it was able to guess the email field but missed the phone number field.

Screenshot of eBay's registration form automatically filled about using the Forms - Populate Form Fields option of Mozilla Firefox Web Developer extension.

6) Find all the CSS styles that affect an element.

For most fairly-proficient CSS developers, it’s quite easy to find the exact selectors that style an element’s properties - fyi: #selector { property: value; }. This is especially true when you’re the original author and/or the styles are contained in one stylesheet.

But what if you were working on someone else's project, and the project in question has 1,000+ lines of pure CSS goodness split into several external stylesheets (because Bob, a.k.a. "Mr. Modularity", likes to keep things "simple")? Another scenario you might encounter is being tasked to theme a content management system like Drupal or WordPress when you're not quite sure where all the external stylesheets are.

For example, the Yahoo! home page has over 2,400 lines of CSS, spread over several external stylesheets and inline styles (Bob, you built this page didn’t you?).

Screenshot of Yahoo! front page with CSS - View Style Information of Mozilla Firefox Web Developer extension being used.

If you’re tasked with revising this page, you have two choices: (1) look through, understand, and hunt down the styles you need or (2) decide that you’re smarter (and lazier) than that and so you use the CSS > View Style Information option of the Web Developer extension. With this option enabled, clicking on a page element opens up the Style Information panel which displays all the styles that affect the element.

7) View JavaScript and CSS source code in a jiffy.

One of the ways I troubleshoot rendering issues is by looking at how other web pages do it. JavaScript and CSS are often divided into several files — who wants to look through all of them?

Using the Information > View JavaScript and the CSS > View CSS options instantly displays all the JavaScript and CSS in a new browser tab. This has the side benefit of being able to aggregate all the CSS styles or JavaScript in one web page allowing you to use the Find tool of the Mozilla Firefox browser (keyboard shortcut: ctrl + f for PC users).

8) See how web pages are layered.

It's often very helpful to determine which page divs and objects are on a higher plane. Using the Information > View Topographic Information option gives you a visual representation of the depths of the page elements: darker shades are lower than lighter shades of gray.

Original web design…

Screenshot of before using View Topography Information.

Using the Topographic Information option renders the page to this:

Screenshot of a webpage with Information - View Topographic Information enabled.

9) See if your web page looks OK in different screen sizes.

I use a monitor between 19 and 22 inches (widescreen). This can be problematic because many of our visitors use smaller monitors. Short of switching to a smaller LCD screen to simulate the user experience, I just use the Resize > Resize window option. It helps test whether my fluid layout works well in smaller windows (sometimes you forget to set min-widths for div elements and it jacks up the layout at smaller screen sizes), or whether a fixed-width layout displays important content without users having to scroll.

Be sure to enable the Resize > Display Window Size in Title option to help you determine the exact dimensions, and also for documentation purposes when you’re taking screenshots of your webpages.

Screen shot of drupal.org with the width of the page set to 800 pixels.

So there we are, nine ways you can employ the Mozilla Firefox Web Developer extension to better your web development experience. I don’t claim to be an expert, but I certainly know enough about the Web Developer extension to improve my web-building speed.

Do you have other tips and strategies on how to further utilize the Web Developer extension? What are the ways you use Web Developer extension in your job? Share them here.


Making Secure PHP Applications

There are two basic types of attacks that a cracker will use to try to gain access you don't really want him to have. This lesson runs through what the cracker does and how you can fight back. This is anything but a definitive guide to security; there is no possible way to cover security in a four-page article, but it is a good start. To conclude the introduction, XSS attacks are too broad a subject and won't be covered here.

Basics
Generally, scripts that don't have their source revealed to the public are harder to crack. Scripts whose source anyone can obtain always have to take more precautions. Either way, the same precautions should be taken; not giving out the source doesn't make a script uncrackable. Security is extremely important. I learned that when I had a bug in img911's script that allowed PHP files to be uploaded; there was a script that gave the uploader full control of my files, and the site was only a week old! People will come to your site and try to gain access if they can. It is only a matter of time before it happens to you. Are you ready?

Attack One: SQL Injection Attacks
What it is
This is by a good margin the most common type of attack because of its sheer power, and because it's easy to do. SQL injection attacks inject commands via user-inputted data and can cause damage to your database.

How it works
SQL injection attacks happen when someone modifies data that is being sent to a database, for instance
showimage.php?id=1
You could change that to
showimage.php?id=1'; DELETE FROM images WHERE 1;
That would create an SQL error for anything past the WHERE clause, but depending on how the query is run, the DELETE command can still execute. This gives the cracker full control of your database; anything you could do with mysql_query, he can do. He doesn't just have to use GET data: he can use POST data that is being sent from a form, and he can also edit cookies that the site uses.
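To see how the injected string ends up inside the query, here is a hedged PHP sketch. The showimage.php query below is invented to match the URL above, and it assumes a database connection is already open.

<?php
// Vulnerable: the id value from the URL is pasted straight into the SQL,
// so whatever the visitor puts after ?id= becomes part of the query.
$id  = $_GET['id'];
$sql = "SELECT * FROM images WHERE id = '$id'";
mysql_query($sql);

// Cleaned: escape the value first (see the sql_safe() function below),
// so quotes in the input can no longer break out of the string.
$id  = mysql_real_escape_string($_GET['id']);
$sql = "SELECT * FROM images WHERE id = '$id'";
mysql_query($sql);
?>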

How to prevent
There are a few ways to prevent an SQL attack.

Method 1, Clean the data
The first way is to strip out slashes, quotes, and other things that have no legitimate purpose in the query. THIS IS NECESSARY FOR ANYTHING THAT IS USER-INPUTTED AND WILL BE USED WITH A DATABASE! User-inputted data is anything that can be edited from the outside: GET data (.php?getdata=data), POST data submitted from forms, and cookies can all be edited by a user. Anything coming from those must be cleaned, or your script is not safe. I use this function to clean my data:

function sql_safe($value)
{
    // Strip slashes added by magic_quotes_gpc so the value
    // is not escaped twice further down.
    if (get_magic_quotes_gpc())
    {
        $value = stripslashes($value);
    }

    // Escape unless the value is a number without a leading zero
    if (!is_numeric($value) || $value[0] == '0')
    {
        $value = mysql_real_escape_string($value);
    }
    return $value;
}

Just run all your user-inputted data through it like this:

$var = sql_safe($_GET["data"]);

This way, all invalid data is stripped out and your database is safe. This method is not optional: every application you make must have this or equivalent code, because it is the only surefire method that can't be worked around. If you do this with all your data, your site is safe from SQL injection attacks. The next two methods are icing on the cake, not hacker-proof methods.

Method 2, Table prefixes
A fact of life is that everyone makes mistakes; chances are pretty high that you will forget to clean one piece of user-inputted data down the road, and as small as the chance is, a cracker might find it. Most programs use standard names for their database tables; in a forum, for example, the table that holds the posts would be named "posts". Crackers know this and will try every name relevant to the type of site it is. The best way around this is to add a prefix to your table names: instead of "posts", use "forum_posts". Even a common prefix like that makes it a good deal harder to hack. I use the first three letters of my control panel login name as my db prefix. Do not rely on this method; it just makes it harder for a cracker to get in should you miss a step.

Method 3, Don't give the SQL user delete rights
This method is anything but a strict guideline; most of the scripts I make require deleting rows. But if it isn't needed and you don't have to delete rows, don't give the database user permission to. That way, if you forgot to clean something and the cracker figured out the prefix, he still can't delete anything. Use it when you can, but don't change a script to cater to this; it is little more than a final precaution.

Attack 2, forged data

What it is
Forged data is when someone edits a cookie to make themselves look like an admin. The only way to let this happen is flawed design: generally static data, or not confirming the data in question. As rare as it may seem, it is an error I've seen a lot, even in some scripts that are for sale. This one is more obscure, but it can happen if you encounter the wrong person. It can be worse than an SQL injection attack because it is less apparent; the attacker doesn't have to destroy anything. Lastly, you normally have to have access to the script to do this, but not always.

How it's done
Cookies can be edited easily; they are just text files on your computer. A cracker will go in and change the data to imitate what the script thinks is an admin. In Firefox, to view your cookies all you have to do is go to Tools -> Options -> Privacy -> View Cookies. To change them, you have to shut the browser down and edit C:\Documents and Settings\name\Application Data\Mozilla\Firefox\Profiles\profile name\cookies.txt

All your cookies reside in there; the file is not encrypted, so you can change anything you want.

How to protect against it

A lesson in how to make secure login cookies
What many scripts do is set cookies like this:
cookie 1:
username

cookie 2:
pass

cookie 3:
rank

This makes it easy to see whether someone is an admin or not, but there is one huge problem: all a cracker has to do is change the rank cookie to whatever value the script looks for in an admin, and he is in. This is a flawed design. What you should do instead is this:

cookie 1:
user ID

cookie 2:
user pass, hashed via sha1 (you do hash it with sha1 when storing it in the db, right?)

When the script sees the cookies, it goes into the database and checks whether the user with that ID has that password hash. If so, the user is who they say they are, and you can look up their rank in the database. If the data doesn't match, you delete the cookies and they are off. The only way to hack this is to know the password, and if the cracker knew that he wouldn't be going in the back door.

Hold as little data as you can in the cookie; all it needs to do is provide solid evidence that the user in question is legit, not state what rank they are.
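Here is a minimal sketch of that check, staying with the mysql_* style used earlier in the article. The cookie names, the users table, and its columns are placeholders, and it assumes the password column already holds sha1 hashes and that a database connection is open.

<?php
// Sketch of the two-cookie login check: the cookies hold only the user id
// and the sha1 of the password; the rank always comes from the database.
// Cookie, table, and column names are placeholders.
function current_user()
{
    if (!isset($_COOKIE['uid'], $_COOKIE['phash'])) {
        return false;
    }
    $uid   = mysql_real_escape_string($_COOKIE['uid']);
    $phash = mysql_real_escape_string($_COOKIE['phash']);

    $res = mysql_query(
        "SELECT id, rank FROM users
         WHERE id = '$uid' AND password = '$phash' LIMIT 1"
    );
    if ($res && mysql_num_rows($res) == 1) {
        return mysql_fetch_assoc($res); // id and rank, straight from the db
    }

    // The data doesn't match: treat the cookies as forged and remove them.
    setcookie('uid', '', time() - 3600);
    setcookie('phash', '', time() - 3600);
    return false;
}
?>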

I hope this article will help you make more secure PHP applications. If there is anything that has slipped my mind, please drop me a PM.

License

Feel free to post this anywhere as long as the below line is here
Originally written by Village Idiot of JustAnotherPortfolio.com

Understanding memory usage on Linux

This entry is for those people who have ever wondered, "Why the hell is a simple KDE text editor taking up 25 megabytes of memory?" Many people are led to believe that many Linux applications, especially KDE or Gnome programs, are "bloated" based solely upon what tools like ps report. While this may or may not be true, depending on the program, it is not generally true -- many programs are much more memory efficient than they seem.

What ps reports
The ps tool can output various pieces of information about a process, such as its process id, current running state, and resource utilization. Two of the possible outputs are VSZ and RSS, which stand for "virtual set size" and "resident set size", which are commonly used by geeks around the world to see how much memory processes are taking up.

For example, here is the output of ps aux for KEdit on my computer:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
dbunker 3468 0.0 2.7 25400 14452 ? S 20:19 0:00 kdeinit: kedit

According to ps, KEdit has a virtual size of about 25 megabytes and a resident size of about 14 megabytes (both numbers above are reported in kilobytes). It seems that most people like to randomly choose to accept one number or the other as representing the real memory usage of a process. I'm not going to explain the difference between VSZ and RSS right now but, needless to say, this is the wrong approach; neither number is an accurate picture of what the memory cost of running KEdit is.

Why ps is "wrong"
Depending on how you look at it, ps is not reporting the real memory usage of processes. What it is really doing is showing how much real memory each process would take up if it were the only process running. Of course, a typical Linux machine has several dozen processes running at any given time, which means that the VSZ and RSS numbers reported by ps are almost definitely "wrong". In order to understand why, it is necessary to learn how Linux handles shared libraries in programs.

Most major programs on Linux use shared libraries to facilitate certain functionality. For example, a KDE text editing program will use several KDE shared libraries (to allow for interaction with other KDE components), several X libraries (to allow it to display images and to copy and paste), and several general system libraries (to allow it to perform basic operations). Many of these shared libraries, especially commonly used ones like libc, are used by many of the programs running on a Linux system. Due to this sharing, Linux is able to use a great trick: it will load a single copy of the shared libraries into memory and use that one copy for every program that references it.

For better or worse, many tools don't care very much about this very common trick; they simply report how much memory a process uses, regardless of whether that memory is shared with other processes as well. Two programs could therefore use a large shared library and yet have its size count towards both of their memory usage totals; the library is being double-counted, which can be very misleading if you don't know what is going on.

Unfortunately, a perfect representation of process memory usage isn't easy to obtain. Not only do you need to understand how the system really works, but you need to decide how you want to deal with some hard questions. Should a shared library that is only needed for one process be counted in that process's memory usage? If a shared library is used by multiple processes, should its memory usage be evenly distributed among the different processes, or just ignored? There isn't a hard and fast rule here; you might have different answers depending on the situation you're facing. It's easy to see why ps doesn't try harder to report "correct" memory usage totals, given the ambiguity.

Seeing a process's memory map
Enough talk; let's see what the situation is with that "huge" KEdit process. To see what KEdit's memory looks like, we'll use the pmap program (with the -d flag):

Address Kbytes Mode Offset Device Mapping
08048000 40 r-x-- 0000000000000000 0fe:00000 kdeinit
08052000 4 rw--- 0000000000009000 0fe:00000 kdeinit
08053000 1164 rw--- 0000000008053000 000:00000 [ anon ]
40000000 84 r-x-- 0000000000000000 0fe:00000 ld-2.3.5.so
40015000 8 rw--- 0000000000014000 0fe:00000 ld-2.3.5.so
40017000 4 rw--- 0000000040017000 000:00000 [ anon ]
40018000 4 r-x-- 0000000000000000 0fe:00000 kedit.so
40019000 4 rw--- 0000000000000000 0fe:00000 kedit.so
40027000 252 r-x-- 0000000000000000 0fe:00000 libkparts.so.2.1.0
40066000 20 rw--- 000000000003e000 0fe:00000 libkparts.so.2.1.0
4006b000 3108 r-x-- 0000000000000000 0fe:00000 libkio.so.4.2.0
40374000 116 rw--- 0000000000309000 0fe:00000 libkio.so.4.2.0
40391000 8 rw--- 0000000040391000 000:00000 [ anon ]
40393000 2644 r-x-- 0000000000000000 0fe:00000 libkdeui.so.4.2.0
40628000 164 rw--- 0000000000295000 0fe:00000 libkdeui.so.4.2.0
40651000 4 rw--- 0000000040651000 000:00000 [ anon ]
40652000 100 r-x-- 0000000000000000 0fe:00000 libkdesu.so.4.2.0
4066b000 4 rw--- 0000000000019000 0fe:00000 libkdesu.so.4.2.0
4066c000 68 r-x-- 0000000000000000 0fe:00000 libkwalletclient.so.1.0.0
4067d000 4 rw--- 0000000000011000 0fe:00000 libkwalletclient.so.1.0.0
4067e000 4 rw--- 000000004067e000 000:00000 [ anon ]
4067f000 2148 r-x-- 0000000000000000 0fe:00000 libkdecore.so.4.2.0
40898000 64 rw--- 0000000000219000 0fe:00000 libkdecore.so.4.2.0
408a8000 8 rw--- 00000000408a8000 000:00000 [ anon ]
... (trimmed) ...
mapped: 25404K writeable/private: 2432K shared: 0K

I cut out a lot of the output; the rest is similar to what is shown. Even without the complete output, we can see some very interesting things. One important thing to note about the output is that each shared library is listed twice; once for its code segment and once for its data segment. The code segments have a mode of "r-x--", while the data is set to "rw---". The Kbytes, Mode, and Mapping columns are the only ones we will care about, as the rest are unimportant to the discussion.

If you go through the output, you will find that the lines with the largest Kbytes number are usually the code segments of the included shared libraries (the ones that start with "lib" are the shared libraries). What is great about that is that they are the ones that can be shared between processes. If you factor out all of the parts that are shared between processes, you end up with the "writeable/private" total, which is shown at the bottom of the output. This is what can be considered the incremental cost of this process, factoring out the shared libraries. Therefore, the cost to run this instance of KEdit (assuming that all of the shared libraries were already loaded) is around 2 megabytes. That is quite a different story from the 14 or 25 megabytes that ps reported.

What does it all mean?
The moral of this story is that process memory usage on Linux is a complex matter; you can't just run ps and know what is going on. This is especially true when you deal with programs that create a lot of identical child processes, like Apache. ps might report that each Apache process uses 10 megabytes of memory, when the reality might be that the marginal cost of each Apache process is 1 megabyte of memory. This information becomes critical when tuning Apache's MaxClients setting, which determines how many simultaneous requests your server can handle (although see one of my past postings for another way of increasing Apache's performance).

It also shows that it pays to stick with one desktop's software as much as possible. If you run KDE for your desktop, but mostly use Gnome applications, then you are paying a large price for a lot of redundant (but different) shared libraries. By sticking to just KDE or just Gnome apps as much as possible, you reduce your overall memory usage due to the reduced marginal memory cost of running new KDE or Gnome applications, which allows Linux to use more memory for other interesting things (like the file cache, which speeds up file accesses immensely).