Open Source Technology: 05/06/08

Tuesday, May 6, 2008

Optimising MYSQL

DO NOT CHANGE your setting without understanding what you do.
Make a backup of your my.cnf before editing the /etc/my.cnf
DO THESE CHANGES AT YOUR OWN RISKS.

My thread is only to help you to more well understand how tuning mysql.

Optimising mysql is very well commented on the net, and you’ll find huge information on how to do this. There is never “best parameters”, the best parameters is those fits your needs, box hardware, mysql usage…
So, I’ll not give the best parameters but rather how to define these ones. Make some tests, and you’ll quickly find your own parameters.

I’ll give you at the end of this post some web pointers which may help you.

There a lot of available parameters but only few one are very important to tweak your mysql box.

The most important variables are (for me, and it is not exhaustive)
- max_connections
- wait_timeout
- thread_cache_size
-
- table_cache
-
- key_buffer_size
- query_cache_size
- tmp_table_size

First of all, how to find your variable, and the mysql usage ?

*VARIABLES
from mysql :
show variables;

or from command line :
mysqladmin variables

*PROCESS / STATUS
from Mysql :
show status;

or from command line
mysqladmin –i10 processlist extended-status

*SOME USEFUL COMMAND FOR YOU BOX USAGE
>Top

>ps –axfu

>vmstat 1

* OPTIMISING MYSQL

To obtain the stat of your mysql server since it has been loaded, run mysqladmin processlist extended-status as mentionned above.

1 - The two most important variables : Table_cache and Key_buffer_size

* If Opened_tables is big, then your table_cache variable is probably
too small.
table_cache 64
Open_tables 64
Opened_tables 544468

This is the first serious problem. "The table_cache is the number of open
tables for all threads. MySQL, being multi-threaded, may be running many
queries on the table at one time, and each of these will open a table."
Therefore, even though we only have a few tables, we will need many more
open_tables.

The Opened_tables value is high and shows the number of
cache misses. Getting the table_cache size correct is one of the two best
things you can do to improve performance.

* If Key_reads is big, then your key_buffer_size variable is probably
too small. The cache hit rate can be calculated with
Key_reads/Key_read_requests.
key_buffer_size 16M
Key_read_requests 2973620399
Key_reads 8490571
(cache hit rate = 0.0028)

“The key_buffer_size affects the size of the index buffers and the speed
of index handling, particularly reading." The MySQL manual (and other sources) say that
"Key_reads/Key_read_request ratio should normally be ".

2 - Others important settings are : Wait_timeout, max_connexion, thread_cache

A little explanation :
Generaly you have a lot of mysql process that are sleeping because wait_timeout are not set low. So I make sure that the wait_timeout is set to a very low value: 15 seconds (for me) . That means MySQL would close any connection that was idle for more than 15 seconds.

The problem is you also have to increment your max_connexion (mine is set to 300) to be sure there is not a lot of idle clients holding connections and blocking out new clients from connecting and getting real work done.
The pbm is that the box has to create new threads (MySQL is a multi-threaded server) at a very high rate. That may sucks up a measurable amount of CPU time.

So the solution is to use the Thread_cache (from mysql doc) :
“How many threads we should keep in a cache for reuse. When a client disconnects, the client's threads are put in the cache if there aren't more than thread_cache_size threads from before. All new threads are first taken from the cache, and only when the cache is empty is a new thread created. This variable can be increased to improve performance if you have a lot of new connections. (Normally this doesn't give a notable performance improvement if you have a good thread implementation.) By examing the difference between the Connections and Threads_created you can see how efficient the current thread cache is for you.”

* If Threads_created is big, you may want to increase the
thread_cache_size variable. The cache hit rate can be calculated with
Threads_created/Connections.
thread_cache_size 0
Threads_created 150022
Connections 150023

This is the second problem that should be fixed. A cache size of zero is the default for my-medium.cnf but the recommended size in my-large.cnf is 8.

you may try this formula : table_cache = opened table / max_used_connection

3 - Finally, you may also have a look at : tmp_table_size and Handler_read_rnd / Handler_read_rnd_next

* If Created_tmp_disk_tables is big, you may want to increase the
tmp_table_size variable to get the temporary tables memory-based instead
of disk based.

tmp_table_size 32M
Created_tmp_disk_tables 3227
Created_tmp_tables 159832
Created_tmp_files 4444

Created_tmp_disk_tables are the "number of implicit temporary tables on
disk created while executing statements" and Created_tmp_tables are
memory-based. Obviously it is bad if you have to go to disk instead of
memory. About 2% of temp tables go to disk, which doesn't seem too bad
but increasing the tmp_table_size probably couldn't hurt either.

* If Handler_read_rnd is big, then you probably have a lot of queries
that require MySQL to scan whole tables or you have joins that don't use
keys properly.
Handler_read_rnd 27712353
Handler_read_rnd_next 283536234

These values are high, that we could probably stand to improve
the indexes and queries.

I hope this will help some of you to more understand how it is possible to optimise MYSQL to fit your needs, hardaware box, or mysql current usage.

I forgot to tell you two or three importants things like :

Used MySQL memory = key_buffer + max_connections * (join_buffer + record_buffer + sort_buffer + thread_stack + tmp_table_size)

Notice the max_connexion and the multiplier.
connexion increase = memory usage increase too.

Notice key_buffer
for a given memory :
more you add mem to key buffer, less connexion is
less is key buffer, more connexion is

If you change one of these settings for a high value, you system may swap.
If you system swap, try lot decrease these values

Also, about table_cache :
Increasing the size of the table cache may really help you.
But you must be careful not to make the value too large. All operating systems have a limit on the number "open file pointer" (sorry in french it is called pointer, maybe descriptors is the good translation) a single process may have.
If MySQL tries to open a lot of files, the OS may refuse it and MySQL will generate error message in the error log

Mysql: HowTo

How MySQL Uses Indexes

Indexes are used to find rows with specific column values fast. Without an index, MySQL has to start with the first record and then read through the whole table to find the relevant rows. The bigger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1,000 rows, it is faster to read sequentially, because that minimizes disk seeks.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial column types use R-trees, and MEMORY (HEAP) tables support hash indexes.

Strings are automatically prefix- and end-space compressed.

In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described at the end of this section.

To quickly find the rows that match a WHERE clause.
To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_# = constant on all key parts that occur before key_col in the index. In this case, MySQL will do a single key lookup for each MIN() or MAX() expression and replace it with a constant. If all expressions are replaced with constants, the query will return at once. For example:
```
SELECT MIN(key_part2),MAX(key_part2)
FROM tbl_name WHERE key_part1=10;
```
To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. See Section 6.2.9, "How MySQL Optimizes ORDER BY."
In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed:
```
SELECT key_part3 FROM tbl_name WHERE key_part1=1
```

Suppose that you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL can't use a partial index if the columns don't form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;

If an index exists on (col1, col2, col3), only the first of the preceding queries uses the index. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

An index is used for columns that you compare with the =, >, >=, <, <=, or BETWEEN operators.

MySQL also uses indexes for LIKE comparisons if the argument to LIKE is a constant string that doesn't start with a wildcard character. For example, the following SELECT statements use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';

In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.

The following SELECT statements will not use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.

MySQL 4.0 and up performs an additional LIKE optimization. If you use ... LIKE '%string%' and string is longer than three characters, MySQL will use the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then use this pattern to perform the search quicker.

Searching using col_name IS NULL will use indexes if col_name is indexed.

Any index that doesn't span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.

The following WHERE clauses use indexes:

... WHERE index_part1=1 AND index_part2=2 AND other_column=3
/* index = 1 OR index = 2 */
... WHERE index=1 OR A=10 AND index=2
/* optimized like "index_part1='hello'" */
... WHERE index_part1='hello' AND index_part3=5
/* Can use index on index1 but not on index2 or index3 */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;

These WHERE clauses do not use indexes:

  /* index_part1 is not used */
... WHERE index_part2=1 AND index_part3=2
/* Index is not used in both AND parts */
... WHERE index=1 OR A=10
/* No index spans all rows */
... WHERE index_part1=1 OR index_part2=10

Sometimes MySQL will not use an index, even if one is available. One way this occurs is when the optimizer estimates that using the index would require MySQL to access a large percentage of the rows in the table. (In this case, a table scan is probably much faster, because it will require many fewer seeks.) However, if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, because it can much more quickly find the few rows to return in the result.

Hash indexes have somewhat different characteristics than those just discussed:

They are used only for = or <=> comparisons (but are very fast).
The optimizer cannot use a hash index to speed up ORDER BY operations. (This type of index cannot be used to search for the next entry in order.)
MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a MyISAM table to a hash-indexed MEMORY table.
Only whole keys can be used to search for a row. (With a B-tree index, any prefix of the key can be used to find rows.)

The MyISAM Key Cache

To minimize disk I/O, the MyISAM storage engine employs a strategy that is used by many database management systems. It exploits a cache mechanism to keep the most frequently accessed table blocks in memory:

For index blocks, a special structure called the key cache (key buffer) is maintained. The structure contains a number of block buffers where the most-used index blocks are placed.
For data blocks, MySQL uses no special cache. Instead it relies on the native operating system filesystem cache.

This section first describes the basic operation of the MyISAM key cache. Then it discusses changes made in MySQL 4.1 that improve key cache performance and that enable you to better control cache operation:

Access to the key cache no longer is serialized among threads. Multiple threads can access the cache concurrently.
You can set up multiple key caches and assign table indexes to specific caches.

The key cache mechanism also is used for ISAM tables. However, the significance of this fact is on the wane. ISAM table use has been decreasing since MySQL 3.23 when MyISAM was introduced. MySQL 4.1 carries this trend further; the ISAM storage engine is disabled by default.

You can control the size of the key cache by means of the key_buffer_size system variable. If this variable is set equal to zero, no key cache is used. The key cache also is not used if the key_buffer_size value is too small to allocate the minimal number of block buffers (8).

When the key cache is not operational, index files are accessed using only the native filesystem buffering provided by the operating system. (In other words, table index blocks are accessed using the same strategy as that employed for table data blocks.)

An index block is a contiguous unit of access to the MyISAM index files. Usually the size of an index block is equal to the size of nodes of the index B-tree. (Indexes are represented on disk using a B-tree data structure. Nodes at the bottom of the tree are leaf nodes. Nodes above the leaf nodes are non-leaf nodes.)

All block buffers in a key cache structure are the same size. This size can be equal to, greater than, or less than the size of a table index block. Usually one of these two values is a multiple of the other.

When data from any table index block must be accessed, the server first checks whether it is available in some block buffer of the key cache. If it is, the server accesses data in the key cache rather than on disk. That is, it reads from the cache or writes into it rather than reading from or writing to disk. Otherwise, the server chooses a cache block buffer containing a different table index block (or blocks) and replaces the data there by a copy of required table index block. As soon as the new index block is in the cache, the index data can be accessed.

If it happens that a block selected for replacement has been modified, the block is considered "dirty." In this case, before being replaced, its contents are flushed to the table index from which it came.

Usually the server follows an LRU (Least Recently Used) strategy: When choosing a block for replacement, it selects the least recently used index block. To be able to make such a choice easy, the key cache module maintains a special queue (LRU chain) of all used blocks. When a block is accessed, it is placed at the end of the queue. When blocks need to be replaced, blocks at the beginning of the queue are the least recently used and become the first candidates for eviction.

Key Cache Block Size

MySQL 4.1 introduces a new key_cache_block_size variable on a per-key cache basis. This variable specifies the size of the block buffers for a key cache. It is intended to allow tuning of the performance of I/O operations for index files.

The best performance for I/O operations is achieved when the size of read buffers is equal to the size of the native operating system I/O buffers. But setting the size of key nodes equal to the size of the I/O buffer does not always ensure the best overall performance. When reading the big leaf nodes, the server pulls in a lot of unnecessary data, effectively preventing reading other leaf nodes.

Currently, you cannot control the size of the index blocks in a table. This size is set by the server when the .MYI index file is created, depending on the size of the keys in the indexes present in the table definition. In most cases, it is set equal to the I/O buffer size. In the future, this will be changed and then key_cache_block_size variable will be fully employed.

Restructuring a Key Cache

A key cache can be restructured at any time by updating its parameter values. For example:

mysql> SET GLOBAL cold_cache.key_buffer_size=4*1024*1024;

If you assign to either the key_buffer_size or key_cache_block_size key cache component a value that differs from the component's current value, the server destroys the cache's old structure and creates a new one based on the new values. If the cache contains any dirty blocks, the server saves them to disk before destroying and re-creating the cache. Restructuring does not occur if you set other key cache parameters.

When restructuring a key cache, the server first flushes the contents of any dirty buffers to disk. After that, the cache contents become unavailable. However, restructuring does not block queries that need to use indexes assigned to the cache. Instead, the server directly accesses the table indexes using native filesystem caching. Filesystem caching is not as efficient as using a key cache, so although queries will execute, a slowdown can be anticipated. Once the cache has been restructured, it becomes available again for caching indexes assigned to it, and the use of filesystem caching for the indexes ceases.

How MySQL Counts Open Tables

When you execute a mysqladmin status command, you'll see something like this:

Uptime: 426 Running threads: 1 Questions: 11082
Reloads: 1 Open tables: 12

The Open tables value of 12 can be somewhat puzzling if you have only six tables.

MySQL is multi-threaded, so there may be many clients issuing queries for a given table simultaneously. To minimize the problem with multiple client threads having different states on the same file, the table is opened independently by each concurrent thread. This takes some memory but normally increases performance. With MyISAM tables, one extra file descriptor is required for the data file for each client that has the table open. (By contrast, the index file descriptor is shared between all threads.) The ISAM storage engine shares this behavior.

You can read more about this topic in the next section. See Section 6.4.8, "How MySQL Opens and Closes Tables."

How MySQL Opens and Closes Tables

The table_cache, max_connections, and max_tmp_tables system variables affect the maximum number of files the server keeps open. If you increase one or more of these values, you may run up against a limit imposed by your operating system on the per-process number of open file descriptors. Many operating systems allow you to increase the open-files limit, although the method varies widely from system to system. Consult your operating system documentation to determine whether it is possible to increase the limit and how to do so.

table_cache is related to max_connections. For example, for 200 concurrent running connections, you should have a table cache size of at least 200 * N, where N is the maximum number of tables in a join. You also need to reserve some extra file descriptors for temporary tables and files.

Make sure that your operating system can handle the number of open file descriptors implied by the table_cache setting. If table_cache is set too high, MySQL may run out of file descriptors and refuse connections, fail to perform queries, and be very unreliable. You also have to take into account that the MyISAM storage engine needs two file descriptors for each unique open table. You can increase the number of file descriptors available for MySQL with the --open-files-limit startup option to mysqld_safe. See Section A.2.17, "File Not Found."

The cache of open tables will be kept at a level of table_cache entries. The default value is 64; this can be changed with the --table_cache option to mysqld. Note that MySQL may temporarily open even more tables to be able to execute queries.

An unused table is closed and removed from the table cache under the following circumstances:

When the cache is full and a thread tries to open a table that is not in the cache.
When the cache contains more than table_cache entries and a thread is no longer using a table.
When a table flushing operation occurs. This happens when someone issues a FLUSH TABLES statement or executes a mysqladmin flush-tables or mysqladmin refresh command.

When the table cache fills up, the server uses the following procedure to locate a cache entry to use:

Tables that are not currently in use are released, in least recently used order.
If a new table needs to be opened, but the cache is full and no tables can be released, the cache is temporarily extended as necessary.

When the cache is in a temporarily extended state and a table goes from a used to unused state, the table is closed and released from the cache.

A table is opened for each concurrent access. This means the table needs to be opened twice if two threads access the same table or if a thread accesses the table twice in the same query (for example, by joining the table to itself). Each concurrent open requires an entry in the table cache. The first open of any table takes two file descriptors: one for the data file and one for the index file. Each additional use of the table takes only one file descriptor, for the data file. The index file descriptor is shared among all threads.

If you are opening a table with the HANDLER tbl_name OPEN statement, a dedicated table object is allocated for the thread. This table object is not shared by other threads and is not closed until the thread calls HANDLER tbl_name CLOSE or the thread terminates. When this happens, the table is put back in the table cache (if the cache isn't full).

You can determine whether your table cache is too small by checking the mysqld status variable Opened_tables:

mysql> SHOW STATUS LIKE 'Opened_tables';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Opened_tables | 2741  |
+---------------+-------+

If the value is quite big, even when you haven't issued a lot of FLUSH TABLES statements, you should increase your table cache size. See Section 4.2.3, "Server System Variables," and Section 4.2.4, "Server Status Variables."

Drawbacks to Creating Many Tables in the Same Database

If you have many MyISAM or ISAM tables in a database directory, open, close, and create operations will be slow. If you execute SELECT statements on many different tables, there will be a little overhead when the table cache is full, because for every table that has to be opened, another must be closed. You can reduce this overhead by making the table cache larger.