Level: Intermediate
Sean A. Walberg
(sean@ertw.com),
Senior Network Engineer
07 Jun 2007
Applications using the LAMP (Linux®,
Apache, MySQL, PHP/Perl) architecture are constantly being developed
and deployed. But often the server administrator has little control
over the application itself because it's written by someone else. This series
of three articles discusses many of the server configuration
items that can make or break an application's performance. This third
article, the last in the series, focuses on tuning the database layer
for maximum efficiency.
About MySQL tuning
You can do three things to make your MySQL server
faster, from least effective to most effective:
- Throw hardware at the problem.
- Tune the settings of the MySQL process.
- Optimize your queries.
Throwing hardware at a problem is often the first
thought, especially because databases are resource hogs. This solution
can only take you so far, though. In practical terms, you can usually
double your central processing unit (CPU) or disk speed, and maybe
increase your memory by a factor of 4 or 8.
The second best thing to do is to tune the MySQL
server, also called mysqld. Tuning the
process means allocating memory to the right places and giving mysqld
an idea of what type of load to expect. Rather than make the disks
faster, it's better to reduce the number of disk accesses needed.
Similarly, making sure the MySQL process is operating correctly means
it can spend more time servicing queries than taking care of background
tasks like temporary disk tables and opening and closing files. Tuning mysqld
is the focus of this article.
The best thing you can do is make sure your
queries are optimized. This means the proper indexes are applied to
tables, and queries are written in such a way that they take advantage
of MySQL's strengths. Even though this article doesn't cover query
tuning (books have been written on the subject), it configures mysqld
to report queries that may need tuning.
Just because these tasks have been assigned an
order doesn't mean you can ignore the hardware and mysqld
settings in favor of properly tuned queries. A slow machine is a slow
machine, and I've seen fast machines with well-written queries fail
under load because mysqld was consumed with
busy-work instead of servicing queries.
Log slow queries
In a SQL server, the data tables sit on disk.
Indexes provide a means for the server to find a particular row of data
in the table without having to search the entire table. When the entire
table has to be searched, it's called a table scan.
Most often, you want only a small subset of the data in the table, so a
full table scan wastes a lot of disk I/O and therefore time.This
problem is compounded when data must be joined, because many more rows
must be compared between the two sides of the join.
Of course, table scans aren't always a sign of a
problem; sometimes it's more efficient to read the whole table than it
is to pick through it (making these decisions is the job of the query
planner in the server process). Inefficient use of indexes, or not
being able to use indexes at all, slows the queries, and this issue
becomes more pronounced as the load on the server and the size of the
tables increases. Queries that take more than a given amount of time to
execute are called slow queries.
You can configure mysqld
to log slow queries in the aptly named slow query log. Administrators
then look at this log to help them determine which parts of the
application need further investigation. Listing 1 shows the
configuration required in my.cnf to enable the slow query log.
Listing 1. Enable
the MySQL slow query log
[mysqld] ; enable the slow query log, default 10 seconds log-slow-queries ; log queries taking longer than 5 seconds long_query_time = 5 ; log queries that don't use indexes even if they take less than long_query_time ; MySQL 4.1 and newer only log-queries-not-using-indexes
|
These three settings, used together, log any
queries that take longer than 5 seconds and any queries that don't use
indexes. Note the caveat about log-queries-not-using-indexes:
You must have MySQL 4.1 or newer. The slow query log is in your MySQL
data directory and is called hostname-slow.log. If
you'd rather use a different name or path, you can do so with log-slow-queries
= /new/path/to/file in my.cnf.
Reading through the slow query log is best done
with the mysqldumpslow command. Specify the
path to the logfile, and you're given a sorted list of the slow
queries, along with how many times they're found in the log. One
helpful feature is that mysqldumpslow removes
any user-specified data before collating the results, so different
invocations of the same query are counted as one; this helps point out
queries in need of the most work.
Cache queries
Many LAMP applications rely heavily on the
database but make the same queries over and over. Each time the query
is made, the database must do the same work -- parse the query,
determine how to execute it, load information from disk, and return it
to the client. MySQL has a feature called the query cache
that stores the result of a query in memory, should it be needed again.
In many instances, this increases performance drastically. The catch,
though, is that the query cache is disabled by default.
Adding query_cache_size = 32M
to /etc/my.conf enables a 32MB query cache.
Monitor the query cache
After you enable the query cache, it's important
to understand whether it's being used effectively. MySQL has several
variables you can watch to see how things are going in the cache.
Listing 2 shows the status of the cache.
Listing 2. Display
the query cache statistics
mysql> SHOW STATUS LIKE 'qcache%'; +-------------------------+------------+ | Variable_name | Value | +-------------------------+------------+ | Qcache_free_blocks | 5216 | | Qcache_free_memory | 14640664 | | Qcache_hits | 2581646882 | | Qcache_inserts | 360210964 | | Qcache_lowmem_prunes | 281680433 | | Qcache_not_cached | 79740667 | | Qcache_queries_in_cache | 16927 | | Qcache_total_blocks | 47042 | +-------------------------+------------+ 8 rows in set (0.00 sec)
|
The breakdown of these items is shown in Table 1.
Table 1. MySQL query
cache variables
| Variable name |
Description |
Qcache_free_blocks
|
The number of contiguous memory blocks in
the cache. Higher numbers are a sign of fragmentation. FLUSH
QUERY CACHE defragments the cache so there's one free
block. |
Qcache_free_memory
|
The free memory in the cache. |
Qcache_hits |
Incremented each time a query is served
from the cache. |
Qcache_inserts |
Incremented each time a query is inserted.
Divide the number of inserts by the hits to get your miss rate;
subtract this value from 1 to get your hit rate. In the previous
example, approximately 87% of the queries are getting served from cache. |
Qcache_lowmem_prunes
|
How many times the cache ran out of memory
and had to be cleaned up to make room for more queries. This number is
best looked at over time; if it's increasing, it's a sign that either
fragmentation is serious or memory is low (free_blocks
and free_memory, above, tell you which it is). |
Qcache_not_cached
|
The number of queries that weren't
candidates for caching, usually because they weren't SELECT
statements. |
Qcache_queries_in_cache
|
The number of queries (and responses)
current cached. |
Qcache_total_blocks
|
The number of blocks in the cache. |
Often, showing the variables several seconds apart
indicates change, which helps determine whether the cache is being used
effectively. Running FLUSH STATUS resets some
of the counters, which is helpful if the server has been running for a
while.
It's tempting to make an excessively large query
cache in the hopes of caching everything. Because mysqld
must perform maintenance on the cache, such as pruning when memory
becomes low, the server can get bogged down trying to manage the cache.
As a rule, if FLUSH QUERY CACHE takes a long
time, the cache is too large.
Enforce limits
You should enforce a few limits in mysqld
to ensure that the system load doesn't cause resource starvations.
Listing 3 shows some important resource-related settings from my.cnf.
Listing 3. MySQL
resource settings
set-variable=max_connections=500 set-variable=wait_timeout=10 max_connect_errors = 100
|
The maximum connections are governed in the first
line. Like MaxClients from Apache, the idea
is to make sure only the number of connections you can serve are
allowed. To determine the maximum number of connections your server has
seen so far, execute SHOW STATUS LIKE
'max_used_connections'.
The second line tells mysqld
to terminate any connections that have been idle for more than 10
seconds. In LAMP applications, the connection to the database is
usually only as long as the Web server takes to process the request.
Sometimes, under load, connections hang around and take up connection
table space. If you have many interactive users or use persistent
connections to the database, then setting this low isn't a good idea!
The final line is a safety measure. If a host has
problems connecting to the server and ends up aborting the request too
many times, the host is locked until FLUSH HOSTS
can be run. By default, 10 failures are enough to cause blocking.
Changing this value to 100 gives the server enough time to recover from
whatever problems it has. Using a higher value doesn't help you much
because if the server can't connect once in 100 tries, chances are it's
not going to connect at all.
Buffers and caches
MySQL supports well over 100 tunable settings; but
luckily, mastering a small handful will take care of most needs.
Finding the right value for these settings involves looking at status
variables via the SHOW STATUS command and,
from that, determining whether mysqld is
behaving as you wish. You can't allocate more memory to buffers and
caches than exists in the system, so tuning often involves making
compromises.
MySQL tunables apply to either the whole mysqld
process or each individual client session.
Server-wide settings
Each table is represented as a file on disk and
must be opened before it can be read. To speed up the process of
reading from the file, mysqld caches these
open files up to the limit specified by table_cache
in /etc/mysqld.conf. Listing 4 shows how to display the activity
associated with opening tables.
Listing 4. Display
table-open activity
mysql> SHOW STATUS LIKE 'open%tables'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Open_tables | 5000 | | Opened_tables | 195 | +---------------+-------+ 2 rows in set (0.00 sec)
|
Listing 4 shows that 5,000 tables are currently
open and that 195 tables had to be opened because there was no
available file descriptor in the cache (the statistics were cleared
earlier, so it's feasible to have 5,000 open tables with a history of
only 195 opens). If Opened_tables increases
quickly as you rerun the SHOW STATUS command,
you aren't getting enough hits out of your cache. If Open_tables
is much lower than your table_cache setting,
you have too many (some room to grow is never a bad thing, though).
Adjust your table cache with table_cache = 5000,
for example.
Like the table cache, there is also a cache for
threads. mysqld spawns threads as needed when
receiving connections. On a busy server where connections are torn up
and down quickly, caching threads for use later speeds up the initial
connection.
Listing 5 shows how to determine if you have
enough threads cached.
Listing 5. Show
thread-usage statistics
mysql> SHOW STATUS LIKE 'threads%'; +-------------------+--------+ | Variable_name | Value | +-------------------+--------+ | Threads_cached | 27 | | Threads_connected | 15 | | Threads_created | 838610 | | Threads_running | 3 | +-------------------+--------+ 4 rows in set (0.00 sec)
|
The important value here is Threads_created,
which is incremented each time mysqld has to
create a new thread. If this number increases quickly between
successive SHOW STATUS commands, you should
look at increasing your thread cache. You do this with thread_cache
= 40, for example, in my.cnf.
The key buffer stores index blocks for MyISAM
tables. Ideally, requests for these blocks should come from memory
instead of disk. Listing 6 shows how to determine how many blocks were
read from disk versus those from memory.
Listing 6.
Determine key efficiency
mysql> show status like '%key_read%'; +-------------------+-----------+ | Variable_name | Value | +-------------------+-----------+ | Key_read_requests | 163554268 | | Key_reads | 98247 | +-------------------+-----------+ 2 rows in set (0.00 sec)
|
Key_reads represents
the number of requests that hit disk, and Key_read_requests
is the total number. Dividing the reads by the read requests gives the
miss rate -- in this case, 0.6 misses per 1,000 requests. If you're
missing more than 1 per 1,000 requests, you should consider increasing
your key buffer. key_buffer = 384M, for
example, sets the buffer to 384MB.
Temporary tables are used in more advanced queries
where data must be stored temporarily before further processing
happens, such as in GROUP BY clauses.
Ideally, such tables are created in memory; but if a temporary table
gets too large, it's written to disk. Listing 7 shows the statistics
associated with temporary-table creation.
Listing 7.
Determine temporary-table usage
mysql> SHOW STATUS LIKE 'created_tmp%'; +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Created_tmp_disk_tables | 30660 | | Created_tmp_files | 2 | | Created_tmp_tables | 32912 | +-------------------------+-------+ 3 rows in set (0.00 sec)
|
Each use of a temporary table increases Created_tmp_tables;
disk-based tables also increment Created_tmp_disk_tables.
There is no hard-and-fast rule for the ratio because it depends on the
queries involved. Watching Created_tmp_disk_tables
over time shows the rate of created disk tables, and you can determine
the effectiveness of the settings. Both tmp_table_size
and max_heap_table_size control the maximum
size of temporary tables, so make sure you set them both in my.cnf.
Per-session settings
The following settings are per session. Take care
when you set these numbers, because when multiplied by the number of
potential connections, these options represent a lot of memory! You can
change these numbers in the session through code or for all sessions in
my.cnf.
When MySQL must perform a sort, it allocates a
sort buffer to store the rows as they're read from disk. If the size of
the data to sort is too large, the data must go to temporary files on
disk and be sorted again. If the sort_merge_passes
status variable is high, this is an indication of this disk activity.
Listing 8 shows some of the sort-related status counters.
Listing 8.
Show-sort statistics
mysql> SHOW STATUS LIKE "sort%"; +-------------------+---------+ | Variable_name | Value | +-------------------+---------+ | Sort_merge_passes | 1 | | Sort_range | 79192 | | Sort_rows | 2066532 | | Sort_scan | 44006 | +-------------------+---------+ 4 rows in set (0.00 sec)
|
If sort_merge_passes is
high, this is an indication that sort_buffer_size
needs attention. For example, sort_buffer_size = 4M
sets the sort buffer to 4MB.
MySQL also allocates memory to read tables.
Ideally, the indexes provide enough information to read in only the
needed rows, but sometimes queries (through poor design or the nature
of the data) require large chunks of the table to be read. To
understand this behavior, you need to know how many SELECT
statements were run and the number of times you had to read the next
row in the table (rather than a direct access through an index). The
commands to do so are shown in Listing 9.
Listing 9.
Determine table-scan ratio
mysql> SHOW STATUS LIKE "com_select"; +---------------+--------+ | Variable_name | Value | +---------------+--------+ | Com_select | 318243 | +---------------+--------+ 1 row in set (0.00 sec)
mysql> SHOW STATUS LIKE "handler_read_rnd_next"; +-----------------------+-----------+ | Variable_name | Value | +-----------------------+-----------+ | Handler_read_rnd_next | 165959471 | +-----------------------+-----------+ 1 row in set (0.00 sec)
|
The Handler_read_rnd_next
/ Com_select gives your table-scan ratio --
in this case, 521:1. Anything over 4000, and you should look at your read_buffer_size,
such as read_buffer_size = 4M. If you're
growing this number beyond 8M, it's time to talk to your developers
about tuning those queries!
Three must-have tools
Even though the SHOW STATUS
commands are helpful when you're drilling down into specific settings,
you need some tools to help you interpret the vast amounts of data
provided by mysqld. I've found three tools to
be indispensable; you can find links in the Resources
section.
Most sysadmins are familiar with the top
command, which provides a constantly updated view of the CPU and memory
consumed by tasks. mytop is modelled after top;
it provides a view of all the connected clients along with any queries
they're currently running. mytop also
provides real-time and historical data about key-buffer and query-cache
efficiency, and statistics about the queries being run. It's a useful
tool to see what's going on -- within 10 seconds, you can get a view of
the server's health and display any connections that are causing
problems.
mysqlard is a daemon
that connects to the MySQL server and collects data every 5 minutes,
storing it in a Round Robin Database backend. A Web page displays the
data, such as table-cache usage, key efficiency, connected clients, and
temporary-table usage. Whereas mytop provides
a snapshot of server health, mysqlard
provides long-term health information. As a bonus, mysqlard
uses some of the information it collects to make suggestions about how
to tune your server.
Another tool for collecting SHOW
STATUS information is mysqlreport.
It's far more verbose in its reporting than mysqlard
because it analyzes every facet of the server. It's an excellent tool
for tuning a server because it performs the appropriate calculations on
the status variables to help you determine what needs fixing.
Summary
This article covered the basics of MySQL tuning
and concludes this three-part series on tuning LAMP components. Tuning
is largely about understanding how things work, determining if they're
working properly, making adjustments, and re-evaluating. Each component
-- Linux, Apache, PHP, or MySQL -- has various needs. Understanding
them individually helps eliminate the bottlenecks that can slow your
application.
Resources
Learn
- "Migrate
from MySQL or PostgreSQL to DB2 Express-C" (developerWorks,
June 2006) offers an easy way to migrate from MySQL to DB2 Express-C.
- IBM also provides help for MySQL
administrators who are moving to DB2 Express-C by means of "Leveraging
MySQL skills to learn DB2 Express" (developerWorks, Feb 2006)
and other articles in the series.
- "Using
MySQL in a federated database environment" (developerWorks,
Dec 2004) is a tutorial on accessing data stored in a MySQL database
from WebSphere. IBM makes sure that WebSphere® software works well with
MySQL.
- SHOW
VARIABLES and SHOW
STATUS are defined well in the MySQL documentation.
- If you like blogs, MySQL
Performance Blog, Xaprb,
and MySQL
DBA are worth reading.
- In the Architecture
area on developerWorks, get the resources you need to advance
your skills in the architecture arena. Developing the proper
architecture is the key to scaling LAMP applications.
- In the developerWorks
Linux zone, find more resources for Linux developers,
including Linux
tutorials, as well as our
readers' favorite Linux articles and tutorials over the last
month.
- Stay current with developerWorks
technical events and Webcasts.
Get products and technologies
- Despite being three years old by now,
High
Performance MySQL is still a valuable book. The
author also has a Web site with various articles
about MySQL.
- mytop
tells you exactly what's going on with your MySQL server at that
instant and provides some key statistics. This is the first program I
turn to when I hear of database problems.
- mysqlard
graphs key performance indicators in MySQL servers and provides tuning
advice.
- mysqlreport
is a must-have tool. It analyzes
SHOW STATUS
variables on your behalf.
- A MySQL article isn't complete without a link
to phpMyAdmin.
Even though some interpretation of the status variables is provided,
this product's strength is how easy it makes administration.
- Order
the SEK for Linux, a two-DVD set containing the latest IBM
trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and
WebSphere®.
- With IBM
trial software, available for download directly from
developerWorks, build your next development project on Linux.
Discuss
About the author
 |
|

|
 |
Sean
Walberg has been working with Linux and UNIX since 1994 in academic,
corporate, and Internet service provider environments. He has written
extensively about systems administration over the past several years.
|
|