Recently, we had a situation at work where one of our myISAM tables was degrading search response times. The sheer volume of data was causing single-join queries to appear in the slow-query log. Even though the table was well-indexed, and the cardinality of the indexes was high, searches were taking quite a bit of time to execute. A teammate proposed that, instead of searching for a particular tuple of data, we batch the requests and gain an improvement by reducing the number of calls to the database. That meant taking:
[cc lang='mysql' ] SELECT * FROM table WHERE STATUS = 'active' AND ID = some_value; [/cc]
and replacing it with: [cc lang='mysql' ] SELECT * FROM table WHERE STATUS = 'active' AND ID IN (value1, ..., valueN); [/cc]
where 1 < N < 50. In test runs, the first query usually executed in about 0.06 seconds, whereas the second query executed in about 0.64 seconds. The argument in favor of the second query (in this particular example, N=30) was that we were returning 30x the data in roughly 10x the time. Since the query runs off-line (as opposed to in real-time in response to a customer request), the increased time was acceptable because we were returning more data per call.
First, I'm not debating, pondering, questioning, or attacking the validity of either query - that's not the point of this post.
The point of the post is that, IMO, first, this data belongs in a myISAM table since the data is WORM (write once, read many). Second, while there are several indexes on the table, the STATUS column is not indexed, as the cardinality of such an index would be equal to two. It may be viable to index the table on this column, but I happen to know that about 98% of the data has an active STATUS anyway, so you'd still be performing close to a full table scan based on that column. You could argue for making this an INNODB table, since there are only three single-key indexes on it... but you're going to need a butt-load of disk-space...
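(If you want to check the cardinality numbers for yourself, mySQL will report them directly; the table name below is from my example.)

[cc lang='mysql' ]
-- Cardinality is mySQL's estimate of the number of unique values in each index;
-- a two-value column like STATUS scores near 2 and prunes almost nothing.
SHOW INDEX FROM foo_table;
[/cc]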
The table in question had about 20-million rows. The production version of the same table had considerably more - but this was a reduced version suitable for my laptop.
In order to create a mySQL MERGE table, the first requirement is that your underlying tables be of type myISAM. (So far, so good.) You then create new tables that have exactly the same data-definition and indexes as the original. To do this, I arbitrarily decided to divide my 20-million-row table into 10 tables of 2 million rows each. So, I created 10 new tables, first by executing the DDL, then by populating each table.
[cc lang='mysql' ]
CREATE TABLE `foo_10_table` ( `id` int(11) NOT NULL AUTO_INCREMENT, `a` int(11) NOT NULL, `b` int(11) NOT NULL, `c` int(11) NOT NULL, `d` int(11) NOT NULL, `e` int(11) NOT NULL, `f` mediumtext NOT NULL, `g` enum('active','inactive') NOT NULL DEFAULT 'active', `h` datetime NOT NULL, `i` datetime NOT NULL, `j` int(11) NOT NULL, `k` varchar(100) NOT NULL, `l` enum('active','inactive') NOT NULL DEFAULT 'active', PRIMARY KEY (`id`), KEY `a` (`a`), KEY `b` (`b`), KEY `j` (`j`) ) ENGINE=MyISAM AUTO_INCREMENT=19538530 DEFAULT CHARSET=latin1;
INSERT INTO foo_10_table (SELECT * FROM foo_table WHERE id > 17584677);
[/cc]
And you do this ten times, adjusting the id range for each table accordingly.
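To make that concrete, the population step looks something like the following. The cutoff values here are illustrative only; in practice you'd pick boundaries from your actual id distribution (e.g. with SELECT MIN(id), MAX(id)).

[cc lang='mysql' ]
-- each slice holds roughly 2 million rows; these boundaries are examples
INSERT INTO foo_1_table (SELECT * FROM foo_table WHERE id <= 1953853);
INSERT INTO foo_2_table (SELECT * FROM foo_table WHERE id > 1953853 AND id <= 3907706);
-- ...and so on through...
INSERT INTO foo_10_table (SELECT * FROM foo_table WHERE id > 17584677);
[/cc]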
The impact on my filesystem was that I used approximately 700 MB of additional space to create the ten underlying tables. Now it was time to create the actual merge table.
A MERGE storage engine, according to the mySQL website, "is a collection of identical MyISAM tables that can be used as one. 'Identical' means that all tables have identical column and index information." You create the merge table by creating a new table (with a unique name) with ENGINE=MERGE, identical in structure to your previously-created myISAM tables, and composed as a union of those tables:
[cc lang='mysql' ]
CREATE TABLE `foo_all_table` ( `id` int(11) NOT NULL AUTO_INCREMENT, `a` int(11) NOT NULL, `b` int(11) NOT NULL, `c` int(11) NOT NULL, `d` int(11) NOT NULL, `e` int(11) NOT NULL, `f` mediumtext NOT NULL, `g` enum('active','inactive') NOT NULL DEFAULT 'active', `h` datetime NOT NULL, `i` datetime NOT NULL, `j` int(11) NOT NULL, `k` varchar(100) NOT NULL, `l` enum('active','inactive') NOT NULL DEFAULT 'active', PRIMARY KEY (`id`), KEY `a` (`a`), KEY `b` (`b`), KEY `j` (`j`) ) ENGINE=MERGE UNION=(foo_10_table, foo_9_table, foo_8_table, foo_7_table, foo_6_table, foo_5_table, foo_4_table, foo_3_table, foo_2_table, foo_1_table) INSERT_METHOD=FIRST AUTO_INCREMENT=19538530 DEFAULT CHARSET=latin1;
[/cc]
The merge table itself (foo_all_table) isn't a real table, per se. It's mySQL's gateway into your data: under the hood it's essentially a list of pointers to the underlying tables (stored in a small .MRG file).
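One nice property: queries run against the MERGE table exactly as they would have against the original table, so no application code needs to change. A sketch, using the column names from the DDL above:

[cc lang='mysql' ]
-- the single-row lookup from earlier, now against the merge table
SELECT * FROM foo_all_table WHERE id = some_value;
-- or the batched form
SELECT * FROM foo_all_table WHERE id IN (value1, ..., valueN);
[/cc]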
One of the advantages of using merge tables is that, in this particular example, I have 10 tables that have been sub-divided off from my original table. Let's say that the bottom five tables contain data that's more than a year old. While I need to retain the data, I know, since the data is WORM, that I'll never need to modify it. Ever. So, why not compress the data to save myself some disk space?
Merge tables support myisampack -- a utility which compresses myISAM tables, saving, on average, between 40% and 70% of the space (note that packed tables become read-only, which is fine for WORM data). In my testing, compressing the oldest five tables yielded approximately 54.5% compression, which returned almost 256 MB of disk storage.
[cc lang='bash' ]
sh-3.2# myisampack5 foo_6_table
Compressing foo_6_table.MYD: (1953853 records)
- Calculating statistics
- Compressing file
Remember to run myisamchk -rq on compressed tables
sh-3.2# myisamchk5 -rq foo_6_table
- check record delete-chain
- recovering (with sort) MyISAM-table 'foo_6_table'
Data records: 1953853
- Fixing index 1
- Fixing index 2
- Fixing index 3
- Fixing index 4
[/cc]
The advantage of converting a large myISAM table into a MERGE table is that your searches take significantly less time because you're scanning less data and locking less data at a time.
For my example, when I executed the query that required 0.64 seconds to complete, the same query executed against my MERGE structure required 0.46 seconds to complete, a 28% savings in time.
Can this be improved? Certainly! However, the point of the exercise was only to demonstrate the efficiency and speed of MERGE tables, and how easy they are to set up. There are additional "tricks" for speeding up data access via MERGE tables, but I leave those to your own research.
Recommended Reading: Merge Tables: Advantages and Disadvantages