How to find duplicate and redundant indexes in MySQL
Peter Zaitsev over at the excellent MySQL Performance Blog recently wrote an article on duplicated and redundant indexes — any indexes which cover exactly the same columns as another index, or cover a leftmost prefix of another index. While there are subtleties, such as FULLTEXT indexes not being the same as non-FULLTEXT, for the most part this is sufficient criteria to raise possible duplicates to a DBA’s attention. I opened my big mouth in the comments and said I could write a quick Perl script to discover possible offenders in just a few lines of code. Once I did that, I had to do it and give you the script. Here it is.
The reason this is really easy to do in Perl is that the output of SHOW CREATE TABLE lists each index with its columns in order, in an easy-to-parse way, and therefore all one needs to do is compare the string that defines each index with each other index to find duplication and redundancy. Note: you just need to compare the string definition! You don’t need to actually parse out the columns and do any advanced computer science on them. And a quick regular expression to anchor each index definition to the beginning of the one to which you’re comparing it will satisfy the “leftmost prefix” requirement.
Why use SHOW CREATE TABLE’s output? Why not query SHOW INDEXES FROM ____ and use that instead? Well, first of all it’s way faster, as I also said in the comments on Peter’s blog. When I do something like this I like it to be zippy. SHOW INDEXES can take a long time, as it has to calculate stats on the indexes. Plus, even if I did use SHOW INDEXES, or query the INFORMATION_SCHEMA tables (also slow) I’d then have a result set of individual columns, which frankly I’d just concatenate together and do a string comparison on.
OK, on to my “advanced, patented algorithm.” Here’s a sample SHOW CREATE statement (I’m using a table from my recent article on role-based access control for an example):
mysql > show create table t_privilege\G
*************************** 1. row ***************************
Table: t_privilege
Create Table: CREATE TABLE `t_privilege` (
`c_role` varchar(30) NOT NULL default 'other',
`c_who` int(11) NOT NULL default '0',
`c_action` varchar(100) NOT NULL,
`c_type` varchar(30) NOT NULL default '',
`c_related_table` varchar(100) NOT NULL default '',
`c_related_uid` int(11) NOT NULL default '0',
PRIMARY KEY (`c_role`,`c_who`,`c_action`,`c_type`,`c_related_table`,`c_related_uid`),
KEY `c_role` (`c_role`,`c_who`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
You’ll notice I added a key on (c_role, c_who) which is a leftmost prefix of the primary key. In general, indexes always appear in this output as KEY (column names), with a possible PRIMARY or UNIQUE in front (update: it should not have FOREIGN in front, because that’s not an index). That’s pretty easy to parse with a regular expression, and grab just the columns. A global match captures every index into an array. Then it’s just a matter of looping through the array and comparing. Here is the code:
foreach my $table ( @tables ) {
my $ddl = $dbh->selectall_arrayref("show create table $table")
->[0]->[1];
my @indexes = $ddl =~ m/(?<!FOREIGN) KEY .*?\((.*?)\)[^\)]*$/mg;
my $has_dupes = 0;
foreach my $i ( 0..$#indexes ) {
my $index = $indexes[$i];
foreach my $j ( 0..$#indexes ) {
next if $i == $j;
my $other_index = $indexes[$j];
if ( $index =~ m/^$other_index/ ) {
print "Table $table has possible duplicate indexes:\n",
"\t$index\n\t$other_index\n";
$has_dupes = 1;
}
}
}
if ( $has_dupes ) {
print "Here is the CREATE TABLE statement:\n$ddl\n\n";
}
}
I literally wrote that in five minutes, so it may not catch everything, but it caught the duplicate I defined above:
Table t_privilege has possible duplicate indexes:
`c_role`,`c_who`,`c_action`,`c_type`,`c_related_table`,`c_related_uid`
`c_role`,`c_who`
Here is the CREATE TABLE statement:
CREATE TABLE `t_privilege` (
`c_role` varchar(30) NOT NULL default 'other',
`c_who` int(11) NOT NULL default '0',
`c_action` varchar(100) NOT NULL,
`c_type` varchar(30) NOT NULL default '',
`c_related_table` varchar(100) NOT NULL default '',
`c_related_uid` int(11) NOT NULL default '0',
PRIMARY KEY (`c_role`,`c_who`,`c_action`,`c_type`,`c_related_table`,`c_related_uid`),
KEY `c_role` (`c_role`,`c_who`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
As I said in my comments on Peter’s blog, I don’t really need to have something generate statements that can correct the problem for me, or anything like that. It’s nice, but it’s not essential. First of all, I’d never just trust a tool to go “fix” my tables for me. I’d want it to tell me where it found potential problems. Then I’d go inspect and alter the table by myself if I want to.
With my program, I don’t really have to “go inspect the table,” since it’s kind enough to print out the SHOW CREATE statement for me :-) Its output has everything I need to make a good decision about the table, unless it’s someone else’s table which I don’t understand well.
I wrapped the above Perl code into a script you can run from the command-line with familiar command-line arguments (plus it reads from your .my.cnf file to get defaults). You can download it and have fun with it. Execute perldoc duplicate-index-checker for all the gory details, or just use the --help command-line argument. Let me know if you want me to tweak it — I’m happy to. If you find a scenario it doesn’t work for, please put the SHOW CREATE statement in your comment.
One thing I also want to make it do, but it’s past my bedtime so I won’t do it tonight, is report duplicate foreign keys. I sometimes find this (actually I found a lot of them at my current employer). Maybe later this week.
For those who want more features, or don’t like Perl, check out a nice (and far more mature) Java implementation of a similar tool: MySQL Index Analyzer.



Thank you.
Great! You could also add it to forge.mysql.com (as well as innotop actually)
Peter Zaitsev
29 Aug 06 at 10:52 am
Yeah, I’m adding it to forge, but it doesn’t feel right to add it to innotop, because innotop is about monitoring, not ’static analysis’ or whatever you’d call this.
Xaprb
7 Sep 06 at 10:14 pm
Xaprb,
Great work man. Keep it up ;)
Frank
Frank Mash
8 Sep 06 at 10:48 am
Nice tool, thanks.
A little touch: it makes sense to process different types of indexes, full-text indexes in particular, separately.
For example, a complex full-text index cannot replace a simple index:
Script output:
Table test.user has possible duplicate indexes: `last_name`,`first_name',`email` `last_name`Dmitri Mikhailov
8 Sep 06 at 2:56 pm
Hi Xaprb/Baron,
Great tool. I have a question though. I was wondering why the project appears as “Duplicate index/foreign key finder” on MySQLForge.
I really know next to nothing about perl so I might miss an important point by not understanding the code, but it seems to me that if this script really compares the strings from the
SHOW CREATE TABLEoutput like described above, it can in no way spot all duplicate or redundant foreign key definitions.It depends on what you call a duplicate foreign key of course. I mean, suppose the (innoDB) parent table would have an index on (col1,col2) then I can create two foreign key constraints in the child table, one that references just (col1), and one that references (col1,col2). I suppose the script would catch these cases. However, I would not call those ‘duplicate’ foreign keys. I would prefer ’spurious’ or ‘redundant’. Take a look at this example:
(You can get this situation only if you create corresponding indexes (name,seq) and (seq,name) on the parent table as well of course. I used a primary key for (name,seq) and a unique constraint for (seq,name) to get this effect.)
Now that’s what I would call duplicate foreign keys – there really is no functional difference between these foreign key definitions (but there is for the aforementioned cases): they impose identical restrictions on the allowed input in the (name,seq) columns.
Of course, the involved indexes are not equivalent, and arguably not duplicates (because they potentially lead to different query execution plans).
Roland Bouman
17 Sep 06 at 11:27 am
Hi Dmitri, sorry I didn’t respond promptly. I lost track of which comments I needed to respond to. You have a good point about FULLTEXT. I have an idea about how to exclude this pretty easily. Of course, as I said above, even if it gives some false positives you can still notice them. Indexes are so tricky that I don’t think a computer could be smart enough to replace a good DBA, so it suits me OK to have some false positives. Better that than false negatives.
Hi Roland, that’s a very good point. I should clarify something. The script compares prefixes on indexes, but for foreign keys it compares the entire definition (excluding the name) so it won’t think FKs to (col1) and (col1, col2) are duplicates of each other. This is as it should be, as you point out.
See, my Advanced Patented Algorithm ™ for detecting foreign keys is different from indexes! Tricky! ;-)
But your second point is true, too. A FK on multiple columns is functionally the same regardless of column order. String comparison can’t catch those. My first thought was “well, I’m going to have to parse out column names and do a bunch of fancy stuff…” but there’s an easier way that occurs to me now.
I’ll make these improvements and write another article about them in a bit.
Xaprb
17 Sep 06 at 12:25 pm
Roland has written a pure-SQL duplicate index finder and posted it on MySQL Forge. Nice job Roland!
Xaprb
26 Sep 06 at 8:31 am
so
its clear that in a table we have multiple indexes on same column
suyog
3 May 07 at 3:12 am
what is loss of having
multiple indexes on same column?
suyog kale
3 May 07 at 3:13 am
Indexes require space, time and other work to maintain. They slow down writes. You can read more by following the link to Peter Zaitsev’s article at the top of the page.
Xaprb
3 May 07 at 9:16 am
A simple way to find out duplicate indexes…
SELECT * FROM information_schema.statistics WHERE table_schema = ‘?’ GROUP BY table_schema, table_name, column_name HAVING count(column_name) > 1;
In the above query bind the ‘database name’ parameter.
-Bog
Vaibogam S
17 Mar 08 at 10:56 am