Making changes to many tables at once
As an alternative to another recent blog post that answered the question “how can I truncate all the tables in my database,” I thought I’d show another way to do it, which does not use the INFORMATION_SCHEMA.
$ wget http://www.maatkit.org/get/mk-find
$ perl mk-find --exec 'TRUNCATE TABLE %D.%N'
The other example is how to alter MyISAM tables to be InnoDB. That one’s easy, too. Let’s alter all MyISAM tables in the ‘test’ database:
$ wget http://www.maatkit.org/get/mk-find
$ perl mk-find test --engine MyISAM --exec 'ALTER TABLE %D.%N ENGINE=InnoDB'
If you want to print out the commands instead of executing them, you can just use –printf instead of –exec.
Why would you do it this way instead of through the INFORMATION_SCHEMA database? I don’t think this can be said too often: querying the INFORMATION_SCHEMA database on MySQL can completely lock a busy server for a long time. It can even crash it. It is very dangerous. So whenever I mention it, I mention the dangers of using it. I use it too sometimes, but only when I know the server I’m working on.



Another way to do it: find the .frm files in the directory, cut off the .frm extension, and call mysql -e from commandline….all with a simple shell script.
Sheeri K. Cabral
29 Oct 09 at 7:27 pm
BTW, I didn’t see a comment from you on Trent’s post, so I put one there about
But why would someone be truncating (or doing any action) all the tables on a *very busy* server? The caveat is good, but you didn’t put it where folks seeing Trent’s post would see it…please make sure to do that.
Along those lines, is there any open bug in Ma’atkit changing mk-table-checksum not to use the INFORMATION_SCHEMA to find all the tables?
Because that took forever on one of our client’s servers, and we had to write a wrapper script and call mk-table-checksum on one database at a time because of it.
I know you feel strongly about being able to checksum data, so I’m sure this is in the works, but I wanted to check, since I didn’t recall seeing a bug about it, but I haven’t looked recently (and I could be wrong, there could be a bug report already about that).
Sheeri K. Cabral
29 Oct 09 at 7:39 pm
Hi Sheeri,
AFAIK mk-table-checksum doesn’t touch INFORMATION_SCHEMA, can you elaborate? It does not touch it with any queries etc, and all the code that iterates over databases excludes it (and lost+found too). So if there’s a way it’s messing with it, we must stop that.
Xaprb
29 Oct 09 at 8:31 pm
Hi!
I have a lot (aprox. 1500) databases on my server. From time to time I have to alter one specific table from all databases (add coolumns, for example).
Currently we achieve this by running a simple script to print the “Alter table” command for all databases and we run the output of this script when the server is not too busy.
My question is: if I use this tool (mk-find) to do this job, does it lock all the tables while it runs?
Celso Endo
30 Oct 09 at 10:50 am
Baron — it uses SHOW commands, which can use the same code path as INFORMATION_SCHEMA.
We had the actual experience that trying to run mk-table-checksum (using chunking) on a busy server with thousands of tables was working for over an hour just on getting database and table names.
SHOW DATABASES
and
SHOW FULL TABLES FROM db
are both used….
Sheeri K. Cabral
30 Oct 09 at 11:12 am
Celso — ALTER TABLE is going to lock each table as it runs, there’s no getting around that.
Sheeri K. Cabral
30 Oct 09 at 11:15 am
Sheeri, SHOW DATABASES and SHOW TABLES shouldn’t lock the server the way a query to INFORMATION_SCHEMA.TABLES does, afaik. However, what the tool does is pretty brain dead from the user point of view: it collects all the information it needs before it starts to do anything. It should gather one table, work on it, get the next, etc. This is an open issue that we’ve started work on. [I believe] it is simply a user experience problem, not a server load problem. However I am wholly willing to be wrong — unfortunately, if I am wrong, I don’t see a thing that can be done about it.
(In some cases it’ll also use SHOW TABLE STATUS a table at a time, which I’d also like to avoid where possible. Some of the features require it, though. INFORMATION_SCHEMA.TABLES basically does SHOW TABLE STATUS for all tables in all databases, apparently with a frickin’ global mutex to boot. That is the real killer.)
Xaprb
30 Oct 09 at 1:43 pm
However, what the tool does is pretty brain dead from the user point of view: it collects all the information it needs before it starts to do anything. It should gather one table, work on it, get the next, etc.
Which is it — does it collect all the information first, then start work, or does it get the information for one table, then works on it, then gets the information for another?
It’s still a problem that it takes a long time on a busy server.
it is simply a user experience problem, not a server load problem.
Um, what does that mean? Yes, I experienced that mk-table-checksum took a very long time going through each database in SHOW DATABASES and running SHOW TABLES.
That’s why I mentioned the solution of using “ls” in data directories — you can easily do
ls $DATADIR/*.frm
and find all the tables (the same way MySQL does!). It doesn’t replace every single use case for INFORMATION_SCHEMA, and you have to have the right filesystem access, but doing an “ls” takes even less time than mk-table-checksum’s way of using SHOW DATABASES and SHOW TABLES.
I see that’s how mk-find works too, so that would take a long time on servers with thousands of tables, too. :(
Sheeri K. Cabral
30 Oct 09 at 1:52 pm
Sorry for being unclear. It currently collects everything and calculates chunks, etc etc — and then starts. It should do a table at a time.
Did you see SHOW TABLES itself running very long? I know I’ve seen that running long too, but have never seen it blocking anything else that’s going on. In directories with tens of thousands of files, even “ls” will run slowly depending on the filesystem.
Running a filesystem command is a non-starter, there are way too many scenarios where that is not possible.
Xaprb
30 Oct 09 at 2:41 pm
I’ve elaborated at http://www.pythian.com/news/8633/how-to-tell-when-using-information_schema-might-crash-your-database/.
Baron — you say that SHOW TABLES shouldn’t lock the way an I_S query does, and hopefully this gives more information — underlying, SHOW TABLES looks at the data and frm files, so an optimal I_S query won’t look at any more data than a SHOW TABLES query does.
Sheeri Cabral
16 Feb 10 at 5:58 pm
AFAIK, SHOW TABLES can take a long time for InnoDB tables when it samples statistics and I assume the need to sample depends on whether a handler instance is created for a table or can be found in the cache. There was a bug a few years back where ha_innobase::info did much more work than needed in one case, so maybe it is less of a problem.
And maybe I can be less ambiguous about this in the future.
Mark Callaghan
16 Feb 10 at 7:27 pm
I just read http://dev.mysql.com/doc/refman/5.1/en/information-schema-optimization.html. This will be a huge source of problems. I am disappointed that MySQL would publish a feature like this that requires me to know so much trivia to avoid performance problems.
Mark Callaghan
16 Feb 10 at 11:46 pm
Mark – I can’t disagree that it’s annoying to have to learn more. The alternative is to live in fear of INFORMATION_SCHEMA and not use it because it might crash the server. If that’s your choice, you can do what Baron does and avoid I_S altogether.
Also it is unclear to me whether what was added in 5.1.21 is just the “extra” values so we can see what is going on, or if the possibility of granularity itself was added. I.e. It may be that the ability to skip opening any files was new in 5.1.21, in which case I am definitely glad to have that new feature.
It’s not a big deal to me because instead of saying OMG AVOID I_S!!!! the rules are the same as with any query: don’t SELECY what you don’t need, and EXPLAIN your queries so you don’t get yourself into unexpected trouble.
Sheeri
17 Feb 10 at 2:26 am
SHOW TABLES is not a huge source of trouble in most cases, because it basically lists the directory and trims off .frm and shows the result. That is slow in very few cases. What’s incredibly slow on an increasing number of servers these days is SHOW TABLE STATUS, which lists the directory, iterates the .frm files, creates a handler for each table, and calls ::info() on each handler. And that is also what INFORMATION_SCHEMA.TABLES does — not the same thing as SHOW TABLES.
Xaprb
18 Feb 10 at 10:33 pm
At what number of tables does this become a problem? 10,000? 100,000? 1,000,000?
Mark Callaghan
18 Feb 10 at 10:36 pm
I just read the link Sheeri listed. I’m dismayed, too. I wish the data dictionary were stored in InnoDB. I know, Drizzle.
Xaprb
18 Feb 10 at 10:39 pm
In my opinion SHOW TABLES is a problem when the filesystem starts to have a problem, which is fs-dependent but I don’t think 10k is a huge problem. SHOW TABLE STATUS depends both on the number of tables and the size of data, in case InnoDB is used and ::info() is called and statistics are re-sampled (which in my experience they usually seem to be, but I haven’t tried to prove that). Most of what I’ve said on this topic is centered around InnoDB. I noticed that the manual page about I_S optimizations was not overly InnoDB-conscious.
Xaprb
18 Feb 10 at 10:47 pm
I just noticed queries like this in the slow query log. I use 5.0, so the IS query optimization chapter don’t apply. This server has ~10,000 tables, so I assume this query requires ~10,000 open table handler instances.
# Query_time: 6 Lock_time: 0 Rows_sent: 313 Rows_examined: 313
use foobar;
select TABLE_SCHEMA,TABLE_NAME,AVG_ROW_LENGTH,DATA_LENGTH,
TABLE_ROWS,INDEX_LENGTH
from information_schema.tables
where table_schema=database();
Mark Callaghan
21 Feb 10 at 1:25 pm