An opportunity to participate in MySQL research
I’m researching algorithms for automatic fault detection in MySQL (see my previous post for context). I need real-world data samples to test the algorithm. Can you help by sending me a bit of data from your production server?
The end goal is an open-source tool that will be a standard part of a typical MySQL installation. The problem I’m trying to solve for all MySQL users is this: something went wrong, what was it? Most of the time there’s no way to answer that; you have to set up a set of tools and hope you capture enough information to diagnose the problem next time. We need a tool that just runs all the time even when you don’t think anything is going to go wrong.
You can help build this tool. I need samples from a wide variety of healthy and sick servers, both heavily and lightly loaded. I need samples that are between a few hours and a week or so long. Here is a script that will gather what I need:
$ mysqladmin ext -i1 | awk '
/Queries/{q=$4-qp;qp=$4}
/Threads_connected/{tc=$4}
/Threads_running/{print q, tc, $4}'
The output should look something like this:
2147483647 136 7
798 136 7
767 134 9
828 134 7
683 134 7
784 135 7
614 134 7
108 134 24
187 134 31
179 134 28
1179 134 7
1151 134 7
1240 135 7
1000 135 7
Please save this output to a file, and contact me at moc.anocrep@norab (reversed) if you would like to offer a dataset for us to test on. If you need any help setting up the data collection, you can use the same email. I’d also appreciate if you’d help spread the word about this via Twitter or other means. Thanks very much!
Further Reading:






Yessir!
Logging started.
Roy
6 Dec 11 at 7:11 pm
Sir how do I save the output in a file
>
>>
| tee log
&> log
No one works…
Roy
6 Dec 11 at 7:30 pm
One solution for checking if everything is on track is to use the concept of a Watchdog. Have a look at Jack Ganssle (2004): Great Watchdogs (especially the section WDTs for Multitasking is worth reading).
Jens
7 Dec 11 at 3:35 am
Count me in
Mark Callaghan
7 Dec 11 at 8:17 pm
Would it also help you to get data from a server that is not used too much but has queries from time to time? Such as the server of a blog with around 5000 called pages a day?
This would look more or less like this:
1 4 1
1 4 1
1 4 1
29 5 1
56 5 1
41 4 1
1 4 1
1 4 1
1 4 1
1 4 1
1 4 1
1 4 1
17 4 1
1 4 1
1 4 1
1 4 1
1 4 1
1 4 1
1 4 1
36 7 1
35 4 1
1 4 1
11 4 1
1 4 1
24 4 1
60 4 1
1 4 1
1 4 1
1 4 1
1 4 1
That might be another extreme for your algorithm ;) Interested?
Cheers,
Uli
Uli
9 Dec 11 at 5:12 am
Uli, yes that would be good, thanks.
Xaprb
9 Dec 11 at 4:10 pm
No Problem, will log this.
Uli
9 Dec 11 at 4:32 pm
OK to log, but how ?
I use only ssh to log, and i can’t left the shell open…
Roy
9 Dec 11 at 6:18 pm
See http://www.xaprb.com/blog/2008/08/01/how-to-leave-a-program-running-after-you-log-out/
I’d use screen, myself. It is one of the most important tools ever invented for Unix. I can’t live without it.
Xaprb
9 Dec 11 at 6:42 pm
Thank YOU!
Now i’m logging data (from filepremier.com)
here a sample
480 5 2
535 5 2
451 3 2
362 3 3
573 3 2
524 4 2
506 3 3
405 4 2
465 5 2
549 3 2
489 3 2
565 5 3
719 2 2
465 3 2
485 5 2
380 2 2
378 3 2
429 9 2
442 2 2
438 2 2
639 5 2
456 2 2
455 4 3
518 5 2
552 3 2
359 3 2
459 6 2
689 7 3
557 5 2
464 7 2
501 9 2
569 11 3
496 13 2
538 10 2
457 9 2
448 12 3
709 5 2
394 5 2
Roy
9 Dec 11 at 7:06 pm
For those wondering why nothing is logged with ‘>’ or ‘| tee’ – awk buffers its output and will print it only after mysqladmin ends.
You could use mysqladmin’s ‘–count’ switch and wait till it ends or, if you’re hit by this bug[1] (like me) you may want to force printing awk’s output with fflush() or system()[2]:
mysqladmin ext -i1 | awk ‘/Queries/{q=$4-qp;qp=$4} /Threads_connected/{tc=$4} /Threads_running/{print q, tc, $4; fflush()}’ | tee output
[1] http://bugs.mysql.com/bug.php?id=58221
[2] http://www.gnu.org/s/gawk/manual/html_node/I_002fO-Functions.html
Karol Kwiatkowski
12 Dec 11 at 4:19 am
Thank you all for participating! I got a great response, with some large hosting providers sending me literally gigabytes of data. The diversity of data I’ve gotten will really help make this tool much better. When Daniel is back from vacation we will start developing the algorithm.
Xaprb
5 Jan 12 at 4:52 pm