Xaprb

Stay curious!

An opportunity to participate in MySQL research

with 12 comments

I’m researching algorithms for automatic fault detection in MySQL (see my previous post for context). I need real-world data samples to test the algorithm. Can you help by sending me a bit of data from your production server?

The end goal is an open-source tool that will be a standard part of a typical MySQL installation. The problem I’m trying to solve for all MySQL users is this: something went wrong, what was it? Most of the time there’s no way to answer that; you have to set up a set of tools and hope you capture enough information to diagnose the problem next time. We need a tool that just runs all the time even when you don’t think anything is going to go wrong.

You can help build this tool. I need samples from a wide variety of healthy and sick servers, both heavily and lightly loaded. I need samples that are between a few hours and a week or so long. Here is a script that will gather what I need:

$ mysqladmin ext -i1 | awk '
	/Queries/{q=$4-qp;qp=$4}
	/Threads_connected/{tc=$4}
	/Threads_running/{print q, tc, $4}'

The output should look something like this:

2147483647   136     7
  798   136     7
  767   134     9
  828   134     7
  683   134     7
  784   135     7
  614   134     7
  108   134    24
  187   134    31
  179   134    28
 1179   134     7
 1151   134     7
 1240   135     7
 1000   135     7

Please save this output to a file, and contact me at moc.anocrep@norab (reversed) if you would like to offer a dataset for us to test on. If you need any help setting up the data collection, you can use the same email. I’d also appreciate if you’d help spread the word about this via Twitter or other means. Thanks very much!

Further Reading:

Written by Xaprb

December 6th, 2011 at 6:44 pm

Posted in SQL

12 Responses to 'An opportunity to participate in MySQL research'

Subscribe to comments with RSS

  1. Yessir!
    Logging started.

    Roy

    6 Dec 11 at 7:11 pm

  2. Sir how do I save the output in a file

    >
    >>
    | tee log
    &> log

    No one works…

    Roy

    6 Dec 11 at 7:30 pm

  3. One solution for checking if everything is on track is to use the concept of a Watchdog. Have a look at Jack Ganssle (2004): Great Watchdogs (especially the section WDTs for Multitasking is worth reading).

    Jens

    7 Dec 11 at 3:35 am

  4. Count me in

    Mark Callaghan

    7 Dec 11 at 8:17 pm

  5. Would it also help you to get data from a server that is not used too much but has queries from time to time? Such as the server of a blog with around 5000 called pages a day?

    This would look more or less like this:

    1 4 1
    1 4 1
    1 4 1
    29 5 1
    56 5 1
    41 4 1
    1 4 1
    1 4 1
    1 4 1
    1 4 1
    1 4 1
    1 4 1
    17 4 1
    1 4 1
    1 4 1
    1 4 1
    1 4 1
    1 4 1
    1 4 1
    36 7 1
    35 4 1
    1 4 1
    11 4 1
    1 4 1
    24 4 1
    60 4 1
    1 4 1
    1 4 1
    1 4 1
    1 4 1

    That might be another extreme for your algorithm ;) Interested?

    Cheers,
    Uli

    Uli

    9 Dec 11 at 5:12 am

  6. Uli, yes that would be good, thanks.

    Xaprb

    9 Dec 11 at 4:10 pm

  7. No Problem, will log this.

    Uli

    9 Dec 11 at 4:32 pm

  8. OK to log, but how ?
    I use only ssh to log, and i can’t left the shell open…

    Roy

    9 Dec 11 at 6:18 pm

  9. See http://www.xaprb.com/blog/2008/08/01/how-to-leave-a-program-running-after-you-log-out/

    I’d use screen, myself. It is one of the most important tools ever invented for Unix. I can’t live without it.

    Xaprb

    9 Dec 11 at 6:42 pm

  10. Thank YOU!
    Now i’m logging data (from filepremier.com)
    here a sample

    480 5 2
    535 5 2
    451 3 2
    362 3 3
    573 3 2
    524 4 2
    506 3 3
    405 4 2
    465 5 2
    549 3 2
    489 3 2
    565 5 3
    719 2 2
    465 3 2
    485 5 2
    380 2 2
    378 3 2
    429 9 2
    442 2 2
    438 2 2
    639 5 2
    456 2 2
    455 4 3
    518 5 2
    552 3 2
    359 3 2
    459 6 2
    689 7 3
    557 5 2
    464 7 2
    501 9 2
    569 11 3
    496 13 2
    538 10 2
    457 9 2
    448 12 3
    709 5 2
    394 5 2

    Roy

    9 Dec 11 at 7:06 pm

  11. For those wondering why nothing is logged with ‘>’ or ‘| tee’ – awk buffers its output and will print it only after mysqladmin ends.
    You could use mysqladmin’s ‘–count’ switch and wait till it ends or, if you’re hit by this bug[1] (like me) you may want to force printing awk’s output with fflush() or system()[2]:

    mysqladmin ext -i1 | awk ‘/Queries/{q=$4-qp;qp=$4} /Threads_connected/{tc=$4} /Threads_running/{print q, tc, $4; fflush()}’ | tee output

    [1] http://bugs.mysql.com/bug.php?id=58221
    [2] http://www.gnu.org/s/gawk/manual/html_node/I_002fO-Functions.html

    Karol Kwiatkowski

    12 Dec 11 at 4:19 am

  12. Thank you all for participating! I got a great response, with some large hosting providers sending me literally gigabytes of data. The diversity of data I’ve gotten will really help make this tool much better. When Daniel is back from vacation we will start developing the algorithm.

    Xaprb

    5 Jan 12 at 4:52 pm

Leave a Reply