Xaprb

Stay curious!

Brian Aker: 20GB doesn’t fit on a single server

with 20 comments

Brian got interviewed by O’Relly recently, and part of it quoted him as saying this:

When everything doesn’t fit onto a computer, you have to be able to migrate data to multiple nodes. You need some sort of scaling solution there… MapReduce works as a solution when your queries are operating over a lot of data; Google sizes of data. Few companies have Google-sized datasets though. The average sites you see, they’re 10-20 gigs of data.

Users shouldn’t need to put that data onto multiple machines anyway. In fact, I don’t think we need a multi-machine solution for the common case at all. We need software that can scale up with today’s hardware. 37signals likes to run boxes with half a terabyte of RAM. Are we there yet with MySQL and InnoDB? No. Postgres? No. Anything open-source? Not that I know of. We’ve got database software that can only do a fraction of what it should be able to on that size of server.

I think we have to be clear about the use case for a solution that partitions data across multiple machines. It isn’t 20GB of data, and in my opinion it shouldn’t even be half a terabyte. I think that in the ideal world, we should be thinking about that for terabytes and larger — and in a few years, single-server datasets should be even larger.

I say should because today’s database software obviously has a lot of catching up to do.

Written by Xaprb

April 10th, 2010 at 10:07 am

Posted in Commentary, SQL

Tagged with

20 Responses to 'Brian Aker: 20GB doesn’t fit on a single server'

Subscribe to comments with RSS

  1. Hey Brian, great practical post that hopefully saves a few people from overcomplicating their lives.

    Out of curiosity, if not MySQL or PostgreSQL, what is 37signals using? Oracle? DB2? Or is your point that they’re using one of the opensource solutions and just getting less than optimal performance for that much hardware?

    Mike

    10 Apr 10 at 11:13 am

  2. We have seen MySQL Cluster installations with 256GB per data node, I don’t see any reason for it not to work with half a TB (I know, not InnoDB, but it is out there).

    LinuxJedi

    10 Apr 10 at 11:35 am

  3. 37signals use MySQL + InnoDB. What I meant is that they’re not getting full advantage of that hardware.

    NDB Cluster is a different beast :-) How big does it really scale per data node? Has anyone found a limit where it starts to fall down? In your opinion, does it really get 100% of the performance it could on bigger hardware? Multi-threading is relatively new in NDB, how good is it?

    Xaprb

    10 Apr 10 at 11:40 am

  4. To be honest we haven’t tried beyond 256MB per node for in memory tables (no idea with disk tables, I believe upwards of 256GB is in production for that too). It did struggle with allocation with large amounts of RAM, causing watchdog timeouts but we fixed that (allocation now happens in a separate thread and some other improvements).

    Benchmarks are a bit thin on the ground, if I had more time I would do some benchmarks to find out the answers too this. I can’t think of a reason why there would be any performance problems at least. Jonas has done some benchmarks recently and knows a lot about the memory handling, he may be better equipped to answer. He will be at the UC (as will I) so it may be good to chat with him there.

    Multi-threaded data nodes did have problems to start with in early 6.4/7.0 releases but they seem to be pretty stable now and we have large customers using them in production. It would be nice if the TC blocks were multi-threaded as well as the LQH blocks but it is a good first-step and in general gives a good performance improvement on a busy cluster.

    LinuxJedi

    10 Apr 10 at 12:08 pm

  5. Doh! I mean 256GB in that first sentence ;)

    LinuxJedi

    10 Apr 10 at 12:09 pm

  6. Hi!

    20gb easily fits on to a single machine, that was my point. You don’t multiple computers for that (and you obviously don’t need the headache of managing multiple machines for that).

    Cheers,
    -Brian

    Brian Aker

    10 Apr 10 at 12:48 pm

  7. What would be the use cases for multiple data storage nodes in a cluster?
    High availability?
    High write throughput?
    High read throughput?
    Silos for paranoid data security teams?
    Job security?

    MikeD

    10 Apr 10 at 2:27 pm

  8. Hi MikeD,

    With only one data node there is at the very least a risk of a small amount of data loss (up to 2 seconds of data) if that node fails for any reason. This is because the REDO data is committed to disk every 2 seconds using a GCP (Global CheckPoint).

    But as long as you are doing equality primary key lookups then yes, higher read/write throughput. Although ordered indexed scans will actually start to perform slightly worse as you add nodes.

    Also adding nodes will give HA. If a server bursts into flames you will have no downtime ;)

    LinuxJedi

    10 Apr 10 at 2:55 pm

  9. Hi Baron!

    Actually, I think Brian was making the point that 10-20 GB (a typical website size) doesn’t need to be spread across multiple nodes…

    -jay

    Jay Pipes

    10 Apr 10 at 8:46 pm

  10. I’ve for several years seen this as the primary goal for future MySQL development. So thanks for mentioning it!

    What are in your experience the most important places to improve? Multi-core scalability? I/O scalability? Multi-threaded slave? Buffer pool management? Do we need new storage engines with different performance characteristics? Or something else?

    Kristian Nielsen

    11 Apr 10 at 1:40 am

  11. Brian, you might want to ask O’Reilly to edit that article. It is confusing as hell. I think a reasonable person can read it as “10-20 gig datasets need to be distributed, and MapReduce makes no sense for distributing that amount of data.”

    Xaprb

    11 Apr 10 at 5:21 am

  12. So, I’m unreasonable now? ;P

    Jay Pipes

    11 Apr 10 at 12:07 pm

  13. Dude, I don’t see how to interpret that article in a way that agrees with either you or Brian. I’m sure we’re all reasonable but there is something about it that I have tunnel vision on.

    Xaprb

    11 Apr 10 at 1:46 pm

  14. Hehe, no worries, mate! We’ll have to have a cage-match showdown at the UC. BTW, you here yet? Stewart and I are hanging out at the Hilton lounge…

    Jay Pipes

    11 Apr 10 at 2:09 pm

  15. You and me against Monty and Josh Berkus, or however that goes. I’ll be there at 4 if I get on this flight.

    Xaprb

    11 Apr 10 at 2:32 pm

  16. see you soon :)

    /me goes off to hit the weights for the cage match.

    Jay Pipes

    11 Apr 10 at 2:35 pm

  17. PS, note: I know Brian is smart, and knows that 20gb fits onto a computer, which is why I said the interview “quoted him as saying” instead of directly attributing it to him :-)

    Xaprb

    11 Apr 10 at 2:36 pm

  18. A few years ago, I came up with an idea here at work, and led a team of 3 (myself included) to develop a distributed processing system in SQL Server. From design to implementation was 9 months. It’s currently one of our most important systems. We get about 6 million new records per day, and run about 300 SQL statements on each of them on average. I looked into existing solutions like Map Reduce, and found that for what most database systems need, they aren’t well suited. I’ll qualify that by stating that they weren’t well suited based on my understanding of them.

    Our processing requirements required doing lookups on many other tables, both small and large. We weren’t lucky enough to do simple aggregates (this word is on that webpage 9 times). A lot of the lookups are slightly complex (get the newest lookup record that was in place on this date at this time, etc.).

    I would agree that 20G of data isn’t nearly enough to require distributing it, unless you are expecting explosive growth in the near future. The worst time to switch do a larger/better solution is when you “need” do. I’d always prefer doing that ahead of the data flood when possible.

    Bill

    12 Apr 10 at 10:12 am

  19. I agree with Barron. I had to re-read that article a few times. Its pretty clear to me that it implies Brian doesn’t think it can handle 20 GB. I was left to conclude that the author misquoted/misrepresented what he was saying, or there is an interesting use case ( maybe heavy write /update use case) where even 20 GB doesn’t work on a single box.

    William

    12 Apr 10 at 1:12 pm

  20. Just a counter point, we have servers running postgres on boxes with 32 cores and 512GB of ram. From our experience, it scales pretty well on that size hardware; by far our largest problems are (still) disk related. Now, if your argument is that you could scale so much better on Oracle on that kind of hardware, I’m not going to argue that point, but Postgres on that kind of hardware would do the job for 99% of the people out there.

    Robert Treat

    16 Apr 10 at 12:40 pm

Leave a Reply