Tag Archive for 'sphinx'

Sphinx 0.9.8 is released!

The Sphinx project just released version 0.9.8, with many enhancements since the previous release. There’s never been a better time to try it out. It’s really cool technology.

What is Sphinx? Glad you asked. It’s fast, efficient, scalable, relevant full-text searching and a heck of a lot more. In fact, Sphinx complements MySQL for a lot of non-search queries that MySQL frankly isn’t very good at, including WHERE clauses on low-selectivity columns, ORDER BY with a LIMIT and OFFSET, and GROUP BY. A lot of you are probably running fairly simple queries with these constructs and getting really bad performance in MySQL. I see it a lot when I’m working with clients, and there’s often not much room for optimization. Sphinx can execute a subset of such queries very efficiently, due to its smart I/O algorithms and the way it uses memory. By “subset” I mean you don’t get the full complexity of SQL, but you get enough functionality for lots of the poorly-performing queries I see in the wild. It’s a 95% solution.
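To make those three query shapes concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in SQL engine so the snippet runs anywhere. The `posts` table, its columns, and the data are hypothetical, and nothing here shows Sphinx's own API or says anything about performance. It only illustrates what "low-selectivity WHERE", "ORDER BY with LIMIT and OFFSET", and "GROUP BY" look like in practice.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY, status TEXT, author_id INTEGER, created INTEGER)"
)
rows = [
    (i, "published" if i % 10 else "draft", i % 7, 1000 + i)
    for i in range(1, 201)
]
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)

# 1. WHERE on a low-selectivity column: most rows match, so an index
#    on `status` does little to narrow the search.
published = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE status = 'published'"
).fetchone()[0]

# 2. ORDER BY with LIMIT and OFFSET: the engine has to produce and skip
#    the first 100 rows before returning the 5 that were asked for.
page = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM posts ORDER BY created DESC LIMIT 5 OFFSET 100"
    )
]

# 3. GROUP BY: an aggregate over every row in the table.
per_author = dict(
    conn.execute("SELECT author_id, COUNT(*) FROM posts GROUP BY author_id")
)

print(published, page, per_author)
```

Each of these is a simple statement, yet each one touches far more rows than it returns, which is the kind of work described above.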

Is Sphinx for you? Good question. You can find answers in Appendix C of High Performance MySQL. And yes, that is why I wrote this blog post: to put in a plug for the book. *grin* But before I go, let me put in another plug for Sphinx: go vote for it on SourceForge! If it’s voted one of the Community Choice projects of the year, that will be fantastic.


MySQL Conference and Expo 2008, Day One

Today is the first day at the conference (aside from the tutorials, which were yesterday). Here’s what I went to:

New Subquery Optimizations in 6.0

By Sergey Petrunia. This was a similar session to one I went to last year. MySQL has a few cases where subqueries are badly optimized, and this session went into the details of how this is being addressed in MySQL 6.0. There are several new optimization techniques for all types of subqueries, such as inside-out subqueries, materialization, and converting to joins. The optimizations apply to scalar subqueries and subqueries in the FROM clause. Performance results are very good, depending on which data you choose to illustrate. The overall point is that the worst-case subquery nastiness should be resolved. I’m speaking of WHERE NOT IN(SELECT…) and friends. It remains to be seen how this shakes out as 6.0 matures, and what edge cases will pop up.
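For readers who have not run into the NOT IN problem, the usual manual workaround is to rewrite the subquery as an anti-join by hand. The sketch below shows that the two forms return the same rows. It is not a description of how the 6.0 optimizer works internally. It uses Python's sqlite3 module as a stand-in engine, and the `customers` and `orders` tables are hypothetical. The equivalence relies on `customer_id` being NOT NULL; with NULLs in the subquery, NOT IN behaves differently.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
    """
)

# Subquery form: customers who have never placed an order.
not_in = conn.execute(
    "SELECT name FROM customers"
    " WHERE id NOT IN (SELECT customer_id FROM orders)"
    " ORDER BY name"
).fetchall()

# Hand-written anti-join: keep only customers with no matching order row.
anti_join = conn.execute(
    "SELECT c.name FROM customers AS c"
    " LEFT JOIN orders AS o ON o.customer_id = c.id"
    " WHERE o.customer_id IS NULL"
    " ORDER BY c.name"
).fetchall()

print(not_in, anti_join)
```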

The Lost Art Of the Self Join

This was just great. Among many other things, Beat Vontobel showed how a Su Doku can be solved entirely with declarative queries: a very large self-join query against a table of digits and a table of the board’s initial state. I had been promoting this session because last year’s was so very good. I can’t wait to see what he comes up with for next year. Can he find another creative idea? Time will tell.

He wasn’t able to solve a 9×9 puzzle with MySQL because of the limitation on the number of joins, but PostgreSQL had no trouble doing it.
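To give a feel for the idea, here is a small sketch of a puzzle solved by one big self-join. It is not Beat's query. It is a simplified 4×4 version, assuming one alias of a `digits` table per cell, with the givens written as plain equality conditions rather than kept in a separate board table. It runs on Python's sqlite3 module, and all names in it are made up for the example. A 9×9 board built this way needs 81 aliases, which is more than MySQL's limit of 61 tables in a single join.

```python
import sqlite3
from itertools import combinations

N, BOX = 4, 2  # 4x4 board made of 2x2 boxes
puzzle = [     # 0 marks an empty cell
    [1, 0, 0, 4],
    [0, 4, 1, 0],
    [2, 0, 0, 3],
    [0, 3, 2, 0],
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE digits (v INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO digits VALUES (?)", [(d,) for d in range(1, N + 1)])

cells = [(r, c) for r in range(N) for c in range(N)]


def alias(r, c):
    """Name of the digits alias that stands for cell (r, c)."""
    return f"c{r}{c}"


def same_unit(a, b):
    """True if two cells share a row, a column, or a box."""
    (r1, c1), (r2, c2) = a, b
    return r1 == r2 or c1 == c2 or (r1 // BOX, c1 // BOX) == (r2 // BOX, c2 // BOX)


# Cells in the same unit must hold different digits.
conditions = [
    f"{alias(*a)}.v <> {alias(*b)}.v"
    for a, b in combinations(cells, 2)
    if same_unit(a, b)
]
# Given cells are pinned to their initial value.
conditions += [f"{alias(r, c)}.v = {puzzle[r][c]}" for r, c in cells if puzzle[r][c]]

sql = (
    "SELECT " + ", ".join(f"{alias(*cell)}.v" for cell in cells)
    + " FROM " + ", ".join(f"digits AS {alias(*cell)}" for cell in cells)
    + " WHERE " + " AND ".join(conditions)
)

# Each result row is one complete solution, flattened row by row.
solutions = conn.execute(sql).fetchall()
grids = [[list(row[r * N:(r + 1) * N]) for r in range(N)] for row in solutions]
for grid in grids:
    for line in grid:
        print(line)
```

The query contains no loops and no procedural logic. The rules of the puzzle are just join conditions, and the database does the search.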

EXPLAIN Demystified

This was my session, of course. (Slides will be on the O’Reilly conference site, if they aren’t already.) It went great, I thought. The room was full and people were standing in the back of the room and in the doorway. The questions came fast and furious, and they were all really good questions. I think we ended up exploring a lot of MySQL’s query execution method, with its strengths and weaknesses, by the time we were through. And I gave away all the remaining Maatkit t-shirts. Hopefully the people who took them will wear them tomorrow and the conference will be a sea of deep, rich red shirts.

Someone did an audio recording of the session, but I don’t recall who it was.

Investigating InnoDB Scalability Limits

This session was given by Peter Zaitsev (disclosure: I now work for Percona, the company he co-founded). Peter and Vadim Tkachenko spent a lot of time over the last weeks and months running a dizzying array of benchmarks on MySQL 5.0.22, 5.0.51, and 5.1.24 (if I recall the versions correctly). They were able to show InnoDB’s scaling patterns for a number of different micro-benchmarks on a variety of configurations. If you didn’t attend, please look up the slides if you care about InnoDB performance. A lot of work went into the benchmarks — a lot of work. The slides should be on the conference website or on our blog, http://www.mysqlperformanceblog.com/.

Replication Tricks and Tips

Lars Thalmann and Mats Kindahl gave this session. At a high level, I’d say it was a run-down of all the different ways you can use MySQL replication. Replication is really a flexible tool, and they covered a large array of the most important ways you can use it to achieve different purposes. Many of the techniques they mentioned are implemented by various tools in Maatkit. A couple of the others are implemented in MySQL Master Master Manager and MySQL Semi Multi-Master tools. Don’t re-code these! You can save weeks of work and get quality code by using the pre-built tools. (I built Maatkit, so I know exactly how tricky it is to get some of these things right.)

BoF Sessions

I dropped in on a few BoF sessions, including the Sphinx one and the PBXT/Blob Streaming one. (Keep an eye on the PrimeBase folks — they are up to great things.) Ronald Bradford protected me from those who wanted to get me drunk. Hint: it’s really easy… I have to say, though, Monty’s black vodka was amazing.

Speaking of Blob Streaming, Paul McCullagh and I were talking earlier in the day about the project’s name, MyBS. This has been smirked about a few times. I think it’s a great name, because after all my initials are BS (I usually insert one of my four middle names in to alleviate this problem, but I digress). The conversation went like this:

Me: I like it. My initials are BS.

Paul: BS actually means British Standard, so it can’t be bad.

Me: Better than American Standard. That’s a toilet.

We also debated the merits of watching the original movie The Blob. It’s a classic. It must be good.


Progress on High Performance MySQL, Second Edition

It’s been a while since I said anything about the progress on the book. That doesn’t mean we are not still working on it, though.

As Peter wrote a while ago, he is basically wearing the hat of a very advanced technical reviewer at this point. We’ve finished writing all the chapters from his detailed outlines. He has worked through about half the chapters, and I’m continuing to spend my evenings and weekends and holidays (yes, nearly all my free time — just ask my wife!) writing some new material (an appendix on EXPLAIN, for example), finishing unfinished things marked with TODO in the text, and revising chapters after Peter reviews them. Vadim is working on benchmarks. For example, he just finished some benchmarks for something I profiled with SHOW STATUS. I thought that would be good enough to assert something about the performance. Sure enough, SHOW STATUS says it does less work, but Vadim’s benchmarks show it’s slower :-) This is why we check each other’s work!

The core chapters on MySQL performance — beginning with Benchmarking and Profiling, and continuing through Optimizing Server Settings — are the ones Andy Oram, our editor, thinks we should put the most effort into, and I agree. We will probably circle back and go through another review/edit cycle before we release them for technical review. Some of the other chapters, such as Replication, are already out for technical review.

Although all of the chapters and appendixes have theoretically been at “first draft” status for several weeks, there is still a lot of work to do. Depending on the chapter, it can take me a solid weekend to revise it after Peter reviews it. Each little thing anyone points out (does MySQL version X really do Y by default?) requires some research, testing, benchmarks, or even reading the source code.

Some miscellanea:

  • The production staff replied to my inquiry to the editor to say that yes, we will be able to have references that point to a specific page number. This was a big relief to me. It requires extra work, but in my opinion it makes the book so much more valuable as a reference work. To see why, look at the top of page 151 in the first edition, which just refers to chapters and sections by their titles: “See… the ‘Tools’ section…” Now try to find the “Tools” section. If it took you a while… well, the first time I did it, I missed it, and thought it might mean the Tools chapter. The second edition will say “The X section on page Y” or similar. (Okay, I’ll shut up about this now. Everyone has to have a pet peeve, eh?)
  • We are currently at 425 pages in OpenOffice.org Writer, which by my calculations puts us around 470 pages in print. As I said before, I think we’ll break 500 pages by the time we finish the rest of the missing material.
  • Andrew Aksyonoff has contributed an appendix on the Sphinx full-text search system. If you don’t know anything about it, check it out. It’s an amazing piece of software that does a lot more than just full-text search.

Well, I’ve run out of my allotted thirty minutes of blogging! Back to the salt mines! Just kidding… I’m actually off to the climbing gym soon to get my mind off it.
