Can TokuDB replace partitioning?

I’ve been considering using TokuDB for a large dataset, primarily because of its high compression. The data is append-only, never updated, rarely read, and purged after a configurable time. I use partitions to drop old data a day at a time. It’s much more efficient than deleting rows, and it lets me avoid indexing the data on the time dimension. Partitioning serves as a crude form of indexing, as well as helping purge old data.
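As a rough sketch of the purge step described above: the helper below picks out daily partitions that have aged past the retention window and builds a single `ALTER TABLE ... DROP PARTITION` statement for them. The `pYYYYMMDD` naming scheme, the `events` table name, and both function names are assumptions made for illustration, not details from the post.

```python
from datetime import date, timedelta


def expired_partitions(partition_names, today, retention_days):
    """Return names of daily partitions older than the retention window.

    Assumes a hypothetical naming scheme of pYYYYMMDD, one partition per day.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in sorted(partition_names):
        day = date(int(name[1:5]), int(name[5:7]), int(name[7:9]))
        if day < cutoff:
            expired.append(name)
    return expired


def drop_statement(table, partitions):
    """Build one ALTER TABLE that drops every expired partition, or None if there are none."""
    if not partitions:
        return None
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(partitions)}"


# Example: three daily partitions, keeping the two most recent days.
names = ["p20130901", "p20130902", "p20130903"]
stale = expired_partitions(names, today=date(2013, 9, 4), retention_days=2)
print(drop_statement("events", stale))
```

Dropping a partition is a metadata operation on a whole day of data, which is why it is so much cheaper than deleting the same rows one by one.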

» Continue Reading (about 300 words)

Free talk on MySQL and Go at Percona MySQL University DC

If you’re in the Washington, DC area on Sept 12th, be sure to attend Percona University. This is a free 1-day mini-conference to bring developers and system architects up to speed on the latest MySQL products, services and technologies. Some of the topics being covered include Continuent Tungsten; Percona XtraDB Cluster; MySQL Backups in the Real World; MariaDB 10.0; MySQL 5.6 and Percona Server 5.6; Apache Hadoop. I’ll be speaking about using MySQL with Go.

» Continue Reading (about 200 words)

Speaking at Percona University Sept 12th

I’ll be joining Percona for a free day of MySQL education and insight at their upcoming Percona University Washington DC event on September 12th. My topic is accessing MySQL from Google’s Go programming language. I’ve learned a lot about this over the past year or so, and hopefully I can help you get a quick-start. If you’re not familiar with Go, it’s the darling of the Hacker News crowd these days.

» Continue Reading (about 300 words)

Speaking at Strata NYC: Making Big Data Small

I’m off to my first Strata conference, and I’m speaking! I’ve always wanted to attend Strata. (OSCON too, but I haven’t yet made it there.) My session will be about ways to make big data small, in both the storage and processing dimensions, without losing much of the value. If you’re familiar with Bloom filters, they’re one example. Bloom filters let you answer the question, “Is value X a member of this data set?”
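For readers who haven’t met one, here is a minimal Bloom filter sketch. The bit-array size, the number of hashes, and the trick of salting SHA-256 to get several hash functions are all choices made for illustration, not anything from the talk. The property to notice is that a “no” answer is always correct, while a “yes” only means “probably.”

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


bf = BloomFilter()
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
```

The whole set is summarized in 128 bytes no matter how long the values are, which is the sense in which big data gets small: a little accuracy is traded for a lot of space.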

» Continue Reading (about 200 words)

Using encryption? You're suspicious

Yesterday more details on the NSA’s secret and illegal monitoring activities were revealed. (Yes, the NSA revealed some things themselves, but as far as I can tell, that was only a conciliatory effort and didn’t actually reveal more details – just more talk.) Remember my recent series of blog posts, where I claimed that privacy in today’s world is impossible without trustworthy hardware/software, privacy is impossible unless it’s default, and privacy is essentially unachievable because of the scope of the problem, and the way we’ve built our society and technologies?

» Continue Reading (about 400 words)

Email snooping is a small fraction of the story

I wrote previously about why privacy and security require open-source, inspectable hardware and software to run on, and software that makes encryption the default so everyone uses it. My example application was email, and I concluded that it’s currently impractical to think that we can block government snooping on a large scale even in the domain of email. Now, think what a small fraction of people’s Internet-connected activities we’re talking about: email.

» Continue Reading (about 700 words)

The Ultimate Notebook

If you’re like me, you spend so much time typing on a computer that a good notebook or journal is one of life’s finer pleasures. I’ve kept a diary of my personal life for close to 30 years now, and I have a shelf full of journals. I’ve found a great many that I enjoy writing in, and choosing a different one each time is part of the fun.

But thus far, my quest for a notepad has been unsatisfying. Many notepads have loved me, but I’m sorry to say their love has been unrequited. I’ve tried all the usual things: Moleskine, loose-leaf paper, binders, what have you. But I never found something that is practical, functional, a joy to write on, and a pleasure to look at and hold. I just can’t settle into a long-term relationship with my notebook, because I haven’t met The Right One yet.

» Continue Reading (about 4300 words)

Privacy is impossible unless it's the default

This is a follow-up to my last post, in which I asserted that without free software and hardware, privacy is impossible. Suppose we have trustworthy, free hardware and software. What else is needed to thwart efforts to monitor our everyday behavior on a massive scale? Let’s look only at one activity that’s currently being monitored: email. How can we make email less vulnerable to prying eyes? Technology to encrypt email between ordinary citizens (PGP, OpenPGP, and GnuPG) has existed for years, and in a form strong enough to frustrate any known attempts at decryption.

» Continue Reading (about 900 words)

Without free software and hardware, privacy is impossible

The recent revelations about the NSA’s wide-ranging surveillance of Americans and non-Americans alike have spurred a lot of outcry. Of course, some people are crying for legal solutions, but there’s absolutely no chance of any present or future elected official changing or stopping it (it’s already completely illegal and always has been, so more laws can do nothing but poke loopholes in existing laws forbidding surveillance). We’re on a road that leads to only one place: total, absolute government monitoring of everything we do – and thus, to some extent, control of everything we do.

» Continue Reading (about 600 words)

My recipe for more enjoyable presentations

Since I started making my presentations more beautiful, people have often asked me my secret. It’s not a secret, and it’s really quite simple to do. First, realize that it’s not about you. It’s about your audience. Now, get and read a few good books on presentations. Your presentations, and your presentation skills, need to be good. You can’t just make things beautiful to compensate for badness in other areas. This is something I’m always working on.

» Continue Reading (about 400 words)

Eliminating duplicate users in MySQL

This is hypothetical. What would happen if I did the following?

    alter table mysql.user add unique key(User);

I’m tossing this out there for people to think about because I’ve always thought that MySQL’s authentication model is a nuisance: “MySQL considers both your host name and user name in identifying you because there is no reason to assume that a given user name belongs to the same person on all hosts. For example, the user joe who connects from office.example.com need not be the same person as the user joe who connects from home.example.com.”

» Continue Reading (about 300 words)

Quantifying Abnormal Behavior in System Metrics

I’ve posted slides for my Velocity talk on VividCortex’s blog. The talk explained how we use exponentially weighted moving statistics to generate a meta-metric of abnormality for the time-series metrics measured from MySQL. That’s kind of a mouthful. Maybe you had to be there :-)
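To unpack the mouthful a little, here is a minimal sketch of exponentially weighted moving statistics used as an abnormality score. It is not the method from the slides: the class name, the `alpha` default, and the choice to score each point by its distance from the moving mean in moving standard deviations are assumptions for illustration.

```python
import math


class EwmaStats:
    """Exponentially weighted moving mean and variance of a metric."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight given to the newest observation
        self.mean = None
        self.var = 0.0

    def update(self, x):
        """Score x against the current moving statistics, then fold it in.

        The score is the distance from the moving mean, measured in moving
        standard deviations (0.0 while there is no variance yet).
        """
        if self.mean is None:
            self.mean = x
            return 0.0
        diff = x - self.mean
        std = math.sqrt(self.var)
        score = abs(diff) / std if std > 0 else 0.0
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return score


stats = EwmaStats(alpha=0.1)
series = [10, 10, 10, 10, 10, 11, 9, 10, 11, 9, 50]
scores = [stats.update(x) for x in series]
```

Because older observations decay away exponentially, the statistics follow slow drift in a metric while a sudden spike, like the final value in the example series, stands out with a large score.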

» Continue Reading (about 100 words)

DjangoCon 2013 call for papers open

Are you a Django user? There’s an upcoming Django conference in Chicago in a few months, and I know they’re looking for speakers with MySQL experience in particular. One suggestion the organizers have floated is a talk on MySQL: “I’m looking for someone to give at least one MySQL talk there. In particular, I would love a (friendly but vigorous) ‘Why you should use MySQL instead of PostgreSQL’ talk, as PostgreSQL tends to get a lot of love and attention at Django events, and MySQL not so much.”

» Continue Reading (about 200 words)

The moment I first held my newborn daughter in my arms

This is a personal post, not a technical one. We tell ourselves a lot of lies that are not okay. I want to out one of them. It is important to be real, to be true to oneself. This matters. The lie starts something like this: the moment I held my newborn child in my arms, I looked into her tiny face and felt an all-encompassing, pure love. I was breathless.

» Continue Reading (about 700 words)

A great talk on Go concurrency patterns

This 35-minute video from the recent Google I/O conference explains how to use Go’s concurrency primitives – goroutines, channels, and the select statement – to do things elegantly, correctly, and safely in a few lines of Go, which would otherwise turn your brain into a pretzel in most programming languages. My favorite thing about Go is that a good Go program looks self-evident and obvious, even when it may be doing things that would be insanely complex in another language.

» Continue Reading (about 300 words)

Agile project management tools

Wow, talk about an industry that’s overcrowded with look-alike me-too products. Online agile project management tools are a dime a dozen, which makes me think that they are probably all very similar and probably don’t solve most people’s needs. I’ve observed that when this is true, nearly-indistinguishable tools get reinvented, until the burden of evaluating the options is greater than the burden of just building yet another one, thus perpetuating the cycle.

» Continue Reading (about 200 words)

Why building a free service can be a disservice

Like many others, I don’t think that RSS is dead. It’s my favorite way to keep up with highly valuable content on the Web. So I’m in the market for a replacement for Google Reader, along with millions of others. As I’ve evaluated options, I’ve had to eliminate some of them because I’m not sure they’re serious about what they’re doing. This post is about my thought process and why I think entrepreneurs should challenge themselves to get serious, and signal that intent, by not building free services.

» Continue Reading (about 1500 words)

What's the lesson from daily deals sites?

I found myself in a, ahem, lively discussion with someone recently. It started when I said “there was always something wrong about the daily deals businesses (e.g. Groupon), but I’m sure they’ll teach us what’s really needed.” Turns out this person ran a local daily-deals site. Oops. My feeling is that anytime something doesn’t take root and grow into a lasting business, there’s a lesson to learn. Early social-networking sites weren’t quite a match for people’s needs.

» Continue Reading (about 1000 words)

The difference between concurrency and parallelism

This confuses lots of people, including most recently Todd Hoff of HighScalability fame, who wrote in last week’s summary post: “Have to say, this distinction has never made sense to me: Concurrency is not parallelism: concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of (possibly related) computations. Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.”

» Continue Reading (about 200 words)

What TokuDB might mean for MongoDB

Last week Tokutek announced that they’re open-sourcing their TokuDB storage engine for MySQL. If you’re not familiar with TokuDB, it’s an ACID-compliant storage engine with a high-performance index technology known as fractal tree indexing. Fractal trees have a number of nice characteristics, but perhaps the most interesting is that they deliver consistently high performance under varying conditions, such as when data grows much larger than memory or is updated frequently. B-tree indexes tend to get fragmented over time, and exhibit a performance cliff when data doesn’t fit in memory anymore.

» Continue Reading (about 1700 words)