I’m off to my first Strata conference, and I’m speaking! I’ve always wanted to attend Strata. (OSCON too, but I haven’t yet made it there.)
My session will be about ways to make big data small, in both the storage and processing dimensions, without losing much of the value.
If you’re familiar with Bloom Filters, they’re a good example of the idea. Bloom Filters let you answer the question,
Is value X a member of this data set? Yes, or no?
by substituting the question,
Is value X a member of this data set? Probably yes, or definitely no?
You lose a small and quantifiable amount of precision in the “yes,” and you gain massive savings in storage and processing cost. Bloom Filters are typically used as a pre-filtering step when you ultimately need a definite answer: if the filter says No, you can skip the expensive search through the full data set, and only a “probably yes” sends you off to look for your data.
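To make the “probably yes, or definitely no” trade concrete, here is a minimal Python sketch of a Bloom Filter used as a pre-filter. It's an illustration under assumptions, not the implementation from my talk: the class and function names (BloomFilter, might_contain, lookup, false_positive_rate) are hypothetical, and deriving the k bit positions from one SHA-256 digest by double hashing is just one simple choice among many. The false-positive estimate uses the standard approximation (1 − e^(−kn/m))^k for m bits, k hash functions, and n inserted items, which is what “small and quantifiable” means in practice.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: answers 'probably yes' or 'definitely no'."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits        # m: size of the bit array
        self.num_hashes = num_hashes    # k: bit positions set per value
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        # Assumption: derive k positions from one SHA-256 digest via
        # double hashing (h1 + i * h2); other hash schemes work too.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        # Set every bit position this value maps to.
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False -> definitely not in the set; True -> probably in the set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    # Standard approximation: (1 - e^(-k*n/m))^k
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def lookup(value: str, bloom: BloomFilter, full_set: set) -> bool:
    # Pre-filter: skip the expensive search whenever the filter says "no".
    if not bloom.might_contain(value):
        return False
    # Only a "probably yes" pays for the exact (expensive) check.
    return value in full_set


# Usage: 1,000 items in a 10,000-bit filter with 7 hash functions.
users = {f"user{i}" for i in range(1000)}
bloom = BloomFilter(num_bits=10_000, num_hashes=7)
for u in users:
    bloom.add(u)

print(lookup("user42", bloom, users))
print(lookup("nobody", bloom, users))
print(f"estimated false-positive rate: {false_positive_rate(10_000, 7, 1000):.4f}")
```

Here the filter is 10,000 bits (about 1.25 KB) no matter how large the stored strings are, and roughly 99% of lookups for absent values never reach the full set. A real system would swap the in-memory set for a disk read or a network call, which is where the savings show up.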
That worldview or philosophy is a valuable thing to keep in your pocket when you’re working with large amounts of data, and that’s the topic of my Strata Conference / Hadoop World NYC 2013 talk.