Comments on: Can we afford big data, or do we need smart data?
http://www.xaprb.com/blog/2012/11/15/can-we-afford-big-data-or-do-we-need-smart-data/
Stay curious!
Fri, 10 May 2013 18:25:19 +0000

By: Raghavendra Prabhu
http://www.xaprb.com/blog/2012/11/15/can-we-afford-big-data-or-do-we-need-smart-data/#comment-20414
Sun, 09 Dec 2012 12:04:06 +0000

I recently came across a statistics/analytics joke that I would like to share in this context.

http://www.dbms2.com/2012/11/02/drunk-lamppost-statistics-illumination-pony-somewhere-analytics/

By: Andrew Parker
http://www.xaprb.com/blog/2012/11/15/can-we-afford-big-data-or-do-we-need-smart-data/#comment-20397
Wed, 21 Nov 2012 21:33:02 +0000

This reminds me of a recent interview in The Atlantic in which Noam Chomsky gives his views on “Where Artificial Intelligence Went Wrong.”

http://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/

From the article:

“… systems biology and artificial intelligence both face the same fundamental task of reverse-engineering a highly complex system whose inner workings are largely a mystery. Yet, ever-improving technologies yield massive data related to the system, only a fraction of which might be relevant. Do we rely on powerful computing and statistical approaches to tease apart signal from noise, or do we look for the more basic principles that underlie the system and explain its essence? The urge to gather more data is irresistible, though it’s not always clear what theoretical framework these data might fit into.”

Is this more generally true? Of course, collecting data is useful, but focusing just on that distracts us from attempting to understand the underlying principles.

By: Craig Naylor
http://www.xaprb.com/blog/2012/11/15/can-we-afford-big-data-or-do-we-need-smart-data/#comment-20391
Fri, 16 Nov 2012 08:18:19 +0000

I’m glad to read posts like this that speak some sanity in a world that has gone crazy. We are indeed hoarding vast amounts of data for little purpose. We need to focus more on analysing and distilling that data.

Keep up the great work!

By: Greg
http://www.xaprb.com/blog/2012/11/15/can-we-afford-big-data-or-do-we-need-smart-data/#comment-20390
Thu, 15 Nov 2012 18:05:57 +0000

I don’t think you’re necessarily overreacting, but while I usually agree with you (and appreciate you communicating your thoughts via this blog), I do not believe what you suggest is practical. Ideally, I would agree, we should discard what we do not need; the problem is that it’s impossible to know whether we will need it and why we will need it. Consider the Enron e-mail data set. Shouldn’t we have thrown that out by now? The answer is no: it continues to be a rich data set to use as an experimental testbed in text mining and AI research. However, that doesn’t mean that we should necessarily save *all* the Enron e-mail that ever existed; perhaps just a sample of it. The question then is: which sample? How much of it? Again, hard to know.

My point of view is that we should save and make accessible as much raw data as possible, because we are still in our infancy at learning how to integrate and mine such data. It would be great if we could use this surplus of data to help us compress it better and more smartly. Storage is getting cheaper and will continue to do so. Information is one of the few things where having a lot of it doesn’t really have tremendous real-world side effects (at least, they are minimal compared to other industries). So in short, I think your heart is in the right place and I definitely endorse the cycle you have defined; I just don’t think we’re at the point where we can take that step just yet.
