Finding a Needle in a Needle Stack
9/18/2012 6:24 AM
Finding a needle in a haystack is an oft-used metaphor for something that’s hard to find. Because of
concerns about its signal-to-noise ratio, big data analytics is sometimes compared to finding a
golden needle in a haystack of data. In other words, you have to dig through a whole lot of hay (i.e.,
massive amounts of data) before you find a golden needle (i.e., a data-driven insight). This is also why a lot of people talk about the
downside of data sampling, since even a statistically-valid haystack sample may not contain (Jedi mind
tricks aside) the needles you are looking for, especially with recent
technological advancements enabling you to analyze the whole haystack.
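To put a rough number on the sampling concern, here is a minimal sketch in Python. It assumes a simple random sample drawn without replacement, and the function name and the figures in the usage line are hypothetical, chosen only for illustration. Under that assumption, the chance that a sample contains none of the needles follows the hypergeometric distribution.

```python
from math import comb


def prob_sample_misses_all(population: int, needles: int, sample_size: int) -> float:
    """Chance that a simple random sample (without replacement) has zero needles.

    Hypothetical helper for illustration: `population` is the haystack size,
    `needles` is how many records of interest it holds, and `sample_size`
    is how many records the sample draws.
    """
    if not 0 <= needles <= population:
        raise ValueError("needles must be between 0 and population")
    if not 0 <= sample_size <= population:
        raise ValueError("sample_size must be between 0 and population")
    # Ways to draw the sample from the hay alone, divided by all possible samples.
    return comb(population - needles, sample_size) / comb(population, sample_size)


# Illustrative figures only: 5 needles among 100,000 records, 1% sample.
miss_probability = prob_sample_misses_all(100_000, 5, 1_000)
print(f"Chance the sample contains no needles: {miss_probability:.3f}")
```

With these made-up figures, a 1% sample misses every needle about 95% of the time, which is the intuition behind analyzing the whole haystack when the needles are rare.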
One thing that makes literally finding a needle in a haystack a little easier is how different a
needle is from a hay fiber, at least as long as the hay is of good quality, since poor-quality hay is too coarse,
and thus needle-like. The resemblance between a needle and a poor-quality hay fiber mirrors how
difficult it is to discern whether a statistical outlier represents a business insight or a data quality issue.
Although data quality is a big concern for big data,
noise is sometimes over-identified with poor-quality data. In their book Made to Stick: Why Some Ideas Survive and Others Die,
Chip Heath and Dan Heath explained that “an accurate but useless idea is still useless. If a message can’t be used to make predictions or decisions, it is without value, no matter how accurate or comprehensive it is.”
Therefore, noise can also be high-quality data that’s not relevant to your current analytical goals.
It’s not always easy to differentiate hay fibers from needles (i.e., noise from signal), or differentiate a needle from a golden needle (i.e., high-quality, but analytically useless, data from a relevant data insight). Although big data requires you to exercise better data management, having high-quality data still leaves you with an analytical challenge that’s comparable to finding a needle in a needle stack.
3 comment(s) so far...
By John Owens on
9/18/2012 5:27 PM
Re: Finding a Needle in a Needle Stack
Excellent article, Jim.
The trouble is that too many people and enterprises are looking through big data in order to find data! And they do, lots of it.
But are they any more enlightened? Too often they are not. The only way in which big data will bring benefits is if it brings more information!
No matter how good the data is in big data, it is of no value unless it carries information. Information has two elements, data and structure! Neither on its own is of any value or benefit.
Incorrectly inferring structure (and humans are hardwired to do so) from big data can lead to big errors.
By Jim Harris on
9/19/2012 8:13 AM
Re: Finding a Needle in a Needle Stack
Thanks for your comment, John.
Albert Einstein once said that “information is not knowledge,” by which he meant that information (even highly structured, high-quality information) on its own does not create knowledge. Information can guide us toward knowledge, but the journey must be completed by our personal experience.
The business value of big data comes from applying it toward solving business problems. As big data guides us toward first understanding and then deriving a testable solution, we do add structure and verify context-specific quality, thus creating information. But this information only becomes knowledge if it’s augmented by our business experience, and proves helpful to our organization’s journey.
How much structure and how much quality must be applied to big data to transform it into valuable information will vary on a case-by-case basis (this is a big, pun intended, topic in and of itself).
The most important aspect of the data-information-knowledge journey is that it is a continuous cycle — in fact, some of the most important data is the data created by the evaluation of the business application of information — this is the feedback loop without which knowledge is impossible.
As T.S. Eliot wrote: “We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time.”