Predictive Analytics, the Data Effect, and Jed Clampett
5/13/2013 7:00 AM
“Bow your head: the hot buzzword big data has ascended to royalty,” declared Eric Siegel in his book Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. “It’s in every news clip, every data science presentation, and every advertisement for analytics solutions. It’s a crisis! It’s an opportunity! It’s a crisis of opportunity!”
Siegel then shares a big secret about big data.
“Big data does not exist,” Siegel revealed. “The elephant in the room is that there is no elephant in the room. What’s exciting about data isn’t how much of it there is, but how quickly it is growing. We’re in a persistent state of awe at data’s sheer quantity because of one thing that does not change: There’s always so much more today than yesterday.”
“Size is relative, not absolute,” Siegel explained. “If we use the word big today, we’ll quickly run out of adjectives: big data, bigger data, even bigger data, and biggest data. The International Conference on Very Large Data Bases has been running since 1975. We have a dearth of vocabulary with which to describe a wealth of data. Size doesn’t matter. It’s the rate of expansion.”
The unabated rate of expansion led Daragh O Brien to quip last summer on Twitter that “after big data we will inevitably begin to see the rise of morbidly obese data.” I responded with a blog post about the need to exercise better data management.
“There’s a ton of it,” Siegel continued. “So what? What guarantees that all this residual rubbish, this by-product of organizational functions, holds value? It’s not more than an extremely long list of observed events, an obsessive-compulsive enumeration of things that have happened.”
Fear not, data lovers. Siegel says the answer is simple.
“Everything is connected to everything else—if only indirectly—and this is reflected in data. Data always speaks. It always has a story to tell, and there’s always something to learn from it. Data scientists see this over and over again across predictive analytics projects. Pull some data together and, although you can never be certain what you’ll find, you can be sure you’ll discover valuable connections by decoding the language it speaks and listening.”
Siegel calls this The Data Effect: Data is always predictive.
“This is the assumption behind the leap of faith an organization takes when undertaking predictive analytics,” Siegel explained. “Budgeting the staff and tools for a predictive analytics project requires this leap, knowing not what specifically will be discovered and yet trusting that something will be. Data is the new oil. Unlike oil, data is extremely easy to transport and cheap to store. It’s a bigger geyser, and this one is never going to run out.”
Of course, it’s impossible to predict whether your predictive analytics project will turn you into Jed Clampett: a poor data scientist barely keeping your project funded, who one day runs an analysis and up through the data (the new oil, that is, digital gold) comes bubbling a breakthrough business insight. The next thing you know, your management says, “We’re going to move away from here. California is the place we want to be.” So you pack up your predictive models and head to the Valley (Silicon Valley, that is, palm trees, swimming pools, and data geek millionaires).
Hey, you never know; it could happen. After all, in data science nothing is impossible; there are only varying degrees of improbable.
Originally published in the Actian Big Data & Analytics blog.