Big Data Blog

Predictive Analytics, the Data Effect, and Jed Clampett

May 13

Written by:
5/13/2013 7:00 AM

“Bow your head: the hot buzzword big data has ascended to royalty,” declared Eric Siegel, in his book Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.  “It’s in every news clip, every data science presentation, and every advertisement for analytics solutions.  It’s a crisis!  It’s an opportunity!  It’s a crisis of opportunity!”

Siegel then shares a big secret about big data.

“Big data does not exist,” Siegel revealed.  “The elephant in the room is that there is no elephant in the room.  What’s exciting about data isn’t how much of it there is, but how quickly it is growing.  We’re in a persistent state of awe at data’s sheer quantity because of one thing that does not change: There’s always so much more today than yesterday.”

“Size is relative, not absolute,” Siegel explained.  “If we use the word big today, we’ll quickly run out of adjectives: big data, bigger data, even bigger data, and biggest data.  The International Conference on Very Large Data Bases has been running since 1975.  We have a dearth of vocabulary with which to describe a wealth of data.  Size doesn’t matter.  It’s the rate of expansion.”

The unabated rate of expansion led Daragh O Brien to quip last summer on Twitter that “after big data we will inevitably begin to see the rise of morbidly obese data.”  To which I responded with a blog post about the need to exercise better data management.

“There’s a ton of it,” Siegel continued.  “So what?  What guarantees that all this residual rubbish, this by-product of organizational functions, holds value?  It’s not more than an extremely long list of observed events, an obsessive-compulsive enumeration of things that have happened.”

Fear not, data lovers.  Siegel says the answer is simple.

“Everything is connected to everything else—if only indirectly—and this is reflected in data.  Data always speaks.  It always has a story to tell, and there’s always something to learn from it.  Data scientists see this over and over again across predictive analytics projects.  Pull some data together and, although you can never be certain what you’ll find, you can be sure you’ll discover valuable connections by decoding the language it speaks and listening.”

Siegel calls this The Data Effect: Data is always predictive.

“This is the assumption behind the leap of faith an organization takes when undertaking predictive analytics,” Siegel explained.  “Budgeting the staff and tools for a predictive analytics project requires this leap, knowing not what specifically will be discovered and yet trusting that something will be.  Data is the new oil.  Unlike oil, data is extremely easy to transport and cheap to store.  It’s a bigger geyser, and this one is never going to run out.”

Of course, it’s impossible to predict if your predictive analytics project will turn you into Jed Clampett, a poor data scientist barely keeping your project funded, who one day runs an analysis and up through the data (the new oil that is, digital gold) comes bubbling a breakthrough business insight.  The next thing you know, your management says we’re going to move away from here.  California is the place we want to be, so let’s pack up your predictive models and head to the Valley (Silicon Valley that is, palm trees, swimming pools, and data geek millionaires).

Hey, you never know, it could happen.  After all, in data science, nothing is impossible; there are only varying degrees of improbable.




Originally published in the Actian Big Data & Analytics blog.

