Big Data Blog

How Predictable Are You?

Dec 10

12/10/2012 1:24 PM

Using the science fiction of Asimov’s Psychohistory, Paige Roberts recently blogged about whether or not data science can accurately predict human behavior using machine learning and predictive analytics.

I commented that Psychohistory is an excellent spin on what Duncan Watts refers to as the central intellectual problem of sociology: the micro-macro problem.  The outcomes that sociologists seek to explain are intrinsically macro in nature, meaning that they involve large numbers of people, yet these outcomes are driven by the micro actions of individuals.  The collective behaviors of massive numbers of humans, as in Asimov’s Galactic Empire, are often easier to understand and predict than the often unpredictable behaviors of individual humans.

However, Roberts posited that machine learning and predictive analytics aren’t trying to figure out the rise and fall of human empires, but instead are trying to predict what Psychohistory thought was unpredictable — the behavior of individual humans.  “Given analysis of the actions of millions of humans under certain circumstances over a period of time,” Roberts blogged, “we can now predict which individuals are about to take a certain action, or which action a particular individual will take under certain conditions.”

I think that many people resist the potential of big data analytics because they innately resist the idea that the complexity of a person, especially themselves, can be encapsulated in a data-driven algorithm that could not only predict their future behavior but also make better decisions that would guide them toward more positive outcomes.

This reminds me of the 2006 non-fiction book Stumbling on Happiness by Daniel Gilbert, a Professor of Psychology at Harvard University, which is based on the premise that we imagine the future poorly, in particular when it comes to predicting what will make us happy.  At the beginning of the book, Gilbert tells us that he will reveal a way for us to make better decisions in the present to increase the likelihood of our future happiness.  But Gilbert also predicts that we will not like his recommendation, which is to use the collective outcomes of the past decisions of large groups of people as a guide to help us predict the outcomes of our own decisions.  Most of us reject this premise because it seems to eliminate our uniqueness as individuals.

As Roberts explained in her post, Psychohistory accurately predicted the course of human history in the fictional Galactic Empire for centuries until a single, remarkable individual with extraordinary gifts altered the course of life for billions.  I think that we defy the psychohistorian using machine learning and predictive analytics because we all want to believe that we are that one-in-a-billion remarkable individual.

Although each one of us is as unique as a snowflake, it takes a large volume of snowflakes to accumulate any significant snowfall.  That is why many big data analytical applications, such as recommendation engines and sentiment analysis, are more effective when large data volumes describing our individual preferences are aggregated.  Although comments from individuals are available to provide more supporting detail, those wonderfully unique snowflakes also produce a flummoxing flurry of eclectic tastes and idiosyncrasies.

If you have ever compared the feedback from two people who gave the same rating, for example a three-out-of-five-star review of a movie, you know how confusing it can be to reconcile a positive three-star review with a negative three-star review.  Therefore, many people, myself included, are more comfortable with the predictive power of millions of people collectively providing an average rating of four stars, while conveniently ignoring all sub-four-star reviews and not reading any of the comments.  Although I have watched a few four-star movies that I would personally rate with three or two stars, more often than not, my enjoyment of a particular movie was accurately predicted by a data-driven algorithm, such as the one used by Netflix.

Of course, data-driven algorithms can predict not just what movies we watch but also how we work, shop, vote, and love, as Stephen Baker wrote in his illuminating 2008 non-fiction book The Numerati.

I don’t want to minimize the data privacy concerns that big data raises, but I think that, more than our privacy, it’s our predictability that is being laid bare by big data.  In many cases, I don’t think it’s the predictive ability of data-driven algorithms that scares us.  What truly scares us is how predictable we might be, how our unique snowflakes are falling toward a common ground, how much like everyone else we truly are.

The existential question underlying big data analytics is: How predictable are you?

The uncomfortable answer is that you are more predictable than you think.  Big data analytics is simply forcing all of us to acknowledge it, not just collectively, but also individually.
