Insights and Must-Knows
from the Big Data Blog

Is DW before BI going Bye-Bye?

Oct 29

Written by:
10/29/2012 5:40 AM

Data Warehouse becomes a historical artifact

Historically, standing between operations and analytics was the hulking amalgamation of extracted, transformed, and loaded data that anxiously awaited the queries of business users.  It was called the Data Warehouse (or sometimes the Data Outhouse).  But despite the valuable historical data stored in its slowly changing dimensions and transactional archives, the data warehouse, as we know it today, may soon become a historical artifact, a data museum showcasing just how quaint quants were last century.

The Rumble in the Data Jungle is coming from the so-called “Big Data Revolution,” which, as Mike Hoskins recently explained, “is exposing how technically obsolete the existing data warehousing infrastructure really is.  Relational databases were invented for transactional workloads, but they eventually came to be used for analytical workloads as well.  Having a standard database for all workloads, whether they were transactional or analytical, made some sense until now.  Relational technology is not well-suited to large-scale analytical workloads.  Big data analytics is going to occur on more modern technology infrastructure, such as Hadoop.”

Variety: Voldemort of relational databases

Now, of course, Volume is not the only V vying to spell Victory for non-relational data management solutions.  Although the Velociraptor of near-real-time decisions is also a vexing villain, Variety is the Voldemort that relational databases dare not query the name of because even business intelligence muggles have heard of the analytical treasures hidden in the needle stacks of unstructured data.

Perhaps a more accurate term for this non-relational-friendly data is poly-structured data, which is, as Curt Monash described, “data with structure that can be exploited to provide most of the benefits of a highly structured database (e.g., a tabular/relational one), but cannot be described in the concise, consistent form such highly structured systems require.  Poly-structured data is data that has considerable structure, but whose structure is in some important way unpredictable.”
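To make that definition a little more concrete, here is a minimal sketch in Python. The sample events, field names, and helper functions are all hypothetical, invented purely for illustration. Each record is well-formed JSON, so it has structure that can be exploited, yet no single field shows up in every record, which is why one fixed set of relational columns struggles to describe them.

```python
import json

# Hypothetical machine-generated events (illustrative only): each record is
# valid JSON, but the set of fields varies from record to record.
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "device": {"os": "linux"}}',
    '{"user": "bob", "action": "purchase", "amount": 19.99, "tags": ["sale"]}',
    '{"sensor": "t-17", "reading": 21.5}',
]


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys so fields can be compared."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def field_coverage(raw_events):
    """Count how many records carry each (flattened) field."""
    counts = {}
    for line in raw_events:
        for name in flatten(json.loads(line)):
            counts[name] = counts.get(name, 0) + 1
    return counts


coverage = field_coverage(RAW_EVENTS)

# A field present in every record could map cleanly to a fixed column;
# in this sample, none qualifies.
universal = [name for name, n in coverage.items() if n == len(RAW_EVENTS)]
```

The structure is still usable (you can ask which users logged in, or average the sensor readings), but it has to be discovered record by record rather than declared once up front.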

“People used to estimate,” Paige Roberts recently blogged, “that somewhere between 50% and 80% of the data in an enterprise wasn’t getting used for business intelligence because it wasn’t structured to fit in a standard SQL database.  Now, those numbers are laughable.  The amount of poly-structured data flooding in from social media, web, and machine-generated sources dwarfs the tiny amount of structured data enterprises store in relational databases.  And that flood of data is growing beyond exponentially.  What is the percentage of poly-structured data to structured data in an enterprise now?  90% to 10%?  99% to 1%?  It depends on the enterprise, certainly, but no matter what the industry, it’s only going to get more and more out of proportion over time.  Structured, transactional data is still essential to an enterprise, but it’s becoming a smaller and smaller part of a much bigger picture.”

Business Intelligence Big Picture

Does this mean that data warehousing is becoming a smaller and smaller part of a much bigger business intelligence picture?  Is DW before BI going Bye-Bye?  In other words, will traditional data warehouses built with relational database technology no longer be a primary data source for business intelligence?

The new technology challenges, and, far more important, business opportunities, represented by poly-structured data require a willingness to think outside the box.  So, at the very least, data management needs to start thinking outside the relational model, and business intelligence needs to start thinking outside the data warehouse.

