Big Data Blog

Is DW before BI going Bye-Bye?

Oct 29

Written by:
10/29/2012 5:40 AM

Data Warehouse becomes a historical artifact

Historically, standing between operations and analytics was the hulking amalgamation of extracted, transformed, and loaded data that anxiously awaited the queries of business users.  It was called the Data Warehouse (or sometimes the Data Outhouse).  But despite the valuable historical data stored in its slowly changing dimensions and transactional archives, the data warehouse, as we know it today, may soon become a historical artifact, a data museum showcasing just how quaint quants were last century.

The Rumble in the Data Jungle is coming from the so-called “Big Data Revolution,” which, as Mike Hoskins recently explained, “is exposing how technically obsolete the existing data warehousing infrastructure really is.  Relational databases were invented for transactional workloads, but they eventually came to be used for analytical workloads as well.  Having a standard database for all workloads, whether they were transactional or analytical, made some sense until now.  Relational technology is not well-suited to large-scale analytical workloads.  Big data analytics is going to occur on more modern technology infrastructure, such as Hadoop.”

Variety: Voldemort of relational databases

Now, of course, Volume is not the only V vying to spell Victory for non-relational data management solutions.  Although the Velociraptor of near-real-time decisions is also a vexing villain, Variety is the Voldemort whose name relational databases dare not query, because even business intelligence muggles have heard of the analytical treasures hidden in the needle stacks of unstructured data.

Perhaps a more accurate term for this non-relational-friendly data is poly-structured data, which is, as Curt Monash described, “data with structure that can be exploited to provide most of the benefits of a highly structured database (e.g., a tabular/relational one), but cannot be described in the concise, consistent form such highly structured systems require.  Poly-structured data is data that has considerable structure, but whose structure is in some important way unpredictable.”
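
To make that definition a little more concrete, here is a minimal sketch in Python.  The event records, their field names, and the helper functions are all hypothetical, invented purely for illustration and not taken from Monash or any particular product.  The sketch only shows the idea that some fields turn up in every record (structure that can be exploited), while others come and go unpredictably (structure that a fixed relational schema would struggle to describe concisely).

```python
import json
from collections import Counter

# Hypothetical event records, invented for illustration only.
# Every record has "user" and "action"; the remaining fields vary.
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "device": {"os": "iOS"}}',
    '{"user": "bob", "action": "purchase", "amount": 19.99, "items": ["sku-1", "sku-2"]}',
    '{"user": "carol", "action": "login"}',
]


def field_counts(lines):
    """Count how often each top-level field appears across the records."""
    counts = Counter()
    for line in lines:
        counts.update(json.loads(line).keys())
    return counts


def shared_fields(lines):
    """Fields present in every record: the predictable part of the structure."""
    counts = field_counts(lines)
    return sorted(name for name, n in counts.items() if n == len(lines))


def optional_fields(lines):
    """Fields present in only some records: the unpredictable part."""
    counts = field_counts(lines)
    return sorted(name for name, n in counts.items() if n < len(lines))


if __name__ == "__main__":
    print("shared:  ", shared_fields(RAW_EVENTS))
    print("optional:", optional_fields(RAW_EVENTS))
```

A relational table would have to either declare a nullable column for every optional field up front or push the varying parts into a separate structure, which is the sort of friction the term poly-structured is meant to capture.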

“People used to estimate,” Paige Roberts recently blogged, “that somewhere between 50% and 80% of the data in an enterprise wasn’t getting used for business intelligence because it wasn’t structured to fit in a standard SQL database.  Now, those numbers are laughable.  The amount of poly-structured data flooding in from social media, web, and machine-generated sources dwarfs the tiny amount of structured data enterprises store in relational databases.  And that flood of data is growing beyond exponentially.  What is the percentage of poly-structured data to structured data in an enterprise now?  90% to 10%?  99% to 1%?  It depends on the enterprise, certainly, but no matter what the industry, it’s only going to get more and more out of proportion over time.  Structured, transactional data is still essential to an enterprise, but it’s becoming a smaller and smaller part of a much bigger picture.”

Business Intelligence Big Picture

Does this mean that data warehousing is becoming a smaller and smaller part of a much bigger business intelligence picture?  Is DW before BI going Bye-Bye?  In other words, will traditional data warehouses built with relational database technology no longer be a primary data source for business intelligence?

The new technology challenges, and, far more important, business opportunities, represented by poly-structured data require a willingness to think outside the box.  So, at the very least, data management needs to start thinking outside the relational model, and business intelligence needs to start thinking outside the data warehouse.
