Is DW before BI going Bye-Bye?
10/29/2012 5:40 AM
Historically, standing between operations and analytics was the hulking amalgamation of extracted, transformed, and loaded data that anxiously awaited the queries of business users. It was called the Data Warehouse (or sometimes the Data Outhouse). But despite the valuable historical data stored in its slowly changing dimensions and transactional archives, the data warehouse, as we know it today, may soon become a historical artifact, a data museum showcasing just how quaint quants were last century.
The Rumble in the Data Jungle is coming from the so-called “Big Data Revolution,” which, as Mike Hoskins recently explained, “is exposing how technically obsolete the existing data warehousing infrastructure really is. Relational databases were invented for transactional workloads, but they eventually came to be used for analytical workloads as well. Having a standard database for all workloads, whether they were transactional or analytical, made some sense until now. Relational technology is not well-suited to large-scale analytical workloads. Big data analytics is going to occur on more modern technology infrastructure, such as Hadoop.”
Now, of course, Volume is not the only V vying to spell Victory for non-relational data management solutions. Although the Velociraptor of near-real-time decisions is also a vexing villain, Variety is the Voldemort that relational databases dare not query the name of because even business intelligence muggles have heard of the analytical treasures hidden in the needle stacks of unstructured data.
Perhaps a more accurate term for this non-relational-friendly data is poly-structured data, which is, as Curt Monash described, “data with structure that can be exploited to provide most of the benefits of a highly structured database (e.g., a tabular/relational one), but cannot be described in the concise, consistent form such highly structured systems require. Poly-structured data is data that has considerable structure, but whose structure is in some important way unpredictable.”
“People used to estimate,” Paige Roberts recently blogged, “that somewhere between 50% and 80% of the data in an enterprise wasn’t getting used for business intelligence because it wasn’t structured to fit in a standard SQL database. Now, those numbers are laughable. The amount of poly-structured data flooding in from social media, web, and machine-generated sources dwarfs the tiny amount of structured data enterprises store in relational databases. And that flood of data is growing beyond exponentially. What is the percentage of poly-structured data to structured data in an enterprise now? 90% to 10%? 99% to 1%? It depends on the enterprise, certainly, but no matter what the industry, it’s only going to get more and more out of proportion over time. Structured, transactional data is still essential to an enterprise, but it’s becoming a smaller and smaller part of a much bigger picture.”
Does this mean that data warehousing is becoming a smaller and smaller part of a much bigger business intelligence picture? Is DW before BI going Bye-Bye? In other words, will traditional data warehouses built with relational database technology no longer be a primary data source for business intelligence?
The new technology challenges, and, far more important, business opportunities, represented by poly-structured data require a willingness to think outside the box. So, at the very least, data management needs to start thinking outside the relational model, and business intelligence needs to start thinking outside the data warehouse.