Weighing the Benefits of Big Data: What’s in it for Me?
2/18/2013 5:57 AM
In my last post I began to look at the types and characteristics of business problems that are suited to big data analytics programming models. One conclusion we can draw from that discussion is that a main driver of suitability is the desire to improve performance. Interestingly, though, that desire breaks into two different categories: improved computational performance and improved business performance.
An example of the first category is implementing the “transformation” part of data warehouse ETL (Extraction, Transformation, and Loading) on a big data platform. This type of application, though technical, exhibits many of the characteristics that I described in my last post: the performance of existing implementations is impeded by data latency, the computational performance of traditional approaches is limited by the single-threaded execution model, the application can consume large data volumes, and the work is amenable to parallelism. You would expect a big data implementation of this kind of data integration to shorten execution time, resulting in a faster load of your target data warehouse.
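To make the "amenable to parallelism" point concrete, here is a minimal Python sketch of a transformation step split across workers. Everything in it is an assumption for illustration: the record fields (`customer`, `amount_cents`), the `transform_record` rule, and the use of a local thread pool as a stand-in for the worker nodes of a real big data platform. The point it shows is only that each partition can be transformed independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_record(record):
    """Hypothetical row-level rule: tidy the name, convert cents to dollars."""
    return {
        "customer": record["customer"].strip().title(),
        "amount_usd": record["amount_cents"] / 100,
    }


def partition(records, n_parts):
    """Split the input into at most n_parts contiguous chunks."""
    size = max(1, -(-len(records) // n_parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def transform_partition(part):
    """Transform one chunk; no chunk depends on any other chunk."""
    return [transform_record(r) for r in part]


def parallel_transform(records, n_workers=4):
    """Transform all records, one chunk per worker, preserving input order."""
    parts = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # The thread pool is only a local stand-in for cluster workers.
        results = list(pool.map(transform_partition, parts))
    return [row for part in results for row in part]
```

Because the chunks share no state, adding workers shortens the elapsed time of the transformation stage on a platform that actually runs them in parallel, which is the source of the faster warehouse load described above.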
An example of the second category involves absorbing large unstructured data feeds streamed from social media channels into a big data environment. That environment can analyze text; identify entities, contexts, and roles; resolve those identities; and link them within the contexts and roles. The results can be used both to enhance the breadth and depth of customer profiles and to proactively monitor customer sentiment for unsatisfactory experiences that pose a potential brand risk. In this case, the computational benefit comes from both absorbing greater volumes of data from each data source and broadening the variety of data sources: you can process large inputs in a shorter period and end up with a richer customer profile that can feed operational analytics more quickly.
We can summarize the two stereotypical categories as “doing things faster/cheaper” and “getting better results.” Both are valid drivers for exploring a big data programming environment and execution platform. While the two cases we looked at here are somewhat disjoint in terms of the anticipated benefit, it is worth looking for use cases where there is synergy between the two value drivers. In other words, look for scenarios where “faster/cheaper” leads to “better results” or where “better results” leads to “faster/cheaper.”
One example might be network load analysis and optimization, a generic description of a capability that is beneficial in the telecommunications industry and is emerging as possible within the energy industry. The larger the network, the greater the volume of data being generated – think of the number of mobile communication events transpiring simultaneously in any densely populated area, including calls, text messages, and internet access. Analyzing network traffic for adverse events (dropped calls, for example) allows the service provider to determine whether there are any imminent failure points that can be alleviated through reconfiguring the network and rerouting data traffic through alternate paths. This is a great analytics application for big data: it needs to absorb and analyze large volumes of transaction information streamed in real time. Faster analysis enables engineers to take action quickly to reduce outages and failures – in other words “faster/cheaper” leads to “better results.”
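As a rough illustration of the kind of streaming analysis described above, the following Python sketch keeps a sliding window of call events per network cell and flags a cell when its dropped-call rate crosses a threshold. The class name, the event fields, and the window, threshold, and minimum-event values are all hypothetical choices made for the example, not figures from any real network. The sketch also assumes that events for a given cell arrive in timestamp order.

```python
from collections import defaultdict, deque


class DropRateMonitor:
    """Track a per-cell dropped-call rate over a sliding time window."""

    def __init__(self, window_seconds=60, threshold=0.05, min_events=20):
        self.window_seconds = window_seconds
        self.threshold = threshold      # flag when the drop rate exceeds this
        self.min_events = min_events    # ignore cells with too little traffic
        self.events = defaultdict(deque)  # cell_id -> (timestamp, dropped)

    def observe(self, cell_id, timestamp, dropped):
        """Record one call event; return True if the cell looks at risk."""
        window = self.events[cell_id]
        window.append((timestamp, dropped))
        # Evict events that have aged out of the window.
        while window and window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        total = len(window)
        drops = sum(1 for _, was_dropped in window if was_dropped)
        return total >= self.min_events and drops / total > self.threshold
```

A caller would feed each incoming event to `observe` and treat a `True` result as a prompt for engineers to look at rerouting traffic away from that cell. At real network scale the per-cell windows would be spread across many machines, which is where the big data platform comes in.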
But to determine when it makes sense to invest effort and resources in big data, you need to know when its value exceeds the cost of operations. In my next post I’ll start to look at quantifying potential lift, specifying success criteria, and ways of estimating and then measuring the benefits.