Insights and Must-Knows
from the Big Data Blog

Weighing the Benefits of Big Data: What’s in it for Me?

2/18/2013 5:57 AM

In my last post I began to look at the types and characteristics of business problems that are suited to big data analytics programming models. One conclusion we can draw from that discussion is that a main driver of suitability is the desire to improve performance. Interestingly, though, that desire breaks into two different categories: improved computational performance and improved business performance.

An example of the first category is implementing the “transformation” part of data warehouse ETL (Extraction, Transformation, and Loading) on a big data platform. This type of application, though technical, exhibits many of the characteristics that I described in my last post: the performance of existing implementations is impeded by data latency, the computational performance of traditional approaches is limited by the single-threaded execution model, it consumes large data volumes, and it is amenable to parallelism. You would expect a big data implementation of this kind of data integration to reduce execution time, resulting in a faster load of your target data warehouse.
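To make "amenable to parallelism" concrete, here is a minimal Python sketch of a transformation step that is split into independent partitions, transformed concurrently, and merged back in input order. Everything in it is an assumption for illustration: the record fields, the `transform_record` rule, and the partition size are hypothetical. A thread pool is used only to show the partition/map/merge shape; a CPU-bound transformation would need a process pool or a cluster framework to see real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_record(record):
    """Hypothetical rule: normalize the customer name, convert cents to dollars."""
    return {
        "customer": record["customer"].strip().title(),
        "amount_usd": record["amount_cents"] / 100,
    }


def transform_partition(rows):
    """Transform one partition; partitions do not depend on each other."""
    return [transform_record(r) for r in rows]


def split_into_partitions(records, size):
    """Split the input into fixed-size partitions."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_transform(records, partition_size=2, workers=4):
    """Map transform_partition over the partitions concurrently, then merge."""
    parts = split_into_partitions(records, partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of its inputs.
        results = pool.map(transform_partition, parts)
        return [row for part in results for row in part]


# Illustrative input (made-up records).
raw = [
    {"customer": "  alice smith ", "amount_cents": 1250},
    {"customer": "BOB JONES", "amount_cents": 300},
    {"customer": "carol wu", "amount_cents": 9999},
]
transformed = parallel_transform(raw)
```

Because each partition is transformed without reference to any other, the same structure carries over to a distributed platform, where partitions would be spread across nodes rather than threads.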

An example of the second kind involves absorbing large unstructured data feeds streamed from social media channels into a big data environment. That environment can analyze text; identify entities, contexts, and roles; resolve those identities; and link them within their contexts and roles. The results can be used both to enhance the breadth and depth of customer profiles and to proactively monitor customer sentiment for unsatisfactory experiences that pose a potential brand risk. In this case, the computational benefit comes both from absorbing greater volumes of data from each data source and from broadening the variety of data sources: you can process large inputs in a shorter period and end up with a richer customer profile that can feed operational analytics more quickly.

We can summarize the two stereotypical categories as “doing things faster/cheaper” and “getting better results.” Both are valid drivers for exploring a big data programming environment and execution platform. While the two cases we looked at here are somewhat disjoint in terms of the anticipated benefit, it is worth finding those use cases where there is synergy between the two value drivers. In other words, look for those scenarios where “faster/cheaper” leads to “better results” or where “better results” leads to “faster/cheaper.”

One example might be network load analysis and optimization, a generic description of a capability that is beneficial in the telecommunications industry and is emerging as possible within the energy industry. The larger the network, the greater the volume of data being generated: think of the number of mobile communication events transpiring simultaneously in any densely populated area, including calls, text messages, and internet access. Analyzing network traffic for adverse events (dropped calls, for example) allows the service provider to determine whether there are any imminent failure points that can be alleviated by reconfiguring the network and rerouting data traffic through alternate paths. This is a great analytics application for big data: it needs to absorb and analyze large volumes of transaction information streamed in real time. Faster analysis enables engineers to take action quickly to reduce outages and failures. In other words, “faster/cheaper” leads to “better results.”
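As a rough illustration of scanning network events for adverse patterns, the Python sketch below computes a dropped-call rate per cell over one window of events and flags cells whose rate exceeds a threshold. The event format, the cell identifiers, the 20% threshold, and the minimum event count are all assumptions made for the example, not values from any real network. A production system would apply the same aggregation continuously over a real-time stream rather than to a small list.

```python
from collections import Counter


def summarize(events):
    """Count total and dropped call events per cell.

    Each event is a (cell_id, outcome) pair; this shape is hypothetical.
    """
    totals, drops = Counter(), Counter()
    for cell_id, outcome in events:
        totals[cell_id] += 1
        if outcome == "dropped":
            drops[cell_id] += 1
    return totals, drops


def cells_at_risk(events, threshold=0.2, min_events=3):
    """Return cells whose dropped-call rate exceeds the threshold.

    Cells with fewer than min_events events are skipped, since a rate
    computed from one or two calls is too noisy to act on.
    """
    totals, drops = summarize(events)
    return sorted(
        cell
        for cell, n in totals.items()
        if n >= min_events and drops[cell] / n > threshold
    )


# Illustrative window of events (made-up data).
window = [
    ("cell-A", "completed"), ("cell-A", "dropped"),
    ("cell-A", "dropped"), ("cell-A", "completed"),
    ("cell-B", "completed"), ("cell-B", "completed"),
    ("cell-B", "completed"), ("cell-B", "completed"),
    ("cell-C", "dropped"),
]
flagged = cells_at_risk(window)
```

The per-cell counters are independent of one another, so the aggregation can be partitioned by cell and run in parallel, which is what makes this kind of analysis a fit for a big data platform.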

But to determine when it makes sense to invest effort and resources in big data, you need to know when its value exceeds the cost of operations. In my next post I’ll start to look at quantifying potential lift, specifying success criteria, and ways of estimating and then measuring the benefits.

Location: Blogs > David Loshin
