Big Data Blog

Weighing the Benefits of Big Data: What’s in it for Me?

Written by David Loshin, 2/18/2013

In my last post I began to look at the types and characteristics of business problems that are suited to big data analytics programming models, and one conclusion that we can draw from that discussion is that one of the main drivers of suitability is the desire to improve performance. Interestingly, though, I can break that desire for improved performance into two different categories: improved computational performance vs. improved business performance.

An example of the first category is implementing the “transformation” part of data warehouse ETL (Extraction, Transformation, and Loading) on a big data platform. This type of application, though technical, exhibits many of the characteristics I described in my last post: existing implementations are impeded by data latency, traditional approaches are limited by a single-threaded execution model, the workload consumes large data volumes, and it is amenable to parallelism. You would expect a big data implementation of this kind of data integration to shorten execution time, resulting in a faster load of your target data warehouse.
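To make the parallelism point concrete, here is a toy sketch of that “T” step. The record fields and the transform itself are illustrative assumptions, not any particular product's API; the point is that each record is transformed independently, which is what lets a big data platform fan the work out across many cores or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(record):
    # Toy "T" step of ETL: normalize one raw record into warehouse form.
    return {
        "customer": record["customer"].strip().upper(),
        "amount_cents": int(round(record["amount"] * 100)),
    }

def run_etl(records, workers=4):
    # Each record is transformed independently of the others, so the stage
    # is embarrassingly parallel -- a real platform would distribute this
    # same map across a cluster rather than a local thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, records))
```

Because no record depends on another, doubling the workers (or the machines) roughly halves the wall-clock time of this stage, which is exactly the “faster/cheaper” benefit described above.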

An example of the second kind involves absorbing large unstructured data feeds streamed from social media channels into a big data environment. That environment can analyze text; identify entities, contexts, and roles; resolve those identities and link them within their contexts and roles; and use the results both to enhance the breadth and depth of customer profiles and to proactively monitor customer sentiment for unsatisfactory experiences that pose a potential brand risk. In this case, the benefit comes both from absorbing greater volumes of data from each source and from broadening the variety of sources – you can process large inputs in a shorter period and end up with a richer customer profile that feeds operational analytics more quickly.
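A drastically simplified sketch of that pipeline might look like the following. The keyword lists, threshold, and matching logic are illustrative assumptions standing in for real entity-resolution and sentiment models; they only show the shape of the flow: extract entities, score sentiment, flag brand risk.

```python
# Illustrative stand-ins for real sentiment models.
NEGATIVE = {"terrible", "broken", "refund", "worst"}
POSITIVE = {"love", "great", "fast"}

def analyze_post(text, known_customers):
    # Tokenize crudely, stripping trailing punctuation.
    words = [w.strip(".,!?").lower() for w in text.split()]
    # "Identity resolution," toy version: match tokens to known customers.
    entities = [w for w in words if w in known_customers]
    # Net sentiment: positive hits minus negative hits.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {
        "entities": entities,
        "sentiment": score,
        "brand_risk": score < 0,  # proactively flag unhappy mentions
    }
```

Run over a high-volume social feed, the output of each post would be joined back to the customer profile, enriching it while simultaneously surfacing at-risk interactions.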

We can summarize the two stereotypical categories as “doing things faster/cheaper” and “getting better results.” Both are valid drivers for exploring a big data programming environment and execution platform. While the two cases we looked at here are largely distinct in terms of the anticipated benefit, it is worth seeking out use cases where the two value drivers reinforce each other. In other words, look for scenarios where “faster/cheaper” leads to “better results” or where “better results” leads to “faster/cheaper.”

One example might be network load analysis and optimization, a generic description of a capability that is beneficial in the telecommunications industry and is now emerging in the energy industry as well. The larger the network, the greater the volume of data generated – think of the number of mobile communication events transpiring simultaneously in any densely populated area: calls, text messages, and internet access. Analyzing network traffic for adverse events (dropped calls, for example) allows the service provider to determine whether there are imminent failure points that can be alleviated by reconfiguring the network and rerouting traffic through alternate paths. This is a great analytics application for big data: it must absorb and analyze large volumes of transaction data streamed in real time. Faster analysis enables engineers to act quickly to reduce outages and failures – in other words, “faster/cheaper” leads to “better results.”
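The core of that analysis can be sketched in a few lines: a single streaming pass over call events that flags any cell whose dropped-call rate crosses a threshold. The event fields and the 20% threshold are illustrative assumptions; a production system would do this continuously over a real-time feed rather than a list.

```python
from collections import defaultdict

def flag_failure_points(events, threshold=0.2):
    # One pass over the event stream: count totals and drops per cell.
    total = defaultdict(int)
    dropped = defaultdict(int)
    for e in events:
        total[e["cell"]] += 1
        if e["status"] == "dropped":
            dropped[e["cell"]] += 1
    # Cells whose dropped-call rate meets the threshold are candidate
    # failure points for rerouting or reconfiguration.
    return sorted(c for c in total if dropped[c] / total[c] >= threshold)
```

The faster this pass completes over the incoming stream, the sooner engineers can reroute traffic – the “faster/cheaper leads to better results” synergy in miniature.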

But to determine when it makes sense to invest the effort and resources in big data, you need to know when its value exceeds the cost of operations. In my next post I’ll start to look at quantifying potential lift, specifying success criteria, and ways of estimating and then measuring the benefits.
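As a preview of that quantification, the comparison reduces to a simple back-of-the-envelope model: estimated lift (time saved plus new revenue) versus the platform's operating cost. All the inputs below are illustrative placeholders for the estimates we'll develop in later posts.

```python
def net_benefit(hours_saved_per_month, value_per_hour,
                extra_revenue_per_month, platform_cost_per_month):
    # "Faster/cheaper" lift: staff and machine hours saved, priced out.
    # "Better results" lift: incremental revenue from richer analytics.
    lift = hours_saved_per_month * value_per_hour + extra_revenue_per_month
    # Invest only when estimated lift exceeds the cost of operations.
    return lift - platform_cost_per_month
```

A positive result argues for the investment; a negative one says the value does not yet exceed the cost of operations, however appealing the technology.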

