Insights and Must-Knows
from the Big Data Blog

Pervasive DataRush Performance Testing Results Wow HUG

3/1/2013 6:07 AM

Pervasive Big Data & Analytics Chief Technologist Jim Falgout recently had an opportunity to speak to the Bay Area Hadoop User Group (HUG), along with Mukund Madhugiri and Baljit Deot of Yahoo! and Hari Shreedharan of Cloudera. Jim discussed the major barriers to effective Hadoop deployments in the enterprise – complexity and the steep learning curve of MapReduce.

He detailed how Pervasive Big Data & Analytics addresses these issues with a visual workbench, integrated with Apache Hadoop, that lets data scientists and analysts build and execute complex big data workflows with minimal training and without MapReduce knowledge. A long-time evangelist of the DataFlow approach to big data, which is woven into the Pervasive DataRush framework, Jim discussed the key concepts behind it (libraries of pre-built operators, directed graphs, pipeline parallelism and a “shared-nothing” architecture) and described the specific enterprise benefits of Pervasive DataRush, Pervasive RushAnalytics and our accelerator for KNIME.
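For readers new to the dataflow concepts named above, here is a minimal sketch of the general idea in plain Python. It is not the DataRush API, and every name in it (`source`, `transform`, `keep_if`, `sink`, `run_pipeline`) is hypothetical. Each operator is a node in a small directed graph, runs on its own thread, shares no mutable state with the others, and passes rows downstream through a queue, so a later operator can start working before an earlier one has finished. In CPython the threads illustrate the structure rather than a real speedup.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def source(rows, out_q):
    """Operator with no inputs: emits each row, then the end-of-stream marker."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)


def transform(fn, in_q, out_q):
    """Operator that applies fn to each row as soon as it arrives."""
    while True:
        row = in_q.get()
        if row is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(fn(row))


def keep_if(pred, in_q, out_q):
    """Operator that forwards only the rows for which pred is true."""
    while True:
        row = in_q.get()
        if row is _DONE:
            out_q.put(_DONE)
            return
        if pred(row):
            out_q.put(row)


def sink(in_q, results):
    """Operator with no outputs: collects rows until the stream ends."""
    while True:
        row = in_q.get()
        if row is _DONE:
            return
        results.append(row)


def run_pipeline(rows):
    """Wire source -> transform -> keep_if -> sink, one thread per operator."""
    # Small bounded queues are the edges of the graph; they let all stages
    # run at the same time without buffering the whole data set.
    q1, q2, q3 = (queue.Queue(maxsize=4) for _ in range(3))
    results = []
    operators = [
        threading.Thread(target=source, args=(rows, q1)),
        threading.Thread(target=transform, args=(lambda x: x * x, q1, q2)),
        threading.Thread(target=keep_if, args=(lambda x: x % 2 == 0, q2, q3)),
        threading.Thread(target=sink, args=(q3, results)),
    ]
    for t in operators:
        t.start()
    for t in operators:
        t.join()
    return results
```

Calling `run_pipeline(range(10))` squares each number and keeps the even squares. Because the operators only communicate through queues, swapping one operator for another does not require touching the rest of the graph.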

TPC-H Performance Testing: Pervasive DataRush vs. Apache Pig
A highlight of Jim’s discussion came when he presented the results of TPC-H performance testing* in which Pervasive DataRush outperformed comparable Apache Pig scripts. For a number of HUG participants the results were eye-opening. They may be for our blog readers, too:


[Figure: TPC-H performance test results, Pervasive DataRush vs. Apache Pig]

If you’d like to learn more about the testing, please contact Pervasive Big Data & Analytics.

*Additional details on the Pervasive DataRush vs. Apache Pig testing:

  • Used TPC-H data
  • Generated a 1TB data set in HDFS
  • Ran several “queries” coded in both DataRush and Pig
  • Run times reported in seconds (smaller is better)

Cluster Configuration:

  • 5 worker nodes
  • 2 x Intel Xeon E5-2650 (8 cores each)
  • 64GB RAM
  • 24 x 1TB SATA drives, 7200 rpm

Resources of Interest
YouTube Video:
Jim’s February 2013 Bay Area HUG presentation

Slideshare:
“A Visual Workbench for Big Data Analytics on Hadoop”

Jim’s article on Dataflow in Dr. Dobb’s:
“Dataflow Programming: Handling Huge Data Loads Without Adding Complexity”

 

Pervasive Big Data & Analytics

