Big Data Blog


Pervasive DataRush Performance Testing Results Wow HUG

March 1, 2013, 6:07 AM

Pervasive Big Data & Analytics Chief Technologist Jim Falgout recently had an opportunity to speak to the Bay Area Hadoop User Group (HUG), along with Mukund Madhugiri and Baljit Deot of Yahoo! and Hari Shreedharan of Cloudera. Jim discussed the major barriers to effective Hadoop deployments in the enterprise – complexity and the steep learning curve of MapReduce.

He detailed how Pervasive Big Data & Analytics addresses these issues with a visual workbench, integrated with Apache Hadoop, that lets data scientists and analysts build and execute complex big data workflows with minimal training and no MapReduce knowledge. A long-time evangelist of the dataflow approach to big data, which underlies the Pervasive DataRush framework, Jim discussed its key concepts: libraries of pre-built operators, directed graphs, pipeline parallelism and a “shared-nothing” architecture. He also outlined the specific enterprise benefits of Pervasive DataRush, Pervasive RushAnalytics and our accelerator for KNIME.
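For readers new to the dataflow concepts mentioned above, here is a minimal Python sketch of a linear operator graph with pipeline parallelism. It is an illustration only and does not use the Pervasive DataRush API: the names `source`, `transform`, `sink` and `run_pipeline` are hypothetical, and plain threads and queues stand in for a real dataflow engine. Each operator keeps only local state and talks to its neighbors through queues, which is the "shared-nothing" idea in miniature.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the row stream


def source(rows, out):
    """Operator with no inputs: pushes each row downstream."""
    for row in rows:
        out.put(row)
    out.put(_DONE)


def transform(fn, inp, out):
    """Operator that applies fn to each row as soon as it arrives."""
    while True:
        row = inp.get()
        if row is _DONE:
            out.put(_DONE)  # pass the end-of-stream marker along
            return
        out.put(fn(row))


def sink(inp, results):
    """Operator with no outputs: collects rows into a list."""
    while True:
        row = inp.get()
        if row is _DONE:
            return
        results.append(row)


def run_pipeline(rows, fns):
    """Wire source -> fns[0] -> fns[1] -> ... -> sink and run every stage at once."""
    # One bounded queue per edge of the graph; the bound provides back-pressure.
    edges = [queue.Queue(maxsize=8) for _ in range(len(fns) + 1)]
    results = []
    stages = [threading.Thread(target=source, args=(rows, edges[0]))]
    for i, fn in enumerate(fns):
        stages.append(
            threading.Thread(target=transform, args=(fn, edges[i], edges[i + 1]))
        )
    stages.append(threading.Thread(target=sink, args=(edges[-1], results)))
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results


if __name__ == "__main__":
    # Double each value, then add one; both stages work on the stream concurrently.
    print(run_pipeline(range(5), [lambda x: x * 2, lambda x: x + 1]))
```

Because every stage runs in its own thread, a downstream operator can start on the first rows while upstream operators are still producing later ones. In CPython the global interpreter lock limits CPU parallelism for threads, so this sketch shows the structure of the approach, not its performance.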

TPC-H Performance Testing: Pervasive DataRush vs. Apache PIG
A highlight of Jim’s discussion was the TPC-H performance testing*, in which Pervasive DataRush outperformed comparable Apache PIG scripts. For a number of HUG participants the results were eye-opening. They may be for our blog readers, too:


[Chart: TPC-H query run times, Pervasive DataRush vs. Apache PIG]

If you’d like to learn more about the testing, please contact Pervasive Big Data & Analytics.

*Additional details on the Pervasive DataRush vs. Apache PIG testing:

  • Used TPC-H data
  • Generated 1TB data set in HDFS
  • Ran several “queries” coded in DataRush and PIG
  • Run times in seconds (smaller is better)

Cluster Configuration:

  • 5 worker nodes
  • 2 x Intel Xeon E5-2650 CPUs (8 cores each)
  • 64 GB RAM
  • 24 x 1 TB SATA drives, 7,200 rpm

Resources of Interest
YouTube Video:
Jim’s February 2013 Bay Area HUG presentation

Slideshare:
“A Visual Workbench for Big Data Analytics on Hadoop”

Jim’s article on Dataflow in Dr. Dobbs:
“Dataflow Programming: Handling Huge Data Loads Without Adding Complexity”

 
