Data Preparation Shouldn't Consume You

Have you ever thought, "There has to be a better way"?

If you've ever had to tackle an analytics problem from the beginning, you know you've got a lot of work to do before you can get down to the actual analytics. Accessing the various forms of data can be a challenge in itself, and that's just the beginning.

Between 60% and 80% of the time on a data analytics project is spent preparing the data for analysis.

You have to join and sort your sources, profile your data, find problems and duplicates, fill in missing values, reject or enhance incomplete or inaccurate data, calculate averages and other aggregates, adjust for weighting and scale issues, and perform a hundred other small transformations that turn raw data into something you can get real value from. All of that can take far longer than developing and testing the analytic models themselves.
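To make a few of those steps concrete, here is a minimal sketch in plain Python of three of them: dropping duplicate records, filling missing values with a column average, and computing a per-group average. It is a generic illustration only. The record layout and the `prepare` function are hypothetical, and this is not how RushAnalytics works internally; RushAnalytics is designed to do this kind of work without coding.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (claim_id, region, amount); None marks a missing amount.
RAW = [
    ("c1", "east", 100.0),
    ("c2", "west", None),
    ("c1", "east", 100.0),  # exact duplicate of the first record
    ("c3", "west", 300.0),
    ("c4", "east", 200.0),
]

def prepare(records):
    """Deduplicate, fill missing amounts with the overall mean, then average by region."""
    # 1. Drop exact duplicates while preserving first-seen order.
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)

    # 2. Fill missing amounts with the mean of the known amounts (simple mean imputation).
    known = [amt for _, _, amt in unique if amt is not None]
    fill = mean(known)
    filled = [(cid, region, amt if amt is not None else fill)
              for cid, region, amt in unique]

    # 3. Aggregate: average amount per region.
    by_region = defaultdict(list)
    for _, region, amt in filled:
        by_region[region].append(amt)
    averages = {region: mean(vals) for region, vals in by_region.items()}
    return filled, averages

filled, averages = prepare(RAW)
print(averages)  # {'east': 150.0, 'west': 250.0}
```

Mean imputation is only one option; depending on the data, rejecting the incomplete record or using a different statistic may be the better choice.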

What if a single platform could do all of that, without coding, in a fraction of the time?

What if you could do all of that on your desktop, and scale all the way up to a Hadoop cluster, if needed, just by changing a setting?

Sure, all that would be great, but that's impossible, right?

Not Impossible. Actian.

Actian RushAnalytics supports access to all standard databases, anything with a JDBC connector, flat files, and delimited files, as well as standard visualization and predictive model exchange files such as PMML and GEXF, so you can use your visualization tools of choice. Actian can access data as fast as the source system can possibly feed it in, even if you're talking millions of records per second, and it can read and write any combination of sources simultaneously. 

Actian also reads and writes Hadoop (HDFS and HBase) data. But we're not just talking about a connector to Hadoop, which everyone seems to have these days. We're talking about the ability to read HDFS and HBase where they live, on a cluster, in a distributed, high-speed, parallel fashion, so that reading, writing, and transforming massive amounts of Hadoop data can be done faster than you might think possible.

We have performance metrics for processing Hadoop data on a tiny, inexpensive cluster that would blow your mind. Give Actian software more powerful hardware to run on, and RushPrep accesses the data that much faster, with near-linear, automatic scaling.

Data access has never been more efficient.

Check out Actian RushAnalytics 

Or, download a free trial version of RushAnalytics now.

In our experience, most analytics processes follow the same pattern: all the data preparation is done in a separate system, using separate software and hardware, and the prepared data is loaded into a special location, such as an analytics database, before analytics can begin. If you're used to this pattern, you may have run into some common frustrations. Do any of these sound familiar?

  • When you need to add a dataset, select a different set of columns, or make any other change to the data to answer a new question, you have to wait for IT to make that change.
  • Miscommunication between the two teams leaves you with the wrong data, and then you have to wait again.
  • Your company has to maintain two sets of expensive hardware and software environments.
  • Initial data preparation setup takes weeks or months before you can even begin digging into the data.

What if you could do data preparation and analytics in one point-and-click software platform on, say, your desktop computer? What if you could change the selection of source data, the profiling thresholds, the type of join, the aggregation algorithm, the data type transformation, or any other aspect of data preparation with a few mouse clicks whenever you needed to? And do it yourself, instead of waiting for IT to get around to it? What if you could run quick tests, realize you didn't have quite the right data, tweak the data preparation, and run the test again, in minutes?

You can.

Check out Actian RushAnalytics 

Or, download a free trial version of RushAnalytics now.

Benefits of powering your analytics workflow with Actian RushPrep include:

  • Vastly reduce time spent preparing data.
  • Audit all incoming data, not just samples. Don't miss unusual problems that can crash the system.
  • Speed deployment with automatic scaling.
  • Reduce the cost and complexity of deployment by fully leveraging industry-standard multicore servers.
  • Improve energy efficiency and cut energy costs by making optimum use of available hardware.
  • Easily integrate with existing analytics tools such as R and SAS.
  • Feed data to high-performance analytics databases at similarly high speeds.

Download a free trial version of RushAnalytics now.

“Always the first step is discovering the data problems.”

Joseph A. di Paolantonio
VP and Principal Analyst
Constellation Research

Example: 

A claims-processing service took 26 days to prepare data on 250 million claims for analysis.

With DataRush, they did the same thing in less than one day, dramatically expanding their ability to detect fraud or claims mismanagement.

What could you accomplish if your data were ready for analytics 26 times faster?

Accelerating Big Data 2.0™