Predictive Analytics

Dispelling Confusion

Predictive analytics is the practice of analyzing current data to find patterns, anomalies, and outliers. That analysis is then used to predict future trends and to spot repeating patterns before they recur. That foreknowledge guides business decisions that improve revenue, reduce costs, prevent fraud, and improve customer satisfaction.

Predictive analytics can also help advance scientific knowledge, improve healthcare, reduce energy consumption in networks and data centers for a healthier planet, prevent cybersecurity breaches, and provide many other benefits, depending on how and where the methods are applied.

Historically, large data sets were processed on expensive specialized servers that many companies couldn’t afford and that were difficult to scale up as data volumes grew rapidly. Hadoop was developed to meet the challenge of big data processing in a way that is both scalable and affordable. Hadoop is an open source software framework that supports dividing data processing across multiple networked computers, also known as distributed processing. These groups of computers are called clusters, and generally consist of inexpensive industry-standard machines, not expensive high-performance supercomputers.

The basic concept behind Hadoop is that processing and data storage should be spread across the available computers. If one computer fails, no data or work is lost, because the data is stored redundantly on more than one system and the processing also happens in more than one location. This makes clusters very resilient. As data volumes grow, compute and storage capacity can be added inexpensively by simply adding more computers (nodes) to the cluster.
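The redundancy idea can be illustrated with a minimal Python sketch. It is not Hadoop code: the function names, the node names, and the simple round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware and more involved). The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (hypothetical round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


# Hypothetical four-node cluster holding eight data blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(8, nodes, replication=3)

# Losing one node leaves every block readable from another replica.
survivors = readable_blocks(placement, failed={"node-b"})
print(len(survivors) == len(placement))
```

Adding capacity in this model amounts to appending another entry to `nodes`; new blocks are then spread over the larger set.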

Read more about Hadoop and Actian

Actian DataRush is a patented application framework and data processing engine that was originally developed over a decade ago to take advantage of multi-core computers. DataRush detects the available cores, threads, CPUs, and other resources in any environment and adjusts the data processing workflow accordingly at runtime. Multi-threaded programming is usually complex and difficult. With DataRush, much of the complexity of multi-core programming is handled by the framework. All the developer has to do is decide what processing steps are needed, and the framework handles how they are executed. This makes it far easier for developers to use than frameworks such as MapReduce, vastly reducing initial development time for big data predictive analytics.
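The general pattern of "declare the steps, let the framework decide how to run them" can be sketched in a few lines of Python. This is not the DataRush API; `split` and `run_steps` are hypothetical names, and the sketch uses a standard-library thread pool only to show how the degree of parallelism can be chosen at runtime from the detected core count. A real engine would use native threads or processes for CPU-bound work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Split records into at most `parts` contiguous chunks of near-equal size."""
    parts = max(1, min(parts, len(records)))
    size, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def run_steps(steps, records, workers=None):
    """Apply each step to every record, in order, using one chunk per worker."""
    # Detect the available cores at runtime unless the caller overrides it.
    workers = workers or os.cpu_count() or 1

    def apply_steps(chunk):
        out = chunk
        for step in steps:
            out = [step(x) for x in out]
        return out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(apply_steps, split(records, workers)))
    # pool.map preserves chunk order, so the output order matches the input.
    return [x for chunk in results for x in chunk]


# The caller states only *what* to do; the worker count is chosen at runtime.
steps = [lambda x: x + 1, lambda x: x * 2]
print(run_steps(steps, list(range(10))))
```

The same call works unchanged on a 2-core laptop or a many-core server, because the chunking is derived from `os.cpu_count()` rather than hard-coded.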

This also means that applications developed on DataRush automatically use hardware to full capacity. When DataRush workflows are moved to more powerful hardware, they automatically scale up to use the new hardware to full capacity. The DataRush developer can write and test an application on a 2-core laptop, and it will run with a near-linear speed increase on a 384-core super server. Similarly, it will scale out to use every available core on a multi-node cluster, without any need for redesign.

While most data centers achieve 15% hardware usage at best, Actian DataRush clusters routinely experience 60-70% usage, with the capability to go as high as 90%. This provides tremendous processing speeds on modest, inexpensive hardware, and large potential savings in energy and carbon footprint.
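To see why utilization matters, here is a rough back-of-the-envelope calculation in Python. It assumes, as an idealization, that useful work scales linearly with utilized capacity; the workload size and per-machine capacity are made-up numbers, and `machines_needed` is a hypothetical helper.

```python
import math


def machines_needed(workload_units, capacity_per_machine, utilization):
    """Machines required when only `utilization` of each machine does useful work."""
    return math.ceil(workload_units / (capacity_per_machine * utilization))


# Hypothetical workload of 100 units on machines with 1 unit of capacity each.
at_15 = machines_needed(100, 1.0, 0.15)  # 667 machines at 15% usage
at_65 = machines_needed(100, 1.0, 0.65)  # 154 machines at 65% usage
print(at_15, at_65)
```

Under these assumptions, moving from 15% to roughly 65% usage cuts the machine count by more than a factor of four, which is where the hardware and energy savings would come from.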

Read more about the Actian DataRush™ Analytic Engine

See some examples of Actian DataRush Performance Metrics

Actian DataRush has been a boon to Actian integration software users, application developers and custom solution developers for years. But not everyone is a programmer. We wanted to put that data crunching capability into the hands of data analysts, so they could custom design their predictive analytics workflows.

When Actian decided to make the Actian DataRush highly parallel analytics engine accessible to non-programmers, there was an obvious need for a graphical user interface. The straightforward KNIME interface on the easily extensible Eclipse platform was just what was needed. 

Actian developed DataRush-based distributed versions of common predictive analytics algorithms in the form of KNIME operators. Machine learning, prediction, clustering, classification, and regression algorithms can now execute at much higher speeds, enabling analytics processes that would otherwise have been beyond the capacity of most affordable industry-standard hardware. Distributed data access, data quality, and other data preparation operators were also created using the Actian DataRush framework.

Actian RushAnalytics gives data analysts the ability to design, test, and iterate analytics workflows that can run up to 100 times faster than KNIME’s already fast base processing speed, depending on hardware and configuration, and to deploy those workflows on any hardware, including clusters, with a couple of clicks of a mouse.

Actian RushAnalytics puts the power to get answers into the hands of the people asking the questions.

Read more about Actian RushAnalytics
