1. Auto-detect available cores at runtime.
Actian DataRush is a programming framework and data processing engine that detects the available cores, threads, CPUs, and other compute resources in any environment and adjusts the data processing workflow accordingly at runtime.
That last bit is important. Because the analytics workload isn't divided up until execution time, the application can be built once and executed on any hardware. In each hardware configuration, DataRush detects the available hardware, divides the work accordingly, and then uses every bit of power available.
Add more or better hardware, get more speed. No re-programming necessary.
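To make the idea concrete, here is a minimal sketch of run-time core detection in plain Python. It is not DataRush code and does not use the DataRush API. The function names `split_evenly` and `parallel_sum_of_squares` are made up for illustration. The sketch uses a thread pool to stay short; for CPU-bound work in CPython, a process pool would normally be used instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(values, workers=None):
    """Sum the squares of `values`, dividing the work across the detected cores."""
    # Hypothetical helper: the core count is read when the function runs,
    # not when the program is written, so the same code adapts to any machine.
    workers = workers or os.cpu_count() or 1
    chunks = split_evenly(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: sum(v * v for v in chunk), chunks)
        return sum(partials)
```

Calling `parallel_sum_of_squares(list(range(10)))` gives the same answer on a 2-core laptop and a many-core server; only the number of chunks changes.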
2. Let the framework handle the complexity of multi-threaded programming so the programmer can focus on the program.
Much of the low-level, tedious complexity of multi-core programming is handled by the DataRush framework. The developer can focus on the processing steps needed to accomplish the business goal, while the framework handles how that work is divided. This makes DataRush far easier for developers to use than frameworks such as MapReduce, vastly reducing initial development time.
3. Detect available CPUs in every node in a cluster.
Actian DataRush detects the compute power available not just on individual machines, but across clusters of machines. Applications built on DataRush can detect at runtime the number of available cores, CPUs, threads, and so on for each machine in a cluster and divide the data analytics workload appropriately.
While most data centers achieve 15% hardware usage at best, Actian DataRush clusters routinely experience 60-70% usage, with the capability to go as high as 90%. DataRush resource usage is often deliberately capped at 70% to leave headroom for other applications. Since other applications rarely use more than 10-15% of the available hardware power, this resource sharing makes optimum use of any hardware. This provides game-changing processing speeds on modest, inexpensive hardware, as well as huge savings in energy costs and carbon footprint.
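As a rough illustration of dividing a workload across a cluster under a usage cap, the sketch below assigns partitions to nodes in proportion to their capped core counts. It is an assumption about how such a split could be computed, not a description of DataRush's scheduler. The names `usable_cores` and `allocate_partitions`, the node names, and the simple remainder rule are all hypothetical.

```python
def usable_cores(cores, cap_percent=70):
    """Cores a node contributes under the usage cap (always at least one)."""
    return max(1, cores * cap_percent // 100)


def allocate_partitions(node_cores, total_partitions, cap_percent=70):
    """Assign partitions to nodes in proportion to their capped core counts."""
    usable = {node: usable_cores(c, cap_percent) for node, c in node_cores.items()}
    total = sum(usable.values())
    # Start from each node's proportional share, rounded down.
    shares = {n: total_partitions * u // total for n, u in usable.items()}
    # Hand any remainder to the nodes with the most usable cores
    # (a simplification; other tie-breaking rules are possible).
    leftover = total_partitions - sum(shares.values())
    for node in sorted(usable, key=usable.get, reverse=True)[:leftover]:
        shares[node] += 1
    return shares
```

For a hypothetical cluster `{"a": 4, "b": 8, "c": 16}` with a 70% cap, the usable cores are 2, 5, and 11, so 36 partitions are split 4, 10, and 22.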
4. Use every type of parallelism possible to get the best processing speed.
Horizontal Parallelism - DataRush, like MapReduce, makes use of horizontal partitioning, also known as data parallelism, or embarrassingly parallel processing.
Vertical Parallelism - DataRush also uses vertical partitioning, also known as task parallelism. Most MapReduce implementations and most other analytics frameworks do not; some High Performance Computing models are the exception.
Pipeline Parallelism - DataRush also uses a data flow paradigm that takes advantage of pipeline parallelism, a technique normally seen only in High Performance Computing models.
Not every project can use every type of parallelism, but DataRush always uses a dataflow architecture, and automatically divides the workload as many ways as possible in the available environment.
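Pipeline parallelism is the least familiar of the three, so here is a minimal sketch of it using Python threads and queues. It is a generic illustration, not DataRush code. The names `stage` and `run_pipeline`, the queue size, and the sentinel are hypothetical. Each stage runs in its own thread, so one stage can start on the next record while the following stage is still working on the previous one.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item as it arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the shutdown signal downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Chain funcs into stages connected by bounded queues; return results in order."""
    queues = [queue.Queue(maxsize=8) for _ in range(len(funcs) + 1)]

    def feed():
        # The feeder runs in its own thread so a full queue never blocks the reader.
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

This sketch does not handle exceptions raised inside a stage. Running several copies of one stage over different slices of the data would add horizontal parallelism on top of the pipeline.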
Add it all together
This means that applications developed on DataRush can be built once and deployed anywhere. A DataRush developer can write and test an application on a 2-core laptop, and it will run at a near-linear speed increase on a 384-core super server or a 300-node cluster. This has given us the fastest, most efficient, and most economical analytics data processing engine on the planet. Fastest to develop on, fastest to deploy, and fastest to process.
See some Actian DataRush Performance Metrics