What's all the hoopla about Hadoop?
Hadoop is in the industry news a great deal these days. It's fascinating technology to the geeks of the world, but what does it do for a business? How does that technology help you solve business problems like targeted marketing, cybersecurity, medical claims fraud detection, network optimization, or smart grid energy usage optimization? How can it advance life sciences or pharmacology?
What is Hadoop exactly? Apache Hadoop is an open-source software framework that supports data-intensive distributed applications. But what does that really mean? How is Hadoop used? In what situations is Hadoop needed? How does it work?
And where does Actian fit in? What advantage is there to using Actian DataRush with Hadoop, as opposed to Hadoop alone? Is Actian a Hadoop replacement? Does it execute natively on Hadoop? What the heck does "execute natively on Hadoop" even mean?
That's a lot of questions. Let's see if we can shed a little light on the answers.
Apache Hadoop Matters Because ...
Historically, large data sets for business analytics and scientific advancement were processed on expensive specialized servers that many companies couldn't afford, and that were difficult to scale up as data volumes rapidly grew. Hadoop takes a different approach to processing massive amounts of data in small amounts of time. What makes it different?
It's Affordable - Hadoop is a software framework that divides data processing across multiple networked computers, also known as distributed processing. These groups of computers are called clusters, and they generally consist of inexpensive industry-standard machines, not expensive high-performance supercomputers or appliances. Hadoop itself is open source, minimizing software license fees.
It's Resilient - The basic concept behind Hadoop is that all processing and data storage should be spread evenly across the available computers in a cluster. If one computer fails, no harm is done: the data is stored redundantly on more than one system, and the processing also happens in more than one location. This makes Hadoop clusters very resilient to failure.
It's Scalable - As data volumes grow, compute and storage capacity can be added inexpensively by simply adding more standard servers (called nodes) to the cluster. This lets Hadoop clusters scale almost without limit, so that as businesses and their data processing needs grow, the processing power grows right along with them in small, affordable increments.
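The resilience and distribution described above can be sketched in a few lines of plain Python. This is a toy simulation of the pattern, not Hadoop itself: the block contents, node names, and round-robin placement are illustrative assumptions (real HDFS defaults to three copies of each block and places them with rack awareness), but it shows why a cluster can lose a whole node and still process every block.

```python
import itertools
from collections import Counter

REPLICATION = 2  # HDFS defaults to 3 copies; 2 is enough for this sketch

def distribute(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {node: [] for node in nodes}
    ring = itertools.cycle(nodes)
    for i, block in enumerate(blocks):
        # islice advances the shared cycle, spreading copies evenly
        for node in itertools.islice(ring, replication):
            placement[node].append((i, block))
    return placement

def word_count(placement, failed=()):
    """Count words across surviving nodes, processing each block once."""
    seen, counts = set(), Counter()
    for node, blocks in placement.items():
        if node in failed:
            continue  # this node is down; its blocks live elsewhere too
        for i, block in blocks:
            if i not in seen:  # skip replicas already counted
                seen.add(i)
                counts.update(block.split())
    return counts

blocks = ["big data big", "data processing", "big processing"]
placement = distribute(blocks, ["node1", "node2", "node3"])

# Even with node1 down, every block survives on another node,
# so the answer is unchanged.
assert word_count(placement, failed=["node1"]) == word_count(placement)
```

Replication is what makes commodity hardware viable here: instead of paying for one machine that never fails, the framework assumes machines will fail and keeps enough copies that no single failure matters.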
Affordable, scalable, resilient data processing power is what makes Hadoop so exciting. The uses that power can be put to are wide and varied, but business and scientific analysis of massive amounts of data is the obvious sweet spot.