Short answer: there really is no difference.
Data analysis has always involved mining data sets, finding patterns and anomalies, and gleaning the important trends. Analysts have always had to work around the limitations of their analysis technologies as data sets grew. Standard practices such as sampling and aggregation were developed precisely to deal with data sets too large to analyze in a reasonable time frame at a reasonable cost. (Sound familiar?)
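For anyone who hasn't bumped into these workarounds, here is a minimal sketch (in Python with pandas; the file name, column names, and sample fraction are hypothetical) of what sampling and aggregation typically look like: keep a small fraction of the rows, then report summary statistics instead of the raw data.

```python
# A minimal sketch of the classic workarounds: sample a large file and
# aggregate it, rather than analyzing every row. The file name, column
# names, and sample fraction are hypothetical.
import pandas as pd

SAMPLE_FRACTION = 0.01  # keep roughly 1% of the rows

# Read the file in chunks so it never has to fit in memory all at once.
chunks = pd.read_csv("transactions.csv", chunksize=1_000_000)
sampled = pd.concat(
    chunk.sample(frac=SAMPLE_FRACTION, random_state=42) for chunk in chunks
)

# Aggregate the sample: one summary row per region instead of millions of raw rows.
summary = sampled.groupby("region")["amount"].agg(["count", "mean", "sum"])
print(summary)
```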
The truth is that in many cases, standard analytics practices work fine with the data sets most businesses need analyzed.
So, what's changed? Why the big emphasis on big data analytics in the industry?
- New sources of valuable data are now available that can't be tackled with standard analytics technology
- Even the data sets that analysts are accustomed to working with are growing exponentially
The first point means that a lot of valuable data simply couldn't be mined until the technology caught up. The second means more and more compromises, and slower and slower answers to business questions in a cutthroat economy where seconds can make the difference.
Solution: New technologies (such as Hadoop, Pervasive DataRush and Pervasive RushAnalytics) have been developed to economically analyze data in massive volumes at extremely high speeds.
These technologies are essential for some of the new data sets, such as machine-generated data, that dwarf the old-school transactional data sets. And even with less extreme data, the same technologies can deliver a speed boost that radically cuts the time to get answers to essential questions. They can also make compromises like sampling and aggregation less necessary, and make useful analyses such as anomaly detection more viable.
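To make the anomaly-detection point concrete, here is a toy sketch (not any vendor's implementation): flag values that sit unusually far from the mean. The data and threshold are made up; the practical point is that this kind of check pays off most when you can afford to scan all the data rather than a sample.

```python
# A toy illustration of anomaly detection: flag values far from the mean.
# The readings and z-score threshold are hypothetical.
import statistics

def flag_anomalies(values, z_threshold=2.0):
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

readings = [10.1, 9.8, 10.3, 9.9, 55.0, 10.0, 10.2]  # 55.0 is the outlier
print(flag_anomalies(readings))                       # -> [55.0]
```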
Example of data that couldn't be analyzed before: NetFlow data streams are essential for network performance monitoring and problem diagnostics, but existing technology chokes on their volume and velocity. Pervasive captured, transformed, loaded into HBase, and analyzed more than 1 million NetFlow events per second on a sustained basis, using just a 3-node cluster. The company can now get precise usage measurements, catch network breaches or a downed machine, and spot cybersecurity attacks such as invalid requests, page redirects, and SQL injection.
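To give a flavor of what per-flow analysis can surface (this is a deliberately simplified stand-in, not Pervasive's pipeline), here is a sketch that tallies flow records per source address and flags sources whose volume is wildly out of line, the sort of signal that can point to a breach or an attack. The field names and threshold are hypothetical; a production system would run this over millions of events per second across a cluster.

```python
# A highly simplified stand-in for per-flow analysis: count flows per source
# address over a window and flag sources whose volume is far above the rest.
# Field names and the threshold multiple are hypothetical.
from collections import Counter

def flag_noisy_sources(flow_records, multiple_of_median=10):
    counts = Counter(rec["src_ip"] for rec in flow_records)
    ordered = sorted(counts.values())
    median = ordered[len(ordered) // 2]
    return {ip: n for ip, n in counts.items() if n > multiple_of_median * median}

flows = ([{"src_ip": "10.0.0.1"}] * 12 +
         [{"src_ip": "10.0.0.2"}] * 9 +
         [{"src_ip": "203.0.113.7"}] * 500)  # one source is wildly out of line
print(flag_noisy_sources(flows))              # -> {'203.0.113.7': 500}
```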
Example of accelerating an existing analysis: A global bank cut its risk management solution's processing time from more than 15 hours to about 20 minutes, while reducing the hardware required by 50%.