Big Data is More Than Hadoop
By Mark Dexter
28th January 2012

We recently published the results of our benchmark research on Big Data to complement the previously published benchmark research on Hadoop and Information Management. Ventana Research undertook this research to acquire real-world information about levels of maturity, trends and best practices in organizations’ use of large-scale data management systems now commonly called Big Data. The results are illuminating.

Volume, velocity and variety of data (the so-called three V’s) are often cited as characteristics of big data. Our research offers insight into each of these three categories. Regarding volume, over half the participating organizations process more than 10 terabytes of data, and 10% process more than 1 petabyte of data. In terms of velocity, 30% are producing more than 100 gigabytes of data per day. In terms of the variety of data, the most common types of big data are structured, containing information about customers and transactions.

However, about one-third (31%) of participants are working with large amounts of unstructured data. Nine out of 10 participants rate scalability and performance as the most important evaluation criteria, suggesting that, of the three V’s, the volume and velocity of big data are greater concerns than variety.

This research shows that big data is not a single thing with one uniform set of requirements. Hadoop, a well-publicized technology for dealing with big data, gets a lot of attention (including from me), but there are other technologies being used to store and analyze big data.

The research data shows an environment that is still evolving. The majority of organizations still use relational databases, but not exclusively: more than 90 percent of participants using relational databases also use at least one other technology for some of their big-data operations. One-third (34%) are using data warehouse appliances, which typically combine relational database technology with massively parallel processing. About as many (33%) are using in-memory databases. Each of these alternatives is more widely used than Hadoop. In addition, 15% use specialized databases such as columnar technologies, and one-quarter (26%) are using other technologies.

While these technologies enable organizations to do things they haven’t done before, there is no technological silver bullet that will solve all big-data challenges. Organizations struggle with people and process issues as well. In fact, our research shows that the most troublesome issues are not technical but people-related: staffing and training. Big data itself and these new approaches to processing it require additional resources and specialized skills. Hence we see high levels of interest in big-data industry events such as Hadoop World and the Strata Conference. Recognizing the dearth of trained resources here, some academic institutions have launched degree programs in analyzing big data, and IBM has started BigData University.

Research participants cited real-time capabilities and integration as their key technical challenges. The velocity with which they generate data and the fact that over half the organizations analyze their data more than once a day are forcing them to seek real-time capabilities; the pace of business today demands that they extract as soon as possible all useful information to support rapid decision-making.

With respect to integration, fewer than half of participants are satisfied with integration of third-party products, and almost two-thirds cite lack of integration as an obstacle to analyzing big data. Three-quarters have integrated query and reporting with their big-data systems, but more advanced analytics such as data mining, visualization and what-if analysis are seldom available as integrated capabilities. Responding to such comments, vendors have been racing to integrate their business intelligence and information management products with big-data sources. As you consider big-data projects and technologies, make sure that the vendors you select can handle the big-data sources you must use.

Looking ahead, we expect more changes in this evolving landscape. In some ways big-data challenges, and the presence of Hadoop in particular, have paved the way for other technologies besides relational databases. NoSQL alternatives such as Cassandra, MongoDB and Couchbase are gaining notice in enterprise IT organizations after the success of Hadoop. In-memory databases, once regarded as a niche technology, are being considered by SAP, in HANA, as its primary big-data analytical platform. There are differing opinions about whether these various big-data technologies will converge or diverge. We can look to the past for some indications of where the market might go. Over the years a variety of alternatives to relational databases have emerged, including OLAP, data warehouse appliances and columnar databases; each eventually was absorbed into relational databases.

We also see signs of the major relational vendors embracing big-data technologies. IBM acquired Netezza for its massively parallel data warehouse appliance technology and has also invested heavily in Hadoop. Oracle introduced its own line of data warehouse appliances and recently brought a big-data appliance to market that includes Hadoop and NoSQL technologies. Microsoft has invested in massively parallel processing and Hadoop. We also see independent vendors such as Hadapt combining relational database technology with Hadoop. The past is not necessarily an indication of the future, but our research and recent market dynamics both suggest it may be premature to write off the relational database vendors as out of touch.

In light of this information, I recommend that your organization explore various alternatives for solving specific challenges. At a minimum you should be aware of the alternatives so when the need arises you will know what is available. Use our big-data research to guide your use of these technologies and to help avoid some of the obstacles they present so you can be more successful in applying big data to business decisions.

Read at source: David Menninger, Ventana Research, Information Management

