Devising a ‘Big’ Data Strategy

We are entering an age in which data is becoming an extension of our daily lives. As online and offline behaviour blend into one, we are witnessing unprecedented creation of, and demand for, information. This explosion of data, fueled by new sources such as mobile devices, collaboration tools and social networks, is creating a tremendous opportunity for businesses to achieve a new level of insight. To capitalize on this opportunity, companies need a sound strategy for harnessing all of this information in an efficient and useful way.

Exploring the Causes of Data Explosion

Many sources are contributing to this remarkable data growth. In addition to the vast amounts of data coming from mobile, Internet and traditional sources, millions of users are evolving from information consumers into producers, creating their own data at any given moment. Companies therefore need a way to blend the data coming from all these sources and provide intelligence around it. To aggregate and make sense of it all, a company’s data strategy needs to account for significantly larger and constantly growing volumes of information, so that it avoids the data performance problems that can greatly impact business operations.

For example, a data performance bottleneck that prevents IT from meeting nightly load windows for a mission-critical application means the next day’s operational data is out of date, introducing risk and error into business decisions and potentially adding unexpected costs to the business. For a major health care management organization, failure to deliver hundreds of thousands of reports containing daily claims to insurance companies can result in critical errors in administering and delivering health benefits to individuals. Similarly, a leading online retailer must analyze hundreds of millions of records to derive critical information about customer preferences, online behavior and the latest trends. Failure to do so can result in severe revenue losses and customer attrition.

Common Approaches to Cope with Data Explosion

When data performance problems arise as data volumes grow, companies often turn to traditional methods: adding more hardware, pushing data transformations elsewhere (such as down into the database), or custom coding. Though these methods are common, they are typically not the best way to tackle big data and can actually hinder an organization’s ability to quickly adapt and respond to changing business demands.

For example, adding hardware may shorten the elapsed time for data processing tasks, but it is costly at every stage, from initial implementation through ongoing maintenance. Moreover, hardware alone can no longer keep up with the data growth rate many organizations are experiencing today. Pushing all heavy transformations out of ETL platforms and into the database creates other problems for the organization, such as the inability to maintain data lineage and reduced agility. In many cases, companies find that they cannot deliver reports on time or cope with new requests for information. Custom code can quickly become riddled with problems given its complex upkeep, ongoing labor costs and manageability issues.

The Recommended Approach: Bringing the Focus Back to Data Integration

Because it is often no longer economically or physically practical for data warehouses to manage big data within today’s commercial databases, new technology frameworks like Hadoop are emerging to store and process these unprecedented volumes of data. When devising a big data strategy, companies therefore need to account for not only enterprise data, but also new sources of data, and then determine the best way to integrate the two for timely, accurate access to information as a basis for making business decisions.

Whether leveraging a data warehouse or a Hadoop environment to manage big data, an important first step is to look at the effectiveness of the data integration function. In other words, is it successfully transforming data into value? If you find that improvements could be made, determine where the data performance bottlenecks are likely to occur. Most likely, you will find that delays occur during the complex sorts, joins and aggregations of this large volume of data. The best approach is to identify and target the top 20 percent of these processes in terms of elapsed time and complexity. These are typically the same jobs that cause roughly 80 percent of the problems, so addressing them first can result in relatively quick and easy gains with huge benefits to the organization. To manage big data for the long term as data volumes continue to grow, you will want to eliminate the need for tuning and look to create a fast, efficient, simple and cost-effective data integration environment.
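
As a rough illustration of the "top 20 percent" step, the short Python sketch below ranks data integration jobs by elapsed time and selects the slowest fifth. It is only a sketch under stated assumptions: the JobRun record, the job names and the timings are all hypothetical, and it ranks on elapsed time alone because complexity is harder to reduce to a single number.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One data integration job (hypothetical record, not from any real tool)."""
    name: str
    elapsed_minutes: float  # wall-clock time of the most recent run


def top_jobs_by_elapsed(jobs, percent=20):
    """Return the slowest `percent` of jobs, longest-running first."""
    if not jobs:
        return []
    # Round up with integer arithmetic so at least one job is always selected.
    count = max(1, (len(jobs) * percent + 99) // 100)
    ranked = sorted(jobs, key=lambda job: job.elapsed_minutes, reverse=True)
    return ranked[:count]


def share_of_total(selected, jobs):
    """Fraction of the total elapsed time consumed by the selected jobs."""
    total = sum(job.elapsed_minutes for job in jobs)
    if total == 0:
        return 0.0
    return sum(job.elapsed_minutes for job in selected) / total


# Made-up timings, purely for illustration.
jobs = [
    JobRun("nightly_claims_sort", 180.0),
    JobRun("customer_join", 95.0),
    JobRun("daily_aggregate", 40.0),
    JobRun("dimension_refresh", 12.0),
    JobRun("lookup_load", 8.0),
]

targets = top_jobs_by_elapsed(jobs)
for job in targets:
    print(job.name, job.elapsed_minutes)
print(f"share of total elapsed time: {share_of_total(targets, jobs):.0%}")
```

In practice the timings would come from your scheduler or ETL tool's run history, and the share figure gives a quick check on how closely your own workload follows the 80/20 pattern.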

Making Big Data Work for You

In today’s 24×7 business world with demand for timely and relevant information, devising a forward-looking big data strategy is critical to ensure your organization can effectively leverage data from all available sources and quickly turn it into a competitive advantage. Reducing total cost of ownership through a data integration approach that is lean and scalable will allow big data to reach its full potential for your organization.

By planning for the future and keeping an eye on strategy, companies can see not only performance increases but also major business successes. With a sound strategy in place, big data can provide the key to unlocking an organization’s next big opportunity.

Source: Information Management

Mark Dexter

July 28th, 2011
