By Mark Dexter
17th January 2012

I had the opportunity to participate in two pretty good conferences in 2011. The first was the data science-oriented Strata by O'Reilly in February, the other the more traditional BI/Netezza-focused Enzee Universe in June.

One striking difference between the two was the median age of participants: it seems baby-faced “Stratans” might well have been the progeny of graybeard “Enzeens.” I believe that age divide between the nascent DS and the now-mature BI is most telling.

In the earlier posts of this series, I've proposed that both data science and BI share underpinnings of business, technology and statistical science. According to data scientist Drew Conway: “First, one must have hacking skills …(which) in this context mean proficiency working with large, unstructured chunks of electronic data … Second, one needs a basic understanding of mathematics and statistics … Finally, and perhaps most importantly, a data scientist must have some substantive expertise in the data being analyzed.”

Data science and BI differ in the foci of their investigations. DS is consumed with supporting the development of data products. As Monica Rogati of LinkedIn notes, “On one side, I’ve been working on building products … The other side is finding interesting stories in the data.” BI, on the other hand, is all about measuring and managing business performance. At their best, though, both disciplines have an evidence-based “science of business” foundation that makes me reject the contention by some that data science has a higher calling and is more scientifically sophisticated than BI.

DS and BI relate differently to the critical data that feeds them. Over time, BI has become obsessed with absolute answers using complete, precise, high-quality information, while data science often bludgeons solutions, settling for approximate responses from incomplete but massive data sets.

The maturity divide between data science and BI carries with it a number of cultural differences. At present, BI is probably more methodical and bureaucratic than DS, and impatient-with-IT DS'ers would argue that DS is the better for it. I suspect with maturity comes a governance “advantage” for BI as well. DS seems unencumbered with these “shackles,” but will probably start to look more like BI organizationally in time. Indeed, I believe that BI's methodical, governed approach will positively impact DS, just as DS's get-it-done intolerance of sloth and bureaucracy will rattle BI for the better.

With maturity comes an age divide that shows in platform software choices. Young DS'ers arrive at commerce from academia armed with the open source tools they learned in school: Perl/Python/Ruby for data integration, MySQL and Postgres for database management, R for analytics and graphics and, increasingly, cloud computing and the Hadoop ecosystem for big data handling.

BI'ers, in contrast, are more likely to have settled in over the years on proprietary offerings from big technology vendors for their BI tasks – e.g. Informatica or DataStage for integration, Oracle or IBM-Netezza for database management, BusinessObjects or Cognos for query and reporting, and SAS or SPSS for analytics.

With maturity also comes a work group size difference that promotes a wider division of labor in BI than in DS. In large BI shops now there are business analysts, data analysts, DBAs, infrastructure specialists, developers, user experience experts, analytics experts, statisticians, et al. While the more sophisticated DS shops are rapidly growing and diversifying, many are still relatively small with jack-of-all-trades contributors.

My take is that over time DS and BI will start to look more alike as areas of intersection between the disciplines grow. Indeed, OpenBI's seeing that now with several of our current big data customers, where the database group and the Hadoop guys are already starting to align. We expect the BI/ETL-OLAP teams and the stats geeks to start joining forces soon as well. For these customers, the organizational distinctions between DS and BI may soon vanish. New data products and business performance evaluation will both be driven from a common analytics infrastructure. After all, is marketing attribution a data product or BI performance evaluation?

Current BI and DS vendors will play a key role in expediting this confluence as new versions of their platforms combine BI and big data capabilities. Anyone who's coded MapReduce in Java understands the productivity and maintenance benefits of using a higher-order language to program big data jobs in Hadoop. Hive and Pig are already being used in that capacity. Now, BI ETL software such as Pentaho Data Integration (PDI) is making MapReduce programming even more accessible to developers. Pentaho Business Analytics with Hadoop will promote “Integration of big data tasks into the overall IT/ETL/BI solutions.”

Commercial R purveyor Revolution Analytics has jumped into the big analytics fray head first with enhanced support for large data and distributed computing, as well as integrations to both Netezza and Hadoop. They recently announced a partnership with Apache Hadoop provider Cloudera to introduce “RevoConnectR for Apache Hadoop, a collection of open-source packages that allows R programmers to access Hadoop HDFS and HBASE data stores in Apache Hadoop directly from R and write MapReduce jobs with R.”

And let's not overlook cagey Oracle, who just announced Oracle R Enterprise, its own integration of RDBMS and R statistics. The software will allow “analysts and statisticians to run existing R applications and use the R client directly against data stored in Oracle Database 11g – vastly increasing scalability, performance and security.”

Look for 2012 to be the year that BI and DS start to get on the same page. That's great news for both business intelligence and data science.

Read at source: Information Management

