Or is it ‘lies, damned lies and statistics’, depending on your stance?
I’m sure most of you in the Information Management market will have read the book Freakonomics by Steven Levitt and Stephen J. Dubner. After all, it’s of general interest to anyone who works with or is interested in data, and it has been around for a while.
But just in case you haven’t read it, here’s a brief synopsis. Published in 2005 and having sold more than 5.5 million copies, this non-fiction book examines a number of everyday beliefs and applies the authors’ brand of economics to debunking the popular thinking in areas such as cheating by teachers in exams, sumo wrestling and crime figures. The underlying tenet of all of the findings is that people (and, for that matter, all intelligent animals) are motivated by incentives or, put another way, rewards for their actions. The authors come up with some pretty radical reappraisals of commonly held beliefs. They use data mining techniques to uncover unusual patterns and analyse this data with a view to answering the questions that intrigue them.
What struck me when reading this in the age of Big Data and Data Science is how much fun a group of data scientists could have coming up with their own interpretations. It’s easy to read and understand a radical new explanation for why crime figures in the US fell in the mid-’90s: the legalisation of abortion in the ’70s, natch. I thought this was an amazing conclusion but was fully convinced by the arguments put forward.
But there’s a whole host of other data to analyse here (prison population, the economy, etc.). Would today’s data science techniques and technologies find another pattern? Or take another question: why can sumo wrestlers with a poorer record suddenly pull out all the stops and win their final qualifying bout against a better opponent? Match fixing, of course.
But is this the case? Did the authors’ noughties data mining and analysis techniques really get to the (ample) bottom of it? Or could their conclusion be unpicked by a re-analysis of the available data? Perhaps it was just good old effort on the part of the underdog who really needed to win to progress? But how do you quantify this from data? Data science, anyone?
Just ask the people involved in the ‘is global warming caused by man’ debate: everyone seems to have a different answer based on whatever data they are looking at, although in this case politics often plays a huge part in which side of the fence you come down on.
So I come back to my question about what today’s data scientists might uncover from the data behind Freakonomics’ answers. Would today’s technology, along with modern data science thinking, change the results?
What do you think? Have you read the book and/or its follow-ups? Did you wonder how you might have come up with different answers? If you work in the data science field, have you tried to take a new look at the results?