Are You Collecting the Right Data?
For companies seeking to answer questions, solve problems, and make decisions via data, there has never been a better time to do so. The technology to collect and store data is declining in cost, and the tools to analyze that data are even cheaper (or free). So what is it that still causes analytics initiatives to fail?
The most difficult part of any analytics project is not the technology, the mathematics, or the business objective, but marrying the technology and mathematics to the business objective. Often, the difficulty comes from not having the right data for the specific business objective at hand. This isn't a new problem, and solving it requires more critical thinking than technical skill.
During World War II, the United States Navy was concerned about the durability of the bomber planes that were flying over Germany. They decided to conduct a study on the planes that had been hit and returned to base, and identify where the bombers were experiencing the most damage (right wing, left wing, tail, etc.). The idea was that they would add armor to the areas that were most often damaged.
A statistician named Abraham Wald stepped in to assist the Navy with their study, and found a critical flaw in their methodology: they had no data on the planes that were shot down and crashed! Instead, they only had data on the planes (and pilots) that made it back to base. It did not make sense to outfit the bombers with additional armor based only upon information collected on those that survived the gunfire. The damage on the returning planes showed where a bomber could be hit and still fly home; the hits that brought planes down were exactly the ones missing from the data.
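To make the flaw concrete, here is a minimal simulation sketch in Python. Every number in it is made up for illustration (the section names, the loss probabilities, the fleet size); none of it is historical data. It assumes each plane takes one hit in a random section and that engine hits are far more likely to bring a plane down, then compares where the hits appear across all planes with where they appear among survivors only.

```python
import random
from collections import Counter

# Hypothetical values chosen only for illustration; not historical data.
SECTIONS = ["wings", "fuselage", "tail", "engine"]
LOSS_PROBABILITY = {"wings": 0.1, "fuselage": 0.1, "tail": 0.2, "engine": 0.8}


def simulate(n_planes, seed=0):
    """Give each plane one hit in a random section and record who survives."""
    rng = random.Random(seed)
    all_hits = Counter()
    survivor_hits = Counter()
    for _ in range(n_planes):
        section = rng.choice(SECTIONS)
        all_hits[section] += 1
        # The plane returns to base only if the hit does not bring it down.
        if rng.random() >= LOSS_PROBABILITY[section]:
            survivor_hits[section] += 1
    return all_hits, survivor_hits


def shares(counts):
    """Convert raw hit counts into each section's share of the total."""
    total = sum(counts.values())
    return {section: counts[section] / total for section in SECTIONS}


all_hits, survivor_hits = simulate(10_000)
all_shares = shares(all_hits)
survivor_shares = shares(survivor_hits)

for section in SECTIONS:
    print(
        f"{section:9s} all planes: {all_shares[section]:.2f}  "
        f"survivors only: {survivor_shares[section]:.2f}"
    )
```

Under these assumptions, engine hits look rare in the survivors-only column even though they are as common as any other hit across the whole fleet. An analyst who sees only the survivors would conclude the engine needs the least armor, when it is the section that most needs it.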
In statistics, this is a form of sampling bias (often called survivorship bias), resulting from a lack of data that correctly addresses the question at hand. This phenomenon shows up often at the intersection of data and business today. "Fraud detection" is a hot topic in the world of machine learning, though most companies (with the exception of perhaps large banks and credit card vendors) simply do not have enough observations of fraud in their databases to predict instances of fraud accurately. Worse, they may not even be collecting data on instances of fraud.
This is not to say that a company trying to detect fraud (or predict anything, for that matter) cannot solve a similar problem that matches the data it does have. The key is to have the expertise and wherewithal required to shape the data to match the business objective, as well as to begin to collect data relevant to the problems you would like to solve.