Data Correlation, Data Causality or Data Explanation? Part 1

If we take a helicopter view of data management, we see many different methods for reading data and determining its worth. The trustworthiness of the data is the most important factor, and there are many ways to assess it. What about data correlation, data causality or data explanation? Which method fits best, and when?

The method we try to clarify in this article is data correlation.

Definition of Data Correlation

Correlation is any of a broad class of statistical relationships involving dependence, though in common usage it most often refers to the extent to which two variables have a linear relationship with each other. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. (by Wikipedia)
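To make the idea of a linear relationship between two variables concrete, here is a minimal sketch in Python that computes the Pearson correlation coefficient, one common measure of such a relationship. The function name `pearson` and the height figures are made up purely for illustration, loosely following the parent and offspring example in the definition; they are not real measurements.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each variable.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return co_deviation / (spread_x * spread_y)


# Hypothetical, invented numbers (heights in cm), for illustration only.
parent_height_cm = [160, 165, 170, 175, 180, 185]
child_height_cm = [162, 168, 169, 178, 179, 188]

r = pearson(parent_height_cm, child_height_cm)
print(f"Pearson r = {r:.2f}")
```

The result always lies between -1 and +1. A value near +1 means the two variables tend to rise together, a value near -1 means one tends to fall as the other rises, and a value near 0 means there is little linear relationship. The number says nothing about why the two variables move together.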

In practice

I am not a mathematician or a data scientist, but I find this subject very interesting, especially when we look at how these different approaches are used in different markets with different goals. Big Data consuming organizations such as Google or Amazon use the data correlation method. They use correlation to predict that a certain event will happen.

A common and well-known example is Google's prediction of influenza in the United States. Based on the search queries people entered before they actually caught the flu, Google was able to predict where the flu would break out. For example, if a lot of people in Indiana typed in search queries such as “flu”, “catching a cold”, “medicine against flu”, “grandmother's recipe against flu” and other related searches, Google captured all of this influenza-related data, and the sheer volume gave it insight into possible outbreaks. It was able to identify these outbreaks even before the government did.

In this case data correlation is about the “what” question and not the “why”.