
Spurious Correlations – What they are and Why they Matter



In an earlier post, I mentioned that one of the big benefits of geospatial technology is its ability to show connections between complex and often disparate data sets. As you work with Big Data, you tend to see the value of these multi-layered and often multi-dimensional perspectives of a trend or event. While that can lead to incredible results, it can also lead to spurious data correlations.

First, let me state that I am not a data scientist or statistician, and there are definitely people with far more expertise on this topic than I have. But if you are like the majority of companies out there experimenting with geospatial and big data, it is likely that your company doesn’t have these experts on staff. So a little awareness, understanding, and caution can go a long way in this scenario.

Before we dig into that more, let’s think about what your goal is:

  • Do you want to be able to identify and understand a particular trend – reinforcing actions and/or behavior? –OR–
  • Do you want to understand what triggers a specific event – initiating a specific behavior?

Both are important, but they are different goals. My focus has been on identifying trends so that you can leverage or exploit them for commercial gain. While that may sound a bit ominous, it is really what business is all about.

A popular saying goes, “Correlation does not imply causation.” A common example is that you may see many fire trucks at a large fire. There is a correlation, but it does not imply that fire trucks cause fires. Now, extending this analogy, let’s assume that the probability of a fire starting in a multi-tenant building in a major city is relatively high. Since it is a big city, it is likely that most of those apartments or condos have WiFi hotspots. The data would then show fires and WiFi hotspots occurring together, but concluding that WiFi hotspots cause fires would be reading causation into a spurious correlation.

As you can see, there is definitely the potential to misunderstand the results of correlated data. A more careful analysis would lead you to the underlying relationship: the type of building (multi-tenant residential housing) is associated both with fire risk and with technology (WiFi) or income (middle-class or higher), so it is the building type, not the WiFi, that links the two. Taking the next step to understand the findings, rather than accepting them at face value, is very important.
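
To make the WiFi example concrete, here is a minimal Python sketch. Every probability in it is a made-up number chosen for illustration, not a real fire or WiFi statistic, and the function names (`pearson`, `simulate`, `wifi_fire_correlation`) are hypothetical. In this simulation the building type influences both WiFi and fire risk, while WiFi has no effect on fires at all. The sketch then compares the WiFi/fire correlation across all buildings with the same correlation inside each building type.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=50_000, seed=42):
    """Generate (multi_tenant, wifi, fire) rows from assumed probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption: half of all buildings are multi-tenant.
        multi_tenant = rng.random() < 0.5
        # Assumption: WiFi is far more common in multi-tenant buildings.
        wifi = rng.random() < (0.9 if multi_tenant else 0.3)
        # Assumption: fire risk depends only on building type, never on WiFi.
        fire = rng.random() < (0.2 if multi_tenant else 0.05)
        rows.append((multi_tenant, int(wifi), int(fire)))
    return rows


def wifi_fire_correlation(rows):
    """Correlation between the WiFi flag and the fire flag for the given rows."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])


rows = simulate()
overall = wifi_fire_correlation(rows)
multi_only = wifi_fire_correlation([r for r in rows if r[0]])
other_only = wifi_fire_correlation([r for r in rows if not r[0]])

print(f"All buildings:     {overall:.3f}")
print(f"Multi-tenant only: {multi_only:.3f}")
print(f"Other buildings:   {other_only:.3f}")
```

Under these assumptions, the correlation across all buildings should come out clearly positive, while the correlation inside each building type should sit close to zero. Splitting the data by a suspected common factor like this is one simple way to check whether a correlation holds up or is just a side effect of something else.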

Once you have what looks to be an interesting correlation, there are many fun and interesting things you can do to validate, refine, or refute your hypothesis. It is likely that even without high-caliber data experts and specialists, you will be able to identify correlations and trends that can provide you and your company with a competitive advantage. Don’t let the potential complexity become an excuse for not getting started. As you can see, it is possible to gain insight and create value with a little effort and simple analysis.