Correlation between weather and website performance

This article demonstrates one of the many great features of Dataddo: fusing together data from different sources to discover valuable insights for your business. Writing it in the middle of an extraordinarily long heat wave prompted me to use the long-observed correlation between the performance of certain websites and the weather as an example.

Methodology

Correlation is “a statistical technique that can show whether and how strongly pairs of variables are related”, so it is important to choose suitable datasets to represent both variables. In the following example, I have chosen daily visits reported in Google Analytics in June 2015 to represent website performance and daily temperatures in June 2015 to represent the weather. Of course, you can choose other datasets that fit your case better, such as daily transactions, article views or content interactions for website performance, and daily rainfall or humidity for the weather.
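For reference, the coefficient reported later in this article is the Pearson correlation coefficient. Its standard definition is given below; the symbols are my own notation, assuming x_i is the average temperature and y_i the number of sessions on day i, with n days in the sample.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

The value always lies between -1 and 1: values near -1 mean that one variable tends to fall as the other rises, values near 1 mean they rise together, and values near 0 mean there is no linear relationship.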

Obtaining the data

Since Dataddo features many different data connectors, obtaining the data is rather simple. The website performance dataset (daily visits / sessions from Google Analytics) is retrieved using the Dataddo Google Analytics connector, with the dimension set to “Date” and the metric set to “Sessions”. Daily temperatures (Prague, Czech Republic) are obtained from the Czech Hydrometeorological Institute as a CSV file and imported into Dataddo using the CSV connector.

[Screenshots: Google Analytics connector (daily visits); CSV connector]
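To make the shape of the temperature dataset concrete, here is a minimal Python sketch of reading such a CSV file outside Dataddo. It is not how the CSV connector works internally. The column names follow the ones used in this article (“Date”, “AVG temperature”), but the comma delimiter, the ISO date format, the sample values and the helper name load_daily_temperatures are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample rows, NOT the real measurements; the actual export's
# delimiter, date format and headers may differ.
SAMPLE_CSV = """Date,AVG temperature
2015-06-01,17.4
2015-06-02,19.1
2015-06-03,21.8
"""


def load_daily_temperatures(text):
    """Return {date string: average temperature} parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Date"]: float(row["AVG temperature"]) for row in reader}


temperatures = load_daily_temperatures(SAMPLE_CSV)
print(temperatures)
```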

Merging the data

Dataddo allows you to define a structure, which is a collection of one or more data sources. Within each structure, you can define 1:1 or 1:n relations between the sources and thus fuse the data together. In the following example, the date (“ga:date” in the Google Analytics source and “Date” in the CSV source) is used as the “bonding key”.

[Screenshot: structure designer]
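Conceptually, a 1:1 relation on the date key behaves like an inner join. The sketch below illustrates that idea in plain Python; it is not Dataddo's implementation. It assumes that “ga:date” arrives as YYYYMMDD and that the CSV “Date” column uses ISO format, and both the sample rows and the function name merge_on_date are made up for the example.

```python
from datetime import datetime

# Illustrative values only, not the article's data.
ga_rows = [
    {"ga:date": "20150601", "ga:session": 1520},
    {"ga:date": "20150602", "ga:session": 1480},
    {"ga:date": "20150603", "ga:session": 1310},
]
weather_rows = [
    {"Date": "2015-06-01", "AVG temperature": 17.4},
    {"Date": "2015-06-02", "AVG temperature": 19.1},
    {"Date": "2015-06-04", "AVG temperature": 23.0},
]


def merge_on_date(ga_rows, weather_rows):
    """Inner-join both sources on the date, used as the bonding key."""
    weather_by_date = {row["Date"]: row for row in weather_rows}
    merged = []
    for row in ga_rows:
        # Normalise the assumed YYYYMMDD format to ISO so both keys compare equal.
        key = datetime.strptime(row["ga:date"], "%Y%m%d").date().isoformat()
        match = weather_by_date.get(key)
        if match is not None:  # days missing from either source are dropped
            merged.append({
                "date": key,
                "ga:session": row["ga:session"],
                "AVG temperature": match["AVG temperature"],
            })
    return merged


merged = merge_on_date(ga_rows, weather_rows)
for record in merged:
    print(record)
```

Only days present in both sources survive the join, which is what the correlation step needs: each data point must pair a session count with a temperature.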

Calculating correlation

Finally, the merged data in the structure can be explored using the Data explorer. Within the Data explorer interface, many statistical computations, including correlation, can also be conducted. The calculated value of the Pearson correlation coefficient for “ga:session” and “AVG temperature” is -0.68, indicating a moderate negative linear correlation between the two variables: on warmer days, the website tended to receive fewer sessions. This suggests that the weather (temperature, more precisely) has a certain impact on the performance of the examined website, although correlation alone does not prove causation.

[Screenshot: Data explorer]
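For readers who want to check such a figure by hand, the following is a minimal Python sketch of the Pearson coefficient using the textbook formula. It is not the Data explorer's own code. The function name pearson_r and the five sample days are hypothetical, so the printed value will not match the -0.68 reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: average temperature and sessions for five days.
temps = [17.4, 19.1, 21.8, 24.5, 26.0]
sessions = [1520, 1480, 1310, 1350, 1190]

r = pearson_r(temps, sessions)
print(round(r, 2))
```

A negative result means that sessions tend to fall as the temperature rises, which is the pattern described in this article.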