Time series analysis is a branch of statistics that studies how a variable changes over time, including its trend. It works on time series data, that is, observations recorded over a period of time. In short, it looks for the relationship between your dependent variable and time.
Facebook Prophet is an open-source time series forecasting algorithm developed by Facebook. It builds a model by fitting a smooth curve of the form:
y(t) = g(t) + s(t) + h(t) + ϵ
where:
g(t) = overall growth trend
s(t) = seasonality (yearly, weekly)
h(t) = holiday effect
ϵ = error term, the part of y(t) the model does not explain
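To make the additive form concrete, here is a minimal sketch that builds a synthetic daily series from the four terms. This is only an illustration of the equation, not Prophet's fitting procedure, and every number in it (slope, amplitudes, the holiday position, the noise level) is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(730)  # two years of daily time steps (hypothetical)

# g(t): overall growth trend, here a straight line
g = 10.0 + 0.05 * t

# s(t): yearly plus weekly seasonality, here two sine waves
s = 2.0 * np.sin(2 * np.pi * t / 365.25) + 0.5 * np.sin(2 * np.pi * t / 7)

# h(t): holiday effect, here a bump on one made-up holiday per year
h = np.where(t % 365 == 358, 3.0, 0.0)

# error term: random noise the model does not explain
eps = rng.normal(0.0, 0.3, size=t.size)

# The observed series is simply the sum of the components
y = g + s + h + eps
```

Prophet works in the opposite direction: given only y, it estimates g, s and h.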
In this demonstration, I analyzed the data with Facebook Prophet in two ways: a univariate analysis and a multivariate analysis.
Univariate analysis deals with a single variable, while multivariate analysis deals with multiple variables.
Data Preprocessing:
After I imported the libraries and loaded the dataset, I inspected it: the first 10 rows, summary statistics, the column information and a check for null values. I parsed the date-time column so that it is stored as datetime values rather than plain strings.
I then filled the null values using the forward-fill method.
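The inspection, date parsing and forward fill can be sketched with pandas as below. The three-row CSV and the column name `Global_active_power` are assumptions made for the example. Only the `dt` column name comes from this write-up.

```python
import io
import pandas as pd

# A tiny stand-in for the real file (hypothetical values, one missing reading)
raw = io.StringIO(
    "dt,Global_active_power\n"
    "2006-12-16 17:24:00,4.216\n"
    "2006-12-16 17:25:00,\n"
    "2006-12-16 17:26:00,5.374\n"
)

# parse_dates turns the text timestamps into datetime values
df = pd.read_csv(raw, parse_dates=["dt"], index_col="dt")

print(df.head(10))        # first 10 rows
print(df.describe())      # summary statistics
df.info()                 # column types and non-null counts
print(df.isnull().sum())  # null values per column

# Forward fill: each missing value takes the last valid value before it
df = df.ffill()
```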
I created another column by multiplying 'global active power' by 1000, dividing by 60 and subtracting 'sub-metering 1 through 4'.
I created a variable named hpc_daily, which resamples the data to daily frequency ('D'), then hpc_index, which resets the index of hpc_daily. Finally, I renamed the columns dt and global active power to ds and y respectively, the names Prophet expects.
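The steps above can be sketched as follows. The toy frame, the sub-metering column names, the name `other_consumption` and the choice of `sum()` as the daily aggregation are all assumptions for illustration. Adjust the `sub_cols` list to whatever sub-metering columns your dataset has.

```python
import numpy as np
import pandas as pd

# Toy minute-level data covering three days (hypothetical constant readings)
idx = pd.date_range("2006-12-16", periods=3 * 1440, freq="min", name="dt")
df = pd.DataFrame(
    {
        "Global_active_power": 1.2,
        "Sub_metering_1": 0.0,
        "Sub_metering_2": 1.0,
        "Sub_metering_3": 17.0,
    },
    index=idx,
)

# Derived column: power * 1000 / 60 minus the sub-metered readings
sub_cols = ["Sub_metering_1", "Sub_metering_2", "Sub_metering_3"]
df["other_consumption"] = (
    df["Global_active_power"] * 1000 / 60 - df[sub_cols].sum(axis=1)
)

# Resample to daily frequency ('D'); summing per day is an assumption
hpc_daily = df.resample("D").sum()

# Move the datetime index back into a regular column
hpc_index = hpc_daily.reset_index()

# Prophet expects the date column to be 'ds' and the target to be 'y'
hpc = hpc_index.rename(columns={"dt": "ds", "Global_active_power": "y"})
```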
Univariate Analysis:
I split the dataset into train and test sets. The train set ran from 16th December 2006 to 26th November 2009, and the test set covered everything after 26th November 2009. After importing Facebook Prophet, I instantiated the model, trained it on the train set and predicted over the test period with a 365-day forecast.
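A rough sketch of the split and the forecast is shown below. The helper names `split_by_date` and `fit_and_forecast` are hypothetical, and the toy frame (including its end date) is made up. The Prophet calls assume the `prophet` package is installed.

```python
import numpy as np
import pandas as pd

def split_by_date(df, cutoff):
    """Rows up to and including the cutoff go to train, the rest to test."""
    cutoff = pd.Timestamp(cutoff)
    train = df[df["ds"] <= cutoff]
    test = df[df["ds"] > cutoff]
    return train, test

def fit_and_forecast(train, periods=365):
    """Fit Prophet on the train frame and forecast `periods` days ahead."""
    # Imported here so the split above can run without prophet installed
    from prophet import Prophet

    model = Prophet()
    model.fit(train)  # expects columns 'ds' and 'y'
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    return model, forecast

# Toy daily frame standing in for the resampled data (hypothetical values)
ds = pd.date_range("2006-12-16", "2010-11-26", freq="D")
hpc = pd.DataFrame({"ds": ds, "y": np.linspace(1000.0, 2000.0, len(ds))})

train, test = split_by_date(hpc, "2009-11-26")

# With prophet installed:
# model, forecast = fit_and_forecast(train, periods=365)
# model.plot(forecast)             # the forecast
# model.plot_components(forecast)  # trend and seasonalities
```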
Visualization:
I plotted the forecast and its components.
Multivariate Analysis:
Using the same train and test split and importing the model as before, I had to add external regressors. This is because the algorithm on its own is not able to model some of the points in the training data.
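A minimal sketch of adding regressors is shown below. The helper names and the regressor columns `Voltage` and `Global_intensity` are hypothetical, since this write-up does not list which columns were used. The key point is that `add_regressor` is called before `fit`, and that the frame passed to `predict` must also contain the regressor columns.

```python
import numpy as np
import pandas as pd

def regressor_columns(df):
    """Every column except Prophet's 'ds' and 'y' is treated as a regressor."""
    return [c for c in df.columns if c not in ("ds", "y")]

def fit_with_regressors(train, test, regressors):
    """Fit Prophet with extra regressors and predict on the test rows."""
    # Imported here so the data preparation can run without prophet installed
    from prophet import Prophet

    model = Prophet()
    for name in regressors:
        model.add_regressor(name)  # must happen before fit
    model.fit(train[["ds", "y"] + regressors])
    # Future values of every regressor are needed to predict
    forecast = model.predict(test[["ds"] + regressors])
    return model, forecast

# Toy daily frame (hypothetical values and column names)
ds = pd.date_range("2009-11-20", periods=14, freq="D")
frame = pd.DataFrame(
    {
        "ds": ds,
        "y": np.linspace(1500.0, 1600.0, 14),
        "Voltage": np.linspace(240.0, 242.0, 14),
        "Global_intensity": np.linspace(6.0, 7.0, 14),
    }
)

regressors = regressor_columns(frame)
cutoff = pd.Timestamp("2009-11-26")
train = frame[frame["ds"] <= cutoff]
test = frame[frame["ds"] > cutoff]

# With prophet installed:
# model, forecast = fit_with_regressors(train, test, regressors)
```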
After I trained the model, the prediction was shown as follows:
Plotting the components:
Finally, I used Prophet's diagnostics for cross-validation and performance metrics. The scores are as follows:
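For readers who want to reproduce this kind of evaluation, here is a rough sketch. `run_prophet_diagnostics` is a hypothetical wrapper, and the `initial`, `period` and `horizon` values are assumptions, not the settings used in the notebook. `overall_errors` is a simplified hand check over all rows. Prophet's `performance_metrics` reports its metrics per forecast horizon instead.

```python
import numpy as np
import pandas as pd

def run_prophet_diagnostics(model, initial="730 days", period="180 days",
                            horizon="365 days"):
    """Cross-validate a fitted Prophet model and summarise the errors."""
    # Imported here so the hand check below can run without prophet installed
    from prophet.diagnostics import cross_validation, performance_metrics

    df_cv = cross_validation(model, initial=initial, period=period,
                             horizon=horizon)
    return df_cv, performance_metrics(df_cv)

def overall_errors(df_cv):
    """RMSE and MAPE over all rows of a frame with 'y' and 'yhat' columns."""
    err = df_cv["y"] - df_cv["yhat"]
    rmse = float(np.sqrt((err ** 2).mean()))
    mape = float((err.abs() / df_cv["y"].abs()).mean())
    return {"rmse": rmse, "mape": mape}

# Hand check on two made-up rows shaped like cross-validation output
toy_cv = pd.DataFrame({"y": [100.0, 200.0], "yhat": [110.0, 180.0]})
scores = overall_errors(toy_cv)
```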
Conclusion:
It's worth noting that the two models are in separate notebooks. This is because I couldn't add a regressor for the multivariate analysis after the model had been fitted: regressors have to be added before fitting. This is the repo for this project, and you can connect with me on LinkedIn. Thank you for reading.