Time series models forecast future events from past observations collected at regular time intervals. In this article we take a small forecasting problem and work through it from start to finish, learning time series forecasting along the way. The technique is used across many fields of study, from geology to behavior to economics. It predicts future events by analyzing past trends, on the assumption that future trends will resemble historical ones.
Predicting electricity consumption using time series analysis by ARIMA model
Stages in Time Series Forecasting:
Solving a time series problem is a little different from a regular modeling task. The basic journey can be broken into the following stages. We will look at the tasks to perform in each stage, along with the Python implementation of each.
Steps:
Step 1. Visualizing time series
Step 2. Stationarising time series
Step 3. Finding the best parameters for our model
Step 4. Fitting model
Once we have our optimal model parameters, we can fit an ARIMA model to learn the pattern of the series. Remember that ARIMA-type models assume the series is stationary once it has been differenced, so making the series stationary is an important step.
Step 5. Predictions
After fitting our model, we will be predicting the future in this stage. Since we are now familiar with a basic flow of solving a time series problem, let us get to the implementation.
Scatter plot of time series data points
We can also visualize the data in our series through a distribution.
First, we need to check if a series is stationary or not.
ADF (Augmented Dickey-Fuller) Test
The Dickey-Fuller test is one of the most popular statistical tests for stationarity. It detects the presence of a unit root in the series, and so helps us decide whether the series is stationary. The null and alternate hypotheses of this test are:
Null Hypothesis: The series has a unit root (the coefficient a on the previous observation equals 1).
Alternate Hypothesis: The series has no unit root.
If we fail to reject the null hypothesis, we can say that the series is non-stationary. This means that the series can be linear or difference stationary (we will understand more about difference stationary in the next section).
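To make the unit-root hypothesis concrete, here is a minimal sketch of a simplified Dickey-Fuller statistic in plain Python. It regresses the change in the series on the previous value and reports the t-statistic of that coefficient (gamma = a - 1, so a = 1 means gamma = 0). This is an illustration under stated assumptions, not the article's original code: it omits the lagged-difference terms that make the test "augmented", the two series are synthetic, and the function name `dickey_fuller_stat` and the approximate 5% critical value of -2.86 (constant, no trend, large sample) are introduced here for illustration.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in: (y_t - y_{t-1}) = c + gamma * y_{t-1} + error.

    gamma = a - 1, so the null hypothesis a = 1 corresponds to gamma = 0.
    Simplified: no lagged-difference terms (the "augmented" part is omitted).
    """
    x = series[:-1]                                    # y_{t-1}
    dy = [b - a for a, b in zip(series, series[1:])]   # y_t - y_{t-1}
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    intercept = mean_dy - gamma * mean_x
    rss = sum((di - intercept - gamma * xi) ** 2 for xi, di in zip(x, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(300)]   # stationary by construction

walk = []                                       # random walk: has a unit root
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

CRITICAL_5PCT = -2.86   # approximate value, assumed here for illustration
noise_stat = dickey_fuller_stat(noise)
walk_stat = dickey_fuller_stat(walk)
print(f"white noise: {noise_stat:.2f}")
print(f"random walk: {walk_stat:.2f}")
```

A statistic below the critical value rejects the unit-root null hypothesis. In practice one would normally use a library routine such as `adfuller` from `statsmodels.tsa.stattools`, which adds the lagged-difference terms and reports a p-value.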
Results of Dickey-Fuller test
We see that the p-value is greater than 0.05, so we cannot reject the null hypothesis. Also, the test statistic is greater than the critical values. So, the data is non-stationary.
To get a stationary series, we need to eliminate the trend and seasonality from the series.
After finding the mean, we take the difference of the series and the mean at every point in the series.
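The text does not say which mean is used, so the sketch below assumes a trailing rolling (moving) mean, which is a common choice for this step. The helper names, the window size and the sample series are all hypothetical.

```python
def rolling_mean(series, window):
    """Trailing mean over `window` points; None until enough points exist."""
    means = []
    for i in range(len(series)):
        if i + 1 < window:
            means.append(None)
        else:
            chunk = series[i + 1 - window : i + 1]
            means.append(sum(chunk) / window)
    return means


def subtract_rolling_mean(series, window):
    """Series minus its trailing mean, skipping the warm-up points."""
    means = rolling_mean(series, window)
    return [y - m for y, m in zip(series, means) if m is not None]


# Hypothetical series with a steady upward trend of 2 units per step.
trend = [10 + 2 * t for t in range(8)]
detrended = subtract_rolling_mean(trend, window=3)
print(detrended)
```

For this purely linear series the result is constant: the upward trend is gone and only a fixed offset remains.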
From the above graph, we observe that the data has attained stationarity. We also see that the test statistic and the critical value are roughly equal.
There can be cases when there is strong seasonality in the data. In those cases, just removing the trend will not help much; we also need to take care of the seasonality in the series. One method for this task is differencing.
Differencing is a method of transforming a time series dataset.
It can be used to remove the series' dependence on time, the so-called temporal dependence. This includes structures like trends and seasonality. Differencing can help stabilize the mean of the time series by removing changes in its level, thus eliminating (or reducing) trend and seasonality.
Differencing is performed by subtracting the previous observation from the current observation.
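A minimal sketch of differencing in plain Python follows. The `lag` parameter generalizes the idea: a lag of 1 subtracts the previous observation, while a lag equal to the seasonal period subtracts the observation one season earlier. The function names and the sample series are hypothetical.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undo_difference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the diffs."""
    restored = list(first_values)
    for d in diffs:
        restored.append(restored[-lag] + d)
    return restored


# Hypothetical data: upward trend plus a pattern that repeats every 4 steps.
pattern = [0, 5, 0, -5]
series = [3 * t + pattern[t % 4] for t in range(12)]

first_order = difference(series)        # removes the linear trend
seasonal = difference(series, lag=4)    # removes the period-4 pattern too
print(first_order)
print(seasonal)
```

Keeping the first `lag` values matters because forecasts made on the differenced scale have to be converted back to the original scale, which is what `undo_difference` does.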
Perform the Dickey-Fuller test (ADFT) once again.
The values of p and q come from the ACF and PACF plots, so let us understand both.
Autocorrelation Function (ACF)
Statistical correlation summarizes the strength of the relationship between two variables. Pearson’s correlation coefficient is a number between -1 and 1 that describes a negative or a positive correlation respectively. A value of zero indicates no correlation.
We can calculate the correlation for time series observations with previous time steps, called lags. Because the correlation of the time series observations is calculated with values of the same series at previous times, this is called a serial correlation, or an autocorrelation.
A plot of the autocorrelation of a time series by lag is called the AutoCorrelation Function, or ACF for short. This plot is sometimes called a correlogram or an autocorrelation plot.
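The values behind such a plot can be computed directly. The sketch below uses the usual sample autocorrelation estimate (covariance at lag k divided by the variance) in plain Python. The function name and the sample series are hypothetical.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [y - mean for y in series]
    variance = sum(c * c for c in centered)
    acf = []
    for k in range(max_lag + 1):
        covariance = sum(centered[t] * centered[t - k] for t in range(k, n))
        acf.append(covariance / variance)
    return acf


# Hypothetical series that repeats every 4 steps.
series = [1, 3, 1, -5] * 6
acf = autocorrelation(series, max_lag=8)
for lag, value in enumerate(acf):
    print(lag, round(value, 3))
```

Because the series repeats every 4 steps, the autocorrelation is high at lags 4 and 8 and weak at lag 1, which is the kind of pattern a correlogram makes visible.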
Partial Autocorrelation Function (PACF)
A partial autocorrelation is a summary of the relationship between an observation in a time series and observations at prior time steps with the relationships of intervening observations removed.
The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.
The autocorrelation between an observation and an observation at a prior time step comprises both the direct correlation and indirect correlations. It is these indirect correlations that the partial autocorrelation function seeks to remove.
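One standard way to remove the indirect correlations is the Durbin-Levinson recursion, which turns a list of autocorrelations into partial autocorrelations. The sketch below is an illustration in plain Python, not the article's original code. The input is the theoretical ACF of a hypothetical AR(1) process with coefficient 0.6, chosen because its PACF is known: 0.6 at lag 1 and zero afterwards.

```python
def partial_autocorrelation(acf):
    """PACF for lags 1..len(acf)-1 via the Durbin-Levinson recursion.

    `acf` must start with the lag-0 value (1.0).
    """
    pacf = []
    prev = []   # coefficients of the autoregression of order k - 1
    for k in range(1, len(acf)):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Theoretical ACF of a hypothetical AR(1) process: r_k = 0.6 ** k.
acf = [0.6 ** k for k in range(6)]
pacf = partial_autocorrelation(acf)
print([round(v, 6) for v in pacf])
```

The ACF decays gradually (0.6, 0.36, 0.216, ...), but the PACF cuts off after lag 1, because every correlation at longer lags is explained by the lag-1 relationship.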
The code below plots both the ACF and the PACF for us:
Fitting model
To find the p and q values from the above graphs, we check where each graph first drops to zero (draw a line down to the x-axis at that point). The value of p is read from the PACF plot and q from the ACF plot. In the above graphs, both values are close to 3. We now have p, d and q values that we can substitute into the ARIMA model to see the output.
The lower the RSS (residual sum of squares) value, the better the model fits. You can also try other orders such as (2,1,0) and (3,1,1) to look for the smallest RSS.
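As a small illustration of that comparison, the sketch below computes the RSS for two candidate orders and picks the smaller one. The observed and fitted values are made up for the example; in a real run the fitted values would come from each fitted model.

```python
def residual_sum_of_squares(actual, fitted):
    """Sum of squared differences between observed and fitted values."""
    if len(actual) != len(fitted):
        raise ValueError("actual and fitted must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))


# Hypothetical observed values and fitted values from two candidate orders.
actual = [2.0, 4.0, 6.0, 8.0]
candidates = {
    (2, 1, 0): [2.5, 3.5, 6.5, 7.5],
    (3, 1, 1): [2.0, 4.0, 6.5, 8.0],
}
scores = {order: residual_sum_of_squares(actual, fit)
          for order, fit in candidates.items()}
best_order = min(scores, key=scores.get)
print(scores, best_order)
```

In-sample RSS tends to fall as parameters are added, so it is worth checking the chosen order against a criterion that penalizes model size, such as AIC, or against held-out data.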
The following code helps us to forecast electricity consumption for the next 6 years.
From the above graph, we obtain predictions up to 2024. The greyed-out area is the confidence interval: the range within which future values are expected to fall at the chosen confidence level. It is not a hard limit, so actual values can still fall outside it.
Finally, we were able to build an ARIMA model and actually forecast for a future time period.
You can find full code on DLTK Github repo: https://github.com/dltk-ai/future-work/tree/master/Electricity-consumption-prediction-using-Time-Series-Forecasting-master