Machine learning is becoming a mainstream tool for data analysis and prediction. Although it is a branch of artificial intelligence, it also draws heavily on statistics, and it is now used across science and technology. Electrical engineering is no exception and applies it in a wide range of problems. We will discuss such applications in separate posts. This post focuses on time series analysis, which is used to forecast uncertain quantities whose future values are hard to predict.
Author: Sharif Atique et al.
The authors used an ARIMA model, covering both seasonal and non-seasonal variations, to forecast the solar energy generated by a solar panel. They did this in three steps:
- Converted the time-series data to a stationary series
- Determined the model parameters
- Validated the model using the Akaike information criterion (AIC) and the residual sum of squares (SSE)
A bit of ARIMA: ARIMA stands for Autoregressive Integrated Moving Average. It is a statistical model used for forecasting stationary time series. Depending on the values of p, d, and q, an ARIMA process can take the form of a purely moving average (MA), purely autoregressive (AR), or autoregressive moving average (ARMA) process.
According to the paper, ARIMA can only model stationary time series; a weakly stationary series is also treated as stationary. A stationary time series has a constant mean and variance over the entire series.
Transformations such as differencing, taking logarithms, and deflating are applied to make the series stationary.
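As a rough illustration of two of these transformations, here is a minimal sketch in plain Python. The function names and the sample values are made up for this example and do not come from the paper.

```python
import math

def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def log_transform(series):
    """Natural log of each value; only valid for strictly positive data."""
    return [math.log(v) for v in series]

# Hypothetical daily energy values with a steady upward trend.
energy = [10.0, 12.0, 14.0, 16.0, 18.0]

# First-order differencing removes the linear trend and leaves constant steps.
first_diff = difference(energy)
print(first_diff)
```

Passing `lag=30` instead would give a seasonal difference for a pattern that repeats every 30 observations.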
If the series shows seasonality, the seasonal ARIMA model is used; otherwise the ordinary ARIMA model is used.
A time series is called seasonal if a periodic pattern is visible in the data over a given period of time.
Seasonal ARIMA is written as follows, where p, d, and q are the non-seasonal AR order, differencing order, and MA order, and P, D, and Q are the seasonal AR order, seasonal differencing order, and seasonal MA order, respectively. s is the span of the seasonal pattern.
ARIMA(p, d, q) × (P, D, Q)s
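For readers who prefer the equation form, the same model is commonly written with the backshift operator B (where B y_t = y_{t-1}). The symbols below follow the usual textbook convention and are not taken from the paper: the lowercase polynomials have degrees p and q, the uppercase ones have degrees P and Q, and ε_t is white noise.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1 - B)^{d}\,(1 - B^{s})^{D}\, y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
```

Here (1 - B)^d is the ordinary differencing applied d times, and (1 - B^s)^D is the seasonal differencing applied D times.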
Model Parameter Selection:
ACF = Autocorrelation Function. It measures the correlation between a value and its past (lagged) values in the same time series. For instance, if a time series is 1, 2, 3, 4, there is a correlation between 2 and 1, between 3 and 1, and between 4 and 1. But 3 and 4 are correlated with 1 partly through the intermediate values. The following graphic illustrates this:
1 <-- 2 | 1 <-- 2 <-- 3 | 1 <-- 2 <-- 3 <-- 4
PACF = Partial Autocorrelation Function. It measures the correlation at a particular lag after removing the effect of the intermediate values.
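To make the two definitions concrete, here is a minimal sketch of the sample ACF and of the PACF computed with the Durbin-Levinson recursion. It is plain Python with made-up data, and is only one common way of estimating these quantities, not necessarily what the paper's software does.

```python
def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag via Durbin-Levinson."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den  # correlation at lag k with lags 1..k-1 removed
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result

# Hypothetical series used only to exercise the functions.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
print(acf(data, 3))
print(pacf(data, 3))
```

At lag 1 the two functions agree, because there are no intermediate values to remove; they start to differ from lag 2 onward.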
Augmented Dickey-Fuller (ADF) test = A test for checking the stationarity of a time series, also known as a unit root test. (A series is stationary when its mean and standard deviation do not change over time.)
- No unit root in the characteristic equation = stationary time series
- Unit root exists = non-stationary time series
Null Hypothesis (H0): a unit root exists, so the time series is non-stationary. If p > 0.05, the null hypothesis cannot be rejected at the 5% significance level.
Alternative Hypothesis (H1): no unit root, so the time series is stationary. If p <= 0.05, the null hypothesis is rejected.
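The decision rule can be sketched as a small helper. The function name is hypothetical, and the p-value is assumed to come from whatever ADF implementation is being used; the two inputs below are the p-values the paper reports before and after differencing.

```python
def is_stationary(p_value, alpha=0.05):
    """Apply the ADF decision rule at significance level alpha.

    H0: a unit root exists (non-stationary).
    Returns True when H0 is rejected, i.e. the series is treated as stationary.
    """
    return p_value <= alpha

# p-values reported in the paper for the raw and the differenced series.
print(is_stationary(0.6693))  # H0 not rejected: treated as non-stationary
print(is_stationary(0.01))    # H0 rejected: treated as stationary
```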
Model Selection and Validation:
If the time series is stationary, no differencing is needed and d = 0.
Otherwise, if the time series is non-stationary, differencing is repeated until the series becomes stationary, and d is the number of times it was applied.
Once the series is stationary, the ACF and PACF plots indicate the q and p values, respectively. The lags with significant correlation are used, though this method does not guarantee that the optimum model parameters are selected.
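One common way to judge which lags are "significant" is to compare each correlation with an approximate 95% band of ±1.96/√n, where n is the number of observations. The sketch below assumes that rule; the paper does not state which bound it used, and the correlation values are invented for illustration.

```python
import math

def significant_lags(correlations, n_obs, z=1.96):
    """Return 1-based lags whose correlation lies outside +/- z / sqrt(n_obs)."""
    bound = z / math.sqrt(n_obs)
    return [lag for lag, r in enumerate(correlations, start=1) if abs(r) > bound]

# Hypothetical ACF values for lags 1..5 of a differenced series of 100 points.
acf_values = [0.45, 0.30, 0.10, -0.05, 0.02]

# With n_obs = 100 the bound is 0.196, so only lags 1 and 2 stand out.
print(significant_lags(acf_values, 100))
```

Applied to ACF values this suggests a candidate q; applied to PACF values it suggests a candidate p. The candidates still need to be checked in the validation step.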
The final step is to check whether the optimum ARIMA model has been selected. The following criteria are used for this check:
- Corrected AIC
- Bayesian Information Criterion (BIC)
- Residual Sum of Squares (SSE)
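To show how these criteria relate to each other, here is a minimal sketch that derives AIC, corrected AIC, and BIC from the SSE. It assumes Gaussian errors and drops the additive constant in the log-likelihood, which is a common least-squares form; statistical packages may count parameters or compute the likelihood differently, so the absolute numbers will not necessarily match theirs. All input values are hypothetical.

```python
import math

def information_criteria(sse, n_obs, n_params):
    """AIC, AICc and BIC from the residual sum of squares (Gaussian errors assumed)."""
    log_term = n_obs * math.log(sse / n_obs)
    aic = log_term + 2 * n_params
    # Small-sample correction; requires n_obs > n_params + 1.
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = log_term + n_params * math.log(n_obs)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Two hypothetical candidate models fitted to the same 100 observations.
simple = information_criteria(sse=50.0, n_obs=100, n_params=3)
larger = information_criteria(sse=49.5, n_obs=100, n_params=6)
print(simple)
print(larger)
```

Lower values are better for all three. The larger model reduces the SSE slightly, but the penalty on the extra parameters outweighs the gain, so the simpler model is preferred here.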
In this paper, the authors collected data from their lab’s solar energy generation plant (110 kW) and used an ARIMA model to predict the solar energy values for the last 30 days.
The model was developed using the ACF and PACF for parameter selection, with ADF testing to check whether the time series is non-stationary. At first the dataset was non-stationary (p-value: 0.6693). After first-order differencing the p-value became 0.01, which means the time series is now stationary.
In model validation, the authors found that ARIMA(0,1,2)(1,0,1)30 performed best.
The absolute percentage error of the model is 17.70%.
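For reference, an error figure of this kind is often computed as the mean absolute percentage error (MAPE). The sketch below assumes that definition; the post does not say exactly how the paper averaged the error, and the numbers are invented.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical measured and forecast energy values.
measured = [100.0, 200.0]
forecast = [90.0, 220.0]
print(mean_absolute_percentage_error(measured, forecast))
```

Each forecast in this toy example is off by 10% of the measured value, so the average error is about 10%.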