Session: Four (last session for the program)
Date: 20/03/2021
Instructor: Clement Twumasi
Website: https://twumasiclement.wixsite.com/website
YouTube channel: https://www.youtube.com/channel/UCxrpjLYi_Akmbc2QICDvKaQ
The most important info you need to know about statistics and data analysis: https://www.youtube.com/watch?v=SRGpeSYyOoA&t=3467s
Time series analysis in R (we shall simulate time series data, learn how to declare time series data, and fit time series models).
Other objectives:
Introduction to other advanced statistical tests (univariate and multivariate) & different models (e.g., machine learning algorithms) in R
NB: The class/type of statistical tests and models to fit depends predominantly on the type/nature of your data.
Link to some of the aforementioned univariate tests in R: https://rpubs.com/sujith/IS
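For example, here is a minimal sketch (not from the session code) of how the nature of the data guides the choice of a univariate test, using R's built-in sleep data: check normality first, then run either a two-sample t-test or its non-parametric alternative.
#Illustrative sketch only: parametric vs non-parametric two-group comparison
data(sleep) #built-in dataset: extra hours of sleep under two drugs (group 1 vs 2)
if (shapiro.test(sleep$extra)$p.value > 0.05) {
  t.test(extra ~ group, data = sleep) #data look roughly normal: two-sample t-test
} else {
  wilcox.test(extra ~ group, data = sleep) #otherwise: Wilcoxon rank-sum test
}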
Multivariate analysis in R: https://little-book-of-r-for-multivariate-analysis.readthedocs.io/en/latest/src/multivariateanalysis.html
http://www.sthda.com/english/wiki/manova-test-in-r-multivariate-analysis-of-variance
PCA, SVD, Correspondence Analysis and Multidimensional Scaling:
https://web.stanford.edu/class/bios221/labs/multivariate/lab_5_multivariate.html
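As a quick illustration of the multivariate methods linked above, a minimal PCA sketch on R's built-in iris data (illustrative only, not part of the session code):
#Minimal PCA sketch on the built-in iris data (numeric columns only)
pca_fit <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca_fit) #proportion of variance explained by each principal component
biplot(pca_fit) #biplot of the first two principal components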
Note some of the classes of regression models available. Knowing which one is appropriate for a given dataset and research problem is imperative (a minimal sketch follows the links below).
Introduction to linear regression: https://www.machinelearningplus.com/machine-learning/complete-introduction-linear-regression-r/
OLS linear regression in R: http://r-statistics.co/Linear-Regression.html
Different class of linear models in R: https://www.statmethods.net/advstats/glm.html
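A minimal sketch of the two most common cases (illustrative only, using R's built-in mtcars data): OLS linear regression for a continuous outcome and logistic regression for a binary outcome.
#OLS linear regression: continuous outcome (miles per gallon)
ols_fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(ols_fit)
#Logistic regression (a GLM): binary outcome (automatic vs manual transmission)
logit_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(logit_fit)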
Time series analysis in R:
https://ourcodingclub.github.io/tutorials/time/
https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
options(repr.plot.width=8, repr.plot.height=8,repr.plot.res = 300) #Setting plot size
#setting working directory to import and/or export data
setwd("C:/Users/user/Desktop/DataAnalysis_results_R/NUGSChina_R_class")
We shall simulate our own time series data for this session.
Simulating a time series of 300 data points from an ARIMA(2,1,1) model
#Simulating data from ARIMA(2,1,1)
#NB: set a working directory (as above) if you wish to export the simulated data later
set.seed(1)
chosen_length <- 300; burnin <- 100
n <- chosen_length + burnin # n = 400 (the 100 burn-in points are discarded later)
e <- rnorm(n, 0, 1) #white-noise innovations
y <- e
mu <- 0.2; phi1 <- 0.3; phi2 <- 0.4; theta1 <- 0.5; sigma <- 1.2
for (i in 3:n) {
  #ARMA(2,1) recursion: two AR terms (phi1, phi2) and one MA term (theta1)
  y[i] <- mu + phi1*y[i-1] + phi2*y[i-2] + sigma*e[i] - theta1*e[i-1]
}
S_y <- cumsum(y) #integration step: the cumulative sum gives the ARIMA(2,1,1) series
data <- S_y[(burnin+1):n] #discard the burn-in period, keeping 300 points
print(data)
Saving and exporting the data as a CSV file into your working directory, for later use if you wish
#Saving and exporting data as a csv file into your working directory
write.csv(data,"Simulated_ts_data.csv")
data_monthly<- ts(data,start=c(1997,1),end=c(2021,12),frequency=12)
data_monthly
write.csv(data_monthly,"Simulated_ts_Monthlydata.csv")
plot(data_monthly,type="l",lwd=3,ylab="Monthly time series",col="blue")
data_quarterly<- aggregate(data_monthly,nfrequency=4)
data_quarterly
data_yearly<- aggregate(data_monthly,nfrequency=1)
data_yearly
par(mfrow=c(2,2),mar=c(4,4,1,1))
#First plot (top left)
plot(data_monthly,type="l",lwd=3,ylab="Time series",xlab="Time",col="blue",las=1)
text(2000,300,"Monthly series",col="blue",font=1)
plot(data_quarterly,type="l",lwd=3,ylab="Time series",col="red",las=1)
text(2000,890,"Quarterly series",col="red",font=1)
plot(data_yearly,type="l",lwd=3,ylab="Time series",col="green",las=1)
text(2000,3500,"Yearly series",col="green",font=1)
plot(data_yearly,type="l",lwd=3,ylab="Time series",col="green",ylim=c(0,3600),las=1)
lines(data_quarterly,type="l",lwd=3,col="red")
lines(data_monthly,type="l",lwd=3,col="blue")
legend(x=1997,y=3650,legend = c("Yearly","Quarterly","Monthly"),col=c("green","red","blue"),
cex=1, horiz = FALSE,pt.cex = 1,fill=c("green","red","blue"),ncol=1,
title="Time frequency",box.lwd = 2)
#You can save the quarterly and yearly data into your working directory for later use if you want
write.csv(data_quarterly,"Simulated_ts_Quarterlydata.csv")
write.csv(data_yearly,"Simulated_ts_Yearlydata.csv")
Practice task: Fit the same classes of time series models using the quarterly and yearly ts data, respectively (see the starter sketch below).
NB: There are lots of other classes of machine learning algorithms for time series prediction.
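A minimal starter sketch for the practice task (assuming the quarterly CSV exported above, whose second column holds the series values; the model choice here is only illustrative):
#Starter sketch for the practice task (illustrative only)
library(forecast)
quarterly_df <- read.csv("Simulated_ts_Quarterlydata.csv") #file exported above
quarterly_ts <- ts(quarterly_df[, 2], start = c(1997, 1), frequency = 4) #declare quarterly ts
practice_fit <- auto.arima(quarterly_ts, seasonal = FALSE) #same approach as for the monthly series
summary(practice_fit)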
#Don't forget to install each of the packages with install.packages("") before loading it with library()
#install.packages("tseries") #installing package tseries
library(tseries) #for time series
library("tsoutliers")#identifying outliers in time series
library(ggplot2) #for other types of customized plots
library(forecast) # forecasting methods
library(gridExtra)# for combining ggplots using the grid.arrange() function
library("seastests")# for testing seasonality
library("TTR")#for simple moving average model
library(zoo)
Finding mean values for each year
year<- as.character(1997:2021)
means_year<- aggregate(data_monthly, nfrequency = 1, FUN = mean)
data.frame(year=year,mean=means_year)
The types of outliers the tso() function detects by default: "AO" (additive outliers), "LS" (level shifts), and "TC" (temporary changes); "IO" (innovative outliers) and "SLS" (seasonal level shifts) can also be selected.
?tso #to see the arguments of the function "tso" if you wish
outlier.monthlydata<- tso(data_monthly,maxit.iloop=10,tsmethod = "auto.arima")
outlier.monthlydata
If the data had outliers, you could use tsclean() to replace them with reasonable estimates.
NB: For this data, cleaning is not necessary since there are no outliers.
cleaned_tseries<- tsclean(data_monthly)
cleaned_tseries
#Check whether the original series (which has no outliers) has the same values as the cleaned series
all(cleaned_tseries==data_monthly)
Why do we need to identify patterns? Because they can suggest which class of models to use/consider.
A time series with additive trend, seasonal, and irregular components can be decomposed using the stl() function.
NB: STL stands for Seasonal-Trend decomposition using LOESS
decomp_ts <- stl(data_monthly, s.window = "periodic", robust = TRUE)$time.series
#print(decomp_ts)
Seasonal_component<- decomp_ts[,1]
Trend_component<- decomp_ts[,2]
Random_component <- decomp_ts[,3]
Deseasonalized_data<- data_monthly-Seasonal_component
par(mfrow=c(4,1),mar=c(4,4,1,1))
plot(data_monthly,ylab="Observed data",col="blue")
plot(Seasonal_component,col="blue",ylab="Seasonal")
plot(Trend_component,col="blue",ylab="Trend")
plot(Random_component,col="blue",ylab="Random")
Testing for stationarity using the Augmented Dickey-Fuller unit root test (ADF test)
H0: The series is non-stationary.
H1: The series is stationary.
Conclusion: The original monthly series is non-stationary (i.e., at lag 0, before any differencing).
Clearly, the time series is non-stationary, i.e., its behaviour changes over time.
adf.test(data_monthly)
Notice that the series is stationary after first differencing ($\Delta Y_t$) according to the ADF test,
where $\Delta Y_t = Y_t - Y_{t-1}$
adf.test(diff(data_monthly,lag=1)) # NB: diff(data_monthly,lag=1) gives first differenced series Y_t-Y_{t-1}
first_diff_series<- diff(data_monthly,lag=1)
par(mfrow=c(2,1),mar=c(4,4,1,1))
plot(data_monthly,type="l",lwd=2,col="blue",ylab="Original series")
plot(first_diff_series,type="l",lwd=2,col="blue",ylab="First differenced series")
abline(h=0,col="black",lwd=3)
Seasonality testing: the WO test
By default, the WO test is used to assess the seasonality of a time series and returns a boolean. Alternatively, the QS test (test='qs'), Friedman test (test='fried'), Kruskal-Wallis test (test='kw'), F-test on seasonal dummies (test='seasdum') or the Welch test (test='welch') can be used (see the sketch below).
Webel, K. and Ollech, D. (2019). An overall seasonality test. Deutsche Bundesbank’s Discussion Paper series.
#The p-values indicate the p-values of the underlying test, i.e. the QS-test, the QS-R test and the KW-R-test.
#Testing for monthly seasonality
summary(wo(data_monthly,freq=12))
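A minimal sketch of the alternative seasonality tests mentioned above, via the seastests function isSeasonal(), which returns TRUE/FALSE (illustrative only; test names as listed in the description above):
#Alternative seasonality tests (each returns TRUE/FALSE)
isSeasonal(data_monthly, test = "qs", freq = 12) #QS test
isSeasonal(data_monthly, test = "fried", freq = 12) #Friedman test
isSeasonal(data_monthly, test = "kw", freq = 12) #Kruskal-Wallis test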
#Clearly, there are no seasonal patterns
#But the series generally increases over time within each year
seasonplot(data_monthly, col=rainbow(12), year.labels=TRUE,main="Seasonal plots across years")
ggseasonplot(data_monthly, col=rainbow(n=25),polar=TRUE) +
ylab("Series") +
ggtitle("Polar seasonal plot")
ACF and PACF plots using ggplot2 via the forecast package
options(warn=-1)#avoid title warning message in ggAcf or ggPacf function
acf_plot1=ggAcf(data_monthly,main="Original series")
pacf_plot1=ggPacf(data_monthly,main="Original series")
acf_plot2=ggAcf(first_diff_series,main="First differenced series")
pacf_plot2=ggPacf(first_diff_series,main="First differenced series")
grid.arrange(acf_plot1,pacf_plot1,acf_plot2,pacf_plot2,ncol=2)
train_set<- window(data_monthly, start=c(1997, 1), end=c(2015, 12)) #1997 to 2015 for training
test_set<- window(data_monthly, start=c(2016, 1), end=c(2021, 12)) #2016 to 2021 for testing
Fitting ARIMA time series model
Automatic_ARIMA_model<- auto.arima(y=train_set,seasonal = FALSE)
Automatic_ARIMA_model
AutoArimaModel_forecasts=forecast(Automatic_ARIMA_model, h=length(test_set), level=c(95))
#accuracy measures (columns 2, 3 and 5 = RMSE, MAE and MAPE)
accuracy(AutoArimaModel_forecasts,test_set)[,c(2,3,5)]
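Optionally (not part of the session code), a quick residual check for the fitted ARIMA model; checkresiduals() from the forecast package plots the residuals and their ACF and runs a Ljung-Box test:
#Optional residual diagnostics for the fitted ARIMA model
checkresiduals(Automatic_ARIMA_model) #residual plots + Ljung-Box test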
Fitting Naive time series model (seasonal naive, via snaive())
naive_model<- snaive(y=train_set,h=length(test_set))
#accuracy measures
accuracy(naive_model,test_set)[,c(2,3,5)]
Fitting non-seasonal Holt-Winters model (HW), i.e., exponential smoothing without trend (beta=FALSE) and seasonal (gamma=FALSE) components
HW_model <- HoltWinters(train_set, beta=FALSE, gamma=FALSE)
HW_model
HW_forecasts=forecast(HW_model, h=length(test_set), level=c(95))
#accuracy measures
accuracy(HW_forecasts,test_set)[,c(2,3,5)]
Fitting Simple Exponential Smoothing model
ses_model <- ses(train_set, h = length(test_set))
#accuracy measures
accuracy(ses_model,test_set)[,c(2,3,5)]
p1=autoplot(AutoArimaModel_forecasts,ylab="Series",xlab="Time",main="Non-seasonal ARIMA(2,1,1) model")
p2=autoplot(naive_model,ylab="Series",xlab="Time",main="Naive model")
p3=autoplot(HW_forecasts,ylab="Series",xlab="Time",main="Non-seasonal Holt-Winters model")
p4=autoplot(ses_model,ylab="Series",xlab="Time",main="Simple Exponential Smoothing model")
grid.arrange(p1,p2,p3,p4,ncol=2)
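Finally, a minimal sketch (illustrative only) of comparing the four fitted models on the test set by RMSE, to pick the best-performing one:
#Comparing test-set RMSE of the four fitted models (smaller is better)
rmse_comparison <- c(
  ARIMA = accuracy(AutoArimaModel_forecasts, test_set)["Test set", "RMSE"],
  Naive = accuracy(naive_model, test_set)["Test set", "RMSE"],
  HoltWinters = accuracy(HW_forecasts, test_set)["Test set", "RMSE"],
  SES = accuracy(ses_model, test_set)["Test set", "RMSE"]
)
sort(rmse_comparison) #smallest RMSE first (best out-of-sample accuracy)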