eCommerce Example
This notebook details a time series analysis and forecasting application performed with scalecast using the eCommerce dataset. The notebook starts with an exploratory data analysis; moves to time series decomposition; forecasts with exponential smoothing, ARIMA, and multiple linear regression models; moves to automated forecasting with scikit-learn models, Facebook Prophet, and LinkedIn Greykite/Silverkite; explores TensorFlow recurrent neural nets; and finishes with combination modeling.
The utilized dataset is available on kaggle: https://www.kaggle.com/carrie1/ecommerce-data/
Library Imports
[1]:
import pandas as pd
import numpy as np
import datetime
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from scalecast.Forecaster import Forecaster
from scalecast import GridGenerator
from scalecast.notebook import tune_test_forecast
plot_dim = (15,7)
sns.set(rc={'figure.figsize':plot_dim})
[2]:
data = pd.read_csv('eCommerce.csv',parse_dates=['InvoiceDate'])
Exploratory Data Analysis
[3]:
data.head()
[3]:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
Now, let’s view the data’s dimensions.
[4]:
data.shape
[4]:
(541909, 8)
Let’s see how much time these roughly 542,000 observations span.
[5]:
print('first date in data:',data['InvoiceDate'].min())
print('last date in data:',data['InvoiceDate'].max())
first date in data: 2010-12-01 08:26:00
last date in data: 2011-12-09 12:50:00
In spite of there being over half a million rows, there is only about a year’s worth of data to analyze. Before proceeding, we should decide on a datetime frequency to aggregate the data to; otherwise, we will be trying to forecast with an inconsistent interval of time between observations. This decision depends on what question we are trying to answer. For this example, let’s try answering the question of whether we can accurately predict daily gross sales revenues in the United Kingdom over 30 days. This
means we will be removing sales with negative quantities (probably representing returns), creating a “Sales” column by multiplying quantity by price, and aggregating the entire dataframe to the daily level. Then, we subset to country=='United Kingdom'
and fill any days that show no sales with 0.
[6]:
# drop negative sales quantities and subset to the United Kingdom
data = data.loc[(data['Quantity'] > 0) & (data['Country'] == 'United Kingdom')]
# create the Sales column
data['Sales'] = data['Quantity']*data['UnitPrice']
# aggregate the dataframe to the daily level
dt_aggr = 'D'
data['DateTime'] = data['InvoiceDate'].dt.floor(dt_aggr)
tbl = data.groupby('DateTime')['Sales'].sum().reset_index()
# view first 5 rows
tbl.head()
[6]:
DateTime | Sales | |
---|---|---|
0 | 2010-12-01 | 54818.08 |
1 | 2010-12-02 | 47570.53 |
2 | 2010-12-03 | 41308.69 |
3 | 2010-12-05 | 25853.20 |
4 | 2010-12-06 | 53322.12 |
[7]:
tbl.shape
[7]:
(305, 2)
[8]:
print('first date in data:',tbl['DateTime'].min())
print('last date in data:',tbl['DateTime'].max())
first date in data: 2010-12-01 00:00:00
last date in data: 2011-12-09 00:00:00
It is possible that after making this aggregation to the daily level, some date observations in the range are missing. Some forecasting libraries will process missing data for you automatically, but because scalecast mixes so many model concepts, it is necessary to have every possible date in a given range represented. The code below builds a complete daily date range and fills the sales of any missing dates with 0.
[9]:
all_dates = pd.DataFrame({'DateTime':pd.date_range(start=tbl['DateTime'].min(),end=tbl['DateTime'].max(),freq=dt_aggr)})
full_data = all_dates.merge(tbl,on='DateTime',how='left').sort_values(['DateTime']).fillna(0)
full_data.head()
[9]:
DateTime | Sales | |
---|---|---|
0 | 2010-12-01 | 54818.08 |
1 | 2010-12-02 | 47570.53 |
2 | 2010-12-03 | 41308.69 |
3 | 2010-12-04 | 0.00 |
4 | 2010-12-05 | 25853.20 |
Let’s see how that looks plotted.
[10]:
full_data.plot(x='DateTime',y='Sales',title='eCommerce Sales Original')
plt.show()

We notice the last observation in the dataframe is an outlier. Let’s look at it more closely.
[11]:
full_data.sort_values(['Sales'],ascending=False).head(1)
[11]:
DateTime | Sales | |
---|---|---|
373 | 2011-12-09 | 196134.1 |
There are different ways to handle outliers in time series, but we will try ignoring it to see how well our models can predict it.
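For reference, one simple alternative would be to cap (winsorize) the outlier at a high quantile of the series; a minimal sketch is below, though this notebook does not apply it.
# illustrative only -- cap the outlier at the 99th percentile instead of ignoring it
cap = full_data['Sales'].quantile(0.99)
capped_sales = full_data['Sales'].clip(upper=cap)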
The last preprocessing step we want to perform is removing the first month or so of observations, because so many of them are 0 or close to 0 in value. Starting January 4, 2011, we see a more normal pattern.
[12]:
full_data = full_data.loc[full_data['DateTime'] >= datetime.datetime(2011,1,4)]
full_data.plot(x='DateTime',y='Sales',title='eCommerce Sales Starting Jan 4')
plt.show()

Much better. In the real world, we might want to know more about that outlier – why it exists, the best way to mitigate it, etc. This example is more interested in producing forecasts and showing the scalecast process, so we are going to sweep the issue of this outlier under the rug and forget about it. Let’s now try to get a better idea of how the data is distributed.
[13]:
full_data.describe()
[13]:
Sales | |
---|---|
count | 340.000000 |
mean | 24278.908776 |
std | 20781.112463 |
min | 0.000000 |
25% | 12518.370000 |
50% | 21182.990000 |
75% | 31394.397500 |
max | 196134.100000 |
[14]:
full_data['Sales'].hist(bins=25)
plt.show()

The remaining data looks somewhat normally distributed with a slight right skew. We see that one outlier far above all other values.
Forecast with Scalecast
To start forecasting, we initialize a Forecaster object with the y and current_dates parameters specified. If we hadn’t already dropped the first observations of the data before creating the object, we could have done so with the keep_smaller_history() function (a sketch follows the object’s output below). We can then plot the values that we will be using for forecasting.
[15]:
f = Forecaster(y=full_data['Sales'],current_dates=full_data['DateTime'])
f
[15]:
Forecaster(
DateStartActuals=2011-01-04T00:00:00.000000000
DateEndActuals=2011-12-09T00:00:00.000000000
Freq=D
ForecastLength=0
Xvars=[]
Differenced=0
TestLength=1
ValidationLength=1
ValidationMetric=rmse
ForecastsEvaluated=[]
CILevel=0.95
BootstrapSamples=100
)
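For reference, the same trimming could also be done on the object itself; a minimal sketch is below, assuming keep_smaller_history() accepts a datetime cutoff (it is a no-op here because full_data was already subset to dates on or after January 4, 2011).
# hedged sketch -- trim the stored history inside the Forecaster object
# (a no-op here since full_data already starts on Jan 4, 2011)
f.keep_smaller_history(datetime.datetime(2011, 1, 4))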
The first thing you should do after initializing the object is set its test length. The exact length is up to you; the longer it is, the more confident you can be about your reported error/accuracy metrics. The library requires a test length of at least 1. Let’s set our test length to be the same size as our forecast length: 30 days.
[16]:
f.set_test_length(30)
Before beginning the forecasting process, we should get a better idea of the signals within the time series itself. Using ACF, PACF, and Periodogram plots, we can observe how the series is auto-correlated. We leave the test set out of all visualizations (train_only=True
) to not leak data when making decisions about which signals exist in the data.
[17]:
f.plot_acf(train_only=True,lags=30)
plt.show()

[18]:
f.plot_pacf(train_only=True,lags=30)
plt.show()

[19]:
a, b = f.plot_periodogram(diffy=True,train_only=True)
plt.semilogy(a, b)
plt.show()

[20]:
f.seasonal_decompose(train_only=True).plot()
plt.show()

From these graphs, we get a sense that the data has a seasonal pattern over seven periods, or one week. The data appears to be significantly autocorrelated at lags 1, 6, 7, 13, and 14, and possibly more. This is good information to use when deciding how to specify the forecasts.
Now, let’s test the data’s stationarity using the Augmented Dickey-Fuller test. The null hypothesis of this test is that the data is not stationary. To return the p-value and critical values from this test, pass full_res=True
as an argument. The way we have the test specified below, it simply prints out the implication from the test.
[21]:
isstationary = f.adf_test(quiet=False)
series might not be stationary
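For reference, the underlying test statistics can be retrieved as mentioned above; a minimal sketch, assuming full_res=True passes back the raw statsmodels adfuller() output:
# hedged sketch -- return the raw test results instead of the quiet implication
adf_res = f.adf_test(full_res=True)
print('test statistic:', adf_res[0])
print('p-value:', adf_res[1])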
Since our test implies non-stationarity, let’s view all these plots again, but pass differenced data into them by using the diffy=1
argument.
[22]:
f.plot_acf(diffy=1,train_only=True,lags=30)
plt.show()

[23]:
f.plot_pacf(diffy=1,train_only=True,lags=30)
plt.show()

[24]:
f.seasonal_decompose(diffy=1,train_only=True).plot()
plt.show()

This doesn’t give us a very different view of the data, but reinforces that there is strong autocorrelation in our dataset. Some of that can be controlled by using the series’ first difference.
You can see below all functions available to plot the information in the object:
[25]:
print(*f.get_funcs('plotter'),sep='\n')
plot_acf
plot_pacf
plot_periodogram
seasonal_decompose
plot
plot_test_set
plot_fitted
All setter functions:
[26]:
print(*f.get_funcs('setter'),sep='\n')
set_last_future_date
set_test_length
set_validation_length
set_cilevel
set_bootstrap_samples
set_estimator
set_validation_metric
All getter functions:
[27]:
print(*f.get_funcs('getter'),sep='\n')
get_regressor_names
get_freq
HWES
Before running any forecast, we need to generate a forecast period. We are attempting to predict 30 days into the future.
[28]:
f.generate_future_dates(30)
To run an HWES model, we first set the estimator to hwes, then call the manual_forecast()
function. We can run as many hwes models as we like and differentiate them by using the call_me
argument. By default, the model will be called whatever the estimator is (“hwes” is “hwes”, “arima” is “arima”, etc.)
[29]:
f.set_estimator('hwes')
f.manual_forecast(trend='add')
f.save_summary_stats()
f.manual_forecast(trend='add',seasonal='add',call_me='hwes_seasonal')
f.save_summary_stats()
We can view the results of the models by plotting the test-set predictions against the actual test-set observations, setting ci=True to show 95% confidence intervals. Any confidence level can be set by using f.set_cilevel(...)
. The intervals are evaluated with bootstrapping on the fitted values and could be different from what is returned by the underlying statsmodels function. This is to make all forecast comparisons consistent across all model types. If a model shows tight confidence
intervals but is less accurate on the test set, that is a good indication it is overfit.
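For instance, a different interval level or more bootstrap samples could be requested before evaluating models; a minimal sketch is below (this notebook keeps the defaults of 0.95 and 100).
# illustrative only -- change the interval settings before evaluating models
f.set_cilevel(.9)              # 90% intervals instead of the default 95%
f.set_bootstrap_samples(500)   # more bootstrap samples for the interval estimates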
[30]:
f.plot_test_set(ci=True)

We can see the models’ performance over the 30-day forecast horizon as well:
[31]:
f.plot(print_attr=['LevelTestSetRMSE'],ci=True)
hwes LevelTestSetRMSE: 36031.56799119545
hwes_seasonal LevelTestSetRMSE: 29432.19251120192

You may have noticed we called save_summary_stats() after running each HWES model. Most models allow feature information to be saved from them using save_feature_importance()
or save_summary_stats()
; some models don’t allow either. These functions should be called before running a new model because they save that information for the last model run only. As a rule of thumb, saving summary stats is much less computationally expensive than saving feature importance, which
will be explored later.
We can see the summary statistics for the better-performing HWES model by exporting the saved summary stats to a dataframe.
[32]:
f.export_summary_stats(model='hwes_seasonal')
[32]:
coeff | code | optimized | |
---|---|---|---|
smoothing_level | 0.111071 | alpha | True |
smoothing_trend | 0.000100 | beta | True |
smoothing_seasonal | 0.197540 | gamma | True |
initial_level | 22137.727000 | l.0 | True |
initial_trend | 190.311940 | b.0 | True |
initial_seasons.0 | 31545.165000 | s.0 | True |
initial_seasons.1 | -1669.115200 | s.1 | True |
initial_seasons.2 | -911.148380 | s.2 | True |
initial_seasons.3 | 1028.341700 | s.3 | True |
initial_seasons.4 | -20410.079000 | s.4 | True |
initial_seasons.5 | -10495.992000 | s.5 | True |
initial_seasons.6 | 912.828050 | s.6 | True |
ARIMA
To run an ARIMA model, we first set the estimator to ARIMA, then call the manual_forecast()
function. We can run as many ARIMA models as we like, and differentiate them by using the call_me
argument. By default, the model will be called whatever the estimator is (“arima” is “arima”, “mlr” is “mlr”, etc.)
[33]:
f.set_estimator('arima')
f.manual_forecast(order=(1,1,0),seasonal_order=(1,1,0,7))
f.save_summary_stats()
f.manual_forecast(order=(1,1,1),seasonal_order=(0,1,1,7),call_me='arima_ma_terms')
f.save_summary_stats()
We can view the results of the model by plotting the test-set predictions with the actual test-set observations.
[34]:
f.plot_test_set(models=['arima','arima_ma_terms'],ci=True)

Those models appear to capture the daily trend fairly well. Let’s see how they look into future periods compared to the HWES models previously evaluated.
[35]:
f.plot(print_attr=['LevelTestSetRMSE'],ci=True)
hwes LevelTestSetRMSE: 36031.56799119545
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
arima LevelTestSetRMSE: 49010.85466653722
arima_ma_terms LevelTestSetRMSE: 46563.90435412486

The ARIMA model without MA terms has really gone in a strange direction. It is also the worst performing model we have evaluated yet, using the test-set RMSE as our comparison metric. Let’s delete this model from memory.
[36]:
f.pop('arima')
We can see the summary statistics for the remaining ARIMA model by exporting the saved summary stats to a dataframe, just like we did for the HWES model.
[37]:
f.export_summary_stats(model='arima_ma_terms')
[37]:
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
ar.L1 | 4.210000e-02 | 1.120000e-01 | 3.750000e-01 | 0.707 | -1.780000e-01 | 2.620000e-01 |
ma.L1 | -8.260000e-01 | 8.900000e-02 | -9.300000e+00 | 0.000 | -1.000000e+00 | -6.520000e-01 |
ma.S.L7 | -8.010000e-01 | 5.000000e-02 | -1.598900e+01 | 0.000 | -8.990000e-01 | -7.030000e-01 |
sigma2 | 2.927000e+08 | 2.130000e-10 | 1.380000e+18 | 0.000 | 2.930000e+08 | 2.930000e+08 |
MLR
First, let’s begin with autoregressive terms, which are lags of the dependent variable. We add 28 lags to make sure we are capturing all statistically significant terms.
[38]:
f.add_ar_terms(28) # lags 1-28
To account for non-stationarity, which ARIMA handles by setting the middle (differencing) term of its order to 1, we have to difference our data ourselves before modeling with linear regression.
[39]:
f.diff()
We can confirm the first-differenced data’s (probable) stationarity with another ADF test.
[40]:
isstationary = f.adf_test(quiet=False)
series appears to be stationary
Let’s plot its first difference.
[41]:
f.plot(models=None)

Let’s now add additional seasonality to the model regressors. The main seasonality that we have been able to confirm is weekly, so we can add the ‘dayofweek’ regressor to our object. We have three options: we can use a sin/cos transformation that accounts for regular fluctuations in the data in a cyclical form; we can add the data as dummies using dummy=True
and specifying the drop_first
parameter; or we can use the raw numerical output, which is the default (a decision tree
model may handle this last kind of regressor better than a linear model). Any decision we make in this regard has its pros and cons. For this example, we will use the dummy transformation. Also, to add complexity that the ARIMA couldn’t capture, we add weekly, monthly, and quarterly seasonality with the sin/cos transformation.
Other seasonal regressors are available and can be specified in the same way, including ‘day’, ‘hour’, ‘minute’ and more.
[42]:
f.add_seasonal_regressors('dayofweek',raw=False,dummy=True,drop_first=True)
f.add_seasonal_regressors('week','month','quarter',raw=False,sincos=True)
Let’s also add a time trend.
[43]:
f.add_time_trend()
All options for adding regressors are:
[44]:
print(*f.get_funcs('adder'),sep='\n')
add_ar_terms
add_AR_terms
add_seasonal_regressors
add_time_trend
add_other_regressor
add_covid19_regressor
add_combo_regressors
add_poly_terms
add_exp_terms
add_logged_terms
add_pt_terms
add_diffed_terms
add_lagged_terms
See the documentation for details on each of these functions.
Let’s see what calling our object now that we have evaluated some models looks like.
[45]:
f
[45]:
Forecaster(
DateStartActuals=2011-01-04T00:00:00.000000000
DateEndActuals=2011-12-09T00:00:00.000000000
Freq=D
ForecastLength=30
Xvars=['AR1', 'AR2', 'AR3', 'AR4', 'AR5', 'AR6', 'AR7', 'AR8', 'AR9', 'AR10', 'AR11', 'AR12', 'AR13', 'AR14', 'AR15', 'AR16', 'AR17', 'AR18', 'AR19', 'AR20', 'AR21', 'AR22', 'AR23', 'AR24', 'AR25', 'AR26', 'AR27', 'AR28', 'dayofweek_1', 'dayofweek_2', 'dayofweek_3', 'dayofweek_4', 'dayofweek_5', 'dayofweek_6', 'weeksin', 'weekcos', 'monthsin', 'monthcos', 'quartersin', 'quartercos', 't']
Differenced=1
TestLength=30
ValidationLength=1
ValidationMetric=rmse
ForecastsEvaluated=['hwes', 'hwes_seasonal', 'arima_ma_terms']
CILevel=0.95
BootstrapSamples=100
)
When adding these and other kinds of regressors, it is possible to change the names of some of them. This can come in handy to reference later. If any regressors begin with uppercase “AR”, the forecasting mechanisms in most of the models will assume such terms are autoregressive in nature and those terms are handled differently. So, be careful when naming variables.
Let’s move to modeling with a linear model. Any arguments that the linear model from the scikit-learn library accepts can also be accepted here. In addition, the following arguments are available for all sklearn models:

- Xvars (arguments include "all", None, and list-like objects) – default is always None, but for models that require regressors, this is treated the same as "all"
- normalizer (arguments are None, "minmax", "normalize", "pt", and "scale") – default is always "minmax"
- call_me – does not affect the model’s evaluation at all, just names the model for reference later
[46]:
f.set_estimator('mlr')
f.manual_forecast(normalizer=None,Xvars=None)
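As a hedged illustration of the shared arguments described above (not run in this notebook, and the chosen values are only examples), a call using all three might look like:
# illustrative only -- passing the shared scikit-learn arguments explicitly
f.manual_forecast(
    Xvars='all',           # use every added regressor
    normalizer='scale',    # scale the regressors instead of the default 'minmax'
    call_me='mlr_scaled',  # nickname for referencing the model later
)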
Like the ARIMA model, we can see the MLR’s performance on the test set. Note that all forecasting in this module, even on the test set, is dynamic in nature so that no training-set information is leaked but autoregressive and seasonal patterns can still be predicted. We can trust that this is a true performance on 30 days of out-of-sample data.
The way the forecasting mechanism works (when AR terms are involved) is by making a prediction one step into the future then filling in those predictions to create new AR terms, until the entire forecast interval has been predicted. This is true for testing and forecasting, but validating is non-dynamic by default. Both validating and testing can be either dynamic or non-dynamic; this will be explored later. With large test, validation, or forecast intervals, the forecasting may slow down considerably if everything is kept dynamic. However, if AR terms have not been added to the regressors, forecasting times are similar to any non-time-series prediction application.
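Conceptually, the dynamic mechanism can be sketched as the recursive loop below; this is only an illustration of the idea, not scalecast’s actual implementation.
# conceptual sketch of dynamic (recursive) forecasting with AR terms
def dynamic_forecast(model, history, horizon, n_lags):
    """Predict `horizon` steps, feeding each prediction back in as a new lag."""
    history = list(history)
    preds = []
    for _ in range(horizon):
        x = history[-n_lags:]          # the most recent values become the AR features
        yhat = model.predict([x])[0]   # one-step-ahead prediction
        preds.append(yhat)
        history.append(yhat)           # the prediction becomes a lag for the next step
    return preds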
[47]:
f.plot_test_set(models='mlr',ci=True)

Since the other models were run on level data and the MLR was run on differenced data, the test-set and forecast plots automatically revert everything to level when comparing them. The level=True
argument is also available if all models were run on differenced data but you want to see their performance on the level test set.
[48]:
f.plot_test_set(models=['mlr','hwes_seasonal'])

Confidence intervals are no longer available since models were run at different levels.
Let’s now plot both models’ forecasted values on level data. Let’s order the way these models are displayed based on their test-set performance as well.
[49]:
f.plot(print_attr=['LevelTestSetRMSE'],order_by='LevelTestSetRMSE')
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.00905488084
hwes LevelTestSetRMSE: 36031.56799119545
arima_ma_terms LevelTestSetRMSE: 46563.90435412486

We see that the MLR model performed better on the test set than the remaining ARIMA model, but still not as well as the best HWES model.
With scikit-learn models, such as MLR, permutation feature importance information from the eli5 package can be saved and exported to a dataframe, analogous to saving summary stats from an ARIMA model. Each weight is the average decrease in accuracy when that particular variable is substituted for random values over 10 iterations, so the variables that cause the largest decreases are considered the most important. The output of the exported data is therefore ranked in terms of which variables are most-to-least important, according to this methodology.
[50]:
f.save_feature_importance()
[51]:
f.export_feature_importance(model='mlr')
[51]:
weight | std | |
---|---|---|
feature | ||
AR1 | 1.314080 | 0.073512 |
AR2 | 1.071012 | 0.060343 |
AR3 | 0.863518 | 0.033001 |
AR12 | 0.543600 | 0.034335 |
AR4 | 0.453675 | 0.019668 |
AR13 | 0.398375 | 0.036643 |
AR6 | 0.384533 | 0.018344 |
AR5 | 0.370097 | 0.021654 |
dayofweek_5 | 0.313611 | 0.034527 |
AR23 | 0.307204 | 0.015959 |
AR15 | 0.295117 | 0.030526 |
AR14 | 0.277490 | 0.025026 |
AR11 | 0.248124 | 0.023914 |
AR24 | 0.208423 | 0.019388 |
AR9 | 0.206587 | 0.013182 |
AR22 | 0.206181 | 0.038638 |
AR20 | 0.173113 | 0.009632 |
AR8 | 0.169191 | 0.023569 |
AR10 | 0.127925 | 0.015004 |
AR19 | 0.126180 | 0.026513 |
AR17 | 0.124999 | 0.011934 |
AR21 | 0.113865 | 0.010676 |
AR16 | 0.111326 | 0.020616 |
AR18 | 0.104127 | 0.008001 |
AR7 | 0.094152 | 0.010217 |
monthcos | 0.090720 | 0.026520 |
dayofweek_6 | 0.077548 | 0.024611 |
t | 0.060184 | 0.017902 |
AR27 | 0.057364 | 0.006633 |
weeksin | 0.051128 | 0.007120 |
weekcos | 0.046868 | 0.005986 |
dayofweek_4 | 0.035563 | 0.005710 |
AR26 | 0.029666 | 0.002313 |
AR25 | 0.026536 | 0.006346 |
dayofweek_1 | 0.019746 | 0.001437 |
AR28 | 0.019454 | 0.007024 |
dayofweek_2 | 0.013402 | 0.004501 |
quartersin | 0.008244 | 0.002880 |
dayofweek_3 | 0.005167 | 0.002338 |
monthsin | 0.000366 | 0.001204 |
quartercos | 0.000269 | 0.000377 |
At first glance, it looks as though all added regressors were useful to the MLR model, but we can explore regularization now to further refine our forecasting approach.
Elasticnet and Auto-Forecasting
[52]:
GridGenerator.get_example_grids()
GridGenerator.get_empty_grids()
is also available. We could call this, then open the created file (Grids.py) and fill in the empty dictionaries with hyperparameter values that we want to use. The example grids can usually be used for adequate performance, but contributions to improving the default values are welcome. You are also more than welcome to open the Grids.py file locally and make adjustments as you see fit.
These grids are saved to your working directory as Grids.py. Their structure is that of Dict[str:list-like]
and scalecast automatically knows how to look for them. You can also pass your own grid manually by using ingest_grid()
and passing a str
value, which corresponds to a namespace of a grid in the Grids.py file, or a grid of dict
type. By default, the code below will ingest the grid named elasticnet from the Grids.py file in this same directory.
You can also create really big grids and limit them randomly by calling limit_grid_size(n:int|float)
. If int
, must be >0 to denote the number of combinations in the grid to keep. If float
, must be between 0 and 1 to represent a portion of the grid’s original size to randomly keep. random_seed
is available as an argument in this method and is None
by default.
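As a hedged sketch of those two options (the hyperparameter values below are only examples, and the cell that follows relies on the example grid from Grids.py instead):
# illustrative only -- pass a grid manually and randomly limit its size
custom_grid = {
    'alpha': [0.1, 0.5, 1.0, 2.0],
    'l1_ratio': [0.0, 0.25, 0.5, 1.0],
    'normalizer': ['minmax', 'scale'],
}
f.set_estimator('elasticnet')
f.ingest_grid(custom_grid)             # a dict grid; f.ingest_grid('elasticnet') would read Grids.py
f.limit_grid_size(.5, random_seed=20)  # randomly keep half of the combinations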
[53]:
f.set_validation_length(15)
f.set_estimator('elasticnet')
# unlike testing, tuning is non-dynamic by default, but this can be changed by passing dynamic_tuning=True as an argument below
f.tune() # automatically imports the elasticnet grid from Grids.py
# unlike tuning, testing is dynamic by default, but this can be changed by passing dynamic_testing=False as an argument below
f.auto_forecast()
f.save_feature_importance()
[54]:
f.plot_test_set(models=['mlr','elasticnet'],ci=True)

Regularization really took the seasonal pattern out of this model and reverted the prediction to nearly a straight line. Let’s compare all model forecasts as we did before.
[55]:
f.plot(print_attr=['LevelTestSetRMSE'],order_by='LevelTestSetRMSE')
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.00905488084
hwes LevelTestSetRMSE: 36031.56799119545
elasticnet LevelTestSetRMSE: 39522.6929766469
arima_ma_terms LevelTestSetRMSE: 46563.90435412486

In this instance, the regularization performed by the Elasticnet model resulted in worse performance than the MLR. We can see what hyperparameter values were selected from the tuning process by exporting the evaluated validation grid to a dataframe.
[56]:
f.export_validation_grid(model='elasticnet').sort_values(['metric_value']).head(15)
[56]:
alpha | l1_ratio | normalizer | validation_length | validation_metric | metric_value | |
---|---|---|---|---|---|---|
286 | 2.0 | 0.00 | minmax | 15 | rmse | 28047.227257 |
271 | 1.9 | 0.00 | minmax | 15 | rmse | 28068.118873 |
256 | 1.8 | 0.00 | minmax | 15 | rmse | 28091.256243 |
241 | 1.7 | 0.00 | minmax | 15 | rmse | 28117.021134 |
226 | 1.6 | 0.00 | minmax | 15 | rmse | 28145.886826 |
289 | 2.0 | 0.25 | minmax | 15 | rmse | 28178.109481 |
211 | 1.5 | 0.00 | minmax | 15 | rmse | 28178.447218 |
274 | 1.9 | 0.25 | minmax | 15 | rmse | 28205.401145 |
196 | 1.4 | 0.00 | minmax | 15 | rmse | 28215.457771 |
259 | 1.8 | 0.25 | minmax | 15 | rmse | 28235.588063 |
181 | 1.3 | 0.00 | minmax | 15 | rmse | 28257.894291 |
244 | 1.7 | 0.25 | minmax | 15 | rmse | 28269.155026 |
229 | 1.6 | 0.25 | minmax | 15 | rmse | 28306.701080 |
166 | 1.2 | 0.00 | minmax | 15 | rmse | 28307.039331 |
214 | 1.5 | 0.25 | minmax | 15 | rmse | 28348.975179 |
Since this model has a 0 value for its l1_ratio, it is functionally identical to a ridge model. The selected alpha value was 2.0, meaning a relatively strong penalty was applied and the coefficients were shrunk. Like the MLR, we can export the Elasticnet feature importance:
[57]:
f.export_feature_importance(model='elasticnet')
[57]:
weight | std | |
---|---|---|
feature | ||
dayofweek_5 | 2.749117e-02 | 7.907711e-04 |
dayofweek_6 | 7.436973e-03 | 6.541601e-04 |
AR7 | 4.317314e-03 | 2.615517e-04 |
AR14 | 2.717978e-03 | 4.329003e-04 |
AR21 | 2.573850e-03 | 3.267257e-04 |
AR28 | 2.196061e-03 | 1.418368e-04 |
AR1 | 9.725242e-04 | 1.756827e-04 |
AR12 | 8.878815e-04 | 3.761012e-05 |
dayofweek_4 | 8.175025e-04 | 4.177929e-04 |
dayofweek_2 | 8.131714e-04 | 5.075441e-04 |
AR2 | 8.120865e-04 | 2.122303e-04 |
AR23 | 7.662798e-04 | 1.259909e-04 |
AR9 | 7.189180e-04 | 2.141193e-04 |
AR19 | 6.809245e-04 | 1.632880e-04 |
dayofweek_3 | 5.958526e-04 | 4.543544e-04 |
AR26 | 3.956908e-04 | 9.078496e-05 |
AR5 | 3.227825e-04 | 1.072438e-04 |
AR16 | 2.884897e-04 | 6.450203e-05 |
dayofweek_1 | 2.188073e-04 | 3.046482e-04 |
monthcos | 1.825427e-04 | 1.120517e-04 |
AR6 | 1.677917e-04 | 1.095012e-04 |
weekcos | 1.174737e-04 | 1.778393e-04 |
AR8 | 1.129823e-04 | 4.858206e-05 |
quartercos | 6.973386e-05 | 2.521774e-04 |
AR18 | 6.639154e-05 | 4.093531e-05 |
AR4 | 2.102775e-05 | 1.002385e-05 |
AR22 | 1.986738e-05 | 2.717691e-05 |
AR25 | 1.664419e-05 | 1.248833e-05 |
AR15 | 1.512428e-05 | 3.388058e-05 |
AR27 | 1.303510e-05 | 3.138985e-05 |
AR17 | 1.016360e-05 | 2.489665e-05 |
AR13 | 3.045983e-06 | 2.660378e-06 |
AR24 | 2.851832e-06 | 3.287063e-06 |
AR20 | 1.742056e-06 | 1.811364e-06 |
monthsin | 3.803045e-07 | 1.101982e-05 |
AR10 | 1.464026e-07 | 7.522306e-06 |
weeksin | -1.380882e-07 | 5.346109e-07 |
AR11 | -1.707905e-06 | 7.185485e-06 |
AR3 | -5.989313e-06 | 1.292480e-05 |
quartersin | -7.928743e-06 | 4.664887e-05 |
t | -7.204049e-05 | 1.807528e-04 |
The elasticnet may have been over-parameterized, as it now rates some features as slightly harmful to the model, although values that small could be due to chance.
Auto-Forecasting the Scikit-learn Models
We can tune, test, and forecast with many models at once by using notebook.tune_test_forecast()
. We will be tuning and forecasting with all available scikit-learn models. Below is a list of all models available, whether they are from scikit-learn, and whether they can be tuned.
[58]:
from scalecast.Forecaster import _estimators_, _can_be_tuned_, _sklearn_estimators_
for m in _estimators_:
print(f'{m}; can be tuned: {m in _can_be_tuned_}; from scikit-learn: {m in _sklearn_estimators_}')
arima; can be tuned: True; from scikit-learn: False
combo; can be tuned: False; from scikit-learn: False
elasticnet; can be tuned: True; from scikit-learn: True
gbt; can be tuned: True; from scikit-learn: True
hwes; can be tuned: True; from scikit-learn: False
knn; can be tuned: True; from scikit-learn: True
lightgbm; can be tuned: True; from scikit-learn: True
lstm; can be tuned: False; from scikit-learn: False
mlp; can be tuned: True; from scikit-learn: True
mlr; can be tuned: True; from scikit-learn: True
prophet; can be tuned: True; from scikit-learn: False
rf; can be tuned: True; from scikit-learn: True
rnn; can be tuned: False; from scikit-learn: False
silverkite; can be tuned: True; from scikit-learn: False
svr; can be tuned: True; from scikit-learn: True
xgboost; can be tuned: True; from scikit-learn: True
Let’s call the function with the scikit-learn models only and save all feature info so that we can export that information later. To run the next function, note the following dependencies:
pip install tqdm
pip install ipython
pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension
if using Jupyter Lab:
jupyter labextension install @jupyter-widgets/jupyterlab-manager
[59]:
tune_test_forecast(f,
_sklearn_estimators_,
dynamic_tuning=False,
dynamic_testing=True,
feature_importance=True) # dynamic_tuning = False and dynamic_testing = True are defaults in this function
See the results on the test set.
[60]:
f.plot_test_set(order_by='LevelTestSetRMSE',models=_sklearn_estimators_,ci=True)

All of these models appear to have performed well on test-set data. Using a variety of models and tuning them carefully, a couple (or more) usually emerge as appearing reasonable to implement. Let’s now see the forecasts and level test-set RMSE from all these models.
[61]:
f.plot(models=_sklearn_estimators_,order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],ci=True)
mlr LevelTestSetRMSE: 32487.009054880975
lightgbm LevelTestSetRMSE: 36585.121542121844
rf LevelTestSetRMSE: 37323.21380044613
mlp LevelTestSetRMSE: 39120.33092263862
elasticnet LevelTestSetRMSE: 39522.6929766469
svr LevelTestSetRMSE: 44003.8044654691
gbt LevelTestSetRMSE: 48686.507886779866
xgboost LevelTestSetRMSE: 62488.67436538539
knn LevelTestSetRMSE: 92000.73638283063

Surprisingly perhaps, nothing outperformed the MLR model. Sometimes the simplest model is the best model. Let’s see the forecast plots for each model, but to avoid clutter, we focus on only the models that came out further ahead than the others. We pass the models='top_5'
argument to this function and rerun. Note, because the HWES models (which were run on level data) will now be displayed alongside models run on differenced data, we have to set level=True
to view all forecasts at level.
[62]:
f.plot(models='top_5',order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
hwes LevelTestSetRMSE: 36031.56799119545
lightgbm LevelTestSetRMSE: 36585.121542121844
rf LevelTestSetRMSE: 37323.21380044613

These various patterns are interesting, and one has to believe the last observed value being an outlier significantly influenced the resulting trends of the random forest and lightGBM models.
Prophet and Silverkite
Both of these models accept external regressors (the Xvars
argument is available in both), and you can also tune them using the grid-search method. But since they select a lot of their own regressors and optimize themselves by default, it is sometimes time-saving to manually forecast them with default parameters. We can also run them on un-differenced data and let them create their own regressors. Note,
undifferencing the series deletes all added regressors.
[63]:
f.undiff()
f
[63]:
Forecaster(
DateStartActuals=2011-01-04T00:00:00.000000000
DateEndActuals=2011-12-09T00:00:00.000000000
Freq=D
ForecastLength=30
Xvars=[]
Differenced=0
TestLength=30
ValidationLength=15
ValidationMetric=rmse
ForecastsEvaluated=['hwes', 'hwes_seasonal', 'arima_ma_terms', 'mlr', 'elasticnet', 'gbt', 'knn', 'lightgbm', 'mlp', 'rf', 'svr', 'xgboost']
CILevel=0.95
BootstrapSamples=100
)
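As mentioned above, these models can also be tuned with the grid-search method. A minimal sketch for Prophet is below, assuming a small hand-written grid (n_changepoints is a standard Prophet argument; the values are only examples, and the cells that follow stick with default parameters).
# illustrative only -- tune Prophet with a small manual grid rather than using defaults
f.set_estimator('prophet')
f.ingest_grid({'n_changepoints': [10, 25, 50]})
f.tune()           # evaluates each combination on the validation set
f.auto_forecast()  # forecasts with the best combination found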
[64]:
f.set_estimator('prophet')
f.manual_forecast()
[65]:
f.set_estimator('silverkite')
f.manual_forecast()
f.save_summary_stats()
Note, summary stats are available for silverkite but not Prophet – contributions are welcome to get summary stats for Prophet.
After forecasting with Silverkite, we need to reset some matplotlib parameters for plotting to work.
[66]:
matplotlib.use('nbAgg')
%matplotlib inline
sns.set(rc={'figure.figsize':plot_dim})
Let’s see how these two new models performed compared with the top two models identified before.
[67]:
f.plot_test_set(models=['hwes_seasonal','mlr','prophet','silverkite'],order_by='LevelTestSetRMSE',level=True)

And their trends into the future:
[68]:
f.plot(models=['hwes_seasonal','mlr','prophet','silverkite'],order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
prophet LevelTestSetRMSE: 34148.79734132142
silverkite LevelTestSetRMSE: 44999.94290460528

Neither the Prophet nor the Silverkite model proved to be as accurate as the MLR or HWES models.
TensorFlow Recurrent Neural Nets
The most advanced models available in scalecast are recurrent neural networks. There are two kinds: the SimpleRNN and the Long Short-Term Memory (LSTM). The SimpleRNN, perhaps ironically, has the more complicated implementation in this framework. It allows for nearly all the customization of the model that TensorFlow itself would offer. What scalecast does for you is preprocess the data correctly, scale all of the data when training the model, and unscale it appropriately to compare to any other models you run. It also tests the model on the same test set and bootstraps confidence intervals. The fact that it tests all the models also means it fits all the models twice, which can slow you down sometimes, but having two fits is also a blessing: it really lets you know how generalizable your model is when new data is added. Let’s explore the SimpleRNN more carefully.
SimpleRNN
See the documentation for the ‘rnn’ estimator. In practice, calling this model looks like:
[69]:
f.set_estimator('rnn')
f.manual_forecast(lags=30,
hidden_layers_struct={'simple':{'units':64},'simple':{'units':64},'simple':{'units':64}},
validation_split=0.2,
epochs=15,
plot_loss=True,
call_me='simple_rnn')
f.manual_forecast(lags=30,
hidden_layers_struct={'lstm':{'units':64},'lstm':{'units':64},'lstm':{'units':64}},
validation_split=0.2,
epochs=15,
plot_loss=True,
call_me='lstm')
Epoch 1/15
7/7 [==============================] - 1s 46ms/step - loss: 0.2078 - val_loss: 0.1211
Epoch 2/15
7/7 [==============================] - 0s 8ms/step - loss: 0.1542 - val_loss: 0.0969
Epoch 3/15
7/7 [==============================] - 0s 8ms/step - loss: 0.1322 - val_loss: 0.0845
Epoch 4/15
7/7 [==============================] - 0s 8ms/step - loss: 0.1232 - val_loss: 0.0810
Epoch 5/15
7/7 [==============================] - 0s 9ms/step - loss: 0.1185 - val_loss: 0.0811
Epoch 6/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1161 - val_loss: 0.0733
Epoch 7/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1130 - val_loss: 0.0711
Epoch 8/15
7/7 [==============================] - 0s 9ms/step - loss: 0.1104 - val_loss: 0.0746
Epoch 9/15
7/7 [==============================] - 0s 9ms/step - loss: 0.1073 - val_loss: 0.0714
Epoch 10/15
7/7 [==============================] - 0s 8ms/step - loss: 0.1037 - val_loss: 0.0718
Epoch 11/15
7/7 [==============================] - 0s 9ms/step - loss: 0.1006 - val_loss: 0.0641
Epoch 12/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0961 - val_loss: 0.0619
Epoch 13/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0925 - val_loss: 0.0618
Epoch 14/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0909 - val_loss: 0.0589
Epoch 15/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0893 - val_loss: 0.0580
Epoch 1/15
7/7 [==============================] - 1s 32ms/step - loss: 0.1212 - val_loss: 0.0673
Epoch 2/15
7/7 [==============================] - 0s 7ms/step - loss: 0.0921 - val_loss: 0.0555
Epoch 3/15
7/7 [==============================] - 0s 7ms/step - loss: 0.0784 - val_loss: 0.0483
Epoch 4/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0713 - val_loss: 0.0462
Epoch 5/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0677 - val_loss: 0.0443
Epoch 6/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0658 - val_loss: 0.0437
Epoch 7/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0646 - val_loss: 0.0419
Epoch 8/15
7/7 [==============================] - 0s 7ms/step - loss: 0.0624 - val_loss: 0.0414
Epoch 9/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0599 - val_loss: 0.0398
Epoch 10/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0585 - val_loss: 0.0388
Epoch 11/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0578 - val_loss: 0.0373
Epoch 12/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0568 - val_loss: 0.0395
Epoch 13/15
7/7 [==============================] - 0s 7ms/step - loss: 0.0560 - val_loss: 0.0367
Epoch 14/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0556 - val_loss: 0.0377
Epoch 15/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0549 - val_loss: 0.0353

Epoch 1/15
7/7 [==============================] - 2s 71ms/step - loss: 0.1927 - val_loss: 0.1278
Epoch 2/15
7/7 [==============================] - 0s 14ms/step - loss: 0.1520 - val_loss: 0.1011
Epoch 3/15
7/7 [==============================] - 0s 14ms/step - loss: 0.1279 - val_loss: 0.0797
Epoch 4/15
7/7 [==============================] - 0s 13ms/step - loss: 0.1194 - val_loss: 0.0804
Epoch 5/15
7/7 [==============================] - 0s 14ms/step - loss: 0.1178 - val_loss: 0.0758
Epoch 6/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1174 - val_loss: 0.0755
Epoch 7/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1169 - val_loss: 0.0742
Epoch 8/15
7/7 [==============================] - 0s 14ms/step - loss: 0.1163 - val_loss: 0.0741
Epoch 9/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1162 - val_loss: 0.0740
Epoch 10/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1165 - val_loss: 0.0738
Epoch 11/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1160 - val_loss: 0.0737
Epoch 12/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1160 - val_loss: 0.0740
Epoch 13/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1159 - val_loss: 0.0737
Epoch 14/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1158 - val_loss: 0.0739
Epoch 15/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1158 - val_loss: 0.0742
Epoch 1/15
7/7 [==============================] - 2s 63ms/step - loss: 0.1221 - val_loss: 0.0743
Epoch 2/15
7/7 [==============================] - 0s 11ms/step - loss: 0.1009 - val_loss: 0.0540
Epoch 3/15
7/7 [==============================] - 0s 12ms/step - loss: 0.0815 - val_loss: 0.0468
Epoch 4/15
7/7 [==============================] - 0s 12ms/step - loss: 0.0726 - val_loss: 0.0465
Epoch 5/15
7/7 [==============================] - 0s 12ms/step - loss: 0.0720 - val_loss: 0.0447
Epoch 6/15
7/7 [==============================] - 0s 12ms/step - loss: 0.0712 - val_loss: 0.0446
Epoch 7/15
7/7 [==============================] - 0s 12ms/step - loss: 0.0708 - val_loss: 0.0426
Epoch 8/15
7/7 [==============================] - 0s 13ms/step - loss: 0.0705 - val_loss: 0.0441
Epoch 9/15
7/7 [==============================] - 0s 14ms/step - loss: 0.0704 - val_loss: 0.0430
Epoch 10/15
7/7 [==============================] - 0s 14ms/step - loss: 0.0703 - val_loss: 0.0419
Epoch 11/15
7/7 [==============================] - 0s 14ms/step - loss: 0.0697 - val_loss: 0.0429
Epoch 12/15
7/7 [==============================] - 0s 13ms/step - loss: 0.0696 - val_loss: 0.0424
Epoch 13/15
7/7 [==============================] - 0s 14ms/step - loss: 0.0695 - val_loss: 0.0420
Epoch 14/15
7/7 [==============================] - 0s 14ms/step - loss: 0.0693 - val_loss: 0.0424
Epoch 15/15
7/7 [==============================] - 0s 14ms/step - loss: 0.0694 - val_loss: 0.0423

Anything the lstm can do in scalecast, the rnn can also do. Using the lstm estimator instead of rnn is simpler, and that is explored below.
LSTM
See the documentation for the ‘lstm’ estimator. There are more parameters to explicitly specify, but that makes it a lot easier to navigate as well. In practice, calling this model looks like:
[70]:
f.set_estimator('lstm')
f.manual_forecast(lags=30,
lstm_layer_sizes=(64,64,64),
dropout=(0.2,0,0),
validation_split=0.2,
epochs=15,
plot_loss=True,
call_me='lstm_regularized')
Epoch 1/15
7/7 [==============================] - 5s 188ms/step - loss: 0.1909 - val_loss: 0.1114
Epoch 2/15
7/7 [==============================] - 0s 34ms/step - loss: 0.1384 - val_loss: 0.0840
Epoch 3/15
7/7 [==============================] - 0s 36ms/step - loss: 0.1234 - val_loss: 0.0775
Epoch 4/15
7/7 [==============================] - 0s 39ms/step - loss: 0.1189 - val_loss: 0.0797
Epoch 5/15
7/7 [==============================] - 0s 39ms/step - loss: 0.1188 - val_loss: 0.0753
Epoch 6/15
7/7 [==============================] - 0s 43ms/step - loss: 0.1179 - val_loss: 0.0775
Epoch 7/15
7/7 [==============================] - 0s 43ms/step - loss: 0.1181 - val_loss: 0.0754
Epoch 8/15
7/7 [==============================] - 0s 42ms/step - loss: 0.1178 - val_loss: 0.0775
Epoch 9/15
7/7 [==============================] - 0s 42ms/step - loss: 0.1180 - val_loss: 0.0748
Epoch 10/15
7/7 [==============================] - 0s 41ms/step - loss: 0.1179 - val_loss: 0.0743
Epoch 11/15
7/7 [==============================] - 0s 40ms/step - loss: 0.1176 - val_loss: 0.0753
Epoch 12/15
7/7 [==============================] - 0s 42ms/step - loss: 0.1176 - val_loss: 0.0762
Epoch 13/15
7/7 [==============================] - 0s 40ms/step - loss: 0.1171 - val_loss: 0.0758
Epoch 14/15
7/7 [==============================] - 0s 41ms/step - loss: 0.1169 - val_loss: 0.0750
Epoch 15/15
7/7 [==============================] - 0s 44ms/step - loss: 0.1173 - val_loss: 0.0757
Epoch 1/15
7/7 [==============================] - 4s 187ms/step - loss: 0.1096 - val_loss: 0.0658
Epoch 2/15
7/7 [==============================] - 0s 30ms/step - loss: 0.0816 - val_loss: 0.0456
Epoch 3/15
7/7 [==============================] - 0s 34ms/step - loss: 0.0744 - val_loss: 0.0483
Epoch 4/15
7/7 [==============================] - 0s 36ms/step - loss: 0.0717 - val_loss: 0.0437
Epoch 5/15
7/7 [==============================] - 0s 36ms/step - loss: 0.0718 - val_loss: 0.0440
Epoch 6/15
7/7 [==============================] - 0s 38ms/step - loss: 0.0709 - val_loss: 0.0435
Epoch 7/15
7/7 [==============================] - 0s 38ms/step - loss: 0.0710 - val_loss: 0.0421
Epoch 8/15
7/7 [==============================] - 0s 38ms/step - loss: 0.0718 - val_loss: 0.0445
Epoch 9/15
7/7 [==============================] - 0s 38ms/step - loss: 0.0710 - val_loss: 0.0423
Epoch 10/15
7/7 [==============================] - 0s 39ms/step - loss: 0.0702 - val_loss: 0.0423
Epoch 11/15
7/7 [==============================] - 0s 39ms/step - loss: 0.0701 - val_loss: 0.0426
Epoch 12/15
7/7 [==============================] - 0s 39ms/step - loss: 0.0709 - val_loss: 0.0431
Epoch 13/15
7/7 [==============================] - 0s 39ms/step - loss: 0.0704 - val_loss: 0.0427
Epoch 14/15
7/7 [==============================] - 0s 38ms/step - loss: 0.0698 - val_loss: 0.0430
Epoch 15/15
7/7 [==============================] - 0s 38ms/step - loss: 0.0706 - val_loss: 0.0430

Finally, we can see how all these models performed on the test set:
[71]:
f.plot_test_set(models=['simple_rnn','lstm','lstm_regularized'],ci=True,order_by='LevelTestSetRMSE')

And into the future:
[72]:
f.plot(models=['simple_rnn','lstm','lstm_regularized'],ci=True,order_by='LevelTestSetRMSE',level=True,print_attr=['LevelTestSetRMSE'])
simple_rnn LevelTestSetRMSE: 39136.43125093719
lstm LevelTestSetRMSE: 40525.52590577089
lstm_regularized LevelTestSetRMSE: 42748.6379268873

None of these models reached the level of the MLR or HWES models. That’s not to say they couldn’t with more time spent tuning them. Theoretically, they can fit thousands of parameters and find long-term trends in the data that none of the other models would be capable of capturing. For some datasets, this makes a big difference. But some datasets are too random to be fit effectively by one of these more advanced models. Sometimes the simplest model is the most accurate, fastest, and most interpretable.
Combination Modeling
The last model type to evaluate is the combination (combo) model. Three types of combinations are supported:

- simple: a simple average of the selected models
- weighted: a weighted average of the selected models, where the weights can be assigned manually or determined automatically with the determine_best_by parameter
- splice: a splice of two or more models at one or more future splice points. All metrics, fitted values, and test-set metrics from this model will be identical to the simple average.

We will be using simple and weighted average combination modeling only.
[73]:
f.set_estimator('combo')
f.manual_forecast(how='simple',models=['hwes','hwes_seasonal','arima_ma_terms','prophet','silverkite','simple_rnn','lstm','lstm_regularized'],call_me='avg_lvl_models')
f.manual_forecast(how='weighted',models=_sklearn_estimators_,determine_best_by='ValidationMetricValue',call_me='weighted_differenced_models')
It could be argued that combination modeling is prone to problems related to data leakage–we shouldn’t select model combinations based on their performance on the test set and then recompare them to the test set. By default, determine_best_by in both of these combination types is 'ValidationMetricValue'
. This is a way to ensure data leakage does not occur. Since we didn’t tune all the models we want to combine in the first function, we simply write out the models we want to combine in
a list.
The weighted average model can accept weights that add to 1 or not; if not, they will be rebalanced to do so. If determine_best_by is specified, weights can be left as None and the weights will be chosen automatically. Again, be careful not to overfit models this way.
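A minimal sketch of passing explicit weights is shown below; it assumes the combo estimator accepts a weights argument as described above, and the values are only illustrative (this notebook determines weights automatically instead).
# illustrative only -- a weighted average with manually chosen weights
f.set_estimator('combo')
f.manual_forecast(
    how='weighted',
    models=['hwes_seasonal', 'mlr'],
    weights=(3, 2),                 # rebalanced internally to sum to 1
    call_me='weighted_manual',
)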
All information available for other models, including fitted values and test-set metrics, is also available for these combo models.
[74]:
f.plot_test_set(models='top_5',order_by='LevelTestSetRMSE',level=True)

Again, we see the usual suspects. One more time on future data:
[75]:
f.plot(models='top_5',order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
prophet LevelTestSetRMSE: 34148.79734132142
hwes LevelTestSetRMSE: 36031.56799119545
lightgbm LevelTestSetRMSE: 36585.121542121844

And just as a grand finale, let’s see all evaluated models plotted together:
[76]:
f.plot(order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
prophet LevelTestSetRMSE: 34148.79734132142
hwes LevelTestSetRMSE: 36031.56799119545
lightgbm LevelTestSetRMSE: 36585.121542121844
rf LevelTestSetRMSE: 37323.21380044613
avg_lvl_models LevelTestSetRMSE: 37405.41340106671
mlp LevelTestSetRMSE: 39120.33092263862
simple_rnn LevelTestSetRMSE: 39136.43125093719
elasticnet LevelTestSetRMSE: 39522.6929766469
lstm LevelTestSetRMSE: 40525.52590577089
lstm_regularized LevelTestSetRMSE: 42748.6379268873
svr LevelTestSetRMSE: 44003.8044654691
silverkite LevelTestSetRMSE: 44999.94290460528
arima_ma_terms LevelTestSetRMSE: 46563.90435412486
gbt LevelTestSetRMSE: 48686.507886779866
weighted_differenced_models LevelTestSetRMSE: 60531.99069667154
xgboost LevelTestSetRMSE: 62488.67436538539
knn LevelTestSetRMSE: 92000.73638283063

Export Results
[77]:
results = f.export(to_excel=True,determine_best_by='LevelTestSetRMSE',excel_name='eCommerce_results.xlsx')
f.all_feature_info_to_excel(excel_name='eCommerce_feature_info.xlsx')
f.all_validation_grids_to_excel(sort_by_metric_value=True,excel_name='eCommerce_validation_grids.xlsx')
One of the most interesting exports above is the model summaries dataframe. It has many columns that you can view in the exported Excel workbook, but a sample of this dataframe is displayed below.
[81]:
results['model_summaries'][['ModelNickname','Estimator','HyperParams','LevelTestSetRMSE','LevelTestSetR2']]
[81]:
ModelNickname | Estimator | HyperParams | LevelTestSetRMSE | LevelTestSetR2 | |
---|---|---|---|---|---|
0 | hwes_seasonal | hwes | {'trend': 'add', 'seasonal': 'add'} | 29432.192511 | 0.354237 |
1 | mlr | mlr | {} | 32487.009055 | 0.213230 |
2 | prophet | prophet | {} | 34148.797341 | 0.130682 |
3 | hwes | hwes | {'trend': 'add'} | 36031.567991 | 0.032180 |
4 | lightgbm | lightgbm | {'max_depth': 3} | 36585.121542 | 0.002215 |
5 | rf | rf | {'max_depth': 2, 'n_estimators': 1000} | 37323.213800 | -0.038451 |
6 | avg_lvl_models | combo | {'how': 'simple', 'models': ['hwes', 'hwes_sea... | 37405.413401 | -0.043030 |
7 | mlp | mlp | {'activation': 'tanh', 'hidden_layer_sizes': (... | 39120.330923 | -0.140862 |
8 | simple_rnn | rnn | {'lags': 30, 'hidden_layers_struct': {'simple'... | 39136.431251 | -0.141801 |
9 | elasticnet | elasticnet | {'alpha': 2.0, 'l1_ratio': 0.0} | 39522.692977 | -0.164451 |
10 | lstm | rnn | {'lags': 30, 'hidden_layers_struct': {'lstm': ... | 40525.525906 | -0.224293 |
11 | lstm_regularized | lstm | {'lags': 30, 'lstm_layer_sizes': (64, 64, 64),... | 42748.637927 | -0.362300 |
12 | svr | svr | {'kernel': 'linear', 'C': 0.5, 'epsilon': 0.01} | 44003.804465 | -0.443473 |
13 | silverkite | silverkite | {} | 44999.942905 | -0.509566 |
14 | arima_ma_terms | arima | {'order': (1, 1, 1), 'seasonal_order': (0, 1, ... | 46563.904354 | -0.616318 |
15 | gbt | gbt | {'max_depth': 2, 'max_features': 'sqrt'} | 48686.507887 | -0.767036 |
16 | weighted_differenced_models | combo | {'how': 'weighted', 'models': ['elasticnet', '... | 60531.990697 | -1.731480 |
17 | xgboost | xgboost | {'max_depth': 3} | 62488.674365 | -1.910923 |
18 | knn | knn | {'n_neighbors': 20, 'weights': 'uniform'} | 92000.736383 | -5.309728 |
Here is a list of all export functions. Most of these output results to a pandas dataframe.
[79]:
print(*f.get_funcs('exporter'),sep='\n')
export
export_summary_stats
export_feature_importance
export_validation_grid
all_feature_info_to_excel
all_validation_grids_to_excel
export_Xvars_df
export_forecasts_with_cis
export_test_set_preds_with_cis
export_fitted_vals
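For example, a single model’s point forecasts and confidence intervals can be pulled into a dataframe with one of the functions listed above; a minimal sketch, assuming it accepts a model nickname:
# hedged sketch -- export the MLR forecast with its confidence intervals
mlr_fcst = f.export_forecasts_with_cis('mlr')
mlr_fcst.head()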