eCommerce Example

This notebook details a time series analysis and forecasting application built with scalecast using the eCommerce dataset. The notebook starts with an exploratory data analysis; moves to time series decomposition; forecasts with an exponential smoothing model, an ARIMA model, and a multiple linear regression model; moves to automated forecasting with scikit-learn models, Facebook Prophet, and LinkedIn Greykite/Silverkite; explores TensorFlow recurrent neural nets; and finishes with combination modeling.

The utilized dataset is available on kaggle: https://www.kaggle.com/carrie1/ecommerce-data/

Library Imports

First, let’s import the libraries and read the data. Some data preprocessing in pandas will be necessary before calling scalecast.
[1]:
import pandas as pd
import numpy as np
import datetime
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from scalecast.Forecaster import Forecaster
from scalecast import GridGenerator
from scalecast.notebook import tune_test_forecast

plot_dim = (15,7)
sns.set(rc={'figure.figsize':plot_dim})

It is important to let pandas know that the date column should be datetime type. That’s why we pass the argument parse_dates=['InvoiceDate'].

[2]:
data = pd.read_csv('eCommerce.csv',parse_dates=['InvoiceDate'])

Exploratory Data Analysis

Let’s view the data’s first five rows.
[3]:
data.head()
[3]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom

Now, let’s view the data’s dimensions.

[4]:
data.shape
[4]:
(541909, 8)

Let’s see how much time these roughly 540,000 observations span.

[5]:
print('first date in data:',data['InvoiceDate'].min())
print('last date in data:',data['InvoiceDate'].max())
first date in data: 2010-12-01 08:26:00
last date in data: 2011-12-09 12:50:00

In spite of there being over half a million rows, there is only about a year’s worth of data to analyze. Before proceeding, we should decide on a datetime frequency to aggregate the data to; otherwise, we would be trying to forecast with an inconsistent interval of time between observations. This decision depends on what question we are trying to answer. For this example, let’s try answering the question of whether we can accurately predict daily gross sales revenue in the United Kingdom over 30 days. This means we will remove rows with non-positive sales quantities (probably representing returns), create a “Sales” column by multiplying quantity by price, and aggregate the entire dataframe to the daily level. We also subset to Country == 'United Kingdom' and, later, fill any days that show no sales with 0.

[6]:
# drop negative sales quantities
data = data.loc[(data['Quantity'] > 0) & (data['Country'] == 'United Kingdom')]
# create the Sales column
data['Sales'] = data['Quantity']*data['UnitPrice']
# aggregate the dataframe to the daily level
dt_aggr = 'D'
data['DateTime'] = data['InvoiceDate'].dt.floor(dt_aggr)
tbl = data.groupby('DateTime')['Sales'].sum().reset_index()
# view first 5 rows
tbl.head()
[6]:
DateTime Sales
0 2010-12-01 54818.08
1 2010-12-02 47570.53
2 2010-12-03 41308.69
3 2010-12-05 25853.20
4 2010-12-06 53322.12
[7]:
tbl.shape
[7]:
(305, 2)
[8]:
print('first date in data:',tbl['DateTime'].min())
print('last date in data:',tbl['DateTime'].max())
first date in data: 2010-12-01 00:00:00
last date in data: 2011-12-09 00:00:00

It is possible that after aggregating to the daily level, some dates in the range are missing. Some forecasting libraries will process missing data for you automatically, but because scalecast mixes so many model concepts, it is necessary to have every date in a given range represented. The code below expands the dataframe to every calendar day in the range and fills the sales of any missing dates with 0.

[9]:
all_dates = pd.DataFrame({'DateTime':pd.date_range(start=tbl['DateTime'].min(),end=tbl['DateTime'].max(),freq=dt_aggr)})
full_data = all_dates.merge(tbl,on='DateTime',how='left').sort_values(['DateTime']).fillna(0)
full_data.head()
[9]:
DateTime Sales
0 2010-12-01 54818.08
1 2010-12-02 47570.53
2 2010-12-03 41308.69
3 2010-12-04 0.00
4 2010-12-05 25853.20

Let’s see how that looks plotted.

[10]:
full_data.plot(x='DateTime',y='Sales',title='eCommerce Sales Original')
plt.show()
../../_images/Forecaster_examples_eCommerce_17_0.png

We notice the last observation in the dataframe is an outlier. Let’s look at it more closely.

[11]:
full_data.sort_values(['Sales'],ascending=False).head(1)
[11]:
DateTime Sales
373 2011-12-09 196134.1

There are different ways to handle outliers in time series, but we will try ignoring it to see how well our models can predict it.

The last preprocessing step we want to perform is removing the first month or so of observations, because so many of them are 0 or close to 0 in value. Starting January 4, 2011, we see a more normal pattern.

[12]:
full_data = full_data.loc[full_data['DateTime'] >= datetime.datetime(2011,1,4)]
full_data.plot(x='DateTime',y='Sales',title='eCommerce Sales Starting Jan 4')
plt.show()
../../_images/Forecaster_examples_eCommerce_22_0.png

Much better. In the real world, we might want to know more about that outlier – why it exists, the best way to mitigate it, etc. This example is more interested in producing forecasts and showing the scalecast process, so we are going to sweep the issue of this outlier under the rug and forget about it. Let’s now try to get a better idea of how the data is distributed.

[13]:
full_data.describe()
[13]:
Sales
count 340.000000
mean 24278.908776
std 20781.112463
min 0.000000
25% 12518.370000
50% 21182.990000
75% 31394.397500
max 196134.100000
[14]:
full_data['Sales'].hist(bins=25)
plt.show()
../../_images/Forecaster_examples_eCommerce_25_0.png

The remaining data looks somewhat normally distributed with a slight right skew. We see that one outlier far above all other values.

Forecast with Scalecast

To create the object, we call Forecaster() with the y and current_dates parameters specified. If we hadn’t already dropped the first observations of the data before creating the object, we could have done so with the keep_smaller_history() method, sketched below. We can then plot the values that we will be using for forecasting.
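For reference, here is a hedged sketch of that alternative. It assumes keep_smaller_history() accepts a date-like cutoff (per the scalecast documentation), and full_history is a hypothetical dataframe that still contains the early observations.

# hedged sketch -- full_history is a hypothetical, unfiltered version of the data
f_alt = Forecaster(y=full_history['Sales'], current_dates=full_history['DateTime'])
f_alt.keep_smaller_history(datetime.datetime(2011,1,4)) # keep observations from this date forward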
[15]:
f = Forecaster(y=full_data['Sales'],current_dates=full_data['DateTime'])
f
[15]:
Forecaster(
    DateStartActuals=2011-01-04T00:00:00.000000000
    DateEndActuals=2011-12-09T00:00:00.000000000
    Freq=D
    ForecastLength=0
    Xvars=[]
    Differenced=0
    TestLength=1
    ValidationLength=1
    ValidationMetric=rmse
    ForecastsEvaluated=[]
    CILevel=0.95
    BootstrapSamples=100
)

The first thing you should do after initializing the object is set its test length. How long to make it is up to you: the longer the test set, the more confident you can be in your reported error/accuracy metrics. The library requires a test length of at least 1. Let’s set our test length to be the same size as our forecast length: 30 days.

[16]:
f.set_test_length(30)

Before beginning the forecasting process, we should get a better idea of the signals within the time series itself. Using ACF, PACF, and periodogram plots, we can observe how the series is autocorrelated. We leave the test set out of all visualizations (train_only=True) to avoid leaking data when deciding which signals exist in the data.

[17]:
f.plot_acf(train_only=True,lags=30)
plt.show()
../../_images/Forecaster_examples_eCommerce_32_0.png
[18]:
f.plot_pacf(train_only=True,lags=30)
plt.show()
../../_images/Forecaster_examples_eCommerce_33_0.png
[19]:
a, b = f.plot_periodogram(diffy=True,train_only=True)
plt.semilogy(a, b)
plt.show()
../../_images/Forecaster_examples_eCommerce_34_0.png
[20]:
f.seasonal_decompose(train_only=True).plot()
plt.show()
../../_images/Forecaster_examples_eCommerce_35_0.png

From these graphs, we get a sense that the data has a seasonal pattern over seven periods, or one week. The data appears to be significantly autocorrelated back 1, 6, 7, 13, and 14 periods, possibly more. This is good information to use when deciding how to specify the forecasts.

Now, let’s test the data’s stationarity using the Augmented Dickey-Fuller test. The null hypothesis of this test is that the data is not stationary. To return the p-value and the rest of the test output, pass full_res=True as an argument. The way we have the test specified below, it will simply print out the implication from the test.
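For reference, a hedged sketch of retrieving the full output instead of the printed implication (the return structure mirrors statsmodels’ adfuller result):

# hedged sketch -- full_res=True returns the underlying adfuller output instead of a boolean
res = f.adf_test(full_res=True)
print(res) # test statistic, p-value, lags used, critical values, etc.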

[21]:
isstationary = f.adf_test(quiet=False)
series might not be stationary

Since our test implies non-stationarity, let’s view all these plots again, but pass differenced data into them by using the diffy=1 argument.

[22]:
f.plot_acf(diffy=1,train_only=True,lags=30)
plt.show()
../../_images/Forecaster_examples_eCommerce_40_0.png
[23]:
f.plot_pacf(diffy=1,train_only=True,lags=30)
plt.show()
../../_images/Forecaster_examples_eCommerce_41_0.png
[24]:
f.seasonal_decompose(diffy=1,train_only=True).plot()
plt.show()
../../_images/Forecaster_examples_eCommerce_42_0.png

This doesn’t give us a very different view of the data, but reinforces that there is strong autocorrelation in our dataset. Some of that can be controlled by using the series’ first difference.

You can see below all functions available to plot the information in the object:

[25]:
print(*f.get_funcs('plotter'),sep='\n')
plot_acf
plot_pacf
plot_periodogram
seasonal_decompose
plot
plot_test_set
plot_fitted

All setter functions:

[26]:
print(*f.get_funcs('setter'),sep='\n')
set_last_future_date
set_test_length
set_validation_length
set_cilevel
set_bootstrap_samples
set_estimator
set_validation_metric

All getter functions:

[27]:
print(*f.get_funcs('getter'),sep='\n')
get_regressor_names
get_freq

HWES

Let’s run a couple of exponential smoothing forecasts using Holt-Winters Exponential Smoothing (HWES) from statsmodels. Exponential smoothing is considered a relatively simple way to forecast time series: these models smooth out recent trends and project them into the future. The added benefit of using Holt-Winters is that extra components, such as seasonality, can be layered onto this basic idea. We will try one HWES model with seasonality and one without.

Before running any forecast, we need to generate a forecast period. We are attempting to predict 30 days into the future.

[28]:
f.generate_future_dates(30)

To run an HWES model, we first set the estimator to hwes, then call the manual_forecast() function. We can run as many hwes models as we like and differentiate them by using the call_me argument. By default, the model will be called whatever the estimator is (“hwes” is “hwes”, “arima” is “arima”, etc.)

[29]:
f.set_estimator('hwes')
f.manual_forecast(trend='add')
f.save_summary_stats()

f.manual_forecast(trend='add',seasonal='add',call_me='hwes_seasonal')
f.save_summary_stats()

We can view the results of the model by plotting the test-set predictions with the actual test-set observations, setting ci=True to show 95% confidence intervals. All confidence intervals are evaluated with bootstrapping, and could be different than what is returned from the underlying statsmodels function. This is to make all forecast comparisons consistent across all model types.

[30]:
f.plot_test_set(ci=True)
../../_images/Forecaster_examples_eCommerce_54_0.png

We can see the models’ performance over the 30-day forecast horizon as well:

[31]:
f.plot(print_attr=['LevelTestSetRMSE'],ci=True)
hwes LevelTestSetRMSE: 36031.56799119545
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
../../_images/Forecaster_examples_eCommerce_56_1.png

You may have noticed we called save_summary_stats() after running each HWES model. Most models allow feature information to be saved using save_feature_importance() or save_summary_stats(); some models allow neither. These functions should be called before running a new model because they only save information for the most recently run model. As a rule of thumb, saving summary stats is much less computationally expensive than saving feature importance, which will be explored later.

We can see the summary statistics for the better-performing HWES model by exporting the saved summary stats to a dataframe.

[32]:
f.export_summary_stats(model='hwes_seasonal')
[32]:
coeff code optimized
smoothing_level 0.111071 alpha True
smoothing_trend 0.000100 beta True
smoothing_seasonal 0.197540 gamma True
initial_level 22137.727000 l.0 True
initial_trend 190.311940 b.0 True
initial_seasons.0 31545.165000 s.0 True
initial_seasons.1 -1669.115200 s.1 True
initial_seasons.2 -911.148380 s.2 True
initial_seasons.3 1028.341700 s.3 True
initial_seasons.4 -20410.079000 s.4 True
initial_seasons.5 -10495.992000 s.5 True
initial_seasons.6 912.828050 s.6 True

ARIMA

Now, let’s run another common, albeit slightly more advanced, time-series model: ARIMA. This model uses the series’ own history, errors, and stationarity to forecast. Using the output from the plots above, as well as the results from the ADF test, we can specify a (1,1,0) x (1,1,0) ordered model. The seasonal period will be 7 periods, or one week.

To run an ARIMA model, we first set the estimator to ARIMA, then call the manual_forecast() function. We can run as many ARIMA models as we like, and differentiate them by using the call_me argument. By default, the model will be called whatever the estimator is (“arima” is “arima”, “mlr” is “mlr”, etc.)

[33]:
f.set_estimator('arima')
f.manual_forecast(order=(1,1,0),seasonal_order=(1,1,0,7))
f.save_summary_stats()

f.manual_forecast(order=(1,1,1),seasonal_order=(0,1,1,7),call_me='arima_ma_terms')
f.save_summary_stats()

We can view the results of the model by plotting the test-set predictions against the actual test-set observations, setting ci=True to show 95% confidence intervals. All confidence intervals are evaluated with bootstrapping and could be different than what is returned from the underlying statsmodels function. This is to make all forecast comparisons consistent across all model types.

[34]:
f.plot_test_set(models=['arima','arima_ma_terms'],ci=True)
../../_images/Forecaster_examples_eCommerce_63_0.png

Those models appear to capture the daily trend fairly well. Let’s see how they look into future periods compared to the HWES models previously evaluated.

[35]:
f.plot(print_attr=['LevelTestSetRMSE'],ci=True)
hwes LevelTestSetRMSE: 36031.56799119545
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
arima LevelTestSetRMSE: 49010.85466653722
arima_ma_terms LevelTestSetRMSE: 46563.90435412486
../../_images/Forecaster_examples_eCommerce_65_1.png

The ARIMA model without MA terms has gone in a strange direction. It is also the worst-performing model we have evaluated yet, using the test-set RMSE as our comparison metric. Let’s delete this model from memory.

[36]:
f.pop('arima')

We can see the summary statistics for the remaining ARIMA model by exporting the saved summary stats to a dataframe, just like we did for the HWES model.

[37]:
f.export_summary_stats(model='arima_ma_terms')
[37]:
coef std err z P>|z| [0.025 0.975]
ar.L1 4.210000e-02 1.120000e-01 3.750000e-01 0.707 -1.780000e-01 2.620000e-01
ma.L1 -8.260000e-01 8.900000e-02 -9.300000e+00 0.000 -1.000000e+00 -6.520000e-01
ma.S.L7 -8.010000e-01 5.000000e-02 -1.598900e+01 0.000 -8.990000e-01 -7.030000e-01
sigma2 2.927000e+08 2.130000e-10 1.380000e+18 0.000 2.930000e+08 2.930000e+08

MLR

We can use many other models through scalecast. One of the most basic of these is multiple linear regression (MLR). Unlike ARIMA, we don’t specify orders for this model, but we can add similar regressors, including autoregressive and seasonal terms, as well as a time trend. The MLR works very similarly to the ARIMA model in that it tries to find a linear relationship between all these components and the future. However, it assumes that the errors in each time period are independent of one another. This assumption may be spurious, but MLR has proven to make accurate predictions on real-world data.

First, let’s begin with autoregressive terms, which are lags of the dependent variable. We add 28 lags to make sure we are capturing all statistically significant terms.

[38]:
f.add_ar_terms(28) # lags 1-28

To account for non-stationarity, which is handled by setting the middle order term to 1 in ARIMA, we have to difference our data before modeling with linear regression.

[39]:
f.diff()

We can confirm the first-differenced data’s (probable) stationarity with another ADF test.

[40]:
isstationary = f.adf_test(quiet=False)
series appears to be stationary

Let’s plot its first difference.

[41]:
f.plot(models=None)
../../_images/Forecaster_examples_eCommerce_77_0.png

Let’s now add additional seasonality to the model regressors. The main seasonality that we have been able to confirm is weekly, so we can add the ‘dayofweek’ regressor to our object. We have three options: we can use a sin/cos transformation that accounts for regular fluctuations in the data in a cyclical form; we can add the data as dummy variables using dummy=True and specifying the drop_first parameter; or we can just use the raw numerical output, which is the default (a decision tree model may handle this last kind of regressor better than a linear model). Any decision we make in this regard has its pros and cons. For this example, we will use the dummy transformation for day of week. Also, to add complexity that the ARIMA couldn’t capture, we add weekly, monthly, and quarterly seasonality with the sin/cos transformation.

Other seasonal regressors are available and can be specified in the same way, including ‘day’, ‘hour’, ‘minute’ and more.

[42]:
f.add_seasonal_regressors('dayofweek',raw=False,dummy=True,drop_first=True)
f.add_seasonal_regressors('week','month','quarter',raw=False,sincos=True)

Let’s also add a time trend.

[43]:
f.add_time_trend()

All options for adding regressors are:

[44]:
print(*f.get_funcs('adder'),sep='\n')
add_ar_terms
add_AR_terms
add_seasonal_regressors
add_time_trend
add_other_regressor
add_covid19_regressor
add_combo_regressors
add_poly_terms
add_exp_terms
add_logged_terms
add_pt_terms
add_diffed_terms
add_lagged_terms

See the documentation

Let’s see what calling our object looks like now that we have evaluated some models and added regressors.

[45]:
f
[45]:
Forecaster(
    DateStartActuals=2011-01-04T00:00:00.000000000
    DateEndActuals=2011-12-09T00:00:00.000000000
    Freq=D
    ForecastLength=30
    Xvars=['AR1', 'AR2', 'AR3', 'AR4', 'AR5', 'AR6', 'AR7', 'AR8', 'AR9', 'AR10', 'AR11', 'AR12', 'AR13', 'AR14', 'AR15', 'AR16', 'AR17', 'AR18', 'AR19', 'AR20', 'AR21', 'AR22', 'AR23', 'AR24', 'AR25', 'AR26', 'AR27', 'AR28', 'dayofweek_1', 'dayofweek_2', 'dayofweek_3', 'dayofweek_4', 'dayofweek_5', 'dayofweek_6', 'weeksin', 'weekcos', 'monthsin', 'monthcos', 'quartersin', 'quartercos', 't']
    Differenced=1
    TestLength=30
    ValidationLength=1
    ValidationMetric=rmse
    ForecastsEvaluated=['hwes', 'hwes_seasonal', 'arima_ma_terms']
    CILevel=0.95
    BootstrapSamples=100
)

When adding these and other kinds of regressors, it is possible to change the names of some of them, which can come in handy for referencing them later. If any regressor name begins with uppercase “AR”, the forecasting mechanisms in most of the models will assume the term is autoregressive in nature and handle it differently. So, be careful when naming variables.

Let’s move to modeling with a linear model. Any arguments that the linear model from the scikit-learn library accepts can also be accepted here. In addition, the following arguments are available for all sklearn models (a hedged usage sketch follows the list):

  • Xvars (accepts “all”, None, or a list-like of regressor names) – default is always None, but for models that require regressors, this is treated the same as “all”

  • normalizer (accepts None, “minmax”, “normalize”, “pt”, or “scale”) – default is always “minmax”

  • call_me – does not affect the model’s evaluation at all, just names the model for later reference
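A hedged usage sketch of those arguments (assuming the mlr estimator has been set, as it is in the next cell; the call_me label is hypothetical):

# hedged sketch -- these arguments work for any sklearn estimator
f.manual_forecast(
    Xvars='all',          # use every added regressor
    normalizer='scale',   # z-score scaling instead of the default 'minmax'
    call_me='mlr_scaled', # hypothetical label used only to reference the model later
)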

[46]:
f.set_estimator('mlr')
f.manual_forecast(normalizer=None,Xvars=None)

Like the ARIMA model, we can see its performance on the test set. Note that all forecasting in this module, even on the test set, is dynamic in nature so that no training-set information is leaked but autoregressive and seasonal patterns can still be predicted. We can trust that this is a true performance on 30 days of out-of-sample data.

The way the forecasting mechanism works (when AR terms are involved) is by making a prediction one step into the future, filling that prediction in as a new AR term, and repeating until the entire forecast interval has been predicted. This is true for testing and forecasting, but validation is non-dynamic by default. Both validation and testing can be either dynamic or non-dynamic; this will be explored later. With large test, validation, or forecast intervals, the forecasting may slow down considerably if everything is kept dynamic. However, if AR terms have not been added to the regressors, forecasting times are similar to any non-time-series prediction application.
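As a toy illustration of that dynamic mechanism (not scalecast internals), the sketch below fits a small autoregression with scikit-learn and forecasts recursively, feeding each prediction back in as the next lag value. Everything here is synthetic and for illustration only.

import numpy as np
from sklearn.linear_model import LinearRegression

# toy illustration of dynamic forecasting -- not scalecast internals
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=100))    # synthetic series
X = np.column_stack([y[1:-1], y[:-2]]) # AR1 and AR2 features
model = LinearRegression().fit(X, y[2:])

history = list(y)
predictions = []
for _ in range(5):                     # 5-step dynamic forecast
    x = [history[-1], history[-2]]     # lags built from history plus prior predictions
    yhat = model.predict([x])[0]
    predictions.append(yhat)
    history.append(yhat)               # feed the prediction back in as the next lag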

[47]:
f.plot_test_set(models='mlr',ci=True)
../../_images/Forecaster_examples_eCommerce_90_0.png

Since the other models were run on level data and the MLR was run on differenced data, the plots of the test set and forecasts will revert to level automatically so the models can be compared. If all models were run on differenced data but you want to see their performance on the level test set, level=True is available as an argument.

[48]:
f.plot_test_set(models=['mlr','hwes_seasonal'])
../../_images/Forecaster_examples_eCommerce_92_0.png

Confidence intervals are no longer available since models were run at different levels.

Let’s now plot both models’ forecasted values on level data. Let’s order the way these models are displayed based on their test-set performance as well.

[49]:
f.plot(print_attr=['LevelTestSetRMSE'],order_by='LevelTestSetRMSE')
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.00905488084
hwes LevelTestSetRMSE: 36031.56799119545
arima_ma_terms LevelTestSetRMSE: 46563.90435412486
../../_images/Forecaster_examples_eCommerce_95_1.png

We see that the MLR model performed better on the test set than the remaining ARIMA model, but still not as well as the best HWES model.

With scikit-learn models, such as MLR, permutation feature importance information from the eli5 package can be saved and exported to a dataframe, analogous to saving summary stats from an ARIMA model. Each weight is the average decrease in accuracy when that particular variable’s values are randomly shuffled over 10 iterations, so the variables that cause the largest decreases are considered the most important. The exported output is therefore ranked from most to least important variable according to this methodology.
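A simplified sketch of the permutation-importance idea (not eli5’s exact implementation; the score argument is a hypothetical callable such as an R-squared function):

import numpy as np

# simplified sketch of permutation importance -- not eli5's exact implementation
def permutation_importance(model, X, y, score, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    base = score(y, model.predict(X))     # accuracy with the real data
    importances = {}
    for col in X.columns:
        drops = []
        for _ in range(n_iter):
            X_shuffled = X.copy()
            X_shuffled[col] = rng.permutation(X_shuffled[col].values)
            drops.append(base - score(y, model.predict(X_shuffled)))
        importances[col] = np.mean(drops) # average accuracy lost when the feature is scrambled
    return importances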

[50]:
f.save_feature_importance()
[51]:
f.export_feature_importance(model='mlr')
[51]:
weight std
feature
AR1 1.361581 0.048865
AR2 1.192099 0.055277
AR3 0.857568 0.026342
AR12 0.583114 0.014812
AR4 0.478963 0.040052
AR6 0.393926 0.037958
AR13 0.366810 0.041059
AR5 0.365042 0.027757
AR23 0.330003 0.037069
AR15 0.316762 0.027188
dayofweek_5 0.288735 0.025208
AR11 0.273210 0.028163
AR14 0.272825 0.030271
AR22 0.237563 0.015261
AR24 0.206462 0.025954
AR9 0.193294 0.020129
AR8 0.190799 0.012876
AR20 0.177669 0.018373
AR17 0.132064 0.014788
AR10 0.126921 0.018454
AR21 0.114727 0.024589
AR19 0.104920 0.006715
monthcos 0.100597 0.022104
AR16 0.097380 0.004449
AR7 0.090623 0.012395
AR18 0.089148 0.012601
dayofweek_6 0.084458 0.017751
t 0.056415 0.004780
AR27 0.051311 0.007776
weeksin 0.046513 0.005039
weekcos 0.043880 0.010293
dayofweek_4 0.037765 0.010777
AR26 0.031685 0.005943
AR25 0.029214 0.005976
dayofweek_1 0.021407 0.009135
dayofweek_2 0.018690 0.006764
AR28 0.011374 0.005876
quartersin 0.010209 0.005488
dayofweek_3 0.005569 0.002313
monthsin 0.000719 0.003264
quartercos 0.000186 0.000293

At first glance, it looks as though all added regressors were useful to the MLR model, but we can explore regularization now to further refine our forecasting approach.

Elasticnet and Auto-Forecasting

To optimize the hyperparameters of models in scalecast and auto-forecast with them, we use a grid-search approach on a validation set of data: a period of time just before the test set. The grids are completely customizable, but standard template grids are available by calling the function below:
[52]:
GridGenerator.get_example_grids()

GridGenerator.get_empty_grids() is also available. We could call this, then open the created file (Grids.py) and fill in the empty dictionaries with hyperparameter values that we want to use. The example grids usually give adequate performance, but contributions to improving the default values are welcome. You are also more than welcome to open the Grids.py file locally and make adjustments as you see fit.

These grids are saved to your working directory as Grids.py. Each grid is a dictionary mapping hyperparameter names to list-likes of values, and scalecast automatically knows how to look for them. You can also pass a grid manually by using ingest_grid() with either a str value, which corresponds to the name of a grid in the Grids.py file, or a grid of dict type. By default, the code below will ingest the grid named elasticnet from the Grids.py file in this same directory.

You can also create very large grids and limit them randomly by calling limit_grid_size(n:int|float). If int, n must be greater than 0 and denotes the number of combinations in the grid to keep. If float, n must be between 0 and 1 and represents the portion of the grid’s original size to keep randomly. random_seed is available as an argument in this method and is None by default.
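A hedged sketch of both options, assuming the hyperparameter names below are valid keys for the elasticnet grid:

# hedged sketch -- pass a grid defined inline instead of reading it from Grids.py
custom_grid = {
    'alpha': [0.1, 0.5, 1.0, 2.0],
    'l1_ratio': [0, 0.25, 0.5, 1],
    'normalizer': ['minmax', 'scale'],
}
f.ingest_grid(custom_grid)
f.limit_grid_size(10, random_seed=20) # randomly keep 10 hyperparameter combinations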

[53]:
f.set_validation_length(15)
f.set_estimator('elasticnet')
# unlike testing, tuning is non-dynamic by default, but this can be changed by passing dynamic_tuning=True as an argument below
f.tune() # automatically imports the elasticnet grid from Grids.py
# unlike tuning, testing is dynamic by default, but this can be changed by passing dynamic_testing=False as an argument below
f.auto_forecast()
f.save_feature_importance()
[54]:
f.plot_test_set(models=['mlr','elasticnet'],ci=True)
../../_images/Forecaster_examples_eCommerce_105_0.png

Regularization really took the seasonal pattern out of this model and reverted the prediction to nearly a straight line. Let’s compare all model forecasts as we did before.

[55]:
f.plot(print_attr=['LevelTestSetRMSE'],order_by='LevelTestSetRMSE')
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.00905488084
hwes LevelTestSetRMSE: 36031.56799119545
elasticnet LevelTestSetRMSE: 39522.6929766469
arima_ma_terms LevelTestSetRMSE: 46563.90435412486
../../_images/Forecaster_examples_eCommerce_107_1.png

In this instance, the regularization performed by the Elasticnet model resulted in worse performance than the MLR. We can see what hyperparameter values were selected from the tuning process by exporting the evaluated validation grid to a dataframe.

[56]:
f.export_validation_grid(model='elasticnet').sort_values(['metric_value']).head(15)
[56]:
alpha l1_ratio normalizer validation_length validation_metric metric_value
286 2.0 0.00 minmax 15 rmse 28047.227257
271 1.9 0.00 minmax 15 rmse 28068.118873
256 1.8 0.00 minmax 15 rmse 28091.256243
241 1.7 0.00 minmax 15 rmse 28117.021134
226 1.6 0.00 minmax 15 rmse 28145.886826
289 2.0 0.25 minmax 15 rmse 28178.109481
211 1.5 0.00 minmax 15 rmse 28178.447218
274 1.9 0.25 minmax 15 rmse 28205.401145
196 1.4 0.00 minmax 15 rmse 28215.457771
259 1.8 0.25 minmax 15 rmse 28235.588063
181 1.3 0.00 minmax 15 rmse 28257.894291
244 1.7 0.25 minmax 15 rmse 28269.155026
229 1.6 0.25 minmax 15 rmse 28306.701080
166 1.2 0.00 minmax 15 rmse 28307.039331
214 1.5 0.25 minmax 15 rmse 28348.975179

Since this model has a 0 value for its l1_ratio, it is functionally identical to a ridge model. The selected alpha value was 2.0, meaning the coefficients were shrunk more aggressively than they would have been with the default penalty. Like the MLR, we can export the Elasticnet feature importance:

[57]:
f.export_feature_importance(model='elasticnet')
[57]:
weight std
feature
dayofweek_5 2.524764e-02 0.002553
dayofweek_6 6.376736e-03 0.000954
AR7 4.668536e-03 0.000362
AR14 2.663682e-03 0.000374
AR21 2.437721e-03 0.000222
AR28 1.911884e-03 0.000271
dayofweek_4 1.700549e-03 0.000770
dayofweek_2 9.471373e-04 0.000450
AR1 9.214553e-04 0.000158
AR12 8.393193e-04 0.000090
AR2 7.304451e-04 0.000192
AR19 7.197982e-04 0.000110
AR9 7.032065e-04 0.000132
AR23 6.781578e-04 0.000101
dayofweek_3 5.588932e-04 0.000502
AR16 4.336966e-04 0.000186
AR26 3.885394e-04 0.000187
t 2.524591e-04 0.000168
quartercos 2.454747e-04 0.000246
AR5 2.377288e-04 0.000154
AR6 1.819853e-04 0.000114
monthcos 1.219272e-04 0.000297
AR8 1.035100e-04 0.000062
AR18 5.723864e-05 0.000041
AR15 4.220264e-05 0.000026
AR22 4.169538e-05 0.000020
AR10 2.042367e-05 0.000008
AR25 1.674803e-05 0.000039
AR17 1.668208e-05 0.000027
AR4 9.392779e-06 0.000015
dayofweek_1 6.705639e-06 0.000624
AR11 4.239243e-06 0.000005
monthsin 3.421046e-06 0.000014
AR24 2.676908e-06 0.000005
weeksin 1.202793e-06 0.000002
AR13 4.208357e-07 0.000005
AR20 -2.269004e-07 0.000002
AR3 -1.461391e-06 0.000014
AR27 -8.850249e-06 0.000016
quartersin -9.815362e-06 0.000032
weekcos -2.958930e-05 0.000118

The elasticnet may have been over-parameterized, as it now finds some features to be slightly harmful to the model, although values this small could be due to chance.

Auto-Forecasting the Scikit-learn Models

In the same way we automatically tuned and forecasted with an elasticnet model, we can choose many models to tune and forecast with using notebook.tune_test_forecast(). We will be tuning and forecasting with all available scikit-learn models. Below is a list of all available models, whether they can be tuned, and whether they come from scikit-learn.
[58]:
from scalecast.Forecaster import _estimators_, _can_be_tuned_, _sklearn_estimators_

for m in _estimators_:
    print(f'{m}; can be tuned: {m in _can_be_tuned_}; from scikit-learn: {m in _sklearn_estimators_}')
arima; can be tuned: True; from scikit-learn: False
combo; can be tuned: False; from scikit-learn: False
elasticnet; can be tuned: True; from scikit-learn: True
gbt; can be tuned: True; from scikit-learn: True
hwes; can be tuned: True; from scikit-learn: False
knn; can be tuned: True; from scikit-learn: True
lightgbm; can be tuned: True; from scikit-learn: True
lstm; can be tuned: False; from scikit-learn: False
mlp; can be tuned: True; from scikit-learn: True
mlr; can be tuned: True; from scikit-learn: True
prophet; can be tuned: True; from scikit-learn: False
rf; can be tuned: True; from scikit-learn: True
rnn; can be tuned: False; from scikit-learn: False
silverkite; can be tuned: True; from scikit-learn: False
svr; can be tuned: True; from scikit-learn: True
xgboost; can be tuned: True; from scikit-learn: True

Let’s call the function with the scikit-learn models only and save all feature info so that we can export that information later. To run the next function, note the following dependencies:

  • pip install tqdm

  • pip install ipython

  • pip install ipywidgets

  • jupyter nbextension enable --py widgetsnbextension

  • if using Jupyter Lab: jupyter labextension install @jupyter-widgets/jupyterlab-manager

[59]:
tune_test_forecast(f,
                   _sklearn_estimators_,
                   dynamic_tuning=False,
                   dynamic_testing=True,
                   feature_importance=True) # dynamic_tuning = False and dynamic_testing = True are defaults in this function

See the results on the test set.

[60]:
f.plot_test_set(order_by='LevelTestSetRMSE',models=_sklearn_estimators_,ci=True)
../../_images/Forecaster_examples_eCommerce_117_0.png

All of these models appear to have performed well on test-set data. Using a variety of models and tuning them carefully, a couple (or more) usually emerge as appearing reasonable to implement. Let’s now see the forecasts and level test-set RMSE from all these models.

[61]:
f.plot(models=_sklearn_estimators_,order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],ci=True)
mlr LevelTestSetRMSE: 32487.009054880975
rf LevelTestSetRMSE: 35831.41728470146
lightgbm LevelTestSetRMSE: 36585.121542121844
mlp LevelTestSetRMSE: 39120.33092263862
elasticnet LevelTestSetRMSE: 39522.6929766469
svr LevelTestSetRMSE: 44003.8044654691
gbt LevelTestSetRMSE: 57599.607371822385
xgboost LevelTestSetRMSE: 62488.67436538539
knn LevelTestSetRMSE: 92000.73638283063
../../_images/Forecaster_examples_eCommerce_119_1.png

Perhaps surprisingly, nothing outperformed the MLR model. Sometimes the simplest model is the best model. Let’s see the forecast plots again, but to avoid clutter, focus only on the models that performed best by passing the models='top_5' argument to this function and rerunning. Note that because some of the displayed models were run on differenced data, we set level=True to make sure all forecasts are viewed at level.

[62]:
f.plot(models='top_5',order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
rf LevelTestSetRMSE: 35831.41728470146
hwes LevelTestSetRMSE: 36031.56799119545
lightgbm LevelTestSetRMSE: 36585.121542121844
../../_images/Forecaster_examples_eCommerce_121_1.png

These various patterns are interesting, and one has to believe the outlier in the last observation significantly influenced the resulting trends of the random forest and lightGBM models.

Prophet and Silverkite

In addition to the scikit-learn and statsmodels forecasting models we have already explored, two other forecasting models from popular libraries are available for use: Facebook Prophet and LinkedIn Silverkite (from the greykite package). Prophet uses a Bayesian regression approach with changepoints; Silverkite uses a linear model with regularization, also with changepoints. Calling and using them is as easy as any other forecasting model in this package. The Xvars argument is available in both models, and you can also tune them using the grid-search method. But, since they select a lot of their own regressors and optimize themselves by default, it is sometimes time-saving to manually forecast them with default parameters. We can also run them on un-differenced data and let them create their own regressors. Note that undifferencing the series deletes all added regressors.
[63]:
f.undiff()
f
[63]:
Forecaster(
    DateStartActuals=2011-01-04T00:00:00.000000000
    DateEndActuals=2011-12-09T00:00:00.000000000
    Freq=D
    ForecastLength=30
    Xvars=[]
    Differenced=0
    TestLength=30
    ValidationLength=15
    ValidationMetric=rmse
    ForecastsEvaluated=['hwes', 'hwes_seasonal', 'arima_ma_terms', 'mlr', 'elasticnet', 'gbt', 'knn', 'lightgbm', 'mlp', 'rf', 'svr', 'xgboost']
    CILevel=0.95
    BootstrapSamples=100
)
[64]:
f.set_estimator('prophet')
f.manual_forecast()
[65]:
f.set_estimator('silverkite')
f.manual_forecast()
f.save_summary_stats()

Note that summary stats are available for Silverkite but not Prophet; contributions are welcome to get summary stats for Prophet.

After forecasting with Silverkite, we need to reset some matplotlib parameters for plotting to work.

[66]:
matplotlib.use('nbAgg')
%matplotlib inline
sns.set(rc={'figure.figsize':plot_dim})

Let’s see how these 2 new models performed compared with the top-2 models identified before.

[67]:
f.plot_test_set(models=['hwes_seasonal','mlr','prophet','silverkite'],order_by='LevelTestSetRMSE',level=True)
../../_images/Forecaster_examples_eCommerce_130_0.png

And their trends into the future:

[68]:
f.plot(models=['hwes_seasonal','mlr','prophet','silverkite'],order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
prophet LevelTestSetRMSE: 34148.79734132142
silverkite LevelTestSetRMSE: 44999.94290460528
../../_images/Forecaster_examples_eCommerce_132_1.png

Neither the Prophet nor the Silverkite model proved to be as accurate as the MLR or HWES models.

TensorFlow Recurrent Neural Nets

The most advanced models available in scalecast are recurrent neural networks. There are two kinds: the SimpleRNN and the Long Short-Term Memory (LSTM). The SimpleRNN, perhaps ironically, has the more complicated implementation in this framework. It allows for nearly all of the customization of the model that TensorFlow itself would offer. What scalecast does for you is preprocess the data correctly, scale all of the data when training the model, and unscale it appropriately to compare to any other models you run. It also tests the model on the same test set and bootstraps confidence intervals. The fact that it tests all the models also means it fits all the models twice, which can slow you down sometimes, but having two fits is also a blessing: it really lets you know how generalizable your model is when new data is added. Let’s explore the SimpleRNN more carefully.

SimpleRNN

See the documentation for the ‘rnn’ estimator. In practice, calling this model looks like:

[70]:
f.set_estimator('rnn')
f.manual_forecast(lags=30,
                  hidden_layers_struct={'simple':{'units':64},'simple':{'units':64},'simple':{'units':64}},
                  validation_split=0.2,
                  epochs=15,
                  plot_loss=True,
                  call_me='simple_rnn')

f.manual_forecast(lags=30,
                  hidden_layers_struct={'lstm':{'units':64},'lstm':{'units':64},'lstm':{'units':64}},
                  validation_split=0.2,
                  epochs=15,
                  plot_loss=True,
                  call_me='lstm')
Epoch 1/15
7/7 [==============================] - 1s 48ms/step - loss: 0.2074 - val_loss: 0.1279
Epoch 2/15
7/7 [==============================] - 0s 9ms/step - loss: 0.1611 - val_loss: 0.1003
Epoch 3/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1394 - val_loss: 0.0871
Epoch 4/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1268 - val_loss: 0.0948
Epoch 5/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1233 - val_loss: 0.0767
Epoch 6/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1159 - val_loss: 0.0758
Epoch 7/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1113 - val_loss: 0.0714
Epoch 8/15
7/7 [==============================] - 0s 11ms/step - loss: 0.1069 - val_loss: 0.0706
Epoch 9/15
7/7 [==============================] - 0s 9ms/step - loss: 0.1025 - val_loss: 0.0682
Epoch 10/15
7/7 [==============================] - 0s 10ms/step - loss: 0.1006 - val_loss: 0.0678
Epoch 11/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0969 - val_loss: 0.0675
Epoch 12/15
7/7 [==============================] - 0s 11ms/step - loss: 0.0941 - val_loss: 0.0625
Epoch 13/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0920 - val_loss: 0.0626
Epoch 14/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0913 - val_loss: 0.0610
Epoch 15/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0899 - val_loss: 0.0623
Epoch 1/15
7/7 [==============================] - 1s 47ms/step - loss: 0.1185 - val_loss: 0.0669
Epoch 2/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0907 - val_loss: 0.0555
Epoch 3/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0793 - val_loss: 0.0550
Epoch 4/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0729 - val_loss: 0.0478
Epoch 5/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0691 - val_loss: 0.0444
Epoch 6/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0665 - val_loss: 0.0432
Epoch 7/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0646 - val_loss: 0.0423
Epoch 8/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0627 - val_loss: 0.0460
Epoch 9/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0616 - val_loss: 0.0410
Epoch 10/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0589 - val_loss: 0.0408
Epoch 11/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0584 - val_loss: 0.0393
Epoch 12/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0569 - val_loss: 0.0394
Epoch 13/15
7/7 [==============================] - 0s 9ms/step - loss: 0.0566 - val_loss: 0.0397
Epoch 14/15
7/7 [==============================] - 0s 8ms/step - loss: 0.0569 - val_loss: 0.0388
Epoch 15/15
7/7 [==============================] - 0s 10ms/step - loss: 0.0555 - val_loss: 0.0360
../../_images/Forecaster_examples_eCommerce_136_1.png
Epoch 1/15
7/7 [==============================] - 2s 93ms/step - loss: 0.1984 - val_loss: 0.1344
Epoch 2/15
7/7 [==============================] - 0s 15ms/step - loss: 0.1579 - val_loss: 0.0984
Epoch 3/15
7/7 [==============================] - 0s 16ms/step - loss: 0.1302 - val_loss: 0.0804
Epoch 4/15
7/7 [==============================] - 0s 17ms/step - loss: 0.1202 - val_loss: 0.0798
Epoch 5/15
7/7 [==============================] - 0s 17ms/step - loss: 0.1181 - val_loss: 0.0754
Epoch 6/15
7/7 [==============================] - 0s 17ms/step - loss: 0.1172 - val_loss: 0.0765
Epoch 7/15
7/7 [==============================] - 0s 17ms/step - loss: 0.1169 - val_loss: 0.0744
Epoch 8/15
7/7 [==============================] - 0s 18ms/step - loss: 0.1170 - val_loss: 0.0748
Epoch 9/15
7/7 [==============================] - 0s 18ms/step - loss: 0.1167 - val_loss: 0.0743
Epoch 10/15
7/7 [==============================] - 0s 17ms/step - loss: 0.1168 - val_loss: 0.0742
Epoch 11/15
7/7 [==============================] - 0s 18ms/step - loss: 0.1162 - val_loss: 0.0737
Epoch 12/15
7/7 [==============================] - 0s 18ms/step - loss: 0.1161 - val_loss: 0.0737
Epoch 13/15
7/7 [==============================] - 0s 21ms/step - loss: 0.1158 - val_loss: 0.0734
Epoch 14/15
7/7 [==============================] - 0s 21ms/step - loss: 0.1158 - val_loss: 0.0736
Epoch 15/15
7/7 [==============================] - 0s 21ms/step - loss: 0.1158 - val_loss: 0.0739
Epoch 1/15
7/7 [==============================] - 2s 93ms/step - loss: 0.1203 - val_loss: 0.0723
Epoch 2/15
7/7 [==============================] - 0s 14ms/step - loss: 0.0969 - val_loss: 0.0553
Epoch 3/15
7/7 [==============================] - 0s 15ms/step - loss: 0.0807 - val_loss: 0.0464
Epoch 4/15
7/7 [==============================] - 0s 16ms/step - loss: 0.0737 - val_loss: 0.0456
Epoch 5/15
7/7 [==============================] - 0s 16ms/step - loss: 0.0719 - val_loss: 0.0453
Epoch 6/15
7/7 [==============================] - 0s 15ms/step - loss: 0.0713 - val_loss: 0.0437
Epoch 7/15
7/7 [==============================] - 0s 17ms/step - loss: 0.0711 - val_loss: 0.0440
Epoch 8/15
7/7 [==============================] - 0s 16ms/step - loss: 0.0706 - val_loss: 0.0435
Epoch 9/15
7/7 [==============================] - 0s 16ms/step - loss: 0.0704 - val_loss: 0.0432
Epoch 10/15
7/7 [==============================] - 0s 17ms/step - loss: 0.0702 - val_loss: 0.0429
Epoch 11/15
7/7 [==============================] - 0s 18ms/step - loss: 0.0699 - val_loss: 0.0422
Epoch 12/15
7/7 [==============================] - 0s 19ms/step - loss: 0.0697 - val_loss: 0.0431
Epoch 13/15
7/7 [==============================] - 0s 17ms/step - loss: 0.0695 - val_loss: 0.0423
Epoch 14/15
7/7 [==============================] - 0s 17ms/step - loss: 0.0694 - val_loss: 0.0429
Epoch 15/15
7/7 [==============================] - 0s 18ms/step - loss: 0.0692 - val_loss: 0.0425
../../_images/Forecaster_examples_eCommerce_136_3.png

Anything the lstm can do in scalecast, the rnn can also do. Using the lstm estimator instead of rnn is simpler, and that is explored below.

LSTM

See the documentation for the ‘lstm’ estimator. There are more parameters to explicitly specify, but that makes it a lot easier to navigate as well. In practice, calling this model looks like:

[72]:
f.set_estimator('lstm')
f.manual_forecast(lags=30,
                  lstm_layer_sizes=(64,64,64),
                  dropout=(0.2,0,0),
                  validation_split=0.2,
                  epochs=15,
                  plot_loss=True,
                  call_me='lstm_regularized')
Epoch 1/15
7/7 [==============================] - 7s 294ms/step - loss: 0.1886 - val_loss: 0.1009
Epoch 2/15
7/7 [==============================] - 0s 43ms/step - loss: 0.1343 - val_loss: 0.0823
Epoch 3/15
7/7 [==============================] - 0s 46ms/step - loss: 0.1209 - val_loss: 0.0857
Epoch 4/15
7/7 [==============================] - 0s 48ms/step - loss: 0.1197 - val_loss: 0.0776
Epoch 5/15
7/7 [==============================] - 0s 49ms/step - loss: 0.1196 - val_loss: 0.0800
Epoch 6/15
7/7 [==============================] - 0s 49ms/step - loss: 0.1174 - val_loss: 0.0752
Epoch 7/15
7/7 [==============================] - 0s 49ms/step - loss: 0.1179 - val_loss: 0.0785
Epoch 8/15
7/7 [==============================] - 0s 49ms/step - loss: 0.1174 - val_loss: 0.0745
Epoch 9/15
7/7 [==============================] - 0s 48ms/step - loss: 0.1177 - val_loss: 0.0763
Epoch 10/15
7/7 [==============================] - 0s 49ms/step - loss: 0.1168 - val_loss: 0.0767
Epoch 11/15
7/7 [==============================] - 0s 48ms/step - loss: 0.1176 - val_loss: 0.0743
Epoch 12/15
7/7 [==============================] - 0s 49ms/step - loss: 0.1177 - val_loss: 0.0754
Epoch 13/15
7/7 [==============================] - 0s 49ms/step - loss: 0.1173 - val_loss: 0.0744
Epoch 14/15
7/7 [==============================] - 0s 48ms/step - loss: 0.1177 - val_loss: 0.0757
Epoch 15/15
7/7 [==============================] - 0s 47ms/step - loss: 0.1166 - val_loss: 0.0745
Epoch 1/15
7/7 [==============================] - 5s 238ms/step - loss: 0.1132 - val_loss: 0.0617
Epoch 2/15
7/7 [==============================] - 0s 37ms/step - loss: 0.0845 - val_loss: 0.0461
Epoch 3/15
7/7 [==============================] - 0s 39ms/step - loss: 0.0749 - val_loss: 0.0514
Epoch 4/15
7/7 [==============================] - 0s 41ms/step - loss: 0.0722 - val_loss: 0.0436
Epoch 5/15
7/7 [==============================] - 0s 43ms/step - loss: 0.0725 - val_loss: 0.0431
Epoch 6/15
7/7 [==============================] - 0s 42ms/step - loss: 0.0711 - val_loss: 0.0447
Epoch 7/15
7/7 [==============================] - 0s 42ms/step - loss: 0.0708 - val_loss: 0.0429
Epoch 8/15
7/7 [==============================] - 0s 44ms/step - loss: 0.0706 - val_loss: 0.0431
Epoch 9/15
7/7 [==============================] - 0s 42ms/step - loss: 0.0708 - val_loss: 0.0428
Epoch 10/15
7/7 [==============================] - 0s 43ms/step - loss: 0.0708 - val_loss: 0.0453
Epoch 11/15
7/7 [==============================] - 0s 42ms/step - loss: 0.0707 - val_loss: 0.0440
Epoch 12/15
7/7 [==============================] - 0s 43ms/step - loss: 0.0702 - val_loss: 0.0427
Epoch 13/15
7/7 [==============================] - 0s 43ms/step - loss: 0.0699 - val_loss: 0.0438
Epoch 14/15
7/7 [==============================] - 0s 43ms/step - loss: 0.0706 - val_loss: 0.0434
Epoch 15/15
7/7 [==============================] - 0s 43ms/step - loss: 0.0708 - val_loss: 0.0427
../../_images/Forecaster_examples_eCommerce_139_1.png

Finally, we can see how all these models performed on the test set:

[73]:
f.plot_test_set(models=['simple_rnn','lstm','lstm_regularized'],ci=True,order_by='LevelTestSetRMSE')
../../_images/Forecaster_examples_eCommerce_141_0.png

And into the future:

[74]:
f.plot(models=['simple_rnn','lstm','lstm_regularized'],ci=True,order_by='LevelTestSetRMSE',level=True,print_attr=['LevelTestSetRMSE'])
simple_rnn LevelTestSetRMSE: 40249.15563421383
lstm LevelTestSetRMSE: 40332.36818693822
lstm_regularized LevelTestSetRMSE: 43743.325532266164
../../_images/Forecaster_examples_eCommerce_143_1.png

None of these models reached the accuracy of the MLR or HWES models. That’s not to say they couldn’t with more time spent on them. Theoretically, they can fit thousands of parameters and find long-term trends in the data that none of the other models would be capable of finding. For some datasets, this makes a big difference. But some datasets are too random to be fit effectively by one of these more advanced models. Again, sometimes the simplest model is the most accurate, fastest, and most interpretable.

Combination Modeling

The last model concept we will explore is combination modeling. There are three types of combinations:

  • simple: a simple average of a group of models.

  • weighted: a weighted average of a group of models. Weights can be passed manually or set automatically based on a metric passed to the determine_best_by parameter (the latter is what we do below).

  • splice: a splice of two or more models at one or more future splice points. All metrics, fitted values, and test-set metrics from this model will be identical to the simple average.

We will be using simple and weighted average combination modeling only.
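Although we do not evaluate one here, a hedged sketch of what a splice combination might look like follows. The splice_points argument name and the date value are assumptions for illustration only.

# hedged sketch only -- not evaluated in this example
f.set_estimator('combo')
f.manual_forecast(
    how='splice',
    models=['mlr','hwes_seasonal'],
    splice_points=['2011-12-25'], # hypothetical future date where the second model takes over
    call_me='spliced_models',
)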

[75]:
f.set_estimator('combo')
f.manual_forecast(how='simple',models=['hwes','hwes_seasonal','arima_ma_terms','prophet','silverkite','simple_rnn','lstm','lstm_regularized'],call_me='avg_lvl_models')
f.manual_forecast(how='weighted',models=_sklearn_estimators_,determine_best_by='ValidationMetricValue',call_me='weighted_differenced_models')

It could be argued that combination modeling is prone to problems related to data leakage: we shouldn’t select model combinations based on their performance on the test set and then re-compare them to the test set. By default, determine_best_by in both of these combination types is 'ValidationMetricValue', which is a way to ensure data leakage does not occur. Since we didn’t tune all the models we want to combine in the first function, we simply write out the models we want to combine in a list.

The weighted average model can accept weights that sum to 1 or not; if they do not, they will be rebalanced to do so. If determine_best_by is specified, weights can be left as None and will be chosen automatically. Again, be careful to not overfit models this way.
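A hedged sketch of passing manual weights (the weights argument name is assumed from the description above; the label is hypothetical):

# hedged sketch -- manual weights, rebalanced automatically if they do not sum to 1
f.manual_forecast(
    how='weighted',
    models=['mlr','hwes_seasonal'],
    weights=(0.7,0.3),
    call_me='manual_weighted_avg',
)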

All information available for other models, including fitted values and test-set metrics, is also available for these combo models.

[76]:
f.plot_test_set(models='top_5',order_by='LevelTestSetRMSE',level=True)
../../_images/Forecaster_examples_eCommerce_148_0.png

Again, we see the usual suspects. One more time on future data:

[77]:
f.plot(models='top_5',order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
prophet LevelTestSetRMSE: 34148.79734132142
rf LevelTestSetRMSE: 35831.41728470146
hwes LevelTestSetRMSE: 36031.56799119545
../../_images/Forecaster_examples_eCommerce_150_1.png

And just as a grand finale, let’s see all evaluated models plotted together:

[78]:
f.plot(order_by='LevelTestSetRMSE',print_attr=['LevelTestSetRMSE'],level=True)
hwes_seasonal LevelTestSetRMSE: 29432.19251120192
mlr LevelTestSetRMSE: 32487.009054880975
prophet LevelTestSetRMSE: 34148.79734132142
rf LevelTestSetRMSE: 35831.41728470146
hwes LevelTestSetRMSE: 36031.56799119545
lightgbm LevelTestSetRMSE: 36585.121542121844
avg_lvl_models LevelTestSetRMSE: 37706.15468474031
mlp LevelTestSetRMSE: 39120.33092263862
elasticnet LevelTestSetRMSE: 39522.6929766469
simple_rnn LevelTestSetRMSE: 40249.15563421383
lstm LevelTestSetRMSE: 40332.36818693822
lstm_regularized LevelTestSetRMSE: 43743.325532266164
svr LevelTestSetRMSE: 44003.8044654691
silverkite LevelTestSetRMSE: 44999.94290460528
arima_ma_terms LevelTestSetRMSE: 46563.90435412486
gbt LevelTestSetRMSE: 57599.607371822385
weighted_differenced_models LevelTestSetRMSE: 60439.097276978595
xgboost LevelTestSetRMSE: 62488.67436538539
knn LevelTestSetRMSE: 92000.73638283063
../../_images/Forecaster_examples_eCommerce_152_1.png

Export Results

Now that we have models that offer interesting ranges of predictions (and a few that look pretty bad), let’s export all available results to Excel to view later. The first function below exports a workbook with five tabs that include forecasted values, level forecasted values, descriptive information about each model, and test-set predictions. The other two functions export feature info and validation grids; they have to_excel in the method name because they automatically put each model’s info on a separate tab.
[79]:
f.export(to_excel=True,determine_best_by='LevelTestSetRMSE',excel_name='eCommerce_results.xlsx')
f.all_feature_info_to_excel(excel_name='eCommerce_feature_info.xlsx')
f.all_validation_grids_to_excel(sort_by_metric_value=True,excel_name='eCommerce_validation_grids.xlsx')

Here is a list of all export functions. Most of these output results to a pandas dataframe.

[80]:
print(*f.get_funcs('exporter'),sep='\n')
export
export_summary_stats
export_feature_importance
export_validation_grid
all_feature_info_to_excel
all_validation_grids_to_excel
export_Xvars_df
export_forecasts_with_cis
export_test_set_preds_with_cis
export_fitted_vals

Thank you for following along!
