Forecaster Class

This is the main object used for making predictions on the test set, making forecasts, evaluating models, differencing data, adding regressors, and saving, visualizing, and exporting results.

from scalecast.Forecaster import Forecaster
array_of_dates = ['2021-01-01','2021-01-02','2021-01-03']
array_of_values = [1,2,3]
f = Forecaster(y=array_of_values, current_dates=array_of_dates)
class src.scalecast.Forecaster.Forecaster(y, current_dates, **kwargs)

Bases: object

Methods:

add_AR_terms(N)

adds seasonal auto-regressive terms.

add_ar_terms(n)

adds auto-regressive terms.

add_combo_regressors(*args[, sep])

combines all passed variables by multiplying their values together.

add_covid19_regressor([called, start, end])

adds a dummy variable that is 1 during the time period when covid19 effects are present for the series, 0 otherwise.

add_diffed_terms(*args[, diff, sep, drop])

differences all passed variables (no AR terms) up to 2 times.

add_exp_terms(*args, pwr[, sep, cutoff, drop])

raises all passed variables (no AR terms) to exponential powers (ints or floats).

add_lagged_terms(*args[, lags, upto, sep])

lags all passed variables (no AR terms) 1 or more times.

add_logged_terms(*args[, base, sep, drop])

logs all passed variables (no AR terms).

add_other_regressor(called, start, end)

adds a dummy variable that is 1 during the specified time period, 0 otherwise.

add_poly_terms(*args[, pwr, sep])

raises all passed variables (no AR terms) to exponential powers (ints only).

add_pt_terms(*args[, method, sep, drop])

applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).

add_seasonal_regressors(*args[, raw, …])

adds seasonal regressors.

add_time_trend([called])

adds a time trend from 1 to len(current_dates) + len(future_dates) in current_xreg and future_xreg.

adf_test([critical_pval, quiet, full_res, …])

tests the stationarity of the y series using augmented dickey fuller.

all_feature_info_to_excel([out_path, excel_name])

saves all feature importance and summary stats to excel.

all_validation_grids_to_excel([out_path, …])

saves all validation grids to excel.

auto_forecast([call_me, dynamic_testing])

auto forecasts with the best parameters indicated from the tuning process.

diff([i])

differences the y attribute, as well as all AR values stored in current_xreg and future_xreg.

drop_Xvars(*args)

drops regressors.

drop_regressors(*args)

drops regressors.

export([dfs, models, best_model, …])

exports one or more of six pandas dataframes and can write them to excel with each dataframe on a separate sheet.

export_Xvars_df([dropna])

gets all utilized regressors and values.

export_feature_importance(model)

exports the feature importance from a model.

export_fitted_vals(model)

exports a single dataframe with dates, fitted values, actuals, and residuals.

export_forecasts_with_cis(model)

exports a single dataframe with forecasts and upper and lower forecast bounds.

export_summary_stats(model)

exports the summary stats from a model.

export_test_set_preds_with_cis(model)

exports a single dataframe with test-set predictions, actuals, and upper and lower prediction bounds.

export_validation_grid(model)

exports the validation grid from a model.

fillna_y([how])

fills null values in the y attribute.

generate_future_dates(n)

generates a certain number of future dates in the same frequency as current_dates.

get_freq()

gets the pandas inferred date frequency.

get_funcs(which)

returns a group of functions based on what’s passed to which.

get_regressor_names()

gets the regressor names stored in the object.

infer_freq()

uses pandas library to infer frequency of loaded dates.

ingest_Xvars_df(df[, date_col, drop_first, …])

ingests a dataframe of regressors and saves its contents to the Forecaster object.

ingest_grid(grid)

ingests a grid to tune the estimator.

integrate([critical_pval, train_only, …])

differences the series 0, 1, or 2 times based on ADF test results.

keep_smaller_history(n)

reduces the number of y observations in the object.

limit_grid_size(n[, random_seed])

makes a grid smaller randomly.

manual_forecast([call_me, dynamic_testing])

manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords.

order_fcsts(models[, determine_best_by])

gets estimated forecasts ordered from best-to-worst.

plot([models, order_by, level, print_attr, ci])

plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.

plot_acf([diffy, train_only])

plots an autocorrelation function of the y values.

plot_fitted([models, order_by])

plots all fitted values with the actuals.

plot_pacf([diffy, train_only])

plots a partial autocorrelation function of the y values.

plot_periodogram([diffy, train_only])

plots a periodogram of the y values (comes from scipy.signal).

plot_test_set([models, order_by, …])

plots all test-set predictions with the actuals.

pop(*args)

deletes evaluated forecasts from the object’s memory.

pop_using_criterion(metric, evaluated_as, …)

deletes all forecasts from history that meet a given criterion.

reset()

drops all regressors and reverts object to original (level) state when initiated.

save_feature_importance()

saves feature info for models that offer it and will not raise errors if not available.

save_summary_stats()

saves summary stats for models that offer it and will not raise errors if not available.

seasonal_decompose([diffy, train_only])

plots a signal/seasonal decomposition of the y values.

set_bootstrap_samples(n)

sets the number of bootstrap samples used to set confidence intervals for each model (100 default).

set_cilevel(n)

sets the level for the resulting confidence intervals (95% default).

set_estimator(estimator)

sets the estimator to forecast with.

set_last_future_date(date)

generates future dates in the same frequency as current_dates that ends on a specified date.

set_test_length([n])

sets the length of the test set.

set_validation_length([n])

sets the length of the validation set.

set_validation_metric([metric])

sets the metric that will be used to tune all subsequent models.

tune([dynamic_tuning])

tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with the same name as the estimator by default).

tune_test_forecast(models[, dynamic_tuning, …])

iterates through a list of models, tunes them using grids in Grids.py, forecasts them, and can save feature information.

typ_set()

converts all objects in y, current_dates, future_dates, current_xreg, and future_xreg to appropriate types if possible.

undiff([suppress_error])

undifferences y to original level and drops all regressors (such as AR terms).

validate_regressor_names()

validates that all regressor names exist in both current_xreg and future_xreg.

add_AR_terms(N)

adds seasonal auto-regressive terms.

Parameters

N (tuple) – first element is the number of terms to add and the second element is the space between terms.

Returns

None

>>> f.add_AR_terms((2,12)) # adds 12th and 24th lags
add_ar_terms(n)

adds auto-regressive terms.

Parameters

n (int) – the number of terms to add (1 to this number will be added).

Returns

None

>>> f.add_ar_terms(4) # adds four lags of y to predict with
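To make the two AR adders concrete, the following is a minimal pure-Python sketch of the lag columns they are described as producing. It is not scalecast's implementation: the helper names lag_series, ar_terms, and seasonal_ar_terms are hypothetical, leading values are padded with None, and the AR{lag} naming is an assumption based on the note under ingest_Xvars_df that columns beginning with “AR” are treated as autoregressive terms.

```python
from typing import Dict, List, Optional, Tuple

Series = List[Optional[float]]


def lag_series(y: List[float], lag: int) -> Series:
    """Shift y forward by `lag` steps (lag >= 1), padding the start with None."""
    return [None] * lag + y[:-lag]


def ar_terms(y: List[float], n: int) -> Dict[str, Series]:
    """Roughly what add_ar_terms(n) describes: lags 1 through n."""
    return {f"AR{k}": lag_series(y, k) for k in range(1, n + 1)}


def seasonal_ar_terms(y: List[float], N: Tuple[int, int]) -> Dict[str, Series]:
    """Roughly what add_AR_terms((count, spacing)) describes: lags spacing, 2*spacing, ..."""
    count, spacing = N
    return {f"AR{k * spacing}": lag_series(y, k * spacing) for k in range(1, count + 1)}


y = [float(v) for v in range(1, 31)]  # 30 toy observations
regular = ar_terms(y, 4)
seasonal = seasonal_ar_terms(y, (2, 12))
print(sorted(regular))   # ['AR1', 'AR2', 'AR3', 'AR4']
print(sorted(seasonal))  # ['AR12', 'AR24']
```

In this sketch the first rows of every lag column are None, so longer lags leave fewer complete rows to fit on.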
add_combo_regressors(*args, sep='_')

combines all passed variables by multiplying their values together.

Parameters
  • *args (str) – names of Xvars that already exist in the object.

  • sep (str) – default ‘_’. the separator between each term in arg to create the final variable name.

Returns

None

>>> f.add_combo_regressors('t','monthsin') # multiplies these two together
>>> f.add_combo_regressors('t','monthcos') # multiplies these two together
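A minimal sketch of the multiplication described for add_combo_regressors, assuming Xvars are held in a plain dict of equal-length lists and the new name is the passed names joined with sep. The function name combo_regressor and the toy values are hypothetical.

```python
from math import prod
from typing import Dict, List


def combo_regressor(xvars: Dict[str, List[float]], *names: str, sep: str = "_") -> Dict[str, List[float]]:
    """Return {joined_name: element-wise product of the named Xvars}."""
    new_name = sep.join(names)
    values = [prod(row) for row in zip(*(xvars[name] for name in names))]
    return {new_name: values}


xvars = {"t": [1.0, 2.0, 3.0], "monthsin": [0.5, 0.0, -0.5]}
print(combo_regressor(xvars, "t", "monthsin"))  # {'t_monthsin': [0.5, 0.0, -1.5]}
```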
add_covid19_regressor(called='COVID19', start=datetime.datetime(2020, 3, 15, 0, 0), end=datetime.datetime(2021, 5, 13, 0, 0))

adds a dummy variable that is 1 during the time period when covid19 effects are present for the series, 0 otherwise. this function may be out of date because the pandemic has lasted longer than most expected, but we are keeping it for now.

Parameters
  • called (str) – default ‘COVID19’. what to call the resulting variable.

  • start (str, datetime.datetime, or pd.Timestamp) – default datetime.datetime(2020,3,15). the start date (default is the day Walt Disney World closed in the U.S.). use format ‘%Y-%m-%d’ when passing strings.

  • end (str, datetime.datetime, or pd.Timestamp) – default datetime.datetime(2021,5,13). the end date (default is the day the U.S. CDC dropped its mask mandate/recommendation for vaccinated people). use format ‘%Y-%m-%d’ when passing strings.

Returns

None

add_diffed_terms(*args, diff=1, sep='_', drop=False)

differences all passed variables (no AR terms) up to 2 times.

Parameters
  • *args (str) – names of Xvars that already exist in the object.

  • diff (int) – one of {1,2}, default 1. the number of times to difference each variable passed to args.

  • sep (str) – default ‘_’. the separator between each term in arg to create the final variable name. resulting variable names will be like “tdiff_1” or “tdiff_2” by default.

  • drop (bool) – default False. whether to drop the regressors passed to *args.

Returns

None

>>> f.add_diffed_terms('t') # adds first difference of t as regressor
add_exp_terms(*args, pwr, sep='^', cutoff=2, drop=False)

raises all passed variables (no AR terms) to exponential powers (ints or floats).

Parameters
  • *args (str) – names of Xvars that already exist in the object.

  • pwr (float) – the power to raise each term to in args. can use values like 0.5 to perform square roots, etc.

  • sep (str) – default ‘^’. the separator between each term in arg to create the final variable name.

  • cutoff (int) – default 2. the number of decimal places to which the passed pwr is rounded in the resulting variable name. for instance, if pwr = 0.33333333333 and ‘t’ is passed as an arg to *args, the resulting name will be t^0.33 by default.

  • drop (bool) – default False. whether to drop the regressors passed to *args.

Returns

None

>>> f.add_exp_terms('t',pwr=.5) # adds square root t
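The naming rule described for cutoff can be checked with a short sketch. It assumes the name is built as variable + sep + the exponent rounded with Python's round(); exp_term is a hypothetical helper, not part of scalecast.

```python
from typing import Dict, List


def exp_term(name: str, values: List[float], pwr: float, sep: str = "^", cutoff: int = 2) -> Dict[str, List[float]]:
    """Raise each value to pwr; the exponent in the new name is rounded to `cutoff` decimals."""
    new_name = f"{name}{sep}{round(pwr, cutoff)}"
    return {new_name: [v ** pwr for v in values]}


print(exp_term("t", [1.0, 4.0, 9.0], pwr=0.5))  # {'t^0.5': [1.0, 2.0, 3.0]}
print(list(exp_term("t", [8.0], pwr=1 / 3)))    # ['t^0.33']
```

The 't^0.5' name produced here matches the name dropped in the drop_Xvars example further down.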
add_lagged_terms(*args, lags=1, upto=True, sep='_')

lags all passed variables (no AR terms) 1 or more times.

Parameters
  • *args (str) – names of Xvars that already exist in the object.

  • lags (int) – greater than 0, default 1. the number of times to lag each passed variable.

  • upto (bool) – default True. whether to add all lags up to the number passed to lags. if you pass 6 to lags and upto is True, lags 1, 2, 3, 4, 5, 6 will all be added. if you pass 6 to lags and upto is False, lag 6 only will be added.

  • sep (str) – default ‘_’. the separator between each term in arg to create the final variable name. resulting variable names will be like “tlag_1” or “tlag_2” by default.

Returns

None

>>> f.add_lagged_terms('t',lags=3) # adds first, second, and third lag of t
>>> f.add_lagged_terms('t',lags=6,upto=False) # adds 6th lag of t only
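A rough sketch of how lags and upto interact, following the “tlag_1” naming pattern given above. The helper lagged_terms and the None padding are assumptions for illustration.

```python
from typing import Dict, List, Optional


def lagged_terms(
    name: str, values: List[float], lags: int = 1, upto: bool = True, sep: str = "_"
) -> Dict[str, List[Optional[float]]]:
    """Lag `values` by 1..lags (upto=True) or by exactly `lags` (upto=False)."""
    if lags < 1:
        raise ValueError("lags must be greater than 0")
    which = range(1, lags + 1) if upto else [lags]
    return {f"{name}lag{sep}{k}": [None] * k + values[:-k] for k in which}


t = [float(v) for v in range(1, 9)]
print(list(lagged_terms("t", t, lags=3)))              # ['tlag_1', 'tlag_2', 'tlag_3']
print(list(lagged_terms("t", t, lags=6, upto=False)))  # ['tlag_6']
```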
add_logged_terms(*args, base=2.718281828459045, sep='', drop=False)

logs all passed variables (no AR terms).

Parameters
  • *args (str) – names of Xvars that already exist in the object.

  • base (float) – default math.e. the log base. must be math.e or int greater than 1.

  • sep (str) – default ‘’. the separator between each term in arg to create the final variable name. resulting variable names will be like “log2t” or “lnt” by default.

  • drop (bool) – default False. whether to drop the regressors passed to *args.

Returns

None

>>> f.add_logged_terms('t') # adds natural log t
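As an illustration of the base and naming rules above, here is a sketch that produces “lnt” for the natural log and “log2t” for base 2. The logged_term helper is hypothetical, and the exact name formatting scalecast uses for other bases is an assumption.

```python
import math
from typing import Dict, List


def logged_term(name: str, values: List[float], base: float = math.e, sep: str = "") -> Dict[str, List[float]]:
    """Log each (positive) value; prefix 'ln' for base e, otherwise 'log{base}'."""
    if base == math.e:
        prefix, logged = "ln", [math.log(v) for v in values]
    else:
        prefix, logged = f"log{base}", [math.log(v, base) for v in values]
    return {f"{prefix}{sep}{name}": logged}


print(list(logged_term("t", [1.0, 8.0])))          # ['lnt']
print(list(logged_term("t", [1.0, 8.0], base=2)))  # ['log2t']
```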
add_other_regressor(called, start, end)

adds a dummy variable that is 1 during the specified time period, 0 otherwise.

Parameters
  • called (str) – what to call the resulting variable.

  • start (str, datetime.datetime, or pd.Timestamp) – start date. use format ‘%Y-%m-%d’ when passing strings.

  • end (str, datetime.datetime, or pd.Timestamp) – end date. use format ‘%Y-%m-%d’ when passing strings.

Returns

None

>>> f.add_other_regressor('january_2021','2021-01-01','2021-01-31')
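A minimal sketch of the dummy described for add_other_regressor (and, with different default dates, add_covid19_regressor), using only the standard library. It assumes both start and end are inclusive; date_dummy is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List


def date_dummy(dates: List[date], start: str, end: str) -> List[int]:
    """1 for dates between start and end (inclusive, '%Y-%m-%d' strings), else 0."""
    s, e = date.fromisoformat(start), date.fromisoformat(end)
    return [1 if s <= d <= e else 0 for d in dates]


dates = [date(2020, 12, 30) + timedelta(days=i) for i in range(35)]  # 2020-12-30 .. 2021-02-02
dummy = date_dummy(dates, "2021-01-01", "2021-01-31")
print(sum(dummy))  # 31
```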
add_poly_terms(*args, pwr=2, sep='^')

raises all passed variables (no AR terms) to exponential powers (ints only).

Parameters
  • *args (str) – names of Xvars that already exist in the object.

  • pwr (int) – default 2. the max power to add to each term in args (2 to this number will be added).

  • sep (str) – default ‘^’. the separator between each term in arg to create the final variable name.

Returns

None

>>> f.add_poly_terms('t','year',pwr=3) ### raises t and year to 2nd and 3rd powers
add_pt_terms(*args, method='box-cox', sep='_', drop=False)

applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).

Parameters
  • *args (str) – names of Xvars that already exist in the object.

  • method (str) – one of {‘box-cox’,’yeo-johnson’}, default ‘box-cox’. the type of transformation. box-cox works for positive values only. yeo-johnson is like a box-cox but can be used with 0s or negatives. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html.

  • sep (str) – default ‘_’. the separator between each term in arg to create the final variable name. resulting variable names will be like “box-cox_t” or “yeo-johnson_t” by default.

  • drop (bool) – default False. whether to drop the regressors passed to *args.

Returns

None

>>> f.add_pt_terms('t') # adds box cox of t
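The two transformations can be written out directly for a fixed λ (lam). This is only a sketch of the formulas: scikit-learn's PowerTransformer, linked above, estimates λ by maximum likelihood and standardizes the output by default, neither of which is done here, and the helper names are hypothetical.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform; defined for x > 0 only."""
    if x <= 0:
        raise ValueError("box-cox requires positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform; also defined for zero and negative values."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1) ** lam - 1) / lam
    return -math.log1p(-x) if lam == 2 else -((1 - x) ** (2 - lam) - 1) / (2 - lam)


print(box_cox(4.0, 0.5))       # 2.0
print(yeo_johnson(-3.0, 0.5))  # about -4.667
```

The negative-input branch is what lets yeo-johnson accept data that box-cox rejects.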
add_seasonal_regressors(*args, raw=True, sincos=False, dummy=False, drop_first=False)

adds seasonal regressors.

Parameters
  • *args – each of str type. values that return a series of int type from pandas.dt and pandas.dt.isocalendar(). see https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html.

  • raw (bool) – default True. whether to use the raw integer values.

  • sincos (bool) – default False. whether to use a sin/cos transformation of the raw integer values (estimates the cycle based on the max observed value).

  • dummy (bool) – default False. whether to use dummy variables from the raw int values.

  • drop_first (bool) – default False. whether to drop the first observed dummy level. not relevant when dummy = False.

Returns

None

>>> f.add_seasonal_regressors('year')
>>> f.add_seasonal_regressors('month','week','quarter',raw=False,sincos=True)
>>> f.add_seasonal_regressors('dayofweek',raw=False,dummy=True,drop_first=True)
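For the sincos option, which estimates the cycle from the max observed value, a plausible reading is sin(2πx/max) and cos(2πx/max). The sketch below assumes exactly that and reuses the “monthsin”/“monthcos” names from the add_combo_regressors example; sincos_terms is hypothetical.

```python
import math
from typing import Dict, List


def sincos_terms(name: str, values: List[int]) -> Dict[str, List[float]]:
    """Map raw seasonal integers onto a circle whose period is the max observed value."""
    cycle = max(values)
    return {
        f"{name}sin": [math.sin(2 * math.pi * v / cycle) for v in values],
        f"{name}cos": [math.cos(2 * math.pi * v / cycle) for v in values],
    }


feats = sincos_terms("month", list(range(1, 13)))
# December (12) and January (1) end up next to each other on the circle,
# which the raw integers 12 and 1 do not convey.
print(round(feats["monthcos"][11], 3), round(feats["monthcos"][0], 3))  # 1.0 0.866
```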
add_time_trend(called='t')

adds a time trend from 1 to len(current_dates) + len(future_dates) in current_xreg and future_xreg.

Parameters

called (str) – default ‘t’. what to call the resulting variable.

Returns

None

>>> f.add_time_trend()
adf_test(critical_pval=0.05, quiet=True, full_res=False, train_only=False, **kwargs) → Union[tuple, bool]

tests the stationarity of the y series using augmented dickey fuller.

Parameters
  • critical_pval (float) – default 0.05. the p-value threshold in the statistical test to accept the alternative hypothesis.

  • quiet (bool) – default True. if False, prints whether the test suggests stationary or non-stationary data.

  • full_res (bool) – default False. if True, returns a tuple with the p-value, evaluated statistic, and other statistical information (whatever the adfuller() function from statsmodels returns). if False, returns a bool that matches whether the test indicates stationarity.

  • train_only (bool) – default False. if True, will exclude the test set from the ADF test (to avoid leakage).

  • **kwargs – passed to adfuller() function from statsmodels.

Returns

if full_res is False, returns a bool indicating whether the test suggests stationarity.

otherwise, returns the full results (stat, pval, etc.) of the test.

Return type

(bool or tuple)

>>> stat, pval, _, _, _, _ = f.adf_test(full_res=True)
all_feature_info_to_excel(out_path='./', excel_name='feature_info.xlsx')

saves all feature importance and summary stats to excel. each model for which such info is available gets its own tab. be sure to have called save_summary_stats() and/or save_feature_importance() before using this function.

Parameters
  • out_path (str) – default ‘./’. the path to export to.

  • excel_name (str) – default ‘feature_info.xlsx’. the name of the resulting excel file.

Returns

None

all_validation_grids_to_excel(out_path='./', excel_name='validation_grids.xlsx', sort_by_metric_value=False, ascending=True)

saves all validation grids to excel. each model for which such info is available gets its own tab. be sure to have tuned at least one model before calling this.

Parameters
  • out_path (str) – default ‘./’. the path to export to.

  • excel_name (str) – default ‘validation_grids.xlsx’. the name of the resulting excel file.

  • sort_by_metric_value (bool) – default False. whether to sort the output by performance on the validation set.

  • ascending (bool) – default True. whether to sort least-to-greatest. ignored if sort_by_metric_value is False.

Returns

None

auto_forecast(call_me=None, dynamic_testing=True)

auto forecasts with the best parameters indicated from the tuning process.

Parameters
  • call_me (str) – optional. what to call the model when storing it in the object’s history dictionary. if not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). duplicated names will be overwritten with the most recently called model.

  • dynamic_testing (bool) – default True. whether to dynamically test the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance but gives a worse indication of how well the forecast will perform x periods out. when False, test-set metrics effectively become an average of one-step forecasts.

Returns

None

>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
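The difference between dynamic_testing=True and False can be seen with a toy AR(1) model. The coefficients below are made up and nothing here calls scalecast; the sketch only shows that dynamic testing feeds each prediction back in as the next AR input, while non-dynamic testing feeds in the actual value and so scores a sequence of one-step forecasts.

```python
from typing import List

PHI, CONST = 0.9, 1.0  # hypothetical fitted AR(1) coefficients: y_t = CONST + PHI * y_{t-1}


def one_step(prev: float) -> float:
    return CONST + PHI * prev


def dynamic_test(last_train: float, horizon: int) -> List[float]:
    """Propagate predictions: each forecast becomes the next AR input."""
    preds, prev = [], last_train
    for _ in range(horizon):
        prev = one_step(prev)
        preds.append(prev)
    return preds


def static_test(last_train: float, actuals: List[float]) -> List[float]:
    """Use the true previous value at every step (a series of one-step forecasts)."""
    preds, prev = [], last_train
    for actual in actuals:
        preds.append(one_step(prev))
        prev = actual
    return preds


actuals = [19.5, 19.0, 18.5]
print(dynamic_test(20.0, len(actuals)))  # roughly [19.0, 18.1, 17.29]
print(static_test(20.0, actuals))        # roughly [19.0, 18.55, 18.1]
```

Only the first step is identical; after that, dynamic errors compound, which is why the dynamic score is the more honest guide to multi-step performance.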
diff(i=1)

differences the y attribute, as well as all AR values stored in current_xreg and future_xreg.

Parameters

i (int) – default 1. the number of differences to take. must be 1 or 2.

Returns

None

>>> f.diff(2) # differences y twice
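A minimal sketch of differencing y once or twice and then recovering the original level, which is the round trip diff() and undiff() describe. This pure-Python version simply drops the observations that differencing consumes, whereas the library may keep placeholders for them; the helper names are hypothetical.

```python
from typing import List


def difference(y: List[float], i: int = 1) -> List[float]:
    """Take i first differences (i must be 1 or 2); each pass drops one observation."""
    if i not in (1, 2):
        raise ValueError("i must be 1 or 2")
    for _ in range(i):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


def undifference(diffed: List[float], first_value: float) -> List[float]:
    """Invert one differencing pass by cumulatively summing from a known starting level."""
    level = [first_value]
    for d in diffed:
        level.append(level[-1] + d)
    return level


y = [3.0, 5.0, 9.0, 15.0, 23.0]
d1 = difference(y)     # [2.0, 4.0, 6.0, 8.0]
d2 = difference(y, 2)  # [2.0, 2.0, 2.0]
restored = undifference(undifference(d2, d1[0]), y[0])
print(restored)        # [3.0, 5.0, 9.0, 15.0, 23.0]
```

Undoing two differences needs one stored starting value per pass, which is why the first values of y and of its first difference are kept.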
drop_Xvars(*args)

drops regressors.

Parameters

*args (str) – the names of regressors to drop.

Returns

None

>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_Xvars('t','t^0.5')
drop_regressors(*args)

drops regressors.

Parameters

*args (str) – the names of regressors to drop.

Returns

None

>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_regressors('t','t^0.5')
export(dfs=['all_fcsts', 'model_summaries', 'best_fcst', 'test_set_predictions', 'lvl_test_set_predictions', 'lvl_fcsts'], models='all', best_model='auto', determine_best_by='TestSetRMSE', to_excel=False, out_path='./', excel_name='results.xlsx') → Union[Dict[str, pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]

exports one or more of six pandas dataframes and can write them to excel with each dataframe on a separate sheet. will return either a dictionary with dataframes as values (df str arguments as keys) or a single dataframe if only one df is specified.

Parameters
  • dfs (list-like or str) – default [‘all_fcsts’,’model_summaries’,’best_fcst’,’test_set_predictions’,’lvl_test_set_predictions’,’lvl_fcsts’]. a list or name of the specific dataframe(s) you want returned and/or written to excel. must be one or more of the default values.

  • models (list-like or str) – default ‘all’. the models to write information for. can start with “top_” and the metric specified in determine_best_by will be used to order the models appropriately.

  • best_model (str) – default ‘auto’. the name of the best model, if “auto”, will determine this by the metric in determine_best_by. if not “auto”, must match a model nickname of an already-evaluated model.

  • determine_best_by (str) – one of _determine_best_by_, default ‘TestSetRMSE’.

  • to_excel (bool) – default False. whether to save to excel.

  • out_path (str) – default ‘./’. the path to save the excel file to (ignored when to_excel=False).

  • excel_name (str) – default ‘results.xlsx’. the name to call the excel file (ignored when to_excel=False).

Returns

either a single pandas dataframe if one element passed to dfs or a dictionary where the keys match what was passed to dfs and the values are dataframes.

Return type

(DataFrame or Dict[str,DataFrame])

>>> f.export(dfs=['model_summaries','lvl_fcsts'],to_excel=True)
export_Xvars_df(dropna=False)

gets all utilized regressors and values.

Parameters

dropna (bool) – default False. whether to drop null values from the resulting dataframe.

Returns

A dataframe of Xvars and names/values stored in the object.

Return type

(DataFrame)

export_feature_importance(model) → pandas.core.frame.DataFrame

exports the feature importance from a model. raises an error if you never saved the model’s feature importance.

Parameters

model (str) – the name of the model to export for. matches what was passed to call_me when calling the forecast (default is estimator name).

Returns

The resulting feature importances of the evaluated model passed to model arg.

Return type

(DataFrame)

>>> fi = f.export_feature_importance('mlr')
export_fitted_vals(model)

exports a single dataframe with dates, fitted values, actuals, and residuals.

Parameters

model (str) – the model nickname (must exist in history.keys()).

Returns

A dataframe with dates, fitted values, actuals, and residuals.

Return type

(DataFrame)

export_forecasts_with_cis(model)

exports a single dataframe with forecasts and upper and lower forecast bounds.

Parameters

model (str) – the model nickname (must exist in history.keys()).

Returns

A dataframe with forecasts to future dates and corresponding confidence intervals.

Return type

(DataFrame)

export_summary_stats(model) → pandas.core.frame.DataFrame

exports the summary stats from a model. raises an error if you never saved the model’s summary stats.

Parameters

model (str) – the name of the model to export for. matches what was passed to call_me when calling the forecast (default is estimator name).

Returns

The resulting summary stats of the evaluated model passed to model arg.

Return type

(DataFrame)

>>> ss = f.export_summary_stats('arima')
export_test_set_preds_with_cis(model)

exports a single dataframe with test-set predictions, actuals, and upper and lower prediction bounds.

Parameters

model (str) – the model nickname (must exist in history.keys()).

Returns

A dataframe with test-set predictions and actuals with corresponding confidence intervals.

Return type

(DataFrame)

export_validation_grid(model) → pandas.core.frame.DataFrame

exports the validation grid from a model. raises an error if you never tuned the model.

Parameters

model (str) – the name of the model to export for. matches what was passed to call_me when calling the forecast (default is estimator name).

Returns

The resulting validation grid of the evaluated model passed to model arg.

Return type

(DataFrame)

fillna_y(how='ffill')

fills null values in the y attribute.

Parameters

how (str) – one of {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, ‘midpoint’}, default ‘ffill’. midpoint is unique to this library and only works if there are no more than two consecutive missing values. all other possible arguments come from the pandas.DataFrame.fillna() method and behave the same way.

Returns

None
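One reading of the midpoint option is that each missing value is replaced by the average of its neighbors. The sketch below assumes that and, for simplicity, fills only isolated gaps (a single missing value between two known ones); it does not reproduce how the library handles two consecutive missing values. midpoint_fill is hypothetical.

```python
from typing import List, Optional


def midpoint_fill(y: List[Optional[float]]) -> List[Optional[float]]:
    """Replace an isolated None with the mean of its two known neighbors."""
    filled = list(y)
    for i in range(1, len(y) - 1):
        if y[i] is None and y[i - 1] is not None and y[i + 1] is not None:
            filled[i] = (y[i - 1] + y[i + 1]) / 2
    return filled


print(midpoint_fill([1.0, None, 3.0, 4.0, None, 8.0]))  # [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
```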

generate_future_dates(n)

generates a certain amount of future dates in same frequency as current_dates.

Parameters

n (int) – greater than 0. number of future dates to produce. this will also be the forecast length.

Returns

None

>>> f.generate_future_dates(12) # 12 future dates to forecast out to
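For a fixed-width frequency such as daily data, the idea behind generate_future_dates can be sketched with the standard library by repeating the last observed step. This is a simplification under stated assumptions: the library presumably works from the pandas-inferred frequency described under infer_freq(), which also covers calendar frequencies such as month-end that a fixed timedelta cannot represent. future_dates is hypothetical.

```python
from datetime import date, timedelta
from typing import List


def future_dates(current_dates: List[date], n: int) -> List[date]:
    """Extend current_dates by n steps of the last observed spacing."""
    if n < 1:
        raise ValueError("n must be greater than 0")
    step: timedelta = current_dates[-1] - current_dates[-2]
    return [current_dates[-1] + step * k for k in range(1, n + 1)]


current = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]  # dates from the opening example
print(future_dates(current, 3))
# [datetime.date(2021, 1, 4), datetime.date(2021, 1, 5), datetime.date(2021, 1, 6)]
```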
get_freq() → str

gets the pandas inferred date frequency.

Returns

The inferred frequency of the current_dates array.

Return type

(str)

>>> f.get_freq()
get_funcs(which) → list

returns a group of functions based on what’s passed to which.

Parameters

which (str) – one of {‘adder’,’exporter’,’setter’,’plotter’,’getter’}

Returns

the names of the relevant functions

Return type

(list)

>>> f.get_funcs('adder')
get_regressor_names() → list

gets the regressor names stored in the object.

Parameters

None

Returns

Regressor names that have been added to the object.

Return type

(list)

>>> f.add_time_trend()
>>> f.get_regressor_names()
infer_freq()

uses pandas library to infer frequency of loaded dates.

ingest_Xvars_df(df, date_col='Date', drop_first=False, use_future_dates=False)

ingests a dataframe of regressors and saves its contents to the Forecaster object. must specify a date column. all non-numeric values will be dummied. any columns in the dataframe that begin with “AR” will be confused with autoregressive terms and could cause errors.

Parameters
  • df (DataFrame) – the dataframe, which must be at least len(current_dates) + len(future_dates) rows long.

  • date_col (str) – default ‘Date’. the name of the date column in the dataframe. this column must have the same frequency as the dates in current_dates.

  • drop_first (bool) – default False. whether to drop the first level of any dummied variables. irrelevant if passing all numeric values.

  • use_future_dates (bool) – default False. whether to use the future dates in the dataframe as the future_dates attribute in the object.

Returns

None

ingest_grid(grid)

ingests a grid to tune the estimator.

Parameters

grid (dict or str) – if dict, must be a user-created grid. if str, must match the name of a dict grid stored in Grids.py.

Returns

None

>>> f.set_estimator('mlr')
>>> f.ingest_grid({'normalizer':['scale','minmax']})
integrate(critical_pval=0.05, train_only=False, max_integration=2)

differences the series 0, 1, or 2 times based on ADF test results.

Parameters
  • critical_pval (float) – default 0.05. the p-value threshold in the statistical test to accept the alternative hypothesis.

  • train_only (bool) – default False. if True, will exclude the test set from the ADF test (to avoid leakage).

  • max_integration (int) – one of {1,2}, default 2. if 1, will only difference data up to one time even if the results of the test indicate two integrations. if 2, behaves how you would expect.

Returns

None

>>> f.integrate(max_integration=1) # differences y only once if it is not stationary
>>> f.integrate() # differences y up to twice: once if it is not stationary, and again if its first difference is not stationary
keep_smaller_history(n)

cuts the amount of y observations in the object.

Parameters

n (int, str, or datetime.datetime) – if int, the number of most recent observations to keep. otherwise, the earliest date to keep; observations on or after it are retained. if str, must be in ‘%Y-%m-%d’ format.

Returns

None

>>> f.keep_smaller_history(500) # keeps last 500 observations
>>> f.keep_smaller_history('2020-01-01') # keeps only observations on or later than 1/1/2020
limit_grid_size(n, random_seed=None)

makes a grid smaller randomly.

Parameters
  • n (int or float) – if int, randomly selects that many parameter combinations. if float, must be greater than 0 and less than 1; randomly selects that proportion of parameter combinations.

  • random_seed (int) – optional. set a seed to make results consistent.

Returns

None

>>> from scalecast import GridGenerator
>>> GridGenerator.get_example_grids()
>>> f.set_estimator('mlp')
>>> f.ingest_grid('mlp')
>>> f.limit_grid_size(10,random_seed=20) # limits grid to 10 iterations
>>> f.limit_grid_size(.5,random_seed=20) # limits grid to half its original size
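Conceptually, a grid is expanded into every combination of its parameter values, and limit_grid_size keeps a random subset. The sketch below assumes that model using itertools.product and random.Random.sample; the grid contents and helper names are hypothetical, so the sampled combinations will not match scalecast's for the same seed.

```python
import itertools
import random
from typing import Any, Dict, List, Optional, Union


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {param: [values]} into a list of every parameter combination."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(grid[k] for k in keys))]


def limit_grid(
    combos: List[Dict[str, Any]], n: Union[int, float], random_seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Keep n combinations (int) or that fraction of them (float in (0, 1))."""
    if isinstance(n, float):
        if not 0 < n < 1:
            raise ValueError("a float n must be between 0 and 1")
        n = int(len(combos) * n)
    return random.Random(random_seed).sample(combos, n)


grid = {  # hypothetical mlp-style grid with 2 * 3 * 2 = 12 combinations
    "activation": ["relu", "tanh"],
    "hidden_layer_sizes": [(25,), (50,), (100,)],
    "alpha": [0.0001, 0.001],
}
combos = expand_grid(grid)
print(len(combos))                                   # 12
print(len(limit_grid(combos, 10, random_seed=20)))  # 10
print(len(limit_grid(combos, .5, random_seed=20)))  # 6
```

Fixing the seed makes the subset reproducible, which is the point of random_seed above.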
manual_forecast(call_me=None, dynamic_testing=True, **kwargs)

manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords.

Parameters
  • call_me (str) – optional. what to call the model when storing it in the object’s history dictionary. if not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). duplicated names will be overwritten with the most recently called model.

  • dynamic_testing (bool) – default True. whether to dynamically test the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance but gives a worse indication of how well the forecast will perform x periods out. when False, test-set metrics effectively become an average of one-step forecasts.

  • **kwargs – passed to the _forecast_{estimator}() method and can include parameters such as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. for sklearn models, can include normalizer and Xvars. for ARIMA, Prophet, and Silverkite models, can include Xvars but not normalizer. LSTM and RNN models have their own sets of possible keywords. see https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.

Returns

None

>>> f.set_estimator('mlr')
>>> f.manual_forecast(normalizer='pt')
order_fcsts(models, determine_best_by='TestSetRMSE') → list

gets estimated forecasts ordered from best-to-worst.

Parameters
  • models (list-like) – each element must match an evaluated model’s nickname (which is the same as its estimator name by default).

  • determine_best_by (str) – default ‘TestSetRMSE’. one of _determine_best_by_.

Returns

The ordered models.

Return type

(list)

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> ordered_models = f.order_fcsts(models,"LevelTestSetMAPE")
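Ordering forecasts amounts to sorting model nicknames by a stored metric. A sketch, assuming error metrics (RMSE, MAPE, and the like) are better when lower and an R2-style metric is better when higher; the history layout, the order_models helper, and all metric values below are made up for illustration and are not scalecast output.

```python
from typing import Dict, List


def order_models(history: Dict[str, Dict[str, float]], models: List[str], metric: str = "TestSetRMSE") -> List[str]:
    """Sort model nicknames best-to-worst by `metric`."""
    higher_is_better = metric.endswith("R2")
    return sorted(models, key=lambda m: history[m][metric], reverse=higher_is_better)


history = {  # hypothetical stored metrics
    "mlr": {"LevelTestSetMAPE": 0.12, "TestSetR2": 0.75},
    "mlp": {"LevelTestSetMAPE": 0.09, "TestSetR2": 0.71},
    "lightgbm": {"LevelTestSetMAPE": 0.15, "TestSetR2": 0.58},
}
print(order_models(history, ["mlr", "mlp", "lightgbm"], "LevelTestSetMAPE"))  # ['mlp', 'mlr', 'lightgbm']
print(order_models(history, ["mlr", "mlp", "lightgbm"], "TestSetR2"))         # ['mlr', 'mlp', 'lightgbm']
```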
plot(models='all', order_by=None, level=False, print_attr=[], ci=False)

plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.

Parameters
  • models (list-like, str, or None) – default ‘all’. the forecasted models to plot. can start with “top_” and the metric specified in order_by will be used to order the models appropriately. if None or models/order_by combo invalid, will plot only actual values.

  • order_by (str) – one of _determine_best_by_. optional.

  • level (bool) – default False. if True, will always plot level forecasts. if False, will plot the forecasts at whatever level they were called on. if False and there is a mix of models passed with different integrations, will default to True.

  • print_attr (list-like) – default []. attributes from history dict to print to console. if the attribute doesn’t exist for a passed model, will not raise error, will just skip that element.

  • ci (bool) – default False. whether to display the confidence intervals. change defaults by calling set_cilevel() and set_bootstrap_samples() before forecasting. ignored when level = True.

Returns

the created figure

Return type

(Figure)

>>> import matplotlib.pyplot as plt
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot(order_by='LevelTestSetMAPE') # plots all forecasts
>>> plt.show()
plot_acf(diffy=False, train_only=False, **kwargs)

plots an autocorrelation function of the y values.

Parameters
  • diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.

  • train_only (bool) – default False. if True, will exclude the test set from the calculation (a measure added to avoid leakage).

  • **kwargs – passed to plot_acf() function from statsmodels.

Returns

If ax is None, the created figure. Otherwise the figure to which ax is connected.

Return type

(Figure)

>>> import matplotlib.pyplot as plt
>>> f.plot_acf(train_only=True)
>>> plt.show()
plot_fitted(models='all', order_by=None)

plots all fitted values with the actuals.

Parameters
  • models (list-like,str) – default ‘all’. the forecasted models to plot. can start with “top_” and the metric specified in order_by will be used to order the models appropriately.

  • order_by (str) – one of _determine_best_by_, default None.

Returns

the created figure

Return type

(Figure)

>>> import matplotlib.pyplot as plt
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot_fitted(order_by='LevelTestSetMAPE') # plots all fitted values
>>> plt.show()
plot_pacf(diffy=False, train_only=False, **kwargs)

plots a partial autocorrelation function of the y values.

Parameters
  • diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.

  • train_only (bool) – default False. if True, will exclude the test set from the calculation (a measure added to avoid leakage).

  • **kwargs – passed to plot_pacf() function from statsmodels.

Returns

If ax is None, the created figure. Otherwise the figure to which ax is connected.

Return type

(Figure)

>>> import matplotlib.pyplot as plt
>>> f.plot_pacf(train_only=True)
>>> plt.show()
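The partial autocorrelation at lag k is the last coefficient of the best AR(k) fit, and it can be computed from the autocorrelations with the Durbin-Levinson recursion. The plain-Python sketch below shows that recursion as an illustration only: pacf_from_acf is a hypothetical helper, and statsmodels implements several PACF estimators whose default may not be this one. For an AR(1) process with coefficient 0.6, the theoretical ACF is 0.6**k and the PACF should cut off after lag 1.

```python
def pacf_from_acf(r, nlags):
    """Durbin-Levinson recursion: partial autocorrelations from autocorrelations r[0..nlags]."""
    pacf = [1.0]
    phi = []                                 # AR(k-1) coefficients from the previous step
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # update the AR(k) coefficients from the AR(k-1) ones
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

phi_true = 0.6
r = [phi_true ** k for k in range(5)]        # theoretical ACF of an AR(1) process
pacf = pacf_from_acf(r, nlags=4)
```

A sharp cutoff like this is what the PACF plot is typically used to look for when choosing the number of AR terms.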
plot_periodogram(diffy=False, train_only=False)

plots a periodogram of the y values (comes from scipy.signal).

Parameters
  • diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.

  • train_only (bool) – default False. if True, will exclude the test set from the calculation (a measure added to avoid leakage).

Returns

Element 1: Array of sample frequencies. Element 2: Power spectral density or power spectrum of x.

Return type

(ndarray,ndarray)

>>> import matplotlib.pyplot as plt
>>> a, b = f.plot_periodogram(diffy=True,train_only=True)
>>> plt.semilogy(a, b)
>>> plt.show()
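A periodogram estimates how much of the series' variance sits at each frequency, which helps spot seasonal cycles. The numpy sketch below is intended to follow scipy.signal.periodogram's default settings (constant detrending, no window, one-sided density scaling); simple_periodogram is a hypothetical helper, and exact numerical agreement with scipy is an assumption, not something this example verifies.

```python
import numpy as np

def simple_periodogram(x, fs=1.0):
    """One-sided periodogram with density scaling and constant detrending."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # constant detrend
    n = len(x)
    pxx = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    # Fold the negative frequencies into the one-sided estimate; the DC bin and,
    # for even n, the Nyquist bin have no mirror image and are not doubled.
    if n % 2 == 0:
        pxx[1:-1] *= 2
    else:
        pxx[1:] *= 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, pxx

t = np.arange(48)
y = 5 + np.sin(2 * np.pi * t / 12)       # a 12-period cycle around a constant level
freqs, pxx = simple_periodogram(y)
dominant_period = 1 / freqs[np.argmax(pxx)]
```

A peak at frequency 1/12 corresponds to a 12-period cycle, for example yearly seasonality in monthly data.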
plot_test_set(models='all', order_by=None, include_train=True, level=False, ci=False)

plots all test-set predictions with the actuals.

Parameters
  • models (list-like or str) – default ‘all’. the forecasted models to plot. can start with “top_” and the metric specified in order_by will be used to order the models appropriately.

  • order_by (str) – one of _determine_best_by_, optional.

  • include_train (bool or int) – default True. controls how much of the y history is plotted alongside the test results. if True, plots the test results with the entire history in y. if False, matches y history to test results and only plots this. if int, plots that length of y to match to test results.

  • level (bool) – default False. if True, will always plot level forecasts. if False, will plot the forecasts at whatever level they were called on. if False and there is a mix of models passed with different integrations, will default to True.

  • ci (bool) – default False. whether to display the confidence intervals. default is 100 bootstrapped samples and a 95% confidence interval. change defaults by calling set_cilevel() and set_bootstrap_samples() before forecasting. ignored when level = False.

Returns

the created figure

Return type

(Figure)

>>> import matplotlib.pyplot as plt
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot_test_set(order_by='LevelTestSetMAPE') # plots all test-set results
>>> plt.show()
pop(*args)

deletes evaluated forecasts from the object’s memory.

Parameters
  • *args (str) – names of models matching what was passed to call_me. by default, call_me for a given model is the same as the estimator name.

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.pop('mlr')
pop_using_criterion(metric, evaluated_as, threshold, delete_all=True)

deletes all forecasts from history that meet a given criterion.

Parameters
  • metric (str) – one of _determine_best_by_ + [‘AnyPrediction’,‘AnyLevelPrediction’].

  • evaluated_as (str) – one of {“<”,“<=”,“>”,“>=”,“==”}.

  • threshold (float) – the value the metric is compared against using the operator in evaluated_as.

  • delete_all (bool) – default True. if the passed criterion deletes all forecasts, whether to actually delete all forecasts. if False and all forecasts meet criterion, will keep them all.

Returns

None

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.pop_using_criterion('LevelTestSetMAPE','>',2)
>>> f.pop_using_criterion('AnyPrediction','<',0,delete_all=False)
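The criterion-based filtering described above can be sketched with Python's operator module. The history layout here (model name mapped to a dict holding metric values plus 'Forecast' and 'LevelForecast' lists) and the pop_by_criterion name are hypothetical; only the metric names, the operators, and the delete_all behavior come from the parameter descriptions. Treating 'AnyPrediction' as "any forecast value meets the criterion" is an assumption based on its name.

```python
import operator

# evaluated_as strings mapped to comparison functions
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

def pop_by_criterion(history, metric, evaluated_as, threshold, delete_all=True):
    """Remove models from a hypothetical history dict whose metric meets the criterion."""
    compare = OPS[evaluated_as]

    def meets(info):
        if metric in ("AnyPrediction", "AnyLevelPrediction"):
            # assumption: true if any forecasted value satisfies the comparison
            key = "Forecast" if metric == "AnyPrediction" else "LevelForecast"
            return any(compare(v, threshold) for v in info[key])
        return compare(info[metric], threshold)

    to_drop = [name for name, info in history.items() if meets(info)]
    if len(to_drop) == len(history) and not delete_all:
        return history                    # every model met the criterion: keep them all
    for name in to_drop:
        del history[name]
    return history

history = {
    "mlr": {"LevelTestSetMAPE": 1.5, "Forecast": [1.0, 2.0]},
    "mlp": {"LevelTestSetMAPE": 3.0, "Forecast": [-1.0, 2.0]},
    "lightgbm": {"LevelTestSetMAPE": 2.5, "Forecast": [3.0, 4.0]},
}
pop_by_criterion(history, "LevelTestSetMAPE", ">", 2)
```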
reset()

drops all regressors and reverts the object to its original (level) state from when it was initialized.

save_feature_importance()

saves feature info for models that offer it and will not raise errors if not available. call after evaluating the model you want it for and before changing the estimator.

>>> f.set_estimator('mlr')
>>> f.manual_forecast()
>>> f.save_feature_importance()
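tune_test_forecast describes the saved feature information as permutation feature importance. The idea is to shuffle one feature at a time and measure how much an error metric such as RMSE worsens. Below is a minimal numpy sketch with a least-squares fit standing in for the estimator; permutation_importance is a hypothetical helper, and scalecast's own computation (scoring function, number of repeats, library used) may differ.

```python
import numpy as np

def rmse(y, pred):
    return float(np.sqrt(np.mean((y - pred) ** 2)))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each column of X is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = rmse(y, predict(X))
    importances = []
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # shuffling breaks the link between feature j and y
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(rmse(y, predict(X_perm)))
        importances.append(np.mean(scores) - baseline)
    return np.array(importances)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0]                          # only the first feature matters
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
imp = permutation_importance(lambda A: A @ coef, X, y)
```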
save_summary_stats()

saves summary stats for models that offer it and will not raise errors if not available. call after evaluating the model you want it for and before changing the estimator.

>>> f.set_estimator('arima')
>>> f.manual_forecast(order=(1,1,1))
>>> f.save_summary_stats()
seasonal_decompose(diffy=False, train_only=False, **kwargs)

returns a signal/seasonal decomposition of the y values; call .plot() on the result to visualize it.

Parameters
  • diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.

  • train_only (bool) – default False. if True, will exclude the test set from the calculation (a measure added to avoid leakage).

  • **kwargs – passed to seasonal_decompose() function from statsmodels.

Returns

An object with seasonal, trend, and resid attributes.

Return type

(DecomposeResult)

>>> import matplotlib.pyplot as plt
>>> f.seasonal_decompose(train_only=True).plot()
>>> plt.show()
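statsmodels' seasonal_decompose uses a classical moving-average decomposition. As an illustration of what the trend, seasonal, and resid attributes contain, here is a numpy sketch of the additive case: a centered moving average gives the trend, averaging the detrended values by position in the cycle gives the seasonal component, and what remains is the residual. classical_additive_decompose is a hypothetical helper; statsmodels differs in details such as edge handling (it can optionally extrapolate the trend).

```python
import numpy as np

def classical_additive_decompose(x, period):
    """Return (trend, seasonal, resid) using a centered moving average."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if period % 2 == 0:
        # 2 x period moving average: half weight on the two outermost points
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)             # undefined at the edges
    for t in range(half, n - half):
        trend[t] = np.dot(weights, x[t - half:t + half + 1])
    detrended = x - trend
    # average the detrended values at each cycle position, then center them on zero
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle -= cycle.mean()
    seasonal = np.tile(cycle, n // period + 1)[:n]
    resid = x - trend - seasonal
    return trend, seasonal, resid

t = np.arange(24)
pattern = np.array([1.0, -1.0, 2.0, -2.0])
y = 0.5 * t + pattern[t % 4]             # linear trend plus a zero-mean 4-period cycle
trend, seasonal, resid = classical_additive_decompose(y, period=4)
```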
set_bootstrap_samples(n)

sets the number of bootstrap samples used to build confidence intervals for each model (100 by default).

Parameters

n (int) – greater than or equal to 30, roughly the minimum sample size for the central limit theorem to apply. lower values run faster but yield less trustworthy intervals.

Returns

None

>>> f.set_bootstrap_samples(1000) # next forecast will get confidence intervals with 1,000 bootstrap samples
set_cilevel(n)

sets the level for the resulting confidence intervals (95% default).

Parameters

n (float) – greater than 0 and less than 1.

Returns

None

>>> f.set_cilevel(.80) # next forecast will get 80% confidence intervals
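set_bootstrap_samples and set_cilevel together control a bootstrapped interval. The docstrings do not spell out scalecast's exact procedure, so the sketch below shows one common approach under that caveat: a residual bootstrap that adds resampled errors to a point forecast and reads off the empirical (1 - cilevel)/2 and 1 - (1 - cilevel)/2 quantiles. bootstrap_interval and its arguments are hypothetical.

```python
import random

def bootstrap_interval(point, residuals, n_samples=100, cilevel=0.95, seed=0):
    """Percentile interval from point + resampled residuals (residual bootstrap)."""
    if n_samples < 30:
        raise ValueError("n_samples must be at least 30")
    if not 0 < cilevel < 1:
        raise ValueError("cilevel must be greater than 0 and less than 1")
    rng = random.Random(seed)
    # each draw is one plausible outcome: the point forecast plus a historical error
    draws = sorted(point + rng.choice(residuals) for _ in range(n_samples))
    alpha = 1 - cilevel
    lower = draws[round(alpha / 2 * (n_samples - 1))]
    upper = draws[round((1 - alpha / 2) * (n_samples - 1))]
    return lower, upper

residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical test-set errors
lower, upper = bootstrap_interval(10.0, residuals, n_samples=1000, cilevel=0.80)
```

More samples make the empirical quantiles more stable at the cost of extra computation, which is the trade-off set_bootstrap_samples exposes.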
set_estimator(estimator)

sets the estimator to forecast with.

Parameters

estimator (str) – one of _estimators_

Returns

None

>>> f.set_estimator('mlr')
set_last_future_date(date)

generates future dates in the same frequency as current_dates that ends on a specified date.

Parameters

date (datetime.datetime, pd.Timestamp, or str) – the date to end on. if str, must be in ‘%Y-%m-%d’ format.

Returns

None

>>> f.set_last_future_date('2021-06-01') # creates future dates up to this one in the expected frequency
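One way to generate dates "in the same frequency as current_dates" is to let pandas infer the frequency and extend the index from the last observed date. This is a sketch under that assumption, not necessarily how scalecast does it; future_dates_until is a hypothetical helper, and pd.infer_freq needs at least three evenly spaced dates.

```python
import pandas as pd

def future_dates_until(current_dates, last_date):
    """Dates after current_dates, at the inferred frequency, up to and including last_date."""
    idx = pd.DatetimeIndex(pd.to_datetime(current_dates))
    freq = pd.infer_freq(idx)             # e.g. 'D' for daily, 'MS' for month starts
    if freq is None:
        raise ValueError("could not infer a frequency from current_dates")
    # start at the last observed date, then drop it so only future dates remain
    return pd.date_range(start=idx[-1], end=pd.Timestamp(last_date), freq=freq)[1:]

dates = future_dates_until(['2021-01-01', '2021-02-01', '2021-03-01'], '2021-06-01')
```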
set_test_length(n=1)

sets the length of the test set.

Parameters

n (int) – default 1. the length of the resulting test set.

Returns

None

>>> f.set_test_length(12) # test set of 12
set_validation_length(n=1)

sets the length of the validation set.

Parameters

n (int) – default 1. the length of the resulting validation set.

Returns

None

>>> f.set_validation_length(6) # validation length of 6
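How the test and validation lengths might carve up the series is sketched below. It assumes the test set is the last test_length observations and the validation set is the last validation_length observations of what remains (the training data), so tuning never sees the test set; treat that layout, and the split_series helper, as assumptions.

```python
def split_series(y, test_length, validation_length):
    """Return (train, validation, test) slices of y under the assumed layout."""
    if test_length < 0 or validation_length < 1:
        raise ValueError("test_length must be >= 0 and validation_length >= 1")
    if test_length + validation_length >= len(y):
        raise ValueError("not enough observations left for training")
    # len(y) - n instead of -n so that a test length of 0 keeps all of y
    cut_test = len(y) - test_length
    train_val, test = y[:cut_test], y[cut_test:]
    cut_val = len(train_val) - validation_length
    return train_val[:cut_val], train_val[cut_val:], test

y = list(range(20))
train, validation, test = split_series(y, test_length=4, validation_length=3)
```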
set_validation_metric(metric='rmse')

sets the metric that will be used to tune all subsequent models.

Parameters

metric (str) – one of _metrics_, default ‘rmse’. the metric to optimize the models with using the validation set.

Returns

None

>>> f.set_validation_metric('mae')
tune(dynamic_tuning=False)

tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default). any parameters that can be passed as arguments to manual_forecast() can be tuned with this process.

Parameters

dynamic_tuning (bool) – default False. whether to dynamically tune the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a less reliable indication of how well the forecast will perform several periods out. when False, metrics effectively become an average of one-step forecasts.

Returns

None

>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
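A minimal sketch of grid-based tuning: every combination in a grid is fit on the training data, scored on the validation set with the chosen validation metric, and the best combination is kept. The dict-of-lists grid format, the grid_search and window_mean helpers, and the toy moving-window model are assumptions for illustration; they are not the contents of Grids.py or scalecast's tuning code.

```python
import math
from itertools import product

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

METRICS = {"rmse": rmse, "mae": mae}      # stand-ins for the validation metric options

def window_mean(train, horizon, window):
    """Toy estimator: forecast the mean of the last `window` training values."""
    level = sum(train[-window:]) / window
    return [level] * horizon

def grid_search(fit_predict, grid, train, validation, metric="rmse"):
    """Try every parameter combination; keep the one with the lowest validation error."""
    score = METRICS[metric]
    names = list(grid)
    best_params, best_score = None, math.inf
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = score(validation, fit_predict(train, len(validation), **params))
        if err < best_score:
            best_params, best_score = params, err
    return best_params, best_score

grid = {"window": [1, 3, 7]}              # hypothetical grid in dict-of-lists form
best, err = grid_search(window_mean, grid, [1, 2, 3, 4, 5, 6, 7], [7, 7], metric="mae")
```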
tune_test_forecast(models, dynamic_tuning=False, dynamic_testing=True, summary_stats=False, feature_importance=False)

iterates through a list of models, tunes them using grids in Grids.py, forecasts them, and can save feature information.

Parameters
  • models (list-like) – each element must be in _can_be_tuned_.

  • dynamic_tuning (bool) – default False. whether to dynamically tune the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a less reliable indication of how well the forecast will perform several periods out. when False, metrics effectively become an average of one-step forecasts.

  • dynamic_testing (bool) – default True. whether to dynamically test the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a less reliable indication of how well the forecast will perform several periods out. when False, test-set metrics effectively become an average of one-step forecasts.

  • summary_stats (bool) – default False. whether to save summary stats for the models that offer them.

  • feature_importance (bool) – default False. whether to save permutation feature importance information for the models that offer it.

Returns

None

>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
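The difference between dynamic and non-dynamic testing is easiest to see with an AR(1) model. With dynamic testing, each prediction is fed back in as the lag for the next step; without it, the actual previous value is used, so every test prediction is a one-step forecast. ar1_test_predictions is a hypothetical helper that takes an already-estimated intercept c and coefficient phi.

```python
def ar1_test_predictions(history, test_actuals, c, phi, dynamic=True):
    """Test-set predictions from y_hat[t] = c + phi * lag, with the lag chosen by `dynamic`."""
    preds = []
    prev = history[-1]                   # last observed value before the test set
    for actual in test_actuals:
        pred = c + phi * prev
        preds.append(pred)
        # dynamic: feed the prediction back in; static: use the real value
        prev = pred if dynamic else actual
    return preds

dynamic_preds = ar1_test_predictions([10.0], [12.0, 8.0, 11.0], c=0.0, phi=1.0, dynamic=True)
static_preds = ar1_test_predictions([10.0], [12.0, 8.0, 11.0], c=0.0, phi=1.0, dynamic=False)
```

With phi = 1 (a random walk), the dynamic predictions stay flat at the last observed value, while the static ones track the actuals with a one-period lag, which is why non-dynamic test metrics tend to look better than true multi-step performance.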
typ_set()

converts all objects in y, current_dates, future_dates, current_xreg, and future_xreg to appropriate types if possible. automatically gets called when the object is initialized.

>>> f.typ_set() # sets all arrays to the correct format
undiff(suppress_error=False)

undifferences y to original level and drops all regressors (such as AR terms).

Parameters

suppress_error (bool) – default False. whether to suppress an error that gets raised if the series was never differenced.

Returns

None

>>> f.undiff()
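Undifferencing reverses differencing with a running sum, which requires the value that was lost at each differencing step. The plain-Python sketch below shows the idea for up to two differences; diff, undiff, and the initial_values layout are hypothetical, and scalecast's internal bookkeeping may differ.

```python
def diff(values, times=1):
    """Difference a list `times` times; each pass shortens it by one."""
    out = list(values)
    for _ in range(times):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def undiff(diffed, initial_values):
    """Invert diff(); initial_values[k] is the first value before the (k+1)-th differencing."""
    out = list(diffed)
    # undo the last differencing first, restoring each level with a running sum
    for start in reversed(initial_values):
        restored = [start]
        for d in out:
            restored.append(restored[-1] + d)
        out = restored
    return out

y = [3, 5, 9, 15, 23]
first = diff(y)                      # [2, 4, 6, 8]
second = diff(first)                 # [2, 2, 2]
initial_values = [y[0], first[0]]    # values lost at each differencing step
restored = undiff(second, initial_values)
```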
validate_regressor_names()

validates that all regressor names exist in both current_xreg and future_xreg. raises an error if this is not the case.
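The check amounts to comparing two sets of regressor names. In the sketch below, dicts keyed by regressor name stand in for current_xreg and future_xreg (an assumption about their shape), and check_regressor_names is a hypothetical helper rather than the method itself.

```python
def check_regressor_names(current_xreg, future_xreg):
    """Raise ValueError unless both mappings contain exactly the same regressor names."""
    current, future = set(current_xreg), set(future_xreg)
    if current != future:
        raise ValueError(
            f"missing from future_xreg: {sorted(current - future)}; "
            f"missing from current_xreg: {sorted(future - current)}"
        )

current_xreg = {"t": [1, 2, 3], "COVID19": [0, 1, 1]}
future_xreg = {"t": [4, 5], "COVID19": [1, 0]}
check_regressor_names(current_xreg, future_xreg)   # passes silently
```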