Forecaster Class
This is the main object used to make test-set predictions, produce forecasts, evaluate models, difference data, add regressors, and save, visualize, and export results.
from scalecast.Forecaster import Forecaster
array_of_dates = ['2021-01-01','2021-01-02','2021-01-03']
array_of_values = [1,2,3]
f = Forecaster(y=array_of_values, current_dates=array_of_dates)
- class src.scalecast.Forecaster.Forecaster(y, current_dates, **kwargs)
Bases:
object
Methods:
add_AR_terms
(N)adds seasonal auto-regressive terms.
add_ar_terms
(n)adds auto-regressive terms.
add_combo_regressors
(*args[, sep])combines all passed variables by multiplying their values together.
add_covid19_regressor
([called, start, end])adds dummy variable that is 1 during the time period that covid19 effects are present for the series, 0 otherwise.
add_diffed_terms
(*args[, diff, sep, drop])differences all passed variables (no AR terms) up to 2 times.
add_exp_terms
(*args, pwr[, sep, cutoff, drop])raises all passed variables (no AR terms) to exponential powers (ints or floats).
add_lagged_terms
(*args[, lags, upto, sep])lags all passed variables (no AR terms) 1 or more times.
add_logged_terms
(*args[, base, sep, drop])logs all passed variables (no AR terms).
add_other_regressor
(called, start, end)adds dummy variable that is 1 during the specified time period, 0 otherwise.
add_poly_terms
(*args[, pwr, sep])raises all passed variables (no AR terms) to exponential powers (ints only).
add_pt_terms
(*args[, method, sep, drop])applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).
add_seasonal_regressors
(*args[, raw, …])adds seasonal regressors.
add_time_trend
([called])adds a time trend from 1 to len(current_dates) + len(future_dates) in current_xreg and future_xreg.
adf_test
([critical_pval, quiet, full_res, …])tests the stationarity of the y series using augmented dickey fuller.
all_feature_info_to_excel
([out_path, excel_name])saves all feature importance and summary stats to excel.
all_validation_grids_to_excel
([out_path, …])saves all validation grids to excel.
auto_forecast
([call_me, dynamic_testing])auto forecasts with the best parameters indicated from the tuning process.
diff
([i])differences the y attribute, as well as all AR values stored in current_xreg and future_xreg.
drop_Xvars
(*args)drops regressors.
drop_regressors
(*args)drops regressors.
export
([dfs, models, best_model, …])exports one or more of 6 pandas dataframes; can write to excel with each dataframe on a separate sheet.
export_Xvars_df
([dropna])gets all utilized regressors and values.
export_feature_importance
(model)exports the feature importance from a model.
export_fitted_vals
(model)exports a single dataframe with dates, fitted values, actuals, and residuals.
export_forecasts_with_cis
(model)exports a single dataframe with forecasts and upper and lower forecast bounds.
export_summary_stats
(model)exports the summary stats from a model.
export_test_set_preds_with_cis
(model)exports a single dataframe with test-set predictions, actuals, and upper and lower prediction bounds.
export_validation_grid
(model)exports the validation grid from a model.
fillna_y
([how])fills null values in the y attribute.
generate_future_dates
(n)generates a certain amount of future dates in same frequency as current_dates.
get_freq
()gets the pandas inferred date frequency
get_funcs
(which)returns a group of functions based on what’s passed to which
get_regressor_names
()gets the regressor names stored in the object.
infer_freq
()uses pandas library to infer frequency of loaded dates.
ingest_Xvars_df
(df[, date_col, drop_first, …])ingests a dataframe of regressors and saves its contents to the Forecaster object.
ingest_grid
(grid)ingests a grid to tune the estimator.
integrate
([critical_pval, train_only, …])differences the series 0, 1, or 2 times based on ADF test results.
keep_smaller_history
(n)cuts the amount of y observations in the object.
limit_grid_size
(n[, random_seed])makes a grid smaller randomly.
manual_forecast
([call_me, dynamic_testing])manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords.
order_fcsts
(models[, determine_best_by])gets estimated forecasts ordered from best-to-worst.
plot
([models, order_by, level, print_attr, ci])plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.
plot_acf
([diffy, train_only])plots an autocorrelation function of the y values.
plot_fitted
([models, order_by])plots all fitted values with the actuals.
plot_pacf
([diffy, train_only])plots a partial autocorrelation function of the y values
plot_periodogram
([diffy, train_only])plots a periodogram of the y values (comes from scipy.signal).
plot_test_set
([models, order_by, …])plots all test-set predictions with the actuals.
pop
(*args)deletes evaluated forecasts from the object’s memory.
pop_using_criterion
(metric, evaluated_as, …)deletes all forecasts from history that meet a given criterion.
reset
()drops all regressors and reverts object to original (level) state when initiated.
saves feature info for models that offer it and will not raise errors if not available.
saves summary stats for models that offer it and will not raise errors if not available.
seasonal_decompose
([diffy, train_only])plots a signal/seasonal decomposition of the y values.
sets the number of bootstrap samples to set confidence intervals for each model (100 default).
set_cilevel
(n)sets the level for the resulting confidence intervals (95% default).
set_estimator
(estimator)sets the estimator to forecast with.
set_last_future_date
(date)generates future dates in the same frequency as current_dates that ends on a specified date.
set_test_length
([n])sets the length of the test set.
sets the length of the validation set.
set_validation_metric
([metric])sets the metric that will be used to tune all subsequent models.
tune
([dynamic_tuning])tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default).
tune_test_forecast
(models[, dynamic_tuning, …])iterates through a list of models, tunes them using grids in Grids.py, forecasts them, and can save feature information.
typ_set
()converts all objects in y, current_dates, future_dates, current_xreg, and future_xreg to appropriate types if possible.
undiff
([suppress_error])undifferences y to original level and drops all regressors (such as AR terms).
validates that all regressor names exist in both current_xregs and future_xregs.
- add_AR_terms(N)
adds seasonal auto-regressive terms.
- Parameters
N (tuple) – first element is the number of terms to add and the second element is the space between terms.
- Returns
None
>>> f.add_AR_terms((2,12)) # adds 12th and 24th lags
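The lag arithmetic behind the tuple can be sketched as follows. This only illustrates which lags `N` selects; it is not scalecast's implementation, and `seasonal_lags` is a hypothetical helper.

```python
def seasonal_lags(N):
    """Return the lag numbers implied by N = (number of terms, spacing)."""
    n_terms, spacing = N
    return [spacing * k for k in range(1, n_terms + 1)]

print(seasonal_lags((2, 12)))  # [12, 24]
```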
- add_ar_terms(n)
adds auto-regressive terms.
- Parameters
n (int) – the number of terms to add (1 to this number will be added).
- Returns
None
>>> f.add_ar_terms(4) # adds four lags of y to predict with
- add_combo_regressors(*args, sep='_')
combines all passed variables by multiplying their values together.
- Parameters
*args (str) – names of Xvars that already exist in the object.
sep (str) – default ‘_’. the separator between each term in arg to create the final variable name.
- Returns
None
>>> f.add_combo_regressors('t','monthsin') # multiplies these two together
>>> f.add_combo_regressors('t','monthcos') # multiplies these two together
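A minimal sketch of the idea, using plain lists in place of the object's stored regressors. `combine_regressors` is a hypothetical helper, and joining the input names with `sep` to form the new name is an assumption based on the parameter description above.

```python
def combine_regressors(xreg, *names, sep="_"):
    """Multiply the named series element-wise; return (new_name, values)."""
    product = [1.0] * len(xreg[names[0]])
    for name in names:
        product = [p * v for p, v in zip(product, xreg[name])]
    # assumption: the new variable name is the input names joined by sep
    return sep.join(names), product

xreg = {"t": [1, 2, 3], "monthsin": [0.5, 0.0, -0.5]}
name, values = combine_regressors(xreg, "t", "monthsin")
print(name, values)  # t_monthsin [0.5, 0.0, -1.5]
```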
- add_covid19_regressor(called='COVID19', start=datetime.datetime(2020, 3, 15, 0, 0), end=datetime.datetime(2021, 5, 13, 0, 0))
adds dummy variable that is 1 during the time period that covid19 effects are present for the series, 0 otherwise. this function may be out of date as the pandemic has lasted longer than most expected, but we are keeping it for now.
- Parameters
called (str) – default ‘COVID19’. what to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – default datetime.datetime(2020,3,15). the start date (default is day Walt Disney World closed in the U.S.). use format ‘%Y-%m-%d’ when passing strings.
end (str, datetime.datetime, or pd.Timestamp) – default datetime.datetime(2021,5,13). the end date (default is day the U.S. CDC dropped mask mandate/recommendation for vaccinated people). use format ‘%Y-%m-%d’ when passing strings.
- Returns
None
- add_diffed_terms(*args, diff=1, sep='_', drop=False)
differences all passed variables (no AR terms) up to 2 times.
- Parameters
*args (str) – names of Xvars that already exist in the object.
diff (int) – one of {1,2}, default 1. the number of times to difference each variable passed to args.
sep (str) – default ‘_’. the separator between each term in arg to create the final variable name. resulting variable names will be like “tdiff_1” or “tdiff_2” by default.
drop (bool) – default False. whether to drop the regressors passed to *args.
- Returns
None
>>> f.add_diffed_terms('t') # adds first difference of t as regressor
- add_exp_terms(*args, pwr, sep='^', cutoff=2, drop=False)
raises all passed variables (no AR terms) to exponential powers (ints or floats).
- Parameters
*args (str) – names of Xvars that already exist in the object.
pwr (float) – the power to raise each term to in args. can use values like 0.5 to perform square roots, etc.
sep (str) – default ‘^’. the separator between each term in arg to create the final variable name.
cutoff (int) – default 2. the resulting variable name will be rounded to this number based on the passed pwr. for instance, if pwr = 0.33333333333 and ‘t’ is passed as an arg to *args, the resulting name will be t^0.33 by default.
drop (bool) – default False. whether to drop the regressors passed to *args.
- Returns
None
>>> f.add_exp_terms('t',pwr=.5) # adds square root t
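The naming rule described for `cutoff` can be illustrated with a small stand-alone sketch. `exp_term` is a hypothetical helper working on a plain list, not the library's code.

```python
def exp_term(values, name, pwr, sep="^", cutoff=2):
    """Raise each value to pwr; the new name rounds pwr to `cutoff` decimals."""
    new_name = f"{name}{sep}{round(pwr, cutoff)}"
    return new_name, [v ** pwr for v in values]

print(exp_term([1, 4, 9], "t", 0.5))               # ('t^0.5', [1.0, 2.0, 3.0])
print(exp_term([1, 8, 27], "t", 0.33333333333)[0])  # t^0.33
```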
- add_lagged_terms(*args, lags=1, upto=True, sep='_')
lags all passed variables (no AR terms) 1 or more times.
- Parameters
*args (str) – names of Xvars that already exist in the object.
lags (int) – greater than 0, default 1. the number of times to lag each passed variable.
upto (bool) – default True. whether to add all lags up to the number passed to lags. if you pass 6 to lags and upto is True, lags 1, 2, 3, 4, 5, 6 will all be added. if you pass 6 to lags and upto is False, lag 6 only will be added.
sep (str) – default ‘_’. the separator between each term in arg to create the final variable name. resulting variable names will be like “tlag_1” or “tlag_2” by default.
- Returns
None
>>> f.add_lagged_terms('t',lags=3) # adds first, second, and third lag of t
>>> f.add_lagged_terms('t',lags=6,upto=False) # adds 6th lag of t only
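The effect of `upto` can be sketched as below. Both helpers are hypothetical, and padding the start of a lagged series with `None` is an assumption made for the illustration.

```python
def lag_numbers(lags=1, upto=True):
    """Which lags get added: 1..lags when upto is True, else only `lags`."""
    return list(range(1, lags + 1)) if upto else [lags]

def lag_series(values, lag):
    """Shift values back by `lag` periods (0 < lag < len(values))."""
    return [None] * lag + list(values[:-lag])

print(lag_numbers(3))              # [1, 2, 3]
print(lag_numbers(6, upto=False))  # [6]
print(lag_series([1, 2, 3, 4], 1)) # [None, 1, 2, 3]
```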
- add_logged_terms(*args, base=2.718281828459045, sep='', drop=False)
logs all passed variables (no AR terms).
- Parameters
*args (str) – names of Xvars that already exist in the object.
base (float) – default math.e. the log base. must be math.e or int greater than 1.
sep (str) – default ‘’. the separator between each term in arg to create the final variable name. resulting variable names will be like “log2t” or “lnt” by default
drop (bool) – default False. whether to drop the regressors passed to *args.
- Returns
None
>>> f.add_logged_terms('t') # adds natural log t
- add_other_regressor(called, start, end)
adds dummy variable that is 1 during the specified time period, 0 otherwise.
- Parameters
called (str) – what to call the resulting variable.
start (str, datetime.datetime, or pd.Timestamp) – start date. use format ‘%Y-%m-%d’ when passing strings.
end (str, datetime.datetime, or pd.Timestamp) – end date. use format ‘%Y-%m-%d’ when passing strings.
- Returns
None
>>> f.add_other_regressor('january_2021','2021-01-01','2021-01-31')
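A sketch of what such a dummy variable looks like, using only the standard library. `period_dummy` is a hypothetical helper, and treating both `start` and `end` as inclusive is an assumption.

```python
from datetime import date, timedelta

def period_dummy(dates, start, end):
    """Return 1 for dates within [start, end] (assumed inclusive), else 0."""
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    return [1 if start_d <= d <= end_d else 0 for d in dates]

# Dec 30, Dec 31, Jan 1, Jan 2
dates = [date(2020, 12, 30) + timedelta(days=i) for i in range(4)]
print(period_dummy(dates, "2021-01-01", "2021-01-31"))  # [0, 0, 1, 1]
```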
- add_poly_terms(*args, pwr=2, sep='^')
raises all passed variables (no AR terms) to exponential powers (ints only).
- Parameters
*args (str) – names of Xvars that already exist in the object.
pwr (int) – default 2. the max power to add to each term in args (2 to this number will be added).
sep (str) – default ‘^’. the separator between each term in arg to create the final variable name.
- Returns
None
>>> f.add_poly_terms('t','year',pwr=3) # raises t and year to 2nd and 3rd powers
- add_pt_terms(*args, method='box-cox', sep='_', drop=False)
applies a box-cox or yeo-johnson power transformation to all passed variables (no AR terms).
- Parameters
*args (str) – names of Xvars that already exist in the object.
method (str) – one of {‘box-cox’,’yeo-johnson’}, default ‘box-cox’. the type of transformation. box-cox works for positive values only. yeo-johnson is like a box-cox but can be used with 0s or negatives. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html.
sep (str) – default ‘_’. the separator between each term in arg to create the final variable name. resulting variable names will be like “box-cox_t” or “yeo-johnson_t” by default.
drop (bool) – default False. whether to drop the regressors passed to *args.
- Returns
None
>>> f.add_pt_terms('t') # adds box cox of t
- add_seasonal_regressors(*args, raw=True, sincos=False, dummy=False, drop_first=False)
adds seasonal regressors.
- Parameters
*args – each of str type. values that return a series of int type from pandas.dt and pandas.dt.isocalendar(). see https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.year.html.
raw (bool) – default True. whether to use the raw integer values
sincos (bool) – default False. whether to use a sin/cos transformation of the raw integer values (estimates the cycle based on the max observed value)
dummy (bool) – default False. whether to use dummy variables from the raw int values
drop_first (bool) – default False. whether to drop the first observed dummy level. not relevant when dummy = False
- Returns
None
>>> f.add_seasonal_regressors('year')
>>> f.add_seasonal_regressors('month','week','quarter',raw=False,sincos=True)
>>> f.add_seasonal_regressors('dayofweek',raw=False,dummy=True,drop_first=True)
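The sin/cos option can be illustrated with a stand-alone sketch. Taking the cycle length as the max observed value follows the parameter description above, but the exact formula the library uses is an assumption, and `sincos_terms` is a hypothetical helper.

```python
import math

def sincos_terms(raw):
    """Map raw integer seasons onto a circle so the cycle's ends are adjacent."""
    cycle = max(raw)  # assumption: cycle length estimated from the max observed value
    sin_vals = [math.sin(2 * math.pi * v / cycle) for v in raw]
    cos_vals = [math.cos(2 * math.pi * v / cycle) for v in raw]
    return sin_vals, cos_vals

months = list(range(1, 13))
sin_vals, cos_vals = sincos_terms(months)
```

With this encoding, December and January end up close together, which raw integers 12 and 1 do not convey.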
- add_time_trend(called='t')
adds a time trend from 1 to len(current_dates) + len(future_dates) in current_xreg and future_xreg.
- Parameters
called (str) – default ‘t’. what to call the resulting variable
- Returns
None
>>> f.add_time_trend()
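The split of the trend between current and future regressors can be sketched as follows. `time_trend` is a hypothetical helper, not the library's implementation.

```python
def time_trend(n_current, n_future):
    """Return (current, future) halves of a trend running 1..n_current+n_future."""
    t = list(range(1, n_current + n_future + 1))
    return t[:n_current], t[n_current:]

print(time_trend(3, 2))  # ([1, 2, 3], [4, 5])
```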
- adf_test(critical_pval=0.05, quiet=True, full_res=False, train_only=False, **kwargs) → Union[tuple, bool]
tests the stationarity of the y series using augmented dickey fuller.
- Parameters
critical_pval (float) – default 0.05. the p-value threshold in the statistical test to accept the alternative hypothesis.
quiet (bool) – default True. if False, prints whether the test suggests stationary or non-stationary data.
full_res (bool) – default False. if True, returns a tuple with the pvalue, evaluated statistic, and other statistical information (returns what the adfuller() function from statsmodels does). if False, returns a bool that matches whether the test indicates stationarity.
train_only (bool) – default False. if True, will exclude the test set from the test (to avoid leakage).
**kwargs – passed to adfuller() function from statsmodels.
- Returns
if full_res = False, a bool indicating whether the test suggests stationarity. otherwise, the full results (stat, pval, etc.) of the test.
- Return type
(bool or tuple)
>>> stat, pval, _, _, _, _ = f.adf_test(full_res=True)
- all_feature_info_to_excel(out_path='./', excel_name='feature_info.xlsx')
saves all feature importance and summary stats to excel. each model for which such info is available gets its own tab. be sure to have called save_summary_stats() and/or save_feature_importance() before using this function.
- Parameters
out_path (str) – default ‘./’ the path to export to
excel_name (str) – default ‘feature_info.xlsx’ the name of the resulting excel file
- Returns
None
- all_validation_grids_to_excel(out_path='./', excel_name='validation_grids.xlsx', sort_by_metric_value=False, ascending=True)
saves all validation grids to excel. each model for which such info is available gets its own tab. be sure to have tuned at least one model before calling this.
- Parameters
out_path (str) – default ‘./’. the path to export to.
excel_name (str) – default ‘validation_grids.xlsx’. the name of the resulting excel file.
sort_by_metric_value (bool) – default False. whether to sort the output by performance on validation set
ascending (bool) – default True. whether to sort least-to-greatest. ignored if sort_by_metric_value is False.
- Returns
None
- auto_forecast(call_me=None, dynamic_testing=True)
auto forecasts with the best parameters indicated from the tuning process.
- Parameters
call_me (str) – optional. what to call the model when storing it in the object’s history dictionary. if not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). duplicated names will be overwritten with the most recently called model.
dynamic_testing (bool) – default True. whether to dynamically test the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a weaker indication of how well the forecast will perform x periods out. when False, test-set metrics effectively become an average of one-step forecasts.
- Returns
None
>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
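The difference between dynamic and non-dynamic testing can be seen with a toy AR(1) model whose prediction is `phi` times the previous value. Everything here (the model, `phi`, and both helper functions) is hypothetical and only illustrates the concept.

```python
def one_step_preds(last_train_value, actuals, phi):
    """Non-dynamic: each prediction uses the true previous value."""
    previous = [last_train_value] + actuals[:-1]
    return [phi * p for p in previous]

def dynamic_preds(last_train_value, horizon, phi):
    """Dynamic: each prediction feeds on the previous prediction."""
    preds, prev = [], last_train_value
    for _ in range(horizon):
        prev = phi * prev
        preds.append(prev)
    return preds

actuals = [10.0, 12.0, 11.0]  # test-set actuals
print(one_step_preds(8.0, actuals, 0.5))     # [4.0, 5.0, 6.0]
print(dynamic_preds(8.0, len(actuals), 0.5)) # [4.0, 2.0, 1.0]
```

Only the dynamic version reflects how errors compound over a multi-step forecast horizon, which is why it gives a better indication of out-of-sample performance.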
- diff(i=1)
differences the y attribute, as well as all AR values stored in current_xreg and future_xreg.
- Parameters
i (int) – default 1. the number of differences to take. must be 1 or 2.
- Returns
None
>>> f.diff(2) # differences y twice
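Differencing itself can be sketched in a few lines. `difference` is a hypothetical helper on a plain list; dropping the leading observation on each pass is a simplification and may not match how the object stores the result.

```python
def difference(y, i=1):
    """Take the i-th difference of y; each pass shortens the series by one."""
    for _ in range(i):
        y = [b - a for a, b in zip(y, y[1:])]
    return y

print(difference([1, 4, 9, 16]))       # [3, 5, 7]
print(difference([1, 4, 9, 16], i=2))  # [2, 2]
```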
- drop_Xvars(*args)
drops regressors.
- Parameters
*args (str) – the names of regressors to drop.
- Returns
None
>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_Xvars('t','t^0.5')
- drop_regressors(*args)
drops regressors.
- Parameters
*args (str) – the names of regressors to drop.
- Returns
None
>>> f.add_time_trend()
>>> f.add_exp_terms('t',pwr=.5)
>>> f.drop_regressors('t','t^0.5')
- export(dfs=['all_fcsts', 'model_summaries', 'best_fcst', 'test_set_predictions', 'lvl_test_set_predictions', 'lvl_fcsts'], models='all', best_model='auto', determine_best_by='TestSetRMSE', to_excel=False, out_path='./', excel_name='results.xlsx') → Union[Dict[str, pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]
exports one or more of 6 pandas dataframes; can write to excel with each dataframe on a separate sheet. will return either a dictionary with dataframes as values (df str arguments as keys) or a single dataframe if only one df is specified.
- Parameters
dfs (list-like or str) – default [‘all_fcsts’,’model_summaries’,’best_fcst’,’test_set_predictions’,’lvl_test_set_predictions’,’lvl_fcsts’]. a list or name of the specific dataframe(s) you want returned and/or written to excel. must be one or more of the defaults.
models (list-like or str) – default ‘all’. the models to write information for. can start with “top_” and the metric specified in determine_best_by will be used to order the models appropriately.
best_model (str) – default ‘auto’. the name of the best model, if “auto”, will determine this by the metric in determine_best_by. if not “auto”, must match a model nickname of an already-evaluated model.
determine_best_by (str) – one of _determine_best_by_, default ‘TestSetRMSE’.
to_excel (bool) – default False. whether to save to excel.
out_path (str) – default ‘./’. the path to save the excel file to (ignored when to_excel=False).
excel_name (str) – default ‘results.xlsx’. the name to call the excel file (ignored when to_excel=False).
- Returns
either a single pandas dataframe if one element passed to dfs or a dictionary where the keys match what was passed to dfs and the values are dataframes.
- Return type
(DataFrame or Dict[str,DataFrame])
>>> f.export(dfs=['model_summaries','lvl_fcsts'],to_excel=True)
- export_Xvars_df(dropna=False)
gets all utilized regressors and values.
- Parameters
dropna (bool) – default False. whether to drop null values from the resulting dataframe
- Returns
A dataframe of Xvars and names/values stored in the object.
- Return type
(DataFrame)
- export_feature_importance(model) → pandas.core.frame.DataFrame
exports the feature importance from a model. raises an error if you never saved the model’s feature importance.
- Parameters
model (str) – the name of the model to export for. matches what was passed to call_me when calling the forecast (default is estimator name)
- Returns
The resulting feature importances of the evaluated model passed to model arg.
- Return type
(DataFrame)
>>> fi = f.export_feature_importance('mlr')
- export_fitted_vals(model)
exports a single dataframe with dates, fitted values, actuals, and residuals.
- Parameters
model (str) – the model nickname (must exist in history.keys()).
- Returns
A dataframe with dates, fitted values, actuals, and residuals.
- Return type
(DataFrame)
- export_forecasts_with_cis(model)
exports a single dataframe with forecasts and upper and lower forecast bounds.
- Parameters
model (str) – the model nickname (must exist in history.keys()).
- Returns
A dataframe with forecasts to future dates and corresponding confidence intervals.
- Return type
(DataFrame)
- export_summary_stats(model) → pandas.core.frame.DataFrame
exports the summary stats from a model. raises an error if you never saved the model’s summary stats.
- Parameters
model (str) – the name of the model to export for. matches what was passed to call_me when calling the forecast (default is estimator name)
- Returns
The resulting summary stats of the evaluated model passed to model arg.
- Return type
(DataFrame)
>>> ss = f.export_summary_stats('arima')
- export_test_set_preds_with_cis(model)
exports a single dataframe with test-set predictions, actuals, and upper and lower prediction bounds.
- Parameters
model (str) – the model nickname (must exist in history.keys()).
- Returns
A dataframe with test-set predictions and actuals with corresponding confidence intervals.
- Return type
(DataFrame)
- export_validation_grid(model) → pandas.core.frame.DataFrame
exports the validation grid from a model. raises an error if you never tuned the model.
- Parameters
model (str) – the name of the model to export for. matches what was passed to call_me when calling the forecast (default is estimator name)
- Returns
The resulting validation grid of the evaluated model passed to model arg.
- Return type
(DataFrame)
- fillna_y(how='ffill')
fills null values in the y attribute.
- Parameters
how (str) – one of {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, ‘midpoint’}. midpoint is unique to this library and only works if there are no more than two consecutive missing values. all other possible arguments come from the pandas.DataFrame.fillna() method and behave the same way.
- Returns
None
- generate_future_dates(n)
generates a certain amount of future dates in same frequency as current_dates.
- Parameters
n (int) – greater than 0. number of future dates to produce. this will also be the forecast length.
- Returns
None
>>> f.generate_future_dates(12) # 12 future dates to forecast out to
- get_freq() → str
gets the pandas inferred date frequency
- Returns
The inferred frequency of the current_dates array.
- Return type
(str)
>>> f.get_freq()
- get_funcs(which) → list
returns a group of functions based on what’s passed to which
- Parameters
which (str) – one of {‘adder’,’exporter’,’setter’,’plotter’,’getter’}
- Returns
the names of the relevant functions
- Return type
(list)
>>> f.get_funcs('adder')
- get_regressor_names() → list
gets the regressor names stored in the object.
- Parameters
None –
- Returns
Regressor names that have been added to the object.
- Return type
(list)
>>> f.add_time_trend()
>>> f.get_regressor_names()
- infer_freq()
uses pandas library to infer frequency of loaded dates.
- ingest_Xvars_df(df, date_col='Date', drop_first=False, use_future_dates=False)
ingests a dataframe of regressors and saves its contents to the Forecaster object. must specify a date column. all non-numeric values will be dummied. any columns in the dataframe that begin with “AR” will be confused with autoregressive terms and could cause errors.
- Parameters
df (DataFrame) – the dataframe that is at least the length of len(current_dates) + len(future_dates)
date_col (str) – default ‘Date’. the name of the date column in the dataframe. this column must have the same frequency as the dates in current_dates.
drop_first (bool) – default False. whether to drop the first observation of any dummied variables. irrelevant if passing all numeric values.
use_future_dates (bool) – default False. whether to use the future dates in the dataframe as the future_dates attribute in the object.
- Returns
None
- ingest_grid(grid)
ingests a grid to tune the estimator.
- Parameters
grid (dict or str) – if dict, must be a user-created grid. if str, must match the name of a dict grid stored in Grids.py.
- Returns
None
>>> f.set_estimator('mlr')
>>> f.ingest_grid({'normalizer':['scale','minmax']})
- integrate(critical_pval=0.05, train_only=False, max_integration=2)
differences the series 0, 1, or 2 times based on ADF test results.
- Parameters
critical_pval (float) – default 0.05. the p-value threshold in the statistical test to accept the alternative hypothesis.
train_only (bool) – default False. if True, will exclude the test set from the ADF test (to avoid leakage).
max_integration (int) – one of {1,2}, default 2. if 1, will only difference data up to one time even if the results of the test indicate two integrations. if 2, behaves how you would expect.
- Returns
None
>>> f.integrate(max_integration=1) # differences y only once if it is not stationary
>>> f.integrate() # differences y up to twice: once if it is not stationary, and again if its first difference is not stationary
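The control flow described above can be sketched with a pluggable stationarity check. This is not the library's code: `integrate_sketch` is a hypothetical helper, and the `range_is_tiny` check is a toy stand-in used only so the example runs without statsmodels (it is not an ADF test).

```python
def integrate_sketch(y, is_stationary, max_integration=2):
    """Difference y until is_stationary(y) holds or max_integration is reached."""
    n_diffs = 0
    while n_diffs < max_integration and not is_stationary(y):
        y = [b - a for a, b in zip(y, y[1:])]
        n_diffs += 1
    return y, n_diffs

# toy stand-in for a real stationarity test such as ADF
def range_is_tiny(series):
    return max(series) - min(series) < 1e-9

y = [1, 4, 9, 16, 25]
print(integrate_sketch(y, range_is_tiny))                     # ([2, 2, 2], 2)
print(integrate_sketch(y, range_is_tiny, max_integration=1))  # ([3, 5, 7, 9], 1)
```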
- keep_smaller_history(n)
cuts the amount of y observations in the object.
- Parameters
n (int, str, or datetime.datetime) – if int, the number of most recent observations to keep. otherwise, the earliest date to keep (observations on or after it are retained). if str, must be ‘%Y-%m-%d’ format.
- Returns
None
>>> f.keep_smaller_history(500) # keeps last 500 observations
>>> f.keep_smaller_history('2020-01-01') # keeps only observations on or later than 1/1/2020
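Both calling conventions can be sketched on plain lists. `trim_history` is a hypothetical helper that assumes dates are sorted in ascending order and are `datetime.date` objects; it is not the library's implementation.

```python
from datetime import date, timedelta

def trim_history(dates, y, n):
    """Keep the last n observations (int) or those on/after a cutoff date."""
    if isinstance(n, int):
        keep = n
    else:
        cutoff = date.fromisoformat(n) if isinstance(n, str) else n
        keep = sum(1 for d in dates if d >= cutoff)
    start = max(len(dates) - keep, 0)
    return dates[start:], y[start:]

# Dec 30 2019 through Jan 3 2020
dates = [date(2019, 12, 30) + timedelta(days=i) for i in range(5)]
y = [1, 2, 3, 4, 5]
print(trim_history(dates, y, 2)[1])             # [4, 5]
print(trim_history(dates, y, "2020-01-01")[1])  # [3, 4, 5]
```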
- limit_grid_size(n, random_seed=None)
makes a grid smaller randomly.
- Parameters
n (int or float) – if int, randomly selects that many parameter combinations. if float, must be less than 1 and greater than 0, randomly selects that percentage of parameter combinations.
random_seed (int) – optional. set a seed to make results consistent.
- Returns
None
>>> from scalecast import GridGenerator
>>> GridGenerator.get_example_grids()
>>> f.set_estimator('mlp')
>>> f.ingest_grid('mlp')
>>> f.limit_grid_size(10,random_seed=20) # limits grid to 10 iterations
>>> f.limit_grid_size(.5,random_seed=20) # limits grid to half its original size
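Conceptually, limiting a grid means sampling from the full set of parameter combinations. The sketch below uses only the standard library; `limit_grid` is a hypothetical helper, the example grid is made up, and truncating the float case with `int()` is an assumption about rounding.

```python
import itertools
import random

def limit_grid(grid, n, random_seed=None):
    """Randomly keep n combinations (int) or that share of them (float in (0, 1))."""
    combos = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
    if isinstance(n, float):
        n = int(len(combos) * n)  # assumption: fractional sizes are truncated
    rng = random.Random(random_seed)
    return rng.sample(combos, min(n, len(combos)))

grid = {"alpha": [0.1, 0.5, 1.0, 2.0], "fit_intercept": [True, False]}  # 8 combos
print(len(limit_grid(grid, 3, random_seed=20)))    # 3
print(len(limit_grid(grid, 0.5, random_seed=20)))  # 4
```

Passing the same `random_seed` makes the selection reproducible across runs.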
- manual_forecast(call_me=None, dynamic_testing=True, **kwargs)
manually forecasts with the hyperparameters, Xvars, and normalizer selection passed as keywords.
- Parameters
call_me (str) – optional. what to call the model when storing it in the object’s history dictionary. if not specified, the model’s nickname will be assigned the estimator value (‘mlp’ will be ‘mlp’, etc.). duplicated names will be overwritten with the most recently called model.
dynamic_testing (bool) – default True. whether to dynamically test the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a weaker indication of how well the forecast will perform x periods out. when False, test-set metrics effectively become an average of one-step forecasts.
**kwargs – passed to the _forecast_{estimator}() method and can include such parameters as Xvars, normalizer, cap, and floor, in addition to any given model’s specific hyperparameters. for sklearn models, can include normalizer and Xvars. for ARIMA, Prophet and Silverkite models, can include Xvars but not normalizer. LSTM and RNN models have their own sets of possible keywords. see https://scalecast.readthedocs.io/en/latest/Forecaster/_forecast.html.
- Returns
None
>>> f.set_estimator('mlr')
>>> f.manual_forecast(normalizer='pt')
- order_fcsts(models, determine_best_by='TestSetRMSE') → list
gets estimated forecasts ordered from best-to-worst.
- Parameters
models (list-like) – each element must match an evaluated model’s nickname (which is the same as its estimator name by default).
determine_best_by (str) – default ‘TestSetRMSE’. one of _determine_best_by_.
- Returns
The ordered models.
- Return type
(list)
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> ordered_models = f.order_fcsts(models,"LevelTestSetMAPE")
- plot(models='all', order_by=None, level=False, print_attr=[], ci=False)
plots all forecasts with the actuals, or just actuals if no forecasts have been evaluated or are selected.
- Parameters
models (list-like, str, or None) – default ‘all’. the forecasted models to plot. can start with “top_” and the metric specified in order_by will be used to order the models appropriately. if None or models/order_by combo invalid, will plot only actual values.
order_by (str) – one of _determine_best_by_. optional.
level (bool) – default False. if True, will always plot level forecasts. if False, will plot the forecasts at whatever level they were called on. if False and there are a mix of models passed with different integrations, will default to True.
print_attr (list-like) – default []. attributes from history dict to print to console. if the attribute doesn’t exist for a passed model, will not raise error, will just skip that element.
ci (bool) – default False. whether to display the confidence intervals. change defaults by calling set_cilevel() and set_bootstrapped_samples() before forecasting. ignored when level = True.
- Returns
the created figure
- Return type
(Figure)
>>> import matplotlib.pyplot as plt
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot(order_by='LevelTestSetMAPE') # plots all forecasts
>>> plt.show()
- plot_acf(diffy=False, train_only=False, **kwargs)
plots an autocorrelation function of the y values.
- Parameters
diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.
train_only (bool) – default False. if True, will exclude the test set from the test (a measure added to avoid leakage).
**kwargs – passed to plot_acf() function from statsmodels.
- Returns
If ax is None, the created figure. Otherwise the figure to which ax is connected.
- Return type
(Figure)
>>> import matplotlib.pyplot as plt
>>> f.plot_acf(train_only=True)
>>> plt.show()
- plot_fitted(models='all', order_by=None)
plots all fitted values with the actuals.
- Parameters
models (list-like or str) – default ‘all’. the forecasted models to plot. can start with “top_” and the metric specified in order_by will be used to order the models appropriately.
order_by (str) – one of _determine_best_by_, default None.
- Returns
the created figure
- Return type
(Figure)
>>> import matplotlib.pyplot as plt
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot_fitted(order_by='LevelTestSetMAPE') # plots all fitted values
>>> plt.show()
- plot_pacf(diffy=False, train_only=False, **kwargs)
plots a partial autocorrelation function of the y values
- Parameters
diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.
train_only (bool) – default False. if True, will exclude the test set from the test (a measure added to avoid leakage).
**kwargs – passed to plot_pacf() function from statsmodels.
- Returns
If ax is None, the created figure. Otherwise the figure to which ax is connected.
- Return type
(Figure)
>>> import matplotlib.pyplot as plt
>>> f.plot_pacf(train_only=True)
>>> plt.show()
- plot_periodogram(diffy=False, train_only=False)
plots a periodogram of the y values (comes from scipy.signal).
- Parameters
diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.
train_only (bool) – default False. if True, excludes the test set from the calculation (a measure added to avoid leakage).
- Returns
Element 1: Array of sample frequencies. Element 2: Power spectral density or power spectrum of x.
- Return type
(ndarray,ndarray)
>>> import matplotlib.pyplot as plt
>>> a, b = f.plot_periodogram(diffy=True,train_only=True)
>>> plt.semilogy(a, b)
>>> plt.show()
- plot_test_set(models='all', order_by=None, include_train=True, level=False, ci=False)
plots all test-set predictions with the actuals.
- Parameters
models (list-like or str) – default ‘all’. the forecasted models to plot. can start with “top_” and the metric specified in order_by will be used to order the models appropriately.
order_by (str) – one of _determine_best_by_, optional.
include_train (bool or int) – default True. controls how much of the y history is plotted with the test results. if True, plots the test results with the entire history in y. if False, plots only the portion of y that matches the test results. if int, plots that length of y history alongside the test results.
level (bool) – default False. if True, will always plot level forecasts. if False, will plot the forecasts at whatever level they were called on. if False and there is a mix of models passed with different integrations, will default to True.
ci (bool) – default False. whether to display the confidence intervals. the default is 100 bootstrapped samples and a 95% confidence interval. change the defaults by calling set_cilevel() and set_bootstrap_samples() before forecasting. ignored when level = False.
- Returns
the created figure
- Return type
(Figure)
>>> import matplotlib.pyplot as plt
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.plot_test_set(order_by='LevelTestSetMAPE') # plots all test-set results
>>> plt.show()
- pop(*args)
deletes evaluated forecasts from the object’s memory.
- Parameters
*args (str) – names of models matching what was passed to call_me. by default, call_me for a given model is the same as the estimator name.
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.pop('mlr')
- pop_using_criterion(metric, evaluated_as, threshold, delete_all=True)
deletes all forecasts from history that meet a given criterion.
- Parameters
metric (str) – one of _determine_best_by_ + [‘AnyPrediction’,‘AnyLevelPrediction’].
evaluated_as (str) – one of {“<”,“<=”,“>”,“>=”,“==”}.
threshold (float) – the value that the metric is compared against using the evaluated_as operator.
delete_all (bool) – default True. if the passed criterion deletes all forecasts, whether to actually delete all forecasts. if False and all forecasts meet criterion, will keep them all.
- Returns
None
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
>>> f.pop_using_criterion('LevelTestSetMAPE','>',2)
>>> f.pop_using_criterion('AnyPrediction','<',0,delete_all=False)
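The interplay between evaluated_as, threshold, and delete_all can be sketched on a plain dictionary. The function `pop_by_criterion` and the `history` data below are hypothetical stand-ins, not the library's internals.

```python
import operator

# maps the evaluated_as strings listed above to comparison functions
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def pop_by_criterion(history, evaluated_as, threshold, delete_all=True):
    """Return a copy of history without the models whose metric meets the criterion.

    history maps model name -> metric value (hypothetical structure).
    """
    compare = OPS[evaluated_as]
    doomed = [name for name, value in history.items() if compare(value, threshold)]
    if len(doomed) == len(history) and not delete_all:
        # every model matched, but the caller asked to keep them in that case
        return dict(history)
    return {name: value for name, value in history.items() if name not in doomed}


history = {"mlr": 1.5, "mlp": 2.5, "lightgbm": 3.0}  # made-up metric values
kept = pop_by_criterion(history, ">", 2)
kept_all = pop_by_criterion(history, ">", 1, delete_all=False)
```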
- reset()
drops all regressors and reverts object to original (level) state when initiated.
- save_feature_importance()
saves feature info for models that offer it and will not raise errors if not available. call after evaluating the model you want it for and before changing the estimator.
>>> f.set_estimator('mlr')
>>> f.manual_forecast()
>>> f.save_feature_importance()
- save_summary_stats()
saves summary stats for models that offer it and will not raise errors if not available. call after evaluating the model you want it for and before changing the estimator.
>>> f.set_estimator('arima')
>>> f.manual_forecast(order=(1,1,1))
>>> f.save_summary_stats()
- seasonal_decompose(diffy=False, train_only=False, **kwargs)
plots a signal/seasonal decomposition of the y values.
- Parameters
diffy (bool or int) – one of {True,False,0,1,2}. default False. whether to difference the data and how many times before passing the values to the function. if False or 0, does not difference. if True or 1, differences 1 time. if 2, differences 2 times.
train_only (bool) – default False. if True, excludes the test set from the calculation (a measure added to avoid leakage).
**kwargs – passed to seasonal_decompose() function from statsmodels.
- Returns
An object with seasonal, trend, and resid attributes.
- Return type
(DecomposeResult)
>>> import matplotlib.pyplot as plt
>>> f.seasonal_decompose(train_only=True).plot()
>>> plt.show()
- set_bootstrap_samples(n)
sets the number of bootstrap samples used to compute confidence intervals for each model (100 default).
- Parameters
n (int) – must be greater than or equal to 30, roughly the sample size commonly cited for the central limit theorem to apply. lower values run faster but yield less reliable intervals.
- Returns
None
>>> f.set_bootstrap_samples(1000) # next forecast will get confidence intervals with 1,000 bootstrap samples
- set_cilevel(n)
sets the level for the resulting confidence intervals (95% default).
- Parameters
n (float) – greater than 0 and less than 1.
- Returns
None
>>> f.set_cilevel(.80) # next forecast will get 80% confidence intervals
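To show how the number of bootstrap samples and the confidence level interact, here is a generic percentile bootstrap on the mean of some residuals. This is an illustrative technique only: the source does not describe the library's exact procedure, and `bootstrap_interval`, its arguments, and the residual values are assumptions.

```python
import random
import statistics


def bootstrap_interval(residuals, n_samples=100, cilevel=0.95, seed=0):
    """Percentile bootstrap interval for the mean residual (generic sketch)."""
    if n_samples < 30:
        raise ValueError("n_samples must be at least 30")
    if not 0 < cilevel < 1:
        raise ValueError("cilevel must be between 0 and 1")
    rng = random.Random(seed)
    # resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(residuals, k=len(residuals)))
        for _ in range(n_samples)
    )
    tail = (1 - cilevel) / 2
    lower = means[int(tail * (n_samples - 1))]
    upper = means[int((1 - tail) * (n_samples - 1))]
    return lower, upper


residuals = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 3.0]  # made-up values
interval_95 = bootstrap_interval(residuals, n_samples=1000, cilevel=0.95)
interval_80 = bootstrap_interval(residuals, n_samples=1000, cilevel=0.80)
```

With the same seed and sample count, the 80% interval always sits inside the 95% interval, which is the trade-off set_cilevel() controls.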
- set_estimator(estimator)
sets the estimator to forecast with.
- Parameters
estimator (str) – one of _estimators_
- Returns
None
>>> f.set_estimator('mlr')
- set_last_future_date(date)
generates future dates in the same frequency as current_dates, ending on a specified date.
- Parameters
date (datetime.datetime, pd.Timestamp, or str) – the date to end on. if str, must be in ‘%Y-%m-%d’ format.
- Returns
None
>>> f.set_last_future_date('2021-06-01') # creates future dates up to this one in the expected frequency
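The idea of extending dates at the existing frequency up to an end date can be sketched with the standard library. `future_dates_until` is a hypothetical helper that assumes a fixed spacing between observations (daily, weekly, and so on); calendar frequencies such as monthly would need more than a fixed timedelta.

```python
from datetime import date, datetime, timedelta


def future_dates_until(current_dates, last_date):
    """Extend current_dates at their fixed spacing up to and including last_date."""
    if isinstance(last_date, str):
        # strings are expected in '%Y-%m-%d' format, as described above
        last_date = datetime.strptime(last_date, "%Y-%m-%d").date()
    step = current_dates[-1] - current_dates[-2]  # assumes constant spacing
    if step <= timedelta(0):
        raise ValueError("current_dates must be increasing")
    future = []
    nxt = current_dates[-1] + step
    while nxt <= last_date:
        future.append(nxt)
        nxt += step
    return future


current_dates = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]
future_dates = future_dates_until(current_dates, "2021-01-06")
```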
- set_test_length(n=1)
sets the length of the test set.
- Parameters
n (int) – default 1. the length of the resulting test set.
- Returns
None
>>> f.set_test_length(12) # test set of 12
- set_validation_length(n=1)
sets the length of the validation set.
- Parameters
n (int) – default 1. the length of the resulting validation set.
- Returns
None
>>> f.set_validation_length(6) # validation length of 6
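The test and validation lengths together determine how the series is partitioned. The sketch below assumes the test set is the last observations of y and the validation set sits directly before it; `split_series` is a hypothetical helper, and that layout is an assumption rather than something stated in this section.

```python
def split_series(y, test_length=1, validation_length=1):
    """Split y into (train, validation, test) slices, oldest observations first."""
    train_end = len(y) - test_length - validation_length
    if train_end <= 0:
        raise ValueError("not enough observations left for training")
    validation_end = len(y) - test_length
    return y[:train_end], y[train_end:validation_end], y[validation_end:]


y = list(range(1, 37))  # 36 made-up observations
train, validation, test = split_series(y, test_length=12, validation_length=6)
```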
- set_validation_metric(metric='rmse')
sets the metric that will be used to tune all subsequent models.
- Parameters
metric (str) – one of _metrics_, default ‘rmse’. the metric to optimize the models with using the validation set.
- Returns
None
>>> f.set_validation_metric('mae')
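The choice of validation metric can change which hyperparameter set wins. The sketch below uses the standard definitions of RMSE and MAE; the candidate names and values are made up, and `best_candidate` is a hypothetical helper rather than a library function.

```python
import math


def rmse(actuals, preds):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actuals, preds)) / len(actuals))


def mae(actuals, preds):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / len(actuals)


METRICS = {"rmse": rmse, "mae": mae}

validation_actuals = [2.0, 4.0, 6.0]
candidates = {
    "grid_a": [2.0, 4.0, 9.0],  # two perfect predictions, one large miss
    "grid_b": [3.5, 5.5, 7.5],  # consistently off by 1.5
}


def best_candidate(metric):
    """Return the candidate with the lowest score under the chosen metric."""
    score = METRICS[metric]
    return min(candidates, key=lambda name: score(validation_actuals, candidates[name]))
```

Here MAE prefers grid_a while RMSE prefers grid_b, because RMSE weights the single large miss more heavily.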
- tune(dynamic_tuning=False)
tunes the specified estimator using an ingested grid (ingests a grid from Grids.py with same name as the estimator by default). any parameters that can be passed as arguments to manual_forecast() can be tuned with this process.
- Parameters
dynamic_tuning (bool) – default False. whether to dynamically tune the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a less reliable indication of how well the forecast will perform x periods out. when False, metrics effectively become an average of one-step forecasts.
- Returns
None
>>> f.set_estimator('xgboost')
>>> f.tune()
>>> f.auto_forecast()
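The difference between dynamic and non-dynamic evaluation can be seen with a toy one-lag model. The function `ar1_forecasts`, the fixed coefficient, and the data are assumptions for illustration; real estimators are fitted, not hard-coded.

```python
def ar1_forecasts(history, actuals, coef, dynamic):
    """Forecast each period as coef * previous value.

    dynamic=True feeds each prediction back in as the next lag;
    dynamic=False uses the observed actual, giving a series of one-step forecasts.
    """
    preds = []
    last = history[-1]
    for actual in actuals:
        pred = coef * last
        preds.append(pred)
        last = pred if dynamic else actual
    return preds


history = [8.0]
actuals = [8.0, 8.0, 8.0]  # made-up holdout values
one_step = ar1_forecasts(history, actuals, coef=0.5, dynamic=False)
multi_step = ar1_forecasts(history, actuals, coef=0.5, dynamic=True)
```

In the dynamic case the error compounds as predictions feed on earlier predictions, which is why it gives a more realistic picture of multi-period performance.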
- tune_test_forecast(models, dynamic_tuning=False, dynamic_testing=True, summary_stats=False, feature_importance=False)
iterates through a list of models, tunes them using grids in Grids.py, forecasts them, and can save feature information.
- Parameters
models (list-like) – each element must be in _can_be_tuned_.
dynamic_tuning (bool) – default False. whether to dynamically tune the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a less reliable indication of how well the forecast will perform x periods out. when False, metrics effectively become an average of one-step forecasts.
dynamic_testing (bool) – default True. whether to dynamically test the forecast (meaning AR terms will be propagated with predicted values). setting this to False means faster performance, but gives a less reliable indication of how well the forecast will perform x periods out. when False, test-set metrics effectively become an average of one-step forecasts.
summary_stats (bool) – default False. whether to save summary stats for the models that offer those.
feature_importance (bool) – default False. whether to save permutation feature importance information for the models that offer those.
- Returns
None
>>> models = ('mlr','mlp','lightgbm')
>>> f.tune_test_forecast(models,dynamic_testing=False,feature_importance=True)
- typ_set()
converts all objects in y, current_dates, future_dates, current_xreg, and future_xreg to appropriate types if possible. automatically gets called when object is initiated.
>>> f.typ_set() # sets all arrays to the correct format
- undiff(suppress_error=False)
undifferences y to original level and drops all regressors (such as AR terms).
- Parameters
suppress_error (bool) – default False. whether to suppress an error that gets raised if the series was never differenced.
- Returns
None
>>> f.undiff()
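Undifferencing amounts to a cumulative sum that starts from the first original value. The helper `undifference` below is a hypothetical sketch of that arithmetic for a once-differenced series, not the library's implementation.

```python
from itertools import accumulate


def undifference(diffed, first_value):
    """Rebuild a level series from its first differences and its first original value."""
    return list(accumulate(diffed, initial=first_value))


diffed = [3, 5, 7, 9]  # first differences of [1, 4, 9, 16, 25]
level = undifference(diffed, first_value=1)
```

A twice-differenced series would need this applied twice, each time with the matching starting value.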
- validate_regressor_names()
validates that all regressor names exist in both current_xreg and future_xreg. raises an error if this is not the case.
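The check described here can be sketched as a comparison of dictionary keys. `check_regressor_names` is a hypothetical helper, and treating current_xreg and future_xreg as dictionaries keyed by regressor name, as well as the error type raised, are assumptions.

```python
def check_regressor_names(current_xreg, future_xreg):
    """Raise ValueError if a regressor name appears in only one of the two mappings."""
    mismatched = set(current_xreg) ^ set(future_xreg)  # symmetric difference
    if mismatched:
        raise ValueError(f"regressors not present in both: {sorted(mismatched)}")


current_xreg = {"t": [1, 2, 3], "AR1": [None, 1, 2]}  # made-up regressors
future_xreg = {"t": [4, 5], "AR1": [3, None]}
check_regressor_names(current_xreg, future_xreg)  # passes silently
```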