quickstart

Date: Nov 30, 2021 Version:

Getting Started

A minimal experiment definition supplies the path to the dataset, the name of the column that holds the time index, and the name of the column that holds the target to predict. In the minimal example below, an experiment is conducted using the retail sales data included with divina and the log link function, which is well suited to sales data.

{
  "experiment_definition": {
    "target": "Sales",
    "link_function": "log",
    "time_index": "Date",
    "data_path": "divina://retail_sales"
  }
}
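A definition like the one above is an ordinary JSON document, so it can be generated and checked from a script before it is handed to divina. The following is a minimal sketch in plain Python; `write_experiment_definition` and `REQUIRED_KEYS` are hypothetical helpers written for this illustration and are not part of divina's API.

```python
import json
import tempfile
from pathlib import Path

# The three fields the text above describes as the minimum.
# This tuple is an illustrative assumption, not a divina constant.
REQUIRED_KEYS = ("target", "time_index", "data_path")


def write_experiment_definition(definition: dict, path: Path) -> Path:
    """Check the minimal keys are present, then write the definition as JSON."""
    inner = definition["experiment_definition"]
    missing = [key for key in REQUIRED_KEYS if key not in inner]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    path.write_text(json.dumps(definition, indent=2))
    return path


minimal_definition = {
    "experiment_definition": {
        "target": "Sales",
        "link_function": "log",
        "time_index": "Date",
        "data_path": "divina://retail_sales",
    }
}

# Write to a temporary directory so the sketch has no side effects elsewhere.
output_path = Path(tempfile.mkdtemp()) / "experiment_definition.json"
write_experiment_definition(minimal_definition, output_path)
reloaded = json.loads(output_path.read_text())
```

The resulting file has the same shape as the definition shown above.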

Divina automatically uses the non-target data in the file to make in-sample predictions that are quite accurate. However, the forecast produced is for all stores in aggregate, while there are three distinct retail locations in the dataset.

Target Dimensions

Below we use the target_dimensions option to tell divina to aggregate and forecast each retail store in the dataset individually.

{
  "experiment_definition": {
    "target": "Sales",
    "link_function": "log",
    "target_dimensions": [
      "Store"
    ],
    "time_index": "Date",
    "data_path": "divina://retail_sales"
  }
}
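To make the difference between an aggregate series and a per-store series concrete, here is a small sketch in plain Python. It assumes the target is summed per time step, which may not match how divina aggregates internally. The rows are made-up values, and `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up rows for illustration only; not the bundled retail data.
rows = [
    {"Date": "2015-07-01", "Store": 1, "Sales": 100.0},
    {"Date": "2015-07-01", "Store": 2, "Sales": 250.0},
    {"Date": "2015-07-02", "Store": 1, "Sales": 120.0},
    {"Date": "2015-07-02", "Store": 2, "Sales": 230.0},
]


def aggregate(rows, time_index, target, target_dimensions=()):
    """Sum the target per time step, and per dimension value when given."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[name] for name in target_dimensions) + (row[time_index],)
        totals[key] += row[target]
    return dict(totals)


# One series for all stores combined.
overall = aggregate(rows, "Date", "Sales")
# One series per store, analogous to "target_dimensions": ["Store"].
per_store = aggregate(rows, "Date", "Sales", target_dimensions=("Store",))
```

Here `overall` holds one total per date, while `per_store` keeps a separate value for each (store, date) pair.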

We can see through the interpretability interface what information is influencing the forecasts and how.

Joining Datasets

An important part of forecasting, and a feature of divina, is the ability to work with additional datasets. The example definition below illustrates how to join the built-in time dataset to the retail dataset, allowing additional information to be used in the predictions.

{
  "experiment_definition": {
    "target": "Sales",
    "link_function": "log",
    "target_dimensions": [
      "Store"
    ],
    "time_index": "Date",
    "data_path": "divina://retail_sales",
    "joins": [
      {
        "data_path": "divina://time",
        "join_on": [
          "Date",
          "Date"
        ],
        "as": "time"
      }
    ]
  }
}

We can see through the interpretability interface that the new time information is now informing the forecasts. This is important because long-range forecasts require datasets whose information, or assumptions, extend through the forecast period.
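The idea behind such a join can be sketched in plain Python. This is an illustration only. It assumes the two entries of join_on name the key column on the left and right datasets respectively, it assumes the time dataset carries columns such as Weekday and Month, and it does not model the "as" alias. `build_time_rows` and `left_join` are hypothetical helpers, not divina functions.

```python
from datetime import date, timedelta


def build_time_rows(start: date, end: date):
    """Build one calendar row per day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                "Date": current.isoformat(),
                "Weekday": current.weekday(),  # Monday is 0
                "Month": current.month,
            }
        )
        current += timedelta(days=1)
    return rows


def left_join(left_rows, right_rows, left_key, right_key):
    """Attach the matching right-hand columns to every left-hand row."""
    lookup = {row[right_key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[left_key], {})
        extra = {name: value for name, value in match.items() if name != right_key}
        joined.append({**row, **extra})
    return joined


# Made-up sales rows for illustration only.
sales_rows = [
    {"Date": "2015-07-01", "Store": 1, "Sales": 100.0},
    {"Date": "2015-07-02", "Store": 1, "Sales": 120.0},
]
# The calendar extends past the last sales row, as forward-looking data must.
time_rows = build_time_rows(date(2015, 7, 1), date(2015, 7, 3))
joined_rows = left_join(sales_rows, time_rows, "Date", "Date")
```

Because calendar features can be computed for any future date, they remain available for the whole forecast period.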

Feature Engineering

Information encoding, binning and interaction terms are all powerful features of divina that bring its performance in line with that of tree-based models and neural networks. Here we narrow the information provided to the model to prevent overfitting and add those options to the experiment definition. You can see that the forecasts become more meaningful through the interpretability interface.

{
  "experiment_definition": {
    "target": "Sales",
    "link_function": "log",
    "target_dimensions": [
      "Store"
    ],
    "time_index": "Date",
    "data_path": "divina://retail_sales",
    "joins": [
      {
        "data_path": "divina://time",
        "join_on": [
          "Date",
          "Date"
        ],
        "as": "time"
      }
    ],
    "include_features": [
      "Store",
      "Weekday",
      "Month",
      "Holiday",
      "HolidayType",
      "StoreType",
      "Assortment",
      "LastDayOfMonth"
    ],
    "encode_features": [
      "Store",
      "Month",
      "StoreType",
      "Weekday",
      "HolidayType",
      "Assortment"
    ],
    "bin_features": {
      "Month": [
        3,
        6,
        9
      ]
    },
    "interaction_features": {
      "Store": [
        "Holiday"
      ]
    }
  }
}
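The three transformations named in this section can be illustrated on a single row. The sketch below is plain Python and rests on assumptions: encoding is taken to mean one-hot encoding, the bin_features values are treated as sorted bin edges with each edge belonging to the bin above it, and an interaction is taken to be a product of columns. The store identifiers 1, 2 and 3 are made up, and divina's actual behavior may differ.

```python
from bisect import bisect_right


def encode(name, value, categories):
    """One-hot encode value into one 0/1 column per known category."""
    return {f"{name}_{category}": int(value == category) for category in categories}


def bin_value(value, edges):
    """Return the index of the bin that value falls into, given sorted edges."""
    return bisect_right(edges, value)


def interact(columns, other_name, other_value):
    """Multiply each column by other_value to form interaction columns."""
    return {
        f"{name}_x_{other_name}": value * other_value
        for name, value in columns.items()
    }


# A made-up row: store 2, in July, on a holiday.
row = {"Store": 2, "Month": 7, "Holiday": 1}

store_columns = encode("Store", row["Store"], categories=[1, 2, 3])
month_bin = bin_value(row["Month"], edges=[3, 6, 9])
store_holiday = interact(store_columns, "Holiday", row["Holiday"])
```

Under these assumptions the interaction columns let the model learn a separate holiday effect for each store instead of a single shared one.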

Cross Validation

While visual inspection is a powerful tool for validating a model, programmatic and distributional validation is provided through the validation_splits option of divina.

{
  "experiment_definition": {
    "target": "Sales",
    "link_function": "log",
    "target_dimensions": [
      "Store"
    ],
    "time_index": "Date",
    "data_path": "divina://retail_sales",
    "include_features": [
      "Store",
      "Weekday",
      "Month",
      "WeekOfYear",
      "Holiday",
      "HolidayType",
      "StoreType",
      "Assortment",
      "LastDayOfMonth"
    ],
    "encode_features": [
      "Store",
      "Month",
      "StoreType",
      "Weekday",
      "HolidayType",
      "Assortment"
    ],
    "joins": [
      {
        "data_path": "divina://time",
        "join_on": [
          "Date",
          "Date"
        ],
        "as": "time"
      }
    ],
    "bin_features": {
      "Month": [
        3,
        6,
        9
      ]
    },
    "interaction_features": {
      "Store": [
        "Holiday"
      ]
    },
    "validation_splits": [
      "2015-07-18"
    ]
  }
}
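A time-based validation split can be sketched as follows. This plain-Python illustration assumes that rows up to and including the split date are used for training and later rows for validation; whether divina treats the split date as inclusive is not stated here. The data is made up, the "model" is a deliberately naive mean predictor, and both helpers are hypothetical.

```python
from datetime import date


def split_by_date(rows, time_index, split):
    """Rows up to and including the split date train; later rows validate."""
    split_date = date.fromisoformat(split)
    train = [r for r in rows if date.fromisoformat(r[time_index]) <= split_date]
    validate = [r for r in rows if date.fromisoformat(r[time_index]) > split_date]
    return train, validate


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up daily sales for illustration only.
rows = [
    {"Date": "2015-07-16", "Sales": 100.0},
    {"Date": "2015-07-17", "Sales": 110.0},
    {"Date": "2015-07-18", "Sales": 120.0},
    {"Date": "2015-07-19", "Sales": 130.0},
    {"Date": "2015-07-20", "Sales": 150.0},
]

train, validate = split_by_date(rows, "Date", "2015-07-18")

# Naive stand-in model: predict the training mean for every validation row.
train_mean = sum(r["Sales"] for r in train) / len(train)
actual = [r["Sales"] for r in validate]
predicted = [train_mean] * len(validate)
mae = mean_absolute_error(actual, predicted)
```

Splitting on time rather than at random keeps future observations out of the training data, which is what makes the validation error a fair estimate of forecast error.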

Simulation

A key feature of divina is the ability to easily simulate potential future values as information to feed the model. In our retail example, we simulate promotions as both occurring and not occurring every day, so that we have both scenarios to consider during the decision-making process.

{
  "experiment_definition": {
    "target": "Sales",
    "link_function": "log",
    "target_dimensions": [
      "Store"
    ],
    "time_index": "Date",
    "data_path": "divina://retail_sales",
    "include_features": [
      "Store",
      "Weekday",
      "Month",
      "Holiday",
      "HolidayType",
      "StoreType",
      "Assortment",
      "LastDayOfMonth",
      "Promo"
    ],
    "forecast_end": "01-01-2016",
    "frequency": "D",
    "encode_features": [
      "Store",
      "Month",
      "StoreType",
      "Weekday",
      "HolidayType",
      "Assortment"
    ],
    "joins": [
      {
        "data_path": "divina://time",
        "join_on": [
          "Date",
          "Date"
        ],
        "as": "time"
      }
    ],
    "bin_features": {
      "Month": [
        3,
        6,
        9
      ]
    },
    "interaction_features": {
      "Store": [
        "Holiday"
      ]
    },
    "validation_splits": [
      "2015-07-18"
    ],
    "scenarios": {
      "Promo": {
        "mode": "constant",
        "constant_values": [
          0,
          1
        ]
      },
      "StoreType": {
        "mode": "last"
      },
      "Assortment": {
        "mode": "last"
      }
    }
  }
}
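The scenarios block can be read as a recipe for building future rows. The sketch below is plain Python under stated assumptions: the "constant" mode produces one copy of the forecast period per listed value, and the "last" mode carries the last observed value forward. `simulate_future` is a hypothetical helper, the history rows are made up, and the forecast horizon is shortened to two days for brevity.

```python
from datetime import date, timedelta
from itertools import product


def simulate_future(history, time_index, forecast_end, scenarios):
    """Build daily future rows for every combination of constant values."""
    # ISO-formatted dates sort correctly as strings.
    last_row = max(history, key=lambda row: row[time_index])
    start = date.fromisoformat(last_row[time_index]) + timedelta(days=1)
    days = (forecast_end - start).days + 1
    future_dates = [(start + timedelta(days=i)).isoformat() for i in range(days)]

    # "last" mode: repeat the most recent observed value.
    carried = {
        name: last_row[name]
        for name, spec in scenarios.items()
        if spec["mode"] == "last"
    }
    # "constant" mode: one scenario per listed value.
    constant_names = [n for n, s in scenarios.items() if s["mode"] == "constant"]
    constant_choices = [scenarios[n]["constant_values"] for n in constant_names]

    simulated = []
    for combination in product(*constant_choices):
        fixed = dict(zip(constant_names, combination))
        for day in future_dates:
            simulated.append({time_index: day, **carried, **fixed})
    return simulated


# Made-up history for a single store.
history = [
    {"Date": "2015-07-30", "Promo": 1, "StoreType": "a", "Assortment": "c"},
    {"Date": "2015-07-31", "Promo": 0, "StoreType": "a", "Assortment": "c"},
]
scenarios = {
    "Promo": {"mode": "constant", "constant_values": [0, 1]},
    "StoreType": {"mode": "last"},
    "Assortment": {"mode": "last"},
}
simulated = simulate_future(history, "Date", date(2015, 8, 2), scenarios)
```

With two future days and two promotion values, the sketch yields four rows: one forecast period with promotions off and one with promotions on.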

Confidence Intervals

Confidence intervals provide important insight into how sure divina is of its predictions, further allowing high-quality decisions to be made on top of them. Below we add confidence intervals to the forecasts via the confidence_intervals option.

{
  "experiment_definition": {
    "target": "Sales",
    "link_function": "log",
    "target_dimensions": [
      "Store"
    ],
    "time_index": "Date",
    "data_path": "divina://retail_sales",
    "include_features": [
      "Store",
      "Weekday",
      "Promo",
      "Month",
      "Holiday",
      "HolidayType",
      "StoreType",
      "Assortment",
      "LastDayOfMonth"
    ],
    "joins": [
      {
        "data_path": "divina://time",
        "join_on": [
          "Date",
          "Date"
        ],
        "as": "time"
      }
    ],
    "forecast_end": "01-01-2016",
    "frequency": "D",
    "encode_features": [
      "Store",
      "Month",
      "StoreType",
      "Weekday",
      "HolidayType",
      "Assortment"
    ],
    "validation_splits": [
      "2015-07-18"
    ],
    "interaction_features": {
      "Store": [
        "Holiday"
      ]
    },
    "bin_features": {
      "Month": [
        3,
        6,
        9
      ]
    },
    "scenarios": {
      "Promo": {
        "mode": "constant",
        "constant_values": [
          0,
          1
        ]
      },
      "StoreType": {
        "mode": "last"
      },
      "Assortment": {
        "mode": "last"
      }
    },
    "confidence_intervals": [
      0,
      100
    ],
    "bootstrap_sample": 5
  }
}
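The relationship between a bootstrap sample and percentile-based intervals can be sketched in plain Python. This is a generic percentile bootstrap, not divina's implementation: a simple mean stands in for the forecasting model, the data is made up, and both helpers are hypothetical. It assumes that confidence_intervals lists percentiles and that bootstrap_sample is the number of resamples.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated percentile of sorted values, for q in [0, 100]."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def bootstrap_interval(values, estimator, intervals, bootstrap_sample, seed=0):
    """Resample with replacement, re-estimate, and read off percentiles."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(bootstrap_sample):
        resample = [rng.choice(values) for _ in values]
        estimates.append(estimator(resample))
    estimates.sort()
    return [percentile(estimates, q) for q in intervals]


def mean(values):
    return sum(values) / len(values)


# Made-up sales values for illustration only.
sales = [100.0, 110.0, 120.0, 130.0, 150.0]
low, high = bootstrap_interval(sales, mean, intervals=[0, 100], bootstrap_sample=5)
```

With percentiles 0 and 100, the interval is simply the smallest and largest of the five bootstrap estimates, so a larger bootstrap sample gives a more stable interval at the cost of more computation.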

Forecasting at Scale

To work with larger datasets, include more features, increase the bootstrap sample of divina’s confidence intervals, or otherwise scale your forecasting workload, use the aws_workers option when running the experiment through the CLI.

divina experiment /path/to/my/experiment_definition.json -aws_workers=10