Kedro starters¶
Kedro starters are used to create projects that contain code to run as-is, or to adapt and extend. They provide pre-defined example code and configuration that can be reused, for example:
As example code for a typical Kedro project
To add a docker-compose setup to launch Kedro next to a monitoring stack
To add deployment scripts and CI/CD setup for your targeted infrastructure
A Kedro starter is a Cookiecutter template that contains the boilerplate code for a Kedro project. You can create your own starters for reuse within a project or team, as described in the documentation about how to create a Kedro starter.
How to use Kedro starters¶
To create a Kedro project using a starter, apply the --starter flag to kedro new as follows:
kedro new --starter=<path-to-starter>
Note: path-to-starter can be a local directory or a VCS repository, as long as it is supported by Cookiecutter.
To create a project using the PySpark starter:
kedro new --starter=pyspark
If no starter is provided to kedro new, the default Kedro template will be used, as documented in “Creating a new project”.
Starter aliases¶
We provide aliases for common starters maintained by the Kedro team so that users don’t have to specify the full path. For example, to create a project using the PySpark starter:
kedro new --starter=pyspark
To list all the aliases we support:
kedro starter list
List of official starters¶
The Kedro team maintains the following starters to bootstrap new Kedro projects:
Alias mini-kedro: A minimal setup to use the traditional Iris dataset with Kedro’s DataCatalog, which is a core component of Kedro. This starter is useful in the exploratory phase of a project. For more information, please read the Mini-Kedro guide.
Alias pyspark-iris: An alternative Kedro Iris dataset example, using PySpark
Alias pyspark: The configuration and initialisation code for a Kedro pipeline using PySpark
Alias spaceflights: The spaceflights tutorial example code
Each starter project encodes our recommended Kedro best practices.
Starter versioning¶
By default, Kedro will use the latest version available in the repository, but if you want to use a specific version of a starter, you can pass a --checkout argument to the command as follows:
kedro new --starter=pyspark --checkout=0.1.0
The --checkout value points to a branch, tag or commit in the starter repository.
Under the hood, the value will be passed to the --checkout flag in Cookiecutter.
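If you create projects from a script, the flags above can be assembled programmatically. The following is a minimal sketch, not part of Kedro: build_kedro_new_args is a hypothetical helper that only uses the --starter and --checkout flags described in this section, and it prints the command rather than running it.

```python
import shlex
from typing import List, Optional


def build_kedro_new_args(starter: str, checkout: Optional[str] = None) -> List[str]:
    """Build the argument list for `kedro new` (hypothetical helper, not a Kedro API)."""
    if not starter:
        raise ValueError("starter must be a non-empty alias or path")
    args = ["kedro", "new", f"--starter={starter}"]
    if checkout:
        # A branch, tag or commit in the starter repository.
        args.append(f"--checkout={checkout}")
    return args


if __name__ == "__main__":
    args = build_kedro_new_args("pyspark", checkout="0.1.0")
    # Print the command instead of running it; pass `args` to
    # subprocess.run if Kedro is installed and you want to create the project.
    print(shlex.join(args))
```

Keeping the arguments as a list, rather than a single string, avoids shell quoting problems when the starter is a local path that contains spaces.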
Use a starter in interactive mode¶
By default, when you create a new project using a starter, kedro new starts by asking a few questions. You will be prompted to provide the following variables:
project_name - A human-readable name for your new project
repo_name - A name for the directory that holds your project repository
python_package - A Python package name for your project package (see Python package naming conventions)
This mode assumes that the starter doesn’t require any additional configuration variables.
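To illustrate how the three variables relate to each other, here is a small sketch that suggests a repo_name and a python_package from a project_name. The convention it uses (hyphens for the directory, underscores for the package, all lowercase) is an assumption made for illustration and is not taken from Kedro's own code; suggest_names is a hypothetical helper.

```python
import re


def suggest_names(project_name: str) -> dict:
    """Suggest repo_name and python_package values from a project name.

    Illustrative convention only (an assumption, not Kedro's own logic):
    hyphens for the repository directory, underscores for the package.
    """
    words = re.findall(r"[A-Za-z0-9]+", project_name.lower())
    if not words:
        raise ValueError("project_name must contain letters or digits")
    python_package = "_".join(words)
    if python_package[0].isdigit():
        raise ValueError("a Python package name cannot start with a digit")
    return {
        "project_name": project_name.strip(),
        "repo_name": "-".join(words),
        "python_package": python_package,
    }


if __name__ == "__main__":
    print(suggest_names("My PySpark Project"))
```

The digit check is there because a Python package name has to be a valid identifier in order to be importable.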
Use a starter with a configuration file¶
Kedro also allows you to specify a configuration file to create a project. Use the --config flag alongside the starter as follows:
kedro new --config=my_kedro_pyspark_project.yml --starter=pyspark
This option is useful when the starter requires more configuration than is required by the interactive mode.
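As a rough sketch of what such a file might contain, the script below writes my_kedro_pyspark_project.yml with the three variables from the interactive prompts. The output_dir key is an assumption, and a given starter may expect additional keys, so check the “Creating a new project” documentation and the starter's own documentation before relying on it. render_config is a hypothetical helper.

```python
import json
from pathlib import Path


def render_config(values: dict) -> str:
    """Render a flat mapping as YAML text.

    Values are written as JSON strings, which are also valid YAML
    double-quoted strings, so no third-party YAML library is needed.
    """
    return "".join(
        f"{key}: {json.dumps(str(value))}\n" for key, value in values.items()
    )


if __name__ == "__main__":
    config = {
        "output_dir": ".",  # assumption: directory in which the project is created
        "project_name": "My PySpark Project",
        "repo_name": "my-pyspark-project",
        "python_package": "my_pyspark_project",
    }
    Path("my_kedro_pyspark_project.yml").write_text(render_config(config))
```

The generated file can then be passed to the command shown above with --config=my_kedro_pyspark_project.yml.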