Kedro 0.18 documentation

Introduction

  • What is Kedro?
    • Learn how to use Kedro
    • Assumptions

Get started

  • Installation prerequisites
    • Virtual environments
      • conda
      • venv (instead of conda)
      • pipenv (instead of conda)
  • Install Kedro
    • Verify a successful installation
    • Install a development version
  • A “Hello World” example
    • Node
    • Pipeline
    • DataCatalog
    • Runner
    • Hello Kedro!

Make a project

  • Create a new project
    • Create a new project interactively
    • Create a new project from a configuration file
    • Initialise a git repository
  • Iris dataset example project
    • Create the example project
      • Project directory structure
        • conf/
        • data/
        • src/
      • What best practice should I follow to avoid leaking confidential data?
    • Run the example project
    • Under the hood: Pipelines and nodes
  • Kedro starters
    • How to use Kedro starters
      • Starter aliases
    • List of official starters
    • Starter versioning
    • Use a starter in interactive mode
    • Use a starter with a configuration file
  • Standalone use of the DataCatalog
    • Introduction
    • Usage
    • Content
    • Create a full Kedro project

Tutorial

  • Kedro spaceflights tutorial
    • Kedro project development workflow
      • 1. Set up the project template
      • 2. Set up the data
      • 3. Create the pipeline
      • 4. Package the project
    • Optional: Git workflow
      • Create a project repository
      • Submit your changes to GitHub
  • Set up the spaceflights project
    • Create a new project
    • Install dependencies
    • Configure the project
  • Set up the data
    • Add your datasets to data
      • reviews.csv
      • companies.csv
      • shuttles.xlsx
    • Register the datasets
      • csv
      • xlsx
    • Custom data
  • Create a pipeline
    • Data processing pipeline
      • Generate a new pipeline template
      • Add node functions
      • Assemble nodes into the data processing pipeline
      • Update the project pipeline
      • Test the example
      • Visualise the pipeline
      • Persist pre-processed data
      • Extend the data processing pipeline
      • Persist the model input table
      • Test the example
      • Use kedro viz --autoreload
    • Data science pipeline
      • Create the data science pipeline
      • Configure the input parameters
      • Register the dataset
      • Assemble the data science pipeline
      • Update the project pipeline
      • Test the pipelines
    • Kedro runners
    • Slice a pipeline
  • Visualise pipelines
    • Install Kedro-Viz
    • Visualise a whole pipeline
    • Exit an open visualisation
    • Visualise layers
    • Share a pipeline
    • Visualise Plotly charts in Kedro-Viz
  • Namespace pipelines
    • Adding a namespace to the data_processing pipeline
      • Why do we need to provide explicit inputs and outputs?
    • Adding namespaces to the data_science pipeline
      • Let’s explain what’s going on here
    • Nesting modular pipelines
  • Set up experiment tracking
    • Set up a project
    • Set up the session store
    • Set up tracking datasets
    • Set up your nodes and pipelines to log metrics
    • Generate the Run data
    • Access run data and compare runs
  • Package a project
    • Add documentation to your project
    • Package your project
      • Docker, Airflow and deployment

Kedro project setup

  • Dependencies
    • Project-specific dependencies
    • Install project-specific dependencies
    • Workflow dependencies
      • Install dependencies related to the Data Catalog
        • Install dependencies at the group level
        • Install dependencies at the type level
  • Configuration
    • Configuration root
    • Local and base configuration environments
    • Additional configuration environments
    • Template configuration
      • Jinja2 support
    • Parameters
      • Load parameters
      • Specify parameters at runtime
      • Use parameters
    • Credentials
      • AWS credentials
    • Configure kedro run arguments
  • Lifecycle management with KedroSession
    • Overview
    • Create a session
  • Project settings

Data Catalog

  • The Data Catalog
    • Using the Data Catalog within Kedro configuration
    • Specifying the location of the dataset
    • Data Catalog *_args parameters
    • Using the Data Catalog with the YAML API
    • Creating a Data Catalog YAML configuration file via CLI
    • Adding parameters
    • Feeding in credentials
    • Loading multiple datasets that have similar configuration
    • Transcoding datasets
      • A typical example of transcoding
      • How does transcoding work?
    • Versioning datasets and ML models
    • Using the Data Catalog with the Code API
      • Configuring a Data Catalog
      • Loading datasets
        • Behind the scenes
      • Viewing the available data sources
      • Saving data
        • Saving data to memory
        • Saving data to a SQL database for querying
        • Saving data in Parquet
  • Kedro IO
    • Error handling
    • AbstractDataSet
    • Versioning
      • version namedtuple
      • Versioning using the YAML API
      • Versioning using the Code API
      • Supported datasets
    • Partitioned dataset
      • Partitioned dataset definition
        • Dataset definition
        • Partitioned dataset credentials
      • Partitioned dataset load
      • Partitioned dataset save
      • Incremental loads with IncrementalDataSet
        • Incremental dataset load
        • Incremental dataset save
        • Incremental dataset confirm
        • Checkpoint configuration
        • Special checkpoint config keys

Nodes and pipelines

  • Nodes
    • How to create a node
      • Node definition syntax
      • Syntax for input variables
      • Syntax for output variables
    • **kwargs-only node functions
    • How to tag a node
    • How to run a node
  • Pipelines
    • How to build a pipeline
      • How to tag a pipeline
      • How to merge multiple pipelines
      • Information about the nodes in a pipeline
      • Information about pipeline inputs and outputs
    • Bad pipelines
      • Pipeline with bad nodes
      • Pipeline with circular dependencies
  • Modular pipelines
    • What are modular pipelines?
      • Key concepts
    • How do I create a modular pipeline?
      • What does the kedro pipeline create command do?
      • Ensuring portability
      • Providing modular pipeline specific dependencies
    • Using the modular pipeline() wrapper to provide overrides
    • Combining disconnected pipelines
    • Using a modular pipeline multiple times
    • How to use a modular pipeline with different parameters
  • Micro-packaging
    • Package a micro-package
    • Package multiple micro-packages
    • Pull a micro-package
      • Providing fsspec arguments
    • Pull multiple micro-packages
  • Run a pipeline
    • Runners
      • SequentialRunner
      • ParallelRunner
        • Multiprocessing
        • Multithreading
    • Custom runners
    • Load and save asynchronously
    • Run a pipeline by name
    • Run pipelines with IO
    • Output to a file
  • Slice a pipeline
    • Slice a pipeline by providing inputs
    • Slice a pipeline by specifying nodes
    • Slice a pipeline by specifying final nodes
    • Slice a pipeline with tagged nodes
    • Slice a pipeline by running specified nodes
    • How to recreate missing outputs

Extend Kedro

  • Common use cases
    • Use Case 1: How to add extra behaviour to Kedro’s execution timeline
    • Use Case 2: How to integrate Kedro with additional data sources
    • Use Case 3: How to add or modify CLI commands
    • Use Case 4: How to customise the initial boilerplate of your project
  • Hooks
    • Introduction
    • Concepts
      • Hook specification
        • CLI hooks
      • Hook implementation
        • Registering your Hook implementations with Kedro
        • Disable auto-registered plugins’ Hooks
    • Common use cases
      • Use Hooks to extend a node’s behaviour
      • Use Hooks to customise the dataset load and save methods
    • Under the hood
    • Hooks examples
      • Add memory consumption tracking
      • Add data validation
      • Add observability to your pipeline
      • Add metrics tracking to your model
      • Modify node inputs using before_node_run hook
  • Custom datasets
    • Scenario
    • Project setup
    • The anatomy of a dataset
    • Implement the _load method with fsspec
    • Implement the _save method with fsspec
    • Implement the _describe method
    • The complete example
    • Integration with PartitionedDataSet
    • Versioning
    • Thread-safety
    • How to handle credentials and different filesystems
    • How to contribute a custom dataset implementation
  • Kedro plugins
    • Overview
    • Example of a simple plugin
    • Working with click
    • Project context
    • Initialisation
    • global and project commands
    • Suggested command convention
    • Hooks
    • CLI Hooks
    • Contributing process
    • Supported Kedro plugins
    • Community-developed plugins
  • Create a Kedro starter
    • How to create a Kedro starter
    • Configuration variables
      • Example Kedro starter

Logging

  • Logging
    • Configure logging
    • Use logging
    • Logging for anyconfig
  • Experiment tracking
    • Enable experiment tracking
    • Community solutions

Development

  • Set up Visual Studio Code
    • Advanced: For those using venv / virtualenv
    • Setting up tasks
    • Debugging
      • Advanced: Remote Interpreter / Debugging
    • Configuring the Kedro catalog validation schema
  • Set up PyCharm
    • Set up Run configurations
    • Debugging
    • Advanced: Remote SSH interpreter
    • Advanced: Docker interpreter
    • Configure Python Console
    • Configuring the Kedro catalog validation schema
  • Kedro’s command line interface
    • Autocompletion (optional)
    • Invoke Kedro CLI from Python (optional)
    • Kedro commands
    • Global Kedro commands
      • Get help on Kedro commands
      • Confirm the Kedro version
      • Confirm Kedro information
      • Create a new Kedro project
      • Open the Kedro documentation in your browser
    • Project-specific Kedro commands
      • Project setup
        • Build the project’s dependency tree
        • Install all package dependencies
      • Run the project
        • Modifying a kedro run
      • Deploy the project
      • Pull a micro-package
      • Project quality
        • Build the project documentation
        • Lint your project
        • Test your project
      • Project development
        • Modular pipelines
        • Registered pipelines
        • Datasets
        • Data Catalog
        • Notebooks
  • Debugging
    • Introduction
    • Debugging Node
    • Debugging Pipeline

Deployment

  • Deployment guide
    • Deployment choices
  • Single-machine deployment
    • Container based
      • How to use a container registry
    • Package based
    • CLI based
      • Use GitHub workflow to copy your project
      • Install and run the Kedro project
  • Distributed deployment
    • 1. Containerise the pipeline
    • 2. Convert your Kedro pipeline into the target platform’s primitives
    • 3. Parameterise the runs
    • 4. (Optional) Create starters
  • Deployment with Argo Workflows
    • Why would you use Argo Workflows?
    • Prerequisites
    • How to run your Kedro pipeline using Argo Workflows
      • Containerise your Kedro project
      • Create Argo Workflows spec
      • Submit Argo Workflows spec to Kubernetes
      • Kedro-Argo plugin
  • Deployment with Prefect
    • Prerequisites
    • How to run your Kedro pipeline using Prefect
      • Convert your Kedro pipeline to Prefect flow
      • Run Prefect flow
  • Deployment with Kubeflow Pipelines
    • Why would you use Kubeflow Pipelines?
    • Prerequisites
    • How to run your Kedro pipeline using Kubeflow Pipelines
      • Containerise your Kedro project
      • Create a workflow spec
      • Authenticate Kubeflow Pipelines
      • Upload workflow spec and execute runs
  • Deployment with AWS Batch
    • Why would you use AWS Batch?
    • Prerequisites
    • How to run a Kedro pipeline using AWS Batch
      • Containerise your Kedro project
      • Provision resources
        • Create IAM Role
        • Create AWS Batch job definition
        • Create AWS Batch compute environment
        • Create AWS Batch job queue
      • Configure the credentials
      • Submit AWS Batch jobs
        • Create a custom runner
        • Set up Batch-related configuration
        • Update CLI implementation
      • Deploy
  • Deployment to a Databricks cluster
    • Prerequisites
    • Run the Kedro project with Databricks Connect
      • 1. Project setup
      • 2. Install dependencies and run locally
      • 3. Create a Databricks cluster
      • 4. Install Databricks Connect
      • 5. Configure Databricks Connect
      • 6. Copy local data into DBFS
      • 7. Run the project
    • Run Kedro project from a Databricks notebook
      • Extra requirements
      • 1. Create Kedro project
      • 2. Create GitHub personal access token
      • 3. Create a GitHub repository
      • 4. Push Kedro project to the GitHub repository
      • 5. Configure the Databricks cluster
      • 6. Run your Kedro project from the Databricks notebook
  • How to integrate Amazon SageMaker into your Kedro pipeline
    • Why would you use Amazon SageMaker?
    • Prerequisites
    • Prepare the environment
      • Install SageMaker package dependencies
      • Create SageMaker execution role
      • Create S3 bucket
    • Update the Kedro project
      • Create the configuration environment
      • Update the project hooks
      • Update the data science pipeline
        • Create node functions
        • Update the pipeline definition
      • Create the SageMaker entry point
    • Run the project
    • Cleanup
  • How to deploy your Kedro pipeline with AWS Step Functions
    • Why would you run a Kedro pipeline with AWS Step Functions?
    • Strategy
    • Prerequisites
    • Deployment process
      • Step 1. Create a new configuration environment to prepare a compatible DataCatalog
      • Step 2. Package the Kedro pipeline as an AWS Lambda-compliant Docker image
      • Step 3. Write the deployment script
      • Step 4. Deploy the pipeline
    • Limitations
    • Final thought
  • How to deploy your Kedro pipeline on Apache Airflow with Astronomer
    • Strategy
    • Prerequisites
    • Project Setup
    • Deployment process
      • Step 1. Create a new configuration environment to prepare a compatible DataCatalog
      • Step 2. Package the Kedro pipeline as an Astronomer-compliant Docker image
      • Step 3. Convert the Kedro pipeline into an Airflow DAG with kedro airflow
      • Step 4. Launch the local Airflow cluster with Astronomer
    • Final thought
  • Deployment to a Dask cluster
    • Why would you use Dask?
    • Prerequisites
    • How to distribute your Kedro pipeline using Dask
      • Create a custom runner
      • Update CLI implementation
      • Deploy
        • Set up Dask and related configuration

Tools integration

  • Build a Kedro pipeline with PySpark
    • Centralise Spark configuration in conf/base/spark.yml
    • Initialise a SparkSession in custom project context class
    • Use Kedro’s built-in Spark datasets to load and save raw data
      • spark.DeltaTableDataSet
      • spark.SparkDataSet
      • spark.SparkJDBCDataSet
      • spark.SparkHiveDataSet
    • Spark and Delta Lake interaction
    • Use MemoryDataSet for intermediary DataFrame
    • Use MemoryDataSet with copy_mode="assign" for non-DataFrame Spark objects
    • Tips for maximising concurrency using ThreadRunner
  • Use Kedro with IPython and Jupyter
    • Why use a Notebook?
    • Kedro IPython extension
      • Managed Jupyter instances
    • Kedro variables: catalog, context, pipelines and session
      • catalog
      • context
      • pipelines
      • session
    • Kedro and Jupyter
      • Manage Jupyter kernels
      • Use an alternative Jupyter client
      • Convert functions from Jupyter Notebooks into Kedro nodes
      • Kedro-Viz line magic

FAQs

  • Frequently asked questions
    • What is Kedro?
    • Who maintains Kedro?
    • What are the primary advantages of Kedro?
    • How does Kedro compare to other projects?
    • What is the data engineering convention?
    • How do I upgrade Kedro?
    • How can I use a development version of Kedro?
    • How can I find out more about Kedro?
    • How can I cite Kedro?
    • How can I get my question answered?
  • Kedro architecture overview
    • Kedro project
    • Kedro starter
    • Kedro library
    • Kedro framework
    • Kedro extension
  • Kedro Principles
    • 1. Modularity at the core 📦
    • 2. Grow beginners into experts 🌱
    • 3. User empathy without unfounded assumptions 🤝
    • 4. Simplicity means bare necessities 🍞
    • 5. There should be one obvious way of doing things 🎯
    • 6. A sprinkle of magic is better than a spoonful of it ✨
    • 7. Lean process and lean product 👟

Resources

  • Images and icons
    • White background
      • Icon
      • Icon with text
    • Black background
      • Icon
      • Icon with text
  • Kedro glossary
    • Data Catalog
    • Data engineering vs Data science
    • Kedro
    • KedroContext
    • KedroSession
    • Kedro-Viz
    • Layers (data engineering convention)
    • Modular pipeline
    • Node
    • Node execution order
    • Pipeline
    • Pipeline slicing
    • Runner
    • Starters
    • Tags
    • Workflow dependencies

Contribute to Kedro

  • Introduction
  • Guidelines for contributing developers
    • Introduction
    • Before you start: development set up
    • Get started: areas of contribution
      • core contribution process
      • extras contribution process
    • Create a pull request
      • Hints on pre-commit usage
      • Developer Certificate of Origin
    • Need help?
      • First timers only
      • How to contribute to an open source project on GitHub
  • Backwards compatibility & breaking changes
    • When should I make a breaking change?
    • The Kedro release model
  • Contribute to the Kedro documentation
    • How do I rebuild the documentation after I make changes to it?
      • Set up to build Kedro documentation
      • Build the documentation
    • Extend Kedro documentation
      • Add new pages
      • Move or remove pages
      • Create a pull request
      • Help!
    • Kedro documentation style guide
      • Language
      • Formatting
      • Links
      • Capitalisation
      • Bullets
      • Notes
      • Kedro lexicon
      • Style
  • Join the Technical Steering Committee
    • Responsibilities of a maintainer
      • Product development
      • Community management
    • Requirements to become a maintainer
    • Application process
    • Voting process
      • Other issues or proposals
      • Adding or removing maintainers

API documentation

  • kedro
    • kedro.config
      • kedro.config.ConfigLoader
      • kedro.config.TemplatedConfigLoader
      • kedro.config.MissingConfigException
    • kedro.extras
      • kedro.extras.datasets
        • kedro.extras.datasets.api.APIDataSet
        • kedro.extras.datasets.biosequence.BioSequenceDataSet
        • kedro.extras.datasets.dask.ParquetDataSet
        • kedro.extras.datasets.email.EmailMessageDataSet
        • kedro.extras.datasets.geopandas.GeoJSONDataSet
        • kedro.extras.datasets.holoviews.HoloviewsWriter
        • kedro.extras.datasets.json.JSONDataSet
        • kedro.extras.datasets.matplotlib.MatplotlibWriter
        • kedro.extras.datasets.networkx.GMLDataSet
        • kedro.extras.datasets.networkx.GraphMLDataSet
        • kedro.extras.datasets.networkx.JSONDataSet
        • kedro.extras.datasets.pandas.CSVDataSet
        • kedro.extras.datasets.pandas.ExcelDataSet
        • kedro.extras.datasets.pandas.FeatherDataSet
        • kedro.extras.datasets.pandas.GBQQueryDataSet
        • kedro.extras.datasets.pandas.GBQTableDataSet
        • kedro.extras.datasets.pandas.GenericDataSet
        • kedro.extras.datasets.pandas.HDFDataSet
        • kedro.extras.datasets.pandas.JSONDataSet
        • kedro.extras.datasets.pandas.ParquetDataSet
        • kedro.extras.datasets.pandas.SQLQueryDataSet
        • kedro.extras.datasets.pandas.SQLTableDataSet
        • kedro.extras.datasets.pandas.XMLDataSet
        • kedro.extras.datasets.pickle.PickleDataSet
        • kedro.extras.datasets.pillow.ImageDataSet
        • kedro.extras.datasets.plotly.JSONDataSet
        • kedro.extras.datasets.plotly.PlotlyDataSet
        • kedro.extras.datasets.redis.PickleDataSet
        • kedro.extras.datasets.spark.DeltaTableDataSet
        • kedro.extras.datasets.spark.SparkDataSet
        • kedro.extras.datasets.spark.SparkHiveDataSet
        • kedro.extras.datasets.spark.SparkJDBCDataSet
        • kedro.extras.datasets.tensorflow.TensorFlowModelDataset
        • kedro.extras.datasets.text.TextDataSet
        • kedro.extras.datasets.tracking.JSONDataSet
        • kedro.extras.datasets.tracking.MetricsDataSet
        • kedro.extras.datasets.yaml.YAMLDataSet
      • kedro.extras.extensions
        • kedro.extras.extensions.ipython
      • kedro.extras.logging
        • kedro.extras.logging.color_logger
    • kedro.framework
      • kedro.framework.cli
        • kedro.framework.cli.catalog
        • kedro.framework.cli.cli
        • kedro.framework.cli.hooks
        • kedro.framework.cli.jupyter
        • kedro.framework.cli.micropkg
        • kedro.framework.cli.pipeline
        • kedro.framework.cli.project
        • kedro.framework.cli.registry
        • kedro.framework.cli.starters
        • kedro.framework.cli.utils
      • kedro.framework.context
        • kedro.framework.context.KedroContext
        • kedro.framework.context.KedroContextError
      • kedro.framework.hooks
        • kedro.framework.hooks.manager
        • kedro.framework.hooks.markers
        • kedro.framework.hooks.specs
      • kedro.framework.project
        • kedro.framework.project.configure_logging
        • kedro.framework.project.configure_project
        • kedro.framework.project.validate_settings
      • kedro.framework.session
        • kedro.framework.session.session
        • kedro.framework.session.store
      • kedro.framework.startup
        • kedro.framework.startup.bootstrap_project
        • kedro.framework.startup.ProjectMetadata
    • kedro.io
      • kedro.io.AbstractDataSet
      • kedro.io.AbstractVersionedDataSet
      • kedro.io.DataCatalog
      • kedro.io.LambdaDataSet
      • kedro.io.MemoryDataSet
      • kedro.io.PartitionedDataSet
      • kedro.io.IncrementalDataSet
      • kedro.io.CachedDataSet
      • kedro.io.Version
      • kedro.io.DataSetAlreadyExistsError
      • kedro.io.DataSetError
      • kedro.io.DataSetNotFoundError
    • kedro.pipeline
      • kedro.pipeline.node
      • kedro.pipeline.modular_pipeline.pipeline
      • kedro.pipeline.Pipeline
      • kedro.pipeline.node.Node
      • kedro.pipeline.modular_pipeline.ModularPipelineError
    • kedro.runner
      • kedro.runner.run_node
      • kedro.runner.AbstractRunner
      • kedro.runner.ParallelRunner
      • kedro.runner.SequentialRunner
      • kedro.runner.ThreadRunner
    • kedro.utils
      • kedro.utils.load_obj