# Cog: Containers for machine learning

Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container.

You can deploy your packaged model to your own infrastructure, or to [Replicate](https://replicate.com/).

## Highlights

- 📦 **Docker containers without the pain.** Writing your own `Dockerfile` can be a bewildering process. With Cog, you define your environment with a [simple configuration file](#how-it-works) and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on.

- 🤬️ **No more CUDA hell.** Cog knows which CUDA/cuDNN/PyTorch/TensorFlow/Python combos are compatible and will set it all up correctly for you.

- ✅ **Define the inputs and outputs for your model with standard Python.** Then, Cog generates an OpenAPI schema and validates the inputs and outputs.

- 🎁 **Automatic HTTP inference server.** Cog uses your model's types to generate a RESTful HTTP API, served by a high-performance Rust/Axum server.

- 🚀 **Ready for production.** Deploy your model anywhere that Docker images run. Your own infrastructure, or [Replicate](https://replicate.com).

## How it works

Define the Docker environment your model runs in with `cog.yaml`:

```yaml
build:
  gpu: true
  system_packages:
    - "libgl1"
    - "libglib2.0-0"
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"
```

Define how your model runs with `run.py`:

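```python
from cog import BaseRunner, Input, Path
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        # PyTorch 2.6+ needs weights_only=False to unpickle a full model object;
        # only load weight files you trust.
        self.model = torch.load("./weights.pth", weights_only=False)

    # The arguments and types the model takes as input
    def run(self,
            image: Path = Input(description="Grayscale input image")
    ) -> Path:
        """Run the model"""
        # preprocess() and postprocess() stand in for your own helper functions;
        # postprocess() should write the result to a file and return its Path.
        processed_image = preprocess(image)
        output = self.model(processed_image)
        return postprocess(output)
```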

In the example above, we accept a path to an image as input and return a path to the transformed image after running it through our model.

Now, you can run the model:

```console
$ cog run -i image=@input.jpg
--> Building Docker image...
--> Running...
--> Output written to output.jpg
```

Or, build a Docker image for deployment:

```console
$ cog build -t my-classification-model
--> Building Docker image...
--> Built my-classification-model:latest

$ docker run -d -p 5000:5000 --gpus all my-classification-model

$ curl http://localhost:5000/predictions -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"image": "https://.../input.jpg"}}'
```

Or, combine build and run via the `serve` command:

```console
$ cog serve -p 8080

$ curl http://localhost:8080/predictions -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"image": "https://.../input.jpg"}}'
```
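If you'd rather call the server from Python than from cURL, the standard library is enough. Here's a minimal sketch, assuming a server from one of the examples above is running locally. The helper names and the 600-second timeout are our own choices, not part of Cog, and the response is simply decoded as JSON without assuming any particular fields.

```python
import json
import urllib.request


def build_prediction_request(base_url: str, inputs: dict) -> urllib.request.Request:
    """Build a POST request for the server's /predictions endpoint."""
    # The model inputs are wrapped in an "input" object, as in the curl examples.
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_prediction(base_url: str, inputs: dict, timeout: float = 600.0) -> dict:
    """Send a prediction request and return the decoded JSON response."""
    request = build_prediction_request(base_url, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `run_prediction("http://localhost:8080", {"image": "<URL of your input image>"})` sends the same request as the `cog serve` example above.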

<!-- NOTE (bfirsh): Development environment instructions intentionally left out of readme for now, so as not to confuse the "ship a model to production" message.

In development, you can also run arbitrary commands inside the Docker environment:

```console
$ cog exec python train.py
...
```

Or, [spin up a Jupyter notebook](docs/notebooks.md):

```console
$ cog exec -p 8888 jupyter notebook --allow-root --ip=0.0.0.0
```
-->

## Why are we building this?

It's really hard for researchers to ship machine learning models to production.

Part of the solution is Docker, but getting it to work is complex: Dockerfiles, pre-/post-processing, Flask servers, CUDA versions. More often than not, the researcher has to sit down with an engineer to get the damn thing deployed.

[Andreas](https://github.com/andreasjansson) and [Ben](https://github.com/bfirsh) created Cog. Andreas used to work at Spotify, where he built tools for building and deploying ML models with Docker. Ben worked at Docker, where he created [Docker Compose](https://github.com/docker/compose).

We realized that, in addition to Spotify, other companies were also using Docker to build and deploy machine learning models. [Uber](https://eng.uber.com/michelangelo-pyml/) and others have built similar systems. So, we're making an open-source version so other people can do this too.

Hit us up if you're interested in using it or want to collaborate with us. [We're on Discord](https://discord.gg/replicate) or email us at [team@replicate.com](mailto:team@replicate.com).

## Prerequisites

- **macOS, Linux or Windows 11**. Cog works on macOS, Linux and Windows 11 with [WSL 2](docs/wsl2/wsl2.md).
- **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog. If you install Docker Engine instead of Docker Desktop, you will need to [install Buildx](https://docs.docker.com/build/architecture/#buildx) as well.

## Install

If you're using macOS, you can install Cog using Homebrew:

```console
brew install replicate/tap/cog
```

You can also download and install the latest release using our
[install script](https://cog.run/install):

```sh
# bash, zsh, and other shells
sh <(curl -fsSL https://cog.run/install.sh)

# fish shell
sh (curl -fsSL https://cog.run/install.sh | psub)

# download with wget and run in a separate command
wget -qO install.sh https://cog.run/install.sh
sh ./install.sh
```

You can manually install the latest release of Cog directly from GitHub
by running the following commands in a terminal:

```console
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```

Or, to install Cog while building a Docker image, add this to your `Dockerfile`:

```dockerfile
RUN sh -c "INSTALL_DIR=\"/usr/local/bin\" SUDO=\"\" $(curl -fsSL https://cog.run/install.sh)"
```

## Upgrade

If you're using macOS and you previously installed Cog with Homebrew, run the following:

```console
brew upgrade replicate/tap/cog
```

Otherwise, you can upgrade to the latest version by running the same commands you used to install it.

## Development

See [CONTRIBUTING.md](CONTRIBUTING.md) for how to set up a development environment and build from source.

## Next steps

- [Get started with an example model](docs/getting-started.md)
- [Get started with your own model](docs/getting-started-own-model.md)
- [Using Cog with notebooks](docs/notebooks.md)
- [Using Cog with Windows 11](docs/wsl2/wsl2.md)
- [Take a look at some examples of using Cog](https://github.com/replicate/cog-examples)
- [Deploy models with Cog](docs/deploy.md)
- [`cog.yaml` reference](docs/yaml.md) to learn how to define your model's environment
- [Run interface reference](docs/python.md) to learn how the `Runner` interface works
- [Training interface reference](docs/training.md) to learn how to add a fine-tuning API to your model
- [HTTP API reference](docs/http.md) to learn how to use the HTTP API that models serve

## Need help?

[Join us in #cog on Discord.](https://discord.gg/replicate)

[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/replicate/cog)



---

# CLI reference

<!-- This file is auto-generated. Do not edit manually. -->

## `cog`

Containers for machine learning.

To get started, take a look at the documentation:
https://github.com/replicate/cog

**Examples**

```
   To execute a command inside a Docker environment defined with Cog:
      $ cog exec echo hello world
```

**Options**

```
      --debug      Show debugging output
  -h, --help       help for cog
      --no-color   Disable colored output
      --version    Show version of Cog
```

## `cog build`

Build a Docker image from the cog.yaml in the current directory.

The generated image contains your model code, dependencies, and the Cog
runtime. It can be run locally with 'cog run' or pushed to a registry
with 'cog push'.

```
cog build [flags]
```

**Examples**

```
  # Build with default settings
  cog build

  # Build and tag the image
  cog build -t my-model:latest

  # Build without using the cache
  cog build --no-cache

  # Build with model weights in a separate layer
  cog build --separate-weights -t my-model:v1
```

**Options**

```
  -f, --file string                  The name of the config file. (default "cog.yaml")
  -h, --help                         help for build
      --no-cache                     Do not use cache when building the image
      --openapi-schema string        Load OpenAPI schema from a file
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --secret stringArray           Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file'
      --separate-weights             Separate model weights from code in image layers
  -t, --tag string                   A name for the built image in the form 'repository:tag'
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
```

## `cog doctor`

Diagnose and fix common issues in your Cog project.

NOTE: cog doctor is experimental. Behavior and checks may change in future versions.

By default, cog doctor reports problems without modifying any files.
Pass --fix to automatically apply safe fixes.

```
cog doctor [flags]
```

**Options**

```
  -f, --file string   The name of the config file. (default "cog.yaml")
      --fix           Automatically apply fixes
  -h, --help          help for doctor
```

## `cog exec`

Execute a command inside a Docker environment defined by cog.yaml.

Cog builds a temporary image from your cog.yaml configuration and runs the
given command inside it. This is useful for debugging, running scripts, or
exploring the environment your model will run in.

```
cog exec <command> [arg...] [flags]
```

**Examples**

```
  # Open a Python interpreter inside the model environment
  cog exec python

  # Run a script
  cog exec python train.py

  # Run with environment variables
  cog exec -e HUGGING_FACE_HUB_TOKEN=abc123 python download.py

  # Expose a port (e.g. for Jupyter)
  cog exec -p 8888 jupyter notebook
```

**Options**

```
  -e, --env stringArray              Environment variables, in the form name=value
  -f, --file string                  The name of the config file. (default "cog.yaml")
      --gpus docker run --gpus       GPU devices to add to the container, in the same format as docker run --gpus.
  -h, --help                         help for exec
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
  -p, --publish stringArray          Publish a container's port to the host, e.g. -p 8000
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
```

## `cog init`

Create a cog.yaml and run.py in the current directory.

These files provide a starting template for defining your model's environment
and run interface. Edit them to match your model's requirements.

```
cog init [flags]
```

**Examples**

```
  # Set up a new Cog project in the current directory
  cog init
```

**Options**

```
  -h, --help   help for init
```

## `cog login`

Log in to a container registry.

For Replicate's registry (r8.im), this command handles authentication
through Replicate's token-based flow.

For other registries, this command prompts for username and password,
then stores credentials using Docker's credential system.

```
cog login [flags]
```

**Options**

```
  -h, --help          help for login
      --token-stdin   Pass login token on stdin instead of opening a browser. You can find your Replicate login token at https://replicate.com/auth/token
```

## `cog push`

Build a Docker image from cog.yaml and push it to a container registry.

Cog can push to any OCI-compliant registry. When pushing to Replicate's
registry (r8.im), run 'cog login' first to authenticate.

```
cog push [IMAGE] [flags]
```

**Examples**

```
  # Push to Replicate
  cog push r8.im/your-username/my-model

  # Push to any OCI registry
  cog push registry.example.com/your-username/model-name

  # Push with model weights in a separate layer (Replicate only)
  cog push r8.im/your-username/my-model --separate-weights
```

**Options**

```
  -f, --file string                  The name of the config file. (default "cog.yaml")
  -h, --help                         help for push
      --no-cache                     Do not use cache when building the image
      --openapi-schema string        Load OpenAPI schema from a file
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --secret stringArray           Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file'
      --separate-weights             Separate model weights from code in image layers
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
```

## `cog run`

Run the model.

If 'image' is passed, it will run the model on that Docker image.
It must be an image that has been built by Cog.

Otherwise, it will build the model in the current directory and run
it.

```
cog run [image] [flags]
```

**Examples**

```
  # Run the model with named inputs
  cog run -i prompt="a photo of a cat"

  # Pass a file as input
  cog run -i image=@photo.jpg

  # Save output to a file
  cog run -i image=@input.jpg -o output.png

  # Pass multiple inputs
  cog run -i prompt="sunset" -i width=1024 -i height=768

  # Run against a pre-built image
  cog run r8.im/your-username/my-model -i prompt="hello"

  # Pass inputs as JSON
  echo '{"prompt": "a cat"}' | cog run --json @-
```

**Options**

```
  -e, --env stringArray              Environment variables, in the form name=value
  -f, --file string                  The name of the config file. (default "cog.yaml")
      --gpus docker run --gpus       GPU devices to add to the container, in the same format as docker run --gpus.
  -h, --help                         help for run
  -i, --input stringArray            Inputs, in the form name=value. if value is prefixed with @, then it is read from a file on disk. E.g. -i path=@image.jpg
      --json string                  Pass inputs as JSON object, read from file (@inputs.json) or via stdin (@-)
  -o, --output string                Output path
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --setup-timeout uint32         The timeout for a container to setup (in seconds). (default 300)
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
      --use-replicate-token          Pass REPLICATE_API_TOKEN from local environment into the model context
```

## `cog serve`

Run an HTTP server.

Builds the model and starts an HTTP server that exposes the model's inputs
and outputs as a REST API. Compatible with the Cog HTTP protocol.

```
cog serve [flags]
```

**Examples**

```
  # Start the server on the default port (8393)
  cog serve

  # Start on a custom port
  cog serve -p 5000

  # Test the server
  curl http://localhost:8393/predictions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"prompt": "a cat"}}'
```

**Options**

```
  -f, --file string                  The name of the config file. (default "cog.yaml")
      --gpus docker run --gpus       GPU devices to add to the container, in the same format as docker run --gpus.
  -h, --help                         help for serve
  -p, --port int                     Port on which to listen (default 8393)
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --upload-url string            Upload URL for file outputs (e.g. https://example.com/upload/)
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
```


---

# Deploy models with Cog

Cog containers are Docker containers that serve an HTTP server
for running your model.
You can deploy them anywhere that Docker containers run.

The server inside Cog containers is **coglet**, a Rust-based inference server
that handles HTTP requests, worker process management, and run execution.

This guide assumes you have a model packaged with Cog.
If you don't, [follow our getting started guide](getting-started-own-model.md),
or use [an example model](https://github.com/replicate/cog-examples).

## Getting started

First, build your model:

```console
cog build -t my-model
```

You can serve your model locally with `cog serve`:

```console
cog serve
# or, from a built image:
cog serve my-model
```

Alternatively, start the Docker container directly:

```shell
# If your model uses a CPU:
docker run -d -p 5001:5000 my-model

# If your model uses a GPU:
docker run -d -p 5001:5000 --gpus all my-model
```

The server listens on port 5000 inside the container (mapped to 5001 above).

To view the OpenAPI schema,
open [localhost:5001/openapi.json](http://localhost:5001/openapi.json)
in your browser
or use cURL to make a request:

```console
curl http://localhost:5001/openapi.json
```

To run the model,
call the `/predictions` endpoint,
passing input in the format expected by your model:

```console
curl http://localhost:5001/predictions -X POST \
    --header "Content-Type: application/json" \
    --data '{"input": {"image": "https://.../input.jpg"}}'
```

For more details about the HTTP API,
see the [HTTP API reference documentation](http.md).

To stop the server, stop its container.
The `docker run` commands above don't name the container,
so look it up by image:

```console
docker kill $(docker ps -q --filter ancestor=my-model)
```

## Health checks

The server exposes a `GET /health-check` endpoint that returns the current status of the model container. Use this for readiness probes in orchestration systems like Kubernetes.

```console
curl http://localhost:5001/health-check
```

The response includes a `status` field with values like `STARTING`, `READY`, `BUSY`, `SETUP_FAILED`, or `DEFUNCT`. See the [HTTP API reference](http.md#get-health-check) for full details.
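In scripts or integration tests you may want to wait for the container to finish setup before sending work. The sketch below polls the endpoint using only the Python standard library. The function names, the one-second interval, the 300-second default timeout, and treating `BUSY` as "setup finished" are assumptions; it reads only the `status` field described above.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str) -> dict:
    """GET /health-check and decode the JSON body."""
    url = base_url.rstrip("/") + "/health-check"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_ready(
    fetch: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll the health check until setup has finished.

    Returns the status that ended the wait. Raises RuntimeError if setup
    failed or the container is defunct, and TimeoutError on deadline.
    """
    deadline = clock() + timeout
    while True:
        try:
            status = fetch().get("status")
        except OSError:
            # Connection refused or HTTP error while the server is still
            # coming up: treat it like STARTING and retry.
            status = None
        if status in ("READY", "BUSY"):
            # Assumption: BUSY means setup is done and a run is in progress.
            return status
        if status in ("SETUP_FAILED", "DEFUNCT"):
            raise RuntimeError(f"model container is unhealthy: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"last status {status!r} after {timeout} seconds")
        sleep(interval)
```

For example, `wait_until_ready(lambda: fetch_health("http://localhost:5001"))` blocks until the container started above reports that setup is complete.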

## Concurrency

By default, the server processes one run at a time. To enable concurrent runs, set the `concurrency.max` option in `cog.yaml`:

```yaml
concurrency:
  max: 4
```

See the [`cog.yaml` reference](yaml.md#concurrency) for more details.
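With `concurrency.max` set, you can check the behavior by sending several requests at once. A minimal client-side sketch using a thread pool: `run_many` is a hypothetical helper, and `send` is any function you supply that POSTs one input object to `/predictions` and returns the decoded response. How the server treats requests beyond its limit is not covered here; see the [HTTP API reference](http.md).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_many(
    send: Callable[[Dict], Dict],
    inputs: List[Dict],
    max_workers: int = 4,
) -> List[Dict]:
    """Send several predictions at the same time; results keep input order."""
    # Each worker thread makes one blocking request at a time, so at most
    # `max_workers` requests are in flight. Match it to concurrency.max.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, inputs))
```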

## Environment variables

You can configure runtime behavior with environment variables:

- `COG_SETUP_TIMEOUT`: Maximum time in seconds for the `setup()` method (default: no timeout).

See the [environment variables reference](environment.md) for the full list.


---

# Environment variables

This reference lists the public Cog-specific environment variables that change how Cog behaves.

## Build-time variables

### `COG_SDK_WHEEL`

Controls which Cog Python SDK wheel is installed in the Docker image during `cog build`. Takes precedence over `build.sdk_version` in `cog.yaml`.

**Supported values:**

| Value                | Description                                          |
| -------------------- | ---------------------------------------------------- |
| `pypi`               | Install latest version from PyPI                     |
| `pypi:0.12.0`        | Install specific version from PyPI                   |
| `dist`               | Use wheel from `dist/` directory (requires git repo) |
| `https://...`        | Install from URL                                     |
| `/path/to/wheel.whl` | Install from local file path                         |

**Default behaviour:**

- Release builds install the latest Cog SDK from PyPI.
- Development builds auto-detect a wheel in `dist/`, then fall back to the latest Cog SDK from PyPI.

```console
$ COG_SDK_WHEEL=pypi:0.11.0 cog build
$ COG_SDK_WHEEL=dist cog build
$ COG_SDK_WHEEL=https://example.com/cog-0.12.0-py3-none-any.whl cog build
```

The `dist` option searches for wheels in:

1. `./dist/` (current directory)
2. `$REPO_ROOT/dist/` (if `REPO_ROOT` is set)
3. `<git-repo-root>/dist/` (via `git rev-parse`, useful when running from subdirectories)
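The lookup order can be pictured as a first-match search over those three directories. This is a rough Python illustration, not Cog's implementation: the `cog-*.whl` file-name pattern and picking the lexicographically last match when several wheels are present are assumptions (lexicographic order does not always match version order). Passing `pattern="coglet-*.whl"` would model the same search for `COGLET_WHEEL`, assuming it uses the same order.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Optional


def git_repo_root(cwd: Path) -> Optional[Path]:
    """Return the git top-level directory for `cwd`, or None if there isn't one."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return Path(result.stdout.strip())


def find_dist_wheel(cwd: Path, pattern: str = "cog-*.whl") -> Optional[Path]:
    """Return the first wheel found in the documented search order."""
    candidates: List[Path] = [cwd / "dist"]  # 1. ./dist/
    if os.environ.get("REPO_ROOT"):
        candidates.append(Path(os.environ["REPO_ROOT"]) / "dist")  # 2. $REPO_ROOT/dist/
    git_root = git_repo_root(cwd)
    if git_root is not None:
        candidates.append(git_root / "dist")  # 3. <git-repo-root>/dist/
    for directory in candidates:
        if not directory.is_dir():
            continue
        matches = sorted(directory.glob(pattern))
        if matches:
            # Assumption: pick the lexicographically last match.
            return matches[-1]
    return None
```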

### `COGLET_WHEEL`

Controls which coglet wheel is installed in the Docker image. Coglet is the Rust-based inference server.

**Supported values:** Same as `COG_SDK_WHEEL`.

**Default behaviour:** For development builds, auto-detects a wheel in `dist/`. For release builds, installs the latest version from PyPI.

```console
$ COGLET_WHEEL=dist cog build
$ COGLET_WHEEL=pypi:0.1.0 cog build
```

### `COG_CA_CERT`

Injects a custom CA certificate into the Docker image during `cog build`. This is useful when building behind a corporate proxy or VPN that uses custom certificate authorities (for example, Cloudflare WARP).

**Supported values:**

| Value                            | Description                                                 |
| -------------------------------- | ----------------------------------------------------------- |
| `/path/to/cert.crt`              | Path to a single PEM certificate file                       |
| `/path/to/certs/`                | Directory of `.crt` and `.pem` files (all are concatenated) |
| `-----BEGIN CERTIFICATE-----...` | Inline PEM certificate                                      |
| `LS0tLS1CRUdJTi...`              | Base64-encoded PEM certificate                              |

The certificate is installed into the system CA store and the `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` environment variables are set automatically in the built image.

```console
$ COG_CA_CERT=/usr/local/share/ca-certificates/corporate-ca.crt cog build
$ COG_CA_CERT=/etc/custom-certs/ cog build
$ COG_CA_CERT="$(cat /path/to/cert.pem)" cog build
```
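To make the four accepted forms concrete, here is a sketch of how a value could be classified and turned into PEM text. It is illustrative only: `load_ca_bundle` is a hypothetical name, and the detection order (inline PEM, then directory, then file, then base64), sorting directory entries by name, and joining them with newlines are assumptions rather than documented Cog behavior.

```python
import base64
import binascii
import os
from pathlib import Path

PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def load_ca_bundle(value: str) -> str:
    """Turn a COG_CA_CERT-style value into PEM text (illustrative sketch)."""
    if value.lstrip().startswith(PEM_MARKER):
        return value  # inline PEM certificate
    # os.path.isdir/isfile return False (instead of raising) for strings
    # that can't be paths, such as long base64 blobs.
    if os.path.isdir(value):
        # Concatenate every .crt and .pem file in the directory.
        files = sorted(p for p in Path(value).iterdir() if p.suffix in (".crt", ".pem"))
        return "\n".join(p.read_text() for p in files)
    if os.path.isfile(value):
        return Path(value).read_text()
    try:
        # Ignore line wrapping that tools like `base64` add.
        decoded = base64.b64decode("".join(value.split()), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        raise ValueError("not a file, directory, PEM block, or base64 PEM") from None
    if PEM_MARKER not in decoded:
        raise ValueError("base64 value does not decode to a PEM certificate")
    return decoded
```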

### `COG_OPENAPI_SCHEMA`

Uses a pre-built OpenAPI schema instead of generating one from the configured predict or train reference.

The value must be a path to a JSON schema file. Cog reads that file during schema generation and embeds it in the built image.

```console
$ COG_OPENAPI_SCHEMA=./openapi.json cog build
```

## CLI and local cache variables

### `COG_NO_UPDATE_CHECK`

Disables Cog's automatic update check. Set it to any non-empty value.

```console
$ COG_NO_UPDATE_CHECK=1 cog build
```

### `COG_NO_COLOR`

Disables coloured CLI output. Set it to any non-empty value.

Cog also honours the standard `NO_COLOR` environment variable.

```console
$ COG_NO_COLOR=1 cog run -i prompt="hello"
```

### `COG_SKIP_DOCKER_CHECK`

Skips the `cog doctor` Docker environment check. Set it to any non-empty value.

```console
$ COG_SKIP_DOCKER_CHECK=1 cog doctor
```

### `COG_CACHE_DIR`

Overrides Cog's local cache root.

Cog currently uses this cache for the content-addressed weights store. If unset, Cog uses `$XDG_CACHE_HOME/cog` when `XDG_CACHE_HOME` is set, otherwise `$HOME/.cache/cog`.

```console
$ COG_CACHE_DIR=/mnt/fast-cache cog weights pull
```
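The fallback chain above amounts to three checks. A minimal sketch: `cog_cache_dir` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping


def cog_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve Cog's cache root using the precedence described above."""
    if env.get("COG_CACHE_DIR"):
        return Path(env["COG_CACHE_DIR"])  # explicit override wins
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "cog"
    return Path(env["HOME"]) / ".cache" / "cog"
```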

## Model reference and registry variables

### `COG_MODEL`

Overrides the full model reference used by commands that need a model destination, such as `cog push` and weights commands.

The value is parsed as a complete model reference (`registry/repo`, `registry/repo:tag`, or `registry/repo@digest`). If no tag is supplied, Cog generates a timestamp tag.

When `COG_MODEL` is set, it takes precedence over `COG_MODEL_REGISTRY`, `COG_MODEL_REPO`, and `COG_MODEL_TAG`.

```console
$ COG_MODEL=r8.im/acme/my-model:v1 cog push
```

### `COG_MODEL_REGISTRY`

Overrides only the registry host of the model reference.

```console
$ COG_MODEL_REGISTRY=registry.example.com cog push
```

### `COG_MODEL_REPO`

Overrides only the repository path of the model reference. The value must not include a registry host, tag, or digest.

```console
$ COG_MODEL_REPO=acme/my-model cog push
```

### `COG_MODEL_TAG`

Overrides only the tag of the model reference.

Tags starting with `cog-` are reserved for tags that Cog generates internally and are rejected.

```console
$ COG_MODEL_TAG=staging cog push
```
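The precedence rules for these four variables can be summarized in code. This is a hypothetical sketch, not Cog's parser: `registry`, `repo`, and `tag` stand for whatever reference the command would otherwise use, the timestamp format is invented, generating a timestamp tag when only the component variables are set is assumed by analogy with `COG_MODEL`, and the `COG_MODEL_REPO` check catches tags and digests but not registry hosts.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def timestamp_tag(now: datetime) -> str:
    # Hypothetical format: the docs only say Cog generates a timestamp tag.
    return now.strftime("%Y%m%d%H%M%S")


def resolve_model_ref(
    registry: str,
    repo: str,
    tag: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    now: Optional[datetime] = None,
) -> str:
    """Apply the COG_MODEL* overrides to a base reference (illustrative sketch)."""
    now = now or datetime.now(timezone.utc)

    # COG_MODEL replaces the whole reference and beats the other three.
    full = env.get("COG_MODEL")
    if full:
        last_segment = full.rsplit("/", 1)[-1]  # ignore any registry port
        if ":" in last_segment or "@" in last_segment:
            return full
        return f"{full}:{timestamp_tag(now)}"

    registry = env.get("COG_MODEL_REGISTRY") or registry
    env_repo = env.get("COG_MODEL_REPO")
    if env_repo:
        # Partial check: rejects tags and digests, not registry hosts.
        if ":" in env_repo or "@" in env_repo:
            raise ValueError("COG_MODEL_REPO must not include a tag or digest")
        repo = env_repo
    env_tag = env.get("COG_MODEL_TAG")
    if env_tag:
        if env_tag.startswith("cog-"):
            raise ValueError("tags starting with 'cog-' are reserved for Cog")
        tag = env_tag
    return f"{registry}/{repo}:{tag or timestamp_tag(now)}"
```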

### `COG_REGISTRY_HOST`

Changes the default Replicate-compatible registry host used by commands such as `cog login`, base image resolution, and model reference resolution.

The default is `r8.im`.

```console
$ COG_REGISTRY_HOST=registry.example.com cog login
```

## Runtime server variables

These variables affect a running model server. Set them in `cog.yaml` under `environment`, pass them with `cog run -e`, or set them when running the built Docker image (for example, with `docker run -e`).

### `COG_MAX_CONCURRENCY`

Controls how many predictions the model server can run concurrently.

By default, Cog runs one prediction at a time. Invalid values are ignored and the default of `1` is used.

```console
$ COG_MAX_CONCURRENCY=4 docker run -p 5000:5000 my-model
```

### `COG_SETUP_TIMEOUT`

Controls the maximum time, in seconds, allowed for the model's `setup()` method to complete. If setup exceeds this timeout, the server reports setup failure.

By default, there is no timeout. Set to `0` to disable the timeout. Invalid values are ignored with a warning.

```console
$ COG_SETUP_TIMEOUT=300 docker run -p 5000:5000 my-model
```

### `COG_LOG_LEVEL`

Controls Coglet runtime log verbosity when `RUST_LOG` is not set.

Supported values are `debug`, `info`, `warn`, `warning`, and `error`. The default is `info`.

```console
$ COG_LOG_LEVEL=debug docker run -p 5000:5000 my-model
```

### `COG_THROTTLE_RESPONSE_INTERVAL`

Controls how often asynchronous webhook `output` and `logs` events are sent, in seconds.

The default is `0.5` seconds. Invalid values are ignored and the default is used. `start` and `completed` webhook events are always sent immediately.

```console
$ COG_THROTTLE_RESPONSE_INTERVAL=1 docker run -p 5000:5000 my-model
```
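The throttling rule can be modeled as a small gate that `output` and `logs` events must pass. A sketch under assumptions: `WebhookThrottle` is a hypothetical name, timestamps are plain seconds, and it assumes each webhook carries the prediction's full current state, so a suppressed update is superseded by the next one rather than lost.

```python
from dataclasses import dataclass
from typing import Optional

# Event types that are never delayed, per the description above.
IMMEDIATE_EVENTS = {"start", "completed"}


@dataclass
class WebhookThrottle:
    """Decide whether a webhook event should be sent now."""

    interval: float = 0.5  # seconds, like COG_THROTTLE_RESPONSE_INTERVAL
    last_sent: Optional[float] = None

    def should_send(self, event: str, now: float) -> bool:
        if event in IMMEDIATE_EVENTS:
            return True  # bypass the throttle entirely
        # "output" and "logs" go out at most once per interval.
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False
```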

### `COG_STREAM_HISTORY_CAPACITY`

Controls how many server-sent event stream events are retained per prediction for replay when a client reconnects with `Accept: text/event-stream`.

By default, Cog retains the most recent 1024 events per prediction. Set to `0` to disable replay history while keeping live streaming enabled. Invalid values are ignored with a warning and the default is used.

```console
$ COG_STREAM_HISTORY_CAPACITY=0 docker run -p 5000:5000 my-model
$ COG_STREAM_HISTORY_CAPACITY=4096 docker run -p 5000:5000 my-model
```
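Replay history behaves like a fixed-size ring buffer per prediction. A minimal sketch with `collections.deque`; `StreamHistory`, the integer event ids, and the `after_id` parameter (similar in spirit to the `Last-Event-ID` header in the server-sent events standard) are assumptions about how a reconnecting client might resume, not documented Cog behavior.

```python
from collections import deque
from typing import Deque, List, Tuple

Event = Tuple[int, str]


class StreamHistory:
    """Bounded per-prediction buffer of recent server-sent events."""

    def __init__(self, capacity: int = 1024) -> None:
        # deque(maxlen=0) stores nothing, matching "0 disables replay history".
        self._events: Deque[Event] = deque(maxlen=capacity)
        self._next_id = 0

    def record(self, data: str) -> int:
        """Store an event (evicting the oldest when full) and return its id."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return event_id

    def replay(self, after_id: int = -1) -> List[Event]:
        """Return retained events newer than `after_id`, oldest first."""
        return [(i, d) for i, d in self._events if i > after_id]
```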

### `COG_WEIGHTS`

Provides a weights path or URL to a model whose `setup()` method accepts a `weights` parameter.

```console
$ cog run -e COG_WEIGHTS=https://example.com/weights.tar -i prompt="hello"
```

### `COG_USER_AGENT`

Sets the `User-Agent` header used by Cog when downloading URL-backed `File` inputs.

```console
$ COG_USER_AGENT="my-service/1.0" docker run -p 5000:5000 my-model
```

## Push tuning variables

### `COG_PUSH_OCI`

Enables Cog's OCI chunked push path for container image layers when set to `1`. If the OCI push fails with a non-fatal error, Cog falls back to Docker's native push path.

```console
$ COG_PUSH_OCI=1 cog push
```

### `COG_PUSH_CONCURRENCY`

Controls how many image layers or weight blobs Cog uploads concurrently during push operations.

The default is `5`. Invalid values and values less than `1` are ignored.

```console
$ COG_PUSH_CONCURRENCY=2 cog push
```

### `COG_PUSH_DEFAULT_CHUNK_SIZE`

Sets the default multipart upload chunk size, in bytes, when the registry does not advertise a maximum chunk size.

The default is 96 MiB. Invalid values and values less than `1` are ignored.

```console
$ COG_PUSH_DEFAULT_CHUNK_SIZE=67108864 cog push
```

### `COG_PUSH_MULTIPART_THRESHOLD`

Sets the minimum blob size, in bytes, before Cog uses multipart upload.

The default is 128 MiB. Invalid values and values less than `1` are ignored.

```console
$ COG_PUSH_MULTIPART_THRESHOLD=268435456 cog push
```
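Together, the threshold and chunk size determine how a blob is split. A worked sketch using the documented defaults; `plan_upload` is hypothetical, and using an advertised registry maximum directly as the chunk size is an assumption. With the defaults, a 200 MiB layer is at least 128 MiB, so it is uploaded in multipart chunks of 96, 96, and 8 MiB, while a 100 MiB layer goes up in a single request.

```python
from typing import List, Optional

MIB = 1024 * 1024
DEFAULT_CHUNK_SIZE = 96 * MIB            # COG_PUSH_DEFAULT_CHUNK_SIZE default
DEFAULT_MULTIPART_THRESHOLD = 128 * MIB  # COG_PUSH_MULTIPART_THRESHOLD default


def plan_upload(
    blob_size: int,
    registry_max_chunk: Optional[int] = None,
    chunk_size: int = DEFAULT_CHUNK_SIZE,
    threshold: int = DEFAULT_MULTIPART_THRESHOLD,
) -> List[int]:
    """Return the sizes of the requests used to upload one blob."""
    if blob_size < threshold:
        return [blob_size]  # below the threshold: one single upload
    # Assumption: an advertised registry maximum is used as the chunk size;
    # otherwise fall back to the default chunk size.
    size = registry_max_chunk or chunk_size
    full_chunks, remainder = divmod(blob_size, size)
    return [size] * full_chunks + ([remainder] if remainder else [])
```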


---

# Getting started with your own model

This guide will show you how to put your own machine learning model in a Docker image using Cog. If you haven't got a model to try out, you'll want to follow the [main getting started guide](getting-started.md).

## Prerequisites

- **macOS, Linux or Windows 11**. Cog works on macOS, Linux and Windows 11 with [WSL 2](wsl2/wsl2.md).
- **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog.

## Initialization

First, install Cog if you haven't already:

**macOS (recommended):**

```sh
brew install replicate/tap/cog
```

**Linux or macOS (manual):**

```sh
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
```

To configure your project for use with Cog, you'll need to add two files:

- [`cog.yaml`](yaml.md) defines system requirements, Python package dependencies, etc.
- [`run.py`](python.md) describes the run interface for your model

Use the `cog init` command to generate these files in your project:

```sh
$ cd path/to/your/model
$ cog init
```

## Define the Docker environment

The `cog.yaml` file defines all the different things that need to be installed for your model to run. You can think of it as a simple way of defining a Docker image.

For example:

```yaml
build:
  python_version: "3.13"
  python_requirements: requirements.txt
```

With a `requirements.txt` containing your dependencies:

```
torch==2.6.0
```

This will generate a Docker image with Python 3.13 and PyTorch 2 installed, for both CPU and GPU, with the correct version of CUDA, and various other sensible best-practices.

To run a command inside this environment, prefix it with `cog exec`:

```console
$ cog exec python
✓ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
────────────────────────────────────────────────────────────────────────────────────────

Python 3.13.x (main, ...)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

This is handy for ensuring a consistent environment for development or training.

With `cog.yaml`, you can also install system packages and other things. [Take a look at the full reference to see what else you can do.](yaml.md)

## Define how to run your model

The next step is to update `run.py` to define the interface for running your model. The `run.py` generated by `cog init` looks something like this:

```python
from cog import BaseRunner, Path, Input
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.net = torch.load("weights.pth")

    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run the model"""
        # ... pre-processing ...
        output = self.net(image)
        # ... post-processing ...
        return output
```

Edit your `run.py` file and fill in the functions with your own model's setup and run code. You might need to import parts of your model from another file.

You also need to define the inputs to your model as arguments to the `run()` function, as demonstrated above. Each argument must be annotated with a type. The supported types are:

- `str`: a string
- `int`: an integer
- `float`: a floating point number
- `bool`: a boolean
- `cog.File`: a file-like object representing a file (deprecated — use `cog.Path` instead)
- `cog.Path`: a path to a file on disk

You can provide more information about the input with the `Input()` function, as shown above. It takes these basic arguments:

- `description`: A description of what to pass to this input for users of the model
- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
- `ge`: For `int` or `float` types, the value should be greater than or equal to this number.
- `le`: For `int` or `float` types, the value should be less than or equal to this number.
- `min_length`: For `str` types, the minimum length of the string.
- `max_length`: For `str` types, the maximum length of the string.
- `regex`: For `str` types, the string must match this regular expression.
- `choices`: For `str` or `int` types, a list of possible values for this input.
- `deprecated`: Mark this input as deprecated with a message explaining what to use instead.
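To make the meaning of these constraints concrete, here is a plain-Python sketch of the checks that `ge`, `le`, `min_length`, `max_length`, and `choices` describe. `check_constraints` is a hypothetical helper written for illustration, not part of the `cog` package. Cog derives and enforces these rules itself from your `Input()` arguments. `regex` is left out for brevity.

```python
from typing import Any, Optional, Sequence


def check_constraints(
    value: Any,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    choices: Optional[Sequence[Any]] = None,
) -> list[str]:
    """Return the constraints `value` violates (an empty list means valid).

    Hypothetical helper that mirrors the documented meaning of the Input()
    arguments; it is not Cog's own validation code.
    """
    errors: list[str] = []
    # ge / le: inclusive bounds for int and float inputs
    if ge is not None and value < ge:
        errors.append(f"must be >= {ge}")
    if le is not None and value > le:
        errors.append(f"must be <= {le}")
    # min_length / max_length: bounds on the length of str inputs
    if min_length is not None and len(value) < min_length:
        errors.append(f"length must be >= {min_length}")
    if max_length is not None and len(value) > max_length:
        errors.append(f"length must be <= {max_length}")
    # choices: the value must be one of an explicit list
    if choices is not None and value not in choices:
        errors.append(f"must be one of {list(choices)}")
    return errors
```

For example, `check_constraints(5.0, ge=1.0, le=4.0)` reports that `5.0` is above the upper bound, which corresponds to an input declared with `Input(ge=1.0, le=4.0)` rejecting that value.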

There are some more advanced options you can pass, too. For more details, [take a look at the run interface documentation](python.md).

Next, add the line `run: "run.py:Runner"` to your `cog.yaml`, so it looks something like this:

```yaml
build:
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"
```

That's it! To test that this works, try running the model:

```
$ cog run -i image=@input.jpg
✓ Building Docker image from cog.yaml... Successfully built 664ef88bc1f4
✓ Model running in Docker image 664ef88bc1f4

Written output to output.png
```

To pass more inputs to the model, you can add more `-i` options:

```
$ cog run -i image=@image.jpg -i scale=2.0
```

Here `scale` is a number, not a file, so it doesn't need the `@` prefix.

## Using GPUs

To use GPUs with Cog, add the `gpu: true` option to the `build` section of your `cog.yaml`:

```yaml
build:
  gpu: true
  ...
```

Cog will use the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) base image and automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and TensorFlow that you are using.

For more details, [see the `gpu` section of the `cog.yaml` reference](yaml.md#gpu).

## Next steps

Next, you might want to take a look at:

- [A guide explaining how to deploy a model.](deploy.md)
- [The reference for `cog.yaml`](yaml.md)
- [The reference for the Python library](python.md)


---

# Getting started

This guide will walk you through what you can do with Cog by using an example model.

> [!TIP]
> Using a language model to help you write the code for your new Cog model?
>
> Feed it [https://cog.run/llms.txt](https://cog.run/llms.txt), which has all of Cog's documentation bundled into a single file. To learn more about this format, check out [llmstxt.org](https://llmstxt.org).

## Prerequisites

- **macOS or Linux**. Cog works on macOS and Linux, but does not currently support Windows.
- **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog.

## Install Cog

**macOS (recommended):**

```bash
brew install replicate/tap/cog
```

**Linux or macOS (manual):**

```bash
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
sudo xattr -d com.apple.quarantine /usr/local/bin/cog 2>/dev/null || true
```

> [!NOTE]
> **macOS: "cannot be opened because the developer cannot be verified"**
>
> If you downloaded the binary manually (via `curl` or a browser) and see this Gatekeeper warning, run:
>
> ```bash
> sudo xattr -d com.apple.quarantine /usr/local/bin/cog
> ```
>
> Installing via `brew install replicate/tap/cog` handles this automatically.

## Create a project

Let's make a directory to work in:

```bash
mkdir cog-quickstart
cd cog-quickstart
```

## Run commands

The simplest thing you can do with Cog is run a command inside a Docker environment.

The first thing you need to do is create a file called `cog.yaml`:

```yaml
build:
  python_version: "3.13"
```

Then, you can run any command inside this environment. For example, enter

```bash
cog exec python
```

and you'll get an interactive Python shell:

```none
✓ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Python 3.13.x (main, ...)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

(Hit Ctrl-D to exit the Python shell.)

Inside this Docker environment you can do anything – run a Jupyter notebook, your training script, your evaluation script, and so on.

## Run a model

Let's pretend we've trained a model. With Cog, we can define how to run it in a standard way, so other people can easily run it without having to hunt around for a run script.

We need to write some code to describe how the model runs.

Save this to `run.py`:

```python
import os
os.environ["TORCH_HOME"] = "."

import torch
from cog import BaseRunner, Input, Path
from PIL import Image
from torchvision import models

WEIGHTS = models.ResNet50_Weights.IMAGENET1K_V1


class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = models.resnet50(weights=WEIGHTS).to(self.device)
        self.model.eval()

    def run(self, image: Path = Input(description="Image to classify")) -> dict:
        """Run the model"""
        img = Image.open(image).convert("RGB")
        preds = self.model(WEIGHTS.transforms()(img).unsqueeze(0).to(self.device))
        top3 = preds[0].softmax(0).topk(3)
        categories = WEIGHTS.meta["categories"]
        return {categories[i]: p.detach().item() for p, i in zip(*top3)}
```

We also need to point Cog at this, and tell it what Python dependencies to install.

Save this to `requirements.txt`:

```
pillow==11.1.0
torch==2.6.0
torchvision==0.21.0
```

Then update `cog.yaml` to look like this:

```yaml
build:
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"
```

> [!TIP]
> If you have a machine with an NVIDIA GPU attached, add `gpu: true` to the `build` section of your `cog.yaml` to enable GPU acceleration.

Let's grab an image to test the model with:

```bash
IMAGE_URL=https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg
curl $IMAGE_URL > input.jpg
```

Now, let's run the model using Cog:

```bash
cog run -i image=@input.jpg
```

If you see the following output

```json
{
  "tiger_cat": 0.4874822497367859,
  "tabby": 0.23169134557247162,
  "Egyptian_cat": 0.09728282690048218
}
```

then it worked!

Note: The first time you run `cog run`, the build process will be triggered to generate a Docker container that can run your model. The next time you run `cog run` the pre-built container will be used.

## Build an image

You can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image runs an HTTP server and can be deployed anywhere Docker runs to serve real-time inference.

```bash
cog build -t resnet
# Building Docker image...
# Built resnet:latest
```

You can run this image with `cog run` by passing the image name as an argument:

```bash
cog run resnet -i image=@input.jpg
```

Or, you can run it with Docker directly, and it'll serve an HTTP server:

```bash
docker run -d --rm -p 5000:5000 resnet
```

We can send inputs directly with `curl`:

```bash
curl http://localhost:5000/predictions -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"image": "https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg"}}'
```
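If you'd rather call the server from Python, the sketch below sends the same synchronous request using only the standard library. The function names are illustrative, and the example assumes the container from the previous step is listening on `localhost:5000`.

```python
import json
import urllib.request


def build_prediction_request(base_url: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) a synchronous POST /predictions request."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_prediction(base_url: str, inputs: dict, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded prediction object."""
    request = build_prediction_request(base_url, inputs)
    # A synchronous request blocks until the prediction finishes
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_prediction("http://localhost:5000", {"image": "<image URL>"})` should return the prediction object, with the classification in its `output` field.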

As a shorthand, you can add the Docker image's name as an extra line in `cog.yaml`:

```yaml
image: "r8.im/replicate/resnet"
```

Once you've done this, you can use `cog push` to build and push the image to a Docker registry:

```bash
cog push
# Building r8.im/replicate/resnet...
# Pushing r8.im/replicate/resnet...
# Pushed!
```

The Docker image is now accessible to anyone or any system that has access to this Docker registry.

## Next steps

Those are the basics! Next, you might want to take a look at:

- [A guide to help you set up your own model on Cog.](getting-started-own-model.md)
- [A guide explaining how to deploy a model.](deploy.md)
- [Reference for `cog.yaml`](yaml.md)
- [Reference for the Python library](python.md)


---

# HTTP API

> [!TIP]
> For information about how to run the HTTP server,
> see [our documentation on deploying models](deploy.md).

When you run a Docker image built by Cog,
it serves an HTTP API for making predictions.

The server supports both synchronous and asynchronous prediction creation:

- **Synchronous**:
  The server waits until the prediction is completed
  and responds with the result.
- **Asynchronous**:
  The server immediately returns a response
  and processes the prediction in the background.

The client can create a prediction asynchronously
by setting the `Prefer: respond-async` header in their request
or by requesting a streamed response with `Accept: text/event-stream`.
With `Prefer: respond-async`,
the server responds immediately after starting the prediction
with `202 Accepted` status and a prediction object in status `starting`.
With `Accept: text/event-stream`,
the server responds with `200 OK` and keeps the response open
as a server-sent event stream.

> [!NOTE]
> For JSON responses, the only supported way to receive updates on the status
> of predictions started asynchronously is using [webhooks](#webhooks).
> Polling for prediction status is not currently supported.

You can also create predictions idempotently
with the `PUT /predictions/<prediction_id>` endpoints.
If a client calls one of these endpoints more than once with the same ID
(for example, due to a network interruption)
while the prediction is still running,
no new prediction is created.
Instead, the client receives the response type requested by the retry:
JSON for regular requests or a server-sent event stream for streaming requests.

---

Here's a summary of the prediction creation endpoints:

| Endpoint                           | Header                      | Behavior                     |
| ---------------------------------- | --------------------------- | ---------------------------- |
| `POST /predictions`                | -                           | Synchronous, non-idempotent  |
| `POST /predictions`                | `Prefer: respond-async`     | Asynchronous, non-idempotent |
| `POST /predictions`                | `Accept: text/event-stream` | Streaming, non-idempotent    |
| `PUT /predictions/<prediction_id>` | -                           | Synchronous, idempotent      |
| `PUT /predictions/<prediction_id>` | `Prefer: respond-async`     | Asynchronous, idempotent     |
| `PUT /predictions/<prediction_id>` | `Accept: text/event-stream` | Streaming, idempotent        |

Choose the endpoint that best fits your needs:

- Use synchronous endpoints when you want to wait for the prediction result.
- Use asynchronous endpoints when you want to start a prediction
  and receive updates via webhooks.
- Use streaming endpoints when you want to receive prediction lifecycle events
  over the HTTP response as they happen.
- Use idempotent endpoints when you need to safely retry requests
  without creating duplicate predictions.
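The table reduces to two independent choices:
the method and path (idempotent or not)
and one optional header (synchronous, asynchronous, or streaming).
A small helper can encode that mapping.
`choose_endpoint` and its `mode` names are hypothetical, not part of Cog.

```python
from typing import Optional

# Extra header for each way of receiving the result
MODE_HEADERS = {
    "sync": {},
    "async": {"Prefer": "respond-async"},
    "stream": {"Accept": "text/event-stream"},
}


def choose_endpoint(
    mode: str, prediction_id: Optional[str] = None
) -> tuple[str, str, dict[str, str]]:
    """Return (method, path, headers) for a prediction request.

    Passing a prediction_id selects the idempotent PUT endpoint;
    otherwise the non-idempotent POST endpoint is used.
    """
    if mode not in MODE_HEADERS:
        raise ValueError(f"unknown mode: {mode!r}")
    if prediction_id is None:
        method, path = "POST", "/predictions"
    else:
        method, path = "PUT", f"/predictions/{prediction_id}"
    # Every prediction request body is JSON, whichever mode is chosen
    headers = {"Content-Type": "application/json", **MODE_HEADERS[mode]}
    return method, path, headers
```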

## Streaming predictions with server-sent events

To produce streamed prediction events,
the model must return an iterator and opt in to SSE streaming
with the `streaming` decorator.

```python
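from typing import Iterator

from cog import BaseRunner, Input, streaming


def generate_tokens(prompt: str) -> Iterator[str]:
    # Hypothetical stand-in for your model's token generator
    for word in prompt.split():
        yield word + " "


class Runner(BaseRunner):
    @streaming
    def run(self, prompt: str = Input(description="Prompt")) -> Iterator[str]:
        for token in generate_tokens(prompt):
            yield token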
```

The decorator can be written as `@streaming`
when imported with `from cog import streaming`, as above,
or as `@cog.streaming` after `import cog`.
The parenthesized forms `@streaming()` and `@cog.streaming()` are also accepted.
Without the decorator,
iterator outputs still work in normal JSON responses,
but requests with `Accept: text/event-stream` return `406 Not Acceptable`.

To consume a streamed prediction,
send the prediction request with `Accept: text/event-stream`:

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Accept: text/event-stream

{
    "input": {"prompt": "Write a haiku about onions"}
}
```

The server starts the prediction asynchronously
and keeps the HTTP response open as a server-sent event stream.
Each event has an `event` name and JSON `data` payload:

```text
event: start
data: {"id":"abc123","status":"processing"}

event: output
data: {"chunk":"Onions","index":0}

event: output
data: {"chunk":" bloom","index":1}

event: completed
data: {"id":"abc123","status":"succeeded","output":["Onions"," bloom"],"metrics":{"predict_time":0.42}}
```

Prediction streams can emit these event types:

- `start`: The prediction started processing.
- `output`: The model yielded an output chunk.
  The payload includes `chunk` and `index`.
- `log`: The model wrote to `stdout` or `stderr`.
  The payload includes `source` and `data`.
- `metric`: The model recorded a custom metric.
  The payload includes `name`, `value`, and `mode`.
- `completed`: The prediction reached a terminal state.
  The payload is the final prediction object,
  with `status` set to `succeeded`, `failed`, or `canceled`.
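Each event is a group of `event:` and `data:` lines terminated by a blank line.
The following minimal parser turns such a stream into `(event, payload)` pairs.
It is a sketch that covers only what the example stream uses,
not a complete implementation of the server-sent events specification.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_name, decoded_json_data) pairs from text/event-stream lines."""
    event, data_lines = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, ignored
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
    # Per the SSE spec, an event not followed by a blank line is discarded
```

Because responses from `urllib.request.urlopen()` can be iterated line by line,
`parse_sse(line.decode("utf-8") for line in resp)` should also work
against a live `Accept: text/event-stream` response.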

For command-line clients,
use a client that prints the response as data arrives:

```bash
curl -N \
  -H 'Accept: text/event-stream' \
  -H 'Content-Type: application/json' \
  -d '{"input":{"prompt":"Write a haiku about onions"}}' \
  http://localhost:5000/predictions
```

For browser clients,
use `fetch()` or another client that supports request bodies.
The browser `EventSource` API only supports `GET` requests,
so it cannot create a prediction with `POST /predictions` or
`PUT /predictions/<prediction_id>`.

```js
const response = await fetch("/predictions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "text/event-stream",
  },
  body: JSON.stringify({ input: { prompt: "Write a haiku about onions" } }),
});

const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  console.log(value);
}
```

Use `PUT /predictions/<prediction_id>` when the client needs safe retries
or wants to reconnect to an in-flight prediction by ID:

```http
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
Accept: text/event-stream

{
    "input": {"prompt": "Write a haiku about onions"}
}
```

If the prediction is still running,
the server returns a stream for the existing prediction
instead of creating a duplicate prediction.
If earlier events have been dropped from the replay buffer,
the stream emits an `error` event and closes.
The replay buffer keeps the most recent 1024 events by default.
Set `COG_STREAM_HISTORY_CAPACITY` to change this limit,
or set it to `0` to disable replay history while keeping live streaming enabled.
Training endpoints do not support SSE streaming;
requests to `/trainings` with `Accept: text/event-stream`
return `406 Not Acceptable`.
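The replay rule can be pictured as a fixed-size queue.
The sketch below is a conceptual model written for this guide, not Cog's implementation:
it keeps the newest `capacity` events
and refuses a full replay once older events have been discarded,
which corresponds to the `error` event described above.
Reading `COG_STREAM_HISTORY_CAPACITY` with a default of 1024 mirrors the documented setting.

```python
import os
from collections import deque


def history_capacity() -> int:
    """Replay limit, read from the documented variable (default 1024)."""
    return int(os.environ.get("COG_STREAM_HISTORY_CAPACITY", "1024"))


class ReplayBuffer:
    """Conceptual model of bounded replay history; not Cog's implementation."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full
        self.events: deque = deque(maxlen=capacity)
        self.total = 0  # number of events ever appended

    def append(self, event: str) -> None:
        self.events.append(event)
        self.total += 1

    def replay(self) -> list:
        """Return the full history for a client that reconnects by ID."""
        if len(self.events) < self.total:
            # Early events are gone; per the docs the real server emits
            # an `error` event and closes the stream in this case.
            raise LookupError("replay history is incomplete")
        return list(self.events)
```

A bounded queue caps memory per prediction,
at the cost that a long stream can no longer be replayed from the beginning.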

## Webhooks

You can provide a `webhook` parameter in the client request body
when creating a prediction.

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"},
    "webhook": "https://example.com/webhook/prediction"
}
```

The server makes requests to the provided URL
with the current state of the prediction object in the request body
at the following times.

- `start`:
  Once, when the prediction starts
  (`status` is `starting`).
- `output`:
  Each time the run function generates an output
  (either once using `return` or multiple times using `yield`).
- `logs`:
  Each time the run function writes to `stdout`.
- `completed`:
  Once, when the prediction reaches a terminal state
  (`status` is `succeeded`, `canceled`, or `failed`).

Webhook requests for `start` and `completed` event types
are sent immediately.
Webhook requests for `output` and `logs` event types
are sent at most once every 500ms.
This interval is not configurable.

By default, the server sends requests for all event types.
Clients can specify which events trigger webhook requests
with the `webhook_events_filter` parameter in the prediction request body.
For example,
the following request specifies that webhooks are sent by the server
only at the start and end of the prediction:

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"},
    "webhook": "https://example.com/webhook/prediction",
    "webhook_events_filter": ["start", "completed"]
}
```
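On the receiving side,
a webhook endpoint only needs to accept a request carrying the prediction object as JSON.
Below is a minimal standard-library receiver,
assuming the server delivers webhooks as `POST` requests with a `Content-Length` header.
The handler, the `received` list, and `serve_in_background` are illustrative choices.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received: list[dict] = []  # prediction objects delivered so far

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", "0"))
        prediction = json.loads(self.rfile.read(length) or b"{}")
        received.append(prediction)
        if prediction.get("status") in TERMINAL_STATUSES:
            # The `completed` webhook carries the final state exactly once
            print("prediction finished with status", prediction["status"])
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the console quiet; remove to log each request


def serve_in_background(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the receiver on a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a real deployment, bind to an address the model container can reach
and pass that URL as `webhook` in the prediction request.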

## Generating unique prediction IDs

Endpoints for creating and canceling a prediction idempotently
accept a `prediction_id` parameter in their path.
By default, the server runs one prediction at a time,
but this can be increased with the [`concurrency.max`](yaml.md#concurrency) setting.
When all prediction slots are in use, the server returns `409 Conflict`.
The client should ensure prediction slots are available
before creating a new prediction with a different ID.

Clients are responsible for providing unique prediction IDs.
We recommend generating a UUIDv4 or [UUIDv7](https://uuid7.com),
base32-encoding its 16 bytes,
lowercasing the result,
and removing the trailing padding characters (`======`).
This produces an identifier that is 26 ASCII characters long.

```python
>>> from uuid import uuid4
>>> from base64 import b32encode
>>> b32encode(uuid4().bytes).decode('utf-8').lower().rstrip('=')
'wjx3whax6rf4vphkegkhcvpv6a'
```

## File uploads

A model's `run` function can produce file output by yielding or returning
a `cog.Path` or `cog.File` value.

By default,
files are returned as a base64-encoded
[data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs).

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,..."
}
```
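A client that receives a data URL can recover the file
by splitting off the `data:<media type>;base64,` header and decoding the rest.
A minimal sketch using only the standard library (the helper name is illustrative):

```python
import base64


def decode_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (media_type, raw_bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data URL of the form data:<type>;base64,<data>")
    media_type = header[len("data:") : -len(";base64")]
    # validate=True rejects characters outside the base64 alphabet
    return media_type, base64.b64decode(payload, validate=True)
```

For example, `media_type, data = decode_data_url(prediction["output"])`
followed by `pathlib.Path("output.png").write_bytes(data)` saves the image to disk.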

When creating a prediction synchronously,
the client can instead have output files uploaded to a base URL
by setting the `output_file_prefix` parameter in the request body:

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"},
    "output_file_prefix": "https://example.com/upload"
}
```

When the model produces a file output,
the server sends the following request to upload the file to the configured URL:

```http
PUT /upload HTTP/1.1
Host: example.com
Content-Type: multipart/form-data; boundary=boundary

--boundary
Content-Disposition: form-data; name="file"; filename="image.png"
Content-Type: image/png

<binary data>
--boundary--
```

If the upload succeeds, the server responds with the uploaded file's URL as the output:

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "https://example.com/upload/image.png"
}
```

If the upload fails, the server responds with an error.

> [!IMPORTANT]  
> File uploads for predictions created asynchronously
> require `--upload-url` to be specified when starting the HTTP server.

<a id="api"></a>

## Endpoints

### `GET /`

Returns a discovery document listing available API endpoints, the OpenAPI schema URL, and version information.

```http
GET / HTTP/1.1
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "cog_version": "0.17.0",
    "docs_url": "/docs",
    "openapi_url": "/openapi.json",
    "shutdown_url": "/shutdown",
    "healthcheck_url": "/health-check",
    "predictions_url": "/predictions",
    "predictions_idempotent_url": "/predictions/{prediction_id}",
    "predictions_cancel_url": "/predictions/{prediction_id}/cancel"
}
```

If training is configured, the response also includes
`trainings_url`, `trainings_idempotent_url`, and `trainings_cancel_url` fields.

### `GET /health-check`

Returns the current health status of the model container.
This endpoint always responds with `200 OK` —
check the `status` field in the response body to determine readiness.

The response body is a JSON object with the following fields:

- `status`: One of the following values:
  - `STARTING`: The model's `setup()` method is still running.
  - `READY`: The model is ready to accept predictions.
  - `BUSY`: The model is ready but all prediction slots are in use.
  - `SETUP_FAILED`: The model's `setup()` method raised an exception.
  - `DEFUNCT`: The model encountered an unrecoverable error.
  - `UNHEALTHY`: The model is ready
    but a user-defined `healthcheck()` method returned `False`.
- `setup`: Setup phase details (included once setup has started):
  - `started_at`: ISO 8601 timestamp of when setup began.
  - `completed_at`: ISO 8601 timestamp of when setup finished (if complete).
  - `status`: One of `starting`, `succeeded`, or `failed`.
  - `logs`: Output captured during setup.
- `version`: Runtime version information:
  - `coglet`: Coglet version.
  - `cog`: Cog Python SDK version (if available).
  - `python`: Python version (if available).
- `user_healthcheck_error`:
  Error message from a user-defined `healthcheck()` method (if applicable).

```http
GET /health-check HTTP/1.1
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "READY",
    "setup": {
        "started_at": "2025-01-01T00:00:00.000000+00:00",
        "completed_at": "2025-01-01T00:00:05.000000+00:00",
        "status": "succeeded",
        "logs": ""
    },
    "version": {
        "coglet": "0.17.0",
        "cog": "0.14.0",
        "python": "3.13.0"
    }
}
```
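Because `/health-check` always returns `200 OK`,
a client waiting for the model has to inspect `status` rather than the HTTP code.
The sketch below polls until `READY` and stops early on `SETUP_FAILED` or `DEFUNCT`.
`fetch_health` and `wait_until_ready` are hypothetical helpers,
and the timeout and interval defaults are arbitrary.

```python
import json
import time
import urllib.request
from typing import Callable

UNRECOVERABLE = {"SETUP_FAILED", "DEFUNCT"}


def fetch_health(base_url: str) -> dict:
    """GET /health-check and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health-check", timeout=10) as resp:
        return json.load(resp)


def wait_until_ready(
    fetch: Callable[[], dict], timeout: float = 600.0, interval: float = 1.0
) -> dict:
    """Poll until status is READY; raise on unrecoverable states or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        health = fetch()
        status = health.get("status")
        if status == "READY":
            return health
        if status in UNRECOVERABLE:
            raise RuntimeError(f"model cannot serve predictions: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout}s")
        time.sleep(interval)  # STARTING, BUSY, or UNHEALTHY: check again later
```

Usage might look like `wait_until_ready(lambda: fetch_health("http://localhost:5000"))`.
Treating `BUSY` as "keep waiting" suits a client that wants a free slot;
a pure startup check could accept `BUSY` as well.
While the container is still booting, `fetch_health` may raise a connection error,
which you may want to catch and retry.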

### `GET /openapi.json`

The [OpenAPI](https://swagger.io/specification/) specification of the API,
which is derived from the input and output types specified in your model's
[Runner](python.md) and [Training](training.md) objects.

### `POST /predictions`

Makes a single prediction.

The request body is a JSON object with the following fields:

- `input`:
  A JSON object with the same keys as the
  [arguments to the `run()` function](python.md).
  Any `File` or `Path` inputs are passed as URLs.

The response body is a JSON object with the following fields:

- `status`: Either `succeeded` or `failed`.
- `output`: The return value of the `run()` function.
- `error`: If `status` is `failed`, the error message.
- `metrics`: An object containing prediction metrics.
  Always includes `predict_time` (elapsed seconds).
  May also include custom metrics recorded by the model
  using [`self.record_metric()`](python.md#metrics).

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {
        "image": "https://example.com/image.jpg",
        "text": "Hello world!"
    }
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,...",
    "metrics": {
        "predict_time": 4.52
    }
}
```

If the client sets the `Prefer: respond-async` header in their request,
the server responds immediately after starting the prediction
with `202 Accepted` status and a prediction object in status `starting`.

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "status": "starting"
}
```

If the client sets the `Accept: text/event-stream` header,
the server starts the prediction asynchronously and responds with a
server-sent event stream.
See [Streaming predictions with server-sent events](#streaming-predictions-with-server-sent-events).

### `PUT /predictions/<prediction_id>`

Makes a single prediction.
This is the idempotent version of the `POST /predictions` endpoint.

```http
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,..."
}
```

If the client sets the `Prefer: respond-async` header in their request,
the server responds immediately after starting the prediction
with `202 Accepted` status and a prediction object in status `starting`.

```http
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "id": "wjx3whax6rf4vphkegkhcvpv6a",
    "status": "starting"
}
```

If the client sets the `Accept: text/event-stream` header,
the server starts the prediction asynchronously and responds with a
server-sent event stream.
If a prediction with the same ID is already running,
the server returns a stream for the existing prediction.
See [Streaming predictions with server-sent events](#streaming-predictions-with-server-sent-events).
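Idempotency is what makes client-side retries safe.
The sketch below retries a synchronous `PUT /predictions/<prediction_id>` on network errors
while reusing one ID,
so a retry reaches the existing prediction instead of creating a second one.
`put_prediction` and `with_retries` are illustrative names.
HTTP error responses such as `409 Conflict` are passed to the caller rather than retried.

```python
import json
import urllib.error
import urllib.request
from typing import Callable


def put_prediction(base_url: str, prediction_id: str, inputs: dict) -> dict:
    """Send PUT /predictions/<id> synchronously and return the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/predictions/{prediction_id}",
        data=json.dumps({"input": inputs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)


def with_retries(send: Callable[[], dict], attempts: int = 3) -> dict:
    """Retry `send` on network errors; every attempt must reuse the same ID."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(attempts):
        try:
            return send()
        except urllib.error.HTTPError:
            raise  # the server answered; let the caller handle e.g. 409 Conflict
        except (urllib.error.URLError, ConnectionError, TimeoutError) as err:
            last_error = err  # network problem: safe to retry with the same ID
    raise last_error
```

Generate the ID once, for example with
`b32encode(uuid4().bytes).decode("utf-8").lower().rstrip("=")`
as described in the section on unique prediction IDs,
then call `with_retries(lambda: put_prediction("http://localhost:5000", prediction_id, inputs))`.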

### `POST /predictions/<prediction_id>/cancel`

A client can cancel an asynchronous prediction by making a
`POST /predictions/<prediction_id>/cancel` request
using the prediction `id` provided when the prediction was created.

For example,
if the client creates a prediction by sending the request:

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "id": "abcd1234",
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

The client can cancel the prediction by sending the request:

```http
POST /predictions/abcd1234/cancel HTTP/1.1
```

A prediction cannot be canceled if it was
created synchronously (without the `Prefer: respond-async` header)
or created without a client-provided `id`.

If a prediction exists with the provided `id`,
the server responds with status `200 OK`.
Otherwise, the server responds with status `404 Not Found`.

When a prediction is canceled,
Cog raises [`CancelationException`](python.md#cancelationexception)
in sync runners (or `asyncio.CancelledError` in async runners).
This exception may be caught by the model to perform necessary cleanup.
The cleanup should be brief, ideally completing within a few seconds.
After cleanup, the exception must be re-raised using a bare `raise` statement.
Failure to re-raise the exception may result in the termination of the container.

```python
from cog import BaseRunner, CancelationException, Input, Path

class Runner(BaseRunner):
    def run(self, image: Path = Input(description="Image to process")) -> Path:
        try:
            return self.process(image)
        except CancelationException:
            self.cleanup()
            raise  # always re-raise
```


---

# Notebooks

Cog plays nicely with Jupyter notebooks.

## Install the jupyterlab Python package

First, add `jupyterlab` to your `requirements.txt` file and reference it in [`cog.yaml`](yaml.md):

`requirements.txt`:

```
jupyterlab
```

`cog.yaml`:

```yaml
build:
  python_requirements: requirements.txt
```

## Run a notebook

Cog can run notebooks in the environment you've defined in `cog.yaml` with the following command:

```sh
cog exec -p 8888 jupyter lab --allow-root --ip=0.0.0.0
```

## Use notebook code in your runner

You can also import a notebook into your Cog [Runner](python.md) file.

First, export your notebook to a Python file:

```sh
jupyter nbconvert --to script my_notebook.ipynb # creates my_notebook.py
```

Then import the exported Python script into your `run.py` file. Any functions or variables defined in your notebook will be available to your runner:

```python
from cog import BaseRunner, Input

import my_notebook

class Runner(BaseRunner):
    def run(self, prompt: str = Input(description="string prompt")) -> str:
        output = my_notebook.do_stuff(prompt)
        return output
```


---

# Private package registry

This guide describes how to build a Docker image with Cog that fetches Python packages from a private registry during setup.

## `pip.conf`

In a directory outside your Cog project, create a `pip.conf` file with an `index-url` set to the registry's URL with embedded credentials.

```conf
[global]
index-url = https://username:password@my-private-registry.com
```

> [!WARNING]
> Be careful not to commit secrets in Git or include them in Docker images. If your Cog project contains any sensitive files, make sure they're listed in `.gitignore` and `.dockerignore`.

## `cog.yaml`

In your project's [`cog.yaml`](yaml.md) file, add a setup command that runs `pip install` for your private packages, with a secret configuration file mounted to `/etc/pip.conf`.

```yaml
build:
  run:
    - command: pip install my-private-package # hypothetical package name; replace with your own
      mounts:
        - type: secret
          id: pip
          target: /etc/pip.conf
```

## Build

When building or pushing your model with Cog, pass the `--secret` option with an `id` matching the one specified in `cog.yaml`, along with a path to your local `pip.conf` file.

```console
$ cog build --secret id=pip,source=/path/to/pip.conf
```

Using a secret mount allows the private registry credentials to be securely passed to the `pip install` setup command, without baking them into the Docker image.

> [!WARNING]
> If you run `cog build` or `cog push` and then change the contents of a secret source file, the cached version of the file will be used on subsequent builds, ignoring any changes you made. To update the contents of the target secret file, either change the `id` value in `cog.yaml` and the `--secret` option, or pass the `--no-cache` option to bypass the cache entirely.


---

# Run interface reference

This document defines the API of the `cog` Python module, which is used to define the interface for running your model.

> [!TIP]
> Run [`cog init`](getting-started-own-model.md#initialization) to generate an annotated `run.py` file that can be used as a starting point for setting up your model.

> [!TIP]
> Using a language model to help you write the code for your new Cog model?
>
> Feed it [https://cog.run/llms.txt](https://cog.run/llms.txt), which has all of Cog's documentation bundled into a single file. To learn more about this format, check out [llmstxt.org](https://llmstxt.org).

## Contents

- [Contents](#contents)
- [`BaseRunner`](#baserunner)
  - [`Runner.setup()`](#runnersetup)
  - [`Runner.run(**kwargs)`](#runnerrunkwargs)
- [`async` runners and concurrency](#async-runners-and-concurrency)
- [`Input(**kwargs)`](#inputkwargs)
  - [Deprecating inputs](#deprecating-inputs)
- [Output](#output)
  - [Returning an object](#returning-an-object)
  - [Returning a list](#returning-a-list)
  - [Optional properties](#optional-properties)
  - [Streaming output](#streaming-output)
- [Metrics](#metrics)
  - [Recording metrics](#recording-metrics)
  - [Naming rules](#naming-rules)
  - [Accumulation modes](#accumulation-modes)
  - [Dot-path keys](#dot-path-keys)
  - [Type safety](#type-safety)
- [Cancellation](#cancellation)
  - [`CancelationException`](#cancelationexception)
- [Input and output types](#input-and-output-types)
  - [Primitive types](#primitive-types)
  - [`cog.Path`](#cogpath)
  - [`cog.File` (deprecated)](#cogfile-deprecated)
  - [`cog.Secret`](#cogsecret)
  - [Wrapper types](#wrapper-types)
    - [`Optional`](#optional)
    - [`list`](#list)
    - [`dict`](#dict)
    - [`cog.Opaque`](#cogopaque)
  - [Structured output with `BaseModel`](#structured-output-with-basemodel)
    - [Using `cog.BaseModel`](#using-cogbasemodel)
    - [Using Pydantic `BaseModel`](#using-pydantic-basemodel)
    - [`BaseModel` field types](#basemodel-field-types)
  - [Type limitations](#type-limitations)

## `BaseRunner`

You define how Cog runs your model by defining a class that inherits from `BaseRunner`. It looks something like this:

```python
from cog import BaseRunner, Path, Input
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.model = torch.load("weights.pth")

    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run the model"""
        # ... pre-processing ...
        output = self.model(image)
        # ... post-processing ...
        return output
```

Your `Runner` class defines two methods: an optional `setup()` and a required `run()`.

`BasePredictor`, `Predictor`, and `predict()` still work for existing models, but they are deprecated. Cog warns when it loads or inspects those legacy names. Use `BaseRunner`, `Runner`, and `run()` for new code.

### `Runner.setup()`

Prepare the model so multiple runs are efficient.

Use this _optional_ method to include expensive one-off operations like loading trained models, instantiating data transformations, etc.

Many models use this method to download their weights (e.g. using [`pget`](https://github.com/replicate/pget)). This has some advantages:

- Smaller image sizes
- Faster build times
- Faster pushes and inference on [Replicate](https://replicate.com)

However, this may also significantly increase your `setup()` time.

As an alternative, some choose to store their weights directly in the image. You can simply leave your weights in the directory alongside your `cog.yaml` and ensure they are not excluded in your `.dockerignore` file.

While this will increase your image size and build time, it offers other advantages:

- Faster `setup()` time
- Ensures idempotency and reduces your model's reliance on external systems
- Preserves reproducibility as your model will be self-contained in the image

> When using this method, you should use the `--separate-weights` flag on `cog build` to store weights in a [separate layer](https://github.com/replicate/cog/blob/12ac02091d93beebebed037f38a0c99cd8749806/docs/getting-started.md?plain=1#L219).

### `Runner.run(**kwargs)`

Run the model.

This _required_ method is where you call the model that was loaded during `setup()`, but you may also want to add pre- and post-processing code here.

The `run()` method takes an arbitrary list of named arguments, where each argument name must correspond to an [`Input()`](#inputkwargs) annotation.

`run()` can return strings, numbers, [`cog.Path`](#cogpath) objects representing files on disk, or lists or dicts of those types. You can also define a custom [`BaseModel`](#structured-output-with-basemodel) for structured return types. See [Input and output types](#input-and-output-types) for the full list of supported types.

## `async` runners and concurrency

> Added in cog 0.14.0.

You may specify your `run()` method as `async def run(...)`. In
addition, if you have an async `run()` function you may also have an async
`setup()` function:

```py
from cog import BaseRunner

class Runner(BaseRunner):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def run(self) -> str:
        print("async run")
        return "hello world"
```

Models that have an async `run()` function can run concurrently, up to the limit specified by [`concurrency.max`](yaml.md#max) in `cog.yaml`. Attempting to exceed this limit will return a 409 Conflict response.
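As a rough mental model of that limit, the plain-Python sketch below admits at most `max_concurrency` in-flight runs and answers `409` for anything beyond that instead of queueing it. `ConcurrencyLimiter` and `fake_run` are hypothetical names for illustration only; Cog enforces the limit in its own HTTP server, and this is not its implementation.

```python
import asyncio


class ConcurrencyLimiter:
    """Hypothetical admission gate: at most `max_concurrency` runs in flight."""

    def __init__(self, max_concurrency: int) -> None:
        self.max_concurrency = max_concurrency
        self.in_flight = 0

    async def handle(self, run, *args):
        # Reject immediately (like a 409 Conflict) once the limit is reached.
        if self.in_flight >= self.max_concurrency:
            return 409, None
        self.in_flight += 1
        try:
            return 200, await run(*args)
        finally:
            self.in_flight -= 1


async def fake_run(delay: float) -> str:
    # Stand-in for an async run() method doing I/O-bound work.
    await asyncio.sleep(delay)
    return "hello world"


async def main() -> list:
    limiter = ConcurrencyLimiter(max_concurrency=2)
    # Three simultaneous requests against a limit of two.
    return await asyncio.gather(*(limiter.handle(fake_run, 0.05) for _ in range(3)))


results = asyncio.run(main())
```

Because nothing is awaited between the limit check and the increment, the check is race-free within a single event loop; a multi-threaded server would need a lock instead.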

## `Input(**kwargs)`

Use Cog's `Input()` function to define each of the parameters in your `run()` method:

```py
from cog import BaseRunner, Input, Path

class Runner(BaseRunner):
    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5, ge=1.0, le=10.0)
    ) -> Path:
        ...
```

The `Input()` function takes these keyword arguments:

- `description`: A description of what to pass to this input for users of the model.
- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
- `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
- `le`: For `int` or `float` types, the value must be less than or equal to this number.
- `min_length`: For `str` types, the minimum length of the string.
- `max_length`: For `str` types, the maximum length of the string.
- `regex`: For `str` types, the string must match this regular expression.
- `choices`: For `str` or `int` types, a list of possible values for this input.
- `deprecated`: (optional) If set to `True`, marks this input as deprecated. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future. See [Deprecating inputs](#deprecating-inputs).
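To make these constraints concrete, here is a plain-Python sketch of the checks they describe. `check_constraints` is a hypothetical helper, not part of Cog (Cog validates inputs for you); it assumes `regex` must match the whole string, which may differ from Cog's exact semantics.

```python
import re
from typing import Any, Optional, Sequence


def check_constraints(
    value: Any,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    regex: Optional[str] = None,
    choices: Optional[Sequence[Any]] = None,
) -> list:
    """Return a list of constraint violations (empty if the value is valid)."""
    errors = []
    if ge is not None and value < ge:
        errors.append(f"must be >= {ge}")
    if le is not None and value > le:
        errors.append(f"must be <= {le}")
    if min_length is not None and len(value) < min_length:
        errors.append(f"length must be >= {min_length}")
    if max_length is not None and len(value) > max_length:
        errors.append(f"length must be <= {max_length}")
    # Assumption: the whole string must match the pattern.
    if regex is not None and re.fullmatch(regex, value) is None:
        errors.append(f"must match {regex!r}")
    if choices is not None and value not in choices:
        errors.append(f"must be one of {list(choices)}")
    return errors
```

For example, `check_constraints(12.0, ge=1.0, le=10.0)` reports a single `le` violation, mirroring the `scale` input above.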

Each parameter of the `run()` method must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](#input-and-output-types) for the full list of supported types.

Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely (as in any Python function, parameters without defaults must come before parameters with defaults):

```py
from cog import BaseRunner

class Runner(BaseRunner):
    def run(self,
        iterations: int,                 # valid: no default
        prompt: str = "default prompt",  # also valid: plain Python default
    ) -> str:
        ...
```

### Deprecating inputs

You can mark an input as deprecated by passing `deprecated=True` to the `Input()` function. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future.

This is useful when you want to phase out an input without breaking existing clients immediately:

```py
from cog import BaseRunner, Input

class Runner(BaseRunner):
    def run(self,
        text: str = Input(description="Some deprecated text", deprecated=True),
        prompt: str = Input(description="Prompt for the model")
    ) -> str:
        # ...
        return prompt
```

## Output

Cog runners can return a simple data type like a string, integer, float, or boolean. Use Python's `-> <type>` syntax to annotate the return type.

Here's an example of a runner that returns a string:

```py
from cog import BaseRunner

class Runner(BaseRunner):
    def run(self) -> str:
        return "hello"
```

### Returning an object

To return a complex object with multiple values, define an `Output` object with multiple fields to return from your `run()` method:

```py
import io
from cog import BaseRunner, BaseModel, File

class Output(BaseModel):
    file: File
    text: str

class Runner(BaseRunner):
    def run(self) -> Output:
        return Output(text="hello", file=io.StringIO("hello"))
```

Each of the output object's properties must be one of the supported output types. For the full list, see [Input and output types](#input-and-output-types).

### Returning a list

The `run()` method can return a list of any of the supported output types. Here's an example that outputs multiple files:

```py
from cog import BaseRunner, Path

class Runner(BaseRunner):
    def run(self) -> list[Path]:
        items = ["foo", "bar", "baz"]
        output = []
        for i, item in enumerate(items):
            out_path = Path(f"/tmp/out-{i}.txt")
            with out_path.open("w") as f:
                f.write(item)
            output.append(out_path)
        return output
```

Files are named in the format `output.<index>.<extension>`, e.g. `output.0.txt`, `output.1.txt`, and `output.2.txt` from the example above.
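The naming rule can be sketched as follows. `output_names` is a hypothetical helper that only reproduces the documented `output.<index>.<extension>` pattern from each returned path's suffix; how Cog names files without an extension is not covered here.

```python
from pathlib import PurePath


def output_names(paths: list) -> list:
    """Hypothetical helper: map returned file paths to output.<index>.<extension>."""
    # PurePath.suffix includes the leading dot, e.g. ".txt".
    return [f"output.{i}{PurePath(p).suffix}" for i, p in enumerate(paths)]
```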

### Optional properties

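To conditionally omit properties from the Output object, define them using `typing.Optional` with a default of `None`. In the example below, `return_score` is a placeholder input that decides which property is set:

```py
from typing import Optional
from cog import BaseModel, BaseRunner, Path

class Output(BaseModel):
    score: Optional[float] = None
    file: Optional[Path] = None

class Runner(BaseRunner):
    def run(self, return_score: bool = True) -> Output:
        if return_score:
            return Output(score=1.5)
        # cog.Path outputs must point at a file that exists on disk.
        out_path = Path("/tmp/output.txt")
        out_path.write_text("hello")
        return Output(file=out_path)
```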

### Streaming output

Cog models can stream output as the `run()` method is running. For example, a language model can output tokens as they're being generated and an image generation model can output images as they are being generated.

To support streaming output in your Cog model, add `from typing import Iterator` to your `run.py` file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `run()` method in the form `-> Iterator[<type>]` where `<type>` can be one of `str`, `int`, `float`, `bool`, or `cog.Path`.

To allow clients to receive chunks as server-sent events with `Accept: text/event-stream`, decorate the prediction method (`run()` or `predict()`) with `@cog.streaming` (or `@streaming` if imported directly from `cog`). The parenthesized forms `@cog.streaming()` and `@streaming()` are also accepted. The decorated method must return `Iterator[...]`, `AsyncIterator[...]`, `ConcatenateIterator[...]`, or `AsyncConcatenateIterator[...]`. Without the decorator, iterator outputs still work in normal JSON responses, but SSE requests return `406 Not Acceptable`.

```py
from typing import Iterator
from cog import BaseRunner, Path, streaming

class Runner(BaseRunner):
    @streaming
    def run(self) -> Iterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()
            yield Path(output_path)
```

If you have an [async `run()` method](#async-runners-and-concurrency), use `AsyncIterator` from the `typing` module:

```py
from typing import AsyncIterator
from cog import BaseRunner, Path, streaming

class Runner(BaseRunner):
    @streaming
    async def run(self) -> AsyncIterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()
            yield Path(output_path)
```

If you're streaming text output, you can use `ConcatenateIterator` to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.

```py
from cog import BaseRunner, ConcatenateIterator, streaming

class Runner(BaseRunner):
    @streaming
    def run(self) -> ConcatenateIterator[str]:
        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
        for token in tokens:
            yield token + " "
```

Or for async `run()` methods, use `AsyncConcatenateIterator`:

```py
from cog import AsyncConcatenateIterator, BaseRunner, streaming

class Runner(BaseRunner):
    @streaming
    async def run(self) -> AsyncConcatenateIterator[str]:
        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
        for token in tokens:
            yield token + " "
```
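Whichever iterator type you use, the model yields the same chunks; `ConcatenateIterator` only signals how they should be presented. The stdlib-only sketch below shows the two views a consumer could build from one text stream. `run_tokens` and `collect` are hypothetical stand-ins, not Cog APIs.

```python
from typing import Iterator


def run_tokens() -> Iterator[str]:
    # Stand-in for a streaming run() method that yields text chunks.
    for token in ["The", "quick", "brown", "fox"]:
        yield token + " "


def collect(stream: Iterator[str]) -> tuple:
    """Return both views of a text stream: the raw chunks and their concatenation."""
    chunks = list(stream)
    return chunks, "".join(chunks)


chunks, text = collect(run_tokens())
```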

## Metrics

You can record custom metrics from your `run()` function to track model-specific data like token counts, timing breakdowns, or confidence scores. Metrics are included in the response alongside the output.

### Recording metrics

Use `self.record_metric()` inside your `run()` method:

```python
from cog import BaseRunner

class Runner(BaseRunner):
    def run(self, prompt: str) -> str:
        self.record_metric("temperature", 0.7)
        self.record_metric("token_count", 42)

        result = self.model.generate(prompt)
        return result
```

For advanced use (dict-style access, deleting metrics), use `self.scope`:

```python
self.scope.metrics["token_count"] = 42
del self.scope.metrics["token_count"]
```

Metrics appear in the response `metrics` field:

```json
{
  "status": "succeeded",
  "output": "...",
  "metrics": {
    "temperature": 0.7,
    "token_count": 42,
    "predict_time": 1.23
  }
}
```

The `predict_time` metric is always added automatically by the runtime.

Supported value types are `bool`, `int`, `float`, `str`, `list`, and `dict`. Setting a metric to `None` deletes it.

### Naming rules

Metric names must follow these rules:

- Each segment must start with a letter (`a-z`, `A-Z`) and end with a letter or digit
- Segments can contain letters, digits, and underscores (`_`)
- Segments cannot start or end with underscores
- Segments cannot contain consecutive underscores (`__`)
- Use dots (`.`) to create nested objects (e.g., `timing.inference` produces `{"timing": {"inference": ...}}`)
- Maximum 128 characters total
- Maximum 4 dot-separated segments
- Cannot be `predict_time` (reserved by runtime)
- Cannot start with `cog.` (reserved for system metrics)

Valid examples: `temperature`, `token_count`, `TTFT`, `T2I_latency`, `timing.preprocess`

Invalid examples: `_token`, `token_`, `foo__bar`, `.foo`, `foo..bar`, `foo bar`, `cog.system`
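One way to express these rules is a per-segment regular expression plus the length and reserved-name checks. `is_valid_metric_name` is a hypothetical helper written from the rules as listed; Cog's own validator may differ in edge cases.

```python
import re

# A segment starts with a letter; any underscore must be followed by a letter or digit,
# which rules out leading, trailing, and consecutive underscores.
_SEGMENT = r"[A-Za-z](?:_?[A-Za-z0-9])*"
# One to four dot-separated segments.
_NAME = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT}){{0,3}}")


def is_valid_metric_name(name: str) -> bool:
    if len(name) > 128:
        return False
    # Reserved by the runtime.
    if name == "predict_time" or name.startswith("cog."):
        return False
    return _NAME.fullmatch(name) is not None
```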

### Accumulation modes

By default, recording a metric replaces any previous value for that key. You can use accumulation modes to build up values across multiple calls:

```python
# Increment a counter (adds to the existing numeric value)
self.record_metric("token_count", 1, mode="incr")
self.record_metric("token_count", 1, mode="incr")
# Result: {"token_count": 2}

# Append to an array
self.record_metric("steps", "preprocessing", mode="append")
self.record_metric("steps", "inference", mode="append")
# Result: {"steps": ["preprocessing", "inference"]}

# Replace (default behavior)
self.record_metric("status", "running", mode="replace")
self.record_metric("status", "done", mode="replace")
# Result: {"status": "done"}
```

The `mode` parameter accepts `"replace"` (default), `"incr"`, or `"append"`.
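The three modes can be modeled on a plain dict as below. `record_metric` here is a hypothetical free function, not `self.record_metric()`; it assumes that `incr` starts a missing key at `0` and `append` starts it at an empty list, and it ignores dot-path nesting and type checks.

```python
from typing import Any


def record_metric(metrics: dict, key: str, value: Any, mode: str = "replace") -> None:
    """Hypothetical model of the replace / incr / append modes on a flat dict."""
    if value is None:
        # Documented behavior: setting a metric to None deletes it.
        metrics.pop(key, None)
        return
    if mode == "replace":
        metrics[key] = value
    elif mode == "incr":
        metrics[key] = metrics.get(key, 0) + value
    elif mode == "append":
        metrics.setdefault(key, []).append(value)
    else:
        raise ValueError(f"unknown mode: {mode!r}")


# Replaying the calls from the example above.
metrics: dict = {}
record_metric(metrics, "token_count", 1, mode="incr")
record_metric(metrics, "token_count", 1, mode="incr")
record_metric(metrics, "steps", "preprocessing", mode="append")
record_metric(metrics, "steps", "inference", mode="append")
record_metric(metrics, "status", "running")
record_metric(metrics, "status", "done")
```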

### Dot-path keys

Use dot-separated keys to create nested objects in the metrics output:

```python
self.record_metric("timing.preprocess", 0.12)
self.record_metric("timing.inference", 0.85)
```

This produces nested JSON:

```json
{
  "metrics": {
    "timing": {
      "preprocess": 0.12,
      "inference": 0.85
    },
    "predict_time": 1.23
  }
}
```
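The nesting itself is a simple walk over the key's segments. `set_dot_path` is a hypothetical helper that reproduces the documented output shape; it is not how Cog stores metrics internally.

```python
from typing import Any


def set_dot_path(metrics: dict, key: str, value: Any) -> None:
    """Hypothetical helper: store value under a dot-separated key as nested dicts."""
    *parents, leaf = key.split(".")
    node = metrics
    for part in parents:
        # Create intermediate objects on first use.
        node = node.setdefault(part, {})
    node[leaf] = value


metrics: dict = {}
set_dot_path(metrics, "timing.preprocess", 0.12)
set_dot_path(metrics, "timing.inference", 0.85)
# metrics == {"timing": {"preprocess": 0.12, "inference": 0.85}}
```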

### Type safety

Once a metric key has been assigned a value of a certain type, it cannot be changed to a different type without deleting it first. This prevents accidental type mismatches when using accumulation modes:

```python
self.record_metric("count", 1)

# This would raise an error — "count" is an int, not a string:
# self.record_metric("count", "oops")

# Delete first, then set with new type:
del self.scope.metrics["count"]
self.record_metric("count", "now a string")
```

Outside an active run, `self.record_metric()` and `self.scope` are silent no-ops — no need for `None` checks.

## Cancellation

When a run is canceled (via the [cancel HTTP endpoint](http.md#post-predictionsprediction_idcancel) or a dropped connection), the Cog runtime interrupts the running `run()` function. The exception raised depends on whether the runner is sync or async:

| Runner type             | Exception raised         |
| ----------------------- | ------------------------ |
| Sync (`def run`)        | `CancelationException`   |
| Async (`async def run`) | `asyncio.CancelledError` |

### `CancelationException`

```python
from cog import CancelationException
```

`CancelationException` is raised in **sync** runners when a run is canceled. It is a `BaseException` subclass, **not** an `Exception` subclass. This means `except Exception` blocks in your run code will not accidentally catch it, matching the behavior of `KeyboardInterrupt` and `asyncio.CancelledError`.

You do **not** need to handle this exception in normal runner code: the runtime manages cancellation automatically. However, if you need to run cleanup logic when a run is canceled, you can catch it explicitly:

```python
from cog import BaseRunner, CancelationException, Path

class Runner(BaseRunner):
    def run(self, image: Path) -> Path:
        try:
            return self.process(image)
        except CancelationException:
            self.cleanup()
            raise  # always re-raise
```

> [!WARNING]
> You **must** re-raise `CancelationException` after cleanup. Swallowing it will prevent the runtime from marking the run as canceled, and may result in the termination of the container.

`CancelationException` is available as:

- `cog.CancelationException` (recommended)
- `cog.exceptions.CancelationException`

For **async** runners, cancellation follows standard Python async conventions and raises `asyncio.CancelledError` instead.
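Because `CancelationException` derives from `BaseException`, an `except Exception` handler lets it pass through, while an explicit handler can clean up and re-raise. The stdlib-only sketch below demonstrates this with a local stand-in class of the same name, so it runs without Cog installed; in a real runner, import it from `cog` instead.

```python
class CancelationException(BaseException):
    """Local stand-in for cog.CancelationException (a BaseException subclass)."""


events = []


def run() -> str:
    try:
        try:
            # Simulate the runtime interrupting the work mid-run.
            raise CancelationException()
        except Exception:
            # Not reached: CancelationException is not an Exception subclass.
            events.append("swallowed by except Exception")
            return "partial"
    except CancelationException:
        events.append("cleanup")
        raise  # re-raise so the caller still sees the cancellation


try:
    run()
except CancelationException:
    events.append("runtime saw cancellation")
```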

## Input and output types

Each parameter of the `run()` method must be annotated with a type. The method's return type must also be annotated.

### Primitive types

These types can be used directly as input parameter types and output return types:

| Type                              | Description                               | JSON Schema                   |
| --------------------------------- | ----------------------------------------- | ----------------------------- |
| `str`                             | A string                                  | `string`                      |
| `int`                             | An integer                                | `integer`                     |
| `float`                           | A floating-point number                   | `number`                      |
| `bool`                            | A boolean                                 | `boolean`                     |
| [`cog.Path`](#cogpath)            | A path to a file on disk                  | `string` (format: `uri`)      |
| [`cog.File`](#cogfile-deprecated) | A file-like object (deprecated)           | `string` (format: `uri`)      |
| [`cog.Secret`](#cogsecret)        | A string containing sensitive information | `string` (format: `password`) |

### `cog.Path`

`cog.Path` is used to get files in and out of models. It represents a _path to a file on disk_.

`cog.Path` is a subclass of Python's [`pathlib.Path`](https://docs.python.org/3/library/pathlib.html#basic-use) and can be used as a drop-in replacement. Any `os.PathLike` subclass is also accepted as an input type and treated as `cog.Path`.

For models that return a `cog.Path` object, the output returned by Cog's built-in HTTP server will be a URL.

This example takes an input file, resizes it, and returns the resized image:

```python
import tempfile
from cog import BaseRunner, Input, Path

class Runner(BaseRunner):
    def run(self, image: Path = Input(description="Image to enlarge")) -> Path:
        upscaled_image = do_some_processing(image)

        # To output cog.Path objects the file needs to exist, so create a temporary file first.
        # This file will automatically be deleted by Cog after it has been returned.
        output_path = Path(tempfile.mkdtemp()) / "upscaled.png"
        upscaled_image.save(output_path)
        return Path(output_path)
```

### `cog.File` (deprecated)

> [!WARNING]  
> `cog.File` is deprecated and will be removed in a future version of Cog. Use [`cog.Path`](#cogpath) instead.

`cog.File` represents a _file handle_. For models that return a `cog.File` object, the output returned by Cog's built-in HTTP server will be a URL.

```python
from cog import BaseRunner, File, Input
from PIL import Image

class Runner(BaseRunner):
    def run(self, source_image: File = Input(description="Image to enlarge")) -> File:
        pillow_img = Image.open(source_image)
        upscaled_image = do_some_processing(pillow_img)
        return File(upscaled_image)
```

### `cog.Secret`

`cog.Secret` signifies that an input holds sensitive information like a password or API token.

`cog.Secret` redacts its contents in string representations to prevent accidental disclosure. Access the underlying value with `get_secret_value()`.

```python
from cog import BaseRunner, Secret

class Runner(BaseRunner):
    def run(self, api_token: Secret) -> None:
        # Prints '**********'
        print(api_token)

        # Use get_secret_value method to see the secret's content.
        print(api_token.get_secret_value())
```
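The redaction behavior can be pictured with a small wrapper class. `RedactedSecret` is a hypothetical stand-in written for illustration; the real `cog.Secret` may format its `repr()` differently.

```python
class RedactedSecret:
    """Hypothetical stand-in illustrating redaction; not cog.Secret itself."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        # Mask the value whenever the object is converted to a string.
        return "**********"

    __repr__ = __str__

    def get_secret_value(self) -> str:
        return self._value
```

Masking only covers string conversion: once you call `get_secret_value()`, the returned `str` is an ordinary string and prints in full, so avoid logging it.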

A runner's `Secret` inputs are represented in OpenAPI with the following schema:

```json
{
  "type": "string",
  "format": "password",
  "x-cog-secret": true
}
```

Replicate treats `Secret` inputs differently from other inputs throughout its system. When you create a run on Replicate, any value passed to a `Secret` input is redacted after being sent to the model.

> [!WARNING]  
> Passing secret values to untrusted models can result in
> unintended disclosure, exfiltration, or misuse of sensitive data.

### Wrapper types

Cog supports wrapper types that modify how a primitive type is treated.

#### `Optional`

Use `Optional[T]` or `T | None` (Python 3.10+) to mark an input as optional. Optional inputs default to `None` if not provided.

```python
from typing import Optional
from cog import BaseRunner, Input

class Runner(BaseRunner):
    def run(self,
        prompt: Optional[str] = Input(description="Input prompt"),
        seed: int | None = Input(description="Random seed", default=None),
    ) -> str:
        if prompt is None:
            return "hello"
        return "hello " + prompt
```

Prefer `Optional[T]` or `T | None` over `str = Input(default=None)` for inputs that can be `None`. This lets type checkers warn about error-prone `None` values:

```python
# Bad: type annotation says str but value can be None
def run(self, prompt: str = Input(default=None)) -> str:
    return "hello" + prompt  # TypeError at runtime if prompt is None

# Good: type annotation matches actual behavior
def run(self, prompt: Optional[str] = Input(description="prompt")) -> str:
    if prompt is None:
        return "hello"
    return "hello " + prompt
```

> [!NOTE]
> `Optional[T]` is supported in `BaseModel` output fields but **not** as a top-level return type. Use a `BaseModel` with optional fields instead.

#### `list`

Use `list[T]` or `List[T]` to accept or return a list of values. `T` can be a supported Cog type, but nested container types are not supported.

**As an input type:**

```py
from cog import BaseRunner, Path

class Runner(BaseRunner):
    def run(self, paths: list[Path]) -> str:
        output_parts = []
        for path in paths:
            with open(path) as f:
                output_parts.append(f.read())
        return "".join(output_parts)
```

With `cog run`, repeat the input name to pass multiple values:

```bash
$ echo test1 > 1.txt
$ echo test2 > 2.txt
$ cog run -i paths=@1.txt -i paths=@2.txt
```

**As an output type:**

```py
from cog import BaseRunner, Path

class Runner(BaseRunner):
    def run(self) -> list[Path]:
        items = ["foo", "bar", "baz"]
        output = []
        for i, item in enumerate(items):
            out_path = Path(f"/tmp/out-{i}.txt")
            with out_path.open("w") as f:
                f.write(item)
            output.append(out_path)
        return output
```

Files are named in the format `output.<index>.<extension>`, e.g. `output.0.txt`, `output.1.txt`, `output.2.txt`.

#### `dict`

Use `dict` to accept or return an opaque JSON object. The value is passed through as-is without type validation.

```python
from cog import BaseRunner, Input

class Runner(BaseRunner):
    def run(self,
        params: dict = Input(description="Arbitrary JSON parameters"),
    ) -> dict:
        return {"greeting": "hello", "params": params}
```

> [!NOTE]
> `dict` inputs and outputs are represented as `{"type": "object"}` in the OpenAPI schema with no additional structure. For structured data with validated fields, use a [`BaseModel`](#structured-output-with-basemodel) instead.

#### `cog.Opaque`

Cog statically analyzes `run()` type annotations to generate schemas. Some third-party package types, such as vLLM `TypedDict` definitions, may not be visible to that static analyzer even though they represent JSON-shaped object values at runtime.

Use `typing.Annotated` with `cog.Opaque` when you want Cog to accept or return those third-party object values without inspecting their fields:

```python
from typing import Annotated

from cog import BaseRunner, Opaque
from vllm.entrypoints.chat_utils import CustomChatCompletionMessageParam


class Runner(BaseRunner):
    def run(
        self,
        messages: Annotated[list[CustomChatCompletionMessageParam], Opaque],
    ) -> str:
        return str(messages)
```

`Opaque` emits an object schema for the wrapped type and preserves the container shape. For example, `Annotated[list[T], Opaque]` is represented as an array of opaque objects.

`Opaque` does not inspect, validate, encode, decode, or transform values. It only tells Cog's schema generator to treat the wrapped type as an opaque JSON object. If your type needs custom serialization or deserialization, provide that separately; `Opaque` only affects schema generation.
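To illustrate the mapping described above, here is a runtime sketch that reads the `Annotated` metadata with `typing.get_origin` and `typing.get_args`. `Opaque`, `Message`, and `schema_for` here are hypothetical local definitions; Cog itself analyzes your source statically rather than inspecting types at runtime, and its real schema output may contain more detail.

```python
from typing import Annotated, Any, TypedDict, get_args, get_origin


class Opaque:
    """Hypothetical marker class standing in for cog.Opaque."""


class Message(TypedDict):
    # Example third-party-style TypedDict whose fields we do not want inspected.
    role: str
    content: str


def schema_for(tp: Any) -> dict:
    """Map an Annotated[..., Opaque] type to a JSON Schema fragment."""
    if get_origin(tp) is not Annotated or Opaque not in get_args(tp)[1:]:
        raise TypeError(f"expected Annotated[..., Opaque], got {tp!r}")
    inner = get_args(tp)[0]
    if get_origin(inner) is list:
        # Keep the container shape; each element becomes an opaque object.
        return {"type": "array", "items": {"type": "object"}}
    return {"type": "object"}
```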

### Structured output with `BaseModel`

To return a complex object with multiple typed fields, define a class that inherits from `cog.BaseModel` or Pydantic's `BaseModel` and use it as your return type.

#### Using `cog.BaseModel`

`cog.BaseModel` subclasses are automatically converted to Python dataclasses. Define fields using standard type annotations:

```python
from typing import Optional
from cog import BaseRunner, BaseModel, Path

class Output(BaseModel):
    text: str
    confidence: float
    image: Optional[Path]

class Runner(BaseRunner):
    def run(self, prompt: str) -> Output:
        result = self.model.generate(prompt)
        return Output(
            text=result.text,
            confidence=result.score,
            image=None,
        )
```

The output class can have any name — it does not need to be called `Output`:

```python
from typing import Optional
from cog import BaseModel, Path

class SegmentationResult(BaseModel):
    success: bool
    error: Optional[str]
    segmented_image: Optional[Path]
```

#### Using Pydantic `BaseModel`

If you already use Pydantic v2 in your model, you can use a Pydantic `BaseModel` subclass directly as the output type:

```python
from pydantic import BaseModel as PydanticBaseModel
from cog import BaseRunner

class Result(PydanticBaseModel):
    name: str
    score: float
    tags: list[str]

class Runner(BaseRunner):
    def run(self, prompt: str) -> Result:
        return Result(name="example", score=0.95, tags=["fast", "accurate"])
```

#### `BaseModel` field types

Fields in a `BaseModel` output support these types:

| Type                          | Example                   |
| ----------------------------- | ------------------------- |
| `str`, `int`, `float`, `bool` | `score: float`            |
| `cog.Path`                    | `image: Path`             |
| `cog.File`                    | `data: File` (deprecated) |
| `cog.Secret`                  | `token: Secret`           |
| `Optional[T]`                 | `error: Optional[str]`    |
| `list[T]`                     | `tags: list[str]`         |

### Type limitations

The following type patterns are **not** supported:

- **Nested generics**: `list[list[str]]`, `list[Optional[str]]`, `Optional[list[str]]` are not supported.
- **Union types beyond Optional**: `str | int`, `Union[str, int, None]` — only `Optional[T]` (i.e. `T | None`) is supported.
- **`Optional` as a top-level return type**: `-> Optional[str]` is not allowed. Use a `BaseModel` with optional fields instead.
- **Nested `BaseModel` fields**: A `BaseModel` field typed as another `BaseModel` is not supported in Cog's type system for schema generation.
- **Tuple, Set, or other collection types**: Only `list` and `dict` are supported as collection types.
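If you want to catch these patterns before building your image, a rough runtime check can be written with `typing.get_origin` and `typing.get_args`. `type_problem` is a hypothetical helper covering only the generic, union, and collection rules above (not nested `BaseModel` fields); Cog's own analysis is the authority.

```python
import types
from typing import Any, Optional, Union, get_args, get_origin

# typing.Union covers Optional[T]; types.UnionType covers T | None on Python 3.10+.
_UNION_TYPES = (Union, getattr(types, "UnionType", Union))
_UNSUPPORTED_COLLECTIONS = (tuple, set, frozenset)


def type_problem(tp: Any) -> Optional[str]:
    """Return why `tp` hits a listed limitation, or None if it looks supported."""
    inner = tp
    if get_origin(tp) in _UNION_TYPES:
        args = [a for a in get_args(tp) if a is not type(None)]
        if len(args) != 1:
            return "only Optional[T] unions are supported"
        inner = args[0]
        if get_origin(inner) is not None:
            # e.g. Optional[list[str]]
            return "nested generics are not supported"
    origin = get_origin(inner)
    if inner in _UNSUPPORTED_COLLECTIONS or origin in _UNSUPPORTED_COLLECTIONS:
        return "only list and dict are supported collections"
    if origin is list and any(get_origin(arg) is not None for arg in get_args(inner)):
        # e.g. list[list[str]] or list[Optional[str]]
        return "nested generics are not supported"
    return None
```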


---

# Training interface reference

> [!WARNING]  
> The `cog train` command is deprecated and will be removed in the next version of Cog. The training API described below may still be used with the HTTP API's `/trainings` endpoint, but the CLI command is no longer recommended for new projects.

Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fine-tuned models. Real-world examples of this API in use include [fine-tuning SDXL with images](https://replicate.com/blog/fine-tune-sdxl) or [fine-tuning Llama 2 with structured text](https://replicate.com/blog/fine-tune-llama-2).

## How it works

If you've used Cog before, you've probably seen the [Runner](./python.md) class, which defines the interface for running your model. Cog's training API works similarly: You define a Python function that describes the inputs and outputs of the training process. The inputs are things like training data, epochs, batch size, seed, etc. The output is typically a file with the fine-tuned weights.

`cog.yaml`:

```yaml
build:
  python_version: "3.13"
train: "train.py:train"
```

`train.py`:

```python
from cog import File
import io

def train(param: str) -> File:
    return io.StringIO("hello " + param)
```

Then you can run it like this:

```console
$ cog train -i param=train
...

$ cat weights
hello train
```

You can also use a class if you want to run many trainings while paying the setup cost only once. It works the same way as the [Runner](./python.md) class, except that the method is called `train()` instead of `run()`.

`cog.yaml`:

```yaml
build:
  python_version: "3.13"
train: "train.py:Trainer"
```

`train.py`:

```python
from cog import File
import io

class Trainer:
    def setup(self) -> None:
        self.base_model = ... # Load a big base model

    def train(self, param: str) -> File:
        return self.base_model.train(param) # Train on top of a base model
```

## `Input(**kwargs)`

Use Cog's `Input()` function to define each of the parameters in your `train()` function:

```py
from typing import Optional
from cog import Input, Path

def train(
    train_data: Path = Input(description="HTTPS URL of a file containing training data"),
    learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
    seed: Optional[int] = Input(description="random seed to use for training", default=None),
) -> str:
    return "hello, weights"
```

The `Input()` function takes these keyword arguments:

- `description`: A description of what to pass to this input for users of the model.
- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
- `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
- `le`: For `int` or `float` types, the value must be less than or equal to this number.
- `min_length`: For `str` types, the minimum length of the string.
- `max_length`: For `str` types, the maximum length of the string.
- `regex`: For `str` types, the string must match this regular expression.
- `choices`: For `str` or `int` types, a list of possible values for this input.

Each parameter of the `train()` function must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](./python.md#input-and-output-types) for the full list of supported types.

Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:

```py
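def train(self,
  # Python requires parameters without defaults to come first.
  iterations: int,                # valid: no default, so the input is required
  training_data: str = "foo bar", # also valid: a plain Python default
) -> str:
  ...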
```

## Training Output

Training output is typically a binary weights file. To return a custom object or several values at once, define a `TrainingOutput` class with one field per value, return an instance of it from `train()`, and declare it as the return type with Python's `->` annotation:

```python
from cog import BaseModel, Input, Path

class TrainingOutput(BaseModel):
    weights: Path

def train(
    train_data: Path = Input(description="HTTPS URL of a file containing training data"),
    learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
    seed: int = Input(description="random seed to use for training", default=42)
) -> TrainingOutput:
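  # generate_weights stands in for your own training code: it should save
  # the trained weights to disk and return the file's path.
  weights_file = generate_weights("...")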
  return TrainingOutput(weights=Path(weights_file))
```
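The example above returns a single field. The following sketch shows the multi-field case without depending on Cog: a standard-library `dataclass` stands in for `cog.BaseModel`, the `final_loss` field is hypothetical, and `generate_weights` is a hypothetical helper that writes a dummy file so the code runs on its own.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TrainingOutput:
    # Stand-in for cog.BaseModel: each field is one value in the output.
    weights: Path
    final_loss: float  # hypothetical extra field


def generate_weights(out_dir: Path, seed: int) -> Path:
    """Hypothetical training step that writes a dummy weights file."""
    weights_file = out_dir / "weights.bin"
    weights_file.write_bytes(seed.to_bytes(4, "little"))
    return weights_file


def train(seed: int = 42) -> TrainingOutput:
    out_dir = Path(tempfile.mkdtemp())
    weights_file = generate_weights(out_dir, seed)
    # A real trainer would report the loss it measured; 0.0 is a placeholder.
    return TrainingOutput(weights=weights_file, final_loss=0.0)
```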

## Testing

If you are developing a Cog model such as Llama or SDXL, you can check that the fine-tuned code path works before pushing by setting the `COG_WEIGHTS` environment variable when running `cog run`:

```console
cog run -e COG_WEIGHTS=https://replicate.delivery/pbxt/xyz/weights.tar -i prompt="a photo of TOK"
```
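One way to picture the two code paths is a helper that prefers fine-tuned weights when `COG_WEIGHTS` is set and falls back to the base weights otherwise. This is a minimal sketch: `resolve_weights` is hypothetical, and reading the variable directly from the environment is an assumption; check the Python API docs for how Cog hands the value to your runner.

```python
import os
from typing import Mapping, Optional


def resolve_weights(default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: use COG_WEIGHTS if set, else the base weights."""
    env = os.environ if env is None else env
    weights = env.get("COG_WEIGHTS")
    # An unset or empty variable means the base-model path is exercised.
    return weights if weights else default
```

Passing an explicit `env` mapping keeps the helper easy to test without touching the real environment.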


---

# Using `cog` on Windows 11 with WSL 2

- [0. Prerequisites](#0-prerequisites)
- [1. Install the GPU driver](#1-install-the-gpu-driver)
- [2. Unlocking features](#2-unlocking-features)
  - [2.1. Unlock WSL2](#21-unlock-wsl2)
  - [2.2. Unlock virtualization](#22-unlock-virtualization)
  - [2.3. Reboot](#23-reboot)
- [3. Update MS Linux kernel](#3-update-ms-linux-kernel)
- [4. Configure WSL 2](#4-configure-wsl-2)
- [5. Configure CUDA WSL-Ubuntu Toolkit](#5-configure-cuda-wsl-ubuntu-toolkit)
- [6. Install Docker](#6-install-docker)
- [7. Install `cog` and pull an image](#7-install-cog-and-pull-an-image)
- [8. Run a model in WSL 2](#8-run-a-model-in-wsl-2)
- [9. References](#9-references)

Running cog on Windows is now possible thanks to WSL 2. Follow this guide to enable WSL 2 and GPU passthrough on Windows 11.

**Windows 10 is not officially supported, as you need to be on an insider build in order to use GPU passthrough.**

## 0. Prerequisites

Before beginning installation, make sure you have:

- Windows 11.
- NVIDIA GPU.
  - RTX 2000/3000 series
  - Kepler/Tesla/Volta/Ampere series
  - Other configurations are not guaranteed to work.

## 1. Install the GPU driver

Per NVIDIA, the first order of business is to install the latest Game Ready drivers for your NVIDIA GPU.

<https://www.nvidia.com/download/index.aspx>

I have an NVIDIA RTX 2070 Super, so I filled out the form like this:

![a form showing the correct model number selected for an RTX 2070 Super](images/nvidia_driver_select.png)

Click **Search** and follow the prompts to download and install the driver.

Restart your computer once the driver has finished installation.

## 2. Unlocking features

Open Windows Terminal as an administrator.

- Use the Start menu to search for "Terminal".
- Right-click it and choose **Run as administrator**.

Run the following PowerShell commands to enable the Windows Subsystem for Linux and Virtual Machine Platform features.

### 2.1. Unlock WSL2

```powershell
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
```

If you see an error about permissions, make sure the terminal you are using is run as an administrator and that you have an account with administrator-level privileges.

### 2.2. Unlock virtualization

```powershell
dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
```

If this command fails, make sure to [enable virtualization capabilities](https://docs.microsoft.com/en-us/windows/wsl/troubleshooting#error-0x80370102-the-virtual-machine-could-not-be-started-because-a-required-feature-is-not-installed) in your computer's BIOS/UEFI. A successful output will print `The operation completed successfully.`

![Output from running the above commands successfully. Should read "The operation completed successfully".](images/enable_feature_success.png)

### 2.3. Reboot

Before moving forward, make sure you reboot your computer so that Windows 11 will have WSL2 and virtualization available to it.

## 3. Update MS Linux kernel

Download and run the [WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi) MSI installer. When prompted for elevated permissions, click **Yes** to approve the installation.

To confirm that you're using the correct WSL kernel, open Windows Terminal as an administrator and enter:

```powershell
wsl cat /proc/version
```

This will return a complicated string such as:

```sh
Linux version 5.10.102.1-microsoft-standard-WSL2 (oe-user@oe-host) (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220)
```

The part to check is the version number, `5.10.102.1` in this example. After the update, it should be at least `5.10.43.3`.
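Kernel versions have to be compared number by number: as plain strings, `5.9` would sort above `5.10`. This hypothetical sketch parses the string printed by `wsl cat /proc/version` and compares it against the `5.10.43.3` minimum; the function names are illustrative, and only the `Linux version X.Y.Z...` prefix is assumed.

```python
import re
from typing import Tuple

MINIMUM_KERNEL = (5, 10, 43, 3)


def kernel_version(proc_version: str) -> Tuple[int, ...]:
    """Extract the numeric kernel version from a /proc/version line."""
    match = re.search(r"Linux version (\d+(?:\.\d+)*)", proc_version)
    if match is None:
        raise ValueError("unrecognized /proc/version format")
    return tuple(int(part) for part in match.group(1).split("."))


def kernel_is_new_enough(proc_version: str) -> bool:
    # Tuples compare component by component, so 5.10 correctly ranks above 5.9.
    return kernel_version(proc_version) >= MINIMUM_KERNEL
```

For the sample output above, `kernel_version` returns `(5, 10, 102, 1)`, which passes the check.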

If you can't get the correct kernel version to show:

Open `Settings` → `Windows Update` → `Advanced options` and ensure `Receive updates for other Microsoft products` is enabled. Then go to `Windows Update` again and click `Check for updates`.

## 4. Configure WSL 2

First, configure Windows to use the virtualization-based version of WSL (version 2) by default. In a Windows Terminal with administrator privileges, type the following:

```powershell
wsl --set-default-version 2
```

Next, go to the Microsoft Store and [download Ubuntu 18.04](https://www.microsoft.com/store/apps/9N9TNGVNDL3Q).

![Screenshot showing the "Ubuntu" store page](https://docs.microsoft.com/en-us/windows/wsl/media/ubuntustore.png)

Launch the "Ubuntu" app available in your Start Menu. Linux will require its own user account and password, which you will need to enter now:

![a terminal showing input for user account info on WSL 2](https://docs.microsoft.com/en-us/windows/wsl/media/ubuntuinstall.png)

## 5. Configure CUDA WSL-Ubuntu Toolkit

By default, a shimmed version of the CUDA tooling is provided by your Windows GPU drivers.

Important: _never_ follow generic Linux instructions for installing the CUDA toolkit. In WSL 2, _always_ use the `CUDA Toolkit using WSL-Ubuntu Package`, which does not include a Linux GPU driver that would overwrite the shimmed one provided by Windows.

First, open PowerShell or Windows Command Prompt in administrator mode
by right-clicking and selecting "Run as administrator".
Then enter the following command:

```powershell
wsl.exe
```

This should drop you into your running Linux VM. Now run the following bash commands to install the correct version of the CUDA toolkit for WSL-Ubuntu. Note that these commands install CUDA 11.7, which may not be the version your GPU supports.

```sh
sudo apt-key del 7fa2af80 # if this line fails, you may remove it.
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-11-7-local/cuda-B81839D3-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-11-7
```

## 6. Install Docker

Download and install [Docker Desktop for Windows](https://desktop.docker.com/win/main/amd64/Docker%20Desktop%20Installer.exe). It has WSL 2 support built in by default.

Once it's installed, start `Docker Desktop` (you can skip the first-run tutorial). Go to **Settings → General** and make sure **Use the WSL 2 based engine** is checked. Click **Apply & Restart**.

!["Use the WSL 2 based engine" is checked in this interface](images/wsl2-enable.png)

Reboot your computer one more time.

## 7. Install `cog` and pull an image

Open Windows Terminal and enter your WSL 2 VM:

```powershell
wsl.exe
```

Download and install `cog` inside the VM:

```bash
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
```

Make sure it's available by typing:

```bash
which cog # should output /usr/local/bin/cog
cog --version # should output the cog version number.
```

## 8. Run a model in WSL 2

Finally, make sure it works. Let's try running `afiaka87/glid-3-xl` locally:

```bash
cog run 'r8.im/afiaka87/glid-3-xl' -i prompt="a fresh avocado floating in the water" -o output.json
```

![Output from a running cog run in Windows Terminal](images/cog_model_output.png)

While your run is executing, you can use `Task Manager` to keep an eye on GPU memory consumption:

![Windows task manager will show the shared host/guest GPU memory](images/memory-usage.png)

This model just barely manages to fit under 8 GB of VRAM.

This model returns JSON because it has a complex return type, so you'll need to convert the base64 string in the JSON array to an image.

`jq` can help with this:

```sh
sudo apt install jq
```

The following command uses `jq` to grab the first element of the output array and decode it from a base64 string into a `png` file.

```bash
jq -cs '.[0][0][0]' output.json | cut --delimiter "," --field 2 | base64 --ignore-garbage --decode > output.png
```
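If you'd rather not install `jq`, the same conversion can be done with Python's standard library. This sketch assumes the nesting that the `jq` command above implies (the file's first element is a list whose first element is a `data:...;base64,...` string); other models return differently shaped output, and `save_first_image` is a hypothetical name.

```python
import base64
import json
from pathlib import Path


def data_uri_to_bytes(uri: str) -> bytes:
    """Decode a 'data:<mime type>;base64,<payload>' string into raw bytes."""
    header, _, payload = uri.partition(",")
    if not (header.startswith("data:") and header.endswith(";base64")):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


def save_first_image(json_path: Path, png_path: Path) -> None:
    doc = json.loads(json_path.read_text())
    # jq -s wraps the file in one extra array, so '.[0][0][0]' in jq
    # corresponds to doc[0][0] here.
    png_path.write_bytes(data_uri_to_bytes(doc[0][0]))
```

Inside WSL you would call `save_first_image(Path("output.json"), Path("output.png"))`.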

In WSL 2, you can run Windows executables by including the `.exe` extension. This makes it easy to open images from Linux:

```bash
explorer.exe output.png
```

![a square image of an avocado, generated by the model](images/glide_out.png)

## 9. References

- <https://docs.nvidia.com/cuda/wsl-user-guide/index.html>
- <https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0>
- <https://www.docker.com/blog/wsl-2-gpu-support-for-docker-desktop-on-nvidia-gpus/>
- <https://docs.microsoft.com/en-us/windows/wsl/install-manual#step-4---download-the-linux-kernel-update-package>
- <https://github.com/replicate/cog>


---

# `cog.yaml` reference

`cog.yaml` defines how to build a Docker image and how to run your model inside that image.

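This page documents the top-level keys [`build`](#build), [`concurrency`](#concurrency), [`image`](#image), and [`run`](#run), plus the deprecated [`predict`](#predict). A typical file looks like this: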

```yaml
build:
  python_version: "3.13"
  python_requirements: requirements.txt
  system_packages:
    - "ffmpeg"
    - "git"
run: "run.py:Runner"
```

Tip: Run [`cog init`](getting-started-own-model.md#initialization) to generate an annotated `cog.yaml` file that can be used as a starting point for setting up your model.

## `build`

This stanza describes how to build the Docker image your model runs in. It supports the following options:

<!-- Alphabetical order, please! -->

### `cuda`

Cog automatically picks the correct version of CUDA to install, but you can override it by specifying the minor (`11.8`) or patch (`11.8.0`) version of CUDA to use.

For example:

```yaml
build:
  cuda: "11.8"
```

### `gpu`

Enable GPUs for this model. When enabled, the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) base image will be used, and Cog will automatically figure out which versions of CUDA and cuDNN to use based on the versions of Python, PyTorch, and TensorFlow that you are using.

For example:

```yaml
build:
  gpu: true
```

When you use `cog exec` or `cog run`, Cog automatically passes the `--gpus=all` flag to Docker. When you run a Cog-built image directly with `docker run`, you need to pass this flag yourself.

### `python_requirements`

A pip requirements file specifying the Python packages to install. For example:

```yaml
build:
  python_requirements: requirements.txt
```

Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both. Use `python_requirements` when you need to configure options like `--extra-index-url` or `--trusted-host` to fetch Python package dependencies.

This follows the standard [requirements.txt](https://pip.pypa.io/en/stable/reference/requirements-file-format/) format.

To install Git-hosted Python packages, add `git` to the `system_packages` list, then use the `git+https://` syntax to specify the package name. For example:

`cog.yaml`:

```yaml
build:
  system_packages:
    - "git"
  python_requirements: requirements.txt
```

`requirements.txt`:

```
git+https://github.com/huggingface/transformers
```

You can also pin Python package installations to a specific git commit:

`cog.yaml`:

```yaml
build:
  system_packages:
    - "git"
  python_requirements: requirements.txt
```

`requirements.txt`:

```
git+https://github.com/huggingface/transformers@2d1602a
```

Note that you can use a shortened prefix of the 40-character git commit SHA, but you must use at least six characters, like `2d1602a` above.
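To catch a too-short commit prefix before a build, you could lint your requirement lines. This is a hypothetical checker, not part of Cog or pip: it only understands `git+https://...@<hex commit>` lines, and deliberately does not handle tags, branches, or `#egg=` fragments.

```python
import re

# Matches requirement lines like "git+https://github.com/org/repo@<commit>".
_GIT_PIN = re.compile(r"git\+https://[^@\s]+@(?P<ref>[0-9a-fA-F]+)")


def pinned_commit(line: str) -> str:
    """Hypothetical check that a git requirement pins a usable commit prefix."""
    match = _GIT_PIN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a git+https requirement pinned to a commit: {line!r}")
    ref = match.group("ref")
    # A full SHA-1 is 40 hex characters; prefixes shorter than 6 are rejected.
    if not 6 <= len(ref) <= 40:
        raise ValueError(f"commit prefix {ref!r} must be 6 to 40 characters")
    return ref
```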

### `python_packages`

**DEPRECATED**: This will be removed in future versions, please use [python_requirements](#python_requirements) instead.

A list of Python packages to install from the PyPI package index, in the format `package==version`. For example:

```yaml
build:
  python_packages:
    - pillow==8.3.1
    - tensorflow==2.5.0
```

Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both.

### `python_version`

The minor (`3.13`) or patch (`3.13.1`) version of Python to use. For example:

```yaml
build:
  python_version: "3.13.1"
```

Cog supports Python 3.10, 3.11, 3.12, and 3.13. If you don't define a version, Cog will use the latest version of Python 3.13 or a version of Python that is compatible with the versions of PyTorch or TensorFlow you specify.

Note that these are the versions supported **in the Docker container**, not your host machine. You can run any version(s) of Python you wish on your host machine.
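The examples quote the version string, which is a good habit: many YAML parsers read an unquoted `3.10` as the float `3.1`. The hypothetical check below validates a version string against the supported list above. When no version is given it simply returns `"3.13"`, which is a simplification, since Cog may instead choose a version compatible with your PyTorch or TensorFlow.

```python
import re
from typing import Optional

SUPPORTED_MINORS = {(3, 10), (3, 11), (3, 12), (3, 13)}


def check_python_version(value: Optional[str]) -> str:
    """Hypothetical check of build.python_version against the supported list."""
    if value is None:
        return "3.13"  # simplified default; see the note above
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?", value)
    if match is None:
        raise ValueError(f"{value!r} is not a minor or patch version")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) not in SUPPORTED_MINORS:
        raise ValueError(f"Python {major}.{minor} is not supported")
    return value
```

Note that `check_python_version(str(3.10))` fails, because the float `3.10` becomes the string `"3.1"`.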

### `run`

A list of setup commands to run in the environment after your system packages and Python packages have been installed. If you're familiar with Docker, it's like a `RUN` instruction in your `Dockerfile`.

For example:

```yaml
build:
  run:
    - curl -L https://github.com/cowsay-org/cowsay/archive/refs/tags/v3.7.0.tar.gz | tar -xzf -
    - cd cowsay-3.7.0 && make install
```

Your code is _not_ available to commands in `run`. This is so we can build your image efficiently when running locally.

Each command in `run` can be either a string or a dictionary in the following format:

```yaml
build:
  run:
    - command: pip install
      mounts:
        - type: secret
          id: pip
          target: /etc/pip.conf
```

You can use secret mounts to securely pass credentials to setup commands, without baking them into the image. For more information, see [Dockerfile reference](https://docs.docker.com/engine/reference/builder/#run---mounttypesecret).

### `sdk_version`

Pin the version of the cog Python SDK installed in the container. Accepts a [PEP 440](https://peps.python.org/pep-0440/) version string. When omitted, the latest release is installed.

```yaml
build:
  python_version: "3.13"
  sdk_version: "0.18.0"
```

Pre-release versions are also supported:

```yaml
build:
  sdk_version: "0.18.0a1"
```

When a pre-release `sdk_version` is set, `--pre` is automatically passed to the pip install commands for both `cog` and `coglet`, so pip will resolve matching pre-release packages.

The minimum supported version is `0.16.0`. Specifying an older version will cause `cog build` to fail with an error.

The `COG_SDK_WHEEL` environment variable takes precedence over `sdk_version`. See [Environment variables](./environment.md) for details.
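The rules above (latest release when omitted, a `0.16.0` minimum, and `--pre` for pre-releases) can be modeled in a few lines. This is a hypothetical sketch, not Cog's build code. It parses only a simplified subset of PEP 440 (release numbers plus an optional `aN`/`bN`/`rcN` or `.devN` suffix), models only the `cog` package rather than `coglet`, and ignores `COG_SDK_WHEEL`.

```python
import re
from typing import Optional

# Simplified PEP 440 subset: release numbers plus an optional a/b/rc/dev suffix.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)((?:a|b|rc)\d+)?(\.dev\d+)?$")
MINIMUM = (0, 16, 0)


def sdk_pip_args(sdk_version: Optional[str]) -> list[str]:
    """Hypothetical mapping from build.sdk_version to pip install arguments."""
    if sdk_version is None:
        return ["cog"]  # omitted: install the latest release
    match = _VERSION.match(sdk_version)
    if match is None:
        raise ValueError(f"unsupported version string {sdk_version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Pad to three components so "0.16" compares equal to (0, 16, 0).
    release = (release + (0, 0, 0))[:3]
    if release < MINIMUM:
        raise ValueError(f"sdk_version {sdk_version} is older than 0.16.0")
    args = [f"cog=={sdk_version}"]
    if match.group(2) or match.group(3):
        args.append("--pre")  # pre-release pin: let pip resolve pre-releases
    return args
```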

### `system_packages`

A list of Ubuntu APT packages to install. For example:

```yaml
build:
  system_packages:
    - "ffmpeg"
    - "libavcodec-dev"
```

## `concurrency`

> Added in cog 0.14.0.

This stanza describes the concurrency capabilities of the model. It has one option:

### `max`

The maximum number of concurrent runs the model can process. If this is set, the model must specify an [async `run()` method](python.md#async-runners-and-concurrency).

For example:

```yaml
concurrency:
  max: 10
```
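Cog enforces this limit itself. To illustrate what `max: 10` means for an async `run()` method, the sketch below interleaves many calls on one event loop while an `asyncio.Semaphore` keeps at most ten in flight. The `Runner` here is a hypothetical plain class (not `BaseRunner`), and `asyncio.sleep` stands in for awaiting real model I/O.

```python
import asyncio


class Runner:
    """Hypothetical async runner; the semaphore mimics concurrency.max."""

    def __init__(self, max_concurrency: int = 10) -> None:
        self._slots = asyncio.Semaphore(max_concurrency)
        self.in_flight = 0
        self.peak = 0  # highest number of runs observed at once

    async def run(self, prompt: str) -> str:
        async with self._slots:  # at most max_concurrency runs at once
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            await asyncio.sleep(0.01)  # stand-in for awaiting model I/O
            self.in_flight -= 1
            return prompt.upper()


async def main() -> tuple[list[str], int]:
    runner = Runner(max_concurrency=10)
    results = await asyncio.gather(*(runner.run(f"p{i}") for i in range(25)))
    return results, runner.peak
```

Running `asyncio.run(main())` processes all 25 prompts, but never more than ten at the same time.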

## `image`

The name given to built Docker images. If you want to push to a registry, this should also include the registry name.

For example:

```yaml
image: "r8.im/your-username/your-model"
```

r8.im is Replicate's registry, but this can be any Docker registry.

If you don't set this, then a name will be generated from the directory name.

If you set this, then you can run `cog push` without specifying the model name.

If you specify an image name argument when pushing (like `cog push your-username/custom-model-name`), the argument will be used and the value of `image` in cog.yaml will be ignored.

## `run`

A pointer to the `Runner` class in your code, in the form `file.py:ClassName`, which defines how runs are executed on your model.

For example:

```yaml
run: "run.py:Runner"
```

`predict:` is still accepted for existing projects, but it is deprecated. New projects should use `run:`.

See [the Python API documentation for more information](python.md).
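To show how a `file.py:ClassName` pointer resolves to a class, here is a hypothetical loader built on `importlib`. It is not Cog's loader. Preferring `run:` when both keys are present is an assumption, and only file paths (not dotted module names) are handled.

```python
import importlib.util
from pathlib import Path
from typing import Mapping


def runner_ref(config: Mapping[str, str]) -> str:
    """Pick the pointer from cog.yaml, preferring run: over deprecated predict:."""
    if "run" in config:
        return config["run"]
    if "predict" in config:
        return config["predict"]
    raise KeyError("cog.yaml has neither run: nor predict:")


def load_runner_class(ref: str, project_dir: Path) -> type:
    """Resolve a 'file.py:ClassName' pointer to the class object."""
    file_name, sep, class_name = ref.partition(":")
    if not sep or not class_name:
        raise ValueError(f"expected 'file.py:ClassName', got {ref!r}")
    path = project_dir / file_name
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, class_name)
```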

## `predict`

Deprecated compatibility field for [`run`](#run). Existing projects can continue using it, but Cog will warn and `cog doctor --fix` can migrate common projects to `run:`.

For example:

```yaml
predict: "predict.py:Predictor"
```

See [the Python API documentation for more information](python.md).
