Metadata-Version: 2.4
Name: cherry-pipelines
Version: 0.0.2
Summary: A collection of blockchain data pipelines built with cherry
License-Expression: MIT OR Apache-2.0
License-File: LICENSE-APACHE
License-File: LICENSE-MIT
Requires-Python: >=3.11
Requires-Dist: cherry-core>=0.5.7
Requires-Dist: cherry-etl>=0.6.4
Requires-Dist: clickhouse-connect>=0.8.17
Requires-Dist: dotenv>=0.9.9
Requires-Dist: pyarrow-stubs>=19.3
Requires-Dist: pyarrow>=20.0.0
Description-Content-Type: text/markdown

# cherry-pipelines

This is a collection of pipelines that are built using [cherry](https://github.com/steelcake/cherry) and ClickHouse materialized views.

All data is stored in ClickHouse.

## Python version

This project is meant to be run with Python 3.12.

If you are using `uv` for development, it should pick this up automatically because of the `.python-version` file in the project root.

The docker image is configured to use this version of Python as well.

## Running a pipeline

Use the `main` script to run a pipeline:

```bash
uv run scripts/main.py
```

It takes these parameters as environment variables:

- `CHERRY_PIPELINE_KIND`, "evm" or "svm".
- `CHERRY_PIPELINE_NAME`, name of the pipeline to run e.g. "erc20_transfers".
- `CHERRY_FROM_BLOCK`, the block that indexing should start from. Defaults to 0.
- `CHERRY_TO_BLOCK`, the block that indexing should stop at. Has no default. If this argument is left empty, indexing waits for new blocks when it reaches the tip of the chain.
- `CHERRY_EVM_PROVIDER_KIND`, the provider to use when indexing evm chains. Can be `hypersync` or `sqd`. Has no default and is required when indexing evm.
- `CHERRY_EVM_CHAIN_ID`, the chain_id to use when indexing an evm chain. Has no default and is required when indexing evm.
- `CLICKHOUSE_HOST`, defaults to `127.0.0.1`.
- `CLICKHOUSE_PORT`, defaults to `8123`.
- `CLICKHOUSE_USER`, defaults to `default`.
- `CLICKHOUSE_PASSWORD`, defaults to empty string.
- `RUST_LOG`, as explained in [env-logger docs](https://docs.rs/env_logger/latest/env_logger/#enabling-logging).
- `PY_LOG`, as explained in [python logging docs](https://docs.python.org/3/howto/logging.html). Defaults to "INFO".

An `.env` file placed in the project root can be used to define these for development.
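
As an illustration of how the variables above fit together, here is a minimal sketch that reads them from a mapping and applies the documented defaults. The `PipelineConfig` class and `load_config` function are hypothetical names used only for this example and are not taken from the project's code. The sketch also assumes that `CHERRY_PIPELINE_KIND` and `CHERRY_PIPELINE_NAME` are required, which the list above does not state explicitly.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the settings listed above."""

    kind: str
    name: str
    from_block: int
    to_block: Optional[int]
    evm_provider_kind: Optional[str]
    evm_chain_id: Optional[int]
    clickhouse_host: str
    clickhouse_port: int
    clickhouse_user: str
    clickhouse_password: str


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    # Assumption: kind and name are required (no default is documented).
    kind = env["CHERRY_PIPELINE_KIND"]
    if kind not in ("evm", "svm"):
        raise ValueError(f"unknown pipeline kind: {kind!r}")

    # An unset or empty CHERRY_TO_BLOCK means "keep following the chain tip".
    raw_to_block = env.get("CHERRY_TO_BLOCK") or None
    to_block = int(raw_to_block) if raw_to_block is not None else None

    provider: Optional[str] = None
    chain_id: Optional[int] = None
    if kind == "evm":
        # Both variables have no default and are required for evm.
        provider = env["CHERRY_EVM_PROVIDER_KIND"]
        if provider not in ("hypersync", "sqd"):
            raise ValueError(f"unknown evm provider: {provider!r}")
        chain_id = int(env["CHERRY_EVM_CHAIN_ID"])

    return PipelineConfig(
        kind=kind,
        name=env["CHERRY_PIPELINE_NAME"],
        from_block=int(env.get("CHERRY_FROM_BLOCK", "0")),
        to_block=to_block,
        evm_provider_kind=provider,
        evm_chain_id=chain_id,
        clickhouse_host=env.get("CLICKHOUSE_HOST", "127.0.0.1"),
        clickhouse_port=int(env.get("CLICKHOUSE_PORT", "8123")),
        clickhouse_user=env.get("CLICKHOUSE_USER", "default"),
        clickhouse_password=env.get("CLICKHOUSE_PASSWORD", ""),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a combination of settings before starting a pipeline.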

## Running with docker

We publish a docker image that runs the `main` script.

## Dev Setup

Run the docker-compose file to start a ClickHouse instance for development.

```bash
docker-compose up -d
```

Run this to stop the container and delete the data on disk:

```bash
docker-compose down -v
```

And this to stop the container without deleting the data:

```bash
docker-compose down
```

## Development

This repo uses `uv` for development.

- Format the code with `uv run ruff format`
- Lint the code with `uv run ruff check`
- Run type checks with `uv run pyright`
- Run the tests with `uv run pytest`

## Data Provider

All svm pipelines use `SQD`.

The provider for evm pipelines is configurable using the `CHERRY_EVM_PROVIDER_KIND` env variable.

## Table definitions

The automatic table creation features of cherry aren't used; table definitions are managed separately.

## Materialized Views

Materialized views are defined in SQL files with an accompanying script that deploys them.

## EVM multi-chain structure

The evm pipelines are multi-chain and index multiple blockchains in parallel.

Each chain is written to its own tables. For example, erc20 transfers would be stored in a table named
`erc20_chain1` for ethereum and `erc20_chain10` for optimism.

Specify the `CHERRY_EVM_CHAIN_ID` env variable to set the chain you want to index when indexing evm.
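
The naming scheme in the example above can be sketched as a small helper. `chain_table_name` is a hypothetical function written for illustration and is not part of the project's code. It assumes every per-chain table follows the `<base>_chain<chain_id>` pattern shown for erc20.

```python
def chain_table_name(base: str, chain_id: int) -> str:
    """Build a per-chain table name, assuming the `<base>_chain<id>` pattern."""
    if chain_id <= 0:
        raise ValueError(f"chain_id must be positive, got {chain_id}")
    return f"{base}_chain{chain_id}"


# Chain id 1 is ethereum and chain id 10 is optimism, as in the text above.
ethereum_table = chain_table_name("erc20", 1)
optimism_table = chain_table_name("erc20", 10)
```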

## License

Licensed under either of

 * Apache License, Version 2.0
   ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license
   ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.

## Sponsors

[<img src="https://steelcake.com/envio-logo.png" width="150px" />](https://envio.dev)
[<img src="https://steelcake.com/sqd-logo.png" width="165px" />](https://sqd.ai)
[<img src="https://steelcake.com/space-operator-logo.webp" height="75px" />](https://linktr.ee/spaceoperator)
