Metadata-Version: 2.4
Name: paradigm_absorb
Version: 0.3.0
Summary: python interface for interacting with flashbots mempool dumpster
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
License-File: LICENSE-APACHE
License-File: LICENSE-MIT
Requires-Dist: polars >= 1.0
Requires-Dist: rclone-python>=0.1.23
Requires-Dist: requests>=2.0
Requires-Dist: rich>=13.9.4
Requires-Dist: rich_argparse >= 1.5.2
Requires-Dist: toolstr >= 0.9.11
Requires-Dist: tooltime >= 0.3.4
Requires-Dist: typing-extensions >=4.9.0 ; extra == "all"
Requires-Dist: dune-spice>=0.2.7 ; extra == "all"
Requires-Dist: google-auth>=2.40.3 ; extra == "all"
Requires-Dist: google-cloud-bigquery>=3.34.0 ; extra == "all"
Requires-Dist: google-cloud>=0.34.0 ; extra == "all"
Requires-Dist: google>=3.0.0 ; extra == "all"
Requires-Dist: nitwit>=0.1.1 ; extra == "all"
Requires-Dist: paradigm-garlic>=0.1.3 ; extra == "all"
Requires-Dist: ipython>=8.18.1 ; extra == "all"
Requires-Dist: ipdb>=0.13.13 ; extra == "all"
Requires-Dist: dune-spice>=0.2.7 ; extra == "datasources"
Requires-Dist: google-auth>=2.40.3 ; extra == "datasources"
Requires-Dist: google-cloud-bigquery>=3.34.0 ; extra == "datasources"
Requires-Dist: google-cloud>=0.34.0 ; extra == "datasources"
Requires-Dist: google>=3.0.0 ; extra == "datasources"
Requires-Dist: nitwit>=0.1.1 ; extra == "datasources"
Requires-Dist: paradigm-garlic>=0.1.3 ; extra == "datasources"
Requires-Dist: ipython>=8.18.1 ; extra == "interactive"
Requires-Dist: ipdb>=0.13.13 ; extra == "interactive"
Requires-Dist: typing-extensions >=4.9.0 ; extra == "test"
Project-URL: Documentation, https://github.com/paradigmxyz/absorb
Project-URL: Source, https://github.com/paradigmxyz/absorb
Provides-Extra: all
Provides-Extra: datasources
Provides-Extra: interactive
Provides-Extra: test

![image](https://github.com/user-attachments/assets/7323b83e-fc5b-496c-b67b-bad6a188873b)

# absorb 🧽🫧🫧

*the sovereign dataset manager*

`absorb` makes it easy to 1) collect, 2) query, 3) manage, and 4) customize datasets from nearly any data source

## Features
- **limitless dataset library**: access to millions of datasets across 20+ diverse data sources
- **intuitive cli+python interfaces**: collect or query any dataset in a single line of code
- **maximal modularity**: built on open standards for frictionless integration with other tools
- **easy extensibility**: add new datasets or data sources with just a few lines of code

## Contents
1. [Installation](#installation)
2. [Example Usage](#example-usage)
    1. [Command Line](#example-command-line-usage)
    2. [Python](#example-python-usage)
3. [Supported Data Sources](#supported-data-sources)
4. [Output Format](#output-format)
5. [Configuration](#configuration)


## Installation

basic installation
```bash
uv tool install paradigm_absorb
```

install with all extras
```bash
uv tool install "paradigm_absorb[test,datasources,interactive]"
```

install from source
```bash
git clone git@github.com:paradigmxyz/absorb.git
# enter the cloned repo so that "." refers to the package
cd absorb
uv tool install --editable ".[test,datasources,interactive]"
```


## Example Usage

#### Example Command Line Usage

```bash
# collect dataset and save as local files
absorb collect kalshi

# list datasets that are collected or available
absorb ls

# show schemas of dataset
absorb schema kalshi

# create new custom dataset
absorb new custom_dataset

# upload custom dataset
absorb upload custom_dataset
```

#### Example Python Usage

```python
import absorb

# collect dataset and save as local files
absorb.collect('kalshi')

# list datasets that are collected or available
datasets = absorb.list()

# get schemas of dataset
schema = absorb.schema('kalshi')

# load dataset as polars DataFrame
df = absorb.load('kalshi')

# scan dataset as polars LazyFrame
lf = absorb.scan('kalshi')

# create new custom dataset
absorb.new('custom_dataset')

# upload custom dataset
absorb.upload('custom_dataset')
```


## Supported Data Sources

`absorb` collects data from each of these sources:

- [4byte](https://www.4byte.directory) function and event signatures
- [allium](https://www.allium.so) crypto data platform
- [bigquery](https://cloud.google.com/blockchain-analytics/docs/supported-datasets) crypto ETL datasets
- [binance](https://data.binance.vision) trades and OHLC candles on the Binance CEX
- [blocknative](https://docs.blocknative.com/data-archive/mempool-archive) Ethereum mempool archive
- [chain_ids](https://github.com/ethereum-lists/chains) chain IDs
- [coingecko](https://www.coingecko.com/) token prices
- [cryo](https://github.com/paradigmxyz/cryo) EVM datasets
- [defillama](https://defillama.com) DeFi data
- [dune](https://dune.com) tables and queries
- [fred](https://fred.stlouisfed.org) federal macroeconomic data
- [git](https://git-scm.com) commits, authors, and file diffs of a repo
- [growthepie](https://www.growthepie.xyz) L2 metrics
- [kalshi](https://kalshi.com) prediction market metrics
- [l2beat](https://l2beat.com) L2 metrics
- [mempool dumpster](https://mempool-dumpster.flashbots.net) Ethereum mempool archive
- [snowflake](https://www.snowflake.com/) generalized data platform
- [sourcify](https://sourcify.dev) verified contracts
- [tic](https://ticdata.treasury.gov) US Treasury Department data
- [tix](https://github.com/paradigmxyz/tix) price feeds
- [vera](https://verifieralliance.org) verified contract archives
- [xatu](https://github.com/ethpandaops/xatu-data) many Ethereum datasets

To list all available datasets and data sources, type `absorb ls` on the command line.


## Output Format

To display information about the schema and other metadata of a dataset, type `absorb help <DATASET>` on the command line.

`absorb` stores each dataset as a collection of parquet files.

Datasets can be stored anywhere on disk; `absorb` uses symlinks to organize those files within the `ABSORB_ROOT` tree.

The `ABSORB_ROOT` directory is organized as:

```
{ABSORB_ROOT}/
    datasets/
        <source>/
            tables/
                <datatype>/
                    {filename}.parquet
                table_metadata.json
            repos/
                {repo_name}/
    absorb_config.json
```
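
Because the layout above is plain directories and parquet files, it can be inspected with standard tools. The sketch below uses only the Python standard library to group parquet filenames by source and datatype. The helper name `list_parquet_files` is hypothetical and not part of `absorb`, and reading `ABSORB_ROOT` from an environment variable is an assumption.

```python
import os
from pathlib import Path


def list_parquet_files(root: Path) -> dict[tuple[str, str], list[str]]:
    """Map (source, datatype) to the sorted parquet filenames found under root."""
    found: dict[tuple[str, str], list[str]] = {}
    # pattern mirrors the layout: datasets/<source>/tables/<datatype>/*.parquet
    for path in sorted(root.glob('datasets/*/tables/*/*.parquet')):
        datatype = path.parent.name
        source = path.parent.parent.parent.name
        found.setdefault((source, datatype), []).append(path.name)
    return found


if __name__ == '__main__':
    # assumption: ABSORB_ROOT is available as an environment variable
    root = Path(os.environ.get('ABSORB_ROOT', '.'))
    for (source, datatype), names in list_parquet_files(root).items():
        print(source, datatype, len(names))
```

For everyday use, `absorb ls` remains the simpler way to see what has been collected.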

## Configuration

`absorb` uses a config file to specify which datasets to track.

Schema of `absorb_config.json`:

```python
{
    'tracked_tables': list[TableDict]
}
```

Schema of `dataset_config.json`:

```python
{
    "name": str,
    "definition": str,
    "parameters": dict[str, Any],
    "repos": list[str]
}
```
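
As an illustration of the `dataset_config.json` schema, here is a minimal sketch that parses a config and checks the four documented keys and their types. The function `load_dataset_config` is hypothetical and not part of the `absorb` API, and the checks reflect only the schema shown above, not any validation that `absorb` itself performs.

```python
import json
from typing import Any

# documented keys of dataset_config.json and the JSON types they map to
EXPECTED_TYPES = {'name': str, 'definition': str, 'parameters': dict, 'repos': list}


def load_dataset_config(text: str) -> dict[str, Any]:
    """Parse a dataset config and check it against the documented keys."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise TypeError('config should be a JSON object')
    for key, expected_type in EXPECTED_TYPES.items():
        if key not in config:
            raise ValueError(f'missing key: {key}')
        if not isinstance(config[key], expected_type):
            raise TypeError(f'{key} should be {expected_type.__name__}')
    if not all(isinstance(repo, str) for repo in config['repos']):
        raise TypeError('repos should contain only strings')
    return config
```

A config file on disk could be checked with `load_dataset_config(open(path).read())`, where `path` points at wherever that file lives.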

