Metadata-Version: 2.4
Name: gapless-network-data
Version: 5.1.0
Summary: Ethereum blockchain network metrics collection infrastructure with zero-gap guarantee (multi-chain support planned)
Project-URL: Homepage, https://github.com/terrylica/gapless-network-data
Project-URL: Documentation, https://github.com/terrylica/gapless-network-data#readme
Project-URL: Repository, https://github.com/terrylica/gapless-network-data
Project-URL: Issues, https://github.com/terrylica/gapless-network-data/issues
Author-email: Terry Li <terry@eonlabs.com>
License: MIT
License-File: LICENSE
Keywords: bitcoin,blockchain,cryptocurrency,data-collection,ethereum,fee-estimation,gas-prices,llamarpc,mempool,multi-chain,network-metrics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: clickhouse-connect>=0.10.0
Requires-Dist: httpx>=0.28.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: polars>=1.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: tenacity>=9.0.0
Requires-Dist: web3>=7.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.3.0; extra == 'dev'
Description-Content-Type: text/markdown

# Gapless Network Data

Ethereum blockchain network metrics for financial time series forecasting.

## Quick Start

```python
import gapless_network_data as gmd

# Fetch latest blocks (recommended for live trading)
df = gmd.fetch_blocks(limit=1000)

# Compute block utilization (alpha feature #2)
df['utilization'] = df['gas_used'] / df['gas_limit']

# Date range query (inclusive [start, end])
# Returns all blocks from Jan 1 through Jan 31 (both dates included)
df = gmd.fetch_blocks(start='2024-01-01', end='2024-01-31')

# Start-only: all blocks from Jan 1 to most recent
df = gmd.fetch_blocks(start='2024-01-01')

# End-only: all blocks from genesis to Jan 31
df = gmd.fetch_blocks(end='2024-01-31')

# Same-day query (returns all blocks on March 13)
df = gmd.fetch_blocks(start='2024-03-13', end='2024-03-13')

# Second-precision query (explicit times supported)
df = gmd.fetch_blocks(start='2024-03-13 12:00:00', end='2024-03-13 12:05:00')
```

## Installation

```bash
pip install gapless-network-data
```

## AI Agent Discoverability

This package is optimized for Claude Code CLI and AI coding agents.

### Probe Module

```python
import gapless_network_data as gmd

# Get ranked features for ML pipelines
features = gmd.probe.get_alpha_features()
for f in features[:5]:
    print(f"{f.rank}. {f.name} ({f.importance})")

# Get protocol era boundaries
eras = gmd.probe.get_protocol_eras()

# Get setup workflow
workflow = gmd.probe.get_setup_workflow()

# Get example code
print(gmd.probe.get_quick_start())
```

## Alpha Feature Rankings

AI agents should prioritize these features for ML pipelines:

| Rank | Feature           | Importance | Formula              |
| ---- | ----------------- | ---------- | -------------------- |
| 1    | base_fee_per_gas  | critical   | raw                  |
| 2    | block_utilization | critical   | gas_used / gas_limit |
| 3    | transaction_count | high       | raw                  |
| 4    | timestamp         | high       | raw                  |
| 5    | number            | high       | raw                  |
| 6    | size              | medium     | raw                  |
| 7    | blob_gas_used     | medium     | raw (post-EIP4844)   |
| 8    | excess_blob_gas   | low        | raw (post-EIP4844)   |
| 9    | gas_limit         | low        | raw                  |
| 10   | gas_used          | low        | raw                  |

Get rankings programmatically: `gmd.probe.get_alpha_features()`

## Protocol Era Boundaries

Protocol upgrades change which fields are populated or meaningful, so filter data by era before modeling:

- **EIP-1559** (block 12,965,000, Aug 2021): base_fee_per_gas introduced
- **The Merge** (block 15,537,394, Sep 2022): difficulty is 0 from this block onward
- **EIP-4844** (block 19,426,587, Mar 2024): blob_gas fields introduced

Get eras programmatically: `gmd.probe.get_protocol_eras()`
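
As a minimal sketch of era-aware filtering, the fork blocks listed above can be used to label each row of a DataFrame that has a `number` column. The `label_era` helper and the era names are illustrative assumptions, not part of the package API, and the sample rows are synthetic.

```python
import numpy as np
import pandas as pd

# Fork activation blocks copied from the list above; the names are hypothetical labels.
ERA_STARTS = np.array([0, 12_965_000, 15_537_394, 19_426_587])
ERA_NAMES = np.array(["pre_eip1559", "eip1559", "post_merge", "eip4844"])


def label_era(df: pd.DataFrame) -> pd.Series:
    """Return an era label per row based on the block number (hypothetical helper)."""
    numbers = df["number"].to_numpy()
    # side="right" assigns the fork block itself to the era it starts.
    idx = np.searchsorted(ERA_STARTS, numbers, side="right") - 1
    return pd.Series(ERA_NAMES[idx], index=df.index, name="era")


# Synthetic block numbers around each boundary.
blocks = pd.DataFrame({"number": [12_964_999, 12_965_000, 15_537_394, 19_426_587]})
blocks["era"] = label_era(blocks)

# Example: keep only rows where blob gas fields can be populated.
blob_era = blocks[blocks["era"] == "eip4844"]
```

Labeling rather than dropping rows keeps the choice of training window explicit in downstream code.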

## API Reference

### fetch_blocks()

```python
gmd.fetch_blocks(
    start: str | None = None,     # ISO 8601 date (inclusive)
    end: str | None = None,       # ISO 8601 date (inclusive for date-only)
    limit: int | None = None,     # Max blocks (0 = empty DataFrame)
    include_deprecated: bool = False  # Include difficulty fields
) -> pd.DataFrame
```

**Date Range Semantics (inclusive [start, end]):**

- Date-only inputs include the entire day: `end='2024-03-13'` includes all of March 13
- Explicit times are preserved: `end='2024-03-13 12:00:00'` excludes blocks after noon
- Same-day queries work: `start='2024-03-13', end='2024-03-13'` returns all blocks on March 13
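
The rules above can be pictured with a small sketch. It assumes the date-only expansion resolves to the last millisecond of the day; the package's actual implementation may differ, and `resolve_end` is a hypothetical helper, not part of the API.

```python
from datetime import datetime, timedelta


def resolve_end(end: str) -> datetime:
    """Return the inclusive upper bound implied by an `end` string (hypothetical helper)."""
    text = end.strip()
    parsed = datetime.fromisoformat(text)
    if len(text) == 10:  # 'YYYY-MM-DD' carries no time component
        # Assumption: a date-only value covers the whole day, up to its last millisecond.
        return parsed + timedelta(days=1) - timedelta(milliseconds=1)
    # Explicit times are kept exactly as given.
    return parsed


full_day = resolve_end("2024-03-13")
noon = resolve_end("2024-03-13 12:00:00")
```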

**Parameter Requirements:**

- At least one of `start`, `end`, or `limit` must be specified
- Empty strings (`""`) are rejected — use `None` to omit
- `start` must be ≤ `end` if both provided
- `limit=0` returns an empty DataFrame (0 rows, not the entire blockchain)
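
A sketch of these checks, written only to illustrate the rules listed above. `validate_params` is a hypothetical stand-in, not the package's code, and it compares the raw parsed values without applying the date-only expansion.

```python
from datetime import datetime
from typing import Optional


def validate_params(
    start: Optional[str] = None,
    end: Optional[str] = None,
    limit: Optional[int] = None,
) -> None:
    """Raise ValueError when the arguments break the documented rules (hypothetical)."""
    if start == "" or end == "":
        raise ValueError("empty string for start/end; use None to omit")
    if start is None and end is None and limit is None:
        raise ValueError("specify at least one of start, end, or limit")
    if start is not None and end is not None:
        # Simplification: raw comparison, no date-only expansion.
        if datetime.fromisoformat(start) > datetime.fromisoformat(end):
            raise ValueError("start must be <= end")


validate_params(limit=0)  # valid: limit=0 is an explicit request for zero rows

# Each of these argument sets breaks one rule.
errors = []
for kwargs in ({}, {"start": ""}, {"start": "2024-02-01", "end": "2024-01-01"}):
    try:
        validate_params(**kwargs)
    except ValueError as exc:
        errors.append(str(exc))
```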

Returns a pandas DataFrame with these columns:

- timestamp (datetime64[ns, UTC])
- number (uint64)
- gas_limit, gas_used, base_fee_per_gas, transaction_count, size (uint64)
- blob_gas_used, excess_blob_gas (Int64, nullable; `pd.NA` for pre-EIP-4844 blocks)
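
Because the blob columns are nullable, missing values need explicit handling before arithmetic or model input. The sketch below uses a synthetic two-row frame with the dtypes listed above; the values are made up for illustration.

```python
import pandas as pd

# Synthetic rows: one block before EIP-4844, one after (values are illustrative).
blocks = pd.DataFrame(
    {
        "number": pd.array([19_426_586, 19_426_587], dtype="uint64"),
        "blob_gas_used": pd.array([pd.NA, 131_072], dtype="Int64"),
    }
)

# Option 1: keep only rows where the field exists.
has_blobs = blocks["blob_gas_used"].notna()
post_4844 = blocks[has_blobs]

# Option 2: fill with 0. Note this merges "field did not exist" with "no blobs used".
blob_gas_filled = blocks["blob_gas_used"].fillna(0)
```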

### Deprecated Fields

Excluded by default (use `include_deprecated=True` for pre-Merge analysis):

- `difficulty`: Always 0 post-Merge (Sep 2022)
- `total_difficulty`: Frozen post-Merge

## Setup

Provide credentials through a `.env` file (simplest), Doppler (recommended for teams), or environment variables.

### Environment Variables

| Variable                       | Description               |
| ------------------------------ | ------------------------- |
| `CLICKHOUSE_HOST_READONLY`     | ClickHouse Cloud hostname |
| `CLICKHOUSE_USER_READONLY`     | Read-only username        |
| `CLICKHOUSE_PASSWORD_READONLY` | Password                  |

```bash
# Option 1: .env file (simplest for small teams)
# Create .env in your project root:
CLICKHOUSE_HOST_READONLY=<host>
CLICKHOUSE_USER_READONLY=<user>
CLICKHOUSE_PASSWORD_READONLY=<password>

# Option 2: Doppler (recommended for production)
doppler configure set token <token_from_1password>
doppler setup --project gapless-network-data --config prd

# Option 3: Environment variables
export CLICKHOUSE_HOST_READONLY=<host>
export CLICKHOUSE_USER_READONLY=<user>
export CLICKHOUSE_PASSWORD_READONLY=<password>
```

Get setup instructions: `gmd.probe.get_setup_workflow()`
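
A quick way to confirm that the three variables are visible to the current process is sketched below. `missing_credentials` is a hypothetical helper, not part of the package. It only inspects the mapping it is given (by default `os.environ`), so values kept in a `.env` file must already have been loaded into the environment.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the table above.
REQUIRED_VARS = (
    "CLICKHOUSE_HOST_READONLY",
    "CLICKHOUSE_USER_READONLY",
    "CLICKHOUSE_PASSWORD_READONLY",
)


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
```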

## Time Precision

- **Timestamp storage**: Millisecond precision (DateTime64(3))
- **Block granularity**: ~12 second intervals (Ethereum block time)
- **Query precision**: Second-level supported for start/end parameters

**Supported timestamp formats:**

| Format            | Example                     | Behavior                          |
| ----------------- | --------------------------- | --------------------------------- |
| Date-only         | `'2024-03-13'`              | Expands to include full day       |
| Date + time       | `'2024-03-13 12:30:45'`     | Preserved exactly                 |
| ISO 8601          | `'2024-03-13T12:30:45'`     | Preserved exactly                 |
| With milliseconds | `'2024-03-13 12:30:45.123'` | Preserved (truncated to 3 digits) |

## Data Coverage

- **Blocks**: 23.87M Ethereum blocks (2015-2025)
- **Update frequency**: Real-time (~12 second intervals)
- **Storage**: ClickHouse Cloud (AWS)
- **Deduplication**: Automatic via ReplacingMergeTree
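
To spot-check continuity on any fetched range, consecutive block numbers should differ by exactly 1. The following is a minimal sketch; `find_gaps` is a hypothetical helper, and the sample frame is synthetic with a deliberate gap.

```python
import numpy as np
import pandas as pd


def find_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """List gaps in the `number` column as (after_block, before_block, missing) rows."""
    numbers = np.sort(df["number"].to_numpy(dtype="int64"))
    steps = np.diff(numbers)
    # A step larger than 1 means one or more block numbers are absent.
    pos = np.flatnonzero(steps > 1)
    return pd.DataFrame(
        {
            "after_block": numbers[pos],
            "before_block": numbers[pos + 1],
            "missing": steps[pos] - 1,
        }
    )


# Synthetic example: blocks 102 and 103 are absent.
sample = pd.DataFrame({"number": [100, 101, 104, 105]})
gaps = find_gaps(sample)
```

An empty result means the fetched range is contiguous.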

## Exceptions

All exceptions include structured context (timestamp, endpoint, HTTP status):

**Credential & Database:**

- `CredentialException`: Credential resolution failed
- `DatabaseException`: ClickHouse query failed

**Parameter Validation (fetch_blocks):**

- `ValueError`: Empty string for start/end (use `None` to omit)
- `ValueError`: No parameters specified (must have start, end, or limit)
- `ValueError`: Reversed date range (start > end)

## Feature Engineering Integration

Combine with OHLCV price data:

```python
import gapless_crypto_data as gcd
import gapless_network_data as gmd

# Fetch both data sources
df_ohlcv = gcd.get_data(symbol="ETHUSDT", timeframe="1m", start_date="2024-01-01")
df_blocks = gmd.fetch_blocks(start="2024-01-01", end="2024-01-02")

# Temporal alignment (forward-fill prevents data leakage)
df_blocks_aligned = df_blocks.set_index('timestamp').reindex(
    df_ohlcv.index, method='ffill'
)

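# Join and engineer features
df = df_ohlcv.join(df_blocks_aligned)
# Base fee relative to its 60-bar rolling median (1 hour at 1m bars)
df['gas_pressure'] = df['base_fee_per_gas'] / df['base_fee_per_gas'].rolling(60).median()
# Fraction of the block gas limit actually consumed
df['block_utilization'] = df['gas_used'] / df['gas_limit']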
```
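
An alternative alignment sketch uses `pandas.merge_asof` with `direction="backward"`, which attaches to each bar the most recent block at or before the bar timestamp. The frames below are synthetic stand-ins for the OHLCV and block data. The sketch assumes both `timestamp` columns are timezone-aware UTC and sorted, and whether a bar timestamp marks its open or its close depends on the price source.

```python
import pandas as pd

# Synthetic 1-minute bars (stand-in for OHLCV data).
bars = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-01 00:01:00", "2024-01-01 00:02:00"], utc=True
        ),
        "close": [2300.0, 2301.5],
    }
)

# Synthetic blocks at roughly 12-second spacing (stand-in for fetch_blocks output).
blocks = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 00:00:47",
                "2024-01-01 00:00:59",
                "2024-01-01 00:01:11",
                "2024-01-01 00:01:59",
            ],
            utc=True,
        ),
        "number": [1, 2, 3, 4],
        "gas_used": [10_000_000, 15_000_000, 20_000_000, 22_500_000],
        "gas_limit": [30_000_000, 30_000_000, 30_000_000, 30_000_000],
    }
)

# Backward as-of join: each bar sees only blocks that already exist at its timestamp.
aligned = pd.merge_asof(
    bars.sort_values("timestamp"),
    blocks.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
)
aligned["block_utilization"] = aligned["gas_used"] / aligned["gas_limit"]
```

Unlike `reindex`, `merge_asof` does not require the block timestamps to coincide with or uniquely index the bar timestamps, and an optional `tolerance` argument can cap how stale a matched block may be.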

## Infrastructure (Reference)

Dual-pipeline architecture for production reliability:

| Component           | Purpose                          | Technology       |
| ------------------- | -------------------------------- | ---------------- |
| BigQuery Sync       | Hourly batch from public dataset | Cloud Run Job    |
| Real-Time Collector | Block-level streaming            | e2-micro VM      |
| Database            | Storage with deduplication       | ClickHouse Cloud |
| Monitoring          | Dead Man's Switch                | Healthchecks.io  |

## Related Projects

- [gapless-crypto-data](https://github.com/terrylica/gapless-crypto-data) - OHLCV data collection
- [BigQuery Ethereum Dataset](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=crypto_ethereum)

## Documentation

- [Architecture Overview](https://github.com/terrylica/gapless-network-data/blob/main/docs/architecture/README.md)
- [Data Format Specification](https://github.com/terrylica/gapless-network-data/blob/main/docs/architecture/DATA_FORMAT.md)

## License

[MIT License](https://github.com/terrylica/gapless-network-data/blob/main/LICENSE)
