Metadata-Version: 2.4
Name: pymetabase
Version: 0.1.0
Summary: A Python client and CLI for the Metabase API with automatic chunking for large dataset exports
Project-URL: Homepage, https://github.com/mokarimi19/pymetabase
Project-URL: Documentation, https://github.com/mokarimi19/pymetabase#readme
Project-URL: Repository, https://github.com/mokarimi19/pymetabase
Project-URL: Issues, https://github.com/mokarimi19/pymetabase/issues
Author-email: PyMetabase Contributors <pymetabase@users.noreply.github.com>
License: MIT
License-File: LICENSE
Keywords: analytics,api,business-intelligence,cli,data,etl,export,metabase
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.25.0
Requires-Dist: sqlglot>=20.0.0
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: isort>=5.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: excel
Requires-Dist: openpyxl>=3.1.5; extra == 'excel'
Requires-Dist: pandas>=2.0.3; extra == 'excel'
Provides-Extra: parquet
Requires-Dist: pyarrow>=10.0.0; extra == 'parquet'
Description-Content-Type: text/markdown

# PyMetabase

A Python library for exporting data from Metabase with automatic chunking for large datasets.

## Features

- **Automatic Chunking**: Uses ROW_NUMBER() for efficient export of millions of rows
- **Multiple Formats**: Export to JSONL, JSON, or CSV
- **Resume Support**: Checkpoint files allow resuming interrupted exports
- **Retry Logic**: Automatic retry with exponential backoff
- **CLI Tool**: Command-line interface for quick exports
- **SQL Parsing**: Uses sqlglot for proper handling of CTEs and complex queries
- **Flexible Configuration**: Environment variables, config files, or direct parameters

## Installation

```bash
pip install pymetabase
```

## Quick Start

### Python API

```python
from pymetabase import Metabase

# Connect to Metabase
mb = Metabase(
    url="https://metabase.example.com",
    username="user@example.com",
    password="password"
)

# Export query results
with mb:
    result = mb.export(
        database="MyDatabase",
        query="SELECT * FROM users",
        output="users.jsonl"
    )

print(f"Exported {result.total_rows:,} rows in {result.duration_seconds:.1f}s")
```

### Using Credentials File

```python
from pymetabase import Metabase

mb = Metabase(credentials_file="credentials.json")

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM auth_level",
        output="export.jsonl"
    )
```

### Command Line

```bash
# Export a query
pymetabase export \
    -c credentials.json \
    --database Club \
    --query "SELECT * FROM users" \
    -o users.jsonl

# Export with SQL from file (supports multi-line queries)
pymetabase export \
    -c credentials.json \
    --database Club \
    --query-file query.sql \
    -o users.jsonl

# Export with custom chunk size
pymetabase export \
    -c credentials.json \
    --database Club \
    --query "SELECT * FROM users" \
    -o users.jsonl \
    --chunk-size 100000

# Export a table
pymetabase export-table \
    -c credentials.json \
    --database Club \
    --table users \
    --where "active = true" \
    -o active_users.csv

# Export table with custom chunk size
pymetabase export-table \
    -c credentials.json \
    --database Club \
    --table users \
    -o users.csv \
    --chunk-size 50000

# List databases
pymetabase list-databases -c credentials.json

# List tables
pymetabase list-tables -c credentials.json --database Club
```

## Remote Management (rclone-style)

PyMetabase supports rclone-style remote configuration for managing multiple Metabase servers.

### Configure a Remote

```bash
# Interactive configuration
pymetabase config

# This opens a menu:
#   1) Add new remote
#   2) Edit existing remote
#   3) Delete remote
```

### List Remotes

```bash
pymetabase listremotes

# Output:
# Configured remotes (2):
#   - production (default): https://metabase.example.com
#   - staging: https://staging.metabase.com
```

### Use a Remote

```bash
# Use specific remote
pymetabase -r production export -d Club -q "SELECT * FROM users" -o users.jsonl

# Use default remote (no -r needed)
pymetabase export -d Club -q "SELECT * FROM users" -o users.jsonl
```

### Select Default Remote

```bash
pymetabase selectremote

# Interactive selection:
# Select default remote:
#   1) production (current)
#   2) staging
# Enter number:
```

### Test Connection

```bash
# Test specific remote
pymetabase testremote production

# Test default remote
pymetabase testremote
```

### Show/Delete Remote

```bash
# Show remote config (password hidden)
pymetabase showremote production

# Delete remote
pymetabase deleteremote staging
```

### First-time Setup

When no remotes are configured and you run an export command, PyMetabase will prompt you to set up a remote:

```bash
$ pymetabase export -d Club -q "SELECT * FROM users" -o users.jsonl
No remotes configured. Set up now? [Y/n] y
Remote name: production
Server URL: https://metabase.example.com
Username: user@example.com
Password: ****
Set as default? [Y/n] y
Remote 'production' created and set as default.
```

### Config File Location

Remotes are stored in:
- **macOS/Linux**: `~/.config/pymetabase/config.json`
- **Windows**: `%APPDATA%\pymetabase\config.json`

### Using Remotes in Python

```python
from pymetabase import Metabase, RemoteManager

# Load from configured remote
manager = RemoteManager()
remote = manager.get_remote("production")

mb = Metabase(
    url=remote.url,
    username=remote.username,
    password=remote.password
)

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM users",
        output="users.jsonl"
    )
```

## Configuration

### Credentials File (credentials.json)

```json
[
  {
    "SERVER_NAME": "https://metabase.example.com",
    "USERNAME": "user@example.com",
    "PASSWORD": "password"
  }
]
```

### Config File (config.yaml)

```yaml
metabase:
  url: https://metabase.example.com
  username: user@example.com
  password: password

defaults:
  chunk_size: 500000
  auto_chunk_threshold: 1000000
  format: jsonl

retry:
  max_retries: 3
  delay: 1.0
  timeout: 600

logging:
  level: INFO
  file: export.log
```

### Environment Variables

```bash
export METABASE_URL=https://metabase.example.com
export METABASE_USERNAME=user@example.com
export METABASE_PASSWORD=password
export METABASE_CHUNK_SIZE=500000
```

## Advanced Usage

### SQL Query Files

You can store complex, multi-line SQL queries in `.sql` or `.txt` files and reference them in your exports:

**query.sql:**
```sql
-- Complex query with CTEs
WITH active_users AS (
    SELECT
        id,
        name,
        email,
        created_at
    FROM users
    WHERE active = true
),
recent_orders AS (
    SELECT
        user_id,
        COUNT(*) as order_count,
        SUM(total) as total_spent
    FROM orders
    WHERE created_at >= '2024-01-01'
    GROUP BY user_id
)
SELECT
    u.id,
    u.name,
    u.email,
    COALESCE(o.order_count, 0) as orders,
    COALESCE(o.total_spent, 0) as spent
FROM active_users u
LEFT JOIN recent_orders o ON u.id = o.user_id
ORDER BY o.total_spent DESC
```

**Usage:**
```bash
# Use -Q or --query-file
pymetabase export -d Club -Q query.sql -o results.jsonl

# Works with remotes too
pymetabase -r production export -d Club --query-file query.sql -o results.jsonl
```

### Custom Chunk Sizes

Control the number of rows per chunk for optimal performance:

```bash
# Small chunks for memory-constrained environments
pymetabase export -d Club -q "SELECT * FROM big_table" -o data.jsonl --chunk-size 50000

# Large chunks for better performance on fast networks
pymetabase export -d Club -q "SELECT * FROM users" -o data.jsonl --chunk-size 1000000
```

**In Python:**
```python
with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM big_table",
        output="data.jsonl",
        chunk_size=100000  # Custom chunk size
    )
```

### Export with Progress Callback

```python
def progress(current, total, rate):
    pct = current / total * 100
    print(f"Progress: {pct:.1f}% ({current:,}/{total:,}) - {rate:.0f} rows/sec")

with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM large_table",
        output="export.jsonl",
        progress_callback=progress
    )
```

### Resume Interrupted Export

```python
with mb:
    result = mb.export(
        database="Club",
        query="SELECT * FROM large_table",
        output="export.jsonl",
        checkpoint_file="export_progress.json"
    )
```

If the export is interrupted, run the same command again and it will resume from where it left off.

### Export Table with Filtering

```python
with mb:
    result = mb.export_table(
        database="Club",
        table="users",
        output="active_users.csv",
        columns=["id", "name", "email"],
        where="active = true AND created_at > '2024-01-01'",
        order_by="created_at DESC",
        limit=10000
    )
```

### Complex Queries (CTEs)

PyMetabase uses sqlglot to properly handle complex queries:

```python
query = """
WITH active_users AS (
    SELECT * FROM users WHERE active = true
),
user_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    GROUP BY user_id
)
SELECT u.*, o.order_count
FROM active_users u
LEFT JOIN user_orders o ON u.id = o.user_id
"""

with mb:
    result = mb.export(
        database="MyDB",
        query=query,
        output="user_orders.jsonl"
    )
```

## Output Formats

### JSONL (JSON Lines) - Default

Each row is a separate JSON object on its own line:

```jsonl
{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}
```

### JSON

Standard JSON array:

```json
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "email": "bob@example.com"}
]
```

### CSV

Standard CSV with headers:

```csv
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
```

## How It Works

### Automatic Chunking

For large datasets (>1M rows by default), PyMetabase automatically chunks the export using ROW_NUMBER():

```sql
WITH _numbered AS (
    SELECT *, ROW_NUMBER() OVER () AS _row_num
    FROM (your_query) AS _base
)
SELECT * FROM _numbered
WHERE _row_num >= 1 AND _row_num <= 500000
```

This approach:
- Works on ANY query (CTEs, JOINs, UNIONs)
- No special column required
- Even chunk sizes
- No missing or duplicate rows

### Retry Logic

Failed chunks are automatically retried with exponential backoff:
- Attempt 1: Immediate
- Attempt 2: Wait 1 second
- Attempt 3: Wait 2 seconds

## API Reference

### Metabase Class

```python
Metabase(
    url: str = None,
    username: str = None,
    password: str = None,
    config_file: str = None,
    credentials_file: str = None,
    **kwargs
)
```

### Methods

- `connect()` - Connect and authenticate
- `disconnect()` - Disconnect from Metabase
- `list_databases()` - List available databases
- `list_tables(database)` - List tables in a database
- `export(query, output, ...)` - Export query results
- `export_table(table, output, ...)` - Export entire table

### ExportResult

```python
@dataclass
class ExportResult:
    total_rows: int
    chunks: int
    duration_seconds: float
    output_file: str
    rate_per_second: float
    format: str
```

## Requirements

- Python 3.8+
- requests
- sqlglot
- pyyaml

## License

MIT License
