Metadata-Version: 2.4
Name: dbslice
Version: 0.1.3
Summary: Extract minimal, referentially-intact database subsets for local development
Project-URL: Homepage, https://github.com/nabroleonx/dbslice
Project-URL: Documentation, https://github.com/nabroleonx/dbslice#readme
Project-URL: Repository, https://github.com/nabroleonx/dbslice
Project-URL: Issues, https://github.com/nabroleonx/dbslice/issues
Author: nabroleonx
License-Expression: MIT
Keywords: cli,database,extraction,mysql,postgresql,sqlite,subset
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: faker>=20.0.0
Requires-Dist: psycopg2-binary>=2.9.9
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Provides-Extra: mysql
Requires-Dist: mysql-connector-python>=8.0.0; extra == 'mysql'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/nabroleonx/dbslice/main/docs/assets/logo.png" alt="dbslice logo" width="128">
</p>

# dbslice

[![PyPI version](https://img.shields.io/pypi/v/dbslice)](https://pypi.org/project/dbslice/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)

Extract minimal, referentially-intact database subsets for local development and debugging.

## The Problem

Copying an entire production database to your machine is infeasible. But reproducing a bug often requires having the exact data that caused it. **dbslice** solves this by extracting only the records you need, following foreign key relationships to ensure referential integrity.

<p align="center">
  <img src="https://raw.githubusercontent.com/nabroleonx/dbslice/main/docs/assets/dbslice-overview.png" alt="dbslice — seed to subset">
</p>

## Quick Start

```bash
# Install globally
uv tool install dbslice   # or: pip install dbslice

# Extract an order and all related records
dbslice extract postgres://localhost/myapp --seed "orders.id=12345" > subset.sql

# Import into local database
psql -d localdb < subset.sql
```

## Features

- **Zero-config start** -- Introspects schema automatically, no data model file required
- **Single command** -- Extract complete data subsets with one CLI invocation
- **Safe by default** -- Auto-detects and anonymizes sensitive fields (emails, phones, SSNs, etc.)
- **Multiple output formats** -- SQL, JSON, and CSV
- **Streaming** -- Memory-efficient extraction for large datasets (100K+ rows)
- **Virtual foreign keys** -- Support for Django GenericForeignKeys and implicit relationships via config
- **Config files** -- YAML-based configuration for repeatable extractions
- **Validation** -- Checks referential integrity of extracted data

### Database Support

| Database   | Status                |
|------------|-----------------------|
| PostgreSQL | Fully supported       |
| MySQL      | Planned (not yet implemented) |
| SQLite     | Planned (not yet implemented) |

## Installation

```bash
# Install with uv (recommended)
uv add dbslice

# Try without installing
uvx dbslice --help

# Or with pip
pip install dbslice
```

## Usage

### Basic Extraction

```bash
# Extract by primary key
dbslice extract postgres://user:pass@host:5432/db --seed "orders.id=12345"

# Extract with WHERE clause
dbslice extract postgres://localhost/db --seed "orders:status='failed' AND created_at > '2024-01-01'"

# Multiple seeds
dbslice extract postgres://localhost/db \
  --seed "orders.id=100" \
  --seed "orders.id=101"
```

### Control Traversal

```bash
# Limit depth (default: 3)
dbslice extract postgres://... --seed "orders.id=1" --depth 2

# Direction: up (parents only), down (children only), both (default)
dbslice extract postgres://... --seed "orders.id=1" --direction up
```

### Anonymization

```bash
# Auto-anonymize detected sensitive fields
dbslice extract postgres://... --seed "users.id=1" --anonymize

# Redact additional fields
dbslice extract postgres://... --seed "users.id=1" --anonymize --redact "audit_logs.ip_address"
```

### Output Formats

```bash
# SQL (default)
dbslice extract postgres://... --seed "orders.id=1" --output sql

# JSON fixtures
dbslice extract postgres://... --seed "orders.id=1" --output json --out-file fixtures/

# CSV
dbslice extract postgres://... --seed "orders.id=1" --output csv --out-file data/
```

### Virtual Foreign Keys

For relationships not defined in the database schema (Django GenericForeignKeys, implicit relationships):

```yaml
# dbslice.yaml
database:
  url: postgres://localhost:5432/myapp

virtual_foreign_keys:
  - source_table: notifications
    source_columns: [object_id]
    target_table: orders
    description: "Generic FK to orders via ContentType"

  - source_table: audit_log
    source_columns: [user_id]
    target_table: users
    description: "Implicit FK without DB constraint"
```

```bash
dbslice extract --config dbslice.yaml --seed "users.id=1"
```

### Inspect Schema

```bash
dbslice inspect postgres://localhost/myapp
```

### Configuration File

```bash
# Generate config from database
dbslice init postgres://localhost/myapp --out-file dbslice.yaml

# Use config
dbslice extract --config dbslice.yaml --seed "orders.id=12345"
```

## How It Works

1. **Introspect** -- Reads database schema to discover tables and foreign key relationships
2. **Traverse** -- Starting from seed record(s), follows FK relationships via BFS
3. **Extract** -- Fetches all identified records
4. **Sort** -- Topologically sorts tables for correct INSERT order
5. **Output** -- Generates SQL/JSON/CSV with proper escaping

## Comparison

| Feature | dbslice | Jailer | Greenmask | slice-db |
|---------|---------|--------|-----------|----------|
| Language | Python | Java | Go | Ruby |
| Configuration | Zero-config | Requires model file | Config required | Manual YAML |
| Setup time | Seconds | Hours | Medium | Medium |
| Anonymization | Built-in (Faker) | Plugin-based | Advanced transformers | Not available |
| Subsetting | FK traversal | FK traversal | Limited | FK traversal |
| Output formats | SQL, JSON, CSV | SQL, XML, CSV | SQL | SQL only |
| Cycle handling | Automatic | Manual config | N/A | Manual |
| Streaming | Built-in | Configurable | Built-in | Not available |
| Maintenance | Active | Active | Active | Unmaintained |

**dbslice** is the lightweight, zero-config Python option: install and extract in under a minute.

## Development

```bash
git clone https://github.com/nabroleonx/dbslice.git
cd dbslice
uv sync --dev
uv run pytest
```

## License

MIT
