Metadata-Version: 2.4
Name: tigris-boto3-ext
Version: 0.3.0
Summary: Extend boto3 with Tigris-specific features like snapshots and bucket forking
Author-email: Tigris Data <support@tigrisdata.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/tigrisdata/tigris-boto3-ext
Project-URL: Documentation, https://github.com/tigrisdata/tigris-boto3-ext#readme
Project-URL: Repository, https://github.com/tigrisdata/tigris-boto3-ext
Project-URL: Issues, https://github.com/tigrisdata/tigris-boto3-ext/issues
Keywords: tigris,boto3,s3,object-storage,snapshot,fork
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: boto3>=1.26.0
Requires-Dist: urllib3>=1.25.4
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: boto3-stubs[s3]>=1.26.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Dynamic: license-file

# tigris-boto3-ext

[![CI](https://github.com/tigrisdata/tigris-boto3-ext/actions/workflows/ci.yml/badge.svg)](https://github.com/tigrisdata/tigris-boto3-ext/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/tigrisdata/tigris-boto3-ext/branch/main/graph/badge.svg)](https://codecov.io/gh/tigrisdata/tigris-boto3-ext)
[![Python Version](https://img.shields.io/pypi/pyversions/tigris-boto3-ext.svg)](https://pypi.org/project/tigris-boto3-ext/)
[![PyPI version](https://badge.fury.io/py/tigris-boto3-ext.svg)](https://badge.fury.io/py/tigris-boto3-ext)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

Extend boto3 with Tigris-specific features like snapshots and bucket forking, while maintaining full boto3 compatibility.

## Features

- **Bundle API**: Fetch thousands of objects in a single request as a streaming tar archive — designed for ML training workloads
- **Snapshot Support**: Create, list, and read from bucket snapshots
- **Bucket Forking**: Create forked buckets from existing buckets or snapshots
- **Object Rename**: Rename objects in place without rewriting their data
- **Multiple Usage Patterns**: Context managers, decorators, helper functions, or wrapper client
- **Zero Configuration**: Works with existing boto3 code
- **Type Safe**: Full type hints for IDE support
- **Pythonic API**: Uses familiar Python patterns

## Installation

```bash
pip install tigris-boto3-ext
```

## Usage Patterns

### 1. Context Managers (Recommended)

#### Enable Snapshots for Bucket Creation

```python
import boto3
from tigris_boto3_ext import TigrisSnapshotEnabled

s3_client = boto3.client('s3')  # the snippets below reuse this client

with TigrisSnapshotEnabled(s3_client):
    s3_client.create_bucket(Bucket='my-snapshot-bucket')
```

#### Work with Snapshots

```python
from tigris_boto3_ext import TigrisSnapshot

# List snapshots for a bucket
with TigrisSnapshot(s3_client, 'my-bucket'):
    snapshots = s3_client.list_buckets()

# Read objects from a specific snapshot
with TigrisSnapshot(s3_client, 'my-bucket', snapshot_version='12345'):
    obj = s3_client.get_object(Bucket='my-bucket', Key='file.txt')
    objects = s3_client.list_objects_v2(Bucket='my-bucket')
```

#### Create Forked Buckets

```python
from tigris_boto3_ext import TigrisFork

# Fork from current state
with TigrisFork(s3_client, 'source-bucket'):
    s3_client.create_bucket(Bucket='forked-bucket')

# Fork from specific snapshot
with TigrisFork(s3_client, 'source-bucket', snapshot_version='12345'):
    s3_client.create_bucket(Bucket='forked-from-snapshot')
```

#### Rename Objects

Tigris implements rename as a `copy_object` request plus the `X-Tigris-Rename: true`
header — no data is rewritten, only the key changes. Keep the context tight so
unrelated `copy_object` calls are not turned into renames.

```python
from tigris_boto3_ext import TigrisRename

with TigrisRename(s3_client):
    s3_client.copy_object(
        Bucket='my-bucket',
        CopySource='my-bucket/old-name.txt',
        Key='new-name.txt',
    )
```

### 2. Decorators

```python
from tigris_boto3_ext import snapshot_enabled, with_snapshot, forked_from, with_rename

@snapshot_enabled
def create_snapshot_enabled_bucket(s3_client, bucket_name):
    return s3_client.create_bucket(Bucket=bucket_name)

# List available snapshots
@with_snapshot('my-bucket')
def list_snapshots(s3_client):
    return s3_client.list_buckets()

# Read from specific snapshot
@with_snapshot('my-bucket', snapshot_version='12345')
def read_from_snapshot(s3_client, key):
    return s3_client.get_object(Bucket='my-bucket', Key=key)

@forked_from('source-bucket', snapshot_version='12345')
def create_my_fork(s3_client, new_bucket):
    return s3_client.create_bucket(Bucket=new_bucket)

@with_rename
def rename_file(s3_client, bucket, old_key, new_key):
    return s3_client.copy_object(
        Bucket=bucket,
        CopySource=f'{bucket}/{old_key}',
        Key=new_key,
    )

# Use the decorated functions
create_snapshot_enabled_bucket(s3_client, 'my-bucket')
snapshots = list_snapshots(s3_client)
obj = read_from_snapshot(s3_client, 'file.txt')
create_my_fork(s3_client, 'my-fork')
rename_file(s3_client, 'my-bucket', 'old.txt', 'new.txt')
```

### 3. Helper Functions

```python
from tigris_boto3_ext import (
    create_snapshot_bucket,
    create_snapshot,
    list_snapshots,
    create_fork,
    get_object_from_snapshot,
    get_snapshot_version,
    list_objects_from_snapshot,
    head_object_from_snapshot,
    has_snapshot_enabled,
    get_bucket_info,
    rename_object,
)

# Create snapshot-enabled bucket
create_snapshot_bucket(s3_client, 'my-bucket')

# Check if bucket has snapshots enabled
if has_snapshot_enabled(s3_client, 'my-bucket'):
    print("Snapshots are enabled!")

# Get comprehensive bucket information
info = get_bucket_info(s3_client, 'my-bucket')
print(f"Snapshot enabled: {info['snapshot_enabled']}")

# Create snapshots
result = create_snapshot(s3_client, 'my-bucket', snapshot_name='backup-1')
version = get_snapshot_version(result)

# List snapshots
snapshots = list_snapshots(s3_client, 'my-bucket')

# Create forks
create_fork(s3_client, 'new-bucket', 'source-bucket', snapshot_version=version)

# Access snapshot data
obj = get_object_from_snapshot(s3_client, 'my-bucket', 'file.txt', version)
objects = list_objects_from_snapshot(s3_client, 'my-bucket', version, Prefix='data/')
metadata = head_object_from_snapshot(s3_client, 'my-bucket', 'file.txt', version)

# Rename an object in place (no data rewrite)
rename_object(s3_client, 'my-bucket', 'old-name.txt', 'new-name.txt')
```

## Complete Examples

### Example 1: Backup and Restore Workflow

```python
import boto3
from tigris_boto3_ext import (
    create_snapshot_bucket,
    create_snapshot,
    list_snapshots,
    create_fork,
    get_snapshot_version,
)

s3 = boto3.client('s3')

# Create a snapshot-enabled bucket
create_snapshot_bucket(s3, 'production-data')

# Add some data
s3.put_object(Bucket='production-data', Key='important.txt', Body=b'critical data')

# Create a snapshot
snapshot_result = create_snapshot(s3, 'production-data', snapshot_name='daily-backup')
snapshot_version = get_snapshot_version(snapshot_result)

# List all snapshots
snapshots = list_snapshots(s3, 'production-data')
for bucket in snapshots.get('Buckets', []):
    print(f"Snapshot: {bucket['Name']}")

# Restore from snapshot by creating a fork
create_fork(s3, 'restored-data', 'production-data', snapshot_version=snapshot_version)
```

### Example 2: Testing with Snapshot Isolation

```python
import boto3
from tigris_boto3_ext import create_fork, create_snapshot, get_snapshot_version

s3 = boto3.client('s3')

# Create a snapshot of production data
snapshot_result = create_snapshot(s3, 'production-data', snapshot_name='test-snapshot')
snapshot_version = get_snapshot_version(snapshot_result)

# Fork for testing (isolated copy)
create_fork(s3, 'test-data', 'production-data', snapshot_version=snapshot_version)

# Run tests against the test-data fork without affecting production
s3.put_object(Bucket='test-data', Key='test-data.txt', Body=b'test data')

# Clean up test bucket when done
s3.delete_bucket(Bucket='test-data')
```

### Example 3: Time-Travel Queries

```python
import boto3
from tigris_boto3_ext import get_object_from_snapshot, list_objects_from_snapshot

s3 = boto3.client('s3')

# Get object as it was at a specific snapshot
historical_obj = get_object_from_snapshot(
    s3,
    'my-bucket',
    'config.json',
    snapshot_version='12345'
)
old_config = historical_obj['Body'].read()

# List all objects in historical snapshot
historical_objects = list_objects_from_snapshot(
    s3,
    'my-bucket',
    snapshot_version='12345',
    Prefix='logs/2024/'
)

for obj in historical_objects.get('Contents', []):
    print(f"Historical object: {obj['Key']}")
```

### Example 4: Retrieving Bucket Snapshot and Fork Information

```python
import boto3
from tigris_boto3_ext import (
    create_snapshot_bucket,
    create_snapshot,
    create_fork,
    get_snapshot_version,
    has_snapshot_enabled,
    get_bucket_info,
)

s3 = boto3.client('s3')

# Create a snapshot-enabled bucket, then check whether snapshots are enabled
bucket_name = 'my-bucket'

create_snapshot_bucket(s3, bucket_name)

if has_snapshot_enabled(s3, bucket_name):
    print(f"✓ Snapshots are enabled for {bucket_name}")
else:
    print(f"✗ Snapshots are not enabled for {bucket_name}")

# Get comprehensive bucket information
info = get_bucket_info(s3, bucket_name)
print(f"Snapshot enabled: {info['snapshot_enabled']}")

# Example: Check fork lineage
source_bucket = 'production-data'
create_snapshot_bucket(s3, source_bucket)

# Create a snapshot
snapshot_result = create_snapshot(s3, source_bucket, snapshot_name='v1')
snapshot_version = get_snapshot_version(snapshot_result)

# Create a fork
forked_bucket = 'test-data'
create_fork(s3, forked_bucket, source_bucket, snapshot_version=snapshot_version)

# Inspect the fork
fork_info = get_bucket_info(s3, forked_bucket)
print(f"Forked from: {fork_info['fork_source_bucket']}")
print(f"Snapshot version: {fork_info['fork_source_snapshot']}")
```

### Example 5: Bundle API — Fetch Multiple Objects in One Request

```python
import tarfile
import boto3
from tigris_boto3_ext import bundle_objects, BundleError, BUNDLE_ON_ERROR_FAIL

s3 = boto3.client('s3')

# Fetch a batch of training images as a streaming tar archive
keys = [f"dataset/train/img_{i:05d}.jpg" for i in range(1000)]
response = bundle_objects(s3, 'my-dataset-bucket', keys)

with tarfile.open(fileobj=response, mode="r|") as tar:
    for member in tar:
        if member.name == "__bundle_errors.json":
            continue  # skip the error manifest
        f = tar.extractfile(member)
        if f is not None:
            image_bytes = f.read()
            # feed to training pipeline

# Use fail mode for inference where every object must be present
try:
    response = bundle_objects(
        s3, 'my-bucket', keys, on_error=BUNDLE_ON_ERROR_FAIL
    )
except BundleError as e:
    print(f"Bundle failed (HTTP {e.status_code}): {e.body}")
```

See [`examples/bundle_usage.py`](examples/bundle_usage.py) for more patterns including error handling, response metadata, and ML training batches.
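
The loop in Example 5 skips the error manifest. When you want to inspect it instead, something like the sketch below may help. It uses only the standard library. `read_bundle` is a hypothetical helper, not part of `tigris_boto3_ext`, and it assumes that `__bundle_errors.json` contains JSON, which the file name suggests but this README does not spell out. It also buffers every object in memory, so it suits small batches. For large bundles, process each member inside the loop instead.

```python
import json
import tarfile

ERROR_MANIFEST = "__bundle_errors.json"


def read_bundle(fileobj):
    """Read a streaming tar archive into (objects, errors).

    objects maps member names to their bytes; errors is the parsed
    error manifest, or None if the archive does not contain one.
    Hypothetical helper: assumes the manifest is a JSON document.
    """
    objects = {}
    errors = None
    # "r|" reads the archive as a forward-only stream, so each member
    # must be consumed before advancing to the next one.
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            f = tar.extractfile(member)
            if f is None:
                continue  # directories and other non-file entries
            data = f.read()
            if member.name == ERROR_MANIFEST:
                errors = json.loads(data)
            else:
                objects[member.name] = data
    return objects, errors
```

Any readable file-like object can be passed in, for example the `response` returned by `bundle_objects` in Example 5, which that example already hands to `tarfile.open`.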

## How It Works

This library uses boto3's event system to inject Tigris-specific headers into S3 API requests:

### Request Headers (Sent to Tigris)

- **`X-Tigris-Enable-Snapshot: true`** - Enables snapshot support for bucket creation
- **`X-Tigris-Snapshot: true; name=<name>`** - Creates a snapshot
- **`X-Tigris-Snapshot: <bucket_name>`** - Lists snapshots for a bucket
- **`X-Tigris-Snapshot-Version: <version>`** - Reads from specific snapshot version
- **`X-Tigris-Fork-Source-Bucket: <bucket>`** - Specifies fork source
- **`X-Tigris-Fork-Source-Bucket-Snapshot: <version>`** - Forks from specific snapshot
- **`X-Tigris-Rename: true`** - Turns a `CopyObject` request into an in-place rename

### Response Headers (Returned by Tigris)

The following custom headers are returned in HeadBucket responses and can be accessed via `get_bucket_info()` and `has_snapshot_enabled()`:

- **`X-Tigris-Enable-Snapshot: true`** - Present when snapshots are enabled for the bucket
- **`X-Tigris-Fork-Source-Bucket: <bucket_name>`** - Present on forked buckets, indicates the parent bucket
- **`X-Tigris-Fork-Source-Bucket-Snapshot: <version>`** - Present on forked buckets, indicates the snapshot version
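
For illustration, the response headers listed above could be read from a plain boto3 `head_bucket` response roughly as follows. This is a minimal sketch, not the library's implementation. `parse_bucket_info` is a hypothetical name, and the returned keys are chosen to mirror the `get_bucket_info()` fields used in Example 4. boto3 exposes raw response headers under `ResponseMetadata['HTTPHeaders']`.

```python
def parse_bucket_info(head_bucket_response):
    """Extract Tigris snapshot and fork fields from a HeadBucket response dict.

    Hypothetical helper for illustration; prefer get_bucket_info() in real code.
    """
    raw = head_bucket_response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    # HTTP header names are case-insensitive, so normalise before lookup.
    headers = {name.lower(): value for name, value in raw.items()}
    return {
        "snapshot_enabled": headers.get("x-tigris-enable-snapshot", "").lower() == "true",
        "fork_source_bucket": headers.get("x-tigris-fork-source-bucket"),
        "fork_source_snapshot": headers.get("x-tigris-fork-source-bucket-snapshot"),
    }
```

With a real client this would be called as `parse_bucket_info(s3_client.head_bucket(Bucket='my-bucket'))`.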

The library registers event handlers on `before-sign.s3.*` events to add request headers transparently.
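
The general mechanism can be sketched with boto3's public event API (`client.meta.events.register` and `unregister`). The code below is an assumption-laden illustration of the pattern, not the library's actual code. `make_header_handler` and `inject_headers` are hypothetical names, and the real context managers may differ in detail.

```python
from contextlib import contextmanager


def make_header_handler(headers):
    """Return an event handler that adds `headers` to the outgoing request."""
    def handler(request, **kwargs):
        # For before-sign events, `request` is the botocore AWSRequest
        # that is about to be signed and sent.
        for name, value in headers.items():
            request.headers[name] = value
    return handler


@contextmanager
def inject_headers(s3_client, event_name, headers):
    """Hypothetical helper: add headers to matching requests inside the block."""
    handler = make_header_handler(headers)
    s3_client.meta.events.register(event_name, handler)
    try:
        yield s3_client
    finally:
        # Always unregister so calls made after the block are unaffected.
        s3_client.meta.events.unregister(event_name, handler)


# Example (requires a configured boto3 S3 client):
# with inject_headers(s3_client, "before-sign.s3.CreateBucket",
#                     {"X-Tigris-Enable-Snapshot": "true"}):
#     s3_client.create_bucket(Bucket="my-snapshot-bucket")
```

Registering on a single operation such as `before-sign.s3.CreateBucket` rather than on every S3 event keeps the header from leaking into unrelated calls, which is the same reason the rename context should be kept tight.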

## Requirements

- Python 3.9+
- boto3 >= 1.26.0

## Development

### Setup

```bash
# Clone the repository
git clone https://github.com/tigrisdata/tigris-boto3-ext.git
cd tigris-boto3-ext

# Install with dev dependencies using uv
uv sync --all-extras

# Or with pip
pip install -e ".[dev]"
```

### Running Tests

#### Integration Tests

Integration tests run against a real Tigris S3 service. See [`tests/integration/README.md`](tests/integration/README.md) for detailed setup instructions.

```bash
# Set up environment variables
export AWS_ENDPOINT_URL_S3="https://t3.storage.dev"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"

# Run integration tests
uv run pytest tests/integration/ -v
```

### Code Quality

```bash
# Type checking
uv run mypy tigris_boto3_ext

# Linting
uv run ruff check tigris_boto3_ext

# Auto-fix linting issues
uv run ruff check --fix tigris_boto3_ext

# Code formatting
uv run ruff format tigris_boto3_ext

# Check formatting without making changes
uv run ruff format --check tigris_boto3_ext
```

## License

Apache-2.0

## Contributing

Contributions welcome! Please open an issue or PR on GitHub.

## Support

For issues and questions:

- GitHub Issues: <https://github.com/tigrisdata/tigris-boto3-ext/issues>
- Documentation: <https://www.tigrisdata.com/docs>
