Metadata-Version: 2.3
Name: llm-redact
Version: 0.1.1
Summary: Privacy-first text redaction using local LLM models with rule generation capabilities
License: MIT
Keywords: privacy,redaction,llm,pii,data-protection,sensitive-data,ai
Author: LLM Redact Contributors
Author-email: yuqil@lookr.fyi
Requires-Python: >=3.12.4,<4.0.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Classifier: Topic :: Text Processing
Requires-Dist: pydantic (>=2.0.0,<3.0.0)
Requires-Dist: pydantic-settings (>=2.0.0,<3.0.0)
Requires-Dist: pyyaml (>=6.0.0,<7.0.0)
Requires-Dist: requests (>=2.31.0,<3.0.0)
Requires-Dist: sqlalchemy (>=2.0.0,<3.0.0)
Project-URL: Documentation, https://github.com/lookr-fyi/llm-redact/blob/main/README.md
Project-URL: Homepage, https://github.com/lookr-fyi/llm-redact
Project-URL: Repository, https://github.com/lookr-fyi/llm-redact
Description-Content-Type: text/markdown

# LLM Redact

Privacy-first text redaction using local LLMs. Automatically detects and redacts sensitive information such as names, email addresses, phone numbers, and more.

## Features

- 🔒 **Privacy-first** - Uses local LLMs; no data is sent to external services
- 🚀 **Simple API** - One-liner redaction: `llm_redact.mask(text)`
- 💾 **Smart Caching** - SQLite database for caching and history
- 🔧 **Configurable** - Custom rules, models, and database connections
- 📊 **Tracking** - Full history and analytics of redaction operations

## Installation

```bash
pip install llm-redact
```

## Quick Start

```python
import llm_redact

# Simple redaction
result = llm_redact.mask("Hi, I'm John Doe from john@example.com")
print(result.redacted_text)
# Output: "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"

print(result.replacements)
# Output: [
#   Replacement(original_text="John Doe", replacement_text="|_NAME_A1B2C3D4_|"),
#   Replacement(original_text="john@example.com", replacement_text="|_EMAIL_E5F6G7H8_|")
# ]

# Note: Each placeholder contains a unique ID and can be stored in the database for restoration.
# A placeholder such as |_NAME_A1B2C3D4_| maps back to its original text via a database lookup.
```
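
Because each entry in `result.replacements` pairs a placeholder with the text it replaced, the original text can also be rebuilt in memory. The following is a minimal sketch that assumes only the two `Replacement` fields shown above. The `Replacement` dataclass is a stand-in so the snippet runs without the library, and `restore` is a hypothetical helper, not part of llm-redact:

```python
from dataclasses import dataclass


@dataclass
class Replacement:
    """Stand-in mirroring the two fields shown in the Quick Start output."""
    original_text: str
    replacement_text: str


def restore(redacted_text: str, replacements: list[Replacement]) -> str:
    """Hypothetical helper: swap each placeholder back to its original text."""
    restored = redacted_text
    for r in replacements:
        restored = restored.replace(r.replacement_text, r.original_text)
    return restored


replacements = [
    Replacement("John Doe", "|_NAME_A1B2C3D4_|"),
    Replacement("john@example.com", "|_EMAIL_E5F6G7H8_|"),
]
redacted = "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"
print(restore(redacted, replacements))
# Output: Hi, I'm John Doe from john@example.com
```

This approach only works while the `replacements` list is still available; restoring from a stored placeholder later would rely on the database lookup mentioned in the note above.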

## Configuration

### Environment Variables

```bash
# LLM Host (default: http://localhost:8000)
export LLM_REDACT_LLM_HOST_URL=http://localhost:8000

# Database (default: sqlite:///llm_redact.db)
export LLM_REDACT_DATABASE_URL=sqlite:///my_redact.db

# Model (default: gemma3:1b)
export LLM_REDACT_DEFAULT_MODEL=gemma3:1b

# Caching (default: True)
export LLM_REDACT_ENABLE_CACHING=true
```
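
To make the mapping between these variables and their documented defaults concrete, here is a minimal standard-library sketch. It is not how llm-redact loads its settings (the package depends on `pydantic-settings`, which presumably does this job). `load_settings`, the dictionary keys, and the set of strings treated as "true" are all assumptions made for illustration:

```python
import os

# Defaults as documented in the comments of the shell snippet above.
DEFAULTS = {
    "LLM_REDACT_LLM_HOST_URL": "http://localhost:8000",
    "LLM_REDACT_DATABASE_URL": "sqlite:///llm_redact.db",
    "LLM_REDACT_DEFAULT_MODEL": "gemma3:1b",
    "LLM_REDACT_ENABLE_CACHING": "true",
}

# Assumption: which strings count as "enabled" is a guess, not library behavior.
TRUTHY = {"1", "true", "yes", "on"}


def load_settings(environ=None) -> dict:
    """Hypothetical helper: read the variables, falling back to the defaults."""
    env = os.environ if environ is None else environ
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "llm_host_url": raw["LLM_REDACT_LLM_HOST_URL"],
        "database_url": raw["LLM_REDACT_DATABASE_URL"],
        "default_model": raw["LLM_REDACT_DEFAULT_MODEL"],
        "enable_caching": raw["LLM_REDACT_ENABLE_CACHING"].strip().lower() in TRUTHY,
    }


# Override two values; the other two fall back to their defaults.
settings = load_settings({
    "LLM_REDACT_DATABASE_URL": "sqlite:///my_redact.db",
    "LLM_REDACT_ENABLE_CACHING": "false",
})
print(settings["database_url"], settings["enable_caching"])
```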

### Custom Database and LLM Host

```python
import llm_redact

# Use PostgreSQL
llm_redact.configure_client(
    database_url="postgresql://user:pass@localhost/redact_db"
)

# Use custom LLM host
llm_redact.configure_client(
    llm_host_url="http://my-llm-server:8000"
)
```

## Advanced Usage

### Custom Rules

```python
import llm_redact
from llm_redact import RedactionRule

custom_rules = [
    RedactionRule(
        name="Replace SSN with [SSN]", 
        description="Social Security Numbers",
        data_type="SSN"
    ),
    RedactionRule(
        name="Replace addresses with [ADDRESS]", 
        description="Physical addresses",
        data_type="ADDRESS"
    )
]

result = llm_redact.mask(
    "My SSN is 123-45-6789 and I live at 123 Main St",
    rules=custom_rules
)
```

### Using the Client Directly

```python
from llm_redact import LLMRedactClient

client = LLMRedactClient(
    llm_host_url="http://localhost:8000",
    database_url="sqlite:///custom.db"
)

result = client.mask("Sensitive text here")

# Get history
history = client.get_history(limit=10)

# Create custom rules
rule = client.create_rule(
    name="Replace API keys with [API_KEY]",
    description="API keys and tokens"
)
```

## Prerequisites

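1. **LLM Host Server**: Start Ollama and run the llm-redact host server locally:
   ```bash
   # Start the Ollama server (Ollama must already be installed).
   # It runs in the foreground, so use a separate terminal for the next commands.
   ollama serve
   # Pull the default model
   ollama pull gemma3:1b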
   
   # Run llm-redact host server
   python -m llm_redact_host
   ```

2. **Database**: SQLite (default) or any SQLAlchemy-supported database

## Supported Redaction Types

- Personal names → `|_NAME_XXXX_|`
- Email addresses → `|_EMAIL_XXXX_|`
- Phone numbers → `|_PHONE_XXXX_|`
- Countries → `|_COUNTRY_XXXX_|`
- Universities → `|_UNIVERSITY_XXXX_|`
- Job titles → `|_JOB_TITLE_XXXX_|`
- Addresses → `|_ADDRESS_XXXX_|`
- Social Security Numbers → `|_SSN_XXXX_|`
- Credit card numbers → `|_CREDIT_CARD_XXXX_|`

Here, `XXXX` stands for a unique 8-character hash ID assigned to each piece of redacted data.
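
Because every placeholder follows the same `|_TYPE_ID_|` shape, redacted text can be scanned for placeholders with a regular expression. This is a minimal sketch, assuming the ID is exactly 8 uppercase letters or digits (as in the Quick Start examples) and that type names contain only uppercase letters and underscores. `PLACEHOLDER_RE` and `find_placeholders` are hypothetical names, not library API:

```python
import re

# Assumed format: |_<TYPE>_<8-char ID>_| with TYPE made of A-Z and underscores.
# The lazy TYPE group lets multi-word types such as JOB_TITLE match correctly.
PLACEHOLDER_RE = re.compile(r"\|_([A-Z][A-Z_]*?)_([A-Z0-9]{8})_\|")


def find_placeholders(text: str) -> list[tuple[str, str]]:
    """Return (data_type, id) pairs for every placeholder found in text."""
    return [(m.group(1), m.group(2)) for m in PLACEHOLDER_RE.finditer(text)]


text = "Contact |_NAME_A1B2C3D4_|, our |_JOB_TITLE_Z9Y8X7W6_|."
print(find_placeholders(text))
# Output: [('NAME', 'A1B2C3D4'), ('JOB_TITLE', 'Z9Y8X7W6')]
```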

## API Reference

### `llm_redact.mask(text, rules=None, model=None)`

Redact sensitive information from text.

**Parameters:**
- `text` (str): Text to redact
- `rules` (list, optional): Custom redaction rules
- `model` (str, optional): LLM model to use

**Returns:** `RedactionResult` object

### `RedactionResult`

- `original_text`: Original input text
- `redacted_text`: Text with sensitive data redacted
- `replacements`: List of replacements made
- `is_redacted`: Whether any redactions were made
- `processing_time_ms`: Processing time in milliseconds
- `cached`: Whether result was from cache

## License

MIT License 
