Metadata-Version: 2.4
Name: encypher-ai
Version: 1.1.0
Summary: Metadata encoding and extraction for AI-generated content
Author: EncypherAI Team
License-Expression: AGPL-3.0
Project-URL: Homepage, https://github.com/encypherai/encypher-ai
Project-URL: Bug Tracker, https://github.com/encypherai/encypher-ai/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: litellm>=1.30.3
Requires-Dist: rich
Requires-Dist: pyyaml
Provides-Extra: dev
Requires-Dist: black>=24.3.0; extra == "dev"
Requires-Dist: black[jupyter]>=24.3.0; extra == "dev"
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: ruff>=0.0.270; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: pre-commit>=3.3.3; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="docs/assets/horizontal-logo.png" alt="EncypherAI Logo" width="600">
</p>

# EncypherAI Core

[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![Documentation](https://img.shields.io/badge/docs-docs.encypherai.com-blue)](https://docs.encypherai.com)

A Python package that embeds metadata in text, and extracts it again, using Unicode variation selectors, without affecting readability.

## Overview

EncypherAI Core provides tools for invisibly encoding metadata (such as model information, timestamps, and custom data) into text generated by AI models. This enables:

- **Provenance tracking**: Identify which AI model generated a piece of text
- **Timestamp verification**: Know when text was generated
- **Custom metadata**: Embed any additional information you need
- **Streaming support**: Works with both streaming and non-streaming LLM outputs

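The encoding uses Unicode variation selectors, code points designed to select an alternative glyph form of the character that precedes them. Renderers generally ignore a selector that does not form a defined variation sequence, so the embedded data does not change how the text looks or reads.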
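To make the idea concrete, here is a minimal sketch of how arbitrary bytes can be mapped onto the 256 variation selectors (U+FE00 to U+FE0F and U+E0100 to U+E01EF) and attached to a visible character. The helper names `byte_to_vs`, `vs_to_byte`, `hide`, and `reveal` are hypothetical and are not part of the EncypherAI API; the library's actual encoding scheme may differ.

```python
import json


def byte_to_vs(b: int) -> str:
    """Map a byte (0-255) to one of the 256 Unicode variation selectors."""
    if not 0 <= b <= 255:
        raise ValueError("byte out of range")
    # VS1-VS16 live at U+FE00..U+FE0F, VS17-VS256 at U+E0100..U+E01EF.
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def vs_to_byte(ch: str):
    """Inverse of byte_to_vs; returns None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide(carrier: str, payload: dict) -> str:
    """Insert the JSON payload, as variation selectors, after the first character."""
    data = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    hidden = "".join(byte_to_vs(b) for b in data)
    return carrier[:1] + hidden + carrier[1:]


def reveal(text: str):
    """Return (payload, clean_text) recovered from text produced by hide()."""
    raw = bytes(b for b in map(vs_to_byte, text) if b is not None)
    clean = "".join(ch for ch in text if vs_to_byte(ch) is None)
    return (json.loads(raw.decode("utf-8")) if raw else None), clean


stamped = hide("Hello, world.", {"model_id": "gpt-4"})
payload, clean = reveal(stamped)
```

Note that this sketch treats every variation selector in the input as payload, so it would misread text that already contains legitimate selectors (for example, emoji presentation sequences).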

## Demo Video

[![EncypherAI Demo Video](https://img.youtube.com/vi/amE_utPpEy0/0.jpg)](https://www.youtube.com/watch?v=amE_utPpEy0)

Watch the demo video to see how EncypherAI embeds and verifies metadata in AI-generated content.

## Installation

```bash
uv pip install encypher-ai
```

## Quick Start

### Basic Encoding and Decoding

```python
from encypher.core.unicode_metadata import UnicodeMetadata
import time

# Encode metadata into text
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    target="whitespace",  # Embed in whitespace characters
    hmac_secret_key="your-secret-key"  # Optional: Only needed for HMAC verification
)

# Extract metadata from text
metadata = UnicodeMetadata.extract_metadata(encoded_text)

# If you need to verify the integrity of the metadata with HMAC
from encypher.core.metadata_encoder import MetadataEncoder
encoder = MetadataEncoder(secret_key="your-secret-key")  # Same keyword as in the MetadataEncoder example below
metadata_dict, is_verified = encoder.extract_verified_metadata(encoded_text)
print(f"Metadata verified: {is_verified}")
```

### Using MetadataEncoder (Alternative Method)

```python
from encypher.core.metadata_encoder import MetadataEncoder
import time

# Initialize encoder with optional HMAC secret key
encoder = MetadataEncoder(secret_key="your-secret-key")

# Encode metadata
metadata = {
    "model_id": "gpt-4",
    "timestamp": int(time.time()),  # Current Unix timestamp
    "custom_field": "custom value"
}
encoded_text = encoder.encode_metadata(
    text="This is a sample text generated by an AI model.",
    metadata=metadata
)

# Decode and verify metadata
is_valid, extracted_metadata, clean_text = encoder.verify_text(encoded_text)
if is_valid:
    print(f"Model: {extracted_metadata.get('model_id')}")
    print(f"Timestamp: {extracted_metadata.get('timestamp')}")
    print(f"Custom field: {extracted_metadata.get('custom_field')}")
```

### Streaming Support

```python
from encypher.streaming.handlers import StreamingHandler

# Initialize streaming handler
handler = StreamingHandler(
    metadata={
        "model_id": "gpt-4",
        "custom_field": "custom value"
    },
    target="whitespace",
    encode_first_chunk_only=True  # Only encode the first non-empty chunk
)

# Process streaming chunks
chunks = [
    "This is ",
    "a sample ",
    "text generated ",
    "by an AI model."
]

for chunk in chunks:
    processed_chunk = handler.process_chunk(chunk)
    print(processed_chunk)  # Use in your streaming response
```
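The `encode_first_chunk_only` behaviour can be pictured with the following sketch, which passes only the first non-empty chunk through an embedding function and forwards every other chunk untouched. The `stamp_first_chunk` helper and the placeholder `embed` callable are hypothetical; the real `StreamingHandler` may buffer chunks or choose the embedding position differently.

```python
from typing import Callable, Iterable, Iterator


def stamp_first_chunk(
    chunks: Iterable[str], embed: Callable[[str], str]
) -> Iterator[str]:
    """Yield chunks unchanged, except the first non-empty one, which goes through embed."""
    stamped = False
    for chunk in chunks:
        if not stamped and chunk.strip():
            stamped = True
            yield embed(chunk)
        else:
            yield chunk


# Placeholder embed function: appends a single variation selector (U+FE00).
def mark(chunk: str) -> str:
    return chunk + "\ufe00"


result = list(stamp_first_chunk(["", "This is ", "a sample "], mark))
```

Because the function is a generator, it can wrap a streaming response without collecting the whole output first.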

### Configuration

```python
from encypher.config.settings import Settings

# Load settings from environment variables and/or config file
settings = Settings(
    config_file="config.json",  # Optional
    env_prefix="ENCYPHER_"  # Environment variable prefix
)

# Get configuration values
metadata_target = settings.get_metadata_target()
hmac_secret_key = settings.get_hmac_secret_key()
encode_first_chunk_only = settings.get_encode_first_chunk_only()
```

### Including Custom Metadata

```python
from encypher.core.unicode_metadata import UnicodeMetadata
import time

# Include custom metadata along with required fields
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    custom_metadata={
        "user_id": "user123",
        "session_id": "abc456",
        "context": {
            "source": "knowledge_base",
            "reference_id": "doc789"
        }
    }
)

# Later extract and use all metadata
is_valid, metadata = UnicodeMetadata.extract_metadata(encoded_text)
if is_valid:
    model = metadata["model_id"]  # "gpt-4"
    timestamp = metadata["timestamp"]  # Timestamp

    # Access custom metadata
    if "custom" in metadata:
        user_id = metadata["custom"]["user_id"]  # "user123"
        context = metadata["custom"]["context"]  # Nested object
```

## Features

- **Invisible Embedding**: Metadata is embedded using Unicode variation selectors that don't affect text appearance
- **Flexible Targets**: Choose where to embed metadata (whitespace, punctuation, etc.)
- **Streaming Support**: Works with both streaming and non-streaming LLM outputs
- **HMAC Verification**: Optionally verify the integrity of embedded metadata
- **Customizable**: Embed any JSON-serializable data
- **LLM Integration**: Ready-to-use integrations with popular LLM providers

## Metadata Target Options

You can specify where to embed metadata using the `target` parameter:

- `whitespace`: Embed in whitespace characters (default, least noticeable)
- `punctuation`: Embed in punctuation marks
- `first_letter`: Embed in the first letter of each word
- `last_letter`: Embed in the last letter of each word
- `all_characters`: Embed in all characters (not recommended)
- `none`: Don't embed metadata (for testing/debugging)
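As a rough illustration of what each target means, the sketch below returns the character positions that would be eligible to carry metadata for a given `target` value. The `target_indices` function is hypothetical, and its definitions of "word" and "punctuation" (alphanumeric runs and ASCII punctuation) are assumptions; the library may select positions differently.

```python
import string


def target_indices(text: str, target: str = "whitespace") -> list[int]:
    """Return indices of characters eligible to carry metadata for the given target."""
    if target == "none":
        return []
    if target == "all_characters":
        return list(range(len(text)))
    if target == "whitespace":
        return [i for i, ch in enumerate(text) if ch.isspace()]
    if target == "punctuation":
        return [i for i, ch in enumerate(text) if ch in string.punctuation]
    if target in ("first_letter", "last_letter"):
        indices = []
        for i, ch in enumerate(text):
            if not ch.isalnum():
                continue
            # A word is assumed to be a maximal run of alphanumeric characters.
            prev_is_word = i > 0 and text[i - 1].isalnum()
            next_is_word = i + 1 < len(text) and text[i + 1].isalnum()
            if target == "first_letter" and not prev_is_word:
                indices.append(i)
            if target == "last_letter" and not next_is_word:
                indices.append(i)
        return indices
    raise ValueError(f"unknown target: {target!r}")


sample = "Hi, you there."
whitespace_positions = target_indices(sample, "whitespace")
first_letters = target_indices(sample, "first_letter")
```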

## Security Features

### HMAC Authentication

EncypherAI uses HMAC (Hash-based Message Authentication Code) to ensure the security and integrity of embedded metadata:

- **Tamper Detection**: Cryptographically verifies that metadata hasn't been modified
- **Authentication**: Confirms metadata was created by an authorized source
- **Integrity Protection**: Ensures the relationship between content and metadata remains intact

```python
# Example of verifying metadata with HMAC
from encypher.core.unicode_metadata import UnicodeMetadata

encoder = UnicodeMetadata()  # Uses secret key from environment variable
encoded_text = "AI-generated text with embedded metadata..."

# Returns (is_valid, metadata)
is_valid, metadata = encoder.extract_metadata(encoded_text)

if is_valid:
    print(f"Verified metadata: {metadata}")
else:
    print("Warning: Metadata has been tampered with!")
```

For production use, set your HMAC secret key via the `ENCYPHER_SECRET_KEY` environment variable or pass it directly to the constructor.
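For readers unfamiliar with HMAC, the following sketch shows the general tamper-detection pattern using only the Python standard library. The `sign` and `verify` helpers, the choice of SHA-256, and the canonical JSON serialization are assumptions made for illustration; they are not taken from EncypherAI's implementation.

```python
import hashlib
import hmac
import json


def sign(metadata: dict, secret_key: str) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of metadata."""
    message = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify(metadata: dict, tag: str, secret_key: str) -> bool:
    """Recompute the tag and compare it to the supplied one in constant time."""
    return hmac.compare_digest(sign(metadata, secret_key), tag)


metadata = {"model_id": "gpt-4", "timestamp": 1700000000}
tag = sign(metadata, "your-secret-key")

untouched_ok = verify(metadata, tag, "your-secret-key")
tampered_ok = verify({**metadata, "model_id": "other"}, tag, "your-secret-key")
```

Here `untouched_ok` is `True` and `tampered_ok` is `False`: any change to the metadata, or the use of a different key, produces a different tag.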

## FastAPI Integration

See `examples/fastapi_example.py` for a complete example of integrating EncypherAI with FastAPI, including:

- Encoding endpoint
- Decoding endpoint
- Streaming support

## CLI Usage

The package includes a command-line interface, shipped as an example module, for encoding and decoding metadata:

```bash
# Encode metadata into text
python -m encypher.examples.cli_example encode --text "This is a test" --model-id "gpt-4" --target "whitespace"

# Encode with custom metadata
python -m encypher.examples.cli_example encode --input-file input.txt --output-file output.txt --model-id "gpt-4" --custom-metadata '{"source": "test", "user_id": 123}'

# Decode metadata from text
python -m encypher.examples.cli_example decode --input-file encoded.txt --show-clean

# Decode with debug information
python -m encypher.examples.cli_example decode --text "Your encoded text here" --debug
```

## Development and Contributing

We welcome contributions to EncypherAI! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

### Code Style

EncypherAI follows PEP 8 style guidelines with Black as our code formatter. All code must pass Black formatting checks before being merged. We use pre-commit hooks to automate code formatting and quality checks.

To set up the development environment:

```bash
# Clone the repository
git clone https://github.com/encypherai/encypher-ai.git
cd encypher-ai

# Install development dependencies
uv pip install -e ".[dev]"

# Set up pre-commit hooks
pre-commit install
```

The pre-commit hooks will automatically:
- Format your code with Black (including Jupyter notebooks)
- Sort imports with isort
- Check for common issues with flake8 and ruff
- Perform type checking with mypy

You can also run the formatting tools manually:

```bash
# Format all Python files
black encypher

# Format Python files including Jupyter notebooks
black --jupyter encypher
```

### Running Tests

```bash
# Run all tests
pytest

# Run tests with coverage
pytest --cov=encypher
```

## License

EncypherAI is provided under a dual licensing model:

### Open Source License (AGPL-3.0)

The core EncypherAI package is released under the [GNU Affero General Public License v3.0 (AGPL-3.0)](https://www.gnu.org/licenses/agpl-3.0.en.html). This license allows you to use, modify, and distribute the software freely, provided that:

- You disclose the source code when you distribute the software
- Any modifications you make are also licensed under AGPL-3.0
- If you run a modified version of the software as a service (e.g., over a network), you must make the complete source code available to users of that service

### Commercial License

For organizations that wish to incorporate EncypherAI into proprietary applications without the source code disclosure requirements of AGPL-3.0, we offer a commercial licensing option.

Benefits of the commercial license include:

- **Proprietary Integration**: Use EncypherAI in closed-source applications without AGPL obligations
- **Legal Certainty**: Clear licensing terms for commercial use
- **Support & Indemnification**: Access to professional support and IP indemnification

For commercial licensing inquiries, please contact [enterprise@encypherai.com](mailto:enterprise@encypherai.com).

See the [LICENSE](LICENSE.md) file for details of the AGPL-3.0 license.

## Acknowledgments

- Thanks to all contributors who have helped shape this project
- Special thanks to the open-source community for their invaluable tools and libraries

## Contact

For questions, feedback, or support, please [open an issue](https://github.com/encypherai/encypher-ai/issues) on our GitHub repository.
