Metadata-Version: 2.4
Name: chatpack
Version: 0.1.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
License-File: LICENSE
Summary: High-performance Python bindings for chatpack - parse chat exports from Telegram, WhatsApp, Instagram, and Discord
Keywords: chat,parser,telegram,whatsapp,instagram,discord,nlp,llm
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://docs.rs/chatpack
Project-URL: Homepage, https://github.com/berektassuly/chatpack
Project-URL: Repository, https://github.com/berektassuly/chatpack-py

# chatpack-py 🚀

High-performance Python bindings for [chatpack](https://docs.rs/chatpack), a Rust library for parsing chat exports from Telegram, WhatsApp, Instagram, and Discord.

## Features

- ⚡ **Blazing Fast**: Rust implementation for maximum performance
- 🔄 **Multiple Platforms**: Telegram, WhatsApp, Instagram, Discord
- 💾 **Memory Efficient**: Streaming API for large files
- 🐍 **Pythonic API**: Easy to use, well-documented
- 🎯 **Type Hints**: Full IDE support with `.pyi` stubs
- 🔧 **Flexible**: Filter, merge, and transform messages

## Installation

```bash
pip install chatpack
```

Or build from source (requires a Rust toolchain):

```bash
pip install maturin
maturin develop --release
```

## Quick Start

### Simple Parsing

```python
import chatpack

# Parse Telegram export
messages = chatpack.parse_telegram("result.json", merge=True, min_length=5)

# Parse WhatsApp export
messages = chatpack.parse_whatsapp("chat.txt", merge=True)

# Parse Instagram export
messages = chatpack.parse_instagram("messages.json")

# Parse Discord export
messages = chatpack.parse_discord("export.json")
```

### Object-Oriented API

```python
# Create parser instance
parser = chatpack.TelegramParser()

# Parse with filters
messages = parser.parse(
    "result.json",
    merge=True,
    min_length=10,
    date_from="2024-01-01",
    date_to="2024-12-31"
)

# Access message properties
for msg in messages:
    print(f"{msg.sender}: {msg.content}")
    print(f"Timestamp: {msg.timestamp}")
```

### Streaming Large Files

For files that don't fit in memory:

```python
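def process_message(msg):
    # Placeholder handler: replace with your own per-message logic
    print(f"{msg.sender}: {msg.content}")

# Stream messages one at a time instead of loading the whole file
parser = chatpack.TelegramStreamParser("huge_export.json")

for msg in parser:
    process_message(msg)  # O(1) memory usage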
```
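
Since the stream parser is consumed as a plain iterator, it can also be processed in fixed-size batches, which is handy for bulk inserts or batched model calls. The following is a minimal sketch using only the standard library; `batched` is a hypothetical helper, not part of chatpack, and a generator stands in for `chatpack.TelegramStreamParser(...)` so the snippet runs on its own.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the whole stream."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Stand-in for chatpack.TelegramStreamParser("huge_export.json"); any iterable works.
fake_stream = (f"message {i}" for i in range(5))
batches = list(batched(fake_stream, 2))
```

Only one batch is held in memory at a time, so peak usage scales with `size` rather than with the size of the export.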

### Integration with Pandas

```python
import chatpack
import pandas as pd

# Parse messages
messages = chatpack.parse_telegram("result.json", merge=True)

# Convert to DataFrame
df = pd.DataFrame([m.to_dict() for m in messages])

# Analyze
print(df.groupby('sender')['content'].count())
```
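
If pandas is not available, the same per-sender count can be sketched with the standard library. The rows below are hand-written stand-ins for `m.to_dict()` output; only the `sender` and `content` keys, which the pandas example relies on, are assumed.

```python
from collections import Counter

# Hand-written stand-ins for [m.to_dict() for m in messages]
rows = [
    {"sender": "Alice", "content": "Hi"},
    {"sender": "Bob", "content": "Hello"},
    {"sender": "Alice", "content": "How are you?"},
]

# Count messages per sender, then pick the most active one
per_sender = Counter(row["sender"] for row in rows)
top = per_sender.most_common(1)
```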

### Filtering Messages

```python
# Create filter configuration
config = chatpack.FilterConfig(
    min_length=10,
    max_length=1000,
    sender="Alice",
    date_from="2024-01-01",
    date_to="2024-12-31"
)

# Apply filters
filtered = chatpack.apply_filters(messages, config)
```
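
To make the filter semantics concrete, here is a pure-Python sketch of what such a configuration might mean for a single message. It is not chatpack's implementation: `FilterSketch` and `keep` are hypothetical names, messages are plain dicts, and the sketch assumes lengths are counted in characters and date bounds are inclusive. Check the library's behavior before relying on those details.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Optional

@dataclass
class FilterSketch:
    # Mirrors the FilterConfig fields shown above; None means "no constraint"
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    sender: Optional[str] = None
    date_from: Optional[str] = None  # "YYYY-MM-DD"
    date_to: Optional[str] = None    # "YYYY-MM-DD"

def keep(msg: Dict[str, str], cfg: FilterSketch) -> bool:
    """Return True if the message passes every configured constraint."""
    text = msg["content"]
    if cfg.min_length is not None and len(text) < cfg.min_length:
        return False
    if cfg.max_length is not None and len(text) > cfg.max_length:
        return False
    if cfg.sender is not None and msg["sender"] != cfg.sender:
        return False
    # fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it
    day = datetime.fromisoformat(msg["timestamp"].replace("Z", "+00:00")).date()
    if cfg.date_from is not None and day < date.fromisoformat(cfg.date_from):
        return False
    if cfg.date_to is not None and day > date.fromisoformat(cfg.date_to):
        return False
    return True

sample = [
    {"sender": "Alice", "content": "Hello there, Bob!", "timestamp": "2024-03-01T09:00:00Z"},
    {"sender": "Alice", "content": "ok", "timestamp": "2024-03-02T09:00:00Z"},
    {"sender": "Bob", "content": "A long enough reply", "timestamp": "2024-03-03T09:00:00Z"},
    {"sender": "Alice", "content": "Happy new year, everyone", "timestamp": "2025-01-01T00:05:00Z"},
]
cfg = FilterSketch(min_length=10, sender="Alice", date_from="2024-01-01", date_to="2024-12-31")
kept = [m for m in sample if keep(m, cfg)]
```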

### Merging Consecutive Messages

```python
# Merge messages from same sender within 5 minutes
merged = chatpack.merge_consecutive(messages, time_threshold=300)
```
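
Conceptually, merging folds a run of messages from one sender into a single message when each arrives within the threshold of the previous one. The sketch below illustrates that idea on plain dicts. It is an assumption-laden illustration, not the library's algorithm: `merge_consecutive_sketch` is a hypothetical name, the gap is measured from the previous message in the run, and merged text is joined with a newline.

```python
from datetime import datetime
from typing import Dict, List

def parse_ts(value: str) -> datetime:
    # fromisoformat rejects a trailing "Z" before Python 3.11, so normalize it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def merge_consecutive_sketch(
    messages: List[Dict[str, str]], time_threshold: int = 300
) -> List[Dict[str, str]]:
    merged: List[Dict[str, str]] = []
    last_seen = None  # timestamp of the latest message folded into merged[-1]
    for msg in messages:
        ts = parse_ts(msg["timestamp"])
        same_run = (
            bool(merged)
            and merged[-1]["sender"] == msg["sender"]
            and (ts - last_seen).total_seconds() <= time_threshold
        )
        if same_run:
            merged[-1]["content"] += "\n" + msg["content"]
        else:
            merged.append(dict(msg))  # copy so the input list is left untouched
        last_seen = ts
    return merged

chat = [
    {"sender": "Alice", "content": "Hi", "timestamp": "2024-01-15T10:30:00Z"},
    {"sender": "Alice", "content": "Are you there?", "timestamp": "2024-01-15T10:31:00Z"},
    {"sender": "Bob", "content": "Yes", "timestamp": "2024-01-15T10:32:00Z"},
    {"sender": "Alice", "content": "Great", "timestamp": "2024-01-15T10:50:00Z"},
]
merged = merge_consecutive_sketch(chat, time_threshold=300)
```

Here the first two messages from Alice collapse into one, while her later message stays separate because Bob's reply interrupts the run.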

## API Reference

### Parsers

#### Eager Loading

- `parse_telegram(path, merge=False, min_length=None, date_from=None, date_to=None)`
- `parse_whatsapp(path, merge=False, min_length=None, date_from=None, date_to=None)`
- `parse_instagram(path, merge=False, min_length=None, date_from=None, date_to=None)`
- `parse_discord(path, merge=False, min_length=None, date_from=None, date_to=None)`

#### Streaming (for large files)

- `TelegramStreamParser(path)` - Returns iterator
- `WhatsAppStreamParser(path)` - Returns iterator
- `InstagramStreamParser(path)` - Returns iterator
- `DiscordStreamParser(path)` - Returns iterator

### Classes

#### `Message`

```python
msg = chatpack.Message(
    sender="Alice",
    content="Hello, world!",
    timestamp="2024-01-15T10:30:00Z",
    platform="telegram"
)

# Properties
msg.sender      # str
msg.content     # str
msg.timestamp   # Optional[str] (ISO 8601)
msg.platform    # Optional[str]

# Methods
msg.to_dict()   # Convert to dictionary
```

#### `FilterConfig`

```python
config = chatpack.FilterConfig(
    min_length=5,
    max_length=1000,
    sender="Alice",
    date_from="2024-01-01",
    date_to="2024-12-31"
)

# Builder pattern
config.with_min_length(10)
config.with_sender("Bob")
```

#### `OutputConfig`

```python
config = chatpack.OutputConfig(
    include_timestamps=True,
    include_platform=True
)
```

### Utility Functions

- `merge_consecutive(messages, time_threshold=300)` - Merge consecutive messages from the same sender sent within `time_threshold` seconds
- `apply_filters(messages, config)` - Apply filter configuration

## Platform Support

| Platform | Format | Special Features |
|----------|--------|------------------|
| **Telegram** | JSON | Service messages, forwarded messages |
| **WhatsApp** | TXT | Auto-detects 4 locale date formats |
| **Instagram** | JSON | Fixes mojibake (mis-encoded UTF-8 text in Meta exports) |
| **Discord** | JSON/CSV/TXT | Attachments, stickers, replies |
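
For readers unfamiliar with the Instagram mojibake issue: Meta's JSON exports are known to store UTF-8 bytes as if each byte were a Latin-1 character, so non-ASCII text comes out garbled. The sketch below shows the usual repair in plain Python. `fix_mojibake` is a hypothetical helper for illustration; the Rust library's actual repair logic may differ.

```python
def fix_mojibake(text: str) -> str:
    """Reinterpret Latin-1-decoded UTF-8 bytes as UTF-8, if that is possible."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text is already correct, or cannot be repaired this way
        return text

broken = "caf\u00c3\u00a9"        # how "café" looks after the bad decoding
repaired = fix_mojibake(broken)
untouched = fix_mojibake("café")  # already-correct text passes through unchanged
```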

## Performance

chatpack-py leverages Rust for parsing, making it significantly faster than pure Python implementations:

- **10-100x faster** than regex-based parsers
- **Memory efficient** streaming for multi-GB files
- **Zero-copy** where possible with PyO3
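
Speedups depend heavily on the export and the machine, so it is worth measuring on your own data. Below is a minimal timing sketch using only the standard library; `timed` is a hypothetical helper, and the workload is a stand-in that you would replace with a call such as `lambda: chatpack.parse_telegram("result.json")`.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

def timed(fn: Callable[[], T], repeats: int = 3) -> Tuple[T, float]:
    """Run fn `repeats` times; return its last result and the best wall-clock time in seconds."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return result, best

# Stand-in workload; swap in the parser call you want to measure
result, seconds = timed(lambda: sum(range(100_000)))
print(f"best of 3: {seconds:.6f}s")
```

Taking the best of several runs reduces noise from caches and other processes. Run the same harness against the parser you are comparing with to get a like-for-like number.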

## Development

### Setup

```bash
# Clone repository
git clone https://github.com/berektassuly/chatpack-py
cd chatpack-py

# Install development dependencies
pip install maturin pytest

# Build in development mode
maturin develop

# Run tests
pytest
```

### Project Structure

```
chatpack-py/
├── Cargo.toml          # Rust dependencies
├── pyproject.toml      # Python package metadata
├── src/
│   ├── lib.rs          # PyO3 module entry point
│   ├── types.rs        # Python type wrappers
│   ├── parsers.rs      # Parser implementations
│   ├── streaming.rs    # Streaming iterators
│   └── conversion.rs   # Rust ↔ Python conversion
├── python/
│   └── chatpack/
│       ├── __init__.py
│       └── chatpack.pyi  # Type stubs
└── tests/
    ├── test_basic.py
    └── test_parsers.py
```

### Building Wheels

```bash
# Build for current platform
maturin build --release

# Build a Linux wheel targeting the manylinux2014 compatibility tag
# (typically run inside the manylinux Docker image)
maturin build --release --manylinux 2014
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## License

MIT License - see [LICENSE](LICENSE) for details.

## Credits

Built on top of the excellent [chatpack](https://github.com/berektassuly/chatpack) Rust library by [Berektassuly](https://github.com/berektassuly).

## Links

- [Documentation](https://docs.rs/chatpack)
- [Rust chatpack](https://crates.io/crates/chatpack)
- [Issue Tracker](https://github.com/berektassuly/chatpack-py/issues)
