Metadata-Version: 2.4
Name: cognitive-twin
Version: 3.0.0
Summary: User pattern learning with trajectory-aware DPO training
Project-URL: Homepage, https://github.com/mohameddiomande/cognitive-twin
Project-URL: Documentation, https://github.com/mohameddiomande/cognitive-twin#readme
Project-URL: Repository, https://github.com/mohameddiomande/cognitive-twin
Author: Mohamed Diomande
License-Expression: MIT
Keywords: cognitive-twin,dpo,machine-learning,nlp,trajectory-learning,user-modeling
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.18.0
Requires-Dist: httpx>=0.24.0
Requires-Dist: openai>=1.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: tenacity>=8.0.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: all
Requires-Dist: accelerate>=0.24.0; extra == 'all'
Requires-Dist: bitsandbytes>=0.41.0; extra == 'all'
Requires-Dist: datasets>=2.14.0; extra == 'all'
Requires-Dist: mypy>=1.0.0; extra == 'all'
Requires-Dist: peft>=0.6.0; extra == 'all'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'all'
Requires-Dist: pytest>=7.0.0; extra == 'all'
Requires-Dist: rag-plusplus>=1.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: torch>=2.0.0; extra == 'all'
Requires-Dist: transformers>=4.35.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: ragpp
Requires-Dist: rag-plusplus>=1.0.0; extra == 'ragpp'
Provides-Extra: training
Requires-Dist: accelerate>=0.24.0; extra == 'training'
Requires-Dist: bitsandbytes>=0.41.0; extra == 'training'
Requires-Dist: datasets>=2.14.0; extra == 'training'
Requires-Dist: peft>=0.6.0; extra == 'training'
Requires-Dist: torch>=2.0.0; extra == 'training'
Requires-Dist: transformers>=4.35.0; extra == 'training'
Description-Content-Type: text/markdown

# CognitiveTwin

User pattern learning with trajectory-aware DPO (Direct Preference Optimization) training.

## Overview

CognitiveTwin learns user communication patterns through the following components:

- **Corpus Surgery**: Data cleaning, validation, and quality filtering
- **WORMS**: Trajectory generators for synthetic training data
  - Conversation Worm: Dialogue trajectory generation
  - Repo Worm: Code repository analysis
  - Task Worm: Task execution patterns
  - DPO Generator: Preference pair generation
- **Dataset Building**: Preference pair labeling and export
- **Evaluation Suite**: Comprehensive testing framework
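
To make "preference pair" concrete, here is a minimal sketch of what one record and its JSONL export could look like. The `PreferencePair` class, its field names, and `to_jsonl` are hypothetical illustrations that follow the common prompt/chosen/rejected convention for DPO data; they are not the types defined in `schema.py`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record: one prompt with a preferred and a dispreferred reply."""

    prompt: str
    chosen: str
    rejected: str

    def __post_init__(self):
        # A pair carries no preference signal if both replies are identical.
        if self.chosen == self.rejected:
            raise ValueError("chosen and rejected must differ")


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one object per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


if __name__ == "__main__":
    example = PreferencePair(
        prompt="Summarize the meeting notes.",
        chosen="Three decisions were made: ...",
        rejected="I cannot help with that.",
    )
    print(to_jsonl([example]))
```

One JSON object per line keeps the export streamable, which is why JSONL is a frequent choice for preference datasets.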

## Installation

```bash
pip install cognitive-twin

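# With training dependencies (quotes keep shells such as zsh from globbing the brackets)
pip install "cognitive-twin[training]"

# Other extras declared by the package: ragpp, dev, all
pip install "cognitive-twin[all]"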
```
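
If you are unsure whether the `training` extra is present in the current environment, a quick check along these lines may help. This is a sketch that uses only the standard library. The module list mirrors the dependencies declared for the `training` extra, and `missing_modules` is a hypothetical helper, not part of the package.

```python
from importlib.util import find_spec

# Import names of the dependencies declared for the `training` extra.
TRAINING_MODULES = [
    "accelerate",
    "bitsandbytes",
    "datasets",
    "peft",
    "torch",
    "transformers",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(TRAINING_MODULES)
    if missing:
        print("Missing training dependencies:", ", ".join(missing))
    else:
        print("All training dependencies are importable.")
```

`find_spec` only locates a module and does not execute it, so the check stays fast even for heavy libraries such as `torch`.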

## Quick Start

```python
from cognitive_twin.v3 import pipeline
from cognitive_twin.framework import config

# Build the pipeline configuration
cfg = config.CognitiveTwinConfig(
    model_name="your-base-model",
    output_dir="./output"
)

# Run corpus surgery
pipeline.run_corpus_surgery(cfg)

# Generate training data
pipeline.generate_dpo_pairs(cfg)

# Train
pipeline.train(cfg)
```

## Components

### v3/ - Main Implementation

- `corpus_surgery/` - Data cleaning and validation
- `dataset/` - Dataset generation and labeling
- `eval/` - Evaluation framework
- `generators/` - Batch and DPO generators
- `ingest/` - Data ingestion (Claude, OpenAI, Supabase)
- `worms/` - Trajectory generators
- `pipeline.py` - Main orchestrator
- `schema.py` - Type definitions

### framework/ - Supporting Infrastructure

- `config.py` - Configuration management
- `twin.py` - Core twin abstraction
- `trainer.py` - Training loop
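
For readers new to DPO, the per-pair objective that a DPO training loop minimizes can be sketched in a few lines. This is the standard DPO loss on sequence log-probabilities, written in plain Python for illustration. It is not code from `trainer.py`, the function and parameter names are hypothetical, and any trajectory-aware adjustments the project applies are not shown.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # With no movement from the reference, the loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Raising the chosen response relative to the rejected one lowers the loss.
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The `beta` parameter controls how strongly the policy is held close to the reference model; `0.1` is used here only as an illustrative default.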

## Documentation

See the `docs/` directory for detailed documentation:

- `00_OVERVIEW.md` - System overview
- `01_CORPUS_SURGERY.md` - Data cleaning pipeline
- `02_REPO_WORM.md` - Repository analysis
- `03_CONVERSATION_WORM.md` - Dialogue generation
- `04_ENHANCER_AGENT.md` - Quality enhancement
- `05_DATASET_BUILDER.md` - Dataset construction
- `06_TRAINING_PIPELINE.md` - Training guide
- `07_EVALUATION_SUITE.md` - Evaluation metrics
- `08_API_INTEGRATION.md` - API usage

## License

Released under the MIT License.
