Metadata-Version: 2.4
Name: context-engineer
Version: 0.2.0
Summary: Multi-Task Attention-based Transformer for Sequential API Recommendation
Project-URL: Homepage, https://github.com/yiqiao-yin/context-engineer-repo
Project-URL: Repository, https://github.com/yiqiao-yin/context-engineer-repo
Project-URL: Issues, https://github.com/yiqiao-yin/context-engineer-repo/issues
Author-email: Yiqiao Yin <yy2502@columbia.edu>
License-Expression: MIT
License-File: LICENSE
Keywords: api-recommendation,attention,context-engineering,deep-learning,markov-chain,multi-task-learning,sequential-recommendation,transformer
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: numpy>=1.21.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: torch>=2.0.0
Provides-Extra: dev
Requires-Dist: black>=24.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Provides-Extra: viz
Requires-Dist: matplotlib>=3.5.0; extra == 'viz'
Requires-Dist: seaborn>=0.12.0; extra == 'viz'
Description-Content-Type: text/markdown

# Context Engineer

[![PyPI version](https://badge.fury.io/py/context-engineer.svg)](https://pypi.org/project/context-engineer/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

A reproducible research package for the paper:

> **Rethink Context Engineering Using an Attention-based Architecture**
> Yiqiao Yin — University of Chicago Booth School of Business / Columbia University

This package implements a **multi-task attention-based transformer** for sequential API recommendation. It simultaneously predicts the next API action, session goal, and session boundary from user interaction sequences modeled as Markov chains.

## Key Results (from the paper)

| Metric | Value |
|---|---|
| API Prediction Accuracy (Top-1) | **79.83%** |
| Top-5 Hit Rate | **99.97%** |
| Top-10 Hit Rate | **100.00%** |
| Goal Prediction Accuracy | **81.6%** |
| Session End Accuracy | **99.3%** |
| Improvement over Markov baseline | **+432%** |

## Installation

```bash
pip install context-engineer
```

For visualization support:

```bash
pip install "context-engineer[viz]"
```

## Quick Start

### Reproduce the Full Experiment

```python
from context_engineer import run_pipeline

# Run with paper defaults: 2000 users, 100 APIs, 60 epochs
results = run_pipeline()

# Access results
model = results["model"]        # Trained PyTorch model
metrics = results["metrics"]    # Evaluation metrics dict
history = results["training_history"]
```

### Generate Simulation Data Only

```python
from context_engineer import simulate_multitask_markov_data

# Generate user session logs with 4 persona types
sequences, goals = simulate_multitask_markov_data(
    num_users=500,
    num_apis=100,
    clicks_per_user=10,
)
# sequences: list of API call sequences per user
# goals: list of session goal labels (0-3)
```
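
As a rough illustration of how sequences like these could be turned into supervised examples for the three tasks, the sketch below walks each session and emits one `(context, next_api, goal, is_last_step)` tuple per predicted position. The helper name `make_pairs`, the `max_context` parameter, and the tuple layout are assumptions made for this example; the package's own `create_multitask_training_pairs` may be organized differently.

```python
from typing import List, Tuple

# (context, next_api, goal, is_last_step) -- layout assumed for illustration
Pair = Tuple[List[int], int, int, int]


def make_pairs(
    sequences: List[List[int]],
    goals: List[int],
    max_context: int = 5,
) -> List[Pair]:
    """Hypothetical helper: build one training example per predicted position."""
    pairs: List[Pair] = []
    for seq, goal in zip(sequences, goals):
        for t in range(1, len(seq)):
            # Keep only the most recent `max_context` calls as model input.
            context = seq[max(0, t - max_context):t]
            # Flag is 1 when the target call is the final one in the session.
            is_last = int(t == len(seq) - 1)
            pairs.append((context, seq[t], goal, is_last))
    return pairs


pairs = make_pairs([[3, 7, 7, 12], [5, 9]], goals=[0, 2], max_context=2)
for pair in pairs:
    print(pair)
```

Each session of length `n` contributes `n - 1` examples, so short sessions are not discarded but simply yield fewer pairs.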

### Train with Custom Parameters

```python
from context_engineer import run_pipeline

results = run_pipeline(
    num_users=1000,
    num_epochs=30,
    embed_dim=64,
    num_heads=4,
    learning_rate=0.001,
    seed=123,
)
```

### Command-Line Interface

```bash
# Run the full pipeline
context-engineer run

# Custom parameters
context-engineer run --num-users 1000 --epochs 30 --seed 123

# Generate data only (outputs JSON)
context-engineer generate --num-users 500 --output data.json
```

## Architecture

The model uses a **shared transformer encoder** with three task-specific prediction heads:

```
Input Sequence [API_1, API_2, ..., API_t]
        |
   Embedding + Positional Encoding
        |
   Transformer Encoder (3 layers, 8 heads)
        |
   Shared Feature Representation
       /|\
      / | \
     /  |  \
    v   v   v
 Next  Goal  Session
 API   Head  End Head
 Head
```

- **Primary task**: Next API prediction (100-class classification)
- **Auxiliary task 1**: Session goal classification (4 classes)
- **Auxiliary task 2**: Session end detection (binary)
- **Loss**: Weighted combination of the three task losses (weights 1.0 / 0.3 / 0.2)
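
The weighted multi-task loss can be sketched in plain Python for a single example. This is a minimal illustration, not the package's training code: it assumes the weights 1.0 / 0.3 / 0.2 apply to the next-API, goal, and session-end losses in that order, that the two classification heads use softmax cross-entropy, and that the session-end head uses sigmoid cross-entropy. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, via the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def binary_cross_entropy(logit: float, target: int) -> float:
    """Sigmoid cross-entropy for one example, in a numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multitask_loss(
    api_logits: Sequence[float],
    api_target: int,
    goal_logits: Sequence[float],
    goal_target: int,
    end_logit: float,
    end_target: int,
    weights: Tuple[float, float, float] = (1.0, 0.3, 0.2),  # assumed task order
) -> float:
    """Weighted sum of the three per-task losses."""
    w_api, w_goal, w_end = weights
    return (
        w_api * cross_entropy(api_logits, api_target)
        + w_goal * cross_entropy(goal_logits, goal_target)
        + w_end * binary_cross_entropy(end_logit, end_target)
    )


# An untrained model with uniform logits: 100 APIs, 4 goals, one binary output.
loss = multitask_loss([0.0] * 100, 17, [0.0] * 4, 2, 0.0, 1)
print(f"{loss:.4f}")
```

With uniform logits each term reduces to the log of the number of classes, which gives a quick sanity check on the starting loss before any training.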

## Dataset Design

The simulator generates realistic API usage patterns with:

- **100 APIs** across 10 functional categories (Auth, User Mgmt, Data Input, Data Processing, ML Training, ML Prediction, Basic Viz, Advanced Viz, Export, Admin)
- **4 user personas**: Data Scientists (80% adherence), Business Analysts (90%), Developers (60%), Power Users (70%)
- **4 session goals**: ML Pipeline, Data Analysis, User Management, Quick Visualization
- **Markov chain transitions** with high-probability workflow patterns (75-90%)
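
One way to picture persona adherence in a Markov-chain simulator is sketched below: at each step the user follows a preferred workflow transition with probability `adherence` and otherwise jumps to a uniformly random API. The `WORKFLOW_NEXT` table, the `simulate_session` function, and the uniform fallback are all assumptions for illustration; the actual transition structure in `MultiTaskMarkovChainAPISimulator` is not reproduced here.

```python
import random
from typing import Dict, List

# Hypothetical workflow: each API id maps to its preferred successor.
WORKFLOW_NEXT: Dict[int, int] = {0: 1, 1: 2, 2: 3, 3: 4}


def simulate_session(
    start: int,
    length: int,
    adherence: float,
    num_apis: int,
    rng: random.Random,
) -> List[int]:
    """Follow the workflow with probability `adherence`, else pick any API."""
    session = [start]
    while len(session) < length:
        current = session[-1]
        if current in WORKFLOW_NEXT and rng.random() < adherence:
            session.append(WORKFLOW_NEXT[current])
        else:
            session.append(rng.randrange(num_apis))
    return session


rng = random.Random(42)
strict = simulate_session(start=0, length=5, adherence=1.0, num_apis=100, rng=rng)
loose = simulate_session(start=0, length=5, adherence=0.6, num_apis=100, rng=rng)
print(strict)  # adherence=1.0 always follows the workflow: [0, 1, 2, 3, 4]
print(loose)   # lower adherence allows random detours
```

Because the next call depends only on the current one, the process is first-order Markov; lower adherence makes sessions noisier and harder to predict from the transition table alone.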

## Package Structure

```
src/context_engineer/
    __init__.py    # Public API
    data.py        # Data simulation & dataset classes
    model.py       # Transformer model architecture
    train.py       # Training, evaluation, inference
    pipeline.py    # End-to-end pipeline
    cli.py         # Command-line interface
```

## API Reference

### Core Functions

| Function | Description |
|---|---|
| `run_pipeline(**kwargs)` | Run the full experiment end-to-end |
| `simulate_multitask_markov_data(...)` | Generate simulated user sessions |
| `create_multitask_training_pairs(...)` | Convert sequences to supervised pairs |
| `train_multitask_model(...)` | Train with early stopping and cosine annealing |
| `evaluate_multitask_model(...)` | Evaluate with accuracy, MRR, Hit Rate@K |
| `set_random_seeds(seed)` | Set seeds for reproducibility |
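
For readers unfamiliar with the ranking metrics named above, here is a small self-contained sketch of Hit Rate@K and mean reciprocal rank (MRR) computed from per-class scores. The function names are illustrative and are not the package's API; ties are resolved optimistically, which is a simplification.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target class under descending score order.

    Ties are resolved in the target's favor (a simplifying assumption).
    """
    return 1 + sum(1 for s in scores if s > scores[target])


def hit_rate_at_k(
    all_scores: Sequence[Sequence[float]], targets: Sequence[int], k: int
) -> float:
    """Fraction of examples whose target appears in the top-k predictions."""
    hits = sum(rank_of_target(s, t) <= k for s, t in zip(all_scores, targets))
    return hits / len(targets)


def mean_reciprocal_rank(
    all_scores: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Average of 1 / rank of the target over all examples."""
    total = sum(1.0 / rank_of_target(s, t) for s, t in zip(all_scores, targets))
    return total / len(targets)


scores = [
    [0.1, 0.7, 0.2],  # target 1 is ranked 1st
    [0.5, 0.3, 0.2],  # target 2 is ranked 3rd
]
targets = [1, 2]

print(hit_rate_at_k(scores, targets, k=1))    # 0.5
print(hit_rate_at_k(scores, targets, k=3))    # 1.0
print(mean_reciprocal_rank(scores, targets))  # (1 + 1/3) / 2
```

Hit Rate@1 coincides with top-1 accuracy, while MRR also rewards predictions that rank the correct API near, but not at, the top.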

### Core Classes

| Class | Description |
|---|---|
| `MultiTaskMarkovChainAPISimulator` | Configurable Markov chain data generator |
| `MultiTaskMarkovAPIRecommender` | Multi-task transformer model |
| `MultiTaskMarkovDataset` | PyTorch Dataset for training data |

## Citation

If you use this package in your research, please cite:

```bibtex
@article{yin2025rethink,
  title={Rethink Context Engineering Using an Attention-based Architecture},
  author={Yin, Yiqiao},
  year={2025}
}
```

## License

MIT License. See [LICENSE](LICENSE) for details.
