Metadata-Version: 2.4
Name: controlmt
Version: 0.1.1
Summary: Python SDK for the ControlMT v2.3 Kannada↔English translator
Author: Anand Kaman
License: Apache-2.0
Project-URL: Homepage, https://huggingface.co/anandkaman/controlmt-v2.3
Project-URL: Source, https://github.com/anandkaman/ControlMT
Project-URL: Demo, https://huggingface.co/spaces/anandkaman/controlmt-demo
Keywords: translation,kannada,machine-translation,nlp,indic,controlmt
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch<3,>=2.0
Requires-Dist: transformers<5,>=4.40
Requires-Dist: sentencepiece>=0.1.99
Requires-Dist: safetensors>=0.4
Requires-Dist: huggingface_hub>=0.27
Provides-Extra: test
Requires-Dist: pytest>=7; extra == "test"

# controlmt

Python SDK for **[ControlMT v2.3](https://huggingface.co/anandkaman/controlmt-v2.3)** —
a compact 139M-parameter Kannada ↔ English translator.

```bash
pip install controlmt
```

## Quick start

```python
from controlmt import ControlMT

model = ControlMT.from_hf()                                    # auto everything
print(model.translate("ನಾನು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇನೆ."))
# → "I speak Kannada."
```

That's it. The SDK auto-detects:
- **Device** — GPU if CUDA is available, else CPU (overridable)
- **Dtype** — fp16 on GPU, bf16 on CPU (overridable; bf16 only when supported)
- **Direction** — Kannada → English if more than 30% of the input characters are Kannada, else English → Kannada
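
To make the direction heuristic concrete, here is a minimal standalone sketch of a "more than 30% Kannada characters" check. It is not the SDK's implementation: the helper names `kannada_ratio` and `guess_direction` are hypothetical, and ignoring whitespace when computing the ratio is an assumption. The only facts taken from this README are the 30% threshold and the `"kn2en"` / `"en2kn"` labels.

```python
# Illustrative sketch only; the real logic lives in ControlMT.detect_direction.
KANNADA_START, KANNADA_END = 0x0C80, 0x0CFF  # Unicode "Kannada" block


def kannada_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Kannada block."""
    chars = [c for c in text if not c.isspace()]  # assumption: whitespace is ignored
    if not chars:
        return 0.0  # empty or whitespace-only input has no Kannada share
    kannada = sum(KANNADA_START <= ord(c) <= KANNADA_END for c in chars)
    return kannada / len(chars)


def guess_direction(text: str, threshold: float = 0.30) -> str:
    """Return "kn2en" when the Kannada share exceeds the threshold, else "en2kn"."""
    return "kn2en" if kannada_ratio(text) > threshold else "en2kn"


print(guess_direction("ನಾನು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇನೆ."))
print(guess_direction("I speak Kannada."))
```

A ratio threshold well below 50% lets mixed input (Kannada text with embedded English names or numbers) still route to Kannada → English.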

## Explicit control

```python
# Force CPU, even on a GPU box
model = ControlMT.from_hf(device="cpu")

# Force GPU; falls back to CPU with a warning if not present
model = ControlMT.from_hf(device="gpu")

# Pick an exact dtype
model = ControlMT.from_hf(device="cpu", dtype="bf16")    # bfloat16
model = ControlMT.from_hf(device="gpu", dtype="fp16")    # float16
model = ControlMT.from_hf(dtype="fp32")                  # full precision

# CPU int8 dynamic quantization (~2× faster than bf16 on CPU)
model = ControlMT.from_hf(device="cpu", quant="int8")

# Different HF repo (e.g. a pre-quantized variant)
model = ControlMT.from_hf(model_id="anandkaman/controlmt-v2.3-int8", quant="int8")

# Loud: print the auto-pick decisions
model = ControlMT.from_hf(verbose=True)
```

Inspect the resolved config:
```python
>>> model
<ControlMT model_id='anandkaman/controlmt-v2.3' cuda · float16>
>>> model.config
ResolvedConfig(device='cuda', dtype_str='float16', quant='none', bf16_cpu=True)
```
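
The auto-pick rules described above can be summarized as a small decision function. The sketch below is a hedged illustration, not the SDK's code: `Resolved` and `resolve` are hypothetical names, hardware capabilities are passed in as plain booleans so the example runs without PyTorch, and falling back to float32 when bf16 is unsupported on CPU is an assumption (the README only says bf16 is used "when supported").

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Resolved:  # hypothetical stand-in, not the SDK's ResolvedConfig
    device: str
    dtype: str
    warnings: Tuple[str, ...]


def resolve(device: Optional[str], dtype: Optional[str], *,
            cuda_available: bool, bf16_cpu_supported: bool) -> Resolved:
    """Pick a device and dtype; explicit arguments win over auto-detection."""
    notes = []
    want_gpu = device in ("gpu", "cuda")
    if device is None:
        resolved_device = "cuda" if cuda_available else "cpu"
    elif want_gpu and not cuda_available:
        notes.append("GPU requested but CUDA is unavailable; using CPU")
        resolved_device = "cpu"
    else:
        resolved_device = "cuda" if want_gpu else "cpu"

    names = {"fp16": "float16", "bf16": "bfloat16", "fp32": "float32"}
    if dtype is not None:
        resolved_dtype = names[dtype]      # explicit override
    elif resolved_device == "cuda":
        resolved_dtype = "float16"         # documented GPU default
    elif bf16_cpu_supported:
        resolved_dtype = "bfloat16"        # documented CPU default
    else:
        resolved_dtype = "float32"         # assumption: fallback is not documented
    return Resolved(resolved_device, resolved_dtype, tuple(notes))


print(resolve(None, None, cuda_available=True, bf16_cpu_supported=True))
print(resolve("gpu", None, cuda_available=False, bf16_cpu_supported=True))
```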

## Batched translation

By design, **batching is opt-in: pass `batch_size`, or `auto_batch=True` on GPU.** Otherwise the SDK
runs one sentence at a time, which keeps memory use predictable.

```python
texts = ["ನಾನು ಕನ್ನಡ.", "I speak English."]    # ...and so on

# Default: one at a time (safe everywhere)
outs = model.batch_translate(texts)

# Explicit fixed batch size
outs = model.batch_translate(texts, batch_size=8)

# Auto-pick batch size from free VRAM (GPU only)
outs = model.batch_translate(texts, auto_batch=True)
```

| Mode | CPU | GPU |
|---|---|---|
| `(no batch_size, no auto_batch)` | 1 sentence at a time | 1 sentence at a time |
| `batch_size=N` | uses N | uses N |
| `auto_batch=True` | ignored + warning → 1 | probes `torch.cuda.mem_get_info()`, picks N ≤ 64 |
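
As a rough illustration of the table above, the sketch below shows how a list of texts could be split into fixed-size batches and how a batch size might be derived from free memory with a cap of 64. The names `chunks`, `pick_batch_size`, and `bytes_per_sentence` are hypothetical, and the per-sentence memory estimate is an assumption; the SDK's actual probing logic is not documented here. On a GPU, the free-memory figure would come from `torch.cuda.mem_get_info()`, which returns `(free_bytes, total_bytes)`.

```python
from typing import Iterator, List, Sequence

MAX_AUTO_BATCH = 64  # cap taken from the table above


def chunks(texts: Sequence[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def pick_batch_size(free_bytes: int, bytes_per_sentence: int) -> int:
    """Largest N with N * bytes_per_sentence <= free_bytes, clamped to [1, 64]."""
    if bytes_per_sentence <= 0:
        raise ValueError("bytes_per_sentence must be positive")
    return max(1, min(MAX_AUTO_BATCH, free_bytes // bytes_per_sentence))


texts = ["one", "two", "three", "four", "five"]
print(list(chunks(texts)))                  # default: one sentence per batch
print(list(chunks(texts, batch_size=2)))    # explicit batch size
print(pick_batch_size(free_bytes=8_000_000, bytes_per_sentence=1_000_000))
```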

## Other endpoints

```python
# Heuristic direction detection (>30% KN chars → kn2en, else en2kn)
ControlMT.detect_direction("ನಾನು ಕನ್ನಡ.")    # → "kn2en"

# JIT/compile warmup — kills the 5–10s "first request" lag in production
model.warmup()

# Run the 6-pair DEPLOYMENT.md benchmark suite on YOUR hardware
result = model.benchmark()
# {'config': 'cuda · float16', 'num_beams': 2, 'median_latency_s': 0.19, 'rows': [...]}
```
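
For readers who want to time their own pipeline in the same spirit as `warmup()` followed by `benchmark()`, here is a generic stdlib sketch of a warmup-then-median-latency measurement. It is not what `model.benchmark()` does internally: `median_latency` is a hypothetical helper, and `str.upper` stands in for `model.translate` so the example runs without the model.

```python
import statistics
import time
from typing import Any, Callable, Sequence


def median_latency(fn: Callable[[Any], Any], inputs: Sequence[Any],
                   warmup_runs: int = 1) -> float:
    """Call fn on each input once and return the median wall-clock time in seconds.

    The first input is run warmup_runs times beforehand, untimed, so one-off
    startup costs do not skew the measurement.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")
    for _ in range(warmup_runs):
        fn(inputs[0])
    timings = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in callable; in practice this would be something like model.translate.
latency = median_latency(str.upper, ["hello", "world", "again"])
print(f"median latency: {latency:.6f}s")
```

The median is used rather than the mean so that a single slow outlier does not dominate the result.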

## Architecture note

ControlMT v2.3 is an **encoder-decoder seq2seq** model (T5/mBART family), not a
decoder-only LM. That means:

- ✅ Works: this SDK, raw Transformers, FastAPI, Docker, HF Inference Endpoints
- ❌ Doesn't work without significant adapter work: vLLM, Ollama, llama.cpp/GGUF, HF TGI

See [DEPLOYMENT.md](https://huggingface.co/anandkaman/controlmt-v2.3/blob/main/DEPLOYMENT.md) Section 9 for the full "not supported" table and why.

## License

Apache 2.0. Same as the underlying model weights.
