Metadata-Version: 2.4
Name: dlmserve
Version: 0.0.0
Summary: First OSS production-grade serving engine for diffusion language models
Project-URL: Homepage, https://dlmserve.dev
Project-URL: Repository, https://github.com/mazen-aoun/dlmserve
Project-URL: Issues, https://github.com/mazen-aoun/dlmserve/issues
License: MIT
License-File: LICENSE
Keywords: diffusion,inference,language-model,llada,serving
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: accelerate>=0.34.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: prometheus-client>=0.21.0
Requires-Dist: pydantic>=2.8.0
Requires-Dist: torch<3.0,>=2.5
Requires-Dist: transformers>=4.44.0
Requires-Dist: uvicorn[standard]>=0.30.0
Provides-Extra: attn
Requires-Dist: flash-attn>=2.6.0; extra == 'attn'
Provides-Extra: dev
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pyright>=1.1.385; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: ruff>=0.7.0; extra == 'dev'
Description-Content-Type: text/markdown

# dlmserve

**The first OSS production-grade serving engine for diffusion language models.**

> Coming soon. A serving engine for diffusion LLMs: LLaDA, DiffuLLaMA, and more.

---

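Diffusion language models (LLaDA, DiffuLLaMA, Mercury Coder) are architecturally
distinct from autoregressive transformers: they denoise many token positions in
parallel over several steps instead of emitting one token at a time. Serving them
therefore needs a dedicated scheduler, KV cache semantics, batching strategy, and
sampling logic, which dlmserve aims to provide: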

- Bidirectional attention, not causal
- Denoising-step-aware continuous batching
- Committed/pending KV cache split
- OpenAI-compatible HTTP API
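
To make the first and third bullets concrete, here is a minimal, dependency-free sketch. It is not dlmserve code: the function names, the `mask_id` parameter, and the reading of "committed" as already-unmasked positions and "pending" as still-masked positions are all assumptions made for illustration.

```python
# Illustrative sketch only; none of these names come from dlmserve.


def causal_mask(n: int) -> list[list[bool]]:
    """Autoregressive attention: position i may attend only to j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[bool]]:
    """Diffusion-style attention: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def split_committed_pending(
    tokens: list[int], mask_id: int
) -> tuple[list[int], list[int]]:
    """Partition positions into committed and pending indices.

    Assumption: a position is "pending" while it still holds the hypothetical
    mask token `mask_id`, and "committed" once it holds a real token.
    """
    committed = [i for i, tok in enumerate(tokens) if tok != mask_id]
    pending = [i for i, tok in enumerate(tokens) if tok == mask_id]
    return committed, pending


if __name__ == "__main__":
    # Lower-triangular mask for the causal case, full mask for the bidirectional one.
    print(causal_mask(3))
    print(bidirectional_mask(3))
    # Positions 1 and 3 still hold the mask token (0 here) and are pending.
    print(split_committed_pending([5, 0, 7, 0], mask_id=0))
```

Under this reading, a cache could keep entries for committed positions stable across denoising steps and recompute only the pending ones. Whether dlmserve draws the split this way is not stated here.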

## Status

Pre-alpha. Not ready for use. Watch this space.

## License

MIT
