Metadata-Version: 2.4
Name: cdn-ai
Version: 0.3.0b1
Summary: Condensa — hyper-efficient AI-to-AI communication language. 71.7% token reduction, 95.8% zero-shot interpretability.
Author: Worachet Dee
License: MIT
Project-URL: Homepage, https://github.com/worachetdee/condensa
Project-URL: Repository, https://github.com/worachetdee/condensa
Project-URL: Documentation, https://github.com/worachetdee/condensa/tree/main/docs
Project-URL: Issues, https://github.com/worachetdee/condensa/issues
Keywords: ai,agents,communication,protocol,compression,tokens,multi-agent,llm,condensa,cdn
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Communications
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tiktoken>=0.7.0
Provides-Extra: llm
Requires-Dist: anthropic>=0.30.0; extra == "llm"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Provides-Extra: all
Requires-Dist: cdn-ai[dev,llm]; extra == "all"
Dynamic: license-file

# Condensa

> A hyper-efficient language designed exclusively for AI-to-AI communication, optimized for minimal token usage while maximizing semantic density.

### Three Editions

| Edition | Code | Focus | Best For |
|---------|------|-------|----------|
| **condensa** (core) | `!:cdn` | Max performance (71.7% compression, 95.8% interpretability) | Agent swarms, pipelines, batch ops |
| **condensa** (expressive) | `~:cdn` | Tone + negotiation (soft/firm/tentative intent) | Collaborative AI teams |
| **condensa** (secure) | `@:cdn` | Enterprise security (classification, encryption, audit) | Healthcare, finance, defense |

---

## What It Does

Condensa replaces verbose natural language and bloated JSON in AI-to-AI communication with a dense, position-encoded notation that current LLMs already understand zero-shot. Multi-turn agent conversations waste 50-80% of tokens on structural overhead, context re-transmission, and politeness filler. Condensa eliminates all three, achieving 71.7% token reduction in live agent benchmarks across 149 tested scenarios.

**Before** (101 tokens):
```
AgentC, I need you to perform a thorough code review of the file that AgentB
just wrote at /workspace/src/transaction_processor.py. Please check the code
against the following criteria: code style and PEP 8 compliance, potential bugs
or logic errors, performance issues, security vulnerabilities, and type safety.
Format your review as a structured report with severity and line numbers.
```

**After** (10 tokens):
```
>:@C review $_.path checks:(style,bugs,perf,security,types) /fmt:report
```
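For this particular pair the arithmetic works out to a 90.1% reduction (the 71.7% headline is the *average* across all benchmarked scenarios, so a single example can land well above it). A one-liner to check any pair of counts:

```python
def token_reduction(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens saved going from the verbose form to the Condensa form."""
    return (1 - after_tokens / before_tokens) * 100

# Counts quoted for the example above: 101 NL tokens -> 10 Condensa tokens.
print(f"{token_reduction(101, 10):.1f}%")  # -> 90.1%
```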

---

## Results

| Metric | Value |
|--------|-------|
| Compression vs NL | **66.9%** (static), **71.7%** (live agent) |
| Compression vs JSON | **71.8%** |
| Zero-shot interpretability | **95.8%** avg across 5 LLMs |
| Cross-model execution | **93.8%** (Claude to Gemini Flash, 8 turns) |
| Cost savings at 1M conversations | **$18,261** (at GPT-4o pricing) |
| Prompt overhead break-even | 2 messages (ultra) / 5 messages (minimal) |
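The cost row follows from a simple model: tokens saved per conversation, times price per token, times conversation volume. A sketch of that model with placeholder inputs (the per-conversation token count and the $/1M-token price below are assumptions for illustration, not the benchmark's actual parameters; see [Benchmarks](docs/BENCHMARKS.md) for the real inputs):

```python
# Cost model: saved_tokens = savings_rate * tokens_per_conversation * volume.
SAVINGS_RATE = 0.717          # live-agent compression rate from the table above
PRICE_PER_MTOK = 2.50         # assumed $/1M input tokens (placeholder)
TOKENS_PER_CONV = 10_000      # assumed NL tokens per multi-turn conversation (placeholder)
CONVERSATIONS = 1_000_000

saved_tokens = SAVINGS_RATE * TOKENS_PER_CONV * CONVERSATIONS
savings_usd = saved_tokens / 1_000_000 * PRICE_PER_MTOK
print(f"${savings_usd:,.0f}")  # -> $17,925 with these placeholder inputs
```

With these made-up inputs the result lands in the same order of magnitude as the table's figure; the exact $18,261 depends on the benchmark's measured token counts.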

---

## Quick Start

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install tiktoken pyyaml
python -c "from src.encoder import encode; print(encode('Search the web for SpaceX news and return the top 5 results'))"
# Output: !:srch SpaceX /n:5
```

Full setup, benchmarks, and LLM encoder usage: [Quick Start Guide](docs/QUICK-START.md)

---

## Documentation

| Document | Description |
|----------|-------------|
| [Quick Start](docs/QUICK-START.md) | Setup, encode/decode, run benchmarks, LLM encoder |
| [Language Reference](docs/LANGUAGE-REFERENCE.md) | Syntax, quick reference card, 6 worked examples |
| [Features](docs/FEATURES.md) | All 11 features (v0.2 + v0.3) + v0.4 tone research |
| [Benchmarks](docs/BENCHMARKS.md) | 149 scenarios, live agent data, cost analysis |
| [Architecture](docs/ARCHITECTURE.md) | Project structure, design, version history, branches |
| [Research Summary](RESEARCH-SUMMARY.md) | Full audit trail |
| [Interpretability Tests](experiments/INTERPRETABILITY-TEST.md) | 5-model zero-shot testing |
| [Transparency](research/transparency_notes.md) | Honest limitations |
| [Multilingual](research/multilingual_analysis.md) | Cross-lingual analysis |
| [Prompt Overhead](research/prompt_overhead_analysis.md) | Break-even analysis |

---

## Multilingual

Condensa's structure is 100% language-neutral -- verbs (`srch`, `filt`, `grp`) are code patterns, not English words. Non-English agents benefit MORE because their NL instructions are more expensive under BPE tokenization (Thai: 37.1% savings, Japanese: 37.5%, Arabic: 31.7%). A Japanese agent and Chinese agent can collaborate without understanding each other's language -- Condensa serves as the lingua franca.
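A rough way to see why non-Latin scripts pay more under byte-level BPE: the tokenizer operates on UTF-8 bytes, and Thai or Japanese characters occupy 3 bytes each versus 1 for ASCII, so the same instruction starts from a much longer byte sequence. Byte length is only a proxy for token count, and the Thai string below is an illustrative translation chosen for this sketch:

```python
# UTF-8 byte length as a rough proxy for BPE tokenization cost.
english = "search the web"
thai = "ค้นหาเว็บ"  # rough Thai equivalent (illustrative)

print(len(english), len(english.encode("utf-8")))  # -> 14 14 (1 byte per ASCII char)
print(len(thai), len(thai.encode("utf-8")))        # -> 9 27 (3 bytes per Thai char)
```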

Full analysis: [research/multilingual_analysis.md](research/multilingual_analysis.md)

---

## Transparency

Honest documentation of where Condensa does NOT work well. Dense human prose saves only 4.4% (already near information-theoretic minimum). Chinese NL is -5.6% (worse) because Chinese is already extremely dense. The regex encoder has 35% fidelity (prototype only; the LLM encoder achieves 82%). Condensa wins where machines talk to machines verbosely -- agent frameworks, JSON exchanges, multi-turn workflows.

Full notes: [research/transparency_notes.md](research/transparency_notes.md)

---

## Roadmap

| Phase | Status | Description |
|-------|--------|-------------|
| Phase 1: Analysis & Theory | Complete | Token economics audit, compression survey, 8 design principles |
| Phase 2: Language Specification | Complete | v0.1, v0.2, v0.3 specs, EBNF grammar, primitives registry |
| Phase 3: Implementation | Complete | Encoder, decoder, 149-scenario benchmarks, validation suite |
| Phase 3.5: Interpretability Testing | Complete | 5-model zero-shot test (95.8%), v0.3 token redesign |
| Phase 3.6: Cross-Model Execution | Complete | Claude to Gemini Flash, 100% task execution, 93.8% overall |
| Phase 4: Optimization | In progress | LLM encoder (done), fine-tuning dataset (done), agent framework integration (planned), `pip install condensa` (planned) |

---

## License

Research project, released under the MIT License (see [LICENSE](LICENSE)).
