Metadata-Version: 2.4
Name: xfms
Version: 0.1.0
Summary: Pick the right LLM for your task. Xpansion Framework Model Source — aggregates 8 independent benchmark sources via the hosted XFMS API. BYOK so your inference cost stays with you.
Author-email: Russ Wright <russ@visionairy.biz>
Maintainer-email: Russ Wright <russ@visionairy.biz>
License: MIT
Project-URL: Homepage, https://xpansion.dev
Project-URL: Repository, https://github.com/VisionAIrySE/XFMS
Project-URL: Documentation, https://github.com/VisionAIrySE/XFMS#readme
Project-URL: Methodology, https://github.com/VisionAIrySE/XFMS/blob/main/docs/methodology.md
Project-URL: Xpansion Framework, https://github.com/VisionAIrySE/XFMS/blob/main/docs/xpansion-overview.md
Project-URL: Bug Tracker, https://github.com/VisionAIrySE/XFMS/issues
Project-URL: Sponsor, https://github.com/sponsors/VisionAIrySE
Keywords: llm,model-selection,ai,ai-tools,claude-code,openrouter,benchmarks,llm-evaluation,llm-routing,recommendation-system,xpansion-framework,byok,ai-evaluation,model-recommender,llm-comparison,anthropic,openai,google-gemini,deepseek,lmsys-arena
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: httpx>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Dynamic: license-file

# XFMS — Xpansion Framework Model Source

**Pick the right LLM for your task — without the Twitter vibes.**

State what you're using the model for. XFMS aggregates evidence from
eight independent benchmark sources, normalizes it onto a common
scale, lets your intent decide which dimensions matter, and returns
a ranked shortlist with plain-English rationale for every pick.

XFMS is one module of the **[Xpansion Framework](https://xpansion.dev)** —
a unified architecture for governing AI-assisted work.

---

## What this repository is

A **thin Python client** and **command-line tool** for calling the
hosted XFMS API at `xfms.vercel.app`. About 250 lines of code. It
turns a one-liner into a ranked LLM shortlist.

What this repository is **not**: the recommender engine, the score
catalog, or the ingestion pipeline. Those run on the hosted service.
The methodology behind every pick is published in full at
[docs/methodology.md](docs/methodology.md). Every claim there maps
to code that runs at request time; you just don't run it locally.

---

## What you say:

> *"Fixing bugs in our Python codebase."*

## What you get:

```
My pick: GPT-5.5

Strong on structured output and instruction following — the two
dimensions that dominate code-edit work. Beats the Claude family on
Aider Polyglot and matches it on LiveBench reasoning, at roughly
60% of the per-token cost.

Alternatives:
2. claude-sonnet-4.6  — closer on coding quality, higher cost
3. gemini-3-pro       — fastest, slightly weaker on tool use

Inferred weights from your purpose:
  • structured_output_reliability  42.0%  ← BigCodeBench, Aider
  • instruction_following          28.0%  ← LiveBench, Arena
  • factuality                     20.0%  ← MMLU, GPQA
  • coherence                      10.0%  ← LongBench
```

---

## Install

```bash
pip install xfms
```

You need two free keys:

- **Xpansion Framework Model Source access key** — identifies you
  to the hosted API. Request one by submitting your email to the
  signup endpoint:

  ```bash
  curl -X POST https://xfms.vercel.app/signup \
    -H "Content-Type: application/json" \
    -d '{"email":"you@yourdomain.com"}'
  ```

  You'll get a confirmation email; click the button inside and
  your API key arrives in a follow-up email.

- **OpenRouter key** — your BYOK (bring-your-own-key). XFMS makes a
  small LLM call per pick to figure out which benchmarks matter for
  your stated purpose. That call goes through *your* OpenRouter
  account, so your inference cost stays with you (~$0.001 per
  pick). Sign up at [openrouter.ai/keys](https://openrouter.ai/keys).
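
If you would rather script the signup than use `curl`, the same request can be built with the Python standard library. This is a minimal sketch that only mirrors the `curl` call above. The helper name `build_signup_request` is made up for illustration, and the block builds the request without sending it.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
SIGNUP_URL = "https://xfms.vercel.app/signup"


def build_signup_request(email: str) -> urllib.request.Request:
    """Build (but do not send) the signup POST with a JSON body."""
    body = json.dumps({"email": email}).encode("utf-8")
    return urllib.request.Request(
        SIGNUP_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_signup_request("you@yourdomain.com")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` would actually send it. The response body is not documented here, so the sketch does not try to parse one.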

Configure them once:

```bash
export XFMS_API_KEY=xfms_live_...
export OPENROUTER_API_KEY=sk-or-v1-...
```
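
A script that depends on both keys can check for them up front and fail with a clear message. The sketch below assumes only the two variable names shown above. `missing_keys` is a hypothetical helper, not part of the client.

```python
import os

# The two environment variables named in the configuration step above.
REQUIRED_KEYS = ("XFMS_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Both keys are set.")
```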

## Use

**Command line:**

```bash
xfms rank "writing a tight editorial under a budget"
```

```bash
xfms pick "fixing bugs in our Python codebase"
```

```bash
xfms rank "summarizing a long legal contract" --top-n 3
```

```bash
xfms rank "OCR a handwritten manifest" -c vision -c tool_use
```

**Python:**

```python
from xfms_client import XFMSClient

with XFMSClient() as xfms:
    result = xfms.rank("writing a tight editorial under a budget")
    print(result["models"][0]["name"])
```

Or the one-shot:

```python
from xfms_client import pick
print(pick("fixing bugs in our Python codebase")["name"])
```
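
To read more than the top entry, you can walk the `models` list. The sketch below runs against a hand-written stand-in dict rather than a live response. It assumes only the shape implied by `result["models"][0]["name"]` above; any other fields are unknown here, and `shortlist_names` is a hypothetical helper.

```python
def shortlist_names(result, top_n=3):
    """Return up to top_n model names from a rank-style result dict."""
    models = result.get("models", [])
    return [model["name"] for model in models[:top_n] if "name" in model]


# Stand-in data for illustration only; not real XFMS output.
sample = {
    "models": [
        {"name": "model-a"},
        {"name": "model-b"},
        {"name": "model-c"},
        {"name": "model-d"},
    ]
}
print(shortlist_names(sample, top_n=2))
```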

## Override the system's inference

If you know which quality dimension matters most for your task, say
so — your preference always wins over the LLM's inference:

```bash
xfms rank "code refactor" --leaf-priorities "structured_output_reliability=1.0,factuality=0.5"
```

```python
xfms.rank(
    "code refactor",
    leaf_priorities={"structured_output_reliability": 1.0, "factuality": 0.5},
)
```
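
The command-line string and the Python dict carry the same information. As a rough illustration of how one maps to the other, here is a small parser for the `name=value,name=value` form. It is a sketch written for this README, not the client's actual parsing code.

```python
def parse_leaf_priorities(spec):
    """Turn 'a=1.0,b=0.5' into {'a': 1.0, 'b': 0.5}."""
    priorities = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {item!r}")
        priorities[name.strip()] = float(value)
    return priorities


print(parse_leaf_priorities("structured_output_reliability=1.0,factuality=0.5"))
```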

---

## Why BYOK

The hosted XFMS endpoint runs your purpose through a small language
model to figure out which benchmarks matter most for your task —
that's how the "inferred weights" block in the response gets built.

That model call goes through *your* OpenRouter account, not ours.
You pay for your own thinking; we pay for keeping the catalog
fresh. It's the right alignment of who's on the hook for what.

Typical cost per pick: about **$0.001** on OpenRouter (one short
classifier call).

---

## How XFMS picks — the four principles

Methodology in full at [`docs/methodology.md`](docs/methodology.md).
The short version:

1. **No provider self-reports.** Every score comes from a
   third-party evaluator running the same protocol across every
   model.
2. **No single-source dependence.** Eight independent benchmark
   sources contribute today; no single leaderboard determines a
   pick.
3. **User intent beats LLM inference.** The system infers weights
   from your purpose, but your stated `leaf_priorities` always
   override the inference.
4. **Honest gaps over invented signal.** Missing data is recorded
   as missing — no interpolation, no synthetic scores. Coverage
   gaps surface on every pick.
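
Principles 3 and 4 can be pictured with a toy scorer. This is not the hosted engine, which does not run locally. The function names are hypothetical, and the merge rule, the normalization, and the choice to average over covered dimensions are all assumptions made for this sketch.

```python
def merge_weights(inferred, overrides=None):
    """User-stated priorities replace inferred ones; result sums to 1."""
    merged = dict(inferred)
    merged.update(overrides or {})
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {dim: weight / total for dim, weight in merged.items()}


def score_model(scores, weights):
    """Weighted score over dimensions that have data.

    scores maps dimension -> normalized score, or None when no source
    covers it. Missing dimensions are listed as gaps, never filled in.
    Returns (score, coverage, gaps); score is None if nothing is covered.
    """
    covered = {d: w for d, w in weights.items() if scores.get(d) is not None}
    gaps = sorted(d for d in weights if d not in covered)
    coverage = sum(covered.values())
    if coverage == 0:
        return None, 0.0, gaps
    # Assumption of this sketch: average over the covered weight only.
    score = sum(scores[d] * w for d, w in covered.items()) / coverage
    return score, coverage, gaps


weights = merge_weights(
    {"structured_output_reliability": 0.42, "factuality": 0.20},
    overrides={"factuality": 0.5},  # the user's value replaces the inferred 0.20
)
print(score_model({"structured_output_reliability": 0.8, "factuality": None}, weights))
```

The third return value is the point of principle 4: a dimension with no data shows up as a named gap next to the score instead of being estimated.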

---

## Part of the Xpansion Framework

XFMS is one piece of a bigger architecture. The whole picture lives
at [`docs/xpansion-overview.md`](docs/xpansion-overview.md).

**Xpansion is in pre-signup right now.** Early access and founding
licenses are open at [xpansion.dev](https://xpansion.dev).

---

## Local development

```bash
git clone https://github.com/VisionAIrySE/XFMS.git
cd XFMS
python3 -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/python -m pytest tests/ -v
```

The tests mock the HTTP layer so they run offline — no API keys
needed to develop.
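
For a sense of what an offline test can look like, here is a minimal sketch using `unittest.mock` from the standard library. It does not reproduce the repository's tests. The helper `top_model_name`, the `/rank` path, and the `purpose` request field are assumptions made for illustration.

```python
from unittest.mock import Mock


def top_model_name(http_client, purpose):
    """Hypothetical helper: POST the purpose, return the first model's name."""
    response = http_client.post("/rank", json={"purpose": purpose})
    response.raise_for_status()
    return response.json()["models"][0]["name"]


def make_fake_client(payload):
    """Build a stand-in client whose post() returns a canned JSON payload."""
    fake_response = Mock()
    fake_response.json.return_value = payload
    fake_response.raise_for_status.return_value = None
    fake_client = Mock()
    fake_client.post.return_value = fake_response
    return fake_client


def test_top_model_name_runs_offline():
    fake = make_fake_client({"models": [{"name": "example-model"}]})
    assert top_model_name(fake, "fixing bugs") == "example-model"
    # The fake records the call, so the request shape can be checked too.
    fake.post.assert_called_once_with("/rank", json={"purpose": "fixing bugs"})


test_top_model_name_runs_offline()
```

Because the fake client never opens a socket, a test like this needs neither network access nor API keys.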

---

## License

This client library is MIT-licensed. The recommender engine, the
catalog, and the ingestion pipeline are not open source. See
[`NOTICE`](NOTICE) for the patent reservation language and the
relationship to the broader Xpansion Framework IP.

---

## Contact

- **Russ Wright** — russ@visionairy.biz
- **Xpansion Framework** — [xpansion.dev](https://xpansion.dev)
- **Security disclosures** — see [`SECURITY.md`](SECURITY.md)
