Metadata-Version: 2.4
Name: phylo
Version: 0.0.0
Summary: An open stack to build, RL-post-train, and deploy frontier robot policies.
Project-URL: Homepage, https://github.com/worv-ai/phylo
License: Apache-2.0
Keywords: policy,reinforcement-learning,robot-foundation-model,robotics
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# phylo

**An open stack to build, RL-post-train, and deploy frontier robot policies — from pretraining to the real world.**

`phylo` is the unified framework behind the WoRV flagship robot-foundation-model effort. It does **not** reimplement training or inference; it borrows mature components (HuggingFace Trainer, vLLM, the LeRobot dataset format, the [`vla-eval`](../vla-evaluation-harness) benchmark harness, starVLA model code) and unifies them behind one lightweight interface.

**Keystone:** one serving stack is used identically for RL rollout, evaluation, and deployment — parity-tested, so what you train against is what you ship.
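The parity idea can be illustrated with a toy check. This is a minimal sketch, not `phylo`'s actual API: `PolicyServer`, `toy_policy`, the three `*_step` entry points, and `check_parity` are all hypothetical names invented for illustration. It assumes parity means that every entry point returns exactly what the shared server returns for the same observation.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

Observation = Sequence[float]
Action = list[float]


@dataclass(frozen=True)
class PolicyServer:
    """Hypothetical stand-in for the single shared serving stack."""

    policy: Callable[[Observation], Action]

    def act(self, obs: Observation) -> Action:
        return self.policy(obs)


def toy_policy(obs: Observation) -> Action:
    # Placeholder policy: doubles every observation value.
    return [2.0 * x for x in obs]


# Hypothetical entry points. Each one goes through the same server,
# which is the property the parity check is meant to protect.
def rollout_step(server: PolicyServer, obs: Observation) -> Action:
    return server.act(obs)


def eval_step(server: PolicyServer, obs: Observation) -> Action:
    return server.act(obs)


def deploy_step(server: PolicyServer, obs: Observation) -> Action:
    return server.act(obs)


EntryPoint = Callable[[PolicyServer, Observation], Action]


def check_parity(
    server: PolicyServer,
    observations: Sequence[Observation],
    entry_points: Mapping[str, EntryPoint],
) -> bool:
    """Return True only if every entry point matches the server's own output."""
    return all(
        entry(server, obs) == server.act(obs)
        for obs in observations
        for entry in entry_points.values()
    )


ENTRY_POINTS = {"rollout": rollout_step, "eval": eval_step, "deploy": deploy_step}
server = PolicyServer(toy_policy)
parity_ok = check_parity(server, [[0.0, 1.0], [0.5, -0.5]], ENTRY_POINTS)
```

An entry point that post-processes actions differently from the others would make `check_parity` return `False`, which is the kind of drift a parity test is meant to catch.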

## Architecture

A dual-system policy (PonderPounce lineage):

- **`cortex`** — System-2 *Foundation Cortex*: a reusable, shared "brain" that reads context (human demonstration, memory, rules, current observation) and emits action-relevant **cognition tokens**.
- **`reflex`** — System-1 execution layer: turns cognition tokens into a specific embodiment's actions. Small, swappable per robot. Adapted via SFT + **Reflex RL** (RLT-style actor-critic on the *frozen* cortex's cognition tokens).
- **`platform`** — shared infra: the parity serving stack (rollout = eval = deploy), data/checkpoint registries, `vla-eval` integration.
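The cortex/reflex split above can be sketched as two small interfaces. This is an illustrative sketch under stated assumptions, not the project's code: the `Cortex` and `Reflex` protocols, the `think` and `act` method names, the toy implementations, and the representation of cognition tokens as a list of floats are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


class Cortex(Protocol):
    """System-2 (hypothetical interface): context in, cognition tokens out."""

    def think(self, context: dict) -> list[float]: ...


class Reflex(Protocol):
    """System-1 (hypothetical interface): cognition tokens in, actions out."""

    def act(self, cognition_tokens: list[float]) -> list[float]: ...


class ToyCortex:
    def think(self, context: dict) -> list[float]:
        # Toy "cognition tokens": rule count and summed observation.
        obs = context["observation"]
        return [float(len(context.get("rules", []))), float(sum(obs))]


@dataclass
class ToyReflex:
    action_dim: int  # embodiment-specific action size

    def act(self, cognition_tokens: list[float]) -> list[float]:
        # Broadcast one scalar signal to every action dimension.
        signal = sum(cognition_tokens)
        return [signal] * self.action_dim


@dataclass
class DualSystemPolicy:
    cortex: Cortex
    reflex: Reflex

    def step(self, context: dict) -> list[float]:
        return self.reflex.act(self.cortex.think(context))


# One shared cortex, two swappable reflexes for different robots.
shared_cortex = ToyCortex()
arm = DualSystemPolicy(shared_cortex, ToyReflex(action_dim=7))
base = DualSystemPolicy(shared_cortex, ToyReflex(action_dim=2))
context = {"observation": [1.0, 2.0], "rules": ["stay in lane"]}
```

Here `arm` and `base` share the same cortex instance and differ only in the reflex, which mirrors the "small, swappable per robot" design described above.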

## Training recipe (stages)

`[0] action-prior warm-up → [1] cortex pretrain (web-VLM co-training + cross-embodiment robot data) → [2] reflex SFT → [3] reflex RL (+ real-world online)`

VLM handling is a **config dimension**, not a fixed default: independent `tune_vision` / `tune_llm` / `tune_projector` flags, a Knowledge-Insulation (stop-gradient) option, and per-stage freeze transitions. The default follows the imported model's published recipe.
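As a rough illustration of what such a config dimension could look like, the sketch below models the three flags and the Knowledge-Insulation option as a dataclass and derives the freeze transition between two stages. Only the flag names `tune_vision`, `tune_llm`, and `tune_projector` come from the text above; the `VLMTuning` class, the `knowledge_insulation` field name, the stage keys, and every per-stage value are assumptions chosen for the example, not `phylo`'s defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLMTuning:
    """Hypothetical per-stage VLM-handling config."""

    tune_vision: bool = False
    tune_llm: bool = False
    tune_projector: bool = False
    knowledge_insulation: bool = False  # stop-gradient option (assumed field name)

    def trainable(self) -> set[str]:
        flags = {
            "vision": self.tune_vision,
            "llm": self.tune_llm,
            "projector": self.tune_projector,
        }
        return {name for name, enabled in flags.items() if enabled}


# Illustrative values only; the real defaults follow the imported model's recipe.
STAGES = {
    "action_prior_warmup": VLMTuning(),
    "cortex_pretrain": VLMTuning(
        tune_vision=True,
        tune_llm=True,
        tune_projector=True,
        knowledge_insulation=True,
    ),
    "reflex_sft": VLMTuning(tune_projector=True),
    "reflex_rl": VLMTuning(),  # VLM fully frozen in this example
}


def freeze_transition(prev_stage: str, next_stage: str) -> dict[str, list[str]]:
    """List which VLM parts get frozen or unfrozen when moving between stages."""
    before = STAGES[prev_stage].trainable()
    after = STAGES[next_stage].trainable()
    return {"freeze": sorted(before - after), "unfreeze": sorted(after - before)}


transition = freeze_transition("cortex_pretrain", "reflex_sft")
```

Keeping each flag independent means a stage can, for example, tune only the projector without touching the vision tower or the LLM.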

## Status

Early scaffold. The **Reflex RL** prototype (RLT on sim) is validated end-to-end on `vla-eval` (SimplerEnv × Qwen3-GR00T). Its server currently lives at `vla-evaluation-harness/src/vla_eval/model_servers/starvla_rlt.py` and will be folded into `reflex/`.

## Borrows (heavy internals)

HuggingFace Trainer/Transformers · vLLM · LeRobot dataset format · `vla-eval` (benchmark harness + serving protocol) · starVLA (QwenGR00T model code).
