Metadata-Version: 2.4
Name: percept-vision
Version: 0.6.0
Summary: Research preview — the open-source cognition layer for goal-driven, proactive vision agents.
Author-email: Divi <divi@velvee.ai>
License: Apache-2.0
Project-URL: Homepage, https://github.com/divi-vijayakumar/Percept
Project-URL: Repository, https://github.com/divi-vijayakumar/Percept
Project-URL: Documentation, https://github.com/divi-vijayakumar/Percept/blob/main/docs/README.md
Project-URL: Issues, https://github.com/divi-vijayakumar/Percept/issues
Keywords: vision,agents,proactive,perception,cognition,vlm,video,real-time
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Video
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: claude
Requires-Dist: anthropic>=0.40; extra == "claude"
Provides-Extra: gemini
Requires-Dist: google-genai>=1.0; extra == "gemini"
Provides-Extra: openrouter
Requires-Dist: openai>=1.40; extra == "openrouter"
Provides-Extra: deepgram
Requires-Dist: deepgram-sdk>=3; extra == "deepgram"
Provides-Extra: audio
Requires-Dist: numpy>=1.24; extra == "audio"
Provides-Extra: harness
Requires-Dist: numpy>=1.24; extra == "harness"
Provides-Extra: edge
Requires-Dist: numpy>=1.24; extra == "edge"
Provides-Extra: runtime
Requires-Dist: opencv-python-headless>=4.8; extra == "runtime"
Requires-Dist: numpy>=1.24; extra == "runtime"
Provides-Extra: screen
Requires-Dist: mss>=9; extra == "screen"
Provides-Extra: bench
Requires-Dist: numpy>=1.24; extra == "bench"
Requires-Dist: opencv-python-headless>=4.8; extra == "bench"
Provides-Extra: pose
Requires-Dist: mmpose>=1.3; extra == "pose"
Requires-Dist: mmdet<4,>=3.0; extra == "pose"
Requires-Dist: mmcv<2.2,>=2.0; extra == "pose"
Requires-Dist: mmengine>=0.10; extra == "pose"
Requires-Dist: opencv-python-headless>=4.8; extra == "pose"
Requires-Dist: numpy>=1.24; extra == "pose"
Provides-Extra: blazepose
Requires-Dist: mediapipe>=0.10; extra == "blazepose"
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20; extra == "otel"
Provides-Extra: ui
Provides-Extra: all
Requires-Dist: anthropic>=0.40; extra == "all"
Requires-Dist: google-genai>=1.0; extra == "all"
Requires-Dist: openai>=1.40; extra == "all"
Requires-Dist: deepgram-sdk>=3; extra == "all"
Requires-Dist: numpy>=1.24; extra == "all"
Requires-Dist: opencv-python-headless>=4.8; extra == "all"
Requires-Dist: mss>=9; extra == "all"
Requires-Dist: mmpose>=1.3; extra == "all"
Requires-Dist: mmdet<4,>=3.0; extra == "all"
Requires-Dist: mmcv<2.2,>=2.0; extra == "all"
Requires-Dist: mmengine>=0.10; extra == "all"
Requires-Dist: opentelemetry-api>=1.20; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Dynamic: license-file

# percept-vision

The open-source **cognition layer** for goal-driven, proactive vision agents.
**Package: `percept-vision` · import: `import percept`** (name ≠ import).

> **⚠️ Research preview — v0.6.0.** Published for real-life testing and feedback, **not** for
> production. The cognition core (gate, entity graph, executor, events, scheduler, consent) runs and is
> tested offline with fakes. APIs may change between `0.6.x` releases — pin a version. **No benchmark
> numbers are published yet** (the model-sweep phase comes later). Issues and feedback welcome.

You state a goal in plain language — *"nudge me when I drink a coffee"*, *"tell me when the kettle
boils"* — and percept turns a live audio-visual stream into an agent that **reasons over entities
across time**, **fires only on the rising edge** of a condition becoming true, and **refuses to
guess** — a three-state gate maps *known → act · not → silent · unknown → ask*.

The wedge is **temporal cognition** — entity memory + the three-state gate + reasoning over events —
which a raw VLM-in-a-loop and a per-frame pipeline both lack. percept builds on frontier models for
perception behind vendor-neutral seams.
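
The gate behaviour described above can be sketched in a few lines. This is a minimal illustration, not percept's implementation: the `Verdict` enum, the `RisingEdgeGate` class, and the choice to start in the "not" state are all assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    """Three-state answer to 'is the condition true right now?' (hypothetical names)."""
    KNOWN = "known"      # confidently true
    NOT = "not"          # confidently false
    UNKNOWN = "unknown"  # cannot tell


class RisingEdgeGate:
    """Emits an action only when the verdict changes, never while it holds."""

    def __init__(self) -> None:
        self._last = Verdict.NOT  # assumption: the condition starts out false

    def step(self, verdict: Verdict) -> Optional[str]:
        previous, self._last = self._last, verdict
        if verdict is previous:
            return None   # level unchanged: stay silent
        if verdict is Verdict.KNOWN:
            return "fire"  # rising edge: the condition just became true
        if verdict is Verdict.UNKNOWN:
            return "ask"   # refuse to guess: ask instead
        return None        # became false: silent


gate = RisingEdgeGate()
stream = [Verdict.NOT, Verdict.KNOWN, Verdict.KNOWN,
          Verdict.UNKNOWN, Verdict.NOT, Verdict.KNOWN]
actions = [gate.step(v) for v in stream]
print(actions)  # [None, 'fire', None, 'ask', None, 'fire']
```

The second `KNOWN` in a row produces nothing, which is the difference between edge-triggered and level-triggered firing. A real gate would likely also debounce noisy verdicts; that is left out here.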

## Install

```bash
pip install percept-vision        # core: PURE STDLIB — runs offline with fake backends, no keys
```

The core has **zero dependencies** and runs with deterministic fake backends, so it works straight
after `pip install`, with no API keys. It requires Python **≥ 3.10**. Frontier backends are opt-in extras:

```bash
pip install "percept-vision[gemini]"     # GeminiVision
pip install "percept-vision[claude]"     # AnthropicVision (Claude)
pip install "percept-vision[deepgram]"   # Deepgram STT + TTS (voice)
pip install "percept-vision[runtime]"    # camera/video/RTSP runtime worker
pip install "percept-vision[bench]"      # benchmark + scorecard tooling
pip install "percept-vision[edge]"       # Python tier-0 gate + detector registry
pip install "percept-vision[all]"        # every extra except blazepose and dev
```

## 60-second quickstart (no keys)

Fully offline — fake backends, deterministic, nothing to configure. This script runs **as-is**:

```python
import asyncio
from percept import Percept, Goal

async def main():
    # Fake backends by default — offline, no keys. discover_plugins=False skips plugin lookup.
    agent = Percept.create(discover_plugins=False)

    agent.add_goal(Goal(
        id="caffeine",
        condition="the user is drinking coffee",
        say="Heads up — stepping back from caffeine?",
    ))

    # Each frame is judged; the gate fires ONCE on the rising edge.
    # ("sip-coffee" is a token the fake vision backend scripts as a confident YES.)
    fires = await agent.perceive_judged("sip-coffee")
    for ev in fires:                       # ev is a FireEvent
        print(ev.action, ev.goal_id, ev.text)   # -> fire caffeine Heads up — stepping back from caffeine?

    # The same frame again does NOT re-fire — rising-edge, not level-triggered.
    print(await agent.perceive_judged("sip-coffee"))   # -> []

asyncio.run(main())
```

`perceive_judged(frame)` returns a list of `FireEvent(goal_id, action, text, entity_id, verdict)`,
where `action` is `"fire"` or `"ask"`. Swap the fake for real eyes — the cognition layer is unchanged:

```python
agent = Percept.create(vision="gemini")          # needs percept-vision[gemini] + GEMINI_API_KEY
```

Backends are selected by name (`"fake"` · `"gemini"` · `"claude"`), by adapter instance, or by env
(`PERCEPT_VISION_BACKEND`).
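
One way to picture the three selection routes is a small name-to-factory registry with an environment fallback. This is a sketch under stated assumptions, not percept's code: `REGISTRY`, `resolve_backend`, the stub classes, and the precedence order (instance, then name, then env var, then `"fake"`) are all hypothetical. Only the `PERCEPT_VISION_BACKEND` variable name comes from the text above.

```python
import os
from typing import Callable, Mapping, Optional, Union


class FakeVision:
    """Stand-in adapter; hypothetical, not percept's class."""
    name = "fake"


class GeminiVisionStub:
    """Placeholder for a real adapter that would need an API key."""
    name = "gemini"


# Hypothetical registry: backend name -> zero-argument factory.
REGISTRY: dict[str, Callable[[], object]] = {
    "fake": FakeVision,
    "gemini": GeminiVisionStub,
}


def resolve_backend(vision: Union[str, object, None] = None,
                    env: Optional[Mapping[str, str]] = None) -> object:
    """Resolve instance > explicit name > env var > 'fake' (assumed order)."""
    env = os.environ if env is None else env
    if vision is not None and not isinstance(vision, str):
        return vision  # already an adapter instance: use it as given
    name = vision or env.get("PERCEPT_VISION_BACKEND", "fake")
    try:
        return REGISTRY[name]()
    except KeyError:
        known = sorted(REGISTRY)
        raise ValueError(f"unknown vision backend {name!r}; known: {known}") from None


print(resolve_backend(env={}).name)                                     # fake
print(resolve_backend(env={"PERCEPT_VISION_BACKEND": "gemini"}).name)   # gemini
```

Passing `env` explicitly keeps the function testable without touching the real process environment.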

## Architecture — two layers, four tiers

percept cleanly splits the **eyes** from the **brain**: a cheap, stateful **System-1 cognition** that
reasons over a stream of small structured facts (the three-state gate, the entity graph, the concern
primitive — always in the loop), and an expensive, stateless **System-2 perception** it summons only
when it must (`vision.judge(condition, frame) → Verdict`, behind vendor-neutral registry seams). A raw
VLM-in-a-loop collapses the two; percept keeps them apart so the common case costs nothing.
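
The cost argument can be illustrated with a toy wrapper that calls an expensive judge only when a cheap structured fact about the frame changes. This is a sketch, not percept's architecture: `LazyJudge`, the `signature` argument, and `fake_judge` are hypothetical, and the judge's `(condition, frame)` shape is borrowed loosely from `vision.judge` above with a plain string standing in for `Verdict`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LazyJudge:
    """Calls an expensive judge only when a cheap signal says the scene changed."""
    judge: Callable[[str, str], str]  # (condition, frame) -> verdict string
    condition: str
    calls: int = 0                    # number of expensive calls made so far
    _last_signature: Optional[str] = field(default=None, repr=False)
    _last_verdict: Optional[str] = field(default=None, repr=False)

    def observe(self, frame: str, signature: str) -> Optional[str]:
        # `signature` is a cheap structured fact about the frame (hypothetical),
        # e.g. a motion bucket or detector label computed without a VLM.
        if signature != self._last_signature:
            self._last_verdict = self.judge(self.condition, frame)
            self._last_signature = signature
            self.calls += 1
        return self._last_verdict  # cached verdict when nothing changed


def fake_judge(condition: str, frame: str) -> str:
    """Deterministic stand-in for an expensive model call."""
    return "known" if "cup" in frame else "not"


lazy = LazyJudge(judge=fake_judge, condition="the user is drinking coffee")
frames = [("desk", "idle"), ("desk", "idle"),
          ("cup-raised", "hand-up"), ("cup-raised", "hand-up")]
verdicts = [lazy.observe(frame, sig) for frame, sig in frames]
print(verdicts, lazy.calls)  # ['not', 'not', 'known', 'known'] 2
```

Four frames cost two judge calls here; on a mostly static stream the ratio would be far lower, which is the point of keeping the two layers apart.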

Full architecture — the four tiers (Feeds → Measures → Cognition → VLM), every claim anchored to code,
plus the **generated** architecture diagram — is in the repo:
[Architecture](https://github.com/divi-vijayakumar/Percept/blob/main/docs/explanation/architecture.md)
· [Concepts](https://github.com/divi-vijayakumar/Percept/blob/main/docs/explanation/concepts.md)
· [docs index](https://github.com/divi-vijayakumar/Percept/blob/main/docs/README.md).

## The envelope (refusals, stated proudly)

- **Assistant-class, never the safety mechanism.** percept informs a human who stays responsible. It is
  never the thing standing between a person and harm on a clock it can't guarantee.
- **Watches the user's own world, with the user as beneficiary** — never a non-consenting third party.

These refusals are a feature. **Do not lead with surveillance demos.**

## Packages & versioning

- **percept-vision** (this package) — the Python SDK: cognition core, fakes, runtime worker, the
  Python tier-0 edge/harness, and benchmark tooling (extras). The tier-0 harness and the frozen
  wire-contract are bundled in as the `percept.harness` and `percept.contracts` subpackages.
- **percept-edge** (npm) — the on-device reactive edge in JS/WASM: VAD + motion gate that share the
  Tier0Signal/detector wire-contract. The edge's detector + gate **outputs** are byte-for-byte
  identical to the Python harness (verified, 12/12); the WatchSpec it consumes is a documented subset
  (server-only fields like `schedule`/`cadence` are enforced in the cognition layer, not on-device).

> **Versioning.** Package version `percept-vision 0.6.0` (research preview). The **wire-contract**
> version is separate — `CONTRACT_VERSION = "1.2.0"` (additive-only). A `0.6.0` install exporting
> contract `1.2.0` is expected, not a bug.
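
Because the contract version moves independently of the package version, a consumer may want to check it explicitly. The sketch below assumes that "additive-only" means a consumer can read any producer with the same major version and an equal or newer minor version; that rule, and the helper names `parse_version` and `contract_compatible`, are assumptions for illustration and are not taken from percept.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def contract_compatible(producer: str, consumer: str) -> bool:
    """True if a consumer built against `consumer` can read `producer` output.

    Assumed rule: same major version, and the producer's (major, minor)
    is at least the consumer's, so every field the consumer expects exists.
    """
    p, c = parse_version(producer), parse_version(consumer)
    return p[0] == c[0] and p[:2] >= c[:2]


print(contract_compatible("1.2.0", "1.1.0"))  # True: newer producer only adds fields
print(contract_compatible("1.1.0", "1.2.0"))  # False: consumer expects newer fields
print(contract_compatible("2.0.0", "1.2.0"))  # False: major version differs
```

Note that the package version (`0.6.0`) never enters the check; only the contract version matters for wire compatibility.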

## Benchmark status

**No Percept accuracy numbers are published yet.** The benchmark is designed, but the reproducible
model-sweep run is a separate, later phase; see the
[methodology](https://github.com/divi-vijayakumar/Percept/blob/main/docs/reference/benchmark-methodology.md).
The scorecard tooling operates on a manifest you provide; the bundled golden set ships with the **source
repo, not the wheel**, so the default-manifest commands run from a repo checkout:

```bash
# from a repo checkout (the golden set is repo data, not packaged):
percept-bench scorecard --manifest eval/golden-v1/MANIFEST.json --out out/scorecard.txt
# or point --manifest at your own labelled-clip manifest, anywhere.
```

## Status

**v0.6.0 — research preview.** The cognition core runs and is tested offline with fakes (the L1 lane,
no keys); the frontier backends and the edge packages are wired behind their seams. No benchmark
numbers yet. We are in **real-life testing** — APIs may change. Issues and contributions welcome:
[github.com/divi-vijayakumar/Percept](https://github.com/divi-vijayakumar/Percept).

## License

Apache-2.0.
