Your AI product team.
Describe what you want.
Wake up to shipping code.

Eight specialized agents. Three human approval gates. One CLI. The builder never grades its own work — a separate, skeptical evaluator does.

Standalone Python CLI. Any LLM provider. No Claude Code required.

$ pip install productteam
$ productteam run "I want a CLI tool that estimates API costs"
Anthropic Claude · OpenAI GPT-4o · Ollama (free, local) · Google Gemini · LM Studio · vLLM

Forge: Phone to Product

Start the daemon on your workstation. Open the dashboard on your phone. Type a product idea. Hit "Forge it." Go to bed. The pipeline runs headlessly — PRD, plan, build, evaluate, document. When a gate needs your approval, you get a Slack notification. Tap approve. Wake up to a built, tested, documented codebase.

Zero infrastructure. File-based queue. Python stdlib HTTP server. No React, no build step, no npm.

# Start the daemon + dashboard (localhost by default)
productteam forge --listen --dashboard

# Or expose to your LAN for phone access
productteam forge --listen --dashboard --lan
# Dashboard: http://localhost:7654
# From phone: http://192.168.1.42:7654

1. Submit: From your phone's browser, the CLI, or a GitHub Issue.
2. Daemon runs: PRD, plan, build, evaluate, document. Fully headless.
3. Approve gates: Notified via Slack or webhook. Approve from your phone.
4. Product ready: Code written, tests passing, docs generated. Ship it.

The builder never grades its own work.

Most AI coding tools let a single agent build something and then declare it done. That's a student grading their own exam.

ProductTeam separates builder from judge. The Builder writes code and says "ready for review." The Evaluator — a separate agent, separate prompt, skeptical by default — reads the source, runs the tests, tries to break things. It grades PASS, NEEDS_WORK, or FAIL.

On NEEDS_WORK, the Evaluator's findings route back to the Builder automatically, for a maximum of 3 loops. If a sprint still has not passed after loop 3, the plan is wrong, not the implementation. The Builder can never ship its own code.
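The loop described above can be sketched in a few lines. This is a hedged illustration, not ProductTeam's source: `run_sprint`, `Verdict`, and the `"REPLAN"` return value are hypothetical names, and treating FAIL as an immediate stop is an assumption the text does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_LOOPS = 3  # from the text: maximum 3 build-evaluate loops


@dataclass
class Verdict:
    grade: str  # "PASS", "NEEDS_WORK", or "FAIL"
    findings: list[str] = field(default_factory=list)


def run_sprint(build: Callable[[list[str]], str],
               evaluate: Callable[[str], Verdict]) -> str:
    """Return "PASS", "FAIL", or "REPLAN" for one sprint."""
    findings: list[str] = []
    for _ in range(MAX_LOOPS):
        artifact = build(findings)    # Builder only hands over "ready for review"
        verdict = evaluate(artifact)  # a separate agent assigns the grade
        if verdict.grade in ("PASS", "FAIL"):  # assumption: FAIL ends the loop
            return verdict.grade
        findings = verdict.findings   # NEEDS_WORK: route findings back
    return "REPLAN"  # loops exhausted: revisit the plan, not the code
```

Note that `build` has no way to return a grade. The only path to "PASS" runs through `evaluate`, which is the structural point of separating the two roles.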

Other AI Tools                 | ProductTeam
Agent self-evaluates           | Separate skeptical judge
"Done" when builder says so    | "Done" only when Evaluator grades PASS
State in conversation memory   | State in files that persist across sessions
All agents or nothing          | Drop in only the skills you need
Complex setup                  | pip install and run
One quality standard           | Code evaluator + design evaluator

From idea to shipped product

Eight specialized agents pass structured artifacts through a pipeline with three human approval gates.

PRD Writer (Product Manager) → Planner (Tech Lead) → Builder (Engineer) ⇄ Evaluator (QA Engineer), max 3 loops → Doc Writer (Technical Writer) → Ship (Done)

Three approval gates

The pipeline runs automatically between gates. You stop exactly three times to confirm intent, scope, and readiness to ship.

Gate 1: PRD Approval. "Does this capture your intent?" Review the PRD before planning begins.
Gate 2: Sprint Approval. "Does this scope look right?" Review sprint contracts and acceptance criteria.
Gate 3: Ship Approval. "Ready to commit/push/publish?" All evaluations passed. Review and ship.

What we guarantee

These aren't marketing claims. They're architectural constraints enforced by the code.

The Doc Writer reads code. It never fabricates.

The Doc Writer is a doer stage — it reads every source file via read_file before writing documentation. If a function doesn't exist in the code, it doesn't appear in the docs. No hallucinated APIs. No invented features.

The Builder cannot ship its own code.

Only the Evaluator can grade a sprint PASS. The Builder declares "ready for review" — never "done." This is the GAN-inspired insight: separate the generator from the discriminator.

State survives crashes.

state.json is written on every state change. Crash, timeout, or Ctrl+C at any point — productteam run resumes from exactly where you left off. Passed sprints are skipped.
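One common way to get this kind of crash safety is to write the state to a temporary file and atomically rename it over state.json. The sketch below shows that pattern and the skip-passed-sprints logic. It is an assumption about how this could be done: the state schema (`{"sprints": {...}}`) and the function names are hypothetical, not taken from ProductTeam.

```python
import json
import os
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state atomically so a crash mid-write cannot corrupt the file."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes reach the disk
    os.replace(tmp, path)      # atomic swap: old or new file, never half of one


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh one on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"sprints": {}}  # hypothetical schema: sprint id -> last grade


def pending_sprints(state: dict, sprint_ids: list[str]) -> list[str]:
    """Sprints that still need work; ones already graded PASS are skipped."""
    return [s for s in sprint_ids if state["sprints"].get(s) != "PASS"]
```

On resume, a runner would call `load_state`, then iterate over `pending_sprints` instead of the full sprint list.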

Your API keys are never exposed to build commands.

Sensitive environment variables (*_KEY, *_TOKEN, *_SECRET) are stripped from the subprocess environment before run_bash executes. The Builder writes Python and runs tests — it doesn't need your credentials.
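As an illustration of that filtering step, the sketch below drops any variable whose name matches the three patterns before launching a subprocess. The patterns come from the text; the helper `scrubbed_env`, the `run_bash` signature, the timeout, and the case-insensitive matching are assumptions made for this example.

```python
import fnmatch
import os
import subprocess

SENSITIVE_PATTERNS = ("*_KEY", "*_TOKEN", "*_SECRET")  # patterns named in the text


def scrubbed_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env without variables that look like credentials."""
    return {
        name: value
        for name, value in env.items()
        if not any(fnmatch.fnmatchcase(name.upper(), pattern)
                   for pattern in SENSITIVE_PATTERNS)
    }


def run_bash(command: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a shell command with credentials removed from its environment."""
    return subprocess.run(
        ["bash", "-c", command],
        env=scrubbed_env(dict(os.environ)),  # child never sees *_KEY and friends
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

With this in place, a build command such as `env | grep API_KEY` run through `run_bash` would find nothing to print, even though the parent process still holds the key.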

Four tools. No more.

Doer agents get read_file, write_file, run_bash, list_dir. A narrow tool surface means more predictable behavior and a smaller attack surface than frameworks with dozens of tools.
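A narrow tool surface is easy to enforce with a fixed dispatch table: anything the model requests that is not one of the four names is rejected. The sketch below shows the idea. The four tool names come from the text, but their signatures, return values, and the `dispatch` helper are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def run_bash(command: str) -> str:
    done = subprocess.run(["bash", "-c", command], capture_output=True, text=True)
    return done.stdout + done.stderr


def list_dir(path: str = ".") -> str:
    return "\n".join(sorted(p.name for p in Path(path).iterdir()))


# The complete tool surface: four entries, nothing else.
TOOLS: dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
    "run_bash": run_bash,
    "list_dir": list_dir,
}


def dispatch(name: str, **kwargs: str) -> str:
    """Execute a tool call, refusing any tool outside the fixed table."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

Because the table is the only entry point, auditing what an agent can do means reading four short functions.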

Use only what you need.

Each agent is a standalone markdown skill file. Want just the Evaluator as a QA agent? Just the PRD Writer as a thinking tool? Drop in the skills you need. Skip the rest.

8 specialized agents

Each skill is a markdown file. Readable, editable, replaceable.

prd-writer (Product Manager): Takes a concept, applies sensible defaults, produces a structured PRD with requirements, constraints, and success criteria.
planner (Tech Lead): Reads PRD, decomposes into sprint contracts with testable acceptance criteria. Writes sprint YAML files to disk. Never writes code.
builder (Engineer): Implements sprint contracts with production-quality code and tests. Declares "ready for review," never "done."
ui-builder (Frontend Engineer): Specialized builder for visual work. Landing pages, dashboards, web UIs. Dark theme, responsive, WCAG AA by default.
evaluator (QA Engineer): Skeptical by default. Reads source, runs tests, verifies acceptance criteria, tries to break things. PASS / NEEDS_WORK / FAIL.
evaluator-design (Design Reviewer): Grades visual artifacts on Coherence, Originality, Craft, and Functionality. 1-5 scale. 4.0+ to pass.
doc-writer (Technical Writer): Reads every source file. Produces README, landing page, changelog with real data only. Never fabricates features.
orchestrator (Project Manager): Routes work between agents, manages build-evaluate loops (max 3), enforces approval gates, writes handoff artifacts.
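The design evaluator's pass rule (four criteria, 1-5 scale, 4.0+ to pass) can be expressed as a small function. The text does not say how the four scores combine, so this sketch assumes the threshold applies to their plain average. That averaging rule, and the name `grade_design`, are assumptions.

```python
from statistics import mean

CRITERIA = ("coherence", "originality", "craft", "functionality")
PASS_THRESHOLD = 4.0  # from the text: "4.0+ to pass"


def grade_design(scores: dict[str, int]) -> tuple[float, bool]:
    """Return (average score, passed) for one visual artifact."""
    for criterion in CRITERIA:
        if criterion not in scores:
            raise ValueError(f"missing score: {criterion}")
        if not 1 <= scores[criterion] <= 5:
            raise ValueError(f"{criterion} must be on the 1-5 scale")
    # Assumption: the pass mark is applied to the unweighted mean.
    average = float(mean(scores[c] for c in CRITERIA))
    return average, average >= PASS_THRESHOLD
```

Under this assumption, scores of 4, 4, 5, 4 average 4.25 and pass, while 5, 3, 4, 3 average 3.75 and do not.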

Getting started

Install, init, run. Three commands to your first pipeline.

# Install (Python 3.11+)
pip install productteam

# Set up your provider (pick one)
export ANTHROPIC_API_KEY=sk-ant-...
# Or: export OPENAI_API_KEY=sk-...
# Or: ollama serve (free, local)

# Init a project
productteam init

# Run the full pipeline
productteam run "a CLI tool that estimates LLM API costs"

# Pipeline control
productteam run                  # resume
productteam recover              # unstick
productteam run --auto-approve

# Forge: phone to product
productteam forge "idea"
productteam forge --listen --dashboard
productteam forge status

# Diagnostics
productteam doctor
productteam status
productteam test

Who this is for

ProductTeam is an opinionated, auditable idea-to-code operating system for small software teams.

Solo founders and indie hackers

You can describe a product but want structured, auditable AI execution instead of chatting with a coding assistant. ProductTeam gives you a delivery pipeline, not a conversation partner.

Small product teams

You want PRD → Sprint → Build → Evaluate → Document → Ship with human gates at every strategic decision point. ProductTeam encodes a software delivery doctrine you can trust.

Anyone tired of AI that grades its own homework

The evaluator loop is the difference between "the AI said it's done" and "the AI proved it works." If you've been burned by hallucinated features or rubber-stamped tests, this is for you.