Metadata-Version: 2.4
Name: pixie-qa
Version: 0.5.1
Summary: Automated quality assurance for AI applications
Project-URL: Homepage, https://github.com/yiouli/pixie-qa
Project-URL: Repository, https://github.com/yiouli/pixie-qa
Project-URL: Documentation, https://yiouli.github.io/pixie-qa/
Project-URL: Bug Tracker, https://github.com/yiouli/pixie-qa/issues
License: MIT License
        
        Copyright (c) 2026 Yiou Li
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: ai,evals,llm,observability,opentelemetry,testing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Requires-Dist: autoevals>=0.1.0
Requires-Dist: jsonpickle>=4.0.0
Requires-Dist: openai>=2.29.0
Requires-Dist: openinference-instrumentation>=0.1.44
Requires-Dist: opentelemetry-api>=1.27.0
Requires-Dist: opentelemetry-sdk>=1.27.0
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: starlette>=1.0.0
Requires-Dist: uvicorn>=0.42.0
Requires-Dist: watchfiles>=1.1.1
Provides-Extra: all
Requires-Dist: openinference-instrumentation-anthropic; extra == 'all'
Requires-Dist: openinference-instrumentation-dspy; extra == 'all'
Requires-Dist: openinference-instrumentation-google-genai; extra == 'all'
Requires-Dist: openinference-instrumentation-langchain; extra == 'all'
Requires-Dist: openinference-instrumentation-openai; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: openinference-instrumentation-anthropic; extra == 'anthropic'
Provides-Extra: dspy
Requires-Dist: openinference-instrumentation-dspy; extra == 'dspy'
Provides-Extra: google
Requires-Dist: openinference-instrumentation-google-genai; extra == 'google'
Provides-Extra: langchain
Requires-Dist: openinference-instrumentation-langchain; extra == 'langchain'
Provides-Extra: openai
Requires-Dist: openinference-instrumentation-openai; extra == 'openai'
Description-Content-Type: text/markdown

# pixie-qa

An agent skill that makes coding agents the QA engineer for LLM applications.

## What the Skill Does

The `qa-eval` skill guides your coding agent through the full eval-based QA loop for LLM applications:

1. **Understand the code** — read the codebase, trace the data flow, learn what the code is supposed to do
2. **Instrument it** — use `wrap()` for data-object tracing and OpenInference auto-instrumentation for LLM span capture
3. **Build a dataset** — create JSON datasets of representative inputs and expected outputs
4. **Write eval tests** — generate `test_*.py` files with appropriate evaluators
5. **Run the tests** — `pixie test` to run all evals and report per-case scores
6. **Analyse results** — `pixie analyze <test_id>` to get LLM-generated analysis of test results
7. **Investigate failures** — diagnose failures, fix, repeat

## Getting Started

### 1. Add the skill to your coding agent

```bash
npx skills add yiouli/pixie-qa
```

The skill installs the accompanying Python package automatically when it is used.

### 2. Ask your coding agent to set up evals

While working on a Python-based AI project, open a conversation with your coding agent and say something like:

> "set up QA for my agent"

Your coding agent will read your code, instrument it, build a dataset from a few real runs, write and run eval-based tests, investigate failures, and fix them.

## Python Package

The `pixie-qa` Python package (imported as `pixie`) is what your coding agent installs and uses inside your project. API docs are auto-generated by `pdoc3` into [docs/pixie/index.md](docs/pixie/index.md) via pre-commit. The markdown renderer uses [scripts/pdoc_templates/text.mako](scripts/pdoc_templates/text.mako) so that async functions and methods are explicitly shown as `async def` in signatures.

Install hooks once per clone:

```bash
uv run pre-commit install
```

## Web UI

View all eval artifacts (results, markdown docs, datasets, and legacy scorecards) in a live-updating local web UI:

```bash
pixie start              # initializes pixie_qa/ (if needed) and opens http://localhost:7118
pixie start my_dir       # use a custom artifact root
pixie init               # scaffolds pixie_qa/ without starting the server
```

The web UI provides tabbed navigation for results, scorecards (legacy), datasets, and markdown files. Changes to artifacts are pushed to the browser in real time via SSE.

The server writes a `server.lock` file to the artifact root directory on startup (containing the port number) and removes it on shutdown, allowing other processes to discover whether the server is already running.
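
As a rough illustration of how another process might use the lock file, the sketch below reads the port and then probes it. It assumes the file holds only the port number as plain text and that the artifact root is `pixie_qa/`. The helper name `find_running_server` is hypothetical and is not part of the `pixie` package.

```python
import socket
from pathlib import Path
from typing import Optional


def find_running_server(artifact_root: str = "pixie_qa") -> Optional[int]:
    """Return the port of a running server, or None if none is detected.

    Assumption: server.lock contains just the port number as text.
    The real file format may differ.
    """
    lock_file = Path(artifact_root) / "server.lock"
    try:
        port = int(lock_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # no lock file, or contents are not a plain port number

    if not 0 < port < 65536:
        return None  # not a usable TCP port

    # A stale lock file can survive a crash, so confirm something is listening.
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=0.5):
            return port
    except OSError:
        return None
```

Probing the port in addition to reading the file guards against a lock file left behind by a process that exited without cleaning up.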

## Configuration

Pixie reads configuration from environment variables and a local `.env` file through a single central config layer. Existing process env vars win over `.env` values.
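
The precedence rule can be illustrated with a small standard-library sketch. This is not Pixie's actual config layer: `load_env_file` and `resolve_setting` are hypothetical helpers, and the parser handles only simple `KEY=VALUE` lines. In practice, `python-dotenv`'s `load_dotenv()` gives the same ordering by default, since it does not override variables that are already set.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_setting(
    name: str, env_file: str = ".env", default: Optional[str] = None
) -> Optional[str]:
    """Look in the process environment first, then the .env file, then the default."""
    if name in os.environ:
        return os.environ[name]
    return load_env_file(env_file).get(name, default)
```

With `PIXIE_ROOT=from_file` in `.env`, `resolve_setting("PIXIE_ROOT")` returns `from_file` unless the process was started with `PIXIE_ROOT` already exported, in which case the exported value wins.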

Useful settings include:

- `PIXIE_ROOT` to move all generated artifacts under a different root directory
- `PIXIE_RATE_LIMIT_ENABLED=true` to enable evaluator throttling for `pixie test`
- `PIXIE_RATE_LIMIT_RPS`, `PIXIE_RATE_LIMIT_RPM`, `PIXIE_RATE_LIMIT_TPS`, and `PIXIE_RATE_LIMIT_TPM` to tune request and token throughput for LLM-as-judge evaluators
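
One way to apply the rate-limit settings from a script is to build an environment mapping and hand it to the process that runs `pixie test`. The sketch below is only an illustration: `rate_limited_env` is a hypothetical helper, the numbers in the usage note are placeholders, and it assumes the limit variables accept plain integers.

```python
import os
from typing import Dict, Optional


def rate_limited_env(
    rpm: int, tpm: int, base: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with evaluator throttling switched on.

    Assumption: PIXIE_RATE_LIMIT_RPM and PIXIE_RATE_LIMIT_TPM take integer values.
    """
    env = dict(os.environ if base is None else base)
    env["PIXIE_RATE_LIMIT_ENABLED"] = "true"
    env["PIXIE_RATE_LIMIT_RPM"] = str(rpm)  # requests per minute
    env["PIXIE_RATE_LIMIT_TPM"] = str(tpm)  # tokens per minute
    return env
```

The returned mapping could then be passed as `env=` to `subprocess.run(["pixie", "test"], ...)`, for example with `rate_limited_env(rpm=60, tpm=100_000)`. Because real process variables take precedence over `.env`, values set this way override anything in the local `.env` file.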
