Metadata-Version: 2.4
Name: qprompt-cli
Version: 0.1.2
Summary: Flight recorder for LLM apps
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Provides-Extra: bedrock
Requires-Dist: boto3>=1.28; extra == "bedrock"
Dynamic: license-file

# qprompt-cli

Internal trace utility for LLM workflows.  
Goal: make answers inspectable with structured records of parsing, tool execution, evidence, and risks.

## Why we use this internally

- Identify why a model answer is wrong without re-running blind.
- Detect when an answer claims tool usage that did not actually happen.
- Preserve a portable artifact for incident review and QA.
- Standardize trace shape across model/tool backends.

## What is captured

- Request metadata: `trace_id`, `timestamp`, `model`, `question`
- Parse stage: `intent`, `entities`, `assumptions`, `missing_context`, `suggested_tools`
- Request envelope: model messages and available tools
- Tool execution: name, redacted input, output summary, status, error
- Evidence records: claim/source/evidence id
- Model response metrics: latency, token usage estimates (or provider usage when available)
- Audit output: claims, unsupported claims, risk flags

## Explicit limitations

- No hidden chain-of-thought extraction.
- No neuron/attention internals for hosted closed models.
- Token usage depends on provider payload; may be estimate-only.

## Install

```bash
python -m pip install -e .
```

Import:

```python
from qprompt import Tracer
```

## CLI

```bash
qprompt run "why did revenue drop in March?"           # real path: stub LLM, no synthetic tools
qprompt run "why did revenue drop in March?" --demo    # synthetic SQL + evidence (marked is_demo=true)
qprompt list
qprompt show <trace_id_or_path>
qprompt diff <trace_a> <trace_b>
```

The `--demo` flag is opt-in; the default never injects fake tool calls or evidence. Demo traces carry `is_demo: true` and are flagged on stdout/stderr so they can never be silently mistaken for real data.

Default storage:

```text
.traces/YYYY-MM-DD/trace_<uuid>.json
```

## Data contract

- JSON schema: `src/llmtrace/trace_schema.json`
- Runtime builder/validator: `src/llmtrace/schema.py`

## Operational behavior

- Trace write occurs only after schema validation.
- Failed tool calls are recorded as step errors and surfaced as risks.
- Multi-month phrasing (e.g. "April vs March") is preserved in parsed period.

## Integration notes

- `Tracer.run(...)` currently includes a mock model path for local validation.
- For production usage, replace the callable used by `Tracer.chat(...)` with provider-specific calls and pass back usage fields when available.
- For SQL/tool-backed workflows, run tools in code and pass outputs into the traced context; prompt text alone does not execute tools.
