Metadata-Version: 2.4
Name: promptcue
Version: 0.3.0
Summary: PromptCue - Classify and enrich prompts with routing cues for LLM pipelines
Author-email: Informity <contact@informity.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/informity/promptcue
Project-URL: Source, https://github.com/informity/promptcue
Project-URL: Bug Tracker, https://github.com/informity/promptcue/issues
Project-URL: Changelog, https://github.com/informity/promptcue/blob/master/CHANGELOG.md
Keywords: nlp,query-understanding,classification,rag,llm,semantic-search,routing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic<3,>=2.0
Requires-Dist: PyYAML<7,>=6.0
Requires-Dist: numpy<3,>=1.24
Provides-Extra: semantic
Requires-Dist: sentence-transformers<6,>=2.2; extra == "semantic"
Provides-Extra: detection
Requires-Dist: langdetect<2,>=1.0.9; extra == "detection"
Provides-Extra: linguistic
Requires-Dist: spacy<4,>=3.7; extra == "linguistic"
Provides-Extra: keywords
Requires-Dist: keybert<1,>=0.7; extra == "keywords"
Provides-Extra: dev
Requires-Dist: build<2,>=1.0; extra == "dev"
Requires-Dist: pytest<10,>=7.0; extra == "dev"
Requires-Dist: pytest-asyncio<2,>=0.23; extra == "dev"
Requires-Dist: pytest-cov<8,>=4.0; extra == "dev"
Requires-Dist: ruff<1,>=0.1; extra == "dev"
Requires-Dist: mypy<2,>=1.0; extra == "dev"
Requires-Dist: types-PyYAML<7,>=6.0; extra == "dev"
Provides-Extra: all
Requires-Dist: sentence-transformers<6,>=2.2; extra == "all"
Requires-Dist: langdetect<2,>=1.0.9; extra == "all"
Requires-Dist: spacy<4,>=3.7; extra == "all"
Requires-Dist: keybert<1,>=0.7; extra == "all"
Dynamic: license-file

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)"
            srcset="https://api.iconify.design/ri/search-ai-3-line.svg?color=%23e6edf3&width=72&height=72" />
    <img src="https://api.iconify.design/ri/search-ai-3-line.svg?color=%23222222&width=72&height=72" alt="" />
  </picture>
</p>

# PromptCue — Prompt Intent Classifier for LLM Pipelines

[![PyPI version](https://img.shields.io/pypi/v/promptcue.svg)](https://pypi.org/project/promptcue/)
[![Python versions](https://img.shields.io/pypi/pyversions/promptcue.svg)](https://pypi.org/project/promptcue/)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![CI](https://github.com/informity/promptcue/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/informity/promptcue/actions/workflows/ci.yml)

PromptCue classifies the intent behind a natural-language prompt and returns structured
routing cues — telling your LLM pipeline, RAG system, or query router not just *what*
the user asked, but *how* it should be answered: retrieve, reason, compare, enumerate,
check recency, or ask for clarification.

---

## How it works

PromptCue uses a **cascade classifier**:

1. **Deterministic pass** — scores the query against a YAML registry of query types
   using trigger-phrase matching and vocabulary overlap. Fast, zero ML dependencies,
   returns immediately when confidence is high.
2. **Semantic fallback** — when the deterministic result is ambiguous or below threshold,
   sentence-level embeddings re-score the query against example sentences per type.
   Activates automatically when `sentence-transformers` is installed, or immediately when
   you supply your own embed function via `PromptCueConfig(embed_fn=...)` (hosted mode —
   no model loaded by PromptCue).

The result is a Pydantic model (`PromptCueQueryObject`) carrying the classification, confidence,
scope, routing hints, action directives, and any enrichment you have enabled.
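
To see which pass decided a given query, inspect `classification_basis` on the result (the query and printed values here are illustrative):

```python
from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result = analyzer.analyze('What is the default CIDR range of a new VPC?')

# 'trigger_match' / 'word_overlap'  -> decided by the deterministic pass
# 'semantic_similarity'             -> decided by the semantic fallback
# 'below_threshold'                 -> no candidate cleared its threshold
print(result.classification_basis)
print(result.confidence)
```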

---

## Requirements

- Python **3.13+**
- Core dependencies: `pydantic`, `PyYAML`, `numpy` (always installed)
- All ML/NLP components are **optional** — the package installs and runs without them
- **Language:** English only. Triggers, examples, and pre-classification detectors
  (continuation, structure, temporal scope) are all English-specific.

---

## Install

Core install — deterministic classifier only, no ML dependencies:

```bash
pip install promptcue
```

With semantic scoring (`sentence-transformers`):

```bash
pip install "promptcue[semantic]"
```

> **Hosted mode** — if your application already has an embedding model loaded (e.g. for RAG),
> pass it via `PromptCueConfig(embed_fn=your_model.encode)`. PromptCue will use it directly
> and you do **not** need `[semantic]` — no second model is loaded.

With language detection (`langdetect`):

```bash
pip install "promptcue[detection]"
```

With linguistic enrichment (`spaCy`):

```bash
pip install "promptcue[linguistic]"
python -m spacy download en_core_web_sm
```

With keyword extraction (`KeyBERT`):

```bash
pip install "promptcue[keywords]"
```

With everything:

```bash
pip install "promptcue[all]"
python -m spacy download en_core_web_sm
```

Development install (editable, with test and lint tools):

```bash
pip install -e ".[dev]"
```

---

## Production deployment

PromptCue requires semantic scoring to produce production-quality results.
The deterministic-only path (`pip install promptcue`, no `[semantic]`) achieves
approximately 40–50% accuracy on naturalistic queries and is **not a supported
production configuration** — it is suitable for evaluation or development only.

Semantic scoring can be provided in two ways:

- **Standalone mode** — install `pip install "promptcue[semantic]"` and let PromptCue
  load its own `all-MiniLM-L6-v2` model.
- **Hosted mode** — pass an existing embedding function via `PromptCueConfig(embed_fn=...)`.
  No `[semantic]` install required; PromptCue delegates encoding to the caller's model.
  See [Hosted mode](#hosted-mode-reusing-an-existing-embedding-model).

For standalone mode, every deployment must:

1. Install `pip install "promptcue[semantic]"`.
2. Pre-download the model **before** the service starts — not on first query.
3. Call `warm_up()` (or `warm_up_async()`) at startup and gate readiness on it succeeding (see the sketch below).
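
A minimal sketch of step 3, assuming a FastAPI service (any framework with an equivalent startup hook works the same way):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()

@asynccontextmanager
async def lifespan(app: FastAPI):
    # warm_up_async() raises PromptCueModelLoadError if the model cannot be
    # loaded, aborting startup -- the service never reports ready without it.
    await analyzer.warm_up_async()
    yield

app = FastAPI(lifespan=lifespan)
```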

Progress bars from `sentence-transformers` are disabled by default in standalone mode
(`show_progress_bar=False`) so server logs stay clean. Set
`PromptCueConfig(show_progress_bar=True)` only when you explicitly want tqdm batch output.

If the model cannot be loaded, PromptCue raises `PromptCueModelLoadError` immediately.
It never silently falls back to deterministic-only mode — a misconfigured deployment
fails loudly at startup rather than producing quietly wrong results at query time.

### Model cache location

By default the model is stored in HuggingFace's standard cache (`~/.cache/huggingface/`).
For deployments that cannot rely on the default cache, set the path explicitly:

```python
from pathlib import Path
from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    model_cache_dir=Path('/opt/models')
))
analyzer.warm_up()   # raises PromptCueModelLoadError if the model is not at that path
```

Or via environment variable — no code change required:

```bash
export PROMPTCUE_MODEL_CACHE=/opt/models
```

### Hosted mode: reusing an existing embedding model

If your application already has an embedding model loaded — for RAG, document indexing, or
any other purpose — pass its encode function to `PromptCueConfig(embed_fn=...)`. PromptCue
will delegate all vector computation to that function and will never load a model of its own.

```python
from promptcue import PromptCueAnalyzer, PromptCueConfig

# my_embedder is already loaded elsewhere in your application
def my_encode(text: str) -> list[float]:
    return my_embedder.encode(text)        # or my_embedder.embed_query(text), etc.

config   = PromptCueConfig(embed_fn=my_encode)   # no model loaded by promptcue
analyzer = PromptCueAnalyzer(config)

# warm_up() is a no-op — the external model is already loaded by the caller
result = analyzer.analyze('How do I configure VPC peering?')
print(result.primary_query_type)   # procedure
```

The type alias `PromptCueEmbedFn = Callable[[str], list[float]]` is exported from the
package root and can be used to annotate injected functions:

```python
from promptcue import PromptCueEmbedFn

def build_embed_fn(model) -> PromptCueEmbedFn:
    return lambda text: model.encode(text)
```

**When to use hosted mode:**
- Your application loads `nomic-embed-text-v1.5`, `BAAI/bge-large-en-v1.5`, or any other
  model for retrieval/RAG and wants to classify queries with the same model — zero extra memory.
- You are integrating PromptCue into a service that already manages its own model lifecycle
  and you want PromptCue to be a pure classifier with no model side-effects.
- You are running in a memory-constrained environment where loading a second model is not
  acceptable.

**Notes:**
- `enable_semantic_scoring` is forced to `True` automatically when `embed_fn` is set, even if
  `sentence-transformers` is not installed.
- The injected function signature is single-text: `(str) -> list[float]`. If your model has
  a batch API, wrap it: `lambda text: model.encode([text])[0]`.
- `warm_up()` is a no-op. `is_loaded` returns `True` immediately.

---

### Deployment patterns

| Environment | Model management approach |
|---|---|
| Local dev | Leave `model_cache_dir` unset — HuggingFace downloads on first `warm_up()` |
| EC2 / EBS | Pre-download to EBS volume; set `HF_HOME=/opt/models` or `model_cache_dir` |
| Lambda (container image) | Bake model into Docker image at build time — **required**, Lambda `/tmp` is ephemeral |
| Lambda (EFS mount) | Pre-populate EFS volume; set `model_cache_dir=Path('/mnt/models')` |
| Docker / CI | Download during image build; volume-mount for local dev |

For Lambda container images, bake the model in at build time:

```dockerfile
FROM python:3.13-slim
RUN pip install "promptcue[semantic]"
ENV HF_HOME=/app/models
RUN python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('all-MiniLM-L6-v2')"
```

---

## Quick start

### Basic — no ML dependencies required

```python
from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
result   = analyzer.analyze('Compare Aurora and OpenSearch for RAG on AWS')

print(result.primary_query_type)   # comparison
print(result.scope)                # comparative
print(result.confidence)           # 0.9
print(result.routing_hints)        # {'needs_retrieval': True, 'needs_reasoning': True, ...}
print(result.action_hints)         # {'should_compare': True, ...}
```

### With semantic scoring — requires `pip install "promptcue[semantic]"`

Semantic scoring is **enabled automatically** when `sentence-transformers` is installed.
Call `warm_up()` at startup to pre-load the model and avoid first-query latency.

```python
from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()
analyzer.warm_up()  # loads ~90 MB model once; cached after first download

result = analyzer.analyze('Should we use DynamoDB or RDS for a high-read catalog?')
print(result.primary_query_type)   # recommendation
print(result.classification_basis) # semantic_similarity
print(result.confidence)           # 0.25
```

### With full enrichment

```python
from promptcue import PromptCueAnalyzer, PromptCueConfig

analyzer = PromptCueAnalyzer(PromptCueConfig(
    enable_language_detection    = True,   # requires promptcue[detection]
    enable_linguistic_extraction = True,   # requires promptcue[linguistic]
    enable_keyword_extraction    = True,   # requires promptcue[keywords]
))
analyzer.warm_up()

result = analyzer.analyze(
    'How do I set up a VPC with private subnets and NAT gateway step by step?'
)
print(result.language)       # en
print(result.main_verbs)     # ['set']
print(result.noun_phrases)   # ['a VPC', 'private subnets', 'NAT gateway']
print(result.keywords)       # [PromptCueKeyword(text='vpc private subnets', weight=0.72, ...), ...]
print(result.entities)       # []  (no named entities in this query)
```

### In an async application

Both `.warm_up_async()` and `.analyze_async()` delegate to `asyncio.to_thread()`,
so they are safe to await in FastAPI handlers or any other async framework without
blocking the event loop.

```python
import asyncio
from promptcue import PromptCueAnalyzer

async def main() -> None:
    analyzer = PromptCueAnalyzer()
    await analyzer.warm_up_async()

    result = await analyzer.analyze_async('Compare option A and option B')
    print(result.primary_query_type)   # comparison

asyncio.run(main())
```

### With an injected embed function (hosted mode)

Use this when your application already has an embedding model loaded and you want PromptCue
to reuse it rather than loading a second model. No `[semantic]` extra required.

```python
from promptcue import PromptCueAnalyzer, PromptCueConfig

# Stub — replace with your actual model's encode method
def my_encode(text: str) -> list[float]:
    return my_existing_model.embed_query(text)

analyzer = PromptCueAnalyzer(PromptCueConfig(embed_fn=my_encode))
# warm_up() not needed — model is already loaded externally

result = analyzer.analyze('How do I configure VPC peering step by step?')
print(result.primary_query_type)   # procedure
```

### Full JSON output

```python
print(result.model_dump_json(indent=2))
```

---

## Query types

PromptCue ships with a default registry of 12 query types:

| Label | Scope | Description |
|---|---|---|
| `analysis` | exploratory | Deep evaluation of a system, architecture, or decision |
| `chitchat` | broad | Social or conversational, not a knowledge query |
| `comparison` | comparative | Asks to compare two or more options |
| `coverage` | broad | Broad overview or "tell me everything" request |
| `generation` | focused | Produce entirely new content from scratch with no existing source to condense |
| `lookup` | focused | Factual question with a single direct answer |
| `procedure` | focused | Step-by-step instructions for a task |
| `recommendation` | focused | Asks for a decision or suggestion given constraints |
| `summarization` | focused | Condense existing content — provided, referenced, or in-context — into a shorter form |
| `troubleshooting` | focused | Diagnosing or fixing a problem |
| `update` | focused | Latest news, releases, or changes |
| `validation` | focused | Verify or fact-check a specific stated claim, assumption, or belief |

You can replace or extend the registry by pointing `PromptCueConfig.registry_path` at your
own YAML file — the schema is documented in `src/promptcue/data/query_types_en.yaml`.
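
For example (the path and filename here are hypothetical):

```python
from pathlib import Path

from promptcue import PromptCueAnalyzer, PromptCueConfig

# Hypothetical custom registry; the YAML schema follows the bundled
# src/promptcue/data/query_types_en.yaml.
analyzer = PromptCueAnalyzer(PromptCueConfig(
    registry_path=Path('config/my_query_types.yaml'),
))
```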

---

## Which field should I use?

`PromptCueQueryObject` surfaces several dimensions. Use the one that matches what your
pipeline actually needs to decide — you rarely need all of them.

| I need to know... | Use this field | Example values |
|---|---|---|
| What the user is asking for | `primary_query_type` | `procedure`, `comparison`, `lookup` |
| How broad or specific the query is | `scope` | `broad`, `focused`, `comparative`, `exploratory` |
| How to structure the LLM response | `action_hints` | `should_enumerate`, `should_compare`, `should_direct_answer` |
| Whether to retrieve / reason / check freshness | `routing_hints` | `needs_retrieval`, `needs_current_info`, `needs_reasoning` |
| Whether the query mentions time | `semantic_hints.mentions_time` | `True` / `False` |
| Whether the query requires cross-period analysis | `semantic_hints.requires_multi_period_analysis` | `True` / `False` |
| Whether the user wants a specific output format | `routing_hints['needs_structure']` | `True` / `False` |
| Whether the query continues a prior conversation | `is_continuation` | `True` / `False` |
| How confident the classifier is | `confidence` + `confidence_band` | `0.74`, `high` |

**Common patterns:**

- **Simple LLM router** — branch on `primary_query_type` alone. Done.
- **RAG pipeline** — use `routing_hints['needs_retrieval']` to decide whether to retrieve,
  `routing_hints['needs_current_info']` to check freshness, and `scope` to decide how many
  results to fetch (broad → more, focused → fewer).
- **Response generator** — act on `action_hints`: `should_enumerate` → numbered list,
  `should_compare` → side-by-side table, `should_direct_answer` → single concise answer.
- **Time-aware pipeline** — gate temporal aggregation on `semantic_hints.requires_multi_period_analysis`.
- **Structured-output pipeline** — detect explicit format requests via
  `routing_hints['needs_structure']` before passing to the generator.
- **Ambiguity guard** — check `confidence_band == 'low'` or `ambiguity_score > 0.5`
  before routing; fall back to clarification when confidence is too low.
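
Putting several of these patterns together, a minimal routing sketch; `retrieve_docs`, `ask_for_clarification`, and `generate` are hypothetical placeholders for your own pipeline functions:

```python
from promptcue import PromptCueAnalyzer

analyzer = PromptCueAnalyzer()

def answer_query(user_query: str) -> str:
    result = analyzer.analyze(user_query)

    # Ambiguity guard: prefer a clarifying question over a confident guess.
    if result.confidence_band == 'low' or result.ambiguity_score > 0.5:
        return ask_for_clarification(user_query)

    docs = []
    if result.routing_hints['needs_retrieval']:
        docs = retrieve_docs(
            user_query,
            top_k=12 if result.scope == 'broad' else 4,   # broad -> fetch more
            prefer_fresh=result.routing_hints['needs_current_info'],
        )
    return generate(user_query, docs, hints=result.action_hints)
```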

> The `primary_query_type` labels are intentionally granular (12 types). If you only need
> coarse routing, `scope` already gives you broad / focused / comparative / exploratory
> without looking at the type label at all.

---

## Public API

### `PromptCueAnalyzer`

```python
PromptCueAnalyzer(config: PromptCueConfig | None = None)
```

| Method | Description |
|---|---|
| `.analyze(text: str) -> PromptCueQueryObject` | Analyze a query and return a structured result |
| `.warm_up() -> None` | Pre-load all enabled models at startup to avoid first-query latency |
| `.analyze_async(text: str) -> PromptCueQueryObject` | Async variant of `.analyze()`; delegates to `asyncio.to_thread()` |
| `.warm_up_async() -> None` | Async variant of `.warm_up()`; delegates to `asyncio.to_thread()` |

---

### `PromptCueConfig` fields

| Field | Type | Default | Description |
|---|---|---|---|
| `registry_path` | `Path \| None` | `None` | Custom YAML registry path; uses bundled default when `None` |
| `model_cache_dir` | `Path \| None` | env / `None` | Directory where the sentence-transformers model is cached. Falls back to `PROMPTCUE_MODEL_CACHE` env var, then HuggingFace default (`~/.cache/huggingface/`) |
| `embed_fn` | `Callable[[str], list[float]] \| None` | `None` | Injectable embed function for hosted mode. When set, PromptCue delegates all vector computation to this function and never loads a model. `enable_semantic_scoring` is forced to `True`. See [Hosted mode](#hosted-mode-reusing-an-existing-embedding-model) |
| `show_progress_bar` | `bool` | `False` | Standalone mode only: forwarded to `SentenceTransformer.encode(show_progress_bar=...)`. Keep `False` for clean logs; set `True` for local debugging |
| `similarity_threshold` | `float` | `0.55` | Minimum score for a deterministic match to be accepted |
| `semantic_similarity_threshold` | `float` | `0.20` | Minimum score for a semantic match to be accepted |
| `ambiguity_margin` | `float` | `0.08` | Min gap between top-2 scores before clarification is flagged |
| `semantic_fallback_threshold` | `float` | `0.75` | Deterministic score above which the semantic pass is skipped |
| `trigger_fallback_threshold` | `float` | `0.60` | When a trigger phrase has matched, the score meets this value, and the margin is clear, the deterministic result is trusted directly and the semantic pass is skipped |
| `enable_semantic_scoring` | `bool` | auto | `True` when `sentence-transformers` is installed or `embed_fn` is set, else `False` |
| `embedding_model` | `str` | `all-MiniLM-L6-v2` | HuggingFace model name for semantic scoring (ignored when `embed_fn` is set) |
| `enable_language_detection` | `bool` | `False` | Detect BCP-47 language code; requires `promptcue[detection]` |
| `enable_linguistic_extraction` | `bool` | `False` | Extract verbs, noun phrases, named entities; requires `promptcue[linguistic]` |
| `enable_keyword_extraction` | `bool` | `False` | Extract keyphrases via KeyBERT; requires `promptcue[keywords]` |
| `max_keywords` | `int` | `8` | Maximum number of keyphrases to extract |
| `spacy_model` | `str` | `en_core_web_sm` | spaCy model name for linguistic extraction |
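
The thresholds interact: raising `similarity_threshold` sends more queries to the semantic pass, and a wider `ambiguity_margin` flags clarification more often. A sketch with illustrative values:

```python
from promptcue import PromptCueAnalyzer, PromptCueConfig

# Values are illustrative, not recommendations; tune them against your
# own query distribution.
analyzer = PromptCueAnalyzer(PromptCueConfig(
    similarity_threshold=0.65,   # stricter deterministic acceptance
    ambiguity_margin=0.12,       # flag clarification on a wider top-2 gap
))
```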

---

### `PromptCueQueryObject` fields

| Field | Type | Description |
|---|---|---|
| `schema_version` | `str` | Output schema version (`"1.0"`) |
| `input_text` | `str` | Original query as provided by the caller |
| `normalized_text` | `str` | Unicode-normalised, whitespace-collapsed query |
| `language` | `str` | BCP-47 language code (`"en"`) or `"unknown"` when detection is off |
| `is_continuation` | `bool` | `True` when the query continues an ongoing conversation (e.g. "what about X?", "and for Y?") |
| `primary_query_type` | `str` | Top classified query type label, or `"unknown"` |
| `classification_basis` | `str` | How the result was reached: `trigger_match`, `word_overlap`, `semantic_similarity`, `below_threshold` |
| `candidate_query_types` | `list[PromptCueCandidate]` | All types ranked by score |
| `runner_up` | `PromptCueCandidate \| None` | Second-ranked candidate; `None` when fewer than two candidates exist |
| `confidence` | `float` | Score of the top candidate (0.0–1.0) |
| `confidence_band` | `str` | Coarse confidence tier: `high`, `medium`, or `low` |
| `ambiguity_score` | `float` | How close the top-2 candidates are (0.0 = clear, 1.0 = identical) |
| `scope` | `str` | Query scope: `broad`, `focused`, `comparative`, `exploratory`, or `unknown` |
| `main_verbs` | `list[str]` | Root verbs extracted by spaCy (empty when enrichment is off) |
| `noun_phrases` | `list[str]` | Noun chunks extracted by spaCy (empty when enrichment is off) |
| `named_entities` | `list[str]` | Named entity surface texts, plain strings (backward compat) |
| `entities` | `list[PromptCueEntity]` | Named entities with `text` and `entity_type` (spaCy label) |
| `keywords` | `list[PromptCueKeyword]` | Keyphrases with `text`, `weight`, and `kind` from KeyBERT |
| `routing_hints` | `dict[str, bool]` | `needs_retrieval`, `needs_reasoning`, `needs_current_info`, `needs_clarification`, `needs_structure` |
| `semantic_hints` | `PromptCueSemanticHints` | Agnostic semantic cues (`mentions_multiple_items`, `requests_comparison`, `requests_enumeration`, `requests_structure`, `mentions_time`, `requires_multi_period_analysis`) |
| `confidence_meta` | `PromptCueConfidenceMeta` | Confidence diagnostics (`type_confidence_margin`, `scope_confidence`, `scope_confidence_margin`) |
| `explanations` | `PromptCueExplanations` | Debug metadata (`decision_notes`, `evidence_tokens`) |
| `action_hints` | `dict[str, bool]` | Response-generation directives: `should_survey`, `should_enumerate`, `should_compare`, `should_direct_answer`, `should_check_recency`, `should_clarify`, `should_respond_conversationally` |
| `constraints` | `list[str]` | Reserved for future use |

---

### Exceptions

All exceptions inherit from `PromptCueError`.

| Exception | Raised when |
|---|---|
| `PromptCueError` | Base class — catch this to handle all PromptCue errors |
| `PromptCueModelLoadError` | The sentence-transformers model cannot be loaded at `warm_up()` time |
| `PromptCueRegistryError` | The query type registry YAML is missing, malformed, or contains invalid entries |
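
A defensive startup wrapper, assuming the exception classes are importable from the package root like the rest of the public API:

```python
from promptcue import PromptCueAnalyzer, PromptCueError

analyzer = PromptCueAnalyzer()
try:
    analyzer.warm_up()
except PromptCueError as exc:
    # Covers PromptCueModelLoadError and PromptCueRegistryError alike.
    raise SystemExit(f'promptcue failed to initialize: {exc}')
```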

---

## Development

```bash
git clone https://github.com/informity/promptcue.git
cd promptcue

python3 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev,semantic,linguistic,keywords,detection]"
python -m spacy download en_core_web_sm

pytest
ruff check src/ tests/ examples/
```

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md).

---

## License

MIT — see [LICENSE](LICENSE).
