Metadata-Version: 2.4
Name: chitragupta
Version: 0.1.1
Summary: Pytest for your prompts — test, version and audit LLM outputs in pure Python
Author-email: Rohan Khairnar <rohankhairnar190@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Rohan Khairnar
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/build-with-rohan/chitragupta
Project-URL: Issues, https://github.com/build-with-rohan/chitragupta/issues
Project-URL: Documentation, https://github.com/build-with-rohan/chitragupta
Keywords: llm,prompt,testing,ai,openai,anthropic,groq,rag,evaluation,assertions,prompttest
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# Chitragupta

<p align="center">
  <img src="https://raw.githubusercontent.com/build-with-rohan/chitragupta/main/assets/logo.png" width="450"/>
</p>

वाक्येषु दोषान् गणयन् सत्यं परीक्षणमेव च ।
चित्रगुप्तः सदा रक्षेत् बुद्धिवाक्यप्रमाणतः ॥

“The one who counts errors in expressions and verifies truth through testing — Chitragupta always protects correctness through reasoning and validation.”

Chitragupta, the divine record-keeper of truth and action, is reimagined for the age of AI.

This library evaluates LLM outputs with precision—enforcing correctness, structure, and reliability through programmable assertions, just like unit tests for prompts.

## Installation

Pytest for your prompts: test, validate, and catch breaking changes in your LLM outputs before they reach users.

```bash
pip install chitragupta
```

Stop guessing whether your prompt changes broke something.

Every developer who builds with LLMs faces this: you change one word in your prompt, manually test a few inputs, and ship it. Two days later a user reports a wrong output. You don't know which change caused it, when it broke, or how to reproduce it.

Chitragupta gives you a safety net. Add a decorator to your function, define your rules, and run `chitragupta run`. You know immediately, before shipping, whether your LLM still behaves the way you expect.

## Quick start

```python
from chitragupta import prompttest, contains, max_length

@prompttest(
    inputs=["What is 2+2?"],
    asserts=[contains("4"), max_length(200)]
)
def my_bot(question):
    return "The answer is 4."

if __name__ == "__main__":
    print(my_bot("What is 2+2?"))
```

```
chitragupta  v0.1.1  ·  1 file scanned  ·  1 prompt function found
●  my_bot  'What is 2+2?'
contains("4")              PASS
max_length(200)            PASS
────────────────────────────────────────────────────
2 passed  ·  1 input  ·  2 assertions total  ·  1 prompt function
```

## How it works

- Wrap any Python function that calls an LLM with the `@prompttest` decorator
- Define test inputs and the rules the output must satisfy
- Run all prompt tests with a single command: `chitragupta run`
- Get clear pass/fail results for every rule
- No cloud, no YAML, no Node.js, no external dependencies

## Minimal example 

```python
from chitragupta import prompttest, contains

@prompttest(inputs=["2+2"], asserts=[contains("4")])
def bot(q):
    return "4"
```

## Why not just pytest?

You can test LLM outputs with pytest, but it quickly becomes repetitive:

- You have to manually call functions with test inputs
- Assertions are not reusable
- No standard way to define prompt rules
- No CLI to scan and run all prompt tests automatically
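
For example, here are the quick-start checks from above, hand-rolled as a plain pytest test (an illustrative sketch; `my_module` is a hypothetical module containing `my_bot`):

```python
# test_my_bot.py: the same two rules from the quick start, written by hand.
from my_module import my_bot  # hypothetical module containing my_bot

def test_my_bot():
    output = my_bot("What is 2+2?")  # manual call for each test input
    assert "4" in output             # contains("4"), rewritten by hand
    assert len(output) <= 200        # max_length(200), rewritten by hand
```

Every new input or rule means another bespoke assertion, repeated for every prompt function.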

Chitragupta solves this by:

- Attaching tests directly to your functions
- Providing reusable assertions
- Running everything with a single command

## Why Chitragupta? 

Most LLM testing tools add complexity. Chitragupta removes it. 

- No cloud - everything runs locally 
- No YAML - define tests directly in Python
- No Node.js - pure Python, zero ecosystem friction
- No external dependencies - lightweight and fast 

## Who is this for? 

- Developers building LLM applications
- Teams that want to catch prompt regressions early
- Anyone who needs to validate LLM outputs consistently
- Anyone tired of manually testing prompt changes 

## Real world use cases

### Customer support chatbot

A developer builds a support bot. After tweaking the prompt for tone, the bot starts leaking internal pricing info. With Chitragupta, they would have caught it in seconds using `not_contains("internal")`, before any user saw it.

```python
from chitragupta import prompttest, not_contains, max_length

@prompttest(
    inputs=["What are your pricing plans?"],
    asserts=[not_contains("internal"), max_length(300)]
)
def support_bot(query):
    # Your LLM call here
    return "Our pricing starts at $10/month for the basic plan."
```

### JSON output validation

A code review tool expects the LLM to always return valid JSON with specific keys. After a model upgrade, the output schema silently changed. `valid_json()` would have caught it before deployment.

```python
from chitragupta import prompttest, valid_json

@prompttest(
    inputs=["Review this Python function"],
    asserts=[valid_json()]
)
def code_reviewer(code):
    # Your LLM call here
    return '{"suggestions": ["Add docstring"], "score": 8}'
```
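
`valid_json()` checks parseability only. To pin down required keys as well, you can pair it with a custom assertion (a sketch; `has_keys` is a hypothetical helper you write yourself, using the custom-assertion mechanism described below):

```python
import json

from chitragupta import prompttest, valid_json

def has_keys(*keys):
    # Returns a True/False assertion requiring the given top-level JSON keys.
    def check(text):
        try:
            data = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            return False
        return isinstance(data, dict) and all(key in data for key in keys)
    return check

@prompttest(
    inputs=["Review this Python function"],
    asserts=[valid_json(), has_keys("suggestions", "score")]
)
def strict_code_reviewer(code):
    # Your LLM call here
    return '{"suggestions": ["Add docstring"], "score": 8}'
```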

### Content policy enforcement

An HR tool screening resumes must never mention age, gender, or race in its output, for legal reasons. A custom assertion function, `no_bias_words()`, catches any prompt change that accidentally enables biased output.

```python
from chitragupta import prompttest

def no_bias_words(text):
    bias_terms = ["age", "gender", "race", "young", "old", "male", "female"]
    return not any(term in text.lower() for term in bias_terms)

@prompttest(
    inputs=["Screen this resume for senior developer role"],
    asserts=[no_bias_words]
)
def hr_screening(resume_text):
    # Your LLM call here
    return "Candidate has strong technical skills and relevant experience."
```

### Product description generator

An e-commerce platform generates descriptions that must stay within a target length for SEO, always mention the product name, and never include competitor names. `min_length()`, `max_length()`, and `not_contains()` enforce all of this automatically.

```python
from chitragupta import prompttest, min_length, max_length, not_contains

@prompttest(
    inputs=["Wireless headphones"],
    asserts=[min_length(80), max_length(150), not_contains("Sony"), not_contains("Bose")]
)
def description_generator(product):
    # Your LLM call here
    return "Experience premium sound with our wireless headphones. Features include noise cancellation and 24-hour battery life."
```
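
The built-in length assertions measure the length of the raw output string. If you specifically need word counts, a custom assertion covers it (a sketch; `word_count_between` is a hypothetical helper):

```python
from chitragupta import prompttest, not_contains

def word_count_between(lo, hi):
    # Returns a True/False assertion that counts whitespace-separated words.
    def check(text):
        return lo <= len(text.split()) <= hi
    return check

@prompttest(
    inputs=["Wireless headphones"],
    asserts=[word_count_between(10, 40), not_contains("Sony")]
)
def short_description_generator(product):
    # Your LLM call here
    return ("Experience premium sound with our wireless headphones. "
            "Noise cancellation and a 24-hour battery keep your music going all day.")
```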

### Safety-critical apps

A health app must never give dosage advice or sound like it is making a diagnosis. Custom assertions `no_dosage()` and `no_diagnosis()` block deployment of any prompt version that slips into giving medical advice.

```python
from chitragupta import prompttest

def no_dosage(text):
    dosage_terms = ["mg", "dose", "dosage", "pill", "tablet", "take"]
    return not any(term in text.lower() for term in dosage_terms)

def no_diagnosis(text):
    diagnosis_terms = ["diagnosis", "condition", "disease", "illness", "symptoms"]
    return not any(term in text.lower() for term in diagnosis_terms)

@prompttest(
    inputs=["I have a headache, what should I do?"],
    asserts=[no_dosage, no_diagnosis]
)
def health_advisor(query):
    # Your LLM call here
    return "For health concerns, please consult with a qualified healthcare professional."
```

## Built-in assertions

| Assertion | Description | Example |
|-----------|-------------|---------|
| `contains()` | Text must contain substring | `contains("hello")` |
| `not_contains()` | Text must not contain substring | `not_contains("error")` |
| `max_length()` | Text length must be ≤ value | `max_length(100)` |
| `min_length()` | Text length must be ≥ value | `min_length(10)` |
| `valid_json()` | Text must be valid JSON | `valid_json()` |
| `matches_regex()` | Text must match regex pattern | `matches_regex(r"\d{4}")` |
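
`matches_regex()` is the only built-in not demonstrated elsewhere in this README; it is handy for pinning down structured fields such as years or IDs in the output (a sketch):

```python
from chitragupta import prompttest, matches_regex, contains

@prompttest(
    inputs=["When was Python 3.0 released?"],
    asserts=[matches_regex(r"\d{4}"), contains("2008")]
)
def release_year_bot(question):
    # Your LLM call here
    return "Python 3.0 was released in December 2008."
```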

## Custom assertions

Any Python function that takes the output text and returns True/False works as a custom assertion:

```python
from chitragupta import prompttest

def no_emojis(text):
    return not any(char in text for char in ["😀", "😢", "🎉"])

@prompttest(
    inputs=["Generate a response"],
    asserts=[no_emojis]
)
def formal_response(query):
    return "This is a formal response without emojis."
```

## CI/CD integration

```yaml
name: Test Prompts
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install chitragupta
      - run: chitragupta run
```

## Works with any LLM

Chitragupta is LLM-agnostic. It works with OpenAI, Anthropic, Groq, Gemini, local models, or any other LLM you can call from Python. Just wrap your LLM function with the decorator and test away.
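
For instance, the function under test can call the OpenAI API directly (a sketch, assuming the `openai` package is installed and `OPENAI_API_KEY` is set; the model name is illustrative):

```python
from openai import OpenAI
from chitragupta import prompttest, contains, max_length

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@prompttest(
    inputs=["What is the capital of France? Answer in one sentence."],
    asserts=[contains("Paris"), max_length(200)]
)
def ask_openai(question):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```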

## Roadmap

- v0.2 — run history saved locally, `chitragupta history` command
- v1.0 — `@promptversion` decorator, `chitragupta diff v1 v2`, pytest plugin
- v1.1 — `llm_judge()` assertion, async support
- v2.0 — HTML reports, production monitoring, plugin ecosystem

## About the name

Named after Chitragupta — the divine record-keeper who tracks every action and evaluates it with precision.
This library does the same for your LLM outputs.

## License

MIT License © 2026 Rohan Khairnar

## Support

If you find this useful, consider giving it a ⭐ on GitHub.
