Metadata-Version: 2.4
Name: hallucination-grep
Version: 0.1.0
Summary: Cross-check LLM output against your real codebase to detect hallucinated references
Author: hallucination-grep
License: MIT
Keywords: llm,hallucination,ast,code-analysis,cli
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Dynamic: license-file

# hallucination-grep

Cross-check LLM output against your real codebase. Finds references to
functions, classes, methods, files, imports, and constants that the LLM
mentioned but that do not exist in your code.

No external AI dependencies. Pure Python + AST analysis.

## Install

```bash
pip install hallucination-grep
```

Or install from source:

```bash
git clone https://github.com/yourname/hallucination-grep
cd hallucination-grep
pip install -e .
```

## Usage

```bash
# From a file
hallucination-grep response.txt --codebase ./src

# From stdin (pipe from clipboard or LLM)
echo "Use the get_user_profile() function..." | hallucination-grep --codebase .

# Specific checks only
hallucination-grep response.txt --codebase . --check functions,files

# JSON output for CI
hallucination-grep response.txt --codebase . --json
```

## Example output

```
╔═════════════════════════════╗
║  HALLUCINATION GREP REPORT  ║
╚═════════════════════════════╝

Scanned: 13 lines of LLM output
Codebase: src (4 Python files, 4 total files indexed)

HALLUCINATIONS DETECTED: 4

╭──────────────────────────────────────────────────────╮
│   ✗ Function: get_user_profile()                     │
│     Mentioned in: line 3 of input                    │
│     Status: Does not exist in codebase               │
│     Similar: get_user_data() in src/users.py:5       │
╰──────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────╮
│   ✗ File: src/helpers/formatter.py                   │
│     Mentioned in: line 3 of input                    │
│     Status: File does not exist                      │
│     Similar: src/utils/format.py                     │
╰──────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────╮
│   ✗ Import: from config import DATABASE_URL          │
│     Mentioned in: line 8 of input                    │
│     Status: Does not exist in codebase               │
│     Similar: config.settings (module)                │
╰──────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────╮
│   ✗ Class: DataProcessor                             │
│     Mentioned in: line 4 of input                    │
│     Status: Does not exist                           │
│     Similar: none found                              │
╰──────────────────────────────────────────────────────╯

VERIFIED REFERENCES: 4
  ✓ Function: get_user_data() (src/users.py:5)
  ✓ Class: UserManager (src/users.py:1)
  ✓ Function: save() (src/base.py:2)
  ✓ File: src/utils/format.py

Hallucination rate: 50% (4 of 8 code references)
```

## How it works

1. **Extract references** from the LLM output using regex:
   - Function calls: `get_user_profile()`, `fetchResults()`
   - Class names: `UserManager`, `DataProcessor`
   - File paths: `src/utils.py`, `config/settings.json`
   - Import statements: `from utils import format_response`
   - Method calls: `.save()`, `.process()`
   - Constants: `DATABASE_URL`, `SECRET_KEY`

2. **Index the codebase** using Python's `ast` module:
   - All defined functions and methods (with file + line)
   - All class definitions
   - All existing files
   - All module-level variable assignments

3. **Cross-reference**: flag anything the LLM mentioned that doesn't exist

4. **Similarity suggestions**: use `difflib.get_close_matches()` to suggest
   what the LLM might have meant
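
The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the package's actual implementation: the pattern `CALL_RE` and the helpers `extract_function_refs`, `index_definitions`, and `cross_reference` are hypothetical names, and only function-call references are handled.

```python
import ast
import difflib
import re

# Hypothetical pattern: an identifier immediately followed by "(".
CALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def extract_function_refs(text):
    """Step 1: return {name: first 1-based line number} for call-like references."""
    refs = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            refs.setdefault(match.group(1), lineno)
    return refs


def index_definitions(source, filename):
    """Step 2: map each function, method, and class name to (filename, line)."""
    index = {}
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index.setdefault(node.name, (filename, node.lineno))
    return index


def cross_reference(refs, index):
    """Steps 3 and 4: split references into verified and hallucinated ones."""
    verified, hallucinated = {}, {}
    for name, lineno in refs.items():
        if name in index:
            verified[name] = index[name]
        else:
            # Suggest the closest defined name, if any is similar enough.
            similar = difflib.get_close_matches(name, list(index), n=1)
            hallucinated[name] = {"line": lineno, "similar": similar}
    return verified, hallucinated


# Toy inputs standing in for an LLM response and one source file.
llm_text = "First call get_user_data().\nThen pass the result to get_user_profile()."
code = (
    "class UserManager:\n"
    "    def save(self):\n"
    "        pass\n"
    "\n"
    "def get_user_data(user_id):\n"
    "    return {}\n"
)

verified, hallucinated = cross_reference(
    extract_function_refs(llm_text), index_definitions(code, "src/users.py")
)
print("verified:", verified)
print("hallucinated:", hallucinated)
```

Here `get_user_data` is found in the index, while `get_user_profile` is flagged and paired with its closest defined name. A real extractor would need separate patterns for classes, file paths, imports, methods, and constants.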

## Options

| Flag | Description |
|------|-------------|
| `--codebase PATH` | Directory to index (required) |
| `--check LIST` | Comma-separated check types: `functions,classes,files,imports,methods,variables` |
| `--json` | Machine-readable JSON output |
| `--min-confidence FLOAT` | Only report hallucinations whose confidence is at least this value |
| `--no-color` | Disable Rich color output |
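
As an illustration of how a `--check` value could be interpreted, the sketch below validates a comma-separated list against the six check types in the table. The helper name `parse_check_list` is hypothetical, and the assumption that omitting the flag enables every check is a guess, not documented behavior.

```python
# The six check types listed in the options table.
CHECK_TYPES = ("functions", "classes", "files", "imports", "methods", "variables")


def parse_check_list(value):
    """Turn a --check value such as "functions,files" into a list of check types."""
    if value is None:
        # Assumption: with no --check flag, every check type runs.
        return list(CHECK_TYPES)
    selected = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [item for item in selected if item not in CHECK_TYPES]
    if unknown:
        raise ValueError("unknown check type(s): " + ", ".join(unknown))
    return selected


print(parse_check_list("functions,files"))
```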

## CI integration

Exit code is `1` when hallucinations are found, `0` when clean. Use with
`--json` for structured output:

```yaml
# .github/workflows/llm-check.yml
- name: Check LLM response for hallucinations
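  run: |
    status=0
    hallucination-grep llm_output.txt --codebase ./src --json > hallucination_report.json || status=$?
    # Print the report even when hallucinations were found, then propagate the exit code
    cat hallucination_report.json
    exit "$status"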
```
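
Outside GitHub Actions, the same exit-code contract can be consumed from a small Python wrapper. The sketch below relies only on the documented exit codes (`0` clean, `1` hallucinations found). The helper names `run_check` and `main` are hypothetical, and treating any other exit code as a tool failure is an assumption.

```python
import subprocess
import sys


def run_check(command):
    """Run a hallucination-grep command; return (found_hallucinations, stdout)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode not in (0, 1):
        # Assumption: only 0 and 1 are documented, so anything else is a failure.
        raise RuntimeError(
            f"unexpected exit code {result.returncode}: {result.stderr}"
        )
    return result.returncode == 1, result.stdout


def main():
    """Print the JSON report and return an exit code suitable for CI."""
    found, report = run_check(
        ["hallucination-grep", "llm_output.txt", "--codebase", "./src", "--json"]
    )
    print(report)
    return 1 if found else 0
```

Calling `sys.exit(main())` at the end of a script reproduces the CLI's pass/fail behavior while still printing the report.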

## Architecture

```
src/hallucination_grep/
├── __init__.py      # Public API
├── cli.py           # Click entry point, Rich output
├── extractor.py     # Regex-based reference extraction from LLM text
├── indexer.py       # AST-based codebase indexing
└── checker.py       # Cross-reference + similarity matching
```

## Development

```bash
pip install -e ".[dev]"
pytest
black .
mypy src/
```

## License

MIT
