Metadata-Version: 2.4
Name: truthcheck
Version: 0.3.0
Summary: Open source AI content verification
Project-URL: Homepage, https://github.com/truthscore/truthscore
Project-URL: Documentation, https://github.com/truthscore/truthscore#readme
Project-URL: Repository, https://github.com/truthscore/truthscore
Project-URL: Issues, https://github.com/truthscore/truthscore/issues
Author: TruthScore Contributors
License-Expression: MIT
Keywords: ai,fact-check,mcp,misinformation,verification
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: datasketch>=1.6
Requires-Dist: ddgs>=7.0
Requires-Dist: mcp>=1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.28
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18; extra == 'anthropic'
Provides-Extra: dev
Requires-Dist: anthropic>=0.18; extra == 'dev'
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: google-generativeai>=0.5; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: openai>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.10; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: python-dotenv>=1.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Provides-Extra: dotenv
Requires-Dist: python-dotenv>=1.0; extra == 'dotenv'
Provides-Extra: gemini
Requires-Dist: google-generativeai>=0.5; extra == 'gemini'
Provides-Extra: llm
Requires-Dist: anthropic>=0.18; extra == 'llm'
Requires-Dist: google-generativeai>=0.5; extra == 'llm'
Requires-Dist: openai>=1.0; extra == 'llm'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Description-Content-Type: text/markdown

# TruthCheck 🔍

**Open source AI content verification.** Score claims 0-100 and trace their origins.

[![PyPI](https://img.shields.io/pypi/v/truthcheck.svg)](https://pypi.org/project/truthcheck/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Publishers](https://img.shields.io/badge/publishers-8%2C974-blue.svg)]()

## The Problem

AI chatbots retrieve content from the web and present it as fact. Bad actors exploit this by creating fake articles designed to fool AI systems — effectively laundering misinformation through "trusted" AI interfaces.

**Example:** BBC journalist Thomas Germain [demonstrated](https://www.bbc.com/future/article/20260218-i-hacked-chatgpt-and-googles-ai-and-it-only-took-20-minutes) he could make ChatGPT and Google's AI tell users he's "the best tech journalist at eating hot dogs" — by publishing a single fake article on his personal website.

## The Solution

TruthCheck provides two powerful tools:

### 1. `verify` — Score claims 0-100

```bash
$ truthcheck verify "Thomas Germain is the best tech journalist at eating hot dogs" --llm gemini --deep

Claim: Thomas Germain is the best tech journalist at eating hot dogs
TruthScore: 0/100 (FALSE)

Score Breakdown:
  Publisher Credibility: 41/100 (30%)
  Content Analysis:      2/100 (30%)
  Corroboration:         0/100 (20%)
  Fact-Check:            20/100 (20%)

⚠️ ZERO FLAG: Content identified as satire

Evidence:
  • 🚨 Content identified as satire
  • ⚠️ Self-published: tomgermain.com publishes claims about its own subject
  • 🎭 Satire detected: tomgermain.com
```

### 2. `trace` — Find the origin and propagation

```bash
$ truthcheck trace "Thomas Germain is the best tech journalist at eating hot dogs"

╔════════════════════════════════════════════════════════════╗
║ CLAIM TRACE RESULT                                         ║
╠════════════════════════════════════════════════════════════╣
║ Thomas Germain is the best tech journalist at eating ho... ║
╚════════════════════════════════════════════════════════════╝

🎯 ORIGIN (First Source)
   Domain: tomgermain.com
   Date:   2026-02-05
   URL:    https://tomgermain.com/hotdogs.html...

📅 TIMELINE
  🥇 [2026-02-05] tomgermain.com
     └─ The Best Tech Journalists at Eating Hot Dogs...
  🥈 [2026-02-18] bbc.com
     └─ I hacked ChatGPT and Google's AI...
  🥉 [2026-02-18] gizmodo.com
     └─ You Can Hack ChatGPT to Become the World's Best...

🔗 PROPAGATION
  tomgermain.com         ──▶ bbc.com
                         [████████░░] 85%
  tomgermain.com         ──▶ gizmodo.com
                         [███████░░░] 72%

📊 STATS
   Sources found: 14
   With dates:    8
   Date range:    2026-02-05 → 2026-02-19
   Top domains:   tomgermain.com, bbc.com, gizmodo.com
```

## Features

- 🎯 **TruthScore 0-100** — Clear, weighted credibility score
- 🔍 **Origin Tracing** — Find where a claim started and how it spread
- 📊 **8,974 Publishers** — Auto-synced from Media Bias/Fact Check
- 🚨 **Zero Flags** — Auto-detect satire, fake experiments, and self-published claims
- 🦆 **Free Search** — DuckDuckGo by default (no API key needed)
- 🔌 **MCP Server** — Works with Claude Desktop, Cursor
- 💰 **Low Cost** — MinHash similarity (no PyTorch), LLM optional
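
The "MinHash similarity" mentioned above refers to a standard technique for cheap set-similarity estimation (the package depends on `datasketch` for this). As a rough illustration of the idea only (not TruthCheck's actual implementation), here is a minimal pure-Python MinHash sketch:

```python
import hashlib

def minhash_signature(tokens, num_perm=64):
    """For each of num_perm seeded hash functions, keep the minimum
    hash value over all tokens. Similar token sets yield similar
    signatures."""
    sig = []
    for seed in range(num_perm):
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in set(tokens)
        ))
    return sig

def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard
    similarity of the underlying token sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

a = "the best tech journalists at eating hot dogs".split()
b = "you can hack chatgpt to become the best at eating hot dogs".split()
similarity = estimate_jaccard(minhash_signature(a), minhash_signature(b))
```

Because signatures are small fixed-size arrays, comparing many scraped articles stays fast without any neural embedding model.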

## Quick Start

### Installation

```bash
pip install truthcheck
```

That's it! All features included. (~15MB, no heavy dependencies)

### CLI Usage

```bash
# Verify a claim (TruthScore 0-100)
truthcheck verify "Some claim" --llm gemini --deep

# Trace claim origin (no LLM needed)
truthcheck trace "Some claim"
truthcheck trace "Some claim" --quick    # Fast (5 sources)
truthcheck trace "Some claim" --deep     # Thorough (30 sources)

# Check a URL
truthcheck check https://example.com/article

# Look up publisher reputation
truthcheck lookup breitbart.com
```

### Python API

```python
from truthcheck import verify_claim, trace_claim
from truthcheck.search import DuckDuckGoProvider

# Verify a claim (TruthScore 0-100)
result = verify_claim(
    "Earth is flat",
    search_provider=DuckDuckGoProvider(),
    deep_analysis=True
)
print(f"TruthScore: {result.truthscore}/100")
print(f"Evidence: {result.evidence}")

# Trace a claim (find origin)
result = trace_claim("Some viral claim")
print(f"Origin: {result['origin']['domain']} ({result['origin']['date']})")
print(f"Timeline: {len(result['timeline'])} sources")
```

## How TruthScore Works

### Scoring Formula (0-100)

| Factor | Weight | What It Measures |
|--------|--------|------------------|
| **Publisher Credibility** | 30% | Is the source in MBFC? Trust rating? |
| **Content Analysis** | 30% | Does it make sense? Red flags? |
| **Corroboration** | 20% | Do reputable sources confirm? |
| **Fact-Check** | 20% | What do fact-checkers say? |
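
The weighting above can be sketched as a simple weighted sum, with any zero flag overriding the result (function and key names here are illustrative, not the package's API):

```python
# Weights from the scoring formula table
WEIGHTS = {
    "publisher": 0.30,
    "content": 0.30,
    "corroboration": 0.20,
    "fact_check": 0.20,
}

def truthscore(components, zero_flagged=False):
    """Weighted 0-100 score; any zero flag forces the score to 0."""
    if zero_flagged:
        return 0
    return round(sum(components[k] * w for k, w in WEIGHTS.items()))

# Component scores from the hot-dog example above:
components = {"publisher": 41, "content": 2, "corroboration": 0, "fact_check": 20}
truthscore(components)                     # → 17 (weighted sum alone)
truthscore(components, zero_flagged=True)  # → 0 (satire flag triggered)
```

This is why the hot-dog claim scores 0 despite a non-zero publisher component: the satire flag short-circuits the weighted sum.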

### Zero Flags (Automatic Score = 0)

These patterns force TruthScore to 0:

| Flag | Meaning |
|------|---------|
| 🎭 **Satire** | Content is humor/parody |
| 🧪 **Fake Experiment** | Deliberately fake to test AI |
| 🎬 **Entertainment** | Not meant as fact |
| 🤖 **AI-Generated** | Synthetic misinformation |
| ⚠️ **Self-Published** | Subject publishes own claims |

### Score Interpretation

| Score | Label | Meaning |
|-------|-------|---------|
| 0 | FALSE | Zero flag triggered |
| 1-24 | LIKELY FALSE | Strong evidence against |
| 25-49 | UNCERTAIN | Mixed evidence |
| 50-74 | POSSIBLY TRUE | Some support |
| 75-100 | LIKELY TRUE | Strong support |
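
The bands in this table map directly to labels; a minimal sketch (the function name is illustrative):

```python
def score_label(score):
    """Map a TruthScore (0-100) to its label per the interpretation table."""
    if score == 0:
        return "FALSE"          # zero flag triggered
    if score < 25:
        return "LIKELY FALSE"
    if score < 50:
        return "UNCERTAIN"
    if score < 75:
        return "POSSIBLY TRUE"
    return "LIKELY TRUE"

score_label(0)   # → "FALSE"
score_label(85)  # → "LIKELY TRUE"
```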

## URL Verification

```python
from truthcheck import verify

result = verify("https://reuters.com/article/...")
print(result.trust_score)      # 0.85
print(result.recommendation)   # TRUST / CAUTION / REJECT
```

## Publisher Database

Includes **8,974 publishers** from [Media Bias/Fact Check](https://mediabiasfactcheck.com/):

```bash
$ truthcheck lookup reuters.com

Publisher Found:
  Name: Reuters
  Trust Score: 0.85
  Bias: center
  Fact Check Rating: very-high
```

## Configuration

TruthCheck uses **environment variables** for API keys. No config file needed.

### LLM Providers (for `--llm` option)

| Provider | Environment Variables (in priority order) |
|----------|------------------------------------------|
| **Gemini** | `TRUTHSCORE_GEMINI_KEY` or `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| **OpenAI** | `TRUTHSCORE_OPENAI_KEY` or `OPENAI_API_KEY` |
| **Anthropic** | `TRUTHSCORE_ANTHROPIC_KEY` or `ANTHROPIC_API_KEY` |
| **Ollama** | No key needed (local) |
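
The priority order in the table means the first variable that is set wins. A hypothetical sketch of that lookup (variable names come from the table; the helper is illustrative, not TruthCheck's internals):

```python
import os

# Gemini key variables, highest priority first (from the table above)
GEMINI_KEY_VARS = ["TRUTHSCORE_GEMINI_KEY", "GEMINI_API_KEY", "GOOGLE_API_KEY"]

def resolve_key(var_names):
    """Return the value of the first environment variable that is set,
    or None if none of them are."""
    for name in var_names:
        value = os.environ.get(name)
        if value:
            return value
    return None

key = resolve_key(GEMINI_KEY_VARS)
```

So setting `TRUTHSCORE_GEMINI_KEY` lets you give TruthCheck its own key without disturbing a `GOOGLE_API_KEY` used by other tools.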

**Example setup** (add to `~/.bashrc` or `~/.zshrc`):

```bash
# Pick one LLM provider
export GOOGLE_API_KEY=AIza...           # Gemini (recommended, free tier)
export OPENAI_API_KEY=sk-...            # OpenAI
export ANTHROPIC_API_KEY=sk-ant-...     # Anthropic

# Then use it
truthcheck verify "some claim" --llm gemini --deep
```

Or inline:
```bash
GOOGLE_API_KEY=AIza... truthcheck verify "claim" --llm gemini --deep
```

### Search Providers

| Provider | Environment Variable | Notes |
|----------|---------------------|-------|
| **DuckDuckGo** | None needed | Default, free |
| **Brave** | `TRUTHSCORE_SEARCH_KEY` | [Get free key](https://brave.com/search/api/) |
| **SearXNG** | `SEARXNG_URL` | Self-hosted, default: `http://localhost:8080` |

```bash
# Optional: Brave Search for better results
export TRUTHSCORE_SEARCH_KEY=BSA...
truthcheck verify "claim" --search brave
```

## MCP Server (Claude Desktop, Cursor)

```bash
truthcheck-mcp
```

```json
{
  "mcpServers": {
    "truthcheck": {
      "command": "truthcheck-mcp"
    }
  }
}
```

## Project Structure

```
truthcheck/
├── src/truthcheck/
│   ├── verify.py          # URL verification
│   ├── trace.py           # verify_claim() and trace_claim()
│   ├── similarity.py      # MinHash similarity for trace
│   ├── propagation.py     # Build propagation tree
│   ├── visualize.py       # ASCII visualization
│   ├── publisher_db.py    # 8,974 publishers from MBFC
│   ├── search.py          # DuckDuckGo, Brave, SearXNG
│   ├── llm.py             # Gemini, OpenAI, Anthropic, Ollama
│   └── cli.py             # Command-line interface
└── tests/
```

## Philosophy

1. **Scores over verdicts** — 0-100 is clearer than TRUE/FALSE
2. **Trace over trust** — Show the origin, not just a verdict
3. **Evidence over summaries** — Show why, not just what
4. **Open over proprietary** — Verification is a public good

## License

MIT License.

## Acknowledgments

- [Media Bias/Fact Check](https://mediabiasfactcheck.com/) — Publisher database
- [Thomas Germain / BBC](https://www.bbc.com/future/article/20260218-i-hacked-chatgpt-and-googles-ai-and-it-only-took-20-minutes) — Hot dog experiment inspiration

---

**Questions?** [Open an issue](https://github.com/truthscore/truthscore/issues)
