Metadata-Version: 2.4
Name: ai-provenance-tracker
Version: 1.0.1
Summary: Detect AI-generated content, trace origins, verify authenticity
Project-URL: Homepage, https://github.com/ogulcanaydogan/ai-provenance-tracker
Project-URL: Documentation, https://github.com/ogulcanaydogan/ai-provenance-tracker#readme
Project-URL: Repository, https://github.com/ogulcanaydogan/ai-provenance-tracker
Project-URL: Issues, https://github.com/ogulcanaydogan/ai-provenance-tracker/issues
Author-email: Ogulcan Aydogan <ogulcanaydogan@gmail.com>
License-Expression: MIT
Keywords: ai,authenticity,deepfake,detection,provenance
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: aiosqlite>=0.20.0
Requires-Dist: asyncpg>=0.29.0
Requires-Dist: fastapi<0.139,>=0.109.0
Requires-Dist: httpx>=0.26.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: passlib[bcrypt]>=1.7.4
Requires-Dist: pillow>=10.2.0
Requires-Dist: prometheus-fastapi-instrumentator<9.0.0,>=7.0.0
Requires-Dist: pydantic-settings>=2.1.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: python-jose[cryptography]>=3.3.0
Requires-Dist: python-multipart>=0.0.6
Requires-Dist: redis>=5.0.1
Requires-Dist: scikit-learn>=1.4.0
Requires-Dist: scipy>=1.12.0
Requires-Dist: sqlalchemy>=2.0.25
Requires-Dist: structlog>=24.1.0
Requires-Dist: transformers>=4.37.0
Requires-Dist: uvicorn[standard]>=0.27.0
Provides-Extra: dev
Requires-Dist: httpx>=0.26.0; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.14; extra == 'dev'
Provides-Extra: ml
Requires-Dist: accelerate>=1.1.0; extra == 'ml'
Requires-Dist: torch>=2.1.0; extra == 'ml'
Requires-Dist: torchvision>=0.16.0; extra == 'ml'
Description-Content-Type: text/markdown

# AI Provenance Tracker - Backend

FastAPI backend for detecting AI-generated content.

## Quick Start

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e ".[dev]"

# Run the server
uvicorn app.main:app --reload
```

## API Endpoints

- `POST /api/v1/detect/text` - Detect AI-generated text
- `POST /api/v1/detect/image` - Detect AI-generated images
- `POST /api/v1/detect/audio` - Detect AI-generated audio (WAV)
- `POST /api/v1/detect/video` - Detect AI-generated video (MVP)
- `POST /api/v1/batch/text` - Batch text detection
- `POST /api/v1/intel/x/collect` - Collect X data into trust-and-safety input schema
- `POST /api/v1/intel/x/collect/estimate` - Estimate X request cost without external calls
- `POST /api/v1/intel/x/report` - Generate trust-and-safety report from normalized input
- `POST /api/v1/intel/x/drilldown` - Build cluster/claim drill-down + alerts dataset
- `GET /api/v1/intel/x/scheduler/status` - Check recurring job status
- `POST /api/v1/intel/x/scheduler/run` - Trigger one immediate scheduled run
- `GET /api/v1/analyze/dashboard` - Dashboard-ready analytics metrics
- `GET /api/v1/analyze/evaluation` - Calibration precision/recall trend for dashboard
- `GET /api/v1/analyze/audit-events` - Audit log events (HTTP + detection)
- `GET /health` - Health check

## X Intelligence Collection

Set `X_BEARER_TOKEN` in `.env`, then either call the API:

```bash
curl -X POST "http://localhost:8000/api/v1/intel/x/collect" \
  -H "Content-Type: application/json" \
  -d '{"target_handle":"@example","window_days":30,"max_posts":300,"query":"anthropic OR claudecode"}'
```

or use the CLI utility:

```bash
python scripts/collect_x_input.py --handle @example --window-days 30 --max-posts 300 --query "anthropic OR claudecode" --output ./x_intel_input.json --show-request-estimate
```

Low-cost run (tight request budget):

```bash
X_MAX_PAGES=1 X_MAX_REQUESTS_PER_RUN=4 python scripts/collect_x_input.py --handle @example --window-days 7 --max-posts 60 --output ./x_intel_input.json --show-request-estimate
```

Cost precheck endpoint (no external X calls):

```bash
curl -X POST "http://localhost:8000/api/v1/intel/x/collect/estimate" \
  -H "Content-Type: application/json" \
  -d '{"window_days":7,"max_posts":60,"max_pages":1}'
```
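
The precheck above can also be scripted. The following is a minimal sketch using only the Python standard library, assuming the local development server from Quick Start. The helper names `build_estimate_request` and `send` are hypothetical and not part of this project. The payload fields are the ones shown in the `curl` example, and nothing is assumed about the response schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server from Quick Start


def build_estimate_request(
    window_days: int, max_posts: int, max_pages: int
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the cost precheck endpoint."""
    payload = {
        "window_days": window_days,
        "max_posts": max_posts,
        "max_pages": max_pages,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/intel/x/collect/estimate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body without assuming its fields."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_estimate_request(window_days=7, max_posts=60, max_pages=1))` returns the decoded estimate, which can be inspected before starting a real collection run.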

Batch text detection:

```bash
curl -X POST "http://localhost:8000/api/v1/batch/text" \
  -H "Content-Type: application/json" \
  -d '{"items":[{"item_id":"a","text":"Sample text one..."},{"item_id":"b","text":"Sample text two..."}]}'
```
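
For larger inputs, the request body can be generated rather than written by hand. This sketch builds payloads in the `items` / `item_id` / `text` shape shown above. The function name `build_batch_payloads`, the `item-N` ID scheme, and the idea of splitting into several requests are assumptions for illustration, since this README does not state a maximum batch size.

```python
def build_batch_payloads(texts: list[str], batch_size: int) -> list[dict]:
    """Split texts into request bodies of at most `batch_size` items each.

    IDs are derived from the position in the full input list, so they stay
    unique across all generated payloads.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    payloads = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        items = [
            {"item_id": f"item-{start + offset}", "text": text}
            for offset, text in enumerate(chunk)
        ]
        payloads.append({"items": items})
    return payloads
```

Each returned dictionary can be serialized with `json.dumps` and sent as the body of one `POST /api/v1/batch/text` request.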

Dashboard metrics:

```bash
curl "http://localhost:8000/api/v1/analyze/dashboard?days=30"
```

Audit events:

```bash
curl "http://localhost:8000/api/v1/analyze/audit-events?limit=50"
```

Dashboard drill-down from normalized input:

```bash
curl -X POST "http://localhost:8000/api/v1/intel/x/drilldown" \
  -H "Content-Type: application/json" \
  --data-binary @./x_intel_input.json
```

## Trust Report, Benchmark, Evidence Pack

Generate trust report:

```bash
python scripts/generate_x_trust_report.py --input ./x_intel_input.json --output ./x_trust_report.json
```

Benchmark the report (the labels file is optional):

```bash
python scripts/benchmark_x_intel.py --report ./x_trust_report.json --labels ./evidence/labels_template.json --output ./x_trust_benchmark.json
```

Build talent-visa evidence pack:

```bash
python scripts/build_talent_visa_evidence_pack.py --reports-glob "./x_trust_report*.json" --benchmarks-glob "./x_trust_benchmark*.json" --output-dir ./evidence
```

Run full pipeline:

```bash
python scripts/run_talent_visa_pipeline.py --handle @example --window-days 90 --max-posts 600 --query "anthropic OR claudecode OR claudeai OR usagelimits"
```

Run pipeline from pre-collected input JSON (offline mode):

```bash
python scripts/run_talent_visa_pipeline.py --input-json ./x_intel_input.json --output-dir ./evidence/runs/manual_input --run-id run_snapshot
```

Compare two run directories:

```bash
python scripts/compare_talent_visa_runs.py --base-run-dir ./evidence/runs/run_a --candidate-run-dir ./evidence/runs/run_b --output-json ./evidence/runs/comparisons/run_a_vs_run_b.json --output-md ./evidence/runs/comparisons/run_a_vs_run_b.md
```

Evaluate confidence-threshold calibration on labeled data:

```bash
python scripts/evaluate_detection_calibration.py --input ./labels_text.jsonl --content-type text --output ./calibration_text.json --register
python scripts/evaluate_detection_calibration.py --input ./labels_audio.jsonl --content-type audio --output ./calibration_audio.json --register
python scripts/evaluate_detection_calibration.py --input ./labels_video.jsonl --content-type video --output ./calibration_video.json --register
```

Audio/video JSONL templates: `./evidence/samples/audio_labeled_template.jsonl`, `./evidence/samples/video_labeled_template.jsonl`
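
To illustrate what a confidence-threshold evaluation measures, the sketch below computes precision and recall at a given threshold. It is not the implementation of `evaluate_detection_calibration.py`. The `LabeledScore` type and its field names are hypothetical, and the real JSONL schema should be taken from the templates above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledScore:
    confidence: float  # detector score in [0, 1] (hypothetical field name)
    is_ai: bool        # ground-truth label (hypothetical field name)


def precision_recall(
    samples: list[LabeledScore], threshold: float
) -> tuple[float, float]:
    """Treat confidence >= threshold as an "AI-generated" prediction."""
    true_pos = sum(1 for s in samples if s.confidence >= threshold and s.is_ai)
    false_pos = sum(1 for s in samples if s.confidence >= threshold and not s.is_ai)
    false_neg = sum(1 for s in samples if s.confidence < threshold and s.is_ai)

    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

Raising the threshold usually trades recall for precision, so evaluating several thresholds on the same labeled set shows where that trade-off is acceptable for each content type.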

Weekly pipeline cycle with automatic run comparison:

```bash
python scripts/run_weekly_talent_visa_cycle.py --handle @example --window-days 7 --max-posts 60 --output-dir ./evidence/runs/weekly --comparisons-dir ./evidence/runs/comparisons --summary-output ./evidence/runs/weekly/latest_summary.json
```

Production smoke test for all detect endpoints:

```bash
python scripts/smoke_detect_prod.py --base-url https://your-api-domain --output ./evidence/smoke/prod_detect_smoke.json
```

Run background worker process (scheduler + webhook retry queue):

```bash
python -m app.worker.main
```

Trigger a scheduler run manually:

```bash
curl -X POST "http://localhost:8000/api/v1/intel/x/scheduler/run?handle=@example"
```
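
When the trigger URL is built in a script, it is safer to percent-encode the `handle` value than to paste it in raw. A small sketch with the standard library follows. The function name `scheduler_run_url` and the default base URL are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: local dev server from Quick Start


def scheduler_run_url(handle: str, base_url: str = BASE_URL) -> str:
    """Return the manual-trigger URL with the handle percent-encoded."""
    query = urlencode({"handle": handle})
    return f"{base_url}/api/v1/intel/x/scheduler/run?{query}"


print(scheduler_run_url("@example"))
# → http://localhost:8000/api/v1/intel/x/scheduler/run?handle=%40example
```

The server receives the same value either way, because `%40` decodes back to `@` during normal query-string parsing.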

Check scheduler status:

```bash
curl "http://localhost:8000/api/v1/intel/x/scheduler/status"
```

## Persistence and Migrations

Runtime analysis history is persisted in the `analysis_records` table (SQLite by default), and audit events are persisted in the `audit_events` table.
Apply the database migrations with Alembic:

```bash
alembic upgrade head
```
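
After migrating, a quick way to confirm that both tables exist in a SQLite database is to query `sqlite_master`. This is a minimal sketch for the default SQLite setup only. The function name `missing_tables` is hypothetical, and the database file path depends on your configuration, which this README does not specify.

```python
import sqlite3

# Table names taken from this section.
EXPECTED_TABLES = {"analysis_records", "audit_events"}


def missing_tables(connection: sqlite3.Connection) -> set[str]:
    """Return the expected tables that are not present in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name for (name,) in rows}
    return EXPECTED_TABLES - present
```

Open the configured database file with `sqlite3.connect(...)` and pass the connection in. An empty set means both tables were created.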

## Security and Spend Controls

Configure optional API key enforcement, rate limits, and spend controls in `.env`. The same file also holds the consensus, scheduler, worker, webhook, and audit settings:

- `REQUIRE_API_KEY`
- `API_KEYS`
- `DAILY_SPEND_CAP_POINTS`
- `RATE_LIMIT_MEDIA_REQUESTS`
- `RATE_LIMIT_BATCH_REQUESTS`
- `RATE_LIMIT_INTEL_REQUESTS`
- `X_COST_GUARD_ENABLED`
- `X_MAX_REQUESTS_PER_RUN`
- `CONSENSUS_ENABLED`
- `COPYLEAKS_API_KEY`
- `REALITY_DEFENDER_API_KEY`
- `SCHEDULER_ENABLED`
- `SCHEDULER_HANDLES`
- `SCHEDULER_MONTHLY_REQUEST_CAP`
- `SCHEDULER_KILL_SWITCH_ON_CAP`
- `SCHEDULER_USAGE_FILE`
- `RUN_SCHEDULER_IN_API`
- `WORKER_ENABLE_SCHEDULER`
- `WORKER_DRAIN_WEBHOOK_QUEUE`
- `WORKER_TICK_SECONDS`
- `WEBHOOK_URLS`
- `WEBHOOK_RETRY_ATTEMPTS`
- `WEBHOOK_RETRY_BACKOFF_SECONDS`
- `WEBHOOK_QUEUE_FILE`
- `WEBHOOK_DEAD_LETTER_FILE`
- `AUDIT_EVENTS_ENABLED`
- `AUDIT_LOG_HTTP_REQUESTS`
- `AUDIT_ACTOR_HEADER`
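
A common mistake with settings like these is enabling enforcement without supplying any keys. The sketch below shows a pre-start sanity check for that case. It assumes boolean-style values for `REQUIRE_API_KEY` and a comma-separated list for `API_KEYS`. Neither format is confirmed by this README, and the application's own settings loader may parse them differently. All helper names are hypothetical.

```python
import os
from collections.abc import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumption: accepted truthy spellings


def env_flag(name: str, env: Mapping[str, str] = os.environ, default: bool = False) -> bool:
    """Read a boolean-style variable, falling back to `default` when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def env_list(name: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Read a comma-separated variable (assumed format), dropping empty entries."""
    raw = env.get(name, "")
    return [part.strip() for part in raw.split(",") if part.strip()]


def check_api_key_config(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems found in the API key settings."""
    problems = []
    if env_flag("REQUIRE_API_KEY", env) and not env_list("API_KEYS", env):
        problems.append("REQUIRE_API_KEY is enabled but API_KEYS is empty")
    return problems
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.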

## Documentation

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
