Metadata-Version: 2.4
Name: detectzoo
Version: 0.1.3
Summary: DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities
License: Apache-2.0
Project-URL: Homepage, https://anonymous.4open.science/r/DetectZoo-1BEC/
Project-URL: Repository, https://anonymous.4open.science/r/DetectZoo-1BEC/
Project-URL: Issues, https://anonymous.4open.science/r/DetectZoo-1BEC/issues
Keywords: ai-detection,deepfake,llm,generative-ai,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: accelerate>=1.13.0
Requires-Dist: datasets>=4.7.0
Requires-Dist: huggingface-hub>=0.36.2
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: numpy>=2.4.4
Requires-Dist: peft>=0.18.1
Requires-Dist: rouge>=1.0.1
Requires-Dist: safetensors>=0.7.0
Requires-Dist: scikit-learn>=1.8.0
Requires-Dist: scipy>=1.17.1
Requires-Dist: torch>=2.11.0
Requires-Dist: tqdm>=4.67.3
Requires-Dist: transformers>=4.57.6
Provides-Extra: datasets
Requires-Dist: modelscope>=1.9; extra == "datasets"
Requires-Dist: gdown>=4.0; extra == "datasets"
Provides-Extra: image
Requires-Dist: torchvision>=0.26.0; extra == "image"
Requires-Dist: Pillow>=12.1.1; extra == "image"
Requires-Dist: open-clip-torch>=2.20; extra == "image"
Requires-Dist: diffusers>=0.21; extra == "image"
Requires-Dist: lpips>=0.1.4; extra == "image"
Requires-Dist: pytorch-wavelets>=1.3; extra == "image"
Requires-Dist: timm>=0.9.0; extra == "image"
Provides-Extra: audio
Requires-Dist: torchaudio>=2.11.0; extra == "audio"
Requires-Dist: librosa>=0.11.0; extra == "audio"
Requires-Dist: soundfile>=0.13.1; extra == "audio"
Provides-Extra: dev
Requires-Dist: detectzoo[audio,datasets,image]; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.15.5; extra == "dev"

# DetectZoo

![DetectZoo](https://anonymous.4open.science/api/repo/DetectZoo-1BEC/file/DetectZoo_banner.png?v=6072c3e2)

DetectZoo is a research-oriented Python toolkit that provides **implementations of AI-generated content detectors across multiple modalities**, including **text, images, and audio**.

The goal of DetectZoo is to make detection methods **easy to use, reproducible, and extensible**, enabling researchers and practitioners to benchmark and deploy AI-generated content detectors with minimal effort.

DetectZoo aggregates detection approaches into a **single, unified API**, allowing users to load and apply detectors with just a few lines of code.

---

## Installation

For anonymity, the package is currently hosted on TestPyPI and can be installed with the following command:

*Note: This is a temporary arrangement; the package will be released on PyPI after the paper is accepted.*

```bash
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ detectzoo-anon
```

or install from source:

```bash
git clone https://anonymous.4open.science/r/DetectZoo-1BEC/
cd DetectZoo-1BEC
pip install -e .
```

Optional extras:

```bash
pip install -e ".[image,audio]"    # everything for image + audio detectors
pip install -e ".[datasets]"       # ModelScope / gdown-based dataset downloads
pip install -e ".[dev]"            # contributors (all extras plus pytest and ruff)
```

---

## Quick Start

### Detect AI-generated text

```python
from detectzoo import load_detector

detector = load_detector("fast_detectgpt")

text = "Large language models are transforming many fields."
result = detector.predict(text)

print(result)
# e.g. DetectionResult(score=1.2345, label='ai', confidence=0.8012, metadata={...})
print(result.score, result.label)
```

### Detect AI-generated images

```python
from detectzoo import load_detector

detector = load_detector("aeroblade")

result = detector.predict("image.png")
print(result.label)  # "ai" or "human"
```

### Detect synthetic audio

```python
from detectzoo import load_detector

detector = load_detector("rawnet2")

result = detector.predict("speech.wav")
print(result.score, result.label)
```

### List all available detectors

```python
from detectzoo import list_detectors

print(list_detectors())            # all detectors
print(list_detectors("text"))      # text-only
print(list_detectors("image"))     # image-only
print(list_detectors("audio"))     # audio-only
```

---

## Supported Detectors

DetectZoo ships detectors for **text**, **images**, and **audio**. Each uses the same interface: `detector.predict(input) → DetectionResult`.

See [METHODS_AND_MODELS.md](METHODS_AND_MODELS.md) for detailed tables of supported detectors, including registry names, implementation classes, and method summaries. To list the registered detector names in code, call `list_detectors()`, optionally filtered by modality: `list_detectors("text")`, `list_detectors("image")`, or `list_detectors("audio")`.

---

## Core Components

### DetectionResult

Every `predict()` call returns a `DetectionResult` dataclass:

```python
from dataclasses import dataclass

@dataclass
class DetectionResult:
    score: float       # Higher = more likely AI-generated
    label: str         # "ai" or "human"
    confidence: float  # Confidence in the label (0–1)
    metadata: dict     # Detector-specific extra info
```

The contents of the `metadata` dictionary vary by detector and may include keys such as `avg_log_likelihood`, `mean_curvature`, `ppl_observer`, or `hf_lf_ratio`.
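
Because `DetectionResult` is a dataclass, the standard `dataclasses` helpers apply to it. The sketch below re-declares the documented fields locally so it runs without DetectZoo installed (the empty-dict default for `metadata` is an assumption made for brevity), then shows one way to serialize results to JSON and keep only those labelled `"ai"`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DetectionResult:
    # Local stand-in mirroring the documented fields;
    # the default for `metadata` is an assumption, not DetectZoo's definition.
    score: float
    label: str
    confidence: float
    metadata: dict = field(default_factory=dict)


results = [
    DetectionResult(score=1.23, label="ai", confidence=0.80, metadata={"avg_log_likelihood": -2.1}),
    DetectionResult(score=-0.40, label="human", confidence=0.65),
]

# asdict() converts each dataclass (including its metadata dict) into plain dicts,
# which json.dumps can serialize directly.
payload = json.dumps([asdict(r) for r in results], indent=2)

# Keep only the inputs flagged as AI-generated.
flagged = [r for r in results if r.label == "ai"]
print(len(flagged))
```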

---

## Benchmarking

DetectZoo includes a built-in evaluation pipeline for comparing detectors on labelled datasets.

### Built-in datasets

DetectZoo ships with loaders for popular detection benchmarks. Data is downloaded and cached automatically on first use, so no manual setup is needed.

See [METHODS_AND_MODELS.md — Built-in datasets](METHODS_AND_MODELS.md#built-in-datasets) for a complete table of built-in datasets, with class names, descriptions, sources, and `load_dataset` registry keys.


```python
from detectzoo.datasets import CHEATDataset

# Auto-downloads from GitHub on first call, cached in .detectzoo_data/cheat/
dataset = CHEATDataset()
dataset = CHEATDataset(categories=["generation"])   # only first-pass ChatGPT abstracts

# Or point to a local copy
dataset = CHEATDataset(path="data/cheat/")

for item in dataset:
    print(item.label, item.data[:80])
```


All datasets cache downloaded files under a `.detectzoo_data/` directory (configurable via `cache_dir`), so subsequent loads skip the download.
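
The download-once behaviour can be sketched with the standard library alone. In this illustration, `cached_file` and its `fetch` callback are hypothetical names, not DetectZoo APIs; `fetch` stands in for whatever actually retrieves the file (for example from GitHub, ModelScope, or Google Drive).

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_file(name: str, fetch: Callable[[], bytes], cache_dir: str = ".detectzoo_data") -> Path:
    """Return a local path for `name`, calling `fetch` only if it is not cached yet."""
    path = Path(cache_dir) / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a side file first so an interrupted download never looks complete.
        partial = path.with_name(path.name + ".part")
        partial.write_bytes(fetch())
        partial.replace(path)
    return path


calls = []


def fake_fetch() -> bytes:
    calls.append(1)  # count how often the "download" actually runs
    return b"text,label\nhello,0\n"


with tempfile.TemporaryDirectory() as tmp_dir:
    first = cached_file("cheat/sample.csv", fake_fetch, cache_dir=tmp_dir)
    second = cached_file("cheat/sample.csv", fake_fetch, cache_dir=tmp_dir)
    content = first.read_text()

print(len(calls))  # the second call is served from the cache
```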

### Using the evaluator

```python
from detectzoo import load_detector
from detectzoo.datasets import BaseDataset, HC3Dataset
from detectzoo.benchmarks import BenchmarkEvaluator

# Built-in benchmark dataset
dataset = HC3Dataset(subsets=["finance"])

# Or load a dataset from two directories
dataset = BaseDataset.from_directory("data/real/", "data/fake/")

# Or from a CSV (text modality)
dataset = BaseDataset.from_csv("data/texts.csv", text_column="text", label_column="label")

# Evaluate detectors
evaluator = BenchmarkEvaluator(dataset)
evaluator.run_and_print([
    load_detector("log_likelihood"),
    load_detector("entropy"),
    load_detector("fast_detectgpt"),
])
```

This prints a comparison table with accuracy, precision, recall, F1, and AUROC.
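
Conceptually, a comparison like this scores every labelled example with every detector and then summarizes each detector's score list. The sketch below is a simplified, hypothetical stand-in for `BenchmarkEvaluator`, not its implementation: the toy detectors are plain functions returning a score (higher means more likely AI, as in `DetectionResult`), labels use 1 for AI and 0 for human (as in the `compute_metrics` example in the Metrics section), and AUROC is computed as the probability that a randomly chosen AI example outscores a randomly chosen human one, with ties counted as half.

```python
from typing import Callable, Dict, List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: fraction of (AI, human) pairs in which the AI item scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def evaluate(detectors: Dict[str, Callable[[str], float]], texts: List[str], labels: List[int]) -> Dict[str, float]:
    """Score every text with every detector and report one AUROC per detector."""
    return {name: auroc(labels, [score_fn(t) for t in texts]) for name, score_fn in detectors.items()}


# Toy data and detectors, purely for illustration.
texts = ["short human note", "a human wrote this longer sentence", "as an ai language model", "as an ai i can help"]
labels = [0, 0, 1, 1]
detectors = {
    "keyword": lambda t: float("ai" in t.split()),
    "length": lambda t: float(len(t)),
}

results = evaluate(detectors, texts, labels)
for name, value in results.items():
    print(f"{name:<10} AUROC={value:.3f}")
```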

### Metrics

The `compute_metrics` utility computes standard binary-classification metrics:

```python
from detectzoo.utils import compute_metrics

metrics = compute_metrics(
    labels=[0, 0, 1, 1],
    scores=[0.1, 0.3, 0.8, 0.9],
    threshold=0.5,
)
# {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'tpr': 1.0, 'fpr': 0.0, 'roc_auc': 1.0, 'pr_auc': 1.0, 'avg_precision': 1.0}
```
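
The threshold-dependent entries in that dictionary follow the usual confusion-matrix definitions. The standard-library sketch below is illustrative, not DetectZoo's code; it assumes an example is predicted AI (1) when its score is at or above `threshold` (the library's exact tie rule is not stated here), and it reproduces the threshold-based values shown above.

```python
from typing import Dict, Sequence


def threshold_metrics(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = AI-generated, 0 = human)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    def safe_div(a: int, b: int) -> float:
        return a / b if b else 0.0  # avoid division by zero on degenerate inputs

    recall = safe_div(tp, tp + fn)  # recall is the same quantity as TPR
    return {
        "accuracy": safe_div(tp + tn, len(labels)),
        "precision": safe_div(tp, tp + fp),
        "recall": recall,
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
        "tpr": recall,
        "fpr": safe_div(fp, fp + tn),
    }


print(threshold_metrics([0, 0, 1, 1], [0.1, 0.3, 0.8, 0.9], threshold=0.5))
# {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'tpr': 1.0, 'fpr': 0.0}
```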

---

## Design Philosophy

DetectZoo is built around three principles.

### 1. Reproducibility

Many detection methods are difficult to reproduce due to missing implementation details. DetectZoo provides **clean and standardized implementations of published detectors** with references to the original papers.

### 2. Accessibility

Users should not need to reimplement detectors. DetectZoo provides **simple imports and unified interfaces**. Loading any detector is a single function call.

### 3. Extensibility

Adding a new detector takes a single file. Subclass `BaseDetector`, implement `predict`, and register with a decorator:

```python
from detectzoo.detectors import BaseDetector
from detectzoo.core.registry import register_detector

@register_detector("my_detector")
class MyDetector(BaseDetector):
    modality = "text"  # or "image" or "audio"

    def __init__(self, threshold=0.5, device="cpu", **kwargs):
        super().__init__(threshold=threshold, device=device, **kwargs)

    def predict(self, input_data):
        # Your detection logic here
        score = 0.0
        return self._make_result(score)
```

The detector is then immediately available via `load_detector("my_detector")`. See `examples/custom_detector.py` for a complete runnable example.
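
The decorator-based registration can be sketched in a few lines. The names below (`_REGISTRY` and local `register_detector`, `load_detector`, `list_detectors` functions) are hypothetical re-implementations for illustration, not DetectZoo's internals; they only show how a class decorator can record a class under a name so that lookup by name and filtering by the class's `modality` attribute both work.

```python
from typing import Callable, Dict, List, Optional

_REGISTRY: Dict[str, type] = {}


def register_detector(name: str) -> Callable[[type], type]:
    """Class decorator that records `cls` under `name` and returns it unchanged."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"detector {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def load_detector(name: str, **kwargs):
    """Instantiate a registered detector by name, forwarding keyword arguments."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown detector {name!r}; known: {sorted(_REGISTRY)}") from None
    return cls(**kwargs)


def list_detectors(modality: Optional[str] = None) -> List[str]:
    """Registered names, optionally filtered by each class's `modality` attribute."""
    return sorted(
        n for n, c in _REGISTRY.items()
        if modality is None or getattr(c, "modality", None) == modality
    )


@register_detector("toy_text")
class ToyText:
    modality = "text"

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold


@register_detector("toy_audio")
class ToyAudio:
    modality = "audio"


print(list_detectors())         # ['toy_audio', 'toy_text']
print(list_detectors("text"))   # ['toy_text']
print(load_detector("toy_text", threshold=0.7).threshold)  # 0.7
```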

---

## Examples

The `examples/` directory contains self-contained scripts you can run immediately:

| Script | Description |
|--------|-------------|
| `text_detection.py` | Compare text detectors (log-likelihood, log-rank, entropy, fast-detectgpt) on sample human and AI passages. |
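
Log-likelihood, log-rank, and entropy are classic zero-shot statistics computed from a language model's next-token distributions. As a hedged illustration that skips the model entirely, the sketch below takes hand-written per-position probability distributions (a hypothetical stand-in for softmaxed model outputs) plus the observed tokens, and averages each statistic over positions. Machine-generated text tends to show higher log-likelihood, lower log-rank, and lower entropy, so an implementation that follows the "higher score = more likely AI" convention may negate the last two; how DetectZoo orients or normalizes them is not shown here.

```python
import math
from typing import Dict, List, Sequence


def zero_shot_stats(dists: Sequence[Dict[str, float]], tokens: Sequence[str]) -> Dict[str, float]:
    """Mean log-likelihood, log-rank, and entropy over positions.

    dists[i] is a (hypothetical) model distribution over the vocabulary at position i,
    and tokens[i] is the token that actually appears there.
    """
    log_likelihoods: List[float] = []
    log_ranks: List[float] = []
    entropies: List[float] = []
    for dist, tok in zip(dists, tokens):
        p = dist[tok]
        log_likelihoods.append(math.log(p))
        # Rank 1 = most probable; count tokens strictly more probable than the observed one.
        rank = 1 + sum(1 for q in dist.values() if q > p)
        log_ranks.append(math.log(rank))
        # Shannon entropy (in nats) of the predictive distribution at this position.
        entropies.append(-sum(q * math.log(q) for q in dist.values() if q > 0))
    n = len(log_likelihoods)
    return {
        "avg_log_likelihood": sum(log_likelihoods) / n,
        "avg_log_rank": sum(log_ranks) / n,
        "avg_entropy": sum(entropies) / n,
    }


dists = [
    {"the": 0.5, "a": 0.25, "one": 0.25},
    {"cat": 0.5, "dog": 0.5},
]
stats = zero_shot_stats(dists, ["the", "dog"])
print({k: round(v, 4) for k, v in stats.items()})
```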

Run any example from the project root:

```bash
python examples/text_detection.py --device cuda
```

---

## Contributing

We welcome community contributions. You can contribute by:

* Adding new detectors (see the extensibility section above)
* Improving existing implementations
* Adding benchmark datasets
* Improving documentation
* Reporting issues and suggesting features
