Metadata-Version: 2.4
Name: ai-text-detector
Version: 1.0.1
Summary: An English AI-generated text detection library
Home-page: https://github.com/xuzhenpeng263/ai-text-detector
Author: xuzhenpeng
Author-email: xuzhenpeng <3195211803@qq.com>
License: MIT
Project-URL: Homepage, https://github.com/xuzhenpeng263/ai-text-detector
Project-URL: Repository, https://github.com/xuzhenpeng263/ai-text-detector
Project-URL: Bug Tracker, https://github.com/xuzhenpeng263/ai-text-detector/issues
Keywords: ai,detection,text-classification,nlp,machine-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Requires-Dist: scipy>=1.5.0
Requires-Dist: scikit-learn>=0.24.0
Requires-Dist: xgboost>=1.3.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0.0; extra == "dev"
Requires-Dist: pytest-cov>=2.10.0; extra == "dev"
Requires-Dist: black>=20.8b1; extra == "dev"
Requires-Dist: flake8>=3.8.0; extra == "dev"
Requires-Dist: mypy>=0.800; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# AI Text Detector

An English AI-generated text detection library using statistical features and machine learning.

## Overview

`ai-text-detector` is a Python library that detects whether English text is likely AI-generated or human-written. It uses 60 statistical features and an XGBoost classifier trained on the RAID benchmark dataset.

**Key Features:**
- English text detection
- Fast and lightweight detection
- Pre-trained model included
- Simple, intuitive API
- Batch processing support

## Installation

```bash
pip install ai-text-detector
```

### Requirements

- Python >= 3.7
- numpy >= 1.19.0
- scipy >= 1.5.0
- scikit-learn >= 0.24.0
- xgboost >= 1.3.0

## Quick Start

```python
from ai_detector import AITextDetector

# Initialize detector
detector = AITextDetector()

# Detect AI-generated text
text = """
Artificial intelligence is transforming the world in unprecedented ways.
Machine learning algorithms are becoming increasingly sophisticated,
enabling computers to perform tasks that were once thought to be
exclusively human.
"""

result = detector.detect(text)

print(f"AI Probability: {result['ai_probability']:.2%}")
print(f"Label: {result['label']}")
print(f"Confidence: {result['confidence']}")
```

Example output (exact values depend on the bundled model):
```
AI Probability: 85.32%
Label: AI
Confidence: high
```

## Usage

### Basic Detection

```python
from ai_detector import AITextDetector

detector = AITextDetector()
result = detector.detect("Your English text here...")

if result['is_ai']:
    print(f"This text is likely AI-generated ({result['ai_probability']:.2%})")
else:
    print(f"This text is likely human-written (AI probability: {result['ai_probability']:.2%})")
```

### Getting AI Score Only

```python
from ai_detector import AITextDetector

detector = AITextDetector()
ai_score = detector.get_ai_score("Your English text here...")
print(f"AI probability: {ai_score:.2%}")
```

### Boolean Classification

```python
from ai_detector import AITextDetector

detector = AITextDetector()
if detector.is_ai_generated("Your English text here..."):
    print("This is AI-generated text")
```

### Batch Processing

```python
from ai_detector import AITextDetector

detector = AITextDetector()
texts = ["Text 1...", "Text 2...", "Text 3..."]
results = detector.detect_batch(texts)

for text, result in zip(texts, results):
    print(f"{text[:50]}... -> {result['label']} ({result['ai_probability']:.2%})")
```

### Custom Parameters

```python
from ai_detector import AITextDetector

# Adjust decision threshold (default: 0.5)
detector = AITextDetector(threshold=0.7)

# Set minimum character length (default: 100)
detector = AITextDetector(min_chars=50)

# Both parameters
detector = AITextDetector(threshold=0.7, min_chars=50)
```

## Result Format

The `detect()` method returns a dictionary with the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `ai_probability` | float | Probability (0-1) that text is AI-generated |
| `is_ai` | bool | True if classified as AI (based on threshold) |
| `confidence` | str | 'high', 'medium', or 'low' confidence |
| `label` | str | 'AI' or 'Human' classification |
| `warning` | str (optional) | Warning if text is too short |
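
Because `warning` is optional, code that consumes a result should not assume the key is present. The following is a minimal sketch of one way to format a result dictionary using only the fields in the table above. The helper name `summarize` is hypothetical and is not part of the library.

```python
from typing import Any, Mapping


def summarize(result: Mapping[str, Any]) -> str:
    """Format a detect()-style result dict as a single line.

    Hypothetical helper: it relies only on the documented fields.
    """
    line = (
        f"{result['label']} "
        f"({result['ai_probability']:.2%}, {result['confidence']} confidence)"
    )
    # 'warning' is optional, so read it with .get() instead of indexing.
    warning = result.get("warning")
    if warning:
        line += f" [warning: {warning}]"
    return line
```

It can be applied to the output of `detect()` or to each item returned by `detect_batch()`.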

## Confidence Levels

- **high**: Probability differs from threshold by > 0.3
- **medium**: Probability differs from threshold by 0.15-0.3
- **low**: Probability differs from threshold by < 0.15
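
The rules above can be sketched as a small function. This is an illustration of the documented ranges, not the library's actual code: the function name `confidence_level` is hypothetical, and treating the boundary values 0.15 and 0.3 as `medium` is an assumption, since the list does not say which level they belong to.

```python
def confidence_level(ai_probability: float, threshold: float = 0.5) -> str:
    """Map the distance between probability and threshold to a level.

    Hypothetical helper mirroring the documented ranges. Assumption:
    distances of exactly 0.15 or 0.3 count as 'medium'.
    """
    distance = abs(ai_probability - threshold)
    if distance > 0.3:
        return "high"
    if distance >= 0.15:
        return "medium"
    return "low"
```

Note that with the default threshold of 0.5, a probability must be above 0.8 or below 0.2 to be reported as `high`.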

## Text Length Recommendations

For best results, use English text with at least **100 characters**. For shorter texts, the result includes a `warning` field and `confidence` is `low`. The minimum length can be changed with the `min_chars` parameter.
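
When processing many texts, it can help to separate short inputs before running detection. The sketch below is one possible approach: the helper `split_by_length` is hypothetical, and measuring length with `len(text.strip())` is an assumption, since the library's exact counting rule is not documented here.

```python
from typing import Iterable, List, Tuple


def split_by_length(
    texts: Iterable[str], min_chars: int = 100
) -> Tuple[List[str], List[str]]:
    """Split texts into (long_enough, too_short) lists.

    Hypothetical helper. Assumption: length is counted in characters
    after stripping leading and trailing whitespace.
    """
    long_enough: List[str] = []
    too_short: List[str] = []
    for text in texts:
        if len(text.strip()) >= min_chars:
            long_enough.append(text)
        else:
            too_short.append(text)
    return long_enough, too_short
```

The first list can then be passed to `detect_batch()`, while the second can be skipped or reviewed manually.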

## Model Information

Get information about the loaded model:

```python
from ai_detector import AITextDetector

detector = AITextDetector()
info = detector.get_model_info()
print(f"Model version: {info['version']}")
print(f"Model AUC: {info['auc']:.4f}")
print(f"Model accuracy: {info['accuracy']:.4f}")
```

## Model Details

- **Features**: 60 statistical features (compression ratio, entropy, burstiness, etc.)
- **Training Data**: RAID benchmark (English text from multiple domains)
- **Algorithm**: XGBoost classifier
- **Language Support**: English text only
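
To give a sense of what features such as compression ratio, entropy, and burstiness measure, here is a minimal sketch using only the standard library. These are common textbook definitions chosen for illustration; they are assumptions and may differ from the 60 features the bundled model actually computes. All function names are hypothetical.

```python
import math
import re
import statistics
import zlib
from collections import Counter


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sentence_burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words.

    Assumption: sentences are split naively on '.', '!' and '?'.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)
```

Each function maps a text to a single number, which is the general shape a classifier such as XGBoost expects for a statistical feature.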

## Limitations

1. **English Only**: The model is trained on English text only and is not intended for other languages
2. **Text Length**: Results for short texts (< 100 characters) are unreliable
3. **Domain Specific**: The model is trained on general text; accuracy may vary in specialized domains
4. **Evolving AI**: As AI text generators improve, detection accuracy may decrease

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

## License

This project is licensed under the MIT License.

## Citation

If you use this library in your research, please cite:

```bibtex
@software{ai_text_detector,
  title={AI Text Detector: English AI-Generated Text Detection},
  author={xuzhenpeng},
  year={2025},
  url={https://github.com/xuzhenpeng263/ai-text-detector}
}
```

## References

This model is based on research using the RAID benchmark:
- RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors (ACL 2024)

## Support

For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/xuzhenpeng263/ai-text-detector).
