Metadata-Version: 2.4
Name: modelaudit
Version: 0.2.5
Summary: Static scanning library for detecting malicious code, backdoors, and other security risks in ML model files
Project-URL: Repository, https://github.com/promptfoo/modelaudit
Project-URL: Homepage, https://github.com/promptfoo/modelaudit
Author-email: Ian Webster <ian@promptfoo.dev>, Michael D'Angelo <michael@promptfoo.dev>
License: MIT
License-File: LICENSE
Keywords: ai,ml,model-scanning,pickle,pytorch,security,tensorflow
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Requires-Python: >=3.9
Requires-Dist: click>=8.1.7
Requires-Dist: cyclonedx-python-lib>=11.0.0
Requires-Dist: defusedxml>=0.7.1
Requires-Dist: huggingface-hub>=0.23.0
Requires-Dist: numpy>=1.19.0
Requires-Dist: platformdirs>=3.0.0
Requires-Dist: pydantic<3.0,>=2.11.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml<7.0,>=6.0
Requires-Dist: requests>=2.28.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: yaspin>=2.5.0
Provides-Extra: all
Requires-Dist: coremltools>=8.0; extra == 'all'
Requires-Dist: dill<1.0,>=0.3.0; extra == 'all'
Requires-Dist: fsspec>=2025.5.1; extra == 'all'
Requires-Dist: gcsfs>=2025.5.1; extra == 'all'
Requires-Dist: h5py<4.0,>=3.1; extra == 'all'
Requires-Dist: huggingface-hub>=0.23.0; extra == 'all'
Requires-Dist: joblib<2.0,>=1.0.0; extra == 'all'
Requires-Dist: mlflow>=2.12.0; extra == 'all'
Requires-Dist: msgpack<2.0,>=1.0.0; extra == 'all'
Requires-Dist: onnx<2.0,>=1.12.0; extra == 'all'
Requires-Dist: pyyaml<7.0,>=6.0; extra == 'all'
Requires-Dist: s3fs>=2025.5.1; extra == 'all'
Requires-Dist: safetensors>=0.4.0; extra == 'all'
Requires-Dist: scikit-learn<2.0,>=1.0.0; extra == 'all'
Requires-Dist: tensorflow<3.0,>=2.13.0; extra == 'all'
Requires-Dist: tensorrt>=8.6.0; extra == 'all'
Requires-Dist: tflite>=2.18.0; extra == 'all'
Requires-Dist: torch<3.0,>=2.6.0; extra == 'all'
Provides-Extra: all-ci
Requires-Dist: coremltools>=8.0; extra == 'all-ci'
Requires-Dist: dill<1.0,>=0.3.0; extra == 'all-ci'
Requires-Dist: fsspec>=2025.5.1; extra == 'all-ci'
Requires-Dist: gcsfs>=2025.5.1; extra == 'all-ci'
Requires-Dist: h5py<4.0,>=3.1; extra == 'all-ci'
Requires-Dist: huggingface-hub>=0.23.0; extra == 'all-ci'
Requires-Dist: joblib<2.0,>=1.0.0; extra == 'all-ci'
Requires-Dist: mlflow>=2.12.0; extra == 'all-ci'
Requires-Dist: msgpack<2.0,>=1.0.0; extra == 'all-ci'
Requires-Dist: onnx<2.0,>=1.12.0; extra == 'all-ci'
Requires-Dist: pyyaml<7.0,>=6.0; extra == 'all-ci'
Requires-Dist: s3fs>=2025.5.1; extra == 'all-ci'
Requires-Dist: safetensors>=0.4.0; extra == 'all-ci'
Requires-Dist: scikit-learn<2.0,>=1.0.0; extra == 'all-ci'
Requires-Dist: tensorflow<3.0,>=2.13.0; extra == 'all-ci'
Requires-Dist: tflite>=2.18.0; extra == 'all-ci'
Requires-Dist: torch<3.0,>=2.6.0; extra == 'all-ci'
Provides-Extra: cloud
Requires-Dist: fsspec>=2025.5.1; extra == 'cloud'
Requires-Dist: gcsfs>=2025.5.1; extra == 'cloud'
Requires-Dist: s3fs>=2025.5.1; extra == 'cloud'
Provides-Extra: coreml
Requires-Dist: coremltools>=8.0; extra == 'coreml'
Provides-Extra: dill
Requires-Dist: dill<1.0,>=0.3.0; extra == 'dill'
Provides-Extra: flax
Requires-Dist: msgpack<2.0,>=1.0.0; extra == 'flax'
Provides-Extra: h5
Requires-Dist: h5py<4.0,>=3.1; extra == 'h5'
Provides-Extra: huggingface
Requires-Dist: huggingface-hub>=0.23.0; extra == 'huggingface'
Provides-Extra: joblib
Requires-Dist: joblib<2.0,>=1.0.0; extra == 'joblib'
Requires-Dist: scikit-learn<2.0,>=1.0.0; extra == 'joblib'
Provides-Extra: mlflow
Requires-Dist: mlflow>=2.12.0; extra == 'mlflow'
Provides-Extra: numpy1
Requires-Dist: coremltools>=8.0; extra == 'numpy1'
Requires-Dist: dill<1.0,>=0.3.0; extra == 'numpy1'
Requires-Dist: fsspec>=2025.5.1; extra == 'numpy1'
Requires-Dist: gcsfs>=2025.5.1; extra == 'numpy1'
Requires-Dist: h5py<4.0,>=3.1; extra == 'numpy1'
Requires-Dist: huggingface-hub>=0.23.0; extra == 'numpy1'
Requires-Dist: joblib<2.0,>=1.0.0; extra == 'numpy1'
Requires-Dist: msgpack<2.0,>=1.0.0; extra == 'numpy1'
Requires-Dist: numpy<2.0,>=1.19.0; extra == 'numpy1'
Requires-Dist: onnx<2.0,>=1.12.0; extra == 'numpy1'
Requires-Dist: pyyaml<7.0,>=6.0; extra == 'numpy1'
Requires-Dist: s3fs>=2025.5.1; extra == 'numpy1'
Requires-Dist: safetensors>=0.4.0; extra == 'numpy1'
Requires-Dist: scikit-learn<2.0,>=1.0.0; extra == 'numpy1'
Requires-Dist: tensorflow<3.0,>=2.13.0; extra == 'numpy1'
Requires-Dist: tensorrt>=8.6.0; extra == 'numpy1'
Requires-Dist: tflite>=2.18.0; extra == 'numpy1'
Requires-Dist: torch<3.0,>=2.6.0; extra == 'numpy1'
Provides-Extra: onnx
Requires-Dist: onnx<2.0,>=1.12.0; extra == 'onnx'
Provides-Extra: pytorch
Requires-Dist: torch<3.0,>=2.6.0; extra == 'pytorch'
Provides-Extra: safetensors
Requires-Dist: safetensors>=0.4.0; extra == 'safetensors'
Provides-Extra: sevenzip
Requires-Dist: py7zr>=0.20.0; extra == 'sevenzip'
Provides-Extra: tensorflow
Requires-Dist: tensorflow<3.0,>=2.13.0; extra == 'tensorflow'
Provides-Extra: tensorrt
Requires-Dist: tensorrt>=8.6.0; extra == 'tensorrt'
Provides-Extra: tflite
Requires-Dist: tflite>=2.18.0; extra == 'tflite'
Provides-Extra: yaml
Requires-Dist: pyyaml<7.0,>=6.0; extra == 'yaml'
Description-Content-Type: text/markdown

# ModelAudit

Static security scanner for AI/ML model files. It detects malicious code, dangerous deserialization, risky module usage, and embedded secrets, all without loading or executing the model.

[![PyPI version](https://badge.fury.io/py/modelaudit.svg)](https://pypi.org/project/modelaudit/)
[![Python versions](https://img.shields.io/pypi/pyversions/modelaudit.svg)](https://pypi.org/project/modelaudit/)
[![Code Style: ruff](https://img.shields.io/badge/code%20style-ruff-005cd7.svg)](https://github.com/astral-sh/ruff)
[![License](https://img.shields.io/github/license/promptfoo/promptfoo)](https://github.com/promptfoo/promptfoo/blob/main/LICENSE)

<img width="989" alt="image" src="https://www.promptfoo.dev/img/docs/modelaudit/modelaudit-result.png" />

**[Documentation](https://www.promptfoo.dev/docs/model-audit/)** | **[Usage Examples](https://www.promptfoo.dev/docs/model-audit/usage/)** | **[Supported Formats](https://www.promptfoo.dev/docs/model-audit/scanners/)**

## Quick Start

```bash
# Install with all supported ML framework dependencies
pip install modelaudit[all]

# Scan a model file
modelaudit model.pkl

# Scan a directory
modelaudit ./models/

# Export results for automation
modelaudit model.pkl --format json --output results.json
```

**Example output:**

```text
$ modelaudit suspicious_model.pkl

✓ Scanning suspicious_model.pkl
Files scanned: 1 | Issues found: 1 critical, 1 warning

1. suspicious_model.pkl (pos 28): [CRITICAL] Malicious code execution attempt
   Why: Contains os.system() call that could run arbitrary commands

2. suspicious_model.pkl (pos 52): [WARNING] Dangerous pickle deserialization
   Why: Could execute code when the model loads

✗ 2 security issues found. See details above.
```

## Installation

**Recommended (includes common ML frameworks):**

```bash
pip install modelaudit[all]
```

**Basic installation:**

```bash
# Core functionality only (pickle, numpy, archives)
pip install modelaudit
```

**Specific frameworks:**

```bash
pip install modelaudit[tensorflow]  # TensorFlow (.pb)
pip install modelaudit[pytorch]     # PyTorch (.pt, .pth)
pip install modelaudit[h5]          # Keras (.h5, .keras)
pip install modelaudit[onnx]        # ONNX (.onnx)
pip install modelaudit[safetensors] # SafeTensors (.safetensors)

# Multiple frameworks
pip install modelaudit[tensorflow,pytorch,h5]
```

**Additional features:**

```bash
pip install modelaudit[cloud]       # S3, GCS, Azure storage
pip install modelaudit[coreml]      # Apple Core ML
pip install modelaudit[flax]        # JAX/Flax models
pip install modelaudit[mlflow]      # MLflow registry
pip install modelaudit[huggingface] # Hugging Face integration
```

**Compatibility:**

```bash
# NumPy 1.x compatibility (some frameworks require NumPy < 2.0)
pip install modelaudit[numpy1]

# For CI/CD environments (omits dependencies like TensorRT that may not be available)
pip install modelaudit[all-ci]
```

**Docker:**

```bash
docker pull ghcr.io/promptfoo/modelaudit:latest
# Linux/macOS
docker run --rm -v "$(pwd)":/app ghcr.io/promptfoo/modelaudit:latest model.pkl
# Windows
docker run --rm -v "%cd%":/app ghcr.io/promptfoo/modelaudit:latest model.pkl
```

## Security Checks

### Code Execution Detection

- Dangerous Python modules and builtins: `os`, `sys`, `subprocess`, `eval`, `exec`
- Pickle opcodes: `REDUCE`, `GLOBAL`, `INST`, `OBJ`, `NEWOBJ`, `STACK_GLOBAL`, `BUILD`, `NEWOBJ_EX`
- Embedded executable file detection
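
The opcode check listed above can be illustrated with Python's standard `pickletools` module, which walks a pickle stream without unpickling it. This is a minimal sketch, not ModelAudit's implementation; the names `SUSPICIOUS_OPCODES` and `find_suspicious_opcodes` are hypothetical.

```python
import pickle
import pickletools

# Opcode names taken from the list above; the set name is hypothetical.
SUSPICIOUS_OPCODES = {
    "REDUCE", "GLOBAL", "INST", "OBJ",
    "NEWOBJ", "STACK_GLOBAL", "BUILD", "NEWOBJ_EX",
}


def find_suspicious_opcodes(data: bytes) -> list:
    """Return (byte offset, opcode name) pairs found in a pickle stream.

    pickletools.genops only parses the stream; nothing is unpickled,
    so no code embedded in the pickle is executed.
    """
    hits = []
    for opcode, _arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            hits.append((pos, opcode.name))
    return hits


class Greeter:
    """Harmless stand-in: unpickling would call print("hello")."""

    def __reduce__(self):
        return (print, ("hello",))


# pickle.dumps serializes the callable reference; it does not call it.
payload = pickle.dumps(Greeter())
for offset, name in find_suspicious_opcodes(payload):
    print(f"pos {offset}: {name}")
```

Legitimate pickles of custom classes use several of these opcodes too, so a raw opcode match is only a starting point for further analysis.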

### Embedded Data Extraction

- API keys, tokens, and credentials in model weights/metadata
- URLs, IP addresses, and network endpoints
- Suspicious configuration properties
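
As a rough illustration of the endpoint checks above, the sketch below pulls URLs and IPv4 addresses out of a raw byte blob with regular expressions. The patterns and the `extract_endpoints` helper are assumptions made for this example and do not reflect ModelAudit's actual rules.

```python
import re

# Printable, non-quote ASCII after the scheme; stops at control or binary bytes.
URL_RE = re.compile(rb"https?://[^\s\"'<>\x00-\x20\x7f-\xff]+")
# Four dot-separated groups of 1-3 digits, not embedded in a longer number.
IPV4_RE = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def extract_endpoints(blob: bytes) -> dict:
    """Return URLs and plausible IPv4 addresses found in raw bytes."""
    urls = [m.decode("ascii") for m in URL_RE.findall(blob)]
    ips = [
        m.decode("ascii")
        for m in IPV4_RE.findall(blob)
        # Drop matches such as 999.1.1.1 that are not valid addresses.
        if all(int(part) <= 255 for part in m.split(b"."))
    ]
    return {"urls": urls, "ips": ips}


blob = b"\x00\x01weights\x00http://evil.example/payload\x00cfg 10.0.0.5\x00"
print(extract_endpoints(blob))
```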

### Archive Security

- Path traversal attacks in ZIP/TAR archives
- Executable files within model packages
- Malicious filenames and directory structures
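
A path traversal check of the kind described above can be sketched with the standard `zipfile` module: resolve each member name against the intended extraction directory and flag any entry that escapes it. The `find_traversal_entries` helper and the `dest_dir` default are hypothetical, and ModelAudit's own logic may differ.

```python
import io
import os
import zipfile


def find_traversal_entries(archive, dest_dir="extract"):
    """Return member names that would resolve outside dest_dir.

    Only the archive's name list is read; nothing is extracted.
    """
    base = os.path.realpath(dest_dir)
    flagged = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            target = os.path.realpath(os.path.join(base, name))
            try:
                inside = os.path.commonpath([base, target]) == base
            except ValueError:
                # Raised on Windows when the paths are on different drives.
                inside = False
            if not inside:
                flagged.append(name)
    return flagged


# Build a small in-memory archive with one safe and one escaping entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model/weights.bin", b"\x00")
    zf.writestr("../../etc/passwd", b"x")

print(find_traversal_entries(buf))
```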

### ML Framework Analysis

- TensorFlow operations: `PyFunc`, `PyFuncStateless`
- Keras unsafe layers and custom objects
- Template injection in model configurations

### Context-Aware Analysis

- Distinguishes legitimate ML framework patterns from genuine threats, reducing false positives in complex model files

## Supported Formats

ModelAudit includes 29 specialized scanners for ML model formats ([see complete list](https://www.promptfoo.dev/docs/model-audit/scanners/)):

| Format          | Extensions                                | Security Focus                                     |
| --------------- | ----------------------------------------- | -------------------------------------------------- |
| **Pickle**      | `.pkl`, `.pickle`, `.dill`, `.pt`, `.pth` | Code execution, malicious opcodes, deserialization |
| **Archives**    | `.zip`, `.tar`, `.gz`, `.7z`, `.bz2`      | Path traversal, embedded executables               |
| **TensorFlow**  | `.pb`, SavedModel directories             | Dangerous operations, custom ops                   |
| **Keras**       | `.h5`, `.keras`, `.hdf5`                  | Unsafe layers, custom objects                      |
| **ONNX**        | `.onnx`                                   | Custom operators, metadata                         |
| **SafeTensors** | `.safetensors`                            | Header validation, metadata                        |
| **GGUF/GGML**   | `.gguf`, `.ggml`                          | Header validation, metadata                        |
| **Joblib**      | `.joblib`                                 | Pickled objects, scikit-learn                      |
| **JAX/Flax**    | `.msgpack`, `.flax`, `.orbax`             | Serialized transforms                              |
| **NumPy**       | `.npy`, `.npz`                            | Array metadata, pickle objects                     |
| **Core ML**     | `.mlmodel`                                | Custom layers, metadata                            |
| **ExecuTorch**  | `.ptl`, `.pte`                            | Mobile model validation                            |

Plus scanners for TensorFlow Lite, TensorRT, PaddlePaddle, OpenVINO, text files, and configuration formats.

[Complete format documentation →](https://www.promptfoo.dev/docs/model-audit/scanners/)

## Usage Examples

### Basic Scanning

```bash
# Scan single file
modelaudit model.pkl

# Scan directory
modelaudit ./models/

# Strict mode (fail on warnings)
modelaudit model.pkl --strict
```

### CI/CD Integration

```bash
# JSON output for automation
modelaudit models/ --format json --output results.json

# Generate SBOM report
modelaudit model.pkl --sbom compliance_report.json

# Disable colors in CI
NO_COLOR=1 modelaudit models/
```

### Remote Sources

```bash
# Hugging Face models (via direct URL or hf:// scheme)
modelaudit https://huggingface.co/gpt2
modelaudit hf://microsoft/DialoGPT-medium

# Cloud storage
modelaudit s3://bucket/model.pt
modelaudit gs://bucket/models/
modelaudit https://account.blob.core.windows.net/container/model.pt

# MLflow registry
modelaudit models:/MyModel/Production

# JFrog Artifactory
modelaudit https://company.jfrog.io/repo/model.pt
```

### Command Options

- **`--format`** - Output format: text, json, sarif
- **`--output`** - Write results to file
- **`--verbose`** - Detailed output
- **`--quiet`** - Minimal output
- **`--strict`** - Fail on warnings, scan all files
- **`--timeout`** - Override scan timeout
- **`--max-size`** - Set size limits (e.g., 10 GB)
- **`--dry-run`** - Preview without scanning
- **`--progress`** - Force progress display
- **`--sbom`** - Generate CycloneDX SBOM
- **`--blacklist`** - Additional patterns to flag
- **`--no-cache`** - Disable result caching

[Advanced usage examples →](https://www.promptfoo.dev/docs/model-audit/usage/)

## Output Formats

### Text (default)

```text
$ modelaudit model.pkl

✓ Scanning model.pkl
Files scanned: 1 | Issues found: 1 critical

1. model.pkl (pos 28): [CRITICAL] Malicious code execution attempt
   Why: Contains os.system() call that could run arbitrary commands
```

### JSON (for automation)

```bash
modelaudit model.pkl --format json
```

```json
{
  "files_scanned": 1,
  "issues": [
    {
      "message": "Malicious code execution attempt",
      "severity": "critical",
      "location": "model.pkl (pos 28)"
    }
  ]
}
```
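
For automation, a report with the shape shown above can be summarized in a few lines of Python. This sketch assumes only the `issues` and `severity` fields from the example; real reports may carry additional fields, and `count_by_severity` is a hypothetical helper.

```python
import json
from collections import Counter


def count_by_severity(report_text: str) -> dict:
    """Count issues per severity in a JSON report string."""
    report = json.loads(report_text)
    severities = (issue.get("severity", "unknown") for issue in report.get("issues", []))
    return dict(Counter(severities))


sample = """
{
  "files_scanned": 1,
  "issues": [
    {
      "message": "Malicious code execution attempt",
      "severity": "critical",
      "location": "model.pkl (pos 28)"
    }
  ]
}
"""
print(count_by_severity(sample))
```

In a pipeline, the same function could read `results.json` written with `--output` and fail the job when the `critical` count is non-zero.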

### SARIF (for security tools)

```bash
modelaudit model.pkl --format sarif --output results.sarif
```

## Troubleshooting

### Check scanner availability

```bash
modelaudit doctor --show-failed
```

### NumPy compatibility issues

```bash
# Use NumPy 1.x compatibility mode
pip install modelaudit[numpy1]
```

### Missing dependencies

```bash
# ModelAudit shows exactly what to install
modelaudit your-model.onnx
# Output: "Install with 'pip install modelaudit[onnx]'"
```

### Exit Codes

- `0` - No security issues found
- `1` - Security issues detected
- `2` - Scan errors occurred
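
A script that wraps the CLI can branch on these exit codes. The sketch below assumes `modelaudit` is installed and on `PATH`; the `describe_exit_code` and `scan` helpers are hypothetical names for this example.

```python
import subprocess

# Meanings copied from the exit code list above.
EXIT_MEANINGS = {
    0: "No security issues found",
    1: "Security issues detected",
    2: "Scan errors occurred",
}


def describe_exit_code(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")


def scan(path: str):
    """Run the CLI on a path and return (exit code, meaning).

    check=False keeps non-zero exit codes from raising, since 1 and 2
    are expected outcomes rather than failures of the wrapper.
    """
    completed = subprocess.run(["modelaudit", path], check=False)
    return completed.returncode, describe_exit_code(completed.returncode)
```

Calling `scan("model.pkl")` returns the code alongside its meaning, so a CI step can treat `1` as a policy failure and `2` as an infrastructure problem.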

### Authentication

ModelAudit uses environment variables for authenticating to remote services:

```bash
# JFrog Artifactory
export JFROG_API_TOKEN=your_token

# MLflow
export MLFLOW_TRACKING_URI=http://localhost:5000

# AWS, Google Cloud, and Azure
# Authentication is handled automatically by the respective client libraries
# (e.g., via IAM roles, `aws configure`, `gcloud auth login`, or environment variables).
# For specific env var setup, refer to the library's documentation.
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Hugging Face
export HF_TOKEN=your_token
```

## Documentation

- **Documentation**: [promptfoo.dev/docs/model-audit/](https://www.promptfoo.dev/docs/model-audit/)
- **Usage Examples**: [promptfoo.dev/docs/model-audit/usage/](https://www.promptfoo.dev/docs/model-audit/usage/)
- **Report Issues**: Contact support at [promptfoo.dev](https://www.promptfoo.dev/)

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
