Metadata-Version: 2.4
Name: appthreat-vulnerability-db
Version: 6.6.1
Summary: AppThreat's vulnerability database and package search library with a built-in sqlite based storage. OSV, CVE, GitHub, npm are the primary sources of vulnerabilities.
Author-email: Team AppThreat <cloud@appthreat.com>
License: MIT
Project-URL: Homepage, https://github.com/appthreat/vulnerability-db
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: Free Threading :: 1 - Unstable
Classifier: Topic :: Security
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx[http2]
Requires-Dist: appdirs
Requires-Dist: orjson
Requires-Dist: semver
Requires-Dist: packageurl-python
Requires-Dist: cvss
Requires-Dist: pydantic[email]
Requires-Dist: rich
Requires-Dist: apsw
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: bandit; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: pylint; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Provides-Extra: custom
Requires-Dist: PyYAML; extra == "custom"
Requires-Dist: tomli; python_version < "3.11" and extra == "custom"
Provides-Extra: oras
Requires-Dist: oras>=0.2.25; extra == "oras"
Provides-Extra: all
Requires-Dist: oras>=0.2.25; extra == "all"
Requires-Dist: PyYAML; extra == "all"
Requires-Dist: tomli; python_version < "3.11" and extra == "all"

# Introduction

This repo is a vulnerability database and package search library for sources such as AppThreat vuln-list, OSV, NVD, and GitHub. Vulnerability data is downloaded from these sources and stored in a SQLite database with indexes to allow offline access and efficient searches.

## Why vulnerability db?

A good vulnerability database must have the following properties:

- Accuracy
- Easy to [download](#download-pre-built-database-recommended), [integrate](#for-integrators), and use
- Performance

vdb uses multiple upstream sources to improve accuracy and reduce false negatives. A SQLite database containing data in the CVE 5.2 schema format is precompiled and distributed as files via ghcr to simplify download. With automatic purl prefix generation, even for git repos, the database can be searched by purl, cpe, or even an http git url string. Every row in the database uses an open specification such as CVE 5.2 or Package URL (purl and vers), which prevents vendor lock-in.

## Vulnerability Data sources

- Linux [vuln-list](https://github.com/appthreat/vuln-list) (Forked from AquaSecurity)
- OSV (1, 2)
- NVD
- GitHub

1 - We exclude Linux and oss-fuzz feeds by default. Set the environment variable `OSV_INCLUDE_FUZZ=true` to include them.

2 - Malware feeds are included by default, thus increasing the db size slightly. Set the environment variable `OSV_EXCLUDE_MALWARE=true` to exclude them.

## Linux distros

- AlmaLinux
- Debian
- Alpine
- Amazon Linux
- Arch Linux
- RHEL/CentOS
- Rocky Linux
- Ubuntu
- OpenSUSE
- Photon
- Chainguard
- Wolfi OS

## Installation

```shell
pip install "appthreat-vulnerability-db>=6.6.1"
```

To install vdb with optional dependencies such as `oras` use the `[oras]` or `[all]` dependency group.

```shell
pip install "appthreat-vulnerability-db[all]"
```

**NOTE:** VDB v6 is a major rewrite that uses a SQLite database. Current users of depscan v5 must continue using version 5.8.x.

```shell
pip install appthreat-vulnerability-db==5.8.0
```

## Usage

This package is ideal as a library for managing vulnerabilities. It is used by [owasp-dep-scan](http://github.com/owasp-dep-scan/dep-scan), a free open-source dependency audit tool. A limited cli with a few features is also available for testing the tool directly.

### Option 1: Download pre-built database (Recommended)

Download a pre-built SQLite database ([refreshed](https://github.com/AppThreat/vdb/actions) every 6 hours) containing all application vulnerabilities (~700 MB). This step is recommended for all users.

```shell
# pip install "appthreat-vulnerability-db[all]"
vdb --download-image
```

You can execute this command daily or when a fresh database is required.

To perform container and OS scans, download the full image (~7.5 GB), which includes all application and OS vulnerabilities.

```shell
vdb --download-full-image
```

Use any sqlite browser or cli tools to load and query the two databases.

**data.index.vdb6** - index db with purl prefix and vers

<img src="./docs/index-vdb6.png" alt="index" width="400">

**data.vdb6** - Contains source data in CVE 5.2 format stored as a jsonb blob.

<img src="./docs/vdb6.png" alt="database" width="400">

### Option 2: Download pre-built database (ORAS)

Using [ORAS cli](https://oras.land/) might be slightly faster.

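```shell
export VDB_HOME=$HOME/vdb
mkdir -p "$VDB_HOME"
oras pull ghcr.io/appthreat/vdbxz:v6.5.x -o "$VDB_HOME"
# Extract inside VDB_HOME, where the archive was downloaded
cd "$VDB_HOME"
tar -xvf *.tar.xz
rm *.tar.xz
```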

### Option 3: Use HuggingFace cli

Download one of the databases.

```shell
pip install -U "huggingface_hub[cli]"
```

app only database

```shell
export VDB_HOME=$(pwd)/app
huggingface-cli download AppThreat/vdb --include "app/*.vdb6" --repo-type dataset --local-dir .
```

app only 10 year database

```shell
export VDB_HOME=$(pwd)/app-10y
huggingface-cli download AppThreat/vdb --include "app-10y/*.vdb6" --repo-type dataset --local-dir .
```

app and os database

```shell
export VDB_HOME=$(pwd)/app-os
huggingface-cli download AppThreat/vdb --include "app-os/*.vdb6" --repo-type dataset --local-dir .
```

app and os 10 year database

```shell
export VDB_HOME=$(pwd)/app-os-10y
huggingface-cli download AppThreat/vdb --include "app-os-10y/*.vdb6" --repo-type dataset --local-dir .
```

#### Citation

Use the citation below in your research.

```text
@misc{vdb,
  author = {Team AppThreat},
  month = Feb,
  title = {{AppThreat vulnerability-db}},
  howpublished = {{https://huggingface.co/datasets/AppThreat/vdb}},
  year = {2025}
}
```

### Option 4: Manually create the vulnerability database (ADVANCED users)

Cache application vulnerabilities

```shell
vdb --cache
```

To remove any existing databases:

```shell
vdb --clean
```

The typical size of this database is over 700 MB.

Cache from just [OSV](https://osv.dev)

```shell
vdb --cache --only-osv
```

To cache a longer period of historic data, set the following environment variables.

- NVD_START_YEAR - Default: 2018. Can be set as far back as 2002
- GITHUB_PAGE_COUNT - Default: 2. Supports up to 20

Cache application and OS vulnerabilities

```shell
vdb --cache-os
```

Note that the size of the database with OS vulnerabilities is over 7.5 GB. It is possible to ignore/exclude specific OS distros using environment variables.

For example, to exclude almalinux and ubuntu data, set the environment variables below:

```shell
export VDB_IGNORE_ALMALINUX=true
export VDB_IGNORE_UBUNTU=true
```

Refer to the variable `LINUX_DISTRO_VULN_LIST_PATHS` in [config.py](./vdb/lib/config.py) for the full list of distro strings supported.

## Available Database Variations

VDB provides multiple pre-built databases optimized for different use cases, balancing data depth and file size. Both ORAS (ghcr.io) and HuggingFace datasets are updated every 6 hours.

**Note for AI Agents:** Use this table to decide which database URL to pass to the `download_image()` function based on the user's requirements.

| Database Scope | Time Context         | ORAS Image URL (`v6` or `v6.5.x`)    | HuggingFace Path | Recommended Use Case                                    |
| :------------- | :------------------- | :----------------------------------- | :--------------- | :------------------------------------------------------ |
| **App Only**   | **2 Years** (2024+)  | `ghcr.io/appthreat/vdbxz-app-2y:v6`  | `app-2y/`        | Fast, lightweight scans for very modern applications.   |
| **App Only**   | **Default** (2018+)  | `ghcr.io/appthreat/vdbxz-app:v6`     | `app/`           | **(Default)** Standard application dependency scanning. |
| **App Only**   | **10 Years** (2014+) | `ghcr.io/appthreat/vdbxz-app-10y:v6` | `app-10y/`       | Deep auditing of legacy application software.           |
| **App + OS**   | **Default** (2018+)  | `ghcr.io/appthreat/vdbxz:v6`         | `app-os/`        | Standard container and OS-level package scanning.       |
| **App + OS**   | **10 Years** (2014+) | `ghcr.io/appthreat/vdbxz-10y:v6`     | `app-os-10y/`    | Deep auditing of legacy Linux containers/VMs.           |

_(Note: The ORAS URLs above use `.tar.xz` compression. You can replace `vdbxz` with `vdbzst` in the URL if you prefer Zstandard compression)._
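
The table can also be captured as a small lookup, which may help when an image URL has to be chosen programmatically. This is only a sketch: `pick_image_url` and its parameter names are hypothetical and not part of the vdb API. The image names are copied from the table above.

```python
# Hypothetical helper: maps (scope, period) to the ORAS image names in the table.
IMAGES = {
    ("app", "2y"): "vdbxz-app-2y",
    ("app", "default"): "vdbxz-app",
    ("app", "10y"): "vdbxz-app-10y",
    ("app-os", "default"): "vdbxz",
    ("app-os", "10y"): "vdbxz-10y",
}


def pick_image_url(scope: str, period: str = "default",
                   zstd: bool = False, tag: str = "v6") -> str:
    """Return the ORAS URL for one of the database variations in the table."""
    try:
        name = IMAGES[(scope, period)]
    except KeyError:
        raise ValueError(
            f"No pre-built database for scope={scope!r}, period={period!r}"
        ) from None
    if zstd:
        # The note under the table says vdbxz can be replaced with vdbzst.
        name = name.replace("vdbxz", "vdbzst", 1)
    return f"ghcr.io/appthreat/{name}:{tag}"


print(pick_image_url("app"))                       # ghcr.io/appthreat/vdbxz-app:v6
print(pick_image_url("app-os", "10y", zstd=True))  # ghcr.io/appthreat/vdbzst-10y:v6
```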

---

## Custom Vulnerability Data

VDB supports loading custom vulnerability data from a local directory at runtime. This allows you to:

1.  **Add Private Vulnerabilities:** Include internal CVEs that are not public.
2.  **Override False Positives:** Correct data returned by the official database by marking specific versions as `unaffected`.

Custom data must follow the **CVE 5.2 JSON Schema**. Supported file extensions are `.json`, `.yaml`, `.yml`, and `.toml`.

To use custom data, pass the directory path to the `--custom-data` argument.

```shell
vdb --search pkg:npm/my-lib@1.0.0 --custom-data /path/to/custom/vulns
```

### Example 1: Adding a Private Vulnerability

Create a file `private-vuln.yaml`. Since you are defining a new vulnerability record, you use the `cna` container.

```yaml
dataType: CVE_RECORD
dataVersion: "5.2"
cveMetadata:
  cveId: PRIVATE-2025-001
  assignerOrgId: 00000000-0000-4000-8000-000000000000
  state: PUBLISHED
  datePublished: "2025-01-01T00:00:00Z"
  dateUpdated: "2025-01-01T00:00:00Z"
containers:
  cna:
    providerMetadata:
      orgId: 00000000-0000-4000-8000-000000000000
    descriptions:
      - lang: en
        value: "Private vulnerability in internal library"
    affected:
      - vendor: internal
        product: my-lib
        packageName: my-lib
        packageURL: pkg:npm/my-lib
        versions:
          - version: "1.0.0"
            status: affected
            versionType: semver
            lessThan: "2.0.0"
```
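
Since `.json` is also a supported extension, a record like this could be generated from a script instead of being written by hand. The sketch below uses only the Python standard library and mirrors the fields of the YAML example. The helper name `write_private_vuln` and the output file name are illustrative assumptions, and the output has not been validated against the CVE 5.2 schema.

```python
import json
import tempfile
from pathlib import Path

# Placeholder org id, copied from the YAML example above.
ORG_ID = "00000000-0000-4000-8000-000000000000"


def write_private_vuln(directory: str, cve_id: str, purl: str, name: str,
                       introduced: str, fixed: str) -> Path:
    """Write a record shaped like the YAML example to <directory>/<cve_id>.json."""
    record = {
        "dataType": "CVE_RECORD",
        "dataVersion": "5.2",
        "cveMetadata": {
            "cveId": cve_id,
            "assignerOrgId": ORG_ID,
            "state": "PUBLISHED",
            "datePublished": "2025-01-01T00:00:00Z",
            "dateUpdated": "2025-01-01T00:00:00Z",
        },
        "containers": {
            "cna": {
                "providerMetadata": {"orgId": ORG_ID},
                "descriptions": [
                    {"lang": "en", "value": f"Private vulnerability in {name}"}
                ],
                "affected": [{
                    "vendor": "internal",
                    "product": name,
                    "packageName": name,
                    "packageURL": purl,
                    "versions": [{
                        "version": introduced,
                        "status": "affected",
                        "versionType": "semver",
                        "lessThan": fixed,
                    }],
                }],
            }
        },
    }
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{cve_id}.json"
    out_file.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_file


# Demo in a throwaway directory; in practice this would be your custom data dir.
with tempfile.TemporaryDirectory() as tmp:
    path = write_private_vuln(tmp, "PRIVATE-2025-001", "pkg:npm/my-lib",
                              "my-lib", "1.0.0", "2.0.0")
    print(path.name)
```

The directory holding the generated file can then be passed to `--custom-data` as shown earlier.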

### Example 2: Overriding a False Positive

If the official database reports `CVE-2023-9999` for `pkg:pypi/requests` but you have determined it is a false positive for your specific version, you can override it using an **ADP (Authorized Data Publisher)** container. This is the recommended way to append or dispute existing vulnerability data.

**Logic:** If a CVE ID and Package URL combination exists in your custom data, VDB will **ignore** the entry from the official database and use yours instead.

Create `override.yaml`:

```yaml
dataType: CVE_RECORD
dataVersion: "5.2"
cveMetadata:
  cveId: CVE-2023-9999
  assignerOrgId: 00000000-0000-4000-8000-000000000000
  state: PUBLISHED
containers:
  # Use 'adp' to append/modify existing vulnerability data
  adp:
    - providerMetadata:
        orgId: 00000000-0000-4000-8000-000000000000
        shortName: "MySecTeam"
      descriptions:
        - lang: en
          value: "Override to mark specific version as unaffected"
      affected:
        - product: requests
          packageName: requests
          packageURL: pkg:pypi/requests
          versions:
            # Explicitly mark your version as unaffected
            - version: "2.31.0"
              status: unaffected
              versionType: semver
```

## CLI Usage

```shell
usage: vdb [-h] [--clean] [--cache] [--cache-os] [--only-osv] [--only-aqua] [--only-ghsa] [--search SEARCH] [--list-malware] [--bom BOM_FILE] [--download-image] [--download-full-image]
           [--print-vdb-metadata] [--custom-data CUSTOM_DATA]
AppThreat's vulnerability database and package search library with a sqlite storage.

options:
  -h, --help            show this help message and exit
  --clean               Clear the vulnerability database cache from platform specific user_data_dir.
  --cache               Cache vulnerability information in platform specific user_data_dir.
  --cache-os            Cache OS vulnerability information in platform specific user_data_dir.
  --only-osv            Use only OSV as the source. Use with --cache.
  --only-aqua           Use only Aqua vuln-list as the source. Use with --cache.
  --only-ghsa           Use only recent ghsa as the source. Use with --cache.
  --search SEARCH       Search for the package or vulnerability ID (CVE, GHSA, ALSA, DSA, etc.) in the database. Use purl, cpe, or git http url.
  --list-malware        List latest malwares with CVE ID beginning with MAL-.
  --bom BOM_FILE        Search for packages in the CycloneDX BOM file.
  --download-image      Downloaded pre-created vdb image to platform specific user_data_dir. Application vulnerabilities only.
  --download-full-image
                        Downloaded pre-created vdb image to platform specific user_data_dir. All vulnerabilities including OS.
  --print-vdb-metadata  Display metadata about the current vdb in user_data_dir.
  --custom-data CUSTOM_DATA
                        Path to directory containing custom vulnerability data (JSON/YAML/TOML) to override/augment results.
```

### CLI search

It is possible to perform a range of searches using the cli.

```shell
vdb --search pkg:pypi/xml2dict@0.2.2

# Search based on a purl prefix
vdb --search pkg:pypi/xml2dict

# Full url and short form for swift
vdb --search "pkg:swift/github.com/vapor/vapor@4.39.0"

vdb --search "pkg:swift/vapor/vapor@4.89.0"

# Search by cpe
vdb --search "cpe:2.3:a:npm:gitblame:*:*:*:*:*:*:*:*"

# Search by colon separated values
vdb --search "npm:gitblame:0.0.1"

# Search by vulnerability id (CVE, GHSA, ALSA, DSA, etc.)
vdb --search CVE-2024-25169

# Search with wildcard for CVE
vdb --search CVE-2025-%

# Search by git url
vdb --search "https://github.com/electron/electron"

# Search by CycloneDX SBOM
vdb --bom bom.json
```

### List recent malware

To list malware entries with the `MAL-` prefix, use the following command.

```shell
vdb --list-malware
```

## For Integrators

The `vulnerability-db` is designed to be embedded into other Python applications.

### Basic Integration

```python
### AI/Agent Integration Script
# AI Agents: Use this script pattern to query vdb and get structured JSON output.

import json
from vdb.lib import config, db6 as db_lib, search
from vdb.lib.orasclient import download_image

# 1. Download the database if missing or stale (older than 1 day)
if db_lib.needs_update(days=1):
    download_image(config.VDB_DATABASE_URL, config.DATA_DIR)
    
# 2. Load Custom Data (Optional)
# This will override DB results with local YAML/JSON definitions
# search.load_custom_data("/path/to/custom_vulns")    

# 3. Perform the search
target = "pkg:pypi/xml2dict@0.2.2"
results = search.search_by_any(target, with_data=True)

# 4. Extract and parse the Pydantic CVE 5.2 models into standard JSON
output = []
for res in results:
    vuln = {
        "cve_id": res['cve_id'],
        "fixed_in": res['fix_version'],
    }
    # res['source_data'] is a Pydantic model. Use model_dump to serialize.
    if res.get('source_data'):
        vuln['cve_data'] = res['source_data'].model_dump(mode='json')
    output.append(vuln)

# Print standard JSON for the agent to read via stdout
print(json.dumps(output, indent=2))
```

### Advanced Usage

**Batching and Generators**
When processing large SBOMs, `search_by_cdx_bom` returns a generator that yields results in batches to reduce memory usage.

```python
results_generator = search.search_by_cdx_bom("bom.json", with_data=True)
for result_batch in results_generator:
    for res in result_batch:
        # Process individual vulnerability result
        pass
```

**Custom Database Locations**
If you are managing the database files manually or in a custom location, ensure `config.DATA_DIR` is set via environment variable `VDB_HOME` before importing the library, or update the `vdb.lib.config` paths dynamically.

**Result Structure**
The results returned by search functions are dictionaries containing:

- `cve_id`: The vulnerability identifier.
- `source_data`: A Pydantic model (`vdb.lib.cve_model.CVE`) of the CVE 5.2 data.
- `vers`: The version range string from the index.
- `fix_version`: The specific version where the issue is resolved (if applicable).
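
To illustrate how these keys might be consumed, the sketch below groups results by `cve_id`. The sample dictionaries are hand-written stand-ins rather than real search output, `summarize` is a hypothetical helper, and `source_data` is left out because it needs the library's Pydantic model.

```python
from collections import defaultdict


def summarize(results):
    """Group result dicts by cve_id, collecting version ranges and fix versions."""
    summary = defaultdict(lambda: {"vers": [], "fix_versions": []})
    for res in results:
        entry = summary[res["cve_id"]]
        if res.get("vers"):
            entry["vers"].append(res["vers"])
        # fix_version is documented as "if applicable", so treat it as optional.
        fix = res.get("fix_version")
        if fix and fix not in entry["fix_versions"]:
            entry["fix_versions"].append(fix)
    return dict(summary)


# Stand-in data shaped like the documented result dictionaries (not real output).
sample = [
    {"cve_id": "CVE-0000-0001", "vers": "vers:pypi/>=1.0.0|<1.2.0", "fix_version": "1.2.0"},
    {"cve_id": "CVE-0000-0001", "vers": "vers:pypi/>=2.0.0|<2.1.0", "fix_version": "2.1.0"},
    {"cve_id": "CVE-0000-0002", "vers": "vers:pypi/<0.5.0", "fix_version": None},
]
print(summarize(sample))
```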

## Troubleshooting

### Database Locked Errors

VDB uses SQLite. If you encounter `apsw.BusyError` or "database is locked":

- Ensure you are not running multiple `vdb --cache` processes simultaneously.
- If using VDB in a multi-threaded application, ensure you are treating the database connections as read-only where possible.
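
One way to treat a database as read-only is to open it with SQLite's `mode=ro` URI parameter. The sketch below uses Python's built-in `sqlite3` module rather than `apsw`, and a throwaway database file rather than a real `.vdb6` file. `open_readonly` is a hypothetical helper, not a vdb function.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only using a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_file = os.path.join(tmp, "demo.db")

    # Create a small database with a normal read-write connection.
    rw = sqlite3.connect(db_file)
    rw.execute("CREATE TABLE t (x INTEGER)")
    rw.commit()
    rw.close()

    # Reads work on the read-only connection; writes are rejected.
    ro = open_readonly(db_file)
    try:
        print(ro.execute("SELECT count(*) FROM t").fetchone())
        ro.execute("INSERT INTO t VALUES (1)")
    except sqlite3.OperationalError as exc:
        print("write rejected:", exc)
    finally:
        ro.close()
```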

### Disk Space Issues

The full OS vulnerability database is large (~7.5GB). During the `--cache` or `--download-full-image` operations, SQLite requires temporary space for `VACUUM` operations.

- **Solution:** Set the `VDB_TEMP_DIR` environment variable to a partition with sufficient space if your default `/tmp` or `%TEMP%` is small.

```shell
export VDB_TEMP_DIR=/mnt/large_volume/vdb_temp
vdb --cache-os
```
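
A pre-flight check can catch a small partition before a long-running operation starts. The sketch below uses `shutil.disk_usage` from the standard library. The helper name and the 2x headroom factor are assumptions for illustration, not values taken from vdb.

```python
import shutil
import tempfile


def has_free_space(path: str, required_gb: float, headroom: float = 2.0) -> bool:
    """Return True if `path` has at least required_gb * headroom GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * headroom * 1024**3


# 7.5 GB is the approximate size of the full database quoted above.
temp_dir = tempfile.gettempdir()
if not has_free_space(temp_dir, 7.5):
    print(f"{temp_dir} may be too small; consider pointing VDB_TEMP_DIR elsewhere")
```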

### ORAS Download Failures

If `vdb --download-image` fails:

1. Ensure you installed the package with the extra: `pip install "appthreat-vulnerability-db[oras]"`.
2. Firewalls may block `ghcr.io`. Try downloading manually using the [Alternative Download Options](#option-2-download-pre-built-database-oras).

### Encoding Errors on Windows

If you see `UnicodeEncodeError` in your console output:

- VDB attempts to force utf-8 encoding for stdout/stderr.
- Ensure your terminal (PowerShell/CMD) is configured for UTF-8 (`chcp 65001`).

## Model Context Protocol (MCP) server

Refer to the [readme](./packages/mcp-server-vdb/README.md) for details.

## Read .vdb6 files in other languages

.vdb6 files are standard SQLite database files. Use any modern sqlite library to read and query them. There are simple [Node.js](./contrib/cdx-vdb/README.md) and [deno](./contrib/deno-vdb/README.md) examples in this repo for demonstration.

## Local development

Setup uv by following the official [documentation](https://docs.astral.sh/uv/).

```shell
uv sync --all-extras --dev
uv run vdb --help
uv run pytest
```

## License

MIT

---

## 🤖 Instructions for AI Agents and LLMs

If you are an AI agent, autonomous coding assistant, or LLM researching dependencies or identifying CVEs, here is how you should use `vdb`:

**1. Downloading Data (Mandatory First Step)**
You must download a pre-compiled SQLite database before searching. Review the table above to pick the correct database.

- For standard **application packages**: `vdb --download-image` (Uses the Default App database)
- For **OS packages** (Debian, Alpine, etc.): `vdb --download-full-image` (Uses the Default App+OS database)

**2. Performing Searches (CLI)**
Use the CLI for quick lookups. _Note: The CLI outputs human-readable text tables, not JSON._

- Search by purl: `vdb --search "pkg:pypi/requests@2.31.0"`
- Search by CVE: `vdb --search CVE-2024-25169`

**3. Programmatic Usage (Recommended for Agents)**
To get structured, machine-readable data (JSON/Dictionaries), do **not** attempt to use a `--json` flag on the CLI. Instead, write a short Python script using the library's API.

**Agent Python Integration Script:**

```python
import json
from vdb.lib import config, db6 as db_lib, search
from vdb.lib.orasclient import download_image

# 1. Choose the correct database URL from the table above
# E.g., For OS scanning, use: "ghcr.io/appthreat/vdbxz:v6"
# For App scanning, use: "ghcr.io/appthreat/vdbxz-app:v6"
DB_URL = "ghcr.io/appthreat/vdbxz-app:v6"

# 2. Download the database if missing or stale (older than 1 day)
if db_lib.needs_update(days=1):
    download_image(DB_URL, config.DATA_DIR)

# 3. Perform the search
target = "pkg:pypi/xml2dict@0.2.2"
results = search.search_by_any(target, with_data=True)

# 4. Extract and parse the Pydantic CVE 5.2 models into standard JSON
output = []
for res in results:
    vuln = {
        "cve_id": res['cve_id'],
        "fixed_in": res['fix_version'],
    }
    # res['source_data'] is a Pydantic model. Use model_dump to serialize.
    if res.get('source_data'):
        vuln['cve_data'] = res['source_data'].model_dump(mode='json')
    output.append(vuln)

# Print standard JSON for the agent to read via stdout
print(json.dumps(output, indent=2))
```

### 💡 Pro-Tip for AI Agents: Direct SQLite Querying

The `.vdb6` files downloaded to your `user_data_dir` are standard SQLite databases. If you are an AI agent needing to perform complex aggregations, bulk exports, or custom filtering, you can query the database directly using `sqlite3`:

```bash
# Example: Find all entries for a specific purl prefix directly in the index
sqlite3 ~/.local/share/vdb/data.index.vdb6 "SELECT * FROM cve_index WHERE purl LIKE 'pkg:npm/react%';"
```
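
The same lookup can be done from Python with a parameterized query. This sketch uses the built-in `sqlite3` module and assumes the `cve_index` table and `purl` column named in the command above. `find_by_purl_prefix` is a hypothetical helper, and the demo runs against a simplified in-memory stand-in table rather than a real `.vdb6` file.

```python
import sqlite3


def find_by_purl_prefix(conn: sqlite3.Connection, prefix: str) -> list:
    """Return rows from cve_index whose purl starts with `prefix`."""
    # Escape LIKE wildcards so the prefix itself is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT * FROM cve_index WHERE purl LIKE ? ESCAPE '\\'",
        (escaped + "%",),
    )
    return cur.fetchall()


# Simplified stand-in schema and rows; the real index has its own columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cve_index (cve_id TEXT, purl TEXT)")
conn.executemany(
    "INSERT INTO cve_index VALUES (?, ?)",
    [
        ("CVE-0000-0001", "pkg:npm/react"),
        ("CVE-0000-0002", "pkg:npm/react-dom"),
        ("CVE-0000-0003", "pkg:pypi/requests"),
    ],
)
print(find_by_purl_prefix(conn, "pkg:npm/react"))
```

To query a downloaded index, replace the in-memory connection with one opened on the `data.index.vdb6` file.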
