Metadata-Version: 2.4
Name: hashscanner
Version: 0.1.0
Summary: Python client for the HashScanner API — query 1.5B+ NIST NSRL known file hashes by API.
Project-URL: Homepage, https://www.hashscanner.com
Project-URL: Documentation, https://www.hashscanner.com/api
Project-URL: Source, https://github.com/hashscanner/hashscanner-python
Project-URL: Issue Tracker, https://github.com/hashscanner/hashscanner-python/issues
Project-URL: Sign up (free API key), https://www.hashscanner.com/register
Author: HashScanner
License: MIT
License-File: LICENSE
Keywords: dfir,forensics,hash,hashscanner,lookup,md5,nist,nsrl,rds,sha1,sha256
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: requests>=2.25
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == 'dev'
Description-Content-Type: text/markdown

# HashScanner Python Client

Python client for the **[HashScanner](https://www.hashscanner.com) API**: check MD5, SHA-1, or SHA-256 hashes against **1.5 billion+ known file hashes from the NIST NSRL**, one at a time or in bulk.

HashScanner puts the NIST National Software Reference Library (NSRL) online so you can filter the **known** files out of your data and focus on the unknown, without downloading and maintaining the ~700 GB Reference Data Set (RDS) yourself.

> 🔑 **You need a free API key.** Create an account at **<https://www.hashscanner.com/register>** — every plan (including the free tier) includes API access. Your key is in your dashboard.

## Install

```bash
pip install hashscanner
```

## Quick start

```python
from hashscanner import Client

hs = Client("hs_xxxx_sk_xxxx")          # or set HASHSCANNER_API_KEY

result = hs.lookup("d41d8cd98f00b204e9800998ecf8427e")
if result.found:
    print(result.type, result.file_name, result.product, "via", result.source)
else:
    print("not in NSRL — worth a closer look")
```

A match means the file is **known** (cataloged in NSRL) — not that it is safe, clean, or
malicious. Use it to set aside files you already recognise.
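
To look up a file on disk you first need its digest. The following is a minimal sketch that uses only the standard library's `hashlib`; the helper name `hash_file` is hypothetical and is not part of the `hashscanner` package.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new(), for example
    "md5", "sha1" or "sha256".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the client from the quick start, the result can be passed straight to a lookup, for example `hs.lookup(hash_file("sample.bin", "md5"))`, where `sample.bin` is a placeholder path.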

### Bulk lookups (async)

For large sets — up to 100,000 hashes per job — submit a bulk job. The client handles the
submit → poll → download flow for you:

```python
hashes = ["d41d8cd9...", "da39a3ee...", ...]

# JSON: returns a list of result dicts
for record in hs.bulk(hashes):
    print(record["hash"], record["found"])

# CSV: returns the raw CSV text
csv_text = hs.bulk(hashes, format="csv")
```

Prefer to drive the steps yourself?

```python
job = hs.submit_bulk(hashes, format="json")   # -> BulkJob (queued)
job = hs.wait(job, poll_interval=3)           # poll until completed/failed
for record in hs.iter_results(job):           # stream NDJSON results
    ...
```
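
Because a job accepts at most 100,000 hashes, a longer list has to be split across several jobs. A minimal sketch of that batching, assuming the caller does the splitting (whether the client does this for you is not stated here); `chunked` and `all_hashes` are hypothetical names:

```python
def chunked(items, size=100_000):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage with the client from the quick start (not executed here):
# for batch in chunked(all_hashes):
#     for record in hs.bulk(batch):
#         print(record["hash"], record["found"])
```

Each batch becomes its own bulk job, so a list of 250,001 hashes would be submitted as three jobs.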

### Concurrent lookups for a handful of hashes

```python
results = hs.lookup_many(["<hash1>", "<hash2>", "<hash3>"])
```
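
Before sending a batch, it can help to discard malformed entries and duplicates locally. The sketch below classifies a string by its hex length (32 for MD5, 40 for SHA-1, 64 for SHA-256). The helper names are hypothetical, and lowercasing is an assumption: the client or API may already normalise case.

```python
import string

# Hex digest lengths of the three supported algorithms.
_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
_HEX = set(string.hexdigits)


def classify_hash(value):
    """Return 'md5', 'sha1' or 'sha256', or None if `value` is not a hex digest."""
    value = value.strip()
    if not value or any(char not in _HEX for char in value):
        return None
    return _LENGTHS.get(len(value))


def clean_hashes(values):
    """Drop invalid entries and duplicates, keeping first-seen order."""
    seen = {}
    for value in values:
        if classify_hash(value):
            # Assumption: lowercase is an acceptable canonical form.
            seen.setdefault(value.strip().lower(), None)
    return list(seen)
```

The cleaned list can then be handed to `hs.lookup_many(...)` or `hs.bulk(...)`.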

## Command line

The package installs a `hashscanner` command:

```bash
export HASHSCANNER_API_KEY="hs_xxxx_sk_xxxx"

# single lookup
hashscanner lookup d41d8cd98f00b204e9800998ecf8427e
hashscanner lookup d41d8cd9... --json

# bulk: input is one hash per line (use '-' for stdin); output is JSON (NDJSON) or CSV
hashscanner bulk hashes.txt
hashscanner bulk hashes.txt --format csv -o results.csv
cat hashes.txt | hashscanner bulk -
```
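
NDJSON output is one JSON object per line, so it can be post-processed with the standard library alone. A minimal sketch that separates known from unknown hashes, assuming each record carries the same `hash` and `found` keys as the Python bulk results; `split_ndjson` and the file name are hypothetical:

```python
import json


def split_ndjson(lines):
    """Partition NDJSON records into (known, unknown) lists of hashes."""
    known, unknown = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumption: every record has "hash" and "found" keys.
        (known if record["found"] else unknown).append(record["hash"])
    return known, unknown


# Usage on saved bulk output (placeholder file name, not executed here):
# with open("results.ndjson") as handle:
#     known, unknown = split_ndjson(handle)
```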

## Errors

All errors derive from `hashscanner.HashScannerError`:

| Exception | When |
|---|---|
| `AuthenticationError` | 401: API key missing or invalid |
| `SubscriptionInactiveError` | 403: subscription inactive; renew or upgrade |
| `RateLimitError` | 429: per-minute rate limit or monthly quota exceeded (`.retry_after`, `.reset`) |
| `BadRequestError` | 400: malformed hash or job too large |
| `NotFoundError` | 404: unknown or expired bulk job |
| `JobFailedError` | bulk job finished with status `failed` |
| `APIError` | any other non-2xx response |

A single-lookup miss is **not** an error — it returns `LookupResult(found=False)`.
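
One way to handle a 429 is to wait and try again. The sketch below is a generic retry wrapper, not part of the `hashscanner` package. It assumes `.retry_after` is a number of seconds, which is not confirmed here, and falls back to a fixed wait when the attribute is missing.

```python
import time


def call_with_retry(fn, retry_on, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call `fn()`; when it raises `retry_on`, wait and try again.

    Gives up and re-raises after `max_attempts` failed calls.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Assumption: `retry_after` is in seconds when present.
            wait = getattr(exc, "retry_after", None)
            sleep(float(wait) if wait is not None else default_wait)


# Usage (not executed here; the import path for RateLimitError is assumed):
# from hashscanner import RateLimitError
# result = call_with_retry(lambda: hs.lookup(some_hash), RateLimitError)
```

Keeping `max_attempts` small matters because the same exception also covers an exhausted monthly quota, which a short wait will not clear.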

## Links

- API documentation: <https://www.hashscanner.com/api>
- Pricing & limits: <https://www.hashscanner.com/pricing>
- Sign up (free): <https://www.hashscanner.com/register>

## License

MIT
