Metadata-Version: 2.4
Name: iocflow
Version: 0.1.0
Summary: Extract threat indicators (IOCs) from unstructured text — IPs, domains, URLs, hashes, CVEs, MITRE techniques, threat actors, and malware families. Layer 1 of an IOC-lifecycle toolkit.
Project-URL: Homepage, https://github.com/vinayvobbili/iocflow
Project-URL: Repository, https://github.com/vinayvobbili/iocflow
Project-URL: Issues, https://github.com/vinayvobbili/iocflow/issues
Author-email: Vinay Vobbilichetty <vinayvobbilichetty11@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Vinay Vobbilichetty
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: cve,cybersecurity,dfir,extraction,indicators-of-compromise,ioc,malware,mitre-attack,threat-actor,threat-intelligence
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: tldextract>=3.4
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: requests>=2.25; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: mitre
Requires-Dist: requests>=2.25; extra == 'mitre'
Description-Content-Type: text/markdown

# iocflow

[![CI](https://github.com/vinayvobbili/iocflow/actions/workflows/ci.yml/badge.svg)](https://github.com/vinayvobbili/iocflow/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/iocflow)](https://pypi.org/project/iocflow/)
[![Python](https://img.shields.io/pypi/pyversions/iocflow)](https://pypi.org/project/iocflow/)
[![License](https://img.shields.io/pypi/l/iocflow)](https://github.com/vinayvobbili/iocflow/blob/main/LICENSE)

Pull **indicators of compromise** out of unstructured text (threat-intel
reports, advisories, emails, tickets) in one call. iocflow extracts IPs,
domains, URLs, filenames, file hashes, CVEs, email addresses, MITRE ATT&CK
technique IDs, threat actors, and malware families, with the false-positive
defenses you'd otherwise write by hand: a Public Suffix List domain validator,
benign-domain/IP allowlists, hash de-duplication across MD5/SHA1/SHA256, and
re-fanging of defanged IOCs.

```python
from iocflow import extract

text = """
APT28 (a.k.a. Fancy Bear) staged Cobalt Strike from evil-domain[.]ru and
185.220.101.5, dropping install.ps1 (MD5 a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4).
Exploited CVE-2021-44228 via T1190. Contact: ops@evil-domain[.]ru.
"""

entities = extract(text)
print(entities.summary())
# 1 IPs, 1 domains, 1 filenames, 1 hashes, 1 CVEs, 1 emails, 1 threat actors, 1 MITRE techniques

for ind in entities.iter_indicators():
    print(ind.kind, ind.value)
# ip 185.220.101.5
# domain evil-domain.ru
# ...
```

The defanged `evil-domain[.]ru` and `ops@evil-domain[.]ru` are re-fanged
automatically. `185.220.101.5` is kept because it is a public address; private
and allowlisted benign IPs are dropped.
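
To make "re-fanging" concrete, here is a minimal standard-library sketch of the
idea. It is not iocflow's implementation: `refang_sketch` and the handful of
defang notations it covers are assumptions chosen for illustration, and the
library's own `refang_text` may recognize a different set.

```python
import re

# Hypothetical illustration only; iocflow's refang_text may cover other notations.
# Order matters: bracketed separators are restored before the hxxp scheme fix.
_DEFANG_PATTERNS = [
    (re.compile(r"\[\.\]|\(\.\)|\{\.\}|\[dot\]|\(dot\)", re.IGNORECASE), "."),
    (re.compile(r"\[@\]|\[at\]|\(at\)", re.IGNORECASE), "@"),
    (re.compile(r"\[://\]|\[:\]//"), "://"),
    (re.compile(r"\bhxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang_sketch(text: str) -> str:
    """Undo common defang notations so ordinary IOC regexes can match."""
    for pattern, replacement in _DEFANG_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(refang_sketch("hxxps://evil-domain[.]ru/login"))  # https://evil-domain.ru/login
```

Running the substitution before extraction means the domain, URL, and email
patterns only ever have to handle the normal forms.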

## Install

```bash
pip install iocflow              # core — one dependency (tldextract)
pip install "iocflow[mitre]"     # + a ready-made MITRE ATT&CK malware-name source
```

## What it extracts

`extract(text)` returns an `ExtractedEntities` with:

- `ips` — public IPv4, excluding private ranges, benign IPs, and version-number-like values
- `domains` — validated against the Mozilla Public Suffix List via `tldextract`
- `urls` — both `https://…` and bare `host/path` forms (so package-registry paths survive)
- `filenames` — suspicious script/executable/macro/archive filenames
- `hashes` — `{"md5": [...], "sha1": [...], "sha256": [...]}`, de-duplicated across lengths
- `cves` — `CVE-YYYY-NNNN+`, normalized to uppercase
- `emails`
- `mitre_techniques` — `T1059`, `T1059.001`, …
- `threat_actors` (+ `threat_actors_enriched`) — APT/UNC/FIN/TA/DEV/STORM designators,
  a curated well-known list, and the `"<Name> ransomware"` pattern
- `malware_families` — populated when you supply a malware-name source (see below)
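
As a rough illustration of what "de-duplicated across lengths" can mean for
`hashes`, the sketch below classifies hex runs by length and refuses to carve a
shorter hash out of a longer one. The function name `bucket_hashes_sketch` and
the exact rules are assumptions; iocflow's own logic may differ.

```python
import re

# Hypothetical illustration, not iocflow's code.
# A hex run must be bounded by non-hex characters, so a 64-character SHA256
# can never also be reported as a 32-character MD5 substring.
_HEX_RUN = re.compile(r"(?<![0-9A-Fa-f])[0-9A-Fa-f]{32,64}(?![0-9A-Fa-f])")
_KIND_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def bucket_hashes_sketch(text: str) -> dict:
    """Group hashes by algorithm, lowercased, each value reported once."""
    buckets = {"md5": [], "sha1": [], "sha256": []}
    for match in _HEX_RUN.finditer(text):
        value = match.group(0).lower()
        kind = _KIND_BY_LENGTH.get(len(value))  # other lengths are ignored
        if kind and value not in buckets[kind]:
            buckets[kind].append(value)
    return buckets
```

The result has the same `{"md5": [...], "sha1": [...], "sha256": [...]}` shape
as the `hashes` field described above.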

Each individual extractor is also importable and composable:

```python
from iocflow import extract_ips, extract_hashes, refang_text
extract_ips(refang_text("c2 at 185[.]220[.]101[.]5"))   # ['185.220.101.5']
```
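
For readers curious how the `ips` rules listed under "What it extracts" might
look in practice, here is a standard-library approximation. Everything in it is
an assumption for illustration: the name `public_ipv4_sketch`, the two-entry
benign set, and the "longer dotted run means version number" heuristic are not
taken from iocflow's source.

```python
import ipaddress
import re

# Hypothetical illustration, not iocflow's code.
# Reject candidates embedded in longer dotted-number runs such as "2.4.1.0.3",
# while still accepting an address followed by sentence punctuation.
_IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")
# Placeholder allowlist; the library's real benign-IP list is not shown here.
_BENIGN_IPS = {"8.8.8.8", "1.1.1.1"}


def public_ipv4_sketch(text: str) -> list:
    """Return globally routable IPv4 addresses, in first-seen order."""
    found = []
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            ip = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. an octet above 255
        if not ip.is_global or candidate in _BENIGN_IPS:
            continue  # private, loopback, reserved, or allowlisted
        if candidate not in found:
            found.append(candidate)
    return found
```

Leaning on `ipaddress.IPv4Address.is_global` avoids maintaining a hand-written
list of private and reserved ranges.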

## Pluggable name sources

The core has **no external-data dependency**. Two enrichment sources are
optional and supplied by you, so iocflow drops cleanly into any environment —
plug in your own feeds, or use the bundled MITRE extra.

**Malware families.** Give `extract` a `MalwareNames` and it matches families
(with alias-to-canonical normalization) behind a three-layer false-positive
defense. Build one from your own list, from MITRE-shaped records, or from the
optional extra:

```python
from iocflow import extract, MalwareNames

# Your own list:
names = MalwareNames.from_names(["Cobalt Strike", "Emotet", "Qakbot"])
entities = extract(report_text, malware_names=names)

# Or the bundled MITRE ATT&CK source (needs: pip install "iocflow[mitre]"):
from iocflow.mitre import mitre_malware_names
entities = extract(report_text, malware_names=mitre_malware_names())
```
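
Alias-to-canonical normalization can be pictured with a small sketch like the
one below. It is not how `MalwareNames` is implemented; the helper name
`build_alias_matcher_sketch` and its input shape (a plain alias-to-canonical
dict) are hypothetical, and it shows only a whole-word check, not the library's
three-layer defense.

```python
import re

# Hypothetical illustration, not the MalwareNames implementation.


def build_alias_matcher_sketch(alias_to_canonical: dict):
    """Return a function mapping text to canonical family names it mentions."""
    lookup = {alias.lower(): canon for alias, canon in alias_to_canonical.items()}
    if not lookup:
        return lambda text: []
    # Longest aliases first so a longer name wins over a prefix of itself.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(a) for a in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )

    def find_families(text: str) -> list:
        seen = []
        for match in pattern.finditer(text):
            canonical = lookup[match.group(0).lower()]
            if canonical not in seen:
                seen.append(canonical)
        return seen

    return find_families


find = build_alias_matcher_sketch({"Qakbot": "Qakbot", "QBot": "Qakbot", "Emotet": "Emotet"})
print(find("QBot loaders, later emotet"))  # ['Qakbot', 'Emotet']
```

The whole-word lookarounds keep an alias from matching inside a longer token,
which is one plausible false-positive guard among several.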

**Threat-actor aliases.** Give `extract` an `ActorAliases` to match a custom
name set and enrich actors with `common_name` / `region` / `all_names`. Without
it, actors are still found by pattern and curated list:

```python
from iocflow import extract, ActorAliases

aliases = ActorAliases.from_index({
    "apt28": {"common_name": "APT28", "region": "Russia",
              "all_names": ["Fancy Bear", "Sofacy", "Sednit"]},
})
entities = extract(report_text, actor_aliases=aliases)
entities.threat_actors_enriched[0].region        # "Russia"
entities.threat_actors_enriched[0].aliases_display()  # "Fancy Bear, Sofacy, Sednit"
```
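
The "found by pattern" path for designators such as APT28 or Storm-0558 can be
approximated with a single regular expression. The sketch below is an
assumption about what such a pattern could look like, based on commonly seen
designator formats; the name `actor_designators_sketch`, the digit counts, and
the uppercase normalization are illustrative and not taken from iocflow.

```python
import re

# Hypothetical illustration, not iocflow's pattern.
_DESIGNATOR = re.compile(
    r"\b(?:(?:APT|UNC|FIN|TA)[- ]?\d{1,4}|(?:DEV|STORM)-\d{4})\b",
    re.IGNORECASE,
)


def actor_designators_sketch(text: str) -> list:
    """Return normalized actor designators in first-seen order."""
    found = []
    for match in _DESIGNATOR.finditer(text):
        raw = match.group(0).upper()
        prefix = re.match(r"[A-Z]+", raw).group(0)
        digits = re.search(r"\d+", raw).group(0)
        # "APT-28" and "apt 28" collapse to "APT28"; DEV-/STORM- keep the hyphen.
        if prefix in {"DEV", "STORM"}:
            name = f"{prefix}-{digits}"
        else:
            name = f"{prefix}{digits}"
        if name not in found:
            found.append(name)
    return found
```

Normalizing before de-duplicating is what lets spelling variants of the same
designator count as one actor.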

## Command line

```bash
iocflow "APT28 used 185.220.101.5 and evil[.]example[.]com"
echo "report text…" | iocflow --json
iocflow --mitre "Emotet dropped Cobalt Strike"     # needs iocflow[mitre]
```

## Where this is going

iocflow is **Layer 1** of an IOC-lifecycle toolkit. The plan is to grow it in
independently-useful layers, each behind its own pip extra: enrichment
(VirusTotal, Recorded Future, AbuseIPDB, Shodan, abuse.ch), AI commentary,
suggested hunts, and optional perimeter blocking — each configured by plugging
in your own API keys. `ExtractedEntities` (and its `iter_indicators()` view) is
the stable hand-off type those layers consume.

## License

MIT
