Metadata-Version: 2.4
Name: aisoc
Version: 0.1.0
Summary: An AI SOC kernel: an event-sourced, multi-agent SOC where a role-based team of LLM agents works a case over a shared event bus — with case memory, human-in-the-loop gating, and a replayable audit log. Bring your own LLM, alert source, and tools.
Project-URL: Homepage, https://github.com/vinayvobbili/aisoc
Project-URL: Repository, https://github.com/vinayvobbili/aisoc
Project-URL: Issues, https://github.com/vinayvobbili/aisoc/issues
Author-email: Vinay Vobbilichetty <vinayvobbilichetty11@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Vinay Vobbilichetty
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: ai-soc,case-memory,dfir,event-sourcing,human-in-the-loop,incident-response,langgraph,llm-agents,multi-agent,security-operations,soar,soc,threat-intel
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: pydantic>=2
Provides-Extra: agent
Requires-Dist: langchain-core>=0.3; extra == 'agent'
Requires-Dist: langgraph>=0.2; extra == 'agent'
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: langchain-core>=0.3; extra == 'dev'
Requires-Dist: langgraph>=0.2; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pydantic>=2; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: redis>=4.5; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: redis
Requires-Dist: redis>=4.5; extra == 'redis'
Description-Content-Type: text/markdown

# aisoc 🛡️🤖

**An AI SOC kernel** — a Security Operations Center modeled as a *team*, not a
classifier. A set of role-based LLM agents (triage, Tier 2, IR Lead, Threat
Intel, Threat Hunter, Detection Engineer, and a SOC Manager over the top) work a
case together over a shared, **event-sourced** bus, with a **human in the loop**
on every consequential action — and a **case memory** so the team gets better
over a campaign instead of re-deriving the same conclusion ticket by ticket.

> A single triage prompt can *label* an alert. It can't *run a case* — pull the
> context, weigh it against what the org has seen before, propose a containment,
> and leave a cited record of why. aisoc is the kernel for the second thing.

aisoc owns the **kernel**: the event contract, the bus, the case-memory read
models, and the agent framework. It owns **none of your environment**. You
inject three seams — an LLM, an alert source, and a tool registry — so the same
agent code runs in a unit test, a zero-infra demo, or production against your
real SOAR/EDR/SIEM.

---

## Why event-sourced?

Agents don't call each other directly. Each one **consumes** events off a shared
bus and **publishes** its own. That single decision buys a lot:

- **Auditable end to end.** Every verdict, escalation, and human decision is an
  event on the log — nothing is implicit.
- **Replayable by construction.** Reconstruct any case from the log, or replay a
  recorded log through the agents offline to backtest a prompt change.
- **Composable.** Add, pause, or swap a role without rewiring the others.
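
To make those three properties concrete, here is a minimal, stdlib-only sketch of the pattern. It is not aisoc's bus: `ToyBus`, `rebuild_case`, and the lowercase `event_type` strings are hypothetical names invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyBus:
    """Hypothetical append-only bus: every publish lands on one audit log."""
    log: list = field(default_factory=list)
    subscribers: dict = field(default_factory=lambda: defaultdict(list))

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event):
        self.log.append(event)                      # audit first, then deliver
        for handler in self.subscribers[event["event_type"]]:
            handler(event)


def rebuild_case(log, ticket_id):
    """Reconstruct one case purely from the log; no agent state is needed."""
    return [e["event_type"] for e in log if e["ticket_id"] == ticket_id]


toy_bus = ToyBus()

# A "triage" role: consumes alerts, publishes a verdict. It never calls Tier 2.
toy_bus.subscribe("alert_received", lambda e: toy_bus.publish(
    {"event_type": "alert_triaged", "ticket_id": e["ticket_id"], "verdict": "suspicious"}))
# A "tier2" role: consumes verdicts, publishes an escalation.
toy_bus.subscribe("alert_triaged", lambda e: toy_bus.publish(
    {"event_type": "case_escalated", "ticket_id": e["ticket_id"]}))

toy_bus.publish({"event_type": "alert_received", "ticket_id": "T-1"})
timeline = rebuild_case(toy_bus.log, "T-1")
print(timeline)  # ['alert_received', 'alert_triaged', 'case_escalated']
```

Removing or swapping a role here is just a change to one `subscribe` call; the other role and the log are untouched.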

---

## Install

```bash
pip install aisoc                 # core: event contract + in-memory bus
pip install "aisoc[redis]"        # durable, multi-process bus (Redis Streams)
pip install "aisoc[agent]"        # the LangGraph role agents
```

Requires Python 3.10+.

## Quick start — zero infrastructure

The in-memory bus needs no Redis, no daemon, nothing. It's the default for the
demo, the test suite, and the offline backtest.

```python
from aisoc import InMemoryBus, AlertTriaged, STREAM_TRIAGE, parse_event

bus = InMemoryBus()

bus.publish(STREAM_TRIAGE, AlertTriaged(
    correlation_id="TICKET-42", produced_by="triage", ticket_id="TICKET-42",
    verdict="true_positive_malicious", confidence=0.93,
    summary="beaconing to known-bad C2", priority_score=8,
))

# Read the audit log back — every event is mirrored there.
for raw in bus.replay():
    event = parse_event(raw)              # typed model, dispatched on event_type
    print(event.event_type, event.ticket_id, event.verdict)
```

Consumer-group delivery (one event → one consumer per group, held pending until
acked, redelivered on failure) works the same on the in-memory bus as on Redis:

```python
batch = bus.consume_batch([STREAM_TRIAGE], group="tier2", consumer="worker-1")
for stream, msg_id, event in batch:
    ...                                   # do the work
    bus.ack(stream, "tier2", msg_id)      # ack so it isn't redelivered
```
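
If the pending/ack semantics are unfamiliar, the following toy may help. It is a sketch of the general consumer-group idea under simplifying assumptions (one stream, one group, manual redelivery), not how `InMemoryBus` or the Redis bus is implemented. `ToyGroup` and its methods are hypothetical.

```python
from collections import deque


class ToyGroup:
    """Hypothetical single-stream consumer group with pending/ack semantics."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))   # (msg_id, payload), not yet delivered
        self.pending = {}                         # msg_id -> payload, delivered but unacked

    def consume(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload            # held until acked
        return msg_id, payload

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)

    def redeliver_pending(self):
        """Put unacked messages back at the front, e.g. after a worker crash."""
        for msg_id in sorted(self.pending, reverse=True):
            self.queue.appendleft((msg_id, self.pending.pop(msg_id)))


group = ToyGroup(["alert-a", "alert-b"])

first = group.consume()        # one worker takes alert-a and acks it
group.ack(first[0])
second = group.consume()       # another takes alert-b, then "crashes" before acking
group.redeliver_pending()
retry = group.consume()        # alert-b comes back; alert-a does not
```

The point of the ack is visible in the last line: only the message whose work never completed is handed out again.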

## The three seams

aisoc ships no integrations of its own; you inject them (see `aisoc.seams`).
The seams are `runtime_checkable` `typing.Protocol` classes, so any object with
the right methods satisfies them; you never subclass anything.

- **`ChatModel`** — the LLM the agents reason with. Any object with an
  `invoke()` satisfies it; a LangChain `BaseChatModel` drops in directly, so you
  can point it at OpenAI, a local model, or a failover wrapper.
- **`AlertSource`** — where cases come from. Implement `poll()` to pull from
  your SOAR/SIEM/ticketing system and yield normalized `AlertReceived` events.
- **`ToolProvider`** — the tools a role may call. `tools_for(role)` returns the
  callables that role is allowed to use; aisoc fans them out per role and never
  inspects them.

Swap a real model for a stub, a live SOAR for a fixture — the same agent code
runs in a test, a demo, or production.
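
As a rough illustration of what "duck-typing is enough" means, the sketch below declares three stand-in Protocols and satisfies them with plain classes. The `*Like` names and the method signatures are assumptions made for this example; the real definitions live in `aisoc.seams` and may differ.

```python
from typing import Any, Callable, Iterable, Protocol, runtime_checkable


@runtime_checkable
class ChatModelLike(Protocol):          # hypothetical stand-in for ChatModel
    def invoke(self, messages: Any) -> Any: ...


@runtime_checkable
class AlertSourceLike(Protocol):        # hypothetical stand-in for AlertSource
    def poll(self) -> Iterable[Any]: ...


@runtime_checkable
class ToolProviderLike(Protocol):       # hypothetical stand-in for ToolProvider
    def tools_for(self, role: str) -> list[Callable[..., Any]]: ...


class StubModel:                        # no base class, just the right method
    def invoke(self, messages):
        return "benign"


class FixtureAlerts:
    def poll(self):
        yield {"ticket_id": "T-1", "title": "test alert"}


class StaticTools:
    def tools_for(self, role):
        return [len] if role == "triage" else []


seams_ok = (
    isinstance(StubModel(), ChatModelLike)
    and isinstance(FixtureAlerts(), AlertSourceLike)
    and isinstance(StaticTools(), ToolProviderLike)
)
```

Note that a `runtime_checkable` `isinstance` check only confirms the methods exist; it does not validate their signatures or return types.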

## The event contract

Every event subclasses `BusEvent` and is dispatched on a `Literal` `event_type`:

| Event | Emitted by | Meaning |
|---|---|---|
| `AlertReceived` | ingestion | a new ticket landed |
| `AlertTriaged` | triage | first-pass verdict + confidence |
| `CaseEscalated` | any role | handoff to a higher tier |
| `Tier2Analysis` | Tier 2 | refined verdict + escalation decision |
| `IRPlan` | IR Lead | written containment/eradication/recovery plan |
| `ActionProposed` | IR Lead | a real-system action awaiting human approval |
| `ActionDecision` | human | approve/reject (execution stays your integration) |
| `ThreatIntelReport` | Threat Intel | actor attribution + technique mapping |
| `DetectionTuningReport` | Detection Eng | rule-tuning opportunities over a window |
| `HuntingReport` | Threat Hunter | proactive findings (advisory) |
| `CampaignDetected` | Campaign Detector | cross-incident cluster (advisory) |
| `ShiftSummary` | SOC Manager | windowed readout of the shift |
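
Dispatching on a type tag can be sketched with stdlib dataclasses and a lookup table. This is an illustration of the idea only: aisoc's events are `BusEvent` subclasses parsed by `parse_event`, while `ToyEvent`, `parse_toy_event`, the field sets, and the tag strings below are hypothetical.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class ToyEvent:
    event_type: ClassVar[str] = "base"     # class-level tag, not an instance field
    correlation_id: str
    produced_by: str


@dataclass
class ToyAlertTriaged(ToyEvent):
    event_type: ClassVar[str] = "alert_triaged"
    verdict: str
    confidence: float


@dataclass
class ToyActionProposed(ToyEvent):
    event_type: ClassVar[str] = "action_proposed"
    action: str
    target: str


# One tag maps to exactly one class, so parsing is a dictionary lookup.
EVENT_TYPES = {cls.event_type: cls for cls in (ToyAlertTriaged, ToyActionProposed)}


def parse_toy_event(raw: dict) -> ToyEvent:
    payload = dict(raw)                    # copy so the caller's dict is untouched
    cls = EVENT_TYPES[payload.pop("event_type")]
    return cls(**payload)


parsed = parse_toy_event({
    "event_type": "alert_triaged", "correlation_id": "TICKET-42",
    "produced_by": "triage", "verdict": "true_positive_malicious", "confidence": 0.93,
})
```

An unknown tag raises `KeyError` in this toy, which is one reasonable way to keep unrecognized events from being silently mis-parsed.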

Actions are **proposed, never auto-executed** — anything with real-world impact
(containing a host, blocking an indicator) is recorded as an `ActionProposed`
and gated on a human `ActionDecision`, captured with who approved it, when, and
why.
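
Since execution stays in your integration, one way to honor the gate is to derive the "safe to execute" list from the log itself. The sketch below assumes plain-dict events with hypothetical field names (`action_id`, `decision`, `decided_by`, `reason`); check the real `ActionProposed` and `ActionDecision` models for the actual fields.

```python
def approved_actions(log):
    """Pair each proposal with its human approval; unapproved proposals are skipped."""
    proposals = {e["action_id"]: e for e in log if e["event_type"] == "action_proposed"}
    approved = []
    for e in log:
        if e["event_type"] == "action_decision" and e["decision"] == "approve":
            proposal = proposals.get(e["action_id"])
            if proposal is not None:
                approved.append((proposal["action"], proposal["target"], e["decided_by"]))
    return approved


gate_log = [
    {"event_type": "action_proposed", "action_id": "A1",
     "action": "isolate_host", "target": "ws-17"},
    {"event_type": "action_proposed", "action_id": "A2",
     "action": "block_domain", "target": "evil.example"},
    {"event_type": "action_decision", "action_id": "A1", "decision": "approve",
     "decided_by": "analyst@example.com", "reason": "confirmed C2 beaconing"},
    {"event_type": "action_decision", "action_id": "A2", "decision": "reject",
     "decided_by": "analyst@example.com", "reason": "domain is a shared CDN"},
]

to_execute = approved_actions(gate_log)
```

Only the approved proposal survives, and it carries the approver with it, so whatever executes the action can record who authorized it.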

## Case memory

Memory isn't a bolted-on vector store — it's a **read projection over the event
log** the SOC is already writing. Index the audit stream, then query it. The bus
is injected, so the whole layer runs offline on the in-memory bus:

```python
from aisoc import InMemoryBus, case_memory

bus = InMemoryBus()
# ... agents publish events onto `bus` as they work cases ...

case_memory.backfill(bus=bus)                       # fold the audit log into a case index

case_memory.recall_for_ticket("4187", bus=bus)      # prior cases sharing strong indicators
case_memory.get_case_reasoning("4187", bus=bus)     # the recorded reasoning trace (for interrogation)
case_memory.find_campaign_clusters(window_days=14)  # cross-incident campaigns
case_memory.compute_trends(window_days=30)          # per-role cost / latency / accuracy-vs-ground-truth
```

Recall follows a **retrieve mechanically, judge semantically** split: precedent
is surfaced by shared hard indicators (actor, hash, domain, IP, CVE), and the
relevance call stays with the model. Interrogation is **deterministic recall +
grounded narration**: `get_case_reasoning` returns what actually happened from
the record, never a re-derived rationale. Precedent injection into agent prompts
is flag-gated (`AISOC_CASE_RECALL=1`) so you can A/B test whether it actually helps.
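
The "read projection" idea can be sketched in a few lines: fold the log into an indicator index, then look up other tickets that share an indicator. This is a toy under assumed event shapes (dicts with a hypothetical `indicators` list and placeholder values), not the `case_memory` implementation.

```python
from collections import defaultdict


def build_index(log):
    """Fold the event log into indicator -> set of ticket ids."""
    index = defaultdict(set)
    for event in log:
        for indicator in event.get("indicators", []):
            index[indicator].add(event["ticket_id"])
    return index


def recall(index, log, ticket_id):
    """Return other tickets sharing at least one indicator, with the overlap."""
    mine = {i for e in log if e["ticket_id"] == ticket_id
            for i in e.get("indicators", [])}
    hits = defaultdict(set)
    for indicator in mine:
        for other in index[indicator]:
            if other != ticket_id:
                hits[other].add(indicator)
    return {ticket: sorted(shared) for ticket, shared in hits.items()}


memory_log = [
    {"ticket_id": "4001", "indicators": ["evil.example", "203.0.113.7"]},
    {"ticket_id": "4102", "indicators": ["CVE-0000-0001"]},   # placeholder CVE id
    {"ticket_id": "4187", "indicators": ["evil.example", "CVE-0000-0001"]},
]

index = build_index(memory_log)
precedent = recall(index, memory_log, "4187")
```

The retrieval step is purely mechanical set intersection; deciding whether ticket 4001 is actually relevant to 4187 is the part left to the model.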

## The agents

A role agent boils down to two questions: which events do I care about, and what
do I publish in response? The base (`aisoc.agents.Agent`, in the `agent` extra) owns the rest:
the consumer-group run loop, a generic tool-call loop (bind the role's tools,
invoke, run the calls it asks for, feed results back, repeat until it answers or
the per-event budget runs out), and graceful shutdown. Everything is injected —
the bus, the chat model, the tool provider — so the same class runs in a unit
test against a stub, in the demo, or in production:

```python
from aisoc import InMemoryBus
from aisoc.agents import TriageAgent

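bus = InMemoryBus()
# my_chat_model and my_tool_provider are placeholders for your own ChatModel / ToolProvider.
agent = TriageAgent(bus=bus, model=my_chat_model, tools=my_tool_provider)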
agent.run()                 # consume soc.alerts, publish soc.triage verdicts
```
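
The tool-call loop described above (invoke, run the requested calls, feed results back, stop on an answer or when the budget runs out) looks roughly like this in isolation. It is a simplified sketch: the reply format, `run_tool_loop`, `stub_model`, and `lookup_reputation` are all hypothetical and do not mirror the `Agent` base class or LangChain message types.

```python
def run_tool_loop(model, tools, prompt, budget=3):
    """Invoke the model, run any tool calls it requests, and repeat within a budget."""
    by_name = {tool.__name__: tool for tool in tools}
    messages = [("user", prompt)]
    for _ in range(budget):
        reply = model(messages)
        if "answer" in reply:
            return reply["answer"]
        for call in reply["tool_calls"]:
            result = by_name[call["name"]](**call["args"])
            messages.append(("tool", call["name"], result))   # feed the result back
    return None                                               # budget exhausted


def lookup_reputation(domain):
    """Toy tool: a fixed reputation table."""
    return "malicious" if domain == "evil.example" else "unknown"


def stub_model(messages):
    """Deterministic stub: ask for one lookup, then answer from its result."""
    tool_results = [m for m in messages if m[0] == "tool"]
    if not tool_results:
        return {"tool_calls": [{"name": "lookup_reputation",
                                "args": {"domain": "evil.example"}}]}
    return {"answer": f"verdict based on reputation={tool_results[-1][2]}"}


answer = run_tool_loop(stub_model, [lookup_reputation], "triage TICKET-42")
```

The budget matters for models that never stop asking for tools: with `budget=1` the stub above would use its single turn on the lookup and the loop would return `None` instead of spinning.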

There are two shapes of role. **Per-ticket roles** subclass `Agent` and run a
consume loop — the case moves down the chain as each publishes the next event:

- `TriageAgent` — alert → first-pass verdict (`AlertReceived` → `AlertTriaged`)
- `Tier2Agent` — deeper look, confirm/refine/escalate (`AlertTriaged` → `Tier2Analysis` / `CaseEscalated`)
- `IRLeadAgent` — containment plan + a human-gated `ActionProposed` (`CaseEscalated` → `IRPlan` + `ActionProposed`)
- `ThreatIntelAgent` — actor attribution + technique mapping (`IRPlan` → `ThreatIntelReport`)

**Windowed roles** are scheduled `run_once(*, bus, model=...)` functions that
replay the audit log over a window and publish a report — call them on a timer:

```python
from aisoc.agents import detection_eng, soc_manager, threat_hunter, campaign_detector

soc_manager.run_once(bus=bus, model=model, window_hours=8)        # ShiftSummary
detection_eng.run_once(bus=bus, model=model, window_hours=24)     # DetectionTuningReport
threat_hunter.run_once(bus=bus, model=model, window_hours=24)     # HuntingReport
campaign_detector.run_once(bus=bus, model=model, window_days=14)  # CampaignDetected
```
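
The windowing step itself is simple to picture. The sketch below filters a log by timestamp and counts verdicts; it assumes dict events with a hypothetical timezone-aware `timestamp` field and leaves out the model call and the published report entirely.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def summarize_window(log, now, window_hours):
    """Count events and verdicts that fall inside the trailing window."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [e for e in log if e["timestamp"] >= cutoff]
    return {
        "events": len(recent),
        "verdicts": dict(Counter(e["verdict"] for e in recent if "verdict" in e)),
    }


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
window_log = [
    {"timestamp": now - timedelta(hours=30), "verdict": "benign"},   # outside the window
    {"timestamp": now - timedelta(hours=6), "verdict": "true_positive_malicious"},
    {"timestamp": now - timedelta(hours=1), "verdict": "benign"},
    {"timestamp": now - timedelta(minutes=5)},                       # event with no verdict
]

shift = summarize_window(window_log, now, window_hours=8)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the summary reproducible when replaying an old log.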

Every verdict is recorded for the case-memory trend rollup, and any real-world
action (containment, a block) is published as an `ActionProposed` gated on a
human `ActionDecision` — never auto-executed.

## Backtest a change before it ships

Because every case is just events on a replayable log, you can run a recorded
alert log back through the *current* agents offline and measure how often they
get it right — no Redis, no live SOAR, and a stub model instead of a real LLM:

```python
from aisoc.backtest import load_alerts, run_backtest

alerts = load_alerts("recorded_alerts.jsonl")
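# my_model / my_tools are placeholders for your own ChatModel and ToolProvider.
result = run_backtest(alerts, model=my_model, tools=my_tools)

# ground_truth is a placeholder for the known-correct verdicts of the recorded alerts.
result.accuracy(ground_truth, role="triage")   # fraction matching known truth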
result.verdict_for("5004", "triage")           # what a given case resolved to
```

Swap in a real model and rerun to see whether a prompt or model change moves
accuracy without regressing the rest. `examples/backtest.py` is a complete,
runnable version with a planted miss for the harness to catch.
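
"Moves accuracy without regressing the rest" is worth spelling out, because the headline number can hide a trade. The sketch below scores two runs against the same truth and lists tickets that went from right to wrong. The `accuracy` and `regressions` helpers and the dict-of-verdicts shape are assumptions for illustration, not the `aisoc.backtest` result API.

```python
def accuracy(verdicts, ground_truth):
    """Fraction of ground-truth tickets whose verdict matches, among those scored."""
    scored = [t for t in ground_truth if t in verdicts]
    if not scored:
        return 0.0
    return sum(verdicts[t] == ground_truth[t] for t in scored) / len(scored)


def regressions(before, after, ground_truth):
    """Tickets that were right in the old run and are wrong in the new one."""
    return sorted(
        t for t in ground_truth
        if before.get(t) == ground_truth[t] and after.get(t) != ground_truth[t]
    )


truth = {"5001": "benign", "5002": "malicious", "5003": "malicious", "5004": "benign"}
before = {"5001": "benign", "5002": "malicious", "5003": "benign", "5004": "benign"}
after = {"5001": "benign", "5002": "malicious", "5003": "malicious", "5004": "malicious"}

acc_before = accuracy(before, truth)          # 0.75
acc_after = accuracy(after, truth)            # 0.75
broken = regressions(before, after, truth)    # ['5004']
```

Here the change fixes ticket 5003 and breaks 5004, so accuracy is flat even though behavior changed; comparing per-ticket is what surfaces that.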

## Try it — no infrastructure

```bash
pip install "aisoc[agent]"
python examples/demo.py        # the whole SOC works four cases on an in-memory bus
python examples/backtest.py    # replay a recorded log and score it against truth
```

Both run with a deterministic stub model — no API key, no Redis, no real LLM.
See [`examples/`](examples/).

## Roadmap

aisoc is being extracted from a production multi-agent SOC into a reusable,
vendor-neutral kernel. Landing in slices:

- ✅ **Kernel seams** — the three injection Protocols.
- ✅ **Bus + event contract** — in-memory + Redis Streams, replayable audit log.
- ✅ **Case memory** — event-log read models: recall similar prior cases as
  precedent, score per-role accuracy against ground truth, reconstruct a cited
  reasoning trace for any ticket, and cluster cross-incident campaigns. Runs
  offline on the in-memory bus.
- ✅ **Agent framework + the role team** — the `Agent` base over the three
  seams (consumer-group run loop, a generic bind-tools-and-iterate tool-call
  loop, per-event budgets) plus the full team on top: triage, Tier 2, IR Lead,
  and Threat Intel as per-ticket agents, and Detection Engineer, SOC Manager,
  Threat Hunter, and Campaign Detector as windowed `run_once` roles.
- ✅ **Backtest harness + runnable demo** — `aisoc.backtest.run_backtest`
  replays a recorded alert log through the agents offline and scores their
  verdicts against ground truth; `Agent.drain()` runs a role to completion
  without a blocking loop. Two no-infrastructure example scripts (`examples/`)
  show the whole SOC and the backtest on the in-memory bus with a stub model.

The design story behind case memory is written up here:
[Giving an AI SOC a Memory](https://vinayvobbili.github.io/posts/soc-in-a-box-case-memory/).

## License

MIT © Vinay Vobbilichetty
