A working system · open-source · in production

AI agents that run
integrated M&A due diligence
across an entire data room.

Nine specialist domains read every contract in parallel, cross-reference their findings, and trace every conclusion to an exact page, clause, and quote — producing the cross-domain picture that normally takes weeks to assemble from siloed advisor reports.

Legal · Finance · Commercial · Product / Tech · Cybersecurity · HR · Tax · Regulatory · ESG

Built by a corp-dev practitioner, to solve a real problem
Open-source · Apache-2.0 · pip install dd-agents
For: corp dev · PE · advisors · legal teams
Anyone running due diligence across hundreds of contracts
The 60-second version

What this is, before we go deep.

If you remember three things from this conversation:

1

It connects the dots advisors don't

Legal, Finance, Commercial, and Tech each flag the same counterparty separately. This automatically links a change-of-control clause (Legal) to the revenue concentrated under it (Finance) to the key person who owns it (HR): one compound risk, not four disconnected memos.

2

Every finding is provable

No "the AI thinks." Each finding carries the source file, page, section, and a verbatim quote — verified against the actual document. Unverifiable claims are auto-downgraded. It's built to survive an IC and a regulator, not to impress a demo.

3

It accelerates your team — doesn't replace advisors

A first read of hundreds of contracts in hours instead of weeks, standardized across every deal you screen. Your counsel and bankers still sign the conclusions; they just start from a cross-referenced map instead of a blank page.

Output

One run produces an interactive HTML report (Go/No-Go verdict → action items → domain detail → full evidence), a 14-sheet Excel workbook for downstream modeling, and per-finding JSON with citations. Open it offline in a browser. Filter to "P0 only" and share the URL with the deal team.

Why this exists

I built it to solve my own problem.

As a corp-dev lead, I'd spend weeks assembling the cross-domain picture from siloed advisor reports. Legal, financial, and commercial teams all flagged the same target independently — with nobody connecting the dots.

A termination clause in one contract and a revenue-concentration risk with the same counterparty would sit in two separate workstreams, if they were flagged at all.

The realization

The analysis wasn't the hard part. The integration was. Nine workstreams, hundreds of documents, no shared index, no cross-reference, and a clock that keeps compressing. That's a systems problem — and systems problems have engineering answers.

What a real run looks like
  • 200+ counterparties across a single target's contract base
  • 400–500 documents — PDFs, Word, Excel, scans, images
  • 9 domains × every document, in parallel
  • Thousands of raw findings → filtered to what's material
  • Every survivor traced to a page and a quote

This isn't a prototype. It's an open-source package (pip install dd-agents), 60k+ lines, 3,000+ tests, strict-typed, Apache-2.0 — running on real deal data rooms.

The industry problem you already live

DD runs in silos. The clock keeps shrinking. The cost is sunk either way.

  • 31% of M&A failures trace back to due-diligence shortcomings (HBR / McKinsey / KPMG)
  • 6→3 wks: DD timelines compressing; same scope, half the time
  • 1–3%: corp-dev screen-to-close rate; 200–1,000+ screened/yr, DD cost sunk on every deal that dies
  • 74→95%: AI contract-analysis accuracy with clause-aware prompting (Addleshaw Goddard, 510 contracts)

Demand is already here

86% of M&A organizations have integrated GenAI into deal workflows (Deloitte 2025). Roughly half of practitioners now use AI in DD specifically (Bain). Deal activity reached ~$4.9T in 2025, and AI was cited as a rationale in a third of the 100 largest deals (PwC).

But the tools are fragmented

Existing tools are single-domain and legal-centric (Luminance, Kira/Litera, Harvey), single-function (VDRs, CLM), or a build-it-yourself GenAI framework. None do multi-domain forensic DD with adversarial cross-validation across all nine workstreams. That gap is what this fills.

What gets analyzed

Nine specialist lenses — each a domain expert, not a generalist.

Every document is read by all nine in parallel. Each carries deep, deal-aware domain logic (e.g. Legal disaggregates change-of-control into five distinct subtypes). Enable/disable any of them per deal in config.

⚖️ Legal

Change-of-control (5 subtypes), anti-assignment, termination rights, IP ownership & freedom-to-operate, indemnities, liability caps, governance graph

💰 Finance

Revenue cross-referencing (flags >5% ARR mismatch), revenue decomposition, unit economics (CAC/LTV/NRR/GRR), pricing compliance, insurance program

📊 Commercial

Renewals, churn, SLA & service-credit exposure, customer concentration (>30% flag), MFN clauses, supply-chain & operational capacity

🛠️ Product / Tech

DPA analysis, SOC 2 / ISO 27001, technical SLAs, integration & migration complexity, data portability, technical debt, vendor lock-in

🔒 Cybersecurity

Breach history, identity & access, encryption, incident response (MTTD/MTTR), vuln management, third-party risk, cyber insurance

👥 HR / People

Comp & benefits liabilities, key-talent retention, CoC acceleration / golden parachutes, labor & union exposure, succession, classification

🧾 Tax

Income tax, transfer pricing, NOLs & §382, sales/use nexus, international & deal-structure tax, provisions, controversy

📋 Regulatory

License transferability, antitrust (HSR/EU, HHI), data-privacy regulation, AML/sanctions, government contracts, sector rules

🌱 ESG

Environmental contamination, permits, climate/carbon (Scope 1/2/3), hazardous materials, ESG governance & disclosure (CSRD/TCFD)
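Two of the thresholds above (Finance's >5% ARR mismatch and Commercial's >30% concentration flag) reduce to simple arithmetic. A minimal sketch, assuming the ARR gap is measured relative to the reported figure and concentration is each customer's share of total revenue; the function names and customer names are hypothetical:

```python
def arr_mismatch(reported_arr: float, contract_arr: float, tolerance: float = 0.05) -> bool:
    """True when contract-derived ARR differs from reported ARR by more than `tolerance`.

    Assumption: the gap is measured relative to the reported figure.
    """
    if reported_arr <= 0:
        raise ValueError("reported ARR must be positive")
    return abs(contract_arr - reported_arr) / reported_arr > tolerance


def concentration_flags(revenue_by_customer: dict[str, float], threshold: float = 0.30) -> list[str]:
    """Customers whose share of total revenue exceeds `threshold`, largest first."""
    total = sum(revenue_by_customer.values())
    if total <= 0:
        return []
    shares = {customer: rev / total for customer, rev in revenue_by_customer.items()}
    return sorted((c for c, s in shares.items() if s > threshold), key=lambda c: -shares[c])


print(arr_mismatch(1_000_000, 1_060_000))  # True (6% gap)
print(concentration_flags({"Acme": 500, "Globex": 250, "Initech": 250}))  # ['Acme']
```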

Extensible

External specialists plug in via a Python entry-point — no core changes. Three built-in extensions already exist: Insurance→Finance, Operations→Commercial, IP-Deep→Legal.

Who this is for

Built for the serial, programmatic acquirer — across very different risk shapes.

A small in-house team running many deals a year. Different targets, different data rooms — but the same nine workstreams every time. That's the exact profile where a standardized, repeatable DD engine compounds in value.

Deal shape: where the risk concentrates
  • AI / IP-heavy target: model IP & data rights, open-source exposure, DPAs, cross-border tax
  • Public-company carve-out: SEC-grade scrutiny, regulatory clearance (CFIUS/HSR), ARR quality
  • VC-backed SaaS: customer concentration, churn, retention, revenue quality
  • Regulated fintech / health: financial-crime / AML / sector compliance, license transferability
  • Platform / bolt-on: commercial commitments, contract assignment, integration

Each row is a different data-room shape — but the same nine domains, run the same disciplined way.

The fit, in one line

Your repeating DD workstreams are this tool's nine agents.


  • AI / IP deals: Product/Tech agent on model IP, data rights, DPAs, open-source exposure
  • CCaaS / SaaS: Finance + Commercial on ARR quality, churn, customer concentration, at-risk revenue
  • Regulated targets: Regulatory agent on AML/sanctions, license transferability
  • Cross-border deals: Tax agent on transfer pricing & structure; Regulatory on CFIUS/HSR filings
  • Public carve-outs: citation-grade audit trail that holds up to SEC scrutiny
Let me show you the output first

The deal lead gets the answer in 30 seconds — then drills down.

Executive dashboard with Go/No-Go verdict, key takeaways, severity distribution
Layer 1 — the decision zone, visible on load. Verdict + narrative + 9-domain risk strip + open items.

Progressive disclosure, four layers

1. Decision: Go / Conditional Go / Proceed-with-Conditions / No-Go, with a narrative explaining why, top deal-breakers, and revenue exposure.
2. Actions: recommendations grouped Pre-close → 30d → 90d → Long-term, each with owner, effort, and the finding it ties to.
3. Domains: nine domain cards + the cross-domain correlation matrix + deep dives.
4. Evidence: every finding, sortable, with file/page/quote. Nothing is asserted without it.

Self-contained & offline. One HTML file, zero network calls, XSS-safe. Filter to "P0 only," and the URL itself encodes the filter — share it with the deal team.
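Because the filter lives in the URL, sharing a filtered view needs no server. The report itself is HTML and JavaScript, so this Python sketch only illustrates the idea; the `severity`/`domain` parameter names are hypothetical, not the report's actual scheme. It keeps the state in the URL fragment, which browsers never send over the network:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def share_url(report_path: str, filters: dict[str, str]) -> str:
    """Encode the active filters in the URL fragment (the part after '#').

    Sorting the items keeps the link stable for the same filter set.
    """
    return f"{report_path}#{urlencode(sorted(filters.items()))}"


def read_filters(url: str) -> dict[str, str]:
    """Recover the filter state when a colleague opens the shared link."""
    fragment = urlsplit(url).fragment
    return {key: values[0] for key, values in parse_qs(fragment).items()}


link = share_url("report.html", {"severity": "P0"})
print(link)                # report.html#severity=P0
print(read_filters(link))  # {'severity': 'P0'}
```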

The single biggest differentiator

It finds the risk that lives between domains.

A change-of-control clause is a Legal finding. The $2.4M ARR concentrated under that customer is a Finance finding. The key account-owner with no successor is an HR finding. Three teams, three memos, three workstreams.

This system recognizes they're the same compound risk on the same counterparty — and escalates it accordingly.

Compound severity escalation
  • Two P2s on the same entity across domains → escalate to P1
  • P1 + P2 on the same entity across domains → escalate to P0
  • 3+ domains flagging the same entity → P1 minimum
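The three escalation rules can be expressed as a small deterministic function. A minimal sketch, assuming findings are already resolved to a canonical entity and that only each domain's worst finding counts (a simplification; the tool's actual data model isn't shown here):

```python
from collections import defaultdict
from dataclasses import dataclass

RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}  # lower number = more severe


@dataclass(frozen=True)
class Finding:
    entity: str    # resolved counterparty
    domain: str    # e.g. "legal", "finance", "hr"
    severity: str  # "P0".."P3"


def compound_severity(findings: list[Finding]) -> dict[str, str]:
    """Apply the three cross-domain escalation rules per entity."""
    # Keep only the worst severity each domain reported for each entity.
    worst = defaultdict(dict)
    for f in findings:
        current = worst[f.entity].get(f.domain)
        if current is None or RANK[f.severity] < RANK[current]:
            worst[f.entity][f.domain] = f.severity

    result = {}
    for entity, by_domain in worst.items():
        sevs = list(by_domain.values())
        level = min(RANK[s] for s in sevs)   # start from the worst single finding
        if sevs.count("P1") >= 1 and sevs.count("P2") >= 1:
            level = min(level, 0)            # P1 + P2 across domains -> P0
        if sevs.count("P2") >= 2:
            level = min(level, 1)            # two P2s across domains -> P1
        if len(by_domain) >= 3:
            level = min(level, 1)            # 3+ domains -> P1 minimum
        result[entity] = f"P{level}"
    return result


findings = [Finding("Acme", "legal", "P1"), Finding("Acme", "finance", "P2"),
            Finding("Globex", "legal", "P2"), Finding("Globex", "hr", "P2")]
print(compound_severity(findings))  # {'Acme': 'P0', 'Globex': 'P1'}
```

Because the rules are plain code, the same inputs always produce the same escalation, which is what makes the result auditable.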
Cross-domain synthesis showing compound risks across entity domains
Compound-risk cards + domain-interaction matrix. The picture that used to take weeks to assemble manually.
How it works — the architecture

A Python pipeline drives the agents. The agents are workers, not decision-makers.

38 deterministic steps, 5 blocking quality gates. This control inversion is the whole ballgame — and the reason it's reliable enough for deal work.

Data Room: PDFs · Word · Excel · scanned images
→ extract · classify subjects · resolve entities · score document precedence
→ PYTHON ORCHESTRATOR: 38-step state machine · 5 blocking gates · checkpoint / resume
→ 9 specialists run in parallel: Legal · Finance · Commercial · Product/Tech · Cyber · HR · Tax · Regulatory · ESG
→ Cross-Domain Triggers: symbolic rules, no LLM
→ Judge: adversarial spot-check · re-spawn if weak
→ Merge · Dedup · Numerical Audit: deterministic Python
→ Executive Synthesis: Go/No-Go · severity calibration
→ Output: HTML + Excel + JSON
Why Python-on-top, not "an agent that orchestrates"

The orchestrator can't skip a step, miscount, or proceed past a failed gate, because it isn't an LLM — it's code. Agents are invoked at specific steps; their output is validated by Python before the pipeline advances. Resume from any checkpoint after a failure.
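The control inversion is easiest to see in code. A minimal sketch, assuming steps are (name, action, gate) tuples over a dict of state and a JSON checkpoint file; none of these names come from dd-agents itself:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional

Action = Callable[[dict], dict]
Gate = Optional[Callable[[dict], bool]]


class GateFailure(RuntimeError):
    """Raised when a blocking gate fails; the pipeline stops here."""


def run_pipeline(steps: list[tuple[str, Action, Gate]], state: dict, checkpoint: Path) -> dict:
    done: list[str] = []
    if checkpoint.exists():  # resume: skip steps that already completed
        saved = json.loads(checkpoint.read_text())
        done, state = saved["done"], saved["state"]
    for name, action, gate in steps:
        if name in done:
            continue
        state = action(state)
        if gate is not None and not gate(state):
            # A plain if/else in code: nothing can talk its way past this.
            raise GateFailure(f"blocking gate failed after step {name!r}")
        done.append(name)
        checkpoint.write_text(json.dumps({"done": done, "state": state}))
    return state


if __name__ == "__main__":
    steps = [
        ("extract", lambda s: {**s, "docs": 3}, None),
        ("analyze", lambda s: {**s, "analyzed": 2}, lambda s: s["analyzed"] == s["docs"]),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        try:
            run_pipeline(steps, {}, Path(tmp) / "checkpoint.json")
        except GateFailure as err:
            print(err)  # blocking gate failed after step 'analyze'
```

The gate failure leaves "extract" recorded in the checkpoint, so a rerun resumes at "analyze" rather than starting over.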

The hard lesson behind that design

Telling an LLM "you MUST" is a suggestion, not a control.

An earlier version let the LLM run the whole pipeline from a prose spec. On a real ~200-counterparty deal, we did a retrospective on the failures.

All 17 quality failures had one root cause: instructions the LLMs were given but didn't follow
  • Agents skipped counterparties (28 of 34 outputs produced)
  • Produced aggregate summaries instead of per-subject analysis
  • Fabricated citations — quotes not in the source
  • Missed gap detection for missing referenced docs
  • Generated counts that didn't add up
  • The orchestrating LLM skipped its own blocking gates
The enforcement paradox

The LLM that's told to follow the rules is also the entity that decides whether to follow them. Markdown emphasis — "MUST", "BLOCKING", "CRITICAL" — has zero enforcement power.

The fix

Invert control. Python enforces flow. Hooks enforce output format. Validation gates are if/else in code. Coverage is verified by counting files on disk. Numbers are re-derived from source. The LLM does analysis; it never decides whether quality passed.
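"Coverage is verified by counting files on disk" can be as simple as a set difference between expected and observed outputs. A sketch, assuming a hypothetical `<out_dir>/<agent>/<subject>.json` layout; a non-empty result is what would trigger a re-spawn:

```python
import tempfile
from pathlib import Path


def missing_outputs(out_dir: Path, agents: list[str], subjects: list[str]) -> list[tuple[str, str]]:
    """(agent, subject) pairs with no output file on disk.

    The count comes from the filesystem, not from an agent's self-report.
    """
    expected = {(agent, subject) for agent in agents for subject in subjects}
    present = {(p.parent.name, p.stem) for p in out_dir.glob("*/*.json")}
    return sorted(expected - present)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        for agent, subject in [("legal", "acme"), ("legal", "globex"), ("finance", "acme")]:
            (out / agent).mkdir(exist_ok=True)
            (out / agent / f"{subject}.json").write_text("{}")
        print(missing_outputs(out, ["legal", "finance"], ["acme", "globex"]))
        # [('finance', 'globex')]  -> re-spawn the finance agent for globex
```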

This is the difference between an impressive demo and a system a deal team can put its name on.

The trust layer · citation verification

Every claim is proven against the source — or it doesn't survive.

When an agent says "termination right in Section 12.3," the system independently confirms that quote actually exists. This is what separates analysis from hallucination.

1. Exact page match: search the cited page. Whitespace is normalized to absorb PDF column-layout artifacts.
2. Adjacent pages ±1: catches cross-page quotes and off-by-one page errors from extraction.
3. Full-document fuzzy match (≥80%): OCR'd text never matches exactly, so rapidfuzz partial-ratio catches reformatted quotes.
4. Cross-file search: quote not in the cited file? Search all of that counterparty's docs and auto-correct the file attribution.
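The four tiers can be sketched with the standard library. This swaps in a slow `difflib`-based stand-in for rapidfuzz's partial-ratio (scaled 0 to 1 rather than 0 to 100); the function names and the `docs` shape (one counterparty's files mapped to their page texts) are assumptions:

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def norm(text: str) -> str:
    """Collapse whitespace (absorbs PDF column-layout artifacts) and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def partial_ratio(needle: str, haystack: str) -> float:
    """Best similarity of `needle` against equal-length windows of `haystack`."""
    n = len(needle)
    if n == 0 or len(haystack) <= n:
        return SequenceMatcher(None, needle, haystack).ratio()
    return max(SequenceMatcher(None, needle, haystack[i:i + n]).ratio()
               for i in range(len(haystack) - n + 1))


def verify_quote(quote: str, cited_file: str, page: int,
                 docs: dict[str, list[str]], threshold: float = 0.8) -> tuple[str, Optional[str]]:
    """Return (tier, file) for the first tier that finds the quote (page 1 = index 0)."""
    q = norm(quote)
    pages = docs[cited_file]
    # Tier 1: exact match on the cited page.
    if 1 <= page <= len(pages) and q in norm(pages[page - 1]):
        return "exact_page", cited_file
    # Tier 2: cited page +/- 1, joined so quotes spanning a page break also match.
    if q in norm(" ".join(pages[max(0, page - 2):page + 1])):
        return "adjacent_pages", cited_file
    # Tier 3: fuzzy match anywhere in the cited document.
    if partial_ratio(q, norm(" ".join(pages))) >= threshold:
        return "fuzzy", cited_file
    # Tier 4: search the counterparty's other files and re-attribute on a hit.
    for name, other_pages in docs.items():
        if name != cited_file and q in norm(" ".join(other_pages)):
            return "cross_file", name
    return "unverified", None


docs = {
    "msa.pdf": ["12.3 Either party may   terminate\nfor convenience.", "Schedule A"],
    "order_form.pdf": ["Customer consent is required on a change of control."],
}
print(verify_quote("either party may terminate for convenience", "msa.pdf", 2, docs))
# ('adjacent_pages', 'msa.pdf')
print(verify_quote("consent is required on a change of control", "msa.pdf", 1, docs))
# ('cross_file', 'order_form.pdf')
```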
Defense in depth — 5 layers
  • Structured output validated against a schema; malformed output rejected on write
  • Mandatory citations — file, page, section, exact quote
  • "NOT_FOUND" escape valve — tell the model to write a gap instead of inventing
  • Adversarial Judge re-checks high-severity claims
  • 6-layer numerical audit re-derives every number

The single biggest unlock: giving the model an explicit way to say NOT_FOUND and write a gap. Without that escape valve, models fabricate rather than admit ignorance. With it, hallucination drops sharply.
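In practice the escape valve means the output schema accepts two shapes: a fully cited finding or an explicit gap. A minimal validator sketch; the field names (`status`, `gap_description`, `citation`) are hypothetical:

```python
REQUIRED_CITATION_FIELDS = ("file", "page", "section", "quote")


class RejectedOutput(ValueError):
    """Raised on write when an agent record is malformed."""


def validate_record(record: dict) -> str:
    """Accept a fully cited finding ("finding") or an explicit NOT_FOUND gap ("gap")."""
    if record.get("status") == "NOT_FOUND":
        # The escape valve: admitting a gap is a valid, accepted output.
        if not record.get("gap_description"):
            raise RejectedOutput("NOT_FOUND record needs a gap_description")
        return "gap"
    citation = record.get("citation") or {}
    missing = [f for f in REQUIRED_CITATION_FIELDS if citation.get(f) in (None, "")]
    if missing:
        raise RejectedOutput(f"finding rejected: missing citation fields {missing}")
    return "finding"
```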

Neurosymbolic cross-domain analysis

Deterministic rules decide when a finding needs a second domain's eyes.

After the first pass, symbolic rules — no LLM, fully auditable — fire when one domain's finding has implications in another. Each fired rule spawns a focused second-pass analysis, scoped to only the cited contracts, with full provenance.

If this domain finds… → ask this domain to…
  • Finance: revenue recognition issue → Legal: verify contract enforceability (ASC 606 rights, delivery criteria, clawbacks)
  • Legal: change-of-control risk → Finance: quantify revenue at risk if CoC terminates; acceleration; ARR/TCV impact
  • Legal: termination-for-convenience → Finance: calculate at-risk remaining contract value & committed vs uncommitted revenue
  • Legal: IP ownership dispute → Product/Tech: assess which systems depend on the disputed IP; migration cost
  • Product/Tech: cross-border data flow → Legal: verify DPA compliance (SCCs/adequacy, GDPR Art. 28, breach notice)
  • Commercial: SLA / service-credit risk → Finance: quantify max annual service-credit liability & recurring-revenue impact
  • Finance: pricing anomaly → Commercial: validate vs rate cards, volume commitments, renewal & competitive benchmarks
Why this matters

LLMs are for judgment; code is for control. The rules are pattern-matched on finding category + severity + keywords — deterministic, logged, budget-bounded (≈$5/deal cap on second-pass work). You can audit exactly why every cross-domain check fired.
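A minimal sketch of such rules, using two rows from the table above. The rule names, category labels, keywords, and per-task cost are hypothetical; the $5 default mirrors the cap quoted above. Each spawned task carries the rule name (so the log says why it fired) and only the cited files:

```python
from dataclasses import dataclass

RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


@dataclass(frozen=True)
class Rule:
    name: str                  # recorded on every task it spawns
    source_domain: str
    category: str
    at_least: str              # fire only for findings this severe or worse
    keywords: tuple[str, ...]
    target_domain: str
    task: str


RULES = [
    Rule("coc_to_finance", "legal", "change_of_control", "P2",
         ("change of control", "change in control"), "finance",
         "quantify revenue at risk if CoC terminates"),
    Rule("tfc_to_finance", "legal", "termination", "P2",
         ("for convenience",), "finance",
         "calculate at-risk remaining contract value"),
]


def fire_rules(findings: list[dict], rules: list[Rule],
               cost_per_task: float, budget: float = 5.0) -> list[dict]:
    """Match findings to rules deterministically; stop spawning at the budget cap."""
    tasks, spent = [], 0.0
    for f in findings:
        for r in rules:
            if (f["domain"] == r.source_domain
                    and f["category"] == r.category
                    and RANK[f["severity"]] <= RANK[r.at_least]
                    and any(k in f["text"].lower() for k in r.keywords)):
                if spent + cost_per_task > budget:
                    return tasks  # budget-bounded: no more second-pass work
                spent += cost_per_task
                tasks.append({"rule": r.name, "target": r.target_domain,
                              "task": r.task, "files": f["files"]})  # scoped to cited contracts
    return tasks
```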

The invisible foundations

Two problems that, unsolved, make every downstream finding wrong.

Entity resolution — 6-pass cascade

"IBM", "International Business Machines", and subsidiary "Red Hat" — same entity in your analysis? If the system can't resolve that, consolidation breaks.

  • 1 Exact · 2 Normalized (strip Inc/Corp/LLC)
  • 3 Alias lookup (config-driven)
  • 4 Fuzzy token-sort (88% long / 95% short names)
  • 5 TF-IDF cosine for transpositions
  • 6 Parent-child hierarchy + learned matches

Counter-intuitive guard: names ≤5 chars never fuzzy-match — otherwise "Inc." matches everything. Every match is logged for audit.
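Passes 1 to 4 of the cascade, including the short-name guard, can be sketched as follows. TF-IDF and the parent-child pass are omitted; `difflib` stands in for a dedicated token-sort scorer; the 12-character long/short cutoff and the suffix list are assumptions, not the tool's values:

```python
import re
from difflib import SequenceMatcher
from typing import Optional

LEGAL_SUFFIXES = re.compile(r"\b(inc|corp|corporation|llc|ltd|co)\b")


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and legal suffixes (Inc/Corp/LLC...)."""
    cleaned = re.sub(r"[.,]", " ", name.lower())
    return " ".join(LEGAL_SUFFIXES.sub(" ", cleaned).split())


def token_sort(text: str) -> str:
    return " ".join(sorted(text.split()))


def resolve(name: str, canonical: list[str], aliases: dict[str, str],
            long_cutoff: int = 12) -> tuple[Optional[str], str]:
    """Return (canonical name or None, the pass that matched), for audit logging."""
    if name in canonical:                                   # 1. exact
        return name, "exact"
    by_norm = {normalize(c): c for c in canonical}
    key = normalize(name)
    if key in by_norm:                                      # 2. normalized
        return by_norm[key], "normalized"
    if key in aliases:                                      # 3. alias lookup
        return aliases[key], "alias"
    if len(key) <= 5:                                       # guard: short names never fuzzy-match
        return None, "unresolved"
    threshold = 0.88 if len(key) >= long_cutoff else 0.95   # 4. fuzzy token-sort
    best, score = None, 0.0
    for norm_name, original in by_norm.items():
        ratio = SequenceMatcher(None, token_sort(key), token_sort(norm_name)).ratio()
        if ratio > score:
            best, score = original, ratio
    return (best, "fuzzy") if score >= threshold else (None, "unresolved")


canonical = ["International Business Machines Corp", "Acme Logistics LLC"]
aliases = {"ibm": "International Business Machines Corp"}
print(resolve("IBM", canonical, aliases))   # ('International Business Machines Corp', 'alias')
print(resolve("Inc.", canonical, aliases))  # (None, 'unresolved')
```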

Document precedence — which version governs

A data room has v1, v2_draft, v2_signed, MSA_final, MSA_executed. And an MSA governs every Order Form beneath it — until an Order Form says "notwithstanding…" and the child overrides the parent.

  • Version chains: signed/executed rank 10 → draft 2 → superseded 1
  • Folder tiers: /Executed/ outranks /Drafts/
  • Composite score: version 40% · folder 30% · recency 30%
  • Governance graph (NetworkX): amendment chains, cascade impact, conflict detection — deterministic traversal, not LLM
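The composite score is a weighted sum. A sketch, assuming each component is scaled to 0..1, a 365-day recency horizon, and the folder-tier values shown (all hypothetical; only the ranks and weights come from the list above):

```python
VERSION_RANK = {"signed": 10, "executed": 10, "draft": 2, "superseded": 1}


def folder_tier(path: str) -> float:
    """/Executed/ outranks /Drafts/; anything else sits in between (assumed values)."""
    lowered = path.lower()
    if "/executed/" in lowered:
        return 1.0
    if "/drafts/" in lowered:
        return 0.0
    return 0.5


def precedence_score(status: str, path: str, age_days: int, max_age_days: int = 365) -> float:
    """Composite: version 40%, folder 30%, recency 30%, each scaled to 0..1."""
    version = VERSION_RANK.get(status, 0) / 10
    recency = 1 - min(age_days, max_age_days) / max_age_days
    return 0.4 * version + 0.3 * folder_tier(path) + 0.3 * recency


candidates = {
    "MSA_executed.pdf": precedence_score("executed", "/Executed/MSA_executed.pdf", age_days=30),
    "v2_draft.docx": precedence_score("draft", "/Drafts/v2_draft.docx", age_days=10),
}
print(max(candidates, key=candidates.get))  # MSA_executed.pdf
```

The newer draft still loses: recency carries only 30% of the weight, so a signed, filed version governs.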

CoC-impact query across 200 contracts resolves in ~50ms. The graph decides how clauses interact; the LLM only reads what each one says.
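The tool uses NetworkX for the governance graph; this sketch swaps in a plain breadth-first traversal to show the idea. It assumes, hypothetically, that a child with a "notwithstanding" override stops the cascade for itself and everything beneath it; document names are illustrative:

```python
from collections import deque


def cascade_impact(root: str, governs: dict[str, list[str]], overrides: set[str]) -> list[str]:
    """Documents whose terms cascade from `root`, in breadth-first order.

    governs[parent] lists the documents that parent governs (Order Forms,
    amendments). A child in `overrides` has its own controlling terms, so
    the cascade stops there. `seen` makes cycles harmless.
    """
    seen, impacted = {root}, []
    queue = deque([root])
    while queue:
        doc = queue.popleft()
        for child in governs.get(doc, []):
            if child in seen:
                continue
            seen.add(child)
            if child in overrides:
                continue
            impacted.append(child)
            queue.append(child)
    return impacted


governs = {"MSA": ["OF-1", "OF-2", "AMD-1"], "OF-2": ["OF-2a"], "AMD-1": ["OF-3"]}
print(cascade_impact("MSA", governs, overrides={"OF-2"}))  # ['OF-1', 'AMD-1', 'OF-3']
```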

The judgment you'd expect from a partner

Same clause. Different deal. Different severity.

A raw AI flags every change-of-control clause as critical. But the deal structure decides what's actually material — and that context flows through the whole pipeline.

Asset purchase
  • P0 — anti-assignment blocks transfer; get consent or lose the customer
  • P1 — shared-services agreement must be replicated; no transition plan
Stock purchase — same clauses
  • P3 — legal entity doesn't change; no assignment occurs. Routine.
  • P3 — intercompany obligations eliminated at closing. Not a risk.
One severity authority — deterministic and auditable

Final severity is decided once, by code, not scattered across the pipeline: the agent's call → deterministic recalibration of known false positives (termination-for-convenience capped at P2; competitor-only CoC → P3) → your own per-deal overrides. Every finding carries its severity_source and the full chain, so an IC or auditor sees exactly why each clause landed where it did. A safety bound means a user override can never silently bury a genuine deal-breaker.
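The single severity authority is a short, ordered chain. A sketch, assuming the category labels and `competitor_only` tag shown; the specific safety bound (a P0 can only be downgraded with a recorded reason) is one hypothetical way to stop an override from silently burying a deal-breaker. The `severity_source` field name follows the text above:

```python
from typing import Optional

RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


def final_severity(agent_call: str, category: str, tags: set[str],
                   override: Optional[str] = None, reason: str = "") -> dict:
    chain = [("agent", agent_call)]
    severity, source = agent_call, "agent"

    # Deterministic recalibration of known false positives.
    if category == "termination_for_convenience" and RANK[severity] < RANK["P2"]:
        severity, source = "P2", "recalibration"   # capped at P2
    elif category == "change_of_control" and "competitor_only" in tags:
        severity, source = "P3", "recalibration"
    if source == "recalibration":
        chain.append(("recalibration", severity))

    # Per-deal user override, with a safety bound.
    if override is not None:
        if severity == "P0" and RANK[override] > 0 and not reason:
            chain.append(("override_rejected", override))  # no silent burial of a P0
        else:
            severity, source = override, "user_override"
            chain.append(("user_override", severity))

    return {"severity": severity, "severity_source": source, "chain": chain}
```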

The twist seasoned people catch: some contracts treat ownership change as a "deemed assignment," so anti-assignment can trigger even in a stock deal. That's why CoC is disaggregated into five subtypes — to catch exactly this.

Signal vs noise · the adversarial check

Most of what AI finds is noise. The work is killing it.

Nine agents across hundreds of documents produce thousands of findings. Without classification, the real risks drown.

1. Noise filter (15 patterns): "extraction failed," "binary file." Process artifacts, not findings. Removed.
2. Data-quality filter (14 patterns): "records unavailable." Real gaps, but not material risks. Routed to an appendix.
3. Material findings: what survives both filters. These drive the verdict.
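The two filters are pattern matches applied in order. A sketch with a few illustrative patterns drawn from the examples above (the real lists have 15 and 14 entries, not shown here); representing findings as plain strings is a simplification:

```python
import re

NOISE_PATTERNS = [re.compile(p, re.I) for p in (r"extraction failed", r"binary file")]
DATA_QUALITY_PATTERNS = [re.compile(p, re.I) for p in (r"records? (are )?unavailable",)]


def triage(findings: list[str]) -> dict[str, list[str]]:
    buckets: dict[str, list[str]] = {"removed": [], "appendix": [], "material": []}
    for text in findings:
        if any(p.search(text) for p in NOISE_PATTERNS):
            buckets["removed"].append(text)    # process artifact, not a finding
        elif any(p.search(text) for p in DATA_QUALITY_PATTERNS):
            buckets["appendix"].append(text)   # real gap, not a material risk
        else:
            buckets["material"].append(text)   # drives the verdict
    return buckets
```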
The Judge — adversarial QA

A separate agent spot-checks findings with accusatory framing — "this finding appears fabricated; prove it with a direct quote." (Polite "please double-check" prompts had near-zero effect; accusatory framing improved accuracy ~9% in the research.)

  • Risk-based sampling: P0 100% · P1 20% · P2 10%
  • Scored across 5 weighted dimensions; threshold 70/100
  • Weak agents re-spawned for a second round
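Risk-based sampling and weighted scoring are both a few lines. A sketch; the five dimension names and weights are hypothetical, and only the sampling rates and the 70/100 threshold come from the list above:

```python
import random

SAMPLING_RATES = {"P0": 1.0, "P1": 0.2, "P2": 0.1}  # P3 not sampled (assumed)

# Hypothetical dimensions and weights; the weights sum to 1.0.
WEIGHTS = {"citation_accuracy": 0.30, "quote_fidelity": 0.25,
           "severity_calibration": 0.20, "completeness": 0.15, "reasoning": 0.10}


def sample_for_judge(findings: list[dict], seed: int = 0) -> list[dict]:
    """Pick findings for adversarial review; seeded so a run's sample is reproducible."""
    rng = random.Random(seed)
    return [f for f in findings if rng.random() < SAMPLING_RATES.get(f["severity"], 0.0)]


def judge_passes(scores: dict[str, float], threshold: float = 70.0) -> bool:
    """Weighted 0..100 score across the five dimensions, compared to the threshold."""
    total = sum(weight * scores[dim] for dim, weight in WEIGHTS.items())
    return total >= threshold
```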

Know when to stop using LLMs. Validation, dedup, and audit were moved from an LLM to deterministic Python — quality went up, cost went to zero, and every failure now has a stack trace.

Fail-closed by design

Better to produce nothing than unreliable output.

Five blocking gates halt the pipeline on failure — they don't just log a warning nobody reads. If it can't meet the bar, it stops and tells you why.

Gate · Extraction

Halts if document text extraction systemically fails.

Gate · Coverage

Every counterparty must be analyzed by every agent. Auto-respawn for gaps.

Gate · Numerical audit

6 layers: source traceability, re-derivation, cross-source & cross-format consistency.

Gate · QA audit

17 structural checks — manifests, citations, domain coverage, report sheets.

Gate · Post-generation

Excel must match the numerical manifest cell-for-cell before release.
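Two of the numerical checks, re-derivation from source and cross-format consistency, can be sketched as plain comparisons. This assumes the Excel summary cells have already been read into a dict keyed by metric name (reading the workbook is out of scope) and uses hypothetical metric names:

```python
from collections import Counter


def rederive(findings: list[dict]) -> dict[str, int]:
    """Recompute headline numbers from the per-finding records themselves."""
    by_severity = Counter(f["severity"] for f in findings)
    return {"total_findings": len(findings),
            **{f"{sev}_count": by_severity.get(sev, 0) for sev in ("P0", "P1", "P2", "P3")}}


def audit(findings: list[dict], manifest: dict[str, int], excel_cells: dict[str, int]) -> list[str]:
    """An empty list means the gate passes; any entry blocks release."""
    problems = []
    for key, value in rederive(findings).items():
        if manifest.get(key) != value:
            problems.append(f"manifest {key}={manifest.get(key)} but source gives {value}")
        if excel_cells.get(key) != manifest.get(key):
            problems.append(f"Excel {key}={excel_cells.get(key)} != manifest {manifest.get(key)}")
    return problems
```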

Agent guardrails

Path-locked to the output dir. 24 blocked shell patterns. Turn limits with hard kill. Agents have tools, not shell freedom.
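Path locking and command blocking reduce to checks that run before any tool call. A minimal sketch; the patterns are illustrative stand-ins for the 24 actual ones, and the function names are hypothetical:

```python
import re
from pathlib import Path

# Illustrative entries only; not the tool's actual list.
BLOCKED_SHELL = [re.compile(p) for p in (r"\brm\s+-rf\b", r"\bcurl\b", r"\bwget\b", r"\bsudo\b")]


def write_allowed(target: str, output_dir: str) -> bool:
    """Path lock: after resolving '..' and symlinks, the target must sit inside output_dir."""
    root = Path(output_dir).resolve()
    resolved = (root / target).resolve()
    return resolved == root or root in resolved.parents


def command_allowed(command: str) -> bool:
    return not any(p.search(command) for p in BLOCKED_SHELL)
```

`write_allowed("legal/acme.json", "_dd")` is allowed; `write_allowed("../data_room/msa.pdf", "_dd")` is not, because resolving the path escapes the output directory.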

Plus 31 Definition-of-Done checks at the end and atomic, read-only handling of your data room — the tool never modifies your files, sends nothing home, and writes only to a separate _dd/ directory.

Make it your firm's playbook — no code

Tune what each analyst looks for. You can add guidance; you can't remove a safety rule.

An M&A lead can adapt the nine specialists to a deal or a house style by editing markdown — not Python. Drop a file per agent, or inherit a deal-type profile.

# dd-config/agents/legal.md
---
agent: legal
extends: saas   # inherit a deal-type profile
---
## Persona (replaces default)
Prioritize change-of-control and IP chain-of-title.

## Additional Focus Areas
- open-source copyleft exposure
- EU data-residency commitments

## Severity Overrides
- change_of_control: P1
Inspect & audit any agent — read-only
  • dd-agents agents list — every specialist + status
  • describe — its persona, focus, severity rules, safety floor
  • validate — lint your config, fail-closed
  • preview — the exact prompt the model will receive
Governed by design

Bundled deal-type profiles (saas, regulated-fintech, …) compose with one merge rule. A non-removable safety floor — citation mandate, anti-fabrication, anti-tampering — is always appended last, so no customization can weaken the controls.
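A sketch of layered composition with a fixed floor. The source says only that profiles compose with one merge rule; the rule here (persona replaces, focus areas append, severity overrides update per key) is an assumption that echoes the headings in the legal.md example, and the floor text paraphrases the three controls named above:

```python
SAFETY_FLOOR = (
    "Cite file, page, section, and an exact quote for every finding.\n"
    "If evidence is absent, write NOT_FOUND and record a gap; never invent.\n"
    "Never modify input files."
)


def compose(base: dict, *layers: dict) -> str:
    """Merge default -> deal-type profile -> per-agent file, then append the floor."""
    persona = base["persona"]
    focus = list(base.get("focus", []))
    severity = dict(base.get("severity", {}))
    for layer in layers:
        persona = layer.get("persona", persona)   # persona replaces
        focus += layer.get("focus", [])           # focus areas append
        severity.update(layer.get("severity", {}))  # overrides update per key
    parts = [persona, "Focus: " + "; ".join(focus)]
    parts += [f"Severity override: {k} -> {v}" for k, v in sorted(severity.items())]
    parts.append(SAFETY_FLOOR)  # always last; no layer can remove or reorder it
    return "\n".join(parts)
```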

Institutional memory

Every run is smarter than the last.

Deal Knowledge Base — compounds across runs
  • Entity profiles & clause summaries — enriched each run
  • Contradiction tracking — "Company X says 2-yr retention; Contract Y says 5"
  • Finding lineage via SHA-256 fingerprint — tracked across runs even when wording changes (active → resolved → recurring)
  • Append-only chronicle — every run, search, annotation, timestamped
  • Next run's prompts include prior context, known contradictions, finding history
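Lineage that survives rewording means hashing stable attributes rather than the finding's text. A sketch; which attributes go into the fingerprint, and the exact status transitions, are assumptions:

```python
import hashlib


def fingerprint(entity: str, domain: str, category: str, source_file: str, section: str) -> str:
    """SHA-256 over stable attributes only, so rewording a finding keeps its identity."""
    key = "|".join(part.strip().lower() for part in (entity, domain, category, source_file, section))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def update_lineage(history: dict[str, str], current_run: set[str]) -> dict[str, str]:
    """active -> resolved when a finding disappears; resolved -> recurring when it returns."""
    updated = {}
    for fp, status in history.items():
        if fp in current_run:
            updated[fp] = "recurring" if status in ("resolved", "recurring") else "active"
        else:
            updated[fp] = "resolved"
    for fp in current_run - set(history):
        updated[fp] = "active"
    return updated
```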

Run 1 finds the risk. Run 2 knows the context, catches what changed, and flags new contradictions.

The system develops institutional memory — like an analyst who remembers every deal they've ever worked on.

For a serial acquirer

Across a deal pipeline this means: re-screening a target you looked at two years ago starts from what you already learned; recurring counterparties (shared customers, common vendors) carry forward; and post-close, the same engine re-runs to track whether flagged risks were remediated.

What it can do — beyond the full run

Different jobs, one engine.

Full pipeline

Integrated due diligence

All 9 domains, cross-referenced, 5 gates, HTML + Excel + JSON. The deep run for an active target.

dd-agents run deal-config.json
Quick scan · minutes

Red-flag triage

GREEN / YELLOW / RED across 8 deal-killer categories — litigation, IP gaps, undisclosed contracts, key-person, restatements, regulatory, customer concentration, debt covenants. A first read before you commit resources.

run --quick-scan --model-profile economy
Targeted search

Ask every contract one question

"Does this require consent on change of control? YES/NO/NOT_ADDRESSED." → Excel with answers, citations, verification scores. The prompts file is plain JSON any lawyer can write.

dd-agents search prompts.json
Post-run

Chat · query · portfolio

Multi-turn chat with memory over the findings; one-shot Q&A; track and compare risk across multiple deals; export PDF; data-room quality assessment before you even start.

dd-agents chat · portfolio compare

Deployment: runs entirely on your machine or your cloud (Anthropic API or AWS Bedrock). Docker image, Homebrew, or pip. No telemetry, no phone-home — important for confidential deal data.

Economics

Every API call is a deal cost — so cost control is built into the architecture.

Right model for the right task
Three model profiles: economy / standard / premium
  • Extraction & triage: Haiku ($0.80 / M input tokens)
  • Specialist analysis: Sonnet ($3 / M input tokens)
  • Executive synthesis: Opus ($15 / M input tokens), only where it matters

Per-agent and per-step cost tracking. Hard budget limits that halt the pipeline. Cross-domain second-pass work is capped (≈$5/deal default). You see the estimated cost before the run.
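Per-step tracking with a hard stop can be sketched in a few lines. Only the input-token prices listed above are used (output pricing isn't given here, so it's omitted); the class and method names are hypothetical:

```python
# USD per million *input* tokens, from the list above.
PRICE_PER_M_INPUT = {"haiku": 0.80, "sonnet": 3.00, "opus": 15.00}


class BudgetExceeded(RuntimeError):
    pass


class CostTracker:
    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0
        self.by_step: dict[str, float] = {}

    def estimate(self, model: str, input_tokens: int) -> float:
        return input_tokens * PRICE_PER_M_INPUT[model] / 1_000_000

    def charge(self, step: str, model: str, input_tokens: int) -> float:
        cost = self.estimate(model, input_tokens)
        if self.spent + cost > self.budget:
            # Hard limit: halt before the call is made, not after.
            raise BudgetExceeded(f"{step}: ${self.spent + cost:.2f} would exceed ${self.budget:.2f}")
        self.spent += cost
        self.by_step[step] = self.by_step.get(step, 0.0) + cost
        return cost


tracker = CostTracker(budget_usd=10.0)
tracker.charge("specialists", "sonnet", 2_000_000)  # 2M tokens x $3/M = $6.00
try:
    tracker.charge("synthesis", "opus", 300_000)    # +$4.50 would make $10.50
except BudgetExceeded as err:
    print(err)  # synthesis: $10.50 would exceed $10.00
```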

Why this matters for M&A

In a deal, pipeline cost shows up on someone's fee schedule. With model routing, per-step tracking, and hard budget caps at every layer, the run cost stays manageable even for large data rooms.


The point isn't "cheaper than advisors." It's valuable alongside them: additive to insight, not to the fee burden. A first read that makes the expensive hours count.

Open-source, Apache-2.0 — no per-seat license, no SaaS subscription. The only marginal cost is the LLM tokens, on your own provider account.

Where I'm honest about the edges

Current limitations.

A seasoned M&A person will ask this anyway — so here it is straight.

It accelerates advisors; it doesn't replace them

Legal, financial, and regulatory conclusions still belong to qualified professionals. The tool produces analysis they build on — IC memos, negotiation checklists, integration plans.

Contract / document due diligence

It analyzes what's in the data room. It doesn't do management interviews, site visits, customer calls, or live financial-audit work.

Garbage in, gaps out — by design

Extraction is the hard 80%. Truly unreadable scans degrade gracefully (it flags a gap rather than guessing), but quality depends on data-room quality.

English-first; multilingual via OCR

Strongest on English contracts. 100+ language OCR is supported, but non-English legal nuance is less battle-tested.

Tuned at ~200–500 docs / ~50–200 subjects

File-based, no database. Excellent at typical deal scale; a mega-deal with tens of thousands of docs would want a different storage tier.

A power tool, not a turnkey SaaS

It's a CLI / open-source package. It assumes a technical operator to run it and a deal professional to interpret. That's also why it's adaptable to your playbook.

Where this goes

From "interesting" to a standard part of the deal machine.

Screening & triage

Run quick-scan on every target that reaches data-room stage. Standardize the first read so the team spends expensive hours only where red flags warrant it — compounding across the hundreds of targets a corp-dev team screens each year.

A house playbook (today)

Encode your severity rules, deal-type logic, and focus areas once as reusable dd-config/ markdown profiles — then every deal is diligenced the same way, with the buyer-thesis "Acquirer Intelligence" agent mapping findings to your integration priorities.

Audit-ready by default

Every finding carries its source quote, severity chain, and a config/prompt-version fingerprint. The report shows the exact analyst configuration it ran under — so an IC or regulator can trace any conclusion end-to-end.

Cross-deal portfolio view

Compare risk profiles across targets you're weighing; carry institutional memory between runs; re-run post-close to track whether flagged risks were remediated.

Private & on your terms

Self-hosted, runs on the Anthropic API or AWS Bedrock, no telemetry. Confidential deal data never leaves your environment — a hard requirement you can't get from most SaaS DD tools.

Open to collaboration

It's open source for a reason. The most useful directions come from people who run more deals than I do — which domains matter most, what the IC wants to see first. That feedback shapes where this goes next.

The takeaway
The gap between an AI demo and a system a deal team can put its name on is entirely engineering.

Accuracy gains like the 74% → 95% jump come from engineering, not from a bigger model. Here that engineering is extraction, chunking, citation verification, deterministic gates, and cross-domain synthesis. That discipline is what makes this usable for real M&A.

Try it live

Interactive sample report — no install:
zoharbabin.github.io/due-diligence-agents

The code

pip install dd-agents
Apache-2.0 · 3,000+ tests · open source

Next step

A working session on a real (or synthetic) data room — a live run, end-to-end.


These patterns aren't M&A-specific — they apply to any system doing cross-document analysis at scale: contract review, compliance, research synthesis, knowledge management.

Due Diligence Agents · integrated, AI-powered M&A due diligence
