excelminer documentation

Extract Excel artifacts into a normalized graph.

excelminer inventories a workbook's sheets, connections, Power Query definitions, pivots, formulas, and more, all without opening Excel.

Quickstart

Analyze a workbook to JSON

from excelminer import AnalysisOptions, analyze_to_dict

result = analyze_to_dict(
    "workbook.xlsx",
    options=AnalysisOptions(include_formulas=True),
)

print(result["graph"]["stats"])

Graph first

Inspect the normalized graph

from excelminer import AnalysisOptions, analyze_workbook

graph, reports, ctx = analyze_workbook(
    "workbook.xlsx",
    options=AnalysisOptions(include_formulas=True),
)

print(graph.stats())
print(ctx.issues)

What excelminer extracts

The pipeline is designed for inventory, reproducible diffs, and analysis. Formulas are captured as text; macros are never executed.
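As an illustration of what a reproducible diff over this output could look like, here is a minimal sketch that compares two lists of node records by their "id" field. It assumes nodes are plain dicts with the "id", "kind", "key", and "attrs" fields shown under "Core output shape". The helper name diff_nodes and the sample records are hypothetical and not part of excelminer.

```python
def diff_nodes(old_nodes, new_nodes):
    """Compare two node lists keyed by "id" (hypothetical helper, not excelminer API)."""
    old = {n["id"]: n for n in old_nodes}
    new = {n["id"]: n for n in new_nodes}
    return {
        # Sorting keeps the report stable from run to run.
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(i for i in old.keys() & new.keys() if old[i] != new[i]),
    }


# Made-up sample records that follow the documented node shape.
before = [
    {"id": "sheet:Sheet1", "kind": "sheet", "key": "Sheet1", "attrs": {"index": 0}},
]
after = [
    {"id": "sheet:Sheet1", "kind": "sheet", "key": "Sheet1", "attrs": {"index": 1}},
    {"id": "sheet:Sheet2", "kind": "sheet", "key": "Sheet2", "attrs": {"index": 0}},
]

report = diff_nodes(before, after)
print(report)
```

Keying on "id" rather than list position means that reordering nodes alone does not register as a change.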

Workbook structure

  • Sheets, defined names, charts (when referenced)
  • Connections + inferred sources
  • Pivot tables + caches

Query & automation hints

  • Power Query definitions + M scripts
  • Binary mashup detection (best-effort extraction)
  • VBA project module text

Optional enrichment

  • Formula text & dependencies
  • Used-range value blocks (calamine)
  • COM automation (Windows + Excel)

Backend pipeline at a glance

The default backend order can be overridden, but this is the typical flow.

  1. OOXML zip parsing (sheets, defined names, charts, connections).
  2. VBA project module extraction for macro-enabled files.
  3. Power Query extraction + script/source inference.
  4. Pivot tables and caches.
  5. Calamine used-range scanning (optional).
  6. Openpyxl formula text (optional).
  7. Excel COM enrichment (Windows-only, opt-in).
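The ordering above can be pictured as a list of steps, some always on and some gated by options or platform. The sketch below is purely illustrative and does not show how excelminer selects or reorders backends. SketchOptions, planned_steps, the step names, and the include_values and use_com flags are all hypothetical; only include_formulas mirrors a flag that appears in the Quickstart.

```python
from dataclasses import dataclass
from pathlib import Path

# File suffixes of macro-enabled Excel formats.
MACRO_SUFFIXES = {".xlsm", ".xlsb", ".xltm", ".xlam"}


@dataclass
class SketchOptions:
    include_formulas: bool = False  # mirrors the flag used in the Quickstart
    include_values: bool = False    # hypothetical flag for used-range scanning
    use_com: bool = False           # hypothetical opt-in for COM enrichment


def planned_steps(path, options, platform="linux"):
    """Return the step names that would run, in the order listed above."""
    macro_enabled = Path(path).suffix.lower() in MACRO_SUFFIXES
    steps = [
        ("ooxml", True),
        ("vba", macro_enabled),
        ("power_query", True),
        ("pivots", True),
        ("calamine", options.include_values),
        ("openpyxl_formulas", options.include_formulas),
        ("com", options.use_com and platform == "win32"),
    ]
    return [name for name, enabled in steps if enabled]


plan = planned_steps("workbook.xlsx", SketchOptions(include_formulas=True))
print(plan)  # ['ooxml', 'power_query', 'pivots', 'openpyxl_formulas']
```

Keeping the order in one list makes it easy to see which steps are optional and why a given step was skipped.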

Serving locally

Tip: Write the bundled site files to a folder:

python -m excelminer.docs --write-site .\excelminer-site

Then open excelminer-site\index.html in your browser.

Core output shape

Every analysis returns a deterministic graph.

Node

{
  "id": "sheet:Sheet1",
  "kind": "sheet",
  "key": "Sheet1",
  "attrs": {"index": 0}
}

Edge

{
  "src": "sheet:Sheet1",
  "dst": "formula_cell:Sheet1!A1",
  "kind": "contains",
  "attrs": {}
}
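To show how records of this shape can be consumed, here is a minimal sketch that indexes nodes by "id" and groups edge targets by source and edge kind. The sheet node and the edge are copied from the examples above. The formula_cell node record is an assumption made up for illustration, and index_graph is a hypothetical helper, not an excelminer function.

```python
from collections import defaultdict

nodes = [
    {"id": "sheet:Sheet1", "kind": "sheet", "key": "Sheet1", "attrs": {"index": 0}},
    # Assumed record for the edge target; its key and attrs are illustrative.
    {"id": "formula_cell:Sheet1!A1", "kind": "formula_cell", "key": "Sheet1!A1", "attrs": {}},
]
edges = [
    {"src": "sheet:Sheet1", "dst": "formula_cell:Sheet1!A1", "kind": "contains", "attrs": {}},
]


def index_graph(nodes, edges):
    """Build an id lookup plus a (src, kind) -> [dst, ...] adjacency map."""
    by_id = {n["id"]: n for n in nodes}
    children = defaultdict(list)
    for e in edges:
        children[(e["src"], e["kind"])].append(e["dst"])
    return by_id, dict(children)


by_id, children = index_graph(nodes, edges)
contained = children[("sheet:Sheet1", "contains")]
print([by_id[dst]["kind"] for dst in contained])  # ['formula_cell']
```

Because ids embed the node kind as a prefix, the same lookup works for any node type without a separate table per kind.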

Explore the details

Implementation guidance

Learn about architecture decisions, backend responsibilities, and data flow.

  • Architecture
  • Backends
  • Output Schema

Safety first

Understand connection sanitization, Power Query privacy risks, and COM automation constraints.

  • Security
  • Privacy
  • COM notes