Backend pipeline

Each backend extracts a slice of workbook metadata into the graph.

OOXMLZipBackend

Structural extraction via zipped OOXML parts.

  • Sheets, defined names, charts (when defined names are referenced)
  • Connections + inferred sources
  • No Excel installation required
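
Structural extraction without Excel can be illustrated with the Python standard library alone: an OOXML workbook is a zip archive, and xl/workbook.xml lists the sheets and defined names. The sketch below is illustrative only and is not excelminer's implementation; the function name list_workbook_structure is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/workbook.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_workbook_structure(source):
    """Return (sheet_names, defined_names) from an OOXML workbook.

    `source` is a file path or a binary file-like object.
    Hypothetical helper, not part of excelminer's API.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    # Sheets appear in workbook order under <sheets>.
    sheets = [s.get("name") for s in root.findall("m:sheets/m:sheet", NS)]
    # Each <definedName> holds its reference or formula as element text.
    names = {
        d.get("name"): (d.text or "")
        for d in root.findall("m:definedNames/m:definedName", NS)
    }
    return sheets, names
```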

OpenpyxlBackend

Formula text extraction (not evaluation).

  • Emits formula cell nodes when include_formulas=True
  • Runs when include_formulas=True or include_cells=True is set
  • Can use an XML formula-only scan for large sheets (formula_scan_mode)
  • Supports sheet scoping and skipping pivot ranges when extracting formulas
  • include_cells does not emit cell blocks in this backend
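
The idea behind an XML formula-only scan can be sketched with a streaming parser: read the worksheet part incrementally and keep only cells that carry an <f> element. This is a minimal sketch under assumptions, not the behavior of formula_scan_mode itself; iter_formulas is a hypothetical name, and cells that only reference a shared formula (an empty <f>) are skipped for brevity.

```python
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def iter_formulas(sheet_xml):
    """Yield (cell_ref, formula_text) pairs from a worksheet XML stream.

    `sheet_xml` is a file path or binary file-like object for a part such
    as xl/worksheets/sheet1.xml. Hypothetical helper for illustration.
    """
    for _event, elem in ET.iterparse(sheet_xml, events=("end",)):
        if elem.tag == NS + "c":
            f = elem.find(NS + "f")
            # Formula text is stored without the leading "=" in the XML.
            if f is not None and f.text:
                yield elem.get("r"), "=" + f.text
        elif elem.tag == NS + "row":
            # Drop the finished row's children to keep memory flat.
            elem.clear()
```

Nothing is evaluated here; only the stored formula text is read.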

CalamineBackend

Optional used-range/value block scanning.

  • Requires excelminer[calamine]
  • Emits cell_block nodes

ComBackend (Windows)

Opt-in Excel automation for enrichment.

  • Requires Excel + excelminer[com]
  • Opt-in for modern OOXML formats
  • Can extract sheets, connections, formulas

PowerQueryZipBackend

Parses xl/queries/*.xml and infers sources from M.

  • Emits powerquery and m_script nodes
  • Attempts best-effort script extraction from mashup containers
  • Links to source nodes when possible
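
Inferring sources from M can be approximated by looking for well-known data-access functions and taking their first string argument. The sketch below is an assumption about one workable approach, not the backend's actual logic; infer_sources and the kind labels are hypothetical, and the function table covers only a handful of M functions.

```python
import re

# A few standard Power Query M data-access functions, mapped to
# illustrative source kinds (the labels are made up for this sketch).
SOURCE_FUNCTIONS = {
    "Sql.Database": "sql",
    "Web.Contents": "web",
    "File.Contents": "file",
    "Odbc.DataSource": "odbc",
    "OData.Feed": "odata",
}

# Match `Func("first argument"`; M escapes a double quote inside a
# string literal by doubling it.
_CALL = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in SOURCE_FUNCTIONS)
    + r')\s*\(\s*"((?:[^"]|"")*)"'
)


def infer_sources(m_script):
    """Return sorted, de-duplicated (kind, first_string_argument) pairs."""
    found = set()
    for func, arg in _CALL.findall(m_script):
        found.add((SOURCE_FUNCTIONS[func], arg.replace('""', '"')))
    return sorted(found)
```

A regex pass like this misses sources built from variables or parameters, which is one reason such inference is best-effort.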

PivotZipBackend

Best-effort pivot table + cache extraction.

  • Emits pivot_table + pivot_cache
  • Links pivots to caches and connections
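
In the OOXML package, the link from a pivot table to its cache is recorded in the pivot table part's relationships file. The following sketch resolves those links with the standard library; it assumes the conventional xl/pivotTables/ layout, and link_pivots_to_caches is a hypothetical helper rather than the backend's code.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
PIVOT_DIR = "xl/pivotTables"
RELS_PREFIX = PIVOT_DIR + "/_rels/"


def link_pivots_to_caches(source):
    """Map each pivot table part to its pivot cache definition part."""
    links = {}
    with zipfile.ZipFile(source) as zf:
        for name in sorted(zf.namelist()):
            if not (name.startswith(RELS_PREFIX) and name.endswith(".rels")):
                continue
            # ".../_rels/pivotTable1.xml.rels" describes ".../pivotTable1.xml".
            pivot_part = PIVOT_DIR + "/" + posixpath.basename(name)[: -len(".rels")]
            root = ET.fromstring(zf.read(name))
            for rel in root.findall(REL_NS + "Relationship"):
                if rel.get("Type", "").endswith("/pivotCacheDefinition"):
                    # Targets are usually relative to the pivot table part.
                    target = posixpath.join(PIVOT_DIR, rel.get("Target", ""))
                    links[pivot_part] = posixpath.normpath(target).lstrip("/")
    return links
```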

VbaZipBackend

VBA module extraction for macro-enabled files.

  • Emits vba_project nodes
  • Captures VBA module text when available
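
Macro-enabled workbooks store their VBA project as a binary part, normally xl/vbaProject.bin. A quick presence check might look like the sketch below; find_vba_project is a hypothetical name, and decoding module text from the binary container is out of scope here.

```python
import zipfile


def find_vba_project(source):
    """Return the archive path of the VBA project part, or None if absent."""
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            # Compare case-insensitively; the usual name is xl/vbaProject.bin.
            if name.lower().endswith("vbaproject.bin"):
                return name
    return None
```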

Backend ordering

The default ordering is deterministic, so repeated runs produce reproducible diffs.

  1. OOXML → VBA → Power Query → Pivot
  2. Calamine → Openpyxl → COM
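
One simple way to get a deterministic order is to sort the enabled backends by their position in a fixed list. The sketch below follows the sequence listed above; the short labels and the order_backends function are illustrative assumptions, not excelminer identifiers.

```python
# Fixed priority list mirroring the default order described above.
# The labels are illustrative, not excelminer's internal names.
DEFAULT_ORDER = [
    "ooxml", "vba", "powerquery", "pivot",
    "calamine", "openpyxl", "com",
]


def order_backends(enabled):
    """Return the enabled backend labels in the fixed default order."""
    rank = {name: i for i, name in enumerate(DEFAULT_ORDER)}
    unknown = set(enabled) - set(rank)
    if unknown:
        raise ValueError(f"unknown backends: {sorted(unknown)}")
    # Sorting by fixed rank makes the result independent of input order.
    return sorted(set(enabled), key=rank.__getitem__)
```

Because the result depends only on which backends are enabled, not on how they were passed in, two runs with the same configuration visit backends in the same sequence.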