Output Schema

Understand the normalized graph, nodes, edges, and best-effort fields.

High-level structure

  • path: input file path.
  • options: AnalysisOptions used.
  • issues: top-level issues/warnings.
  • reports: per-backend stats + issues.
  • graph: nodes, edges, and stats.
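As a minimal sketch of how a consumer might sanity-check this top-level layout, the snippet below parses a hand-written sample and reports any missing keys. It assumes the output is serialized as JSON; the sample values, the inner shape of options and reports, and the helper name missing_top_level_keys are illustrative assumptions, not part of excelminer.

```python
import json

# Hypothetical sample that mirrors only the top-level keys listed above.
# The inner shapes of "options" and "reports" are assumptions.
sample = json.loads("""
{
  "path": "book.xlsx",
  "options": {},
  "issues": [],
  "reports": [],
  "graph": {"nodes": [], "edges": [], "stats": {}}
}
""")

EXPECTED_KEYS = ("path", "options", "issues", "reports", "graph")


def missing_top_level_keys(doc):
    """Return the expected top-level keys that are absent from doc."""
    return [key for key in EXPECTED_KEYS if key not in doc]


missing = missing_top_level_keys(sample)
```

An empty result means every documented field is present; it says nothing about whether the nested values are well formed.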

Node and edge shape

Node

{
  "id": "sheet:Sheet1",
  "kind": "sheet",
  "key": "Sheet1",
  "attrs": {
    "index": 0
  }
}

Edge

{
  "src": "sheet:Sheet1",
  "dst": "formula_cell:Sheet1!A1",
  "kind": "contains",
  "attrs": {}
}
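
Because edges refer to nodes by id, a consumer will usually want a lookup table before walking the graph. The following is a rough sketch under that assumption: it reuses the node and edge shown above, adds a second node whose key and attrs are guessed from the id pattern, and uses a hypothetical helper named index_graph.

```python
from collections import defaultdict

nodes = [
    {"id": "sheet:Sheet1", "kind": "sheet", "key": "Sheet1", "attrs": {"index": 0}},
    # Assumed shape: only the id of this node appears in the edge sample above.
    {"id": "formula_cell:Sheet1!A1", "kind": "formula_cell", "key": "Sheet1!A1", "attrs": {}},
]
edges = [
    {"src": "sheet:Sheet1", "dst": "formula_cell:Sheet1!A1", "kind": "contains", "attrs": {}},
]


def index_graph(nodes, edges):
    """Build an id -> node map and a src id -> outgoing edges map."""
    by_id = {node["id"]: node for node in nodes}
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge["src"]].append(edge)
    return by_id, outgoing


by_id, outgoing = index_graph(nodes, edges)

# Kinds of everything that Sheet1 contains.
contained_kinds = [
    by_id[edge["dst"]]["kind"]
    for edge in outgoing["sheet:Sheet1"]
    if edge["kind"] == "contains"
]
```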

Common node kinds

sheet, defined_name, chart, connection, source, powerquery, m_script, pivot_table, pivot_cache, vba_project, formula_cell, formula_group, cell_block

Relationship kinds

  • contains: sheet → formula/cell blocks
  • uses_source: connection → source
  • uses_connection: powerquery → connection
  • uses_cache: pivot_table → pivot_cache
  • scoped_to: defined_name → sheet
  • has_script: powerquery → m_script
  • uses_defined_name: chart → defined_name
  • member_of: formula_cell → formula_group
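
The relationship list above can be turned into a small consistency check. This is only a sketch: the table simply restates the list, the reading of "formula/cell blocks" as formula_cell and cell_block is an assumption, and unexpected_edges plus the sample ids are hypothetical. Edge kinds missing from the table are skipped rather than flagged, since the list may not be exhaustive.

```python
# Edge kind -> (expected source kind, allowed destination kinds).
EXPECTED_ENDPOINTS = {
    # Assumption: "formula/cell blocks" is read as these two node kinds.
    "contains": ("sheet", {"formula_cell", "cell_block"}),
    "uses_source": ("connection", {"source"}),
    "uses_connection": ("powerquery", {"connection"}),
    "uses_cache": ("pivot_table", {"pivot_cache"}),
    "scoped_to": ("defined_name", {"sheet"}),
    "has_script": ("powerquery", {"m_script"}),
    "uses_defined_name": ("chart", {"defined_name"}),
    "member_of": ("formula_cell", {"formula_group"}),
}


def unexpected_edges(nodes, edges):
    """Return edges whose endpoint kinds differ from the table above."""
    kind_of = {node["id"]: node["kind"] for node in nodes}
    flagged = []
    for edge in edges:
        expected = EXPECTED_ENDPOINTS.get(edge["kind"])
        if expected is None:
            continue  # unknown relationship kind: not judged here
        src_kind, dst_kinds = expected
        if kind_of.get(edge["src"]) != src_kind or kind_of.get(edge["dst"]) not in dst_kinds:
            flagged.append(edge)
    return flagged


# Hypothetical ids, for illustration only.
nodes = [
    {"id": "sheet:Sheet1", "kind": "sheet", "key": "Sheet1", "attrs": {}},
    {"id": "pivot_table:pt1", "kind": "pivot_table", "key": "pt1", "attrs": {}},
    {"id": "pivot_cache:pc1", "kind": "pivot_cache", "key": "pc1", "attrs": {}},
]
edges = [
    {"src": "pivot_table:pt1", "dst": "pivot_cache:pc1", "kind": "uses_cache", "attrs": {}},
    {"src": "sheet:Sheet1", "dst": "pivot_cache:pc1", "kind": "uses_cache", "attrs": {}},
    {"src": "sheet:Sheet1", "dst": "pivot_table:pt1", "kind": "custom", "attrs": {}},
]
flagged = unexpected_edges(nodes, edges)
```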

Best-effort fields

Power Query

  • Extracts M code when stored as XML.
  • Emits per-script M code as m_script nodes.
  • For binary mashups, M extraction is attempted on a best-effort basis; if it fails, only metadata is emitted.
  • Source inference is regex-based and partial.

Pivots + formulas

  • Pivot cache linking is best-effort.
  • Formula dependencies are heuristic.
  • Formulas are never evaluated.
  • Formula groups exist only when post_analysis_distillation is enabled.
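
Since formula groups may be absent, downstream code should tolerate a graph with no member_of edges. A minimal sketch of that idea follows; cells_by_group and the sample ids are hypothetical, and the function returns an empty mapping when no groups were emitted.

```python
from collections import defaultdict


def cells_by_group(edges):
    """Map each formula_group id to the formula_cell ids that belong to it."""
    groups = defaultdict(list)
    for edge in edges:
        if edge["kind"] == "member_of":
            groups[edge["dst"]].append(edge["src"])
    return dict(groups)


# Hypothetical edges, as they might look with post_analysis_distillation enabled.
with_groups = [
    {"src": "formula_cell:Sheet1!A1", "dst": "formula_group:g1", "kind": "member_of", "attrs": {}},
    {"src": "formula_cell:Sheet1!A2", "dst": "formula_group:g1", "kind": "member_of", "attrs": {}},
]
# Without distillation, only non-group edges would be present.
without_groups = [
    {"src": "sheet:Sheet1", "dst": "formula_cell:Sheet1!A1", "kind": "contains", "attrs": {}},
]

grouped = cells_by_group(with_groups)
ungrouped = cells_by_group(without_groups)
```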

Connections

  • Sanitized connection key-value pairs are stored in connection_kv.
  • Raw connection strings may also be present.