AnalysisOptions

Feature flags and limits that shape extraction.

Feature flags

Flag                        Default  What it enables
include_vba                 True     Detect embedded VBA projects in macro-enabled files; emits vba_project nodes with module text when available.
include_connections         True     Extract OOXML connections and infer source nodes.
include_powerquery          True     Parse Power Query definitions in xl/queries/*.xml.
include_pivots              True     Parse pivot tables and their caches (best-effort).
include_defined_names       True     Extract defined names (OOXML and optional COM).
include_formulas            False    Capture formula text and dependencies via openpyxl.
include_cells               False    Enable calamine used-range scanning for cell_block nodes. Does not create formula nodes.
include_com                 False    Allow COM automation (Windows + Excel). Required for COM enrichment of modern OOXML files (.xlsx, .xlsm).
post_analysis_distillation  False    Condense graphs (e.g., formula grouping, pruning unused nodes). See distillation_workers for parallelism.
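
As a quick illustration of the defaults in the table, here is a minimal sketch. OptionsSketch is a hypothetical stand-in written for this example, not the real AnalysisOptions class; only the field names and default values are taken from the table, and the real constructor may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OptionsSketch:
    """Hypothetical stand-in that mirrors the documented flags and defaults."""

    include_vba: bool = True
    include_connections: bool = True
    include_powerquery: bool = True
    include_pivots: bool = True
    include_defined_names: bool = True
    include_formulas: bool = False
    include_cells: bool = False
    include_com: bool = False
    post_analysis_distillation: bool = False


defaults = OptionsSketch()

# Opt in to the heavier formula and cell scans; everything else keeps its default.
deep_scan = replace(defaults, include_formulas=True, include_cells=True)

# Names of the flags that are switched on in the deep-scan configuration.
enabled = sorted(name for name, value in vars(deep_scan).items() if value)
```

The structural extractors (VBA, connections, Power Query, pivots, defined names) are on by default, while the per-cell scans and COM are opt-in.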

Formula scanning controls

  • formula_sheet_names: restrict formula scans to named sheets (mutually exclusive with formula_sheet_indexes).
  • formula_sheet_indexes: restrict formula scans by workbook order (1-based).
  • formula_scan_mode: auto (default), xml, or openpyxl.
  • formula_large_sheet_cells: cell-count threshold for auto mode; sheets that exceed it (10,000 cells) switch to XML formula-only scans.
  • formula_skip_pivot_cells: skip formulas inside pivot table ranges (default True).
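
The sheet filters and the auto scan mode can be sketched as follows. Both helpers (select_sheets, resolve_scan_mode) are hypothetical names written for this example, not library functions. The sketch assumes that auto falls back to openpyxl for sheets at or below the threshold, which the text above does not state explicitly.

```python
def select_sheets(workbook_sheets, sheet_names=None, sheet_indexes=None):
    """Return the sheets to scan, honouring the mutually exclusive filters."""
    if sheet_names is not None and sheet_indexes is not None:
        raise ValueError("formula_sheet_names and formula_sheet_indexes are mutually exclusive")
    if sheet_names is not None:
        wanted = set(sheet_names)
        return [name for name in workbook_sheets if name in wanted]
    if sheet_indexes is not None:
        # Indexes are 1-based and follow workbook order; out-of-range values are ignored here.
        return [workbook_sheets[i - 1] for i in sheet_indexes if 1 <= i <= len(workbook_sheets)]
    return list(workbook_sheets)


def resolve_scan_mode(mode, cell_count, large_sheet_cells=10_000):
    """Pick the backend for one sheet; 'auto' switches to XML above the threshold."""
    if mode not in ("auto", "xml", "openpyxl"):
        raise ValueError(f"unknown formula_scan_mode: {mode!r}")
    if mode != "auto":
        return mode
    # Assumption: small sheets use openpyxl when the mode is 'auto'.
    return "xml" if cell_count > large_sheet_cells else "openpyxl"


sheets = select_sheets(["Data", "Calc", "Summary"], sheet_indexes=[1, 3])
mode_for_big_sheet = resolve_scan_mode("auto", cell_count=250_000)
```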

Limits

Scoping

  • max_sheets: cap sheets scanned by sheet-iterating backends.
  • max_cells_per_sheet: cap formula/cell scans.
  • max_rows_per_sheet: cap rows scanned for formulas.
  • max_cols_per_sheet: cap columns scanned for formulas.
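
One way to picture how the scoping limits combine is the sketch below. clamp_scan_window is a hypothetical helper, and the order in which the caps are applied (rows and columns first, then the cell budget by trimming rows) is an assumption made for illustration; the library may resolve conflicts differently.

```python
def clamp_scan_window(n_rows, n_cols, max_rows=None, max_cols=None, max_cells=None):
    """Shrink a sheet's used range so that it respects the configured caps."""
    rows = n_rows if max_rows is None else min(n_rows, max_rows)
    cols = n_cols if max_cols is None else min(n_cols, max_cols)
    if max_cells is not None and cols > 0 and rows * cols > max_cells:
        # Assumption: the cell budget is met by dropping trailing rows.
        rows = max_cells // cols
    return rows, cols


# A 5,000 x 40 used range with row, column, and cell caps applied.
window = clamp_scan_window(5000, 40, max_rows=1000, max_cols=20, max_cells=10_000)

# max_sheets can be pictured as a simple slice over the workbook's sheet order.
scanned_sheets = ["Data", "Calc", "Summary", "Archive"][:2]
```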

Sampling

  • row_chunk_size: number of rows per chunk in formula scans.
  • column_chunk_size: number of columns per chunk in formula scans.
  • sample_rows_per_block: sample rows in calamine blocks.
  • sample_cols_per_block: sample columns in calamine blocks.
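
Chunking and sampling can be sketched as below. chunk_ranges and sample_head are hypothetical helpers; the 1-based inclusive ranges and the "take the first N rows and columns" sampling strategy are assumptions for illustration, since the page does not say how chunks are numbered or which rows are sampled.

```python
def chunk_ranges(total, chunk_size):
    """Yield 1-based inclusive (start, end) ranges that cover `total` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(1, total + 1, chunk_size):
        yield start, min(start + chunk_size - 1, total)


def sample_head(block, sample_rows, sample_cols):
    """Keep the top-left corner of a block (a list of row lists)."""
    return [row[:sample_cols] for row in block[:sample_rows]]


row_chunks = list(chunk_ranges(10, 4))

block = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sample = sample_head(block, sample_rows=2, sample_cols=2)
```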

Parallelism and logging

  • formula_decomposition_workers: parallelism for formula dependency extraction.
  • distillation_workers: parallelism for post-analysis distillation.
  • log_callback: optional callback for progress/issues during analysis.
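
The worker counts and the callback can be pictured with the following sketch. run_with_workers is a hypothetical helper, the callback is assumed to receive a single message string, and a thread pool is used purely for illustration; the library's actual callback signature and concurrency model are not documented on this page.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_workers(items, work, workers=1, log_callback=None):
    """Apply `work` to each item with a bounded pool, reporting progress."""

    def log(message):
        # The callback is optional, so only call it when one was supplied.
        if log_callback is not None:
            log_callback(message)

    log(f"starting {len(items)} task(s) with {workers} worker(s)")
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        results = list(pool.map(work, items))  # map preserves input order
    log("done")
    return results


messages = []
lengths = run_with_workers(
    ["=A1+B1", "=SUM(C1:C9)"], len, workers=2, log_callback=messages.append
)
```

Passing a list's append method as the callback is a simple way to collect messages in tests; in an application, a logger method would fit the same slot.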

COM nuances

COM automation is opt-in for modern formats (.xlsx, .xlsm) and requires Windows with Excel installed. Legacy formats such as .xls can trigger COM even without include_com=True, because OOXML parsing may not apply to them.
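
The rule above can be summarised as a small decision function. may_use_com is a hypothetical helper written for this example; it only checks the platform string and file extension, does not verify that Excel is installed, and treats .xls as the only legacy format because that is the only one named here.

```python
import sys
from pathlib import PurePath

MODERN_FORMATS = {".xlsx", ".xlsm"}
LEGACY_FORMATS = {".xls"}  # assumption: other legacy extensions may apply as well


def may_use_com(filename, include_com, platform=sys.platform):
    """Return True if COM automation could be used for this file."""
    if not platform.startswith("win"):
        return False  # COM requires Windows
    suffix = PurePath(filename).suffix.lower()
    if suffix in MODERN_FORMATS:
        return include_com  # opt-in for OOXML files
    # Legacy files can trigger COM even when include_com is False.
    return suffix in LEGACY_FORMATS


modern_default = may_use_com("budget.xlsx", include_com=False, platform="win32")
legacy_default = may_use_com("budget.xls", include_com=False, platform="win32")
```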