Feature flags
| Flag | Default | What it enables |
|---|---|---|
| include_vba | True | Detect embedded VBA projects in macro-enabled files; emits vba_project nodes with module text when available. |
| include_connections | True | Extract OOXML connections and infer source nodes. |
| include_powerquery | True | Parse Power Query definitions in xl/queries/*.xml. |
| include_pivots | True | Parse pivot tables + caches (best-effort). |
| include_defined_names | True | Extract defined names (OOXML and optional COM). |
| include_formulas | False | Capture formula text and dependencies via openpyxl. |
| include_cells | False | Enable calamine used-range scanning for cell_block nodes. Does not create formula nodes. |
| include_com | False | Allow COM automation (Windows + Excel). Required for modern OOXML enrichment. |
| post_analysis_distillation | False | Condense graphs (e.g., formula grouping, pruning unused nodes). See distillation_workers for parallelism. |
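
As a quick reference, the defaults in the table can be mirrored in a small dataclass. This is only a sketch: `FeatureFlags` and `enabled_flags` are hypothetical names used for illustration, not part of the documented API; only the flag names and default values come from the table.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FeatureFlags:
    """Hypothetical container mirroring the documented flag defaults."""

    include_vba: bool = True
    include_connections: bool = True
    include_powerquery: bool = True
    include_pivots: bool = True
    include_defined_names: bool = True
    include_formulas: bool = False
    include_cells: bool = False
    include_com: bool = False
    post_analysis_distillation: bool = False


def enabled_flags(flags: FeatureFlags) -> list[str]:
    """Return the names of the flags that are switched on, in table order."""
    return [name for name, value in asdict(flags).items() if value]


# Opt in to formula capture while leaving every other flag at its default.
flags = FeatureFlags(include_formulas=True)
print(enabled_flags(flags))
```

Keeping the flags in one frozen object makes it easy to see which opt-in features (formulas, cells, COM, distillation) a given run has turned on.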
Formula scanning controls
- formula_sheet_names: restrict formula scans to named sheets (mutually exclusive with formula_sheet_indexes).
- formula_sheet_indexes: restrict formula scans by workbook order (1-based).
- formula_scan_mode: auto (default), xml, or openpyxl.
- formula_large_sheet_cells: with auto, switch to XML formula-only scans when sheets exceed 10,000 cells.
- formula_skip_pivot_cells: skip formulas inside pivot table ranges (default true).
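
The interaction between these controls can be sketched as follows. The helper names `resolve_scan_mode` and `check_sheet_filters` are hypothetical, and two points are assumptions rather than documented behavior: that 10,000 is the default value of formula_large_sheet_cells, and that auto falls back to openpyxl for sheets at or below the threshold.

```python
from typing import Optional, Sequence

LARGE_SHEET_CELLS = 10_000  # assumed default for formula_large_sheet_cells


def resolve_scan_mode(
    mode: str, sheet_cells: int, large_sheet_cells: int = LARGE_SHEET_CELLS
) -> str:
    """Pick the backend for one sheet given formula_scan_mode."""
    if mode not in {"auto", "xml", "openpyxl"}:
        raise ValueError(f"unknown formula_scan_mode: {mode!r}")
    if mode != "auto":
        return mode  # an explicit mode is used as given
    # auto: large sheets switch to the XML formula-only scan.
    return "xml" if sheet_cells > large_sheet_cells else "openpyxl"


def check_sheet_filters(
    names: Optional[Sequence[str]], indexes: Optional[Sequence[int]]
) -> None:
    """Reject filter combinations the controls above rule out."""
    if names and indexes:
        raise ValueError(
            "formula_sheet_names and formula_sheet_indexes are mutually exclusive"
        )
    if indexes and any(i < 1 for i in indexes):
        raise ValueError("formula_sheet_indexes are 1-based")


print(resolve_scan_mode("auto", 250_000))
```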
Limits
Scoping
- max_sheets: cap sheets scanned by sheet-iterating backends.
- max_cells_per_sheet: cap formula/cell scans.
- max_rows_per_sheet: cap rows scanned for formulas.
- max_cols_per_sheet: cap columns scanned for formulas.
Sampling
- row_chunk_size: row chunk sizing for formula scans.
- column_chunk_size: column chunk sizing for formula scans.
- sample_rows_per_block: sample rows in calamine blocks.
- sample_cols_per_block: sample columns in calamine blocks.
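
To illustrate how a row cap and a chunk size might combine, here is a minimal sketch. `iter_row_chunks` is a hypothetical helper, and the exact chunking strategy used by the analyzer is not documented here; the sketch only assumes that max_rows_per_sheet truncates the scan and row_chunk_size splits what remains into consecutive 1-based ranges.

```python
from typing import Iterator, Optional, Tuple


def iter_row_chunks(
    total_rows: int,
    row_chunk_size: int,
    max_rows_per_sheet: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (start, end) row ranges for a formula scan."""
    if row_chunk_size < 1:
        raise ValueError("row_chunk_size must be at least 1")
    # Apply the cap first, then split the remaining rows into chunks.
    last_row = total_rows
    if max_rows_per_sheet is not None:
        last_row = min(total_rows, max_rows_per_sheet)
    start = 1
    while start <= last_row:
        end = min(start + row_chunk_size - 1, last_row)
        yield start, end
        start = end + 1


print(list(iter_row_chunks(10, 4)))                        # [(1, 4), (5, 8), (9, 10)]
print(list(iter_row_chunks(10, 4, max_rows_per_sheet=6)))  # [(1, 4), (5, 6)]
```

The same pattern would apply to columns with column_chunk_size and max_cols_per_sheet.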
Parallelism and logging
- formula_decomposition_workers: parallelism for formula dependency extraction.
- distillation_workers: parallelism for post-analysis distillation.
- log_callback: optional callback for progress/issues during analysis.
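
The sketch below shows the general shape of a worker count paired with a progress callback. It is illustrative only: `decompose` and `extract_refs` are hypothetical, the regular expression is a deliberately simplified stand-in for real formula parsing, and the callback is assumed to take a single message string, which the documentation above does not specify.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

# Simplified A1-style reference pattern; real formula parsing is more involved.
CELL_REF = re.compile(r"\$?[A-Z]{1,3}\$?\d+")


def extract_refs(formula: str) -> List[str]:
    """Return the cell references that appear in one formula string."""
    return CELL_REF.findall(formula)


def decompose(
    formulas: List[str],
    workers: int,
    log_callback: Optional[Callable[[str], None]] = None,
) -> List[List[str]]:
    """Extract references from each formula using a pool of worker threads."""
    if log_callback:
        log_callback(f"decomposing {len(formulas)} formulas with {workers} workers")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with formulas.
        results = list(pool.map(extract_refs, formulas))
    if log_callback:
        log_callback("decomposition finished")
    return results


messages: List[str] = []
deps = decompose(["=A1+B2", "=SUM(C1:C3)"], workers=2, log_callback=messages.append)
print(deps)      # [['A1', 'B2'], ['C1', 'C3']]
print(messages)
```

Passing `messages.append` as the callback is a convenient way to capture progress lines in tests; in practice a logger method such as `logging.getLogger(__name__).info` would fit the same slot.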
COM nuances
COM automation is opt-in for modern formats (.xlsx, .xlsm) and requires
Windows + Excel. Legacy formats like .xls can trigger COM even without
include_com=True because OOXML parsing may not apply.
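
The rules above can be summarized in a small decision helper. This is a sketch under stated assumptions: `may_use_com` is a hypothetical name, only `.xls` is treated as legacy because it is the only legacy format named here, and the check covers the platform but not whether Excel is actually installed.

```python
import sys
from pathlib import Path

MODERN_SUFFIXES = {".xlsx", ".xlsm"}
LEGACY_SUFFIXES = {".xls"}  # only the legacy format named in the text above


def may_use_com(path: str, include_com: bool, platform: str = sys.platform) -> bool:
    """Return True if COM automation could be attempted for this file."""
    if not platform.startswith("win"):
        return False  # COM requires Windows (and an Excel installation)
    suffix = Path(path).suffix.lower()
    if suffix in LEGACY_SUFFIXES:
        return True  # legacy formats can trigger COM without the flag
    # Modern formats, and anything not listed, stay opt-in.
    return include_com


print(may_use_com("report.xlsx", include_com=False, platform="win32"))  # False
print(may_use_com("legacy.xls", include_com=False, platform="win32"))   # True
```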