DuckDB Public Repo System Review Graph

DuckDB is an embedded database engine that runs SQL over local data using vectorized analytical execution.

Depth: deep | Systems: 6 | Artifacts: 8 | Gates: 5 | Workflows: 6

Bigger Picture

This example shows how to map a database engine repo. The system is not just a command-line tool or a client library; it is a layered execution engine in which SQL is parsed, bound, planned, optimized, and executed by vectorized operators. Execution is backed by storage and transactions, and the engine is extended through extensions. A reviewer should understand the route from query text to result chunks and persisted state.

Source Links

Report Registers

Coverage Register

Area | Count | What It Means | Reviewer Use
Systems | 6 | Bounded contexts, services, subsystems, or product surfaces. | Use this to see whether the report maps the main operating areas.
Artifacts | 8 | Inspectable files, APIs, tables, dashboards, reports, or outputs. | Use this to trace where system claims can be inspected.
Schemas/contracts | 6 | Public or sanitized contracts for artifacts and handoffs. | Use this to rebuild examples without touching private data.
Decision gates | 5 | Rules that advance, wait, block, or require human review. | Use this to find where the system controls action.
Workflows | 6 | Lifecycle steps from input to output. | Use this to follow what happens end to end.
Graph edges | 49 | Explicit and derived relationships between manifest nodes. | Use this to audit connectivity and missing relationships.
Child maps | 0 | Linked subsystem maps for large repositories. | Use this to drill into a map-of-maps instead of one flat report.
Blueprint sections | 0 | Source-evidence-backed operating flows. | Use this to review deep behavior claims with proof anchors.
Blueprint evidence rows | 0 | Source paths, symbols, roles, and proof levels. | Use this to verify whether blueprint claims are source-backed.
Source links | 3 | External or public references used by the report. | Use this to confirm the report's public evidence base.
Known boundaries | 4 | Open limits, unproven claims, redactions, or scope exclusions. | Use this to avoid treating the report as stronger than it is.
Review questions | 5 | Questions a maintainer, auditor, or agent should answer next. | Use this as the human follow-up queue.
Rebuild phases | 2 | Documented commands or phases for reproducing the report. | Use this to regenerate or verify the report locally.

Evidence Register

Evidence | Kind | Coverage | Proof | Reviewer Use
GitHub repository | source link | whole report | declared | Primary public source used for repo identity and source paths.
DuckDB documentation | source link | whole report | declared | Public docs for SQL, clients, extensions, and engine behavior.
DuckDB internals overview | source link | whole report | declared | Public internals documentation for engine orientation.
src/parser/ | source_directory | engine | safe_to_share | Turns SQL text into parsed statements.
src/planner/ | source_directory | engine | safe_to_share | Binds and plans parsed SQL into logical operators.
src/optimizer/ | source_directory | engine | safe_to_share | Rewrites and improves logical plans before physical execution.
src/execution/ | source_directory | engine | safe_to_share | Executes physical operators and data pipelines.
src/storage/ | source_directory | engine | safe_to_share | Handles table storage, persistence, scans, and writes.
src/transaction/ | source_directory | engine | safe_to_share | Coordinates transaction lifecycle and commit boundaries.
extension/ | source_directory | extensions | safe_to_share | Hosts built-in and optional extension surfaces such as parquet, json, icu, tpch, and tpcds.
test/ | tests | quality | safe_to_share | Protects SQL behavior, storage behavior, extensions, and compatibility.
SQLRequest | query_contract | sql_text, connection_context, parameters, transaction_state | contract declared | Represents incoming SQL and execution context before parsing and binding.
LogicalPlan | planning_contract | operators, bindings, types, catalog_refs | contract declared | Represents a query after parsing and semantic binding.
PhysicalPlan | execution_contract | physical_operators, pipelines, dependencies, estimated_costs | contract declared | Represents executable operator pipelines.
DataChunk | vectorized_data_contract | vectors, column_types, cardinality | contract declared | Represents batches of data flowing through vectorized execution.
StorageTransaction | storage_contract | catalog_state, table_state, write_set, commit_status | contract declared | Represents storage and transaction state for reads and writes.
ExtensionSpec | extension_contract | extension_name, functions, load_policy, compatibility | contract declared | Describes extension-provided functionality and loading boundaries.
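
The six contracts listed in the Evidence Register can be pictured as plain records. The sketch below is illustrative only: the field names are taken from the register, while the Python types, the default values, and the choice to derive `cardinality` from the vectors are assumptions. None of these classes correspond to DuckDB's actual C++ types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SQLRequest:
    """Incoming SQL and execution context before parsing and binding."""
    sql_text: str
    connection_context: Dict[str, Any] = field(default_factory=dict)
    parameters: List[Any] = field(default_factory=list)
    transaction_state: str = "autocommit"  # assumed value, not from the report


@dataclass
class LogicalPlan:
    """A query after parsing and semantic binding."""
    operators: List[str]
    bindings: Dict[str, str]
    types: List[str]
    catalog_refs: List[str]


@dataclass
class PhysicalPlan:
    """Executable operator pipelines."""
    physical_operators: List[str]
    pipelines: List[List[str]]
    dependencies: Dict[str, List[str]]
    estimated_costs: Dict[str, float]


@dataclass
class DataChunk:
    """A batch of column vectors flowing through vectorized execution."""
    vectors: List[List[Any]]
    column_types: List[str]

    @property
    def cardinality(self) -> int:
        # Row count of the batch, read from the first column vector.
        return len(self.vectors[0]) if self.vectors else 0


@dataclass
class StorageTransaction:
    """Storage and transaction state for reads and writes."""
    catalog_state: Dict[str, Any]
    table_state: Dict[str, Any]
    write_set: Dict[str, Any] = field(default_factory=dict)
    commit_status: str = "active"  # assumed value, not from the report


@dataclass
class ExtensionSpec:
    """Extension-provided functionality and loading boundaries."""
    extension_name: str
    functions: List[str]
    load_policy: str
    compatibility: str


chunk = DataChunk(vectors=[[1, 2, 3], ["a", "b", "c"]],
                  column_types=["INTEGER", "VARCHAR"])
print(chunk.cardinality)  # 3
```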

Gap Register

Gap | Area | Status | Boundary | Next Step
Known boundary | whole report | open | This is a public educational map, not an official DuckDB maintainer audit. | Accept the boundary or add evidence that closes it.
Known boundary | whole report | open | It maps high-level engine layers and public source paths, not every operator or internal invariant. | Accept the boundary or add evidence that closes it.
Known boundary | whole report | open | A real audit should inspect a specific commit, build configuration, tests, fuzzers, benchmarks, and release notes. | Accept the boundary or add evidence that closes it.
Known boundary | whole report | open | Do not use production SQL or private data in public examples. | Accept the boundary or add evidence that closes it.
System truth boundary | SQL Front Door | review | A parsed query is not executable until binding and planning succeed. | Inspect this boundary before making stronger behavior claims.
System truth boundary | Planner And Optimizer | review | Optimization must not change query meaning. | Inspect this boundary before making stronger behavior claims.
System truth boundary | Vectorized Execution Engine | review | Execution depends on valid plan, memory, transaction, and storage state. | Inspect this boundary before making stronger behavior claims.
System truth boundary | Storage And Transaction Layer | review | Storage behavior is valid only within transaction rules. | Inspect this boundary before making stronger behavior claims.
System truth boundary | Extension System | review | Extensions expand behavior but must respect engine compatibility and load policy. | Inspect this boundary before making stronger behavior claims.
System truth boundary | Quality And Compatibility Loop | review | This report does not replace upstream CI or benchmark review. | Inspect this boundary before making stronger behavior claims.
Blueprint not declared | whole report | optional | No source-backed blueprint sections were declared. | Add blueprint sections when the report needs source-level proof.

Action Register

Action | Owner | Status | Trigger | Expected Output
Review question | maintainer / auditor | open | How does SQL move from text to parsed statement, logical plan, optimized plan, physical operators, and result chunks? | Answer from source, tests, docs, logs, or maintainer knowledge.
Review question | maintainer / auditor | open | Which gates preserve query semantics during binding and optimization? | Answer from source, tests, docs, logs, or maintainer knowledge.
Review question | maintainer / auditor | open | Where do storage and transaction boundaries constrain execution? | Answer from source, tests, docs, logs, or maintainer knowledge.
Review question | maintainer / auditor | open | How do extensions expand engine behavior without destabilizing the core? | Answer from source, tests, docs, logs, or maintainer knowledge.
Review question | maintainer / auditor | open | Which public tests, fuzzers, benchmarks, and release notes would a deeper audit inspect? | Answer from source, tests, docs, logs, or maintainer knowledge.
Resolve boundary | maintainer / auditor | open | This is a public educational map, not an official DuckDB maintainer audit. | Accept as scope or add proof that closes it.
Resolve boundary | maintainer / auditor | open | It maps high-level engine layers and public source paths, not every operator or internal invariant. | Accept as scope or add proof that closes it.
Resolve boundary | maintainer / auditor | open | A real audit should inspect a specific commit, build configuration, tests, fuzzers, benchmarks, and release notes. | Accept as scope or add proof that closes it.
Resolve boundary | maintainer / auditor | open | Do not use production SQL or private data in public examples. | Accept as scope or add proof that closes it.
Rebuild phase | maintainer / agent | repeatable | validate | Check the DuckDB public repo manifest.
Rebuild phase | maintainer / agent | repeatable | build | Generate the DuckDB system review report.

Lifecycle Map

flowchart LR
  receive_sql["Receive SQL"]
  parse_and_bind["Parse And Bind"]
  plan_and_optimize["Plan And Optimize"]
  execute_plan["Execute Plan"]
  commit_or_return["Commit Or Return Results"]
  load_extension["Load Extension"]
  receive_sql --> parse_and_bind["Parse And Bind"]
  parse_and_bind --> plan_and_optimize["Plan And Optimize"]
  plan_and_optimize --> execute_plan["Execute Plan"]
  execute_plan --> commit_or_return["Commit Or Return Results"]
  load_extension --> parse_and_bind["Parse And Bind"]
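
The Lifecycle Map can also be read as a small adjacency list. The sketch below copies the five edges from the flowchart and follows them from a starting step to the terminal step; the function name `route_from` is hypothetical and the code says nothing about how DuckDB itself schedules work.

```python
from typing import Dict, List

# Edges copied from the Lifecycle Map flowchart; each step has at most one successor.
LIFECYCLE_EDGES: Dict[str, List[str]] = {
    "receive_sql": ["parse_and_bind"],
    "parse_and_bind": ["plan_and_optimize"],
    "plan_and_optimize": ["execute_plan"],
    "execute_plan": ["commit_or_return"],
    "load_extension": ["parse_and_bind"],
    "commit_or_return": [],
}


def route_from(start: str) -> List[str]:
    """Follow the outgoing edge of each step until a step has no successor."""
    route = [start]
    while LIFECYCLE_EDGES[route[-1]]:
        route.append(LIFECYCLE_EDGES[route[-1]][0])
    return route


print(route_from("receive_sql"))
print(route_from("load_extension"))
```

Both routes end at `commit_or_return`; the extension route joins the main query route at `parse_and_bind`, which matches the map's claim that a loaded extension feeds into parsing and binding.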

Artifact And Schema Map

flowchart LR
  system_sql_front_door["SQL Front Door"]
  system_sql_front_door --> artifact_parser_source["Parser Source"]
  artifact_parser_source --> schema_SQLRequest["SQLRequest"]
  system_planner_optimizer["Planner And Optimizer"]
  system_planner_optimizer --> artifact_planner_source["Planner Source"]
  artifact_planner_source --> schema_LogicalPlan["LogicalPlan"]
  system_planner_optimizer --> artifact_optimizer_source["Optimizer Source"]
  artifact_optimizer_source --> schema_LogicalPlan["LogicalPlan"]
  system_vectorized_execution_engine["Vectorized Execution Engine"]
  system_vectorized_execution_engine --> artifact_execution_source["Execution Source"]
  artifact_execution_source --> schema_PhysicalPlan["PhysicalPlan"]
  system_storage_transaction_layer["Storage And Transaction Layer"]
  system_storage_transaction_layer --> artifact_storage_source["Storage Source"]
  artifact_storage_source --> schema_StorageTransaction["StorageTransaction"]
  system_storage_transaction_layer --> artifact_transaction_source["Transaction Source"]
  artifact_transaction_source --> schema_StorageTransaction["StorageTransaction"]
  system_extension_system["Extension System"]
  system_extension_system --> artifact_extension_tree["Extension Tree"]
  artifact_extension_tree --> schema_ExtensionSpec["ExtensionSpec"]
  system_quality_compatibility_loop["Quality And Compatibility Loop"]
  system_quality_compatibility_loop --> artifact_test_suite["Test Suite"]
  artifact_test_suite --> schema_SQLRequest["SQLRequest"]

Gate Map

flowchart LR
  gate_parse_bind_gate{"Parse And Bind Gate"}
  gate_parse_bind_gate --> out_parse_bind_gate_LogicalPlan["LogicalPlan"]
  gate_parse_bind_gate --> out_parse_bind_gate_query_error["query_error"]
  gate_optimization_gate{"Optimization Gate"}
  gate_optimization_gate --> out_optimization_gate_optimized_plan["optimized_plan"]
  gate_optimization_gate --> out_optimization_gate_fallback_plan["fallback_plan"]
  gate_execution_gate{"Execution Gate"}
  gate_execution_gate --> out_execution_gate_DataChunk["DataChunk"]
  gate_execution_gate --> out_execution_gate_execution_error["execution_error"]
  gate_storage_commit_gate{"Storage Commit Gate"}
  gate_storage_commit_gate --> out_storage_commit_gate_committed["committed"]
  gate_storage_commit_gate --> out_storage_commit_gate_rolled_back["rolled_back"]
  gate_extension_load_gate{"Extension Load Gate"}
  gate_extension_load_gate --> out_extension_load_gate_extension_loaded["extension_loaded"]
  gate_extension_load_gate --> out_extension_load_gate_extension_blocked["extension_blocked"]
  gate_parse_bind_gate{"Parse And Bind Gate"} --> step_parse_and_bind["Parse And Bind"]
  gate_optimization_gate{"Optimization Gate"} --> step_plan_and_optimize["Plan And Optimize"]
  gate_execution_gate{"Execution Gate"} --> step_execute_plan["Execute Plan"]
  gate_storage_commit_gate{"Storage Commit Gate"} --> step_commit_or_return["Commit Or Return Results"]
  gate_extension_load_gate{"Extension Load Gate"} --> step_load_extension["Load Extension"]
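
Each gate in the map has exactly two outcomes. A minimal sketch of that shape follows. The outcome names are copied from the Gate Map; reading the first outcome as the passing branch and the second as the failing branch is an assumption, and `resolve_gate` is a hypothetical helper.

```python
from typing import Dict, Tuple

# (outcome when the check passes, outcome when it fails), names from the Gate Map.
GATE_OUTCOMES: Dict[str, Tuple[str, str]] = {
    "parse_bind_gate": ("LogicalPlan", "query_error"),
    "optimization_gate": ("optimized_plan", "fallback_plan"),
    "execution_gate": ("DataChunk", "execution_error"),
    "storage_commit_gate": ("committed", "rolled_back"),
    "extension_load_gate": ("extension_loaded", "extension_blocked"),
}


def resolve_gate(gate: str, check_passed: bool) -> str:
    """Return the outcome a gate emits for a passing or failing check."""
    passed, failed = GATE_OUTCOMES[gate]
    return passed if check_passed else failed


print(resolve_gate("storage_commit_gate", check_passed=False))  # rolled_back
```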

Relationship Graph

flowchart TD
  extension_system["Extension System"] -- "adds functions to" --> planner_optimizer["Planner And Optimizer"]
  storage_transaction_layer["Storage And Transaction Layer"] -- "feeds data to" --> vectorized_execution_engine["Vectorized Execution Engine"]
  sql_front_door["SQL Front Door"] -- "owns or uses" --> parser_source["Parser Source"]
  sql_front_door["SQL Front Door"] -- "is gated by" --> parse_bind_gate["Parse And Bind Gate"]
  planner_optimizer["Planner And Optimizer"] -- "owns or uses" --> planner_source["Planner Source"]
  planner_optimizer["Planner And Optimizer"] -- "owns or uses" --> optimizer_source["Optimizer Source"]
  planner_optimizer["Planner And Optimizer"] -- "is gated by" --> optimization_gate["Optimization Gate"]
  vectorized_execution_engine["Vectorized Execution Engine"] -- "owns or uses" --> execution_source["Execution Source"]
  vectorized_execution_engine["Vectorized Execution Engine"] -- "is gated by" --> execution_gate["Execution Gate"]
  storage_transaction_layer["Storage And Transaction Layer"] -- "owns or uses" --> storage_source["Storage Source"]
  storage_transaction_layer["Storage And Transaction Layer"] -- "owns or uses" --> transaction_source["Transaction Source"]
  storage_transaction_layer["Storage And Transaction Layer"] -- "is gated by" --> storage_commit_gate["Storage Commit Gate"]
  storage_transaction_layer["Storage And Transaction Layer"] -- "is gated by" --> execution_gate["Execution Gate"]
  extension_system["Extension System"] -- "owns or uses" --> extension_tree["Extension Tree"]
  extension_system["Extension System"] -- "is gated by" --> extension_load_gate["Extension Load Gate"]
  quality_compatibility_loop["Quality And Compatibility Loop"] -- "owns or uses" --> test_suite["Test Suite"]
  quality_compatibility_loop["Quality And Compatibility Loop"] -- "is gated by" --> parse_bind_gate["Parse And Bind Gate"]
  quality_compatibility_loop["Quality And Compatibility Loop"] -- "is gated by" --> execution_gate["Execution Gate"]
  quality_compatibility_loop["Quality And Compatibility Loop"] -- "is gated by" --> storage_commit_gate["Storage Commit Gate"]
  client_request["client_request"] -- "feeds" --> receive_sql["Receive SQL"]
  receive_sql["Receive SQL"] -- "produces" --> SQLRequest["SQLRequest"]
  receive_sql["Receive SQL"] -- "routes to" --> parse_and_bind["Parse And Bind"]
  SQLRequest["SQLRequest"] -- "feeds" --> parse_and_bind["Parse And Bind"]
  parser_source["Parser Source"] -- "feeds" --> parse_and_bind["Parse And Bind"]
  parse_and_bind["Parse And Bind"] -- "produces" --> LogicalPlan["LogicalPlan"]
  parse_bind_gate["Parse And Bind Gate"] -- "gates" --> parse_and_bind["Parse And Bind"]
  parse_and_bind["Parse And Bind"] -- "routes to" --> plan_and_optimize["Plan And Optimize"]
  LogicalPlan["LogicalPlan"] -- "feeds" --> plan_and_optimize["Plan And Optimize"]
  planner_source["Planner Source"] -- "feeds" --> plan_and_optimize["Plan And Optimize"]
  optimizer_source["Optimizer Source"] -- "feeds" --> plan_and_optimize["Plan And Optimize"]
  plan_and_optimize["Plan And Optimize"] -- "produces" --> PhysicalPlan["PhysicalPlan"]
  optimization_gate["Optimization Gate"] -- "gates" --> plan_and_optimize["Plan And Optimize"]
  plan_and_optimize["Plan And Optimize"] -- "routes to" --> execute_plan["Execute Plan"]
  PhysicalPlan["PhysicalPlan"] -- "feeds" --> execute_plan["Execute Plan"]
  execution_source["Execution Source"] -- "feeds" --> execute_plan["Execute Plan"]
  StorageTransaction["StorageTransaction"] -- "feeds" --> execute_plan["Execute Plan"]
  execute_plan["Execute Plan"] -- "produces" --> DataChunk["DataChunk"]
  execution_gate["Execution Gate"] -- "gates" --> execute_plan["Execute Plan"]
  execute_plan["Execute Plan"] -- "routes to" --> commit_or_return["Commit Or Return Results"]
  DataChunk["DataChunk"] -- "feeds" --> commit_or_return["Commit Or Return Results"]
  StorageTransaction["StorageTransaction"] -- "feeds" --> commit_or_return["Commit Or Return Results"]
  commit_or_return["Commit Or Return Results"] -- "produces" --> query_result["query_result"]
  commit_or_return["Commit Or Return Results"] -- "produces" --> committed["committed"]
  storage_commit_gate["Storage Commit Gate"] -- "gates" --> commit_or_return["Commit Or Return Results"]
  ExtensionSpec["ExtensionSpec"] -- "feeds" --> load_extension["Load Extension"]
  extension_tree["Extension Tree"] -- "feeds" --> load_extension["Load Extension"]
  load_extension["Load Extension"] -- "produces" --> extension_loaded["extension_loaded"]
  extension_load_gate["Extension Load Gate"] -- "gates" --> load_extension["Load Extension"]
  load_extension["Load Extension"] -- "routes to" --> parse_and_bind["Parse And Bind"]
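
One way to audit connectivity in this graph is to check that every workflow step has a gate. The sketch below uses only a subset of the 49 edges (the "routes to" and "gates" relationships, copied from the graph above); the `ungated_steps` helper is hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, label, target)

# Subset of the Relationship Graph: only "routes to" and "gates" edges.
EDGES: List[Edge] = [
    ("receive_sql", "routes to", "parse_and_bind"),
    ("parse_and_bind", "routes to", "plan_and_optimize"),
    ("plan_and_optimize", "routes to", "execute_plan"),
    ("execute_plan", "routes to", "commit_or_return"),
    ("load_extension", "routes to", "parse_and_bind"),
    ("parse_bind_gate", "gates", "parse_and_bind"),
    ("optimization_gate", "gates", "plan_and_optimize"),
    ("execution_gate", "gates", "execute_plan"),
    ("storage_commit_gate", "gates", "commit_or_return"),
    ("extension_load_gate", "gates", "load_extension"),
]


def ungated_steps(edges: List[Edge]) -> Set[str]:
    """Return workflow steps that no 'gates' edge points at."""
    steps: Set[str] = set()
    gated: Set[str] = set()
    for source, label, target in edges:
        if label == "routes to":
            steps.update((source, target))
        elif label == "gates":
            gated.add(target)
    return steps - gated


print(ungated_steps(EDGES))  # {'receive_sql'}
```

The result is consistent with the report's own counts of six workflows and five gates: `receive_sql` is the one step without a gate.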

Systems

SQL Front Door

Accepts SQL from clients and turns it into parsed and bound statements.

Architecture: embedded database front end
Lifecycle: SQLRequest -> parser -> binder -> LogicalPlan
Boundary: A parsed query is not executable until binding and planning succeed.
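
The boundary between parsing and binding can be shown with a toy front end. This is a sketch under strong assumptions: the grammar accepts only `SELECT <column> FROM <table>`, the catalog is a hypothetical dictionary, and `parse`, `bind`, and `QueryError` are illustrative names, not DuckDB APIs.

```python
import re
from typing import Dict

# Hypothetical catalog: table name -> {column name -> type}.
CATALOG: Dict[str, Dict[str, str]] = {"orders": {"id": "INTEGER", "amount": "DOUBLE"}}


class QueryError(Exception):
    """Raised when parsing or binding rejects a query."""


def parse(sql_text: str) -> Dict[str, str]:
    """Syntax check only: no catalog lookup happens here."""
    match = re.fullmatch(r"\s*SELECT\s+(\w+)\s+FROM\s+(\w+)\s*;?\s*",
                         sql_text, re.IGNORECASE)
    if match is None:
        raise QueryError("syntax error")
    return {"column": match.group(1), "table": match.group(2)}


def bind(statement: Dict[str, str], catalog: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Resolve names against the catalog and attach the column type."""
    table = catalog.get(statement["table"])
    if table is None:
        raise QueryError("unknown table: " + statement["table"])
    if statement["column"] not in table:
        raise QueryError("unknown column: " + statement["column"])
    return {**statement, "type": table[statement["column"]]}


print(bind(parse("SELECT amount FROM orders"), CATALOG))

parsed = parse("SELECT discount FROM orders")  # parses without error
try:
    bind(parsed, CATALOG)
except QueryError as error:
    print(error)  # unknown column: discount
```

The second query is syntactically valid, so parsing succeeds, but it only becomes a usable plan input if the binder can resolve every name.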

Planner And Optimizer

Builds logical plans and rewrites them into better executable forms.

Architecture: query planner and optimizer
Lifecycle: LogicalPlan -> optimized plan -> PhysicalPlan
Boundary: Optimization must not change query meaning.
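
A small example of a meaning-preserving rewrite is constant folding. The sketch below is not DuckDB's optimizer; it assumes a toy expression tree made of nested tuples `(op, left, right)`, column names as strings, and numeric literals, and it checks the boundary by evaluating the plan before and after the rewrite.

```python
from typing import Any, Dict


def evaluate(expr: Any, row: Dict[str, float]) -> float:
    """Evaluate a toy expression tree against one row."""
    if isinstance(expr, tuple):
        op, left, right = expr
        lhs, rhs = evaluate(left, row), evaluate(right, row)
        return lhs + rhs if op == "+" else lhs * rhs
    if isinstance(expr, str):
        return row[expr]  # column reference
    return expr  # numeric literal


def fold_constants(expr: Any) -> Any:
    """Replace sub-expressions whose operands are both literals with their value."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return evaluate((op, left, right), {})
    return (op, left, right)


original = ("+", "x", ("*", 2, 3))
optimized = fold_constants(original)
print(optimized)  # ('+', 'x', 6)

# The rewrite is acceptable only if both plans agree on every input.
for x in (0, 4, -1.5):
    assert evaluate(original, {"x": x}) == evaluate(optimized, {"x": x})
```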

Vectorized Execution Engine

Executes physical operator pipelines over data chunks.

Architecture: vectorized analytical execution
Lifecycle: PhysicalPlan -> operator pipeline -> DataChunk
Boundary: Execution depends on valid plan, memory, transaction, and storage state.
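
The idea of operators passing batches rather than single rows can be sketched with two generators. This is a simplified illustration, not DuckDB's executor: a chunk is modeled as a dictionary of column lists, the chunk size of 2 is chosen only to keep the example small, and `scan` and `filter_chunks` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator, List

Chunk = Dict[str, List[Any]]  # column name -> column vector


def scan(columns: Chunk, chunk_size: int) -> Iterator[Chunk]:
    """Yield the table as consecutive chunks of at most chunk_size rows."""
    row_count = len(next(iter(columns.values())))
    for start in range(0, row_count, chunk_size):
        yield {name: values[start:start + chunk_size]
               for name, values in columns.items()}


def filter_chunks(chunks: Iterator[Chunk], column: str,
                  predicate: Callable[[Any], bool]) -> Iterator[Chunk]:
    """Evaluate the predicate over a whole column vector, then keep selected rows."""
    for chunk in chunks:
        selection = [i for i, value in enumerate(chunk[column]) if predicate(value)]
        if selection:
            yield {name: [values[i] for i in selection]
                   for name, values in chunk.items()}


table = {"id": [1, 2, 3, 4, 5], "amount": [10, 25, 5, 40, 15]}
for result in filter_chunks(scan(table, 2), "amount", lambda v: v >= 15):
    print(result)
```

Each operator consumes and produces chunks, so the per-row overhead of moving data between operators is paid once per batch instead of once per row.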

Storage And Transaction Layer

Persists data, scans tables, and coordinates transaction boundaries.

Architecture: embedded storage manager
Lifecycle: scan/write request -> storage state -> commit or rollback
Boundary: Storage behavior is valid only within transaction rules.
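
The commit-or-rollback lifecycle can be illustrated with a buffered write set. The sketch below is a deliberately minimal model with a single transaction over a dictionary; it does not represent DuckDB's actual concurrency control or on-disk format, and the `Transaction` class and its status strings are assumptions.

```python
from typing import Any, Dict


class Transaction:
    """Buffers writes in a write set until commit or rollback."""

    def __init__(self, table_state: Dict[str, Any]) -> None:
        self._table_state = table_state
        self.write_set: Dict[str, Any] = {}
        self.commit_status = "active"

    def write(self, key: str, value: Any) -> None:
        if self.commit_status != "active":
            raise RuntimeError("transaction is " + self.commit_status)
        self.write_set[key] = value

    def read(self, key: str) -> Any:
        # A transaction sees its own uncommitted writes first.
        if key in self.write_set:
            return self.write_set[key]
        return self._table_state.get(key)

    def commit(self) -> None:
        self._table_state.update(self.write_set)
        self.commit_status = "committed"

    def rollback(self) -> None:
        self.write_set.clear()
        self.commit_status = "rolled_back"


state = {"balance": 100}

txn = Transaction(state)
txn.write("balance", 50)
txn.rollback()
print(state)  # {'balance': 100}

txn = Transaction(state)
txn.write("balance", 75)
txn.commit()
print(state)  # {'balance': 75}
```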

Extension System

Adds optional capabilities and file/function integrations through extension modules.

Architecture: extension framework
Lifecycle: extension spec -> load gate -> functions/operators
Boundary: Extensions expand behavior but must respect engine compatibility and load policy.
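
The load gate can be sketched as a check on the spec before any function is registered. Everything here is hypothetical: the version string, the `"signed"` and `"unsigned"` policy values, the `allow_unsigned` flag, and the registry dictionary are assumptions made for illustration and do not describe DuckDB's real extension loader.

```python
from typing import Dict, List, Union

ENGINE_VERSION = "v1"  # hypothetical engine compatibility tag

Spec = Dict[str, Union[str, List[str]]]


def load_extension(spec: Spec, registry: Dict[str, str],
                   allow_unsigned: bool = False) -> str:
    """Register the extension's functions only if the gate checks pass."""
    if spec["compatibility"] != ENGINE_VERSION:
        return "extension_blocked"
    if spec["load_policy"] == "unsigned" and not allow_unsigned:
        return "extension_blocked"
    for function_name in spec["functions"]:
        registry[function_name] = spec["extension_name"]
    return "extension_loaded"


registry: Dict[str, str] = {}
good = {"extension_name": "parquet", "functions": ["read_parquet"],
        "load_policy": "signed", "compatibility": "v1"}
stale = {"extension_name": "old_ext", "functions": ["old_fn"],
         "load_policy": "signed", "compatibility": "v0"}

print(load_extension(good, registry))   # extension_loaded
print(load_extension(stale, registry))  # extension_blocked
print(registry)                         # {'read_parquet': 'parquet'}
```

A blocked extension leaves the registry untouched, which is the property the boundary describes: new behavior reaches the engine only after the compatibility and policy checks succeed.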

Quality And Compatibility Loop

Exercises SQL, storage, extension, and compatibility scenarios.

Architecture: test matrix
Lifecycle: source change -> test cases -> release confidence
Boundary: This report does not replace upstream CI or benchmark review.

Review Questions