Metadata-Version: 2.4
Name: semql-introspect
Version: 0.3.0
Summary: Bootstrap a semql Catalog from a live database — emits Python cube stubs from Information Schema with heuristic measure/dimension inference.
Author: Nikhil Pallamreddy
Author-email: Nikhil Pallamreddy <nikhil.pallamreddy+git@gmail.com>
License-Expression: BSD-3-Clause
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Dist: semql>=0.3.0,<0.4
Requires-Python: >=3.12
Project-URL: Homepage, https://github.com/npalladium/semql
Project-URL: Repository, https://github.com/npalladium/semql
Project-URL: Issues, https://github.com/npalladium/semql/issues
Description-Content-Type: text/markdown

# semql-introspect

Bootstrap a [semql](../semql) `Catalog` from a live database.

It reads the database's Information Schema and emits Python `Cube` stubs
with heuristic measure / dimension / time-dimension inference and
foreign-key derived joins. It is designed for greenfield adoption: a team
with 200 tables can generate the mechanical 80% of a catalog in seconds,
then hand-edit the heuristic guesses.

Use this for *cold-start* scaffolding. For ongoing drift detection
on a catalog you already have, see
[`semql-validate-db`](../semql-validate-db), which probes a live
database against an authored catalog and surfaces missing tables,
columns, and join predicates that the compiler can't see at build time.

## Install

```sh
pip install semql-introspect
```

## Usage

```python
import duckdb
from semql.model import Dialect
from semql_introspect import introspect_to_python

con = duckdb.connect("warehouse.db")
print(introspect_to_python(con, dialect=Dialect.DUCKDB, schema="main"))
```

Or via the CLI:

```sh
semql-introspect --backend duckdb --schema main --conn "warehouse.db"
```

## Heuristics

- Numeric columns named `amount` / `price` / `revenue` / `cost` / `total`
  / `value` / `qty` / `quantity` / `count` → `Measure(agg="sum")`.
- Columns ending in `_id` → `Measure(agg="count_distinct")` (a distinct
  count of an identifier is usually a useful measure).
- `date` / `timestamp` columns → `TimeDimension`.
- Foreign keys → `Join(relationship="many_to_one")` plus the foreign-side
  `Dimension(foreign_key=...)`.
- Everything else → `Dimension` typed by the column's SQL type.

Every heuristic guess is emitted with a `# TODO: review` comment, so the
inference choices are easy to find and review in a diff.
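
The rules above can be sketched as a small classifier. This is an
illustration only, not the package's implementation: `classify_column`,
the type sets, and the returned labels are hypothetical names, and the
precedence between rules (foreign key first, then time type, then name
match) and the exact-name matching are assumptions the list above does
not specify.

```python
import re

# Hypothetical constants for illustration; not part of semql-introspect.
SUM_NAMES = {
    "amount", "price", "revenue", "cost", "total",
    "value", "qty", "quantity", "count",
}
NUMERIC_TYPES = {
    "INTEGER", "BIGINT", "SMALLINT", "DECIMAL",
    "NUMERIC", "DOUBLE", "FLOAT", "REAL",
}
TIME_TYPES = {"DATE", "TIMESTAMP"}


def classify_column(name: str, sql_type: str, is_foreign_key: bool = False) -> str:
    """Return a label describing how a column might be modelled.

    The rule order here is an assumption made for this sketch.
    """
    # Reduce "DECIMAL(10,2)" or "TIMESTAMP WITH TIME ZONE" to the base type.
    base_type = re.split(r"[\s(]", sql_type.strip().upper(), maxsplit=1)[0]
    lowered = name.lower()

    if is_foreign_key:
        return "join:many_to_one"
    if base_type in TIME_TYPES:
        return "time_dimension"
    if lowered in SUM_NAMES and base_type in NUMERIC_TYPES:
        return "measure:sum"
    if lowered.endswith("_id"):
        return "measure:count_distinct"
    # Fallback: a plain dimension typed by the column's SQL type.
    return "dimension"


if __name__ == "__main__":
    for col, typ in [("amount", "DECIMAL(10,2)"), ("order_id", "BIGINT"),
                     ("created_at", "TIMESTAMP"), ("status", "VARCHAR")]:
        print(f"{col:<12} {typ:<14} -> {classify_column(col, typ)}")
```

Walking through a table with a sketch like this before running the real
tool can help predict which generated fields will need hand-editing, for
example a non-numeric `amount` column, which falls through to a plain
dimension here.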
