Metadata-Version: 2.4
Name: sql-code-graph
Version: 0.3.0
Summary: SQL code graph analyzer and lineage tracer
Project-URL: Homepage, https://github.com/Warhorze/sql-code-graph
Project-URL: Repository, https://github.com/Warhorze/sql-code-graph
Project-URL: Issues, https://github.com/Warhorze/sql-code-graph/issues
Project-URL: Changelog, https://github.com/Warhorze/sql-code-graph/blob/master/CHANGELOG.md
Author-email: wesley <rademakerwesley@gmail.com>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: kuzu==0.11.3
Requires-Dist: mcp<2.0,>=1.27.0
Requires-Dist: pathspec>=0.12.1
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: rich>=13.7.0
Requires-Dist: sqlglot==30.6.0
Requires-Dist: sqlglotc==30.6.0
Requires-Dist: typer>=0.9.0
Requires-Dist: watchdog>=3.0.0
Provides-Extra: dbt
Requires-Dist: dbt-core>=1.7; extra == 'dbt'
Provides-Extra: neo4j
Requires-Dist: neo4j>=5.15.0; extra == 'neo4j'
Provides-Extra: snowflake
Requires-Dist: acryl-datahub<0.15.0,>=0.14.0; extra == 'snowflake'
Description-Content-Type: text/markdown

# sql-code-graph

SQL lineage and dependency analysis as an MCP server for Claude Code.

Indexes a directory of `.sql` files into a graph database and exposes lineage
queries as MCP tools — so Claude can answer questions like *"what tables does
this view depend on?"* or *"where is `orders.customer_id` derived from?"*
without reading every file.

## Quick start

Choose one:

**Permanent install** (recommended):
```bash
uv tool install sql-code-graph    # Fast, managed, no isolation needed
sqlcg install                     # Register MCP server in Claude Code
```

**One-shot try** (cold cache warning):
```bash
uvx sql-code-graph                # First run is slow (downloads deps)
                                  # Subsequent runs use cache, ~1s startup
```

Restart Claude Code, then inside your project ask:

```
Index my SQL files at ./sql --dialect snowflake
```

That's it. The MCP tools are now available to Claude in every conversation
for that project.

### Workflow (3 steps)

1. **Initialize**: `sqlcg db init`
2. **Index**: `sqlcg index ./sql --dialect snowflake`
3. **Keep fresh**: `sqlcg git install-hooks` (optional)

## Full setup (recommended)

```bash
# 1. Install
pip install sql-code-graph

# 2. Register with Claude Code (~/.claude/settings.json)
sqlcg install

# 3. Restart Claude Code

# 4. Index your SQL repo
# Only git-tracked files are indexed — build artefacts, node_modules,
# and .venv are ignored automatically.
sqlcg db init
sqlcg index ./sql --dialect snowflake   # or: bigquery, postgres, ansi

# 5. (Optional) Keep the graph fresh on branch switches
cd /your/sql/repo
sqlcg git install-hooks
```

Step 5 installs a `post-checkout` git hook that re-indexes automatically
whenever you switch branches. Without it the graph may be stale after a
`git checkout` until you re-run `sqlcg index` manually.

## Dialect config

To avoid passing `--dialect` every time, create `.sqlcg.toml` in your repo root:

```toml
[sqlcg]
dialect = "snowflake"   # snowflake | bigquery | postgres | ansi
```

The git hook and `sqlcg index --dialect auto` both read this file.

## Add to your project CLAUDE.md (recommended)

Adding a short note to your project's `CLAUDE.md` helps Claude know the tools
are available and when to use them:

```markdown
## SQL lineage
This project uses sql-code-graph. MCP tools are available:
- `db_info` — check graph health and parse quality before running lineage queries
- `index_repo` — index or re-index a directory of SQL files
- `find_table_usages` — find all queries that read a table
- `trace_column_lineage` — trace where a column's value comes from
- `get_upstream_dependencies` / `get_downstream_dependencies` — dependency chains
- `search_sql_pattern` — full-text search across all indexed SQL
- `execute_cypher` — raw graph query for advanced analysis
```

The MCP server works without this — Claude can discover the tools on its own —
but the CLAUDE.md snippet ensures they get used proactively.

## Parse quality

After indexing, `sqlcg gain` shows a **parse quality breakdown** that tells you how
much column-level lineage was extracted:

| Quality | Meaning | Tools affected |
|---|---|---|
| `FULL` | Column-level lineage extracted | All tools work |
| `TABLE_ONLY` | Table edges only — no column lineage | `trace_column_lineage`, `get_*_dependencies` return empty |
| `SCRIPTING_FALLBACK` | sqlglot fell back to raw command node | Partial table edges; column lineage unavailable |
| `FAILED` | File failed to parse entirely | File invisible to all queries |

Quality is shown per-file after `sqlcg index` and in `sqlcg gain` Section F.
`list_dialects_and_repos()` warns when scripting fallback exceeds 20% of queries.

**What causes TABLE_ONLY?** Mostly `SELECT *` — sqlglot can't trace column names through
a wildcard. Alias those selects to get FULL coverage.

**What causes SCRIPTING_FALLBACK?** Snowflake `$$` procedure bodies or `BEGIN…END` scripting
blocks. sqlglot parses the block as a raw `Command` node and extracts DML via tokenizer
fallback. Table edges are usually correct; column edges are not.

Check `sqlcg db info` for the parsing mode distribution across all indexed queries.

## MCP tools reference

| Tool | Description |
|------|-------------|
| `index_repo(repo_path, dialect)` | Index a directory of SQL files |
| `trace_column_lineage(table_col)` | Trace column lineage upstream |
| `find_table_usages(table_name)` | Find all queries that read a table |
| `get_upstream_dependencies(table_col)` | Full upstream dependency chain |
| `get_downstream_dependencies(table_col)` | Full downstream dependency chain |
| `search_sql_pattern(query)` | Full-text search across indexed SQL |
| `list_dialects_and_repos()` | List indexed repos and dialects (catalogue) |
| `db_info()` | Graph health, node counts, parse quality breakdown, warnings |
| `execute_cypher(query)` | Raw Cypher query against the graph |

> **LLM agent tip**: call `db_info()` before lineage queries to check that
> `SqlColumn > 0` and `warnings` is empty. If `parse_quality["scripting_block"]`
> is high, column lineage will be limited for those files — use table-level tools
> (`find_table_usages`, `get_*_dependencies`) instead.

## CLI reference

Full option reference: [docs/cli.md](docs/cli.md)

```bash
sqlcg install                          # register MCP server in Claude Code
sqlcg db init                          # initialise graph database
sqlcg index <path> --dialect <d>       # index SQL files
sqlcg index <path> --dialect auto      # read dialect from .sqlcg.toml
sqlcg watch <path>                     # watch for file changes
sqlcg git install-hooks                # install post-checkout hook
sqlcg gain                             # show usage metrics
sqlcg report                           # generate FP/error report
sqlcg mcp start                        # start MCP server manually
sqlcg version                          # show installed version
```

## Supported dialects

`snowflake` · `bigquery` · `postgres` · `ansi` · `tsql` · `dbt` (via optional extra)

## Development

```bash
git clone https://github.com/Warhorze/sql-code-graph
cd sql-code-graph
uv sync --all-extras
uv run pytest tests/unit
```

## Issues

Bug reports and feature requests: [github.com/Warhorze/sql-code-graph/issues](https://github.com/Warhorze/sql-code-graph/issues)

Questions and discussion: [github.com/Warhorze/sql-code-graph/discussions](https://github.com/Warhorze/sql-code-graph/discussions)

## License

MIT
