Metadata-Version: 2.4
Name: sqlcodec
Version: 0.1.3
Summary: A Python tool for compressing/decompressing SQL DDL fewer tokens for LLMs.
Author-email: "TroBee.ONE" <noreply@github.com>
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# sqlcodec

`sqlcodec` is a Python utility designed to compress SQL DDL into a tokenized format optimized for Large Language Models (LLMs). It reduces the token count of schema definitions, allowing more context to be fit into the LLM's window while maintaining semantic clarity.

## Key Features
- **Regex Mapping**: Replaces common SQL keywords with short tokens (e.g., `CREATE TABLE` -> `~cr ~t`).
- **Dialect Support**: Specific mappings for SQL Server and Postgres.
- **Comment Wrapping**: Preserves comments in a minified-safe format (`~cml ... ~endcml`).
- **Whitespace Minification**: Collapses unnecessary spaces and newlines while preserving statement separators.
- **LLM Integration**: Includes a helper to generate system prompts for LLMs to interpret the compressed SQL.

## Benefits
- **Token Efficiency**: Can reduce DDL size by 40-60%.
- **Context Preservation**: Useful for RAG systems or LLM agents that need to "see" a large database schema.


# sqlcodec Usage Instructions

## 1. Compressing and Decompressing

### A. Compress a String Statement
```python
from sqlcodec import compress

sql_string = "CREATE TABLE Users (ID INT PRIMARY KEY);"
compressed = compress(sql_string)
print(compressed)
```

### B. Compress a SQL File
```python
from sqlcodec import compress

with open("input.sql", "r") as f:
    sql_data = f.read()

compressed = compress(sql_data)

with open("compressed.txt", "w") as f:
    f.write(compressed)
```

### C. Decompress a String Statement
```python
from sqlcodec import decompress

compressed_str = "~cr ~t Users (ID INT ~pk);"
original = decompress(compressed_str, dialect="sqlserver")
print(original)
```

### D. Decompress a Compressed File
```python
from sqlcodec import decompress

with open("compressed.txt", "r") as f:
    compressed_data = f.read()

# Tip: detect_dialect can help if you're unsure
original = decompress(compressed_data, dialect="sqlserver")

with open("reconstructed.sql", "w") as f:
    f.write(original)
```

### E. Get a System Prompt for an LLM
This generates the instructions the LLM needs to understand your compressed SQL.

```python
from sqlcodec import get_system_prompt

# 1. For SQL Server
ss_prompt = get_system_prompt(dialect="sqlserver")
print(ss_prompt)

# 2. For Postgres
pg_prompt = get_system_prompt(dialect="postgres")
print(pg_prompt)

# 3. For Standard SQL (Generic)
std_prompt = get_system_prompt()
print(std_prompt)
```
