Metadata-Version: 2.4
Name: sql-rewriter
Version: 0.1.2
Summary: ANTLR4-based SQL rewriting tool for permission management in LLM-generated SQL
Author-email: wangyang <wangyang377@ustc.edu>
Maintainer-email: wangyang <wangyang377@ustc.edu>
License: MIT
Project-URL: Homepage, https://github.com/wangyang377/sql-rewriter
Project-URL: Documentation, https://github.com/wangyang377/sql-rewriter#readme
Project-URL: Repository, https://github.com/wangyang377/sql-rewriter
Project-URL: Issues, https://github.com/wangyang377/sql-rewriter/issues
Keywords: sql,rewriter,llm,antlr4,sql-parser,sql-rewrite,hive
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Database
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: antlr4-python3-runtime<5.0.0,>=4.13.1
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: license-file

# SQL Rewriter - ANTLR4-based SQL Rewriting Tool

A Python package that rewrites SQL by parsing it with an ANTLR4 grammar. It currently provides one function, `add_where_condition`, designed for permission management in LLM-generated SQL. Because it parses the syntax tree instead of matching with regular expressions, it copes better with the varied SQL formatting that large language models produce.

## How It Works

The package uses ANTLR4 to parse a SQL statement into a syntax tree, then traverses the tree with the Visitor pattern to locate queries on the target table and add or merge WHERE conditions. If the original SQL already has a WHERE clause, the existing condition is wrapped in parentheses before the new permission condition is appended with AND. Because AND binds more tightly than OR, the parentheses keep an `OR` in the original condition from bypassing the permission condition, which guards against SQL injection through LLM output.
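
To see why the parentheses matter, here is a minimal sketch that uses only Python's standard `sqlite3` module and does not call this package. The table contents and the `names` helper are made up for illustration; the example only shows how SQL operator precedence lets an `OR` escape a permission condition that is appended without parentheses.

```python
import sqlite3

# Illustrative in-memory data (hypothetical, not part of sql-rewriter).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("alice", 30, "active"), ("bob", 40, "inactive"), ("carol", 10, "active")],
)


def names(where):
    """Hypothetical helper: return the user names matching a WHERE expression."""
    rows = conn.execute(f"SELECT name FROM users WHERE {where} ORDER BY name")
    return [row[0] for row in rows]


original = "age > 18 OR 1 = 1"    # condition as an LLM might generate it
permission = "status = 'active'"  # condition the application must enforce

# Without parentheses, AND binds tighter than OR:
# age > 18 OR (1 = 1 AND status = 'active'), so the inactive user leaks.
unwrapped = names(f"{original} AND {permission}")

# With parentheses, the permission condition applies to every row.
wrapped = names(f"({original}) AND {permission}")

print(unwrapped)  # ['alice', 'bob', 'carol']
print(wrapped)    # ['alice', 'carol']
```

The unwrapped form returns the inactive user `bob`, while the wrapped form restricts the result to active users.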

## Installation

```bash
pip install sql-rewriter
```

Or install from source (if you want to modify the code):

```bash
git clone https://github.com/wangyang377/sql-rewriter.git
cd sql-rewriter
./scripts/generate_parser.sh  # Requires ANTLR4, see Development section below
pip install -e .
```

## Usage

### Basic Usage

```python
from sql_rewriter import add_where_condition

# Add condition when there's no WHERE clause
sql = "SELECT * FROM users;"
new_sql = add_where_condition(sql, "age > 18", "users")
# Result: SELECT * FROM users WHERE age > 18;

# Append condition when WHERE clause already exists (wraps existing condition in parentheses)
sql = "SELECT * FROM users WHERE age > 18;"
new_sql = add_where_condition(sql, "status = 'active'", "users")
# Result: SELECT * FROM users WHERE (age > 18) AND status = 'active';

# JOIN queries - add condition only for specific table
sql = "SELECT * FROM users JOIN orders ON users.id = orders.user_id;"
new_sql = add_where_condition(sql, "users.status = 'active'", "users")
# Result: SELECT * FROM users JOIN orders ON users.id = orders.user_id WHERE users.status = 'active';

# Nested queries work too - precisely locates target table
sql = "SELECT * FROM orders WHERE status = 'pending' AND EXISTS (SELECT 1 FROM users WHERE users.id = orders.user_id);"
new_sql = add_where_condition(sql, "users.status = 'active'", "users")
# Result: SELECT * FROM orders WHERE status = 'pending' AND EXISTS (SELECT 1 FROM users WHERE (users.id = orders.user_id) AND users.status = 'active');
```

### API Reference

```python
add_where_condition(sql_text, new_condition, table_name=None)
```

**Parameters:**
- `sql_text`: Original SQL statement
- `new_condition`: WHERE condition to add (without the WHERE keyword)
- `table_name`: Target table name. The condition is added only to queries whose FROM clause contains this table. If `None`, no processing is performed

**Returns:**
- Modified SQL statement (string)

**Raises:**
- `ValueError`: If SQL parsing fails
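
Since a parse failure raises `ValueError`, a caller enforcing permissions may prefer to reject the query outright instead of running the unmodified SQL. The sketch below shows one way to do that. The `rewrite_or_reject` helper is hypothetical and not part of this package; it takes the rewriting function as an argument so that it depends only on the signature documented above.

```python
from typing import Callable, Optional


def rewrite_or_reject(
    rewrite: Callable[[str, str, Optional[str]], str],
    sql_text: str,
    new_condition: str,
    table_name: str,
) -> Optional[str]:
    """Hypothetical helper: return the rewritten SQL, or None if parsing fails.

    `rewrite` is expected to behave like `add_where_condition`: it returns
    the modified SQL string and raises ValueError when the SQL cannot be parsed.
    """
    try:
        return rewrite(sql_text, new_condition, table_name)
    except ValueError:
        # Fail closed: never fall back to the original, unrestricted SQL.
        return None
```

In an application this would be called as `rewrite_or_reject(add_where_condition, sql, "users.status = 'active'", "users")`, and a `None` result would be treated as a rejected query.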

## Development

If you clone the project from Git, you need to generate the ANTLR parser code first:

```bash
# Install ANTLR4 (macOS)
brew install antlr

# Linux (Ubuntu/Debian)
sudo apt-get install antlr4

# Then generate code
./scripts/generate_parser.sh
```

Run tests:

```bash
cd tests
python test_parser.py
```
