Metadata-Version: 2.4
Name: sql-redis
Version: 0.1.1
Summary: SQL to Redis command translation utility
Project-URL: Homepage, https://github.com/redis/sql-redis
Project-URL: Repository, https://github.com/redis/sql-redis
Author-email: "Redis Inc." <applied.ai@redis.com>
License-Expression: MIT
Keywords: query-translation,redis,redis-client,sql
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.9
Requires-Dist: redis>=5.0.0
Requires-Dist: sqlglot>=26.0.0
Description-Content-Type: text/markdown

# sql-redis

A proof-of-concept SQL-to-Redis translator that converts SQL SELECT statements into Redis `FT.SEARCH` and `FT.AGGREGATE` commands.

## Status

This is an **early POC** demonstrating feasibility, not a production-ready library. The goal is to explore design decisions and validate the approach before committing to a full implementation.

## Quick Example

```python
from redis import Redis
from sql_redis import Translator
from sql_redis.schema import SchemaRegistry
from sql_redis.executor import Executor

client = Redis()
registry = SchemaRegistry(client)
registry.load_all()  # Loads index schemas from Redis

executor = Executor(client, registry)

# Simple query
result = executor.execute("""
    SELECT title, price 
    FROM products 
    WHERE category = 'electronics' AND price < 500
    ORDER BY price ASC
    LIMIT 10
""")

for row in result.rows:
    print(row["title"], row["price"])

# Vector search with params
result = executor.execute("""
    SELECT title, vector_distance(embedding, :vec) AS score
    FROM products
    LIMIT 5
""", params={"vec": vector_bytes})
```

## Design Decisions

### Why SQL instead of a pandas-like Python DSL?

We considered several interface options:

| Approach | Example | Trade-offs |
|----------|---------|------------|
| **SQL** | `SELECT * FROM products WHERE price > 100` | Universal, well-understood, tooling exists |
| **Pandas-like** | `df[df.price > 100]` | Pythonic but limited to Python, no standard |
| **Builder pattern** | `query.select("*").where(price__gt=100)` | Type-safe but verbose, learning curve |

**We chose SQL because:**

1. **Universality** — SQL is the lingua franca of data. Developers, analysts, and tools all speak it.
2. **No new DSL to learn** — Users already know SQL. A pandas-like API requires learning our specific dialect.
3. **Tooling compatibility** — SQL strings can be generated by ORMs, query builders, or AI assistants.
4. **Clear mapping** — SQL semantics map reasonably well to RediSearch operations (SELECT→LOAD, WHERE→filter, GROUP BY→GROUPBY).

The downside is losing Python's type checking and IDE support, but for a query interface, the universality trade-off is worth it.

### Why sqlglot instead of writing a custom parser?

**Options considered:**
- **Custom parser** (regex, hand-rolled recursive descent)
- **PLY/Lark** (parser generators)
- **sqlglot** (production SQL parser)
- **sqlparse** (tokenizer, not a full parser)

**We chose sqlglot because:**

1. **Battle-tested** — Used in production by companies like Tobiko (SQLMesh). Handles edge cases we'd miss.
2. **Full AST** — Provides a complete abstract syntax tree, not just tokens. We can traverse and analyze queries properly.
3. **Dialect support** — Handles SQL variations. Users can write MySQL-style or PostgreSQL-style queries.
4. **Active maintenance** — Regular releases, responsive maintainers, good documentation.

The alternative was writing a custom parser, which would be error-prone and time-consuming for a POC. sqlglot lets us focus on the translation logic rather than parsing edge cases.

### Why schema-aware translation?

Redis field types determine query syntax:

| Field Type | Redis Syntax | Example |
|------------|--------------|---------|
| TEXT | `@field:term` | `@title:laptop` |
| NUMERIC | `@field:[min max]` | `@price:[100 500]` |
| TAG | `@field:{value}` | `@category:{books}` |

**Without schema knowledge**, we can't translate `category = 'books'` correctly — it could be `@category:books` (TEXT search) or `@category:{books}` (TAG exact match).

**Our approach:** The `SchemaRegistry` fetches index schemas via `FT.INFO` at startup. The translator uses this to generate correct syntax per field type.

This adds a Redis round-trip at initialization but ensures correct query generation.

### Architecture: Why this layered design?

```
SQL String
    ↓
┌─────────────────┐
│   SQLParser     │  Parse SQL → ParsedQuery dataclass
└────────┬────────┘
         ↓
┌─────────────────┐
│ SchemaRegistry  │  Load field types from Redis
└────────┬────────┘
         ↓
┌─────────────────┐
│    Analyzer     │  Classify conditions by field type
└────────┬────────┘
         ↓
┌─────────────────┐
│  QueryBuilder   │  Generate RediSearch syntax per type
└────────┬────────┘
         ↓
┌─────────────────┐
│   Translator    │  Orchestrate pipeline, build command
└────────┬────────┘
         ↓
┌─────────────────┐
│    Executor     │  Execute command, parse results
└────────┬────────┘
         ↓
QueryResult(rows, count)
```

**Why separate components?**

1. **Testability** — Each layer has focused unit tests. 100% coverage is achievable because responsibilities are clear.
2. **Single responsibility** — Parser doesn't know about Redis. QueryBuilder doesn't know about SQL. Changes are localized.
3. **Extensibility** — Adding a new field type (e.g., GEO) means updating Analyzer and QueryBuilder, not rewriting everything.

**Why not a single monolithic translator?**

Early prototypes combined parsing and translation. This led to:
- Tests that required Redis connections for simple SQL parsing tests
- Difficulty testing edge cases in isolation
- Tangled code that was hard to modify

The layered approach emerged from TDD — writing tests first revealed natural boundaries.

## What's Implemented

- [x] Basic SELECT with field selection
- [x] WHERE with TEXT, NUMERIC, TAG field types
- [x] Comparison operators: `=`, `!=`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IN`
- [x] Boolean operators: `AND`, `OR`
- [x] Aggregations: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`
- [x] `GROUP BY` with multiple aggregations
- [x] `ORDER BY` with ASC/DESC
- [x] `LIMIT` and `OFFSET` pagination
- [x] Computed fields: `price * 0.9 AS discounted`
- [x] Vector KNN search: `vector_distance(field, :param)`
- [x] Hybrid search (filters + vector)
- [x] Full-text search: `LIKE 'prefix%'` (prefix), `fulltext(field, 'terms')` function

## What's Not Implemented (Yet...)

- [ ] JOINs (Redis doesn't support cross-index joins)
- [ ] Subqueries
- [ ] HAVING clause
- [ ] DISTINCT
- [ ] GEO field queries
- [ ] Index creation from SQL (CREATE INDEX)

## Development

```bash
# Install dependencies
uv sync --all-extras

# Run tests (requires Docker for testcontainers)
uv run pytest

# Run with coverage
uv run pytest --cov=sql_redis --cov-report=html
```

## Testing Philosophy

This project uses strict TDD with 100% test coverage as a hard requirement. The approach:

1. **Write failing tests first** — Define expected behavior before implementation
2. **One test at a time** — Implement just enough to pass each test
3. **No untestable code** — If we can't test it, we don't write it
4. **Integration tests mirror raw Redis** — `test_sql_queries.py` verifies SQL produces same results as equivalent `FT.AGGREGATE` commands in `test_redis_queries.py`

Coverage is enforced in CI. Pragmas (`# pragma: no cover`) are forbidden — if code can't be tested, it shouldn't exist.

