Metadata-Version: 2.1
Name: libsqlglot
Version: 0.4.1
Summary: Python bindings for libsqlglot, a high-performance SQL parser, transpiler, and optimiser written in C++
Author: libsqlglot contributors
License: Apache-2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: C++
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Compilers
Project-URL: Homepage, https://github.com/richarah/libsqlglot
Project-URL: Documentation, https://github.com/richarah/libsqlglot#readme
Project-URL: Repository, https://github.com/richarah/libsqlglot
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# libsqlglot

#### What this is

sqlglot, in C++. 45 dialects, 126× faster on benchmark SQL, 252× on the kind your ORM generates when nobody's looking. Performance gap scales with query complexity, see [Benchmarks](#benchmarks).

Supports stored procedures (PL/pgSQL, T-SQL, MySQL, PL/SQL): where sqlglot falls back to passthrough, libsqlglot parses them into the AST.

#### What it's for

Anywhere SQL parsing meets the hot path: proxies, sidecars, migration, linters, Git pre-commit hooks, CI/CD pipelines. Also a replacement for every regex that's pretending to be a parser.

#### Why this exists

Because life is too short to parse SQL in Python.

Inspired by the original [sqlglot](https://github.com/tobymao/sqlglot), which did the decade-long work of mapping 31+ SQL dialects into an elegant, universal AST. libsqlglot does the comparatively trivial work of compiling it: the algorithm was already O(n), the runtime wasn't.

## Contents

- [Functionality](#functionality)
- [Quickstart](#quickstart)
  - [C++](#c)
  - [Python](#python)
- [Differences from original sqlglot](#differences-from-original-sqlglot)
- [Building](#building)
- [Architecture](#architecture)
- [Benchmarks](#benchmarks)
- [Examples](#examples)
- [Supported SQL dialects](#supported-sql-dialects)
- [Contributing](#contributing)
- [Licence](#licence)

## Functionality

Transpiles SQL dialects via sqlglot AST. Full stored procedure support. Fail-fast errors with exact line and column. Python bindings available at 95-98% of C++ speed (`import libsqlglot as sqlglot` and go).

Handles the full SQL surface: SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE, MERGE, plus stored procedures (CALL, RETURN, DECLARE, IF/ELSEIF/ELSE, WHILE, FOR loops). Also handles CTEs, window functions, subqueries, and various JOIN types.

Compatible with 45 dialects (see [Supported SQL dialects](#supported-sql-dialects) for the full list).

## Quickstart

### C++

```cpp
#include <libsqlglot/transpiler.h>

using namespace libsqlglot;

// Transpile between dialects (MySQL → PostgreSQL)
std::string output = Transpiler::transpile(
    "SELECT `id`, `name` FROM `users` LIMIT 10",
    Dialect::MySQL,
    Dialect::PostgreSQL
);
// Returns: SELECT "users"."id", "users"."name" FROM "users" LIMIT 10

// Parse SQL into AST
Arena arena;
auto stmt = Transpiler::parse(arena, "SELECT name FROM users WHERE age > 18");
// Returns: AST with SelectStmt node

// Optimise AST (column qualification, predicate pushdown, constant folding)
Transpiler::optimize(arena, stmt);
// Modifies AST: name → users.name, age → users.age

// Generate SQL from AST
std::string sql = Transpiler::generate(stmt, Dialect::PostgreSQL);
// Returns: SELECT users.name FROM users WHERE users.age > 18
```

See [Supported SQL dialects](#supported-sql-dialects) for all available `Dialect::` values.

### Python

Available on PyPI: `pip install libsqlglot`

**System requirements**: Linux with glibc 2.35+ (Ubuntu 22.04+, Debian 12+, RHEL 9+)

```python
import libsqlglot as sqlglot

# Transpile
sql = "SELECT `id`, `name` FROM `users` LIMIT 10"
result = sqlglot.transpile(sql, read="mysql", write="postgres")
result = sqlglot.transpile(sql, "mysql", "postgres") # Both styles work

# Parse
stmt = sqlglot.parse_one(sql) # single statement
stmts = sqlglot.parse("SELECT 1; SELECT 2") # multiple statements

# Generate
sql = stmt.sql() # default ANSI
sql = stmt.sql(dialect="postgres", pretty=True) # pretty-print
sql = stmt.sql(dialect="sqlserver") # TRUE → 1

# AST traverse
columns = stmt.find_all(sqlglot.ExprType.COLUMN)
tables = stmt.find_all(sqlglot.ExprType.TABLE_REF)
stmt.walk(lambda n: print(n.type))

# Optimise
optimized = sqlglot.optimize(stmt)

# Builder
stmt = (sqlglot.select(["id", "name"])
        .from_("users")
        .where("active = TRUE")
        .order_by("name")
        .limit(10))

# Diff
diff = sqlglot.diff("SELECT id FROM users", "SELECT user_id FROM users")
```

### Unix Pipes

libsqlglot can be used in Unix pipelines via the Python CLI:

```bash
# Basic transpilation
echo "SELECT \`id\` FROM \`users\`" | python3 -m libsqlglot -r mysql -w postgres
# Output: SELECT "id" FROM "users"

# Pipeline with grep
cat queries.sql | python3 -m libsqlglot -r mysql -w bigquery | grep "SELECT"

# Process multiple files
cat *.sql | python3 -m libsqlglot -r sqlserver -w postgres > output.sql

# Combine with other tools
find . -name "*.sql" -exec cat {} \; | python3 -m libsqlglot -r mysql -w postgres | wc -l
```

The CLI reads SQL from stdin and writes transpiled SQL to stdout, making it composable with standard Unix tools.

See [Supported SQL dialects](#supported-sql-dialects) for all available dialect names.

**Python API**: `parse()`, `parse_one()`, `generate()`, `transpile()`, `optimize()`, `diff()`, `.sql()`, `.find_all()`, `.walk()`, `select()` builder.

**Performance**: 95-98% of C++ speed. Overhead is ~125ns per call (10ns function call + 100ns string marshal + 15ns misc). On typical 2.5μs parse, this is 5% overhead. On complex 25μs parse, 0.5% overhead.

**Memory**: Thread-local arenas, no runtime dependencies.

## Differences from original sqlglot

| | libsqlglot | Python sqlglot |
|---|---|---|
| **Performance** | 32-242× faster (126× avg) | Baseline |
| **Stored procedures** | Support for PL/pgSQL, T-SQL, MySQL, PL/SQL | Limited (`exp.Command` passthrough) |
| **Error handling** | Fail-fast with precise errors (line, column, context) | Error recovery (IDE-friendly, slower) |
| **Memory** | Arena allocation (O(1) cleanup) | Garbage collection |
| **Optimiser** | Column qualification, predicate pushdown, constant folding, subquery elimination | Same + additional passes + full execution engine |
| **Codebase** | Header-only C++26 library | 50,000+ lines Python |
| **Keywords** | C++26 reflection: auto-generated from enum (300+ keywords, zero maintenance) | Manually maintained dictionaries |
| **Binary** | C++ library 14KB, Python wheel 1.5MB | N/A |
| **Dialects** | 45 SQL dialects (14 unique to libsqlglot, including ANSI) | 32 dialects (including PRQL, which libsqlglot doesn't support) |
| **SQL coverage** | Same as Python sqlglot | Full SQL support |
| **Dependencies** | None (no runtime deps) | None |

## Building

Requires C++26 (GCC 14+ with `-freflection`) and CMake 3.21+.

**C++26 features used:**
- **Keyword reflection** (`std::meta`): Auto-generates 300+ keyword mappings from `TokenType` enum at compile time. Zero maintenance, impossible to desync.
- **Dialect reflection**: Build-time code generation parses the `Dialect` enum and generates compile-time mappings (CMake → Python script → generated header). When GCC fixes the reflection bug, will switch to pure C++26 reflection.
- **ExprType bindings**: Python bindings auto-generated from the C++ `ExprType` enum (124 values). Run `python3 scripts/generate_expr_type_bindings.py > src/python/expr_type_bindings_generated.h` after modifying `expression.h`.
- **Advanced constexpr**: Perfect hash tables, compile-time string processing.

### Docker (Recommended)

The easiest way to build with GCC trunk + reflection support:

```bash
# Build the project (first build takes 30-45 min to compile GCC trunk)
docker compose -f docker/docker-compose.yml run --rm build

# Run tests
docker compose -f docker/docker-compose.yml run --rm test

# Build Python wheel
docker compose -f docker/docker-compose.yml run --rm wheel

# Development shell
docker compose -f docker/docker-compose.yml run --rm dev
```

See `docker/README.md` for full documentation.

### C++ library (native)

Requires GCC trunk built from source with `-freflection` support:

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
sudo cmake --install build
```

### Python package

```bash
pip install -e .
```

### Building with tests and benchmarks

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release \
               -DLIBSQLGLOT_BUILD_TESTS=ON \
               -DLIBSQLGLOT_BUILD_BENCHMARKS=ON
cmake --build build -j$(nproc)
ctest --test-dir build
```

**Compiled sizes** (stripped, `-O3`): C++ library 14KB, Python extension 2.0MB.

**Code quality**: Compiles with `-Wall -Wextra -Wpedantic -Werror`. No runtime dependencies. No RTTI. Passes 27,040 assertions across 361 test cases. Fuzz-tested with `libFuzzer` + `AddressSanitizer`.

### Advanced optimisations

**Profile-Guided Optimisation (PGO)**: For production deployments requiring maximum performance, enable PGO in 3 steps:

```bash
# Step 1: Build with profiling instrumentation
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLIBSQLGLOT_PGO_GENERATE=ON
cmake --build build

# Step 2: Run with representative workload to collect profile data
./build/benchmarks/bench_transpiler  # or your own queries

# Step 3: Rebuild using profile data for optimisation (10-30% faster)
cmake -B build -DCMAKE_BUILD_TYPE=Release -DLIBSQLGLOT_PGO_USE=ON
cmake --build build
```

**Compiler optimisations enabled** (Release builds): Link-Time Optimisation (LTO), constant merging, symbol visibility optimisation.

## Architecture

Header-only design: you only pay for what you use. 19 header files, no `.cpp`. See `include/libsqlglot/` for the full layout. Core files: `parser.h` (4203 lines), `generator.h` (2149), `expression.h` (1385, 115 expression types). Entry point is `transpiler.h` (86 lines).

### Memory management

Arena allocation: all AST nodes allocated in contiguous chunks, freed together in O(1) time. String interning deduplicates identifiers. Tokenisation is zero-copy via `string_view`. Everything uses RAII, no manual `delete` calls.

### Grammar pipeline
Grammar definitions from multiple sources (ANTLR and normalised JSON specs) are unified into a canonical IR, then compiled into a cache-efficient LUT used by the runtime parser.

### SQL support

**Statements**: SELECT, INSERT, UPDATE, DELETE, MERGE, CREATE TABLE, ALTER TABLE, DROP TABLE, DROP INDEX, TRUNCATE, CREATE VIEW, DROP VIEW, CREATE SCHEMA, DROP SCHEMA

**Stored procedures**: CALL, RETURN, DECLARE, IF/ELSEIF/ELSE/END IF, WHILE/DO/END WHILE, FOR/IN/LOOP/END LOOP

**Clauses**: WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET, QUALIFY

**JOINs**: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL, ASOF

**Advanced**: CTEs (WITH RECURSIVE), window functions (PARTITION BY, ORDER BY, ROWS/RANGE, named windows), subqueries, PIVOT/UNPIVOT

**Operators**: Arithmetic, comparison, logical, LIKE, ILIKE, IS NULL, BETWEEN, IN, EXISTS

**Expressions**: CASE WHEN, arrays, array indexing, set operations (UNION, INTERSECT, EXCEPT)

**Transactions**: BEGIN, COMMIT, ROLLBACK, SAVEPOINT

**Utilities**: SET, DESCRIBE, SHOW, EXPLAIN

## Testing

361 test cases, 27,040 assertions, all passing.

```bash
cd build
ctest --output-on-failure

# Run specific test suites
./tests/libsqlglot_tests "[parser]"
./tests/libsqlglot_tests "[security]"
./tests/libsqlglot_tests "[stored_procedures]"
./tests/libsqlglot_tests "[performance]"

# Run benchmarks
./benchmarks/bench_tokenizer
./benchmarks/bench_parser
./benchmarks/bench_transpiler
```

## Security

27,040 assertions covering SQL injection, buffer overflow, stack overflow (recursion depth at 256, adjustable via `Parser::kMaxRecursionDepth` in `parser.h`), memory corruption (arena prevents use-after-free and double-free), integer overflow, and encoding attacks (UTF-8 identifiers rejected, UTF-8 string literals accepted). All pass.

## Fuzzing

There is a `libFuzzer` target in `fuzzing/fuzz_parser.cpp`:

```bash
cd fuzzing
clang++ -fsanitize=fuzzer,address -std=c++26 -freflection -I../include fuzz_parser.cpp -o fuzz_parser
./fuzz_parser -max_len=10000 -timeout=10
```

## Compatibility testing

Test compatibility with Python sqlglot's test fixtures:

```bash
cd compat
python sqlglot_compat.py --sqlglot-path /path/to/sqlglot --limit 1000
```

## Errors

libsqlglot provides precise, actionable error messages with exact locations:

```cpp
Arena arena;
Parser parser(arena, "SELECT FROM users");  // Missing column list

// Throws: Line 1, column 8: Expected column list after SELECT (found: 'FROM')
```

All parse errors include line number, column number, the actual token that caused the error, and a human-readable explanation of what was expected. Fail-fast, no cascading secondary errors.

## Examples

### Dialect conversion

```cpp
// MySQL -> PostgreSQL
std::string mysql_sql = "SELECT `user_id`, `name` FROM `users` LIMIT 10";
std::string pg_sql = Transpiler::transpile(
    mysql_sql, Dialect::MySQL, Dialect::PostgreSQL
);
// Output: SELECT "users"."user_id", "users"."name" FROM "users" LIMIT 10
```

### Building AST by hand

```cpp
Arena arena;
auto stmt = arena.create<SelectStmt>();

// SELECT *
stmt->columns.push_back(arena.create<Star>());

// FROM users
stmt->from = arena.create<TableRef>("users");

// WHERE age > 18
auto age_col = arena.create<Column>("age");
auto threshold = arena.create<Literal>("18");
stmt->where = arena.create<BinaryOp>(ExprType::GT, age_col, threshold);

std::string sql = Generator::generate(stmt);
// "SELECT * FROM users WHERE age > 18"
```

### Optimisation

```cpp
Arena arena;
auto stmt = Transpiler::parse(arena,
    "SELECT name FROM users WHERE age > 18");

// Apply optimisation passes
Transpiler::optimize(arena, stmt);

// Optimisations include:
// - Column qualification (name -> users.name)
// - Predicate pushdown
// - Constant folding
// - Subquery elimination

std::string sql = Transpiler::generate(stmt);
// "SELECT users.name FROM users WHERE users.age > 18"
```

### Stored procedures (PL/pgSQL, T-SQL, MySQL, Oracle)

```cpp
Arena arena;
std::string plpgsql = R"(
    FOR i IN 1..100 LOOP
        IF i > 50 THEN
            RETURN i;
        END IF
    END LOOP
)";

auto stmt = Transpiler::parse(arena, plpgsql);
std::string output = Transpiler::generate(stmt, Dialect::PostgreSQL);
```

### Transpiling between dialects

```cpp
// Real-world analytics query: CTEs, window functions, multiple JOINs
std::string mysql_query = R"(
    WITH regional_sales AS (
        SELECT
            region,
            product_id,
            SUM(amount) as total_sales,
            COUNT(order_id) as order_count,
            ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) as sales_rank
        FROM orders
        WHERE order_date >= '2024-01-01'
        GROUP BY region, product_id
        HAVING SUM(amount) > 10000
    ),
    top_regions AS (
        SELECT region
        FROM regional_sales
        WHERE sales_rank <= 5
        GROUP BY region
        HAVING COUNT(*) > 3
    )
    SELECT
        r.region,
        r.product_id,
        p.product_name,
        r.total_sales,
        r.order_count,
        RANK() OVER (ORDER BY r.total_sales DESC) as overall_rank,
        LAG(r.total_sales) OVER (ORDER BY r.total_sales DESC) as prev_sales
    FROM regional_sales r
    INNER JOIN products p ON r.product_id = p.id
    INNER JOIN top_regions t ON r.region = t.region
    WHERE r.sales_rank <= 10
    ORDER BY r.total_sales DESC
    LIMIT 100
)";

// MySQL → BigQuery
std::string bigquery = Transpiler::transpile(
    mysql_query, Dialect::MySQL, Dialect::BigQuery
);
// Full semantic preservation: CTEs, window functions, JOINs all intact

// Round-trip: MySQL → BigQuery → PostgreSQL → Snowflake → MySQL
std::string step1 = Transpiler::transpile(mysql_query, Dialect::MySQL, Dialect::BigQuery);
std::string step2 = Transpiler::transpile(step1, Dialect::BigQuery, Dialect::PostgreSQL);
std::string step3 = Transpiler::transpile(step2, Dialect::PostgreSQL, Dialect::Snowflake);
std::string back_to_mysql = Transpiler::transpile(step3, Dialect::Snowflake, Dialect::MySQL);
// Query semantics preserved across 4 dialect conversions

// Simple transformations: PostgreSQL → SQL Server
std::string pg_query = "SELECT * FROM users WHERE active = TRUE LIMIT 10";
Arena arena;
Parser parser(arena, pg_query);
auto stmt = parser.parse_select();
std::string sql_server = Generator::generate(stmt, Dialect::SQLServer);
// Result: SELECT TOP 10 * FROM users WHERE active = 1
// LIMIT → TOP, TRUE → 1

// Multi-dialect pipeline: Parse once, generate for multiple targets
Arena shared_arena;
auto ast = Transpiler::parse(shared_arena, "SELECT name FROM users WHERE age > 18");

std::string postgres_sql = Generator::generate(ast, Dialect::PostgreSQL);
std::string mysql_sql = Generator::generate(ast, Dialect::MySQL);
std::string duckdb_sql = Generator::generate(ast, Dialect::DuckDB);
std::string snowflake_sql = Generator::generate(ast, Dialect::Snowflake);
// Single parse, multiple outputs - efficient for multi-target scenarios
```

## Benchmarks

Benchmarks run on x86-64 Linux with `-O3` optimisation. libsqlglot compared against pure Python sqlglot 30.0.1. Python bindings add ~125ns overhead (95-98% of C++ performance).

**What we measure:** Full parse + generate round-trip (SQL → AST → SQL). No optimisation applied in either implementation. Both produce identical output, proving identical work done. Apples-to-apples comparison of parser and generator performance.

**Measurement:** `std::chrono::high_resolution_clock` with 1000 iterations per query, averaged.

The 16 standard queries are sqlglot's benchmark. The 8 stress tests are ours, excluded from the average. They're here to show the scaling doesn't stop: 178.6× on benchmarks, 252× on these. What happens past 252× is left as an exercise for the reader.

### Standard benchmarks (16 queries)

| Query              | sqlglot (μs) | libsqlglot (μs) | Speedup  |
|--------------------|--------------|-----------------|----------|
| many_ctes          | 1,097.84     | 4.53            | 242.4×   |
| many_joins         | 1,038.03     | 5.81            | 178.7×   |
| nested_functions   | 892.91       | 6.27            | 142.4×   |
| nested_subqueries  | 495.97       | 2.79            | 177.8×   |
| many_unions        | 2,585.04     | 17.30           | 149.4×   |
| tpch               | 1,110.16     | 7.74            | 143.4×   |
| complex_where      | 575.69       | 4.27            | 134.8×   |
| many_windows       | 933.16       | 13.00           | 71.8×    |
| deep_arithmetic    | 388.10       | 2.86            | 135.7×   |
| many_columns       | 1,643.41     | 14.81           | 110.9×   |
| values             | 15,951.28    | 162.90          | 97.9×    |
| large_case         | 4,480.21     | 35.30           | 126.9×   |
| large_in           | 11,756.63    | 116.38          | 101.0×   |
| many_numbers       | 6,463.57     | 67.71           | 95.5×    |
| short              | 118.61       | 1.54            | 77.0×    |
| large_strings      | 146.92       | 4.54            | 32.4×    |

**Average: 126.1× faster** (range: 32.4× to 242.4×). A million queries: 29 seconds vs 52 minutes.

libsqlglot achieves this through:
- **Memory**: Arena allocation (O(1) cleanup), string interning (pointer equality), zero-copy tokenisation (`string_view`)
- **Algorithms**: Perfect hash keyword lookup (O(1)), branchless uppercase conversion (no branch misprediction)
- **Compiler**: LTO (whole-program optimisation), aggressive inlining, constant folding, C++23 `constexpr`
- **Cache**: Contiguous memory layout, spatial locality, no per-node `malloc`/`new` fragmentation

### Stress tests (8 queries, supplementary)

Designed to break parsers: 15-level nested CTEs, 35-level CASE expressions, 100+ WHERE conditions, 20-way joins. 1,186 lines of SQL, 51KB total.

| Query                    | sqlglot (μs) | libsqlglot (μs) | Speedup  |
|--------------------------|--------------|-----------------|----------|
| Correlated subqueries    | 14,400.86    | 60.60           | 237.6×   |
| Complex subqueries       | 18,414.37    | 73.18           | 251.6×   |
| Complex string functions | 15,609.81    | 71.42           | 218.6×   |
| Deep CTE nesting         | 17,214.85    | 82.36           | 209.0×   |
| Union chains             | 24,075.57    | 115.77          | 208.0×   |
| Complex WHERE clause     | 14,499.12    | 81.99           | 176.8×   |
| Deep CASE nesting        | 14,651.62    | 103.10          | 142.1×   |
| Multi-table joins        | 13,862.49    | 141.16          | 98.2×    |

**Average: 192.7× faster** (range: 98.2× to 251.6×). A million queries: 86 seconds vs 4.6 hours.

### Validation

```python
# Python sqlglot: 3,917 μs
parsed = sqlglot.parse_one("SELECT * FROM users WHERE age > 18")
output = parsed.sql()  # "SELECT * FROM users WHERE age > 18"
```

```cpp
// libsqlglot: 45 μs (87× faster)
Arena arena;
auto ast = Transpiler::parse(arena, "SELECT * FROM users WHERE age > 18");
auto output = Transpiler::generate(ast);  // "SELECT * FROM users WHERE age > 18"
```

See `benchmarks/bench_complete_comparison.py` to reproduce.

## Supported SQL dialects

**45 dialects** with full parse and generation support. Use `Dialect::Name` in C++ or `"name"` strings in Python (e.g., `read="mysql"`, `write="postgres"`).

**Note:** Dialect names are case-sensitive (e.g., `Dialect::PostgreSQL`, not `Dialect::postgresql`).

Each dialect includes proper identifier quoting, keyword handling, function name translation, and syntax transformations (e.g. LIMIT vs TOP vs FETCH FIRST).

If somehow a dialect is missing from the map, get_features() returns ANSI SQL defaults.

### Core dialects (explicit configurations)

| Database | Dialect | Supported features |
|----------|---------|---------------------------|
| ANSI SQL | ANSI | Standard SQL compliance: ANSI joins, standard aggregations, CTEs |
| BigQuery | BigQuery | STRUCT types, ARRAY literals `[1,2,3]`, SAFE_CAST, INT64/STRING types, nested field access |
| Calcite | Calcite | TABLESAMPLE BERNOULLI, Apache Calcite optimiser hints |
| ClickHouse | ClickHouse | Column-oriented syntax, MergeTree engines, SAMPLE BY, ARRAY JOIN |
| DB2 | DB2 | FETCH FIRST n ROWS ONLY, DB2 stored procedures, OLAP functions |
| Drill | Drill | Schema-free JSON queries, nested data access, FLATTEN |
| DuckDB | DuckDB | QUALIFY clause, ASOF joins, PIVOT/UNPIVOT, macro functions, LIST type |
| Hive | Hive | PARTITIONED BY, CLUSTERED BY, SerDe formats, Hive UDFs |
| MySQL | MySQL | Backtick identifiers, MySQL-specific functions, storage engines |
| Oracle | Oracle | CONNECT BY hierarchical queries, PRIOR, START WITH, DUAL table, PL/SQL blocks |
| Phoenix | Phoenix | HBase integration: SALT_BUCKETS, ARRAY_APPEND, UPSERT VALUES |
| Pinot | Pinot | Real-time OLAP: segment pruning, star-tree indexes, broker queries |
| PostgreSQL | PostgreSQL | RETURNING, ON CONFLICT, LATERAL joins, window functions, array types, JSONB |
| Presto | Presto | APPROX_DISTINCT, UNNEST, ROW types, lambda functions |
| Snowflake | Snowflake | FLATTEN for JSON, VARIANT type, TIME_TRAVEL, CLUSTER BY, RESULT_SCAN |
| Solr | Solr | score() relevance function, faceted search, Lucene query syntax |
| Spark | Spark | NULL-SAFE equality `<=>`, Hive metastore, broadcast hints, cache table |
| SQL Server | SQLServer | T-SQL syntax: TOP, IDENTITY, OUTPUT clause, EXEC, GO batches, temp tables `#` |
| SQLite | SQLite | Minimal SQL: no RIGHT JOIN, PRAGMA commands, autoincrement |
| Teradata | Teradata | MULTISET tables, BTEQ syntax, FastLoad/MultiLoad hints |

### Inherited dialect configurations

These dialects inherit features from a compatible base dialect and add specific extensions.

| Database | Dialect | Inherits from | Supported features |
|----------|---------|---------------|---------------------------|
| Athena | Athena | Presto | AWS Athena Presto syntax, S3 partitioning |
| CockroachDB | CockroachDB | PostgreSQL | UPSERT statement, distributed transactions, RETURNING clause |
| Databricks | Databricks | Spark | OPTIMIZE tables, ZORDER BY clustering, Delta Lake operations |
| Doris | Doris | MySQL | DUPLICATE KEY model, BUCKETS distribution, DISTRIBUTED BY HASH |
| Dremio | Dremio | Presto | CREATE REFLECTION for materialisation, data lakehouse queries |
| Druid | Druid | MySQL | TIME_FLOOR for time bucketing, approximate aggregations, roll-up |
| Dune | Dune | PostgreSQL | Analytics: bytearray_to_uint256, ETH address functions |
| Exasol | Exasol | PostgreSQL | DISTRIBUTE BY for parallel execution, Lua scripting UDFs |
| Fabric | Fabric | SQL Server | Three-part lakehouse.schema.table naming, OneLake integration |
| Greenplum | Greenplum | PostgreSQL | DISTRIBUTED BY/RANDOMLY, column/append-optimised tables, GPORCA optimiser |
| Impala | Impala | Hive | COMPUTE STATS, Kudu integration, CACHED IN pools |
| MariaDB | MariaDB | MySQL | MySQL-compatible with RETURNING, window functions, JSON functions |
| Materialize | Materialize | PostgreSQL | TAIL for streaming results, materialised views, temporal filters |
| Netezza | Netezza | PostgreSQL | DISTRIBUTE ON distribution keys, zone maps, statistics |
| Redshift | Redshift | PostgreSQL | DISTKEY distribution, SORTKEY ordering, SUPER type (JSON) |
| RisingWave | RisingWave | PostgreSQL | EMIT CHANGES for streaming, temporal joins, watermarks |
| SingleStore | SingleStore | MySQL | VECTOR type for embeddings, DOT_PRODUCT, columnstore/rowstore |
| Spark2 | Spark2 | Spark | Legacy Spark 2.x: CACHE TABLE, broadcast joins, RDD compatibility |
| StarRocks | StarRocks | MySQL | Vectorised execution, primary key model, materialised views, bitmap indexes |
| Tableau | Tableau | PostgreSQL | ZN() null-to-zero, Tableau calculation functions, RAWSQL passthrough |
| Teradata | Teradata | Teradata | MULTISET tables, BTEQ syntax, FastLoad/MultiLoad hints |
| TiDB | TiDB | MySQL | AUTO_RANDOM for distributed primary keys, MySQL compatibility, TiKV storage |
| TimescaleDB | TimescaleDB | PostgreSQL | time_bucket() for time-series, hypertables, continuous aggregates |
| Trino | Trino | Presto | Presto-compatible: UNNEST, lambda expressions, ROW types |
| Vertica | Vertica | PostgreSQL | CREATE PROJECTION for physical design, SEGMENTED BY HASH, columnar storage |
| YugabyteDB | YugabyteDB | PostgreSQL | SPLIT INTO n TABLETS, distributed SQL, PostgreSQL compatibility |

## Contributing

libsqlglot is currently a solo project. Bug reports, test cases, and dialect edge cases are welcome via GitHub issues. If you have a dialect you wish to see added, please open an issue or PR.

If a query parses incorrectly, or a dialect transformation that produces wrong output, please open an issue with the input SQL, source dialect, target dialect, expected output and any other pertinent details.

Pull requests are considered but there is no guarantee of merge. The codebase is intentionally small and opinionated.

### How about a Rust rewrite?

No.

## Licence

Apache 2.0
