Code Review: sqlmodelgen
I've reviewed the entire codebase thoroughly. Here's my assessment with areas for improvement organized by priority and category:
Overall Architecture
The project follows a sensible pipeline architecture:
1. Input sources (SQL DDL, SQLite, PostgreSQL, MySQL) → Intermediate Representation (IR) → Code IR → AST → Generated Python code
This is a solid design pattern for code generation. The separation of concerns is generally good.
---
Technical Debt & Improvement Opportunities
1. Duplicated Code: ContraintsData class (High Priority)
There are two nearly identical ContraintsData classes:
- src/sqlmodelgen/ir/postgres/postgres_collect.py:16-34
- src/sqlmodelgen/ir/query/__init__.py:17-35
The only difference is a typo in the Postgres version (foreingn_keys vs foreign_keys). This is a clear DRY violation.
Recommendation: Consolidate into a single class in ir/query/__init__.py and refactor postgres_collect.py to use it (similar to how MySQL does). This would also fix the typo.
---
2. Dead Code (Medium Priority)
In src/sqlmodelgen/codegen/convert_data_type.py:89-92:
cod = """if data_type in INT_TYPES:
	result = 'int'
elif data_type in FLOAT_TYPES:
	result = 'float'"""
This cod variable is unused dead code that should be removed.
---
3. SQL Injection Vulnerabilities (High Priority - Security)
Several places use f-strings to construct SQL queries:
- postgres_collect.py:70: tables_query()
- postgres_collect.py:95: cols_query()
- postgres_collect.py:108-122: collect_uniques()
- postgres_collect.py:137-151: collect_primary_keys()
- postgres_collect.py:172-193: collect_foreign_keys()
- mysql/__init__.py:61-74, 85-92, 102-119, 139-150, 167-189: Multiple queries
While these are reading from information_schema (not user tables), the schema_name and dbname parameters come from user input and could be exploited.
Recommendation: Use parameterized queries instead of string interpolation:
# Instead of:
f"WHERE TABLE_SCHEMA = '{schema_name}'"
# Use:
cursor.execute("WHERE TABLE_SCHEMA = %s", (schema_name,))
---
4. Inconsistent Naming Convention for Relationships (Medium Priority)
In build_rels.py:77, the gen_m2o_candidate_names function adds an "s" for pluralization:
var_name = vassal_table_name + 's'
This produces awkward names like athletess (shown in README). Consider using a proper pluralization library like inflect or at minimum handling the case where the name already ends in 's'.
---
5. Resource Management: Missing Context Managers (Medium Priority)
Database connections in postgres_collect.py, mysql/__init__.py, and sqlite_parse.py don't use context managers (with statements). While connections are closed manually, this isn't exception-safe.
Recommendation:
# Instead of:
conn = psycopg.connect(...)
# ...
conn.close()
# Use:
with psycopg.connect(...) as conn:
    # ...
---
6. Missing Type Return Annotation (Low Priority)
gen_code_from_mysql in sqlmodelgen.py:52 lacks a return type annotation while all other gen_code_from_* functions have -> str.
---
7. Incomplete Feature: Multi-column Unique Constraints (Medium Priority)
There's a TODO in build_table_args.py:7:
# TODO: still no code to generate unique for multiple columns
The code only generates UniqueConstraint for single columns, not composite unique constraints.
---
8. Performance TODO in code_ir.py (Low Priority)
Lines 76-82 have a TODO about potential performance issues with linear iteration over attribute names:
# TODO: this maybe at a point for performance reasons there could be a
# set or some other data strcture to help the performances
For very large schemas this could become noticeable, but likely not urgent for typical use cases.
---
9. Missing Error Handling (Medium Priority)
- ir_parse.py:36: iter_ctparseds yields ctparsed even when it's None (when elem.get('CreateTable') returns None)
- No handling for malformed SQL that sqloxide might partially parse
- No validation that foreign key targets exist in the schema
Recommendation: Add validation and meaningful error messages for edge cases.
---
10. Inconsistent Indentation Style (Low Priority)
The codebase mixes tabs and spaces. For example, ir.py uses tabs while code_ir.py uses spaces. PEP 8 recommends 4 spaces consistently.
---
11. Incomplete SQLite Support (Medium Priority)
In sqlite_parse.py:102:
unique=False, # TODO: detect this constraint,
SQLite unique constraints aren't being detected, unlike PostgreSQL and MySQL.
---
12. Test Utility Improvement (Low Priority)
The collect_code_info helper in tests/helpers/helpers.py is quite complex. Consider if ast.dump() comparison or structured AST comparison libraries would simplify testing.
---
13. Missing __all__ exports (Low Priority)
The __init__.py conditionally imports functions but doesn't define __all__, which could lead to unexpected behavior with from sqlmodelgen import *.
---
14. Typo in postgres_collect.py (Low Priority)
Line 17: foreingn_keys should be foreign_keys.
---
Positive Observations
1. Clean IR design - The intermediate representation (SchemaIR, TableIR, ColIR) is well-structured
2. Good test coverage - Tests cover multiple scenarios including edge cases
3. Smart use of AST - Using Python's AST for code generation is elegant and ensures valid syntax
4. Protocol-based design - QCollector protocol in ir/query/__init__.py is a nice abstraction
5. Optional dependencies - Clean handling of postgres/mysql as optional extras
---
Summary: Prioritized Action Items
| Priority | Issue | Effort |
|----------|-------|--------|
| High | Fix duplicate ContraintsData + typo | Small |
| High | Address SQL injection risks | Medium |
| Medium | Add context managers for DB connections | Small |
| Medium | Remove dead code (cod variable) | Trivial |
| Medium | Fix iter_ctparseds yielding None | Small |
| Medium | Implement SQLite unique constraint detection | Medium |
| Low | Add return type to gen_code_from_mysql | Trivial |
| Low | Improve pluralization logic | Small |
| Low | Consistent indentation | Medium |
Would you like me to elaborate on any of these points, or would you like to discuss which improvements to prioritize?
