# Security Guide

Complete guide to security features and best practices in sqlo.

## Security Features

sqlo implements multiple layers of security to prevent SQL injection attacks:

1. **Automatic parameter binding** for all values
2. **Mandatory identifier validation** for table and column names
3. **UPDATE/DELETE safety checks** to prevent accidental mass operations
4. **Type-safe Raw() expressions** with clear documentation
5. **Empty list handling** for safe WHERE IN operations

---

## Identifier Validation

### Automatic Validation

All table and column names are automatically validated to prevent SQL injection:

```python
from sqlo import Q

# ✅ Valid identifiers are allowed
Q.select("id", "name").from_("users").build()
Q.select("*").from_("mydb.users").build()  # schema.table format

# ❌ Invalid identifiers are rejected
Q.select("*").from_("users; DROP TABLE users;").build()
# Raises: ValueError: Invalid identifier 'users; DROP TABLE users;'
```

### What's Allowed

Valid identifiers must:
- Start with a letter or underscore
- Contain only letters, numbers, underscores, and dots
- Not contain spaces, special characters, or SQL keywords in vulnerable positions

**Examples:**
- ✅ `users`
- ✅ `user_accounts`
- ✅ `mydb.users` (schema.table)
- ✅ `users.id` (table.column)
- ❌ `users; DROP TABLE` (semicolon)
- ❌ `users--comment` (SQL comment)
- ❌ `users /*comment*/` (SQL comment)
- ❌ `users OR 1=1` (SQL injection attempt)

### Table Aliases

Table aliases with spaces are supported:

```python
# ✅ Allowed - validates "orders" part
Q.select("*").from_("users u").join("orders o", "u.id = o.user_id").build()
```

---

## Parameter Binding

### Automatic Protection

All **values** are automatically parameterized to prevent SQL injection:

```python
# ✅ SAFE - value is parameterized
malicious_input = "admin' OR '1'='1"
query = Q.select("*").from_("users").where("email", malicious_input)
sql, params = query.build()
# SQL: SELECT * FROM `users` WHERE `email` = %s
# Params: ("admin' OR '1'='1",)  # Treated as literal string
```

The malicious SQL is treated as a **string value**, not as executable SQL code.

---

## Raw SQL Expressions

### When to Use Raw()

Use `Raw()` for SQL expressions that cannot be represented as identifiers or values:

```python
from sqlo import Raw

# ✅ SQL functions
Q.select(Raw("COUNT(*)")).from_("users")
Q.select(Raw("CONCAT(first_name, ' ', last_name)")).from_("users")

# ✅ SQL expressions in WHERE
Q.select("*").from_("users").where(Raw("created_at > NOW() - INTERVAL 7 DAY"))

# ✅ Raw with parameters (still safe!)
Q.select("*").from_("users").where(Raw("age > %s", [18]))
```

### Safety Warning

> [!CAUTION]
> **Never use user input directly in Raw()** - this bypasses all security protections!

```python
# ❌ DANGEROUS - SQL injection vulnerability!
user_input = request.args.get("condition")
Q.select("*").from_("users").where(Raw(user_input))  # NEVER DO THIS!

# ✅ SAFE - validate/whitelist first, or use parameterized queries
allowed_fields = {"name", "email", "age"}
if field in allowed_fields:
    Q.select("*").from_("users").where(field, user_value)
```

### Type Checking

`Raw()` requires a string argument:

```python
from sqlo import Raw

# ✅ Valid
Raw("COUNT(*)")

# ❌ Invalid - raises TypeError
Raw(123)  # TypeError: Raw SQL must be a string
```

---

## UPDATE/DELETE Safety

### Mandatory WHERE Clause

To prevent accidental mass operations, UPDATE and DELETE require either:
1. A WHERE clause, or
2. Explicit `.allow_all_rows()` call

```python
# ❌ Raises ValueError - no WHERE clause
Q.update("users").set({"active": False}).build()
# ValueError: UPDATE without WHERE clause would affect all rows

# ✅ With WHERE clause
Q.update("users").set({"active": False}).where("id", 123).build()

# ✅ Explicit confirmation for mass operation
Q.update("users").set({"active": False}).allow_all_rows().build()
```

### The `.allow_all_rows()` Method

Use this method when you intentionally want to affect all rows:

```python
from sqlo import Q

# Mass update
Q.update("settings").set({"migrated": True}).allow_all_rows().build()

# Mass delete
Q.delete_from("temp_logs").allow_all_rows().build()

# With ORDER BY and LIMIT (still requires allow_all_rows without WHERE)
Q.delete_from("logs").order_by("-created_at").limit(1000).allow_all_rows().build()
```

---

## Empty List Handling

### WHERE IN with Empty Lists

Empty lists in `WHERE IN` clauses are automatically handled:

```python
# Empty IN list generates FALSE
sql, params = Q.select("*").from_("users").where_in("id", []).build()
# SQL: SELECT * FROM `users` WHERE FALSE
# Params: ()

# Empty NOT IN list generates TRUE
sql, params = Q.select("*").from_("users").where_not_in("id", []).build()
# SQL: SELECT * FROM `users` WHERE TRUE
# Params: ()
```

This ensures syntactically correct SQL even with dynamic lists.

---

## NULL Value Handling

### Correct NULL Checks

Use dedicated methods for NULL checks:

```python
from sqlo import Q, Condition

# ✅ Correct ways to check for NULL
Q.select("*").from_("users").where_null("email")  # Simplest
Q.select("*").from_("users").where(Condition.null("email"))  # Type-safe
Q.select("*").from_("users").where(Condition("email", operator="IS NULL"))

# ❌ WRONG - this checks for the STRING 'NULL', not SQL NULL
Q.select("*").from_("users").where("email", "NULL")
Q.select("*").from_("users").where(Condition("email", "NULL", "IS"))
```

### NULL in INSERT/UPDATE

NULL values in data dictionaries are properly handled:

```python
# ✅ NULL values work correctly
Q.insert_into("users").values({"name": "Alice", "email": None})
# email will be set to NULL

Q.update("users").set({"email": None}).where("id", 123)
# email will be set to NULL
```

---

## Best Practices

### 1. Always Validate Dynamic Identifiers

When allowing users to specify table or column names:

```python
from sqlo import Q

ALLOWED_TABLES = {"users", "orders", "products"}
ALLOWED_COLUMNS = {"id", "name", "email", "created_at"}

def dynamic_query(table: str, columns: list[str]):
    # Validate inputs
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Table '{table}' not allowed")
    
    if not all(col in ALLOWED_COLUMNS for col in columns):
        raise ValueError("Invalid column name")
    
    # Safe to use
    return Q.select(*columns).from_(table)
```

### 2. Use Parameterized Queries for Values

Never concatenate user input into SQL strings:

```python
# ❌ DANGEROUS
user_email = request.form.get("email")
Q.select("*").from_("users").where(Raw(f"email = '{user_email}'"))

# ✅ SAFE
Q.select("*").from_("users").where("email", user_email)
```

### 3. Use Dedicated Methods

Prefer dedicated methods over generic ones:

```python
# Instead of Raw() or complex Conditions
Q.select("*").from_("users").where_null("deleted_at")  # Clear intent
Q.select("*").from_("users").where_in("status", ["active", "pending"])
Q.update("users").set({"active": False}).where("id", 123)
```

### 4. Verify Before Mass Operations

Always double-check before calling `.allow_all_rows()`:

```python
def archive_old_data(cutoff_date: str):
    """Archive data older than cutoff_date."""
    # Add confirmation in your application layer
    query = (
        Q.update("orders")
        .set({"archived": True})
        .where("created_at <", cutoff_date)
    )
    # Only use allow_all_rows() if truly necessary
    return query
```

### 5. Use Transactions for Related Operations

When performing multiple related operations:

```python
# Use database-level transactions
# with conn.transaction():
#     Q.update("users").set({"status": "deleted"}).where("id", user_id).execute()
#     Q.delete_from("sessions").where("user_id", user_id).execute()
```

---

## Security Checklist

When building queries with sqlo:

- ✅ Table and column names are validated automatically
- ✅ User values are parameterized automatically
- ✅ `Raw()` is only used with trusted, hardcoded SQL
- ✅ UPDATE and DELETE have WHERE clauses or explicit `.allow_all_rows()`
- ✅ NULL checks use `.where_null()` or `Condition.null()`
- ✅ Dynamic identifiers (table/column names) are whitelisted
- ✅ Empty lists in WHERE IN are handled correctly
- ✅ Type checking is enabled (`Raw()` rejects non-strings)

---

## Common Vulnerabilities and Prevention

### 1. Table Name Injection

**Vulnerability:**
```python
# ❌ Old approach - vulnerable
user_table = request.args.get("table")
Q.select("*").from_(user_table)  # Could be "users; DROP TABLE users;--"
```

**Protection:**
```python
# ✅ Now automatically blocked
Q.select("*").from_("users; DROP TABLE users;--")
# Raises: ValueError: Invalid identifier
```

### 2. Column Name Injection

**Protection:**
```python
# Automatically validates all column names
Q.select("id; DROP TABLE users;--").from_("users")
# Raises: ValueError: Invalid identifier
```

### 3. WHERE Clause Injection

**Protection:**
```python
# Values are automatically parameterized
malicious = "1' OR '1'='1"
Q.select("*").from_("users").where("name", malicious)
# Safe - malicious string is treated as a literal value
```

---

## Migration Guide

If you're upgrading from an older version:

### Breaking Changes

1. **Identifier validation is now mandatory**
   ```python
   # Before: This might have worked
   Q.select("COUNT(*)").from_("users")
   
   # Now: Use Raw() for expressions
   Q.select(Raw("COUNT(*)")).from_("users")
   ```

2. **UPDATE/DELETE require WHERE or allow_all_rows()**
   ```python
   # Before: This would work (dangerous!)
   Q.update("users").set({"active": False}).build()
   
   # Now: Add WHERE or call allow_all_rows()
   Q.update("users").set({"active": False}).allow_all_rows().build()
   ```

3. **Empty lists generate FALSE/TRUE**
   ```python
   # Before: Might generate invalid SQL like "WHERE id IN ()"
   Q.select("*").from_("users").where_in("id", [])
   
   # Now: Generates valid SQL "WHERE FALSE"
   ```

---

## Additional Resources

- [OWASP SQL Injection Prevention](https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html)
- [Condition Objects](conditions.md) - Type-safe query conditions
- [Expressions & Functions](expressions.md) - Using Raw() safely
