Metadata-Version: 2.4
Name: sequelframe
Version: 0.5.0
Summary: SQL interface for pandas DataFrames using SQLite with modern Python support
Author: Sina Mirshahi
Author-email: Sina Mirshahi <sina7th@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Sinamirshahi/sequelframe
Project-URL: Repository, https://github.com/Sinamirshahi/sequelframe
Project-URL: Issues, https://github.com/Sinamirshahi/sequelframe/issues
Keywords: pandas,sql,sqlite,dataframe,query
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.0.0
Provides-Extra: excel
Requires-Dist: openpyxl>=3.0.0; extra == "excel"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: author
Dynamic: license-file

# sequelframe

A lightweight Python library to run SQL queries on pandas DataFrames using SQLite.

[![Python Version](https://img.shields.io/pypi/pyversions/sequelframe)](https://pypi.org/project/sequelframe/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

- Full SQL on any pandas DataFrame — SELECT, INSERT, UPDATE, DELETE, CTEs, window functions
- **Multi-table support** — load several DataFrames as separate tables and JOIN across them
- **In-memory mode** — skip the temp file entirely for faster, zero-footprint usage
- **Transaction control** — commit or roll back a group of writes atomically
- **Schema inspection** — list tables, inspect columns
- Parameterized queries for SQL injection protection
- File format support: CSV, Excel, JSON, Parquet, Feather
- Context manager support for automatic cleanup
- Full type hints

## Installation

```bash
pip install sequelframe
```

For Excel file support:
```bash
pip install sequelframe[excel]
```

## Quick Start

```python
import pandas as pd
from sequelframe import SequelFrame

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age':  [25, 30, 35],
    'city': ['New York', 'San Francisco', 'Los Angeles'],
})

with SequelFrame(df) as db:
    result = db.query('SELECT * FROM data WHERE age > 25')
    print(result)
```

## Usage

### Initialization

```python
from sequelframe import SequelFrame

# From a pandas DataFrame
db = SequelFrame(df)

# From a file (CSV, Excel, JSON, Parquet, Feather)
db = SequelFrame('data.csv')
db = SequelFrame('data.xlsx')
db = SequelFrame('data.parquet')

# Custom table name
db = SequelFrame(df, table_name='customers')

# In-memory database — faster, no temp file written to disk
db = SequelFrame(df, in_memory=True)

# Custom directory for the temp .sqlite file
db = SequelFrame(df, temp_dir='/tmp/mydb')

# Print every SELECT result automatically
db = SequelFrame(df, verbose=True)
```

### SELECT queries

```python
# Basic SELECT — returns a DataFrame
result = db.query('SELECT * FROM data WHERE age > 25')

# runsql() is the same but also handles write statements
result = db.runsql('SELECT name, age FROM data ORDER BY age DESC')

# Parameterized queries (recommended — prevents SQL injection)
result = db.query('SELECT * FROM data WHERE name = ?', ('Alice',))
result = db.query('SELECT * FROM data WHERE age > ? AND city = ?', (25, 'New York'))

# CTEs and window functions work out of the box
result = db.query('''
    WITH ranked AS (
        SELECT name, age,
               ROW_NUMBER() OVER (ORDER BY age DESC) AS rank
        FROM data
    )
    SELECT * FROM ranked WHERE rank <= 3
''')
```

### INSERT, UPDATE, DELETE

```python
db.runsql("INSERT INTO data (name, age, city) VALUES ('David', 28, 'Boston')")

db.runsql("UPDATE data SET age = 31 WHERE name = 'Bob'")

db.runsql("DELETE FROM data WHERE age < 25")

# Parameterized writes
db.runsql("INSERT INTO data (name, age, city) VALUES (?, ?, ?)", ('Eve', 27, 'Seattle'))
db.runsql("UPDATE data SET age = ? WHERE name = ?", (32, 'Alice'))
```

After any write, `db.df` is automatically refreshed so it stays in sync.

### Multiple tables and JOINs

Use `add_table()` to register additional DataFrames, then JOIN them in SQL:

```python
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
})
orders = pd.DataFrame({
    'order_id':   [101, 102, 103],
    'customer_id': [1, 1, 2],
    'amount':      [100, 150, 200],
})

with SequelFrame(customers, table_name='customers', in_memory=True) as db:
    db.add_table(orders, 'orders')   # chainable: db.add_table(...).add_table(...)

    result = db.query('''
        SELECT c.name,
               COUNT(o.order_id)  AS order_count,
               SUM(o.amount)      AS total
        FROM customers c
        LEFT JOIN orders o ON c.customer_id = o.customer_id
        GROUP BY c.customer_id, c.name
    ''')
    print(result)
```

### Transactions

Wrap multiple writes in a single atomic transaction:

```python
with SequelFrame(df, in_memory=True) as db:
    with db.transaction():
        db.runsql("INSERT INTO data VALUES ('Eve', 28, 'Miami')")
        db.runsql("UPDATE data SET age = age + 1 WHERE name = 'Alice'")
    # committed — db.df is refreshed automatically

    # Any exception inside rolls the whole block back:
    try:
        with db.transaction():
            db.runsql("DELETE FROM data")
            raise RuntimeError("oops")
    except RuntimeError:
        pass
    # table is still intact
```

### Multi-statement scripts

```python
db.execute_script('''
    ALTER TABLE data ADD COLUMN score REAL DEFAULT 0.0;
    UPDATE data SET score = age * 1.5;
''')
```

### Schema inspection

```python
# All tables in the database
db.tables()          # ['data', 'orders']

# Column info for a table
db.schema()          # primary table
db.schema('orders')  # specific table
# returns: cid | name | type | notnull | dflt_value | pk
```

### Utility methods

```python
db.show()           # all rows as a DataFrame
db.show(limit=10)   # first 10 rows

db.columns          # ['name', 'age', 'city']
db.shape            # (3, 3)
db.row_count        # live row count from the database

db.get_dataframe()  # copy of the current primary-table DataFrame
db.update_dataframe(new_df)  # replace primary table entirely
```

### Context manager (recommended)

```python
with SequelFrame('data.csv') as db:
    result = db.query('SELECT * FROM data WHERE age > 25')
    db.runsql('UPDATE data SET age = age + 1')
# connection closed and temp file deleted automatically
```

### Manual cleanup

```python
db = SequelFrame('data.csv')
try:
    result = db.query('SELECT * FROM data')
finally:
    db.kill()
```

## Advanced examples

### Aggregations

```python
with SequelFrame('sales.csv') as db:
    report = db.query('''
        SELECT
            strftime('%Y-%m', date)  AS month,
            COUNT(*)                  AS num_sales,
            SUM(amount)               AS total_sales,
            AVG(amount)               AS avg_sale,
            MAX(amount)               AS max_sale
        FROM data
        GROUP BY month
        ORDER BY month DESC
    ''')
```

### Window functions

```python
with SequelFrame(df) as db:
    result = db.query('''
        SELECT
            name,
            age,
            ROW_NUMBER() OVER (ORDER BY age DESC) AS rank,
            AVG(age)     OVER ()                  AS avg_age
        FROM data
    ''')
```

## What's new in 0.5.0

- `in_memory=True` — use SQLite `:memory:` instead of a temp file
- `add_table(data, name)` — register additional DataFrames for cross-table JOINs
- `tables()` — list all tables in the database
- `schema(table_name)` — inspect columns via `PRAGMA table_info`
- `transaction()` — context manager for atomic commit / rollback
- `execute_script(sql)` — run multiple `;`-separated statements at once
- `row_count` property — live row count from the database
- Fixed: SELECT queries were being executed twice internally
- Fixed: SQL comments at the start of a query (`-- foo\nSELECT ...`) were misclassified as write statements
- Fixed: table names are now validated to prevent SQL injection

## Migration from 0.4.x

Fully backward compatible. The only new required change: table names must now be valid SQL identifiers (letters, digits, underscores). Names with spaces or special characters will raise `ValueError`.

```python
# Old style (still works)
from sequelframe import sequelframe
db = sequelframe('data.csv')
db.runsql('SELECT * FROM data')
db.kill()

# Recommended style
from sequelframe import SequelFrame
with SequelFrame('data.csv') as db:
    result = db.query('SELECT * FROM data')
```

## Limitations

- SQLite SQL dialect only (no `RETURNING`, no `FULL OUTER JOIN` prior to SQLite 3.39, etc.)
- All data lives in memory — not suitable for datasets larger than available RAM
- For very large datasets consider [DuckDB](https://duckdb.org/) or [Polars](https://pola.rs/)

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License — see [LICENSE](LICENSE) for details.

## Author

**Sina Mirshahi**
- Email: sina7th@gmail.com
- GitHub: [@Sinamirshahi](https://github.com/Sinamirshahi)

## Links

- [PyPI](https://pypi.org/project/sequelframe/)
- [GitHub](https://github.com/Sinamirshahi/sequelframe)
- [Issue Tracker](https://github.com/Sinamirshahi/sequelframe/issues)
