Metadata-Version: 2.4
Name: duckboat
Version: 0.12.0
Summary: A SQL-based Python dataframe library for ergonomic interactive data analysis and exploration.
Project-URL: homepage, https://github.com/ajfriend/duckboat
Project-URL: repository, https://github.com/ajfriend/duckboat
Project-URL: documentation, https://ajfriend.github.io/duckboat
Project-URL: pypi, https://pypi.org/project/duckboat
Author-email: AJ Friend <ajfriend@gmail.com>
License-Expression: MIT
Keywords: ETL,SQL,data-wrangling,duckdb,pipelines
Requires-Python: >=3.9
Requires-Dist: duckdb
Requires-Dist: pandas
Requires-Dist: pyarrow
Provides-Extra: all
Requires-Dist: ipykernel; extra == 'all'
Requires-Dist: jupyterlab; extra == 'all'
Requires-Dist: jupyterlab-execute-time; extra == 'all'
Requires-Dist: matplotlib; extra == 'all'
Requires-Dist: polars; extra == 'all'
Requires-Dist: pytest; extra == 'all'
Requires-Dist: pytest-cov; extra == 'all'
Requires-Dist: ruff; extra == 'all'
Provides-Extra: dev
Requires-Dist: ipykernel; extra == 'dev'
Requires-Dist: jupyterlab; extra == 'dev'
Requires-Dist: jupyterlab-execute-time; extra == 'dev'
Provides-Extra: docs
Requires-Dist: matplotlib; extra == 'docs'
Provides-Extra: test
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: ruff; extra == 'test'
Description-Content-Type: text/markdown

# Duckboat

[GitHub](https://github.com/ajfriend/duckboat) | [Docs](https://ajfriend.github.io/duckboat/) | [PyPI](https://pypi.org/project/duckboat/)

*Unsightly to some, but gets the job done.*

Duckboat is a SQL-based Python dataframe library for ergonomic interactive
data analysis and exploration.


```bash
pip install git+https://github.com/ajfriend/duckboat
```

Duckboat allows you to chain SQL snippets (often omitting `select *` and `from ...`)
to incrementally and lazily build up complex queries.
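To illustrate the idea of chaining snippets, here is a minimal sketch of one way snippets could be composed into a single nested query using DuckDB's FROM-first syntax. The `chain_snippets` helper and the `penguins` table name are hypothetical. This is not Duckboat's API, and it is not necessarily how Duckboat implements chaining. The sketch only builds the SQL string.

```python
def chain_snippets(source: str, *snippets: str) -> str:
    """Hypothetical helper: wrap each snippet around the query built so far.

    Assumes DuckDB's FROM-first syntax, so a snippet like
    "where x > 1" or "select a, count(*) group by 1" can follow
    a bare FROM clause without a leading "select *".
    """
    query = f"select * from {source}"
    for i, snippet in enumerate(snippets):
        # Each step treats the previous query as an aliased subquery.
        query = f"from ({query}) as _t{i} {snippet}"
    return query


sql = chain_snippets(
    "penguins",  # hypothetical table name, for illustration only
    "where sex = 'female'",
    "select species, island, avg(body_mass_g) as avg_grams group by 1, 2",
    "order by avg_grams",
)
print(sql)
```

The resulting string could then be run with something like `duckdb.sql(sql)`. Because the result is a single query, DuckDB can optimize the whole pipeline at once.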

Duckboat is a light wrapper around the
[DuckDB relational API](https://duckdb.org/docs/api/python/relational_api),
which is easily accessible if you'd like to use DuckDB more directly.
Expressions are evaluated lazily and optimized by DuckDB,
so queries run fast and avoid materializing intermediate tables or transferring data unnecessarily.


```python
import duckboat as uck

csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'

uck.Table(csv).do(
    "where sex = 'female' ",
    'where year > 2008',
    'select *, cast(body_mass_g as double) as grams',
    'select species, island, avg(grams) as avg_grams group by 1,2',
    'select * replace (round(avg_grams, 1) as avg_grams)',
    'order by avg_grams',
)
```

```
┌───────────┬───────────┬───────────┐
│  species  │  island   │ avg_grams │
│  varchar  │  varchar  │  double   │
├───────────┼───────────┼───────────┤
│ Adelie    │ Torgersen │    3193.8 │
│ Adelie    │ Dream     │    3357.5 │
│ Adelie    │ Biscoe    │    3446.9 │
│ Chinstrap │ Dream     │    3522.9 │
│ Gentoo    │ Biscoe    │    4786.3 │
└───────────┴───────────┴───────────┘
```

## Philosophy

This approach results in a mixture of Python and SQL that, I think, is semantically very similar to
[Google's Pipe Syntax for SQL](https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/):
We can leverage our existing knowledge of SQL, with a few small changes that make it more ergonomic and composable.

When doing interactive data analysis, I find this approach easier to read and write than
fluent APIs (like in [Polars](https://pola.rs/) or [Ibis](https://ibis-project.org/)) or typical [Pandas](https://pandas.pydata.org/) code.
If an operation is easier in another library, Duckboat makes it straightforward to translate to and from it, either directly or through Apache Arrow.

## Feedback

I'd love to hear any feedback on the approach here, so feel free to reach out through
[Issues](https://github.com/ajfriend/duckboat/issues)
or
[Discussions](https://github.com/ajfriend/duckboat/discussions).
