Metadata-Version: 2.4
Name: dbcooper
Version: 0.0.7
Summary: The dbcooper package turns a database connection into a collection of functions, handling logic for keeping track of connections and letting you take advantage of autocompletion when exploring a database.
Author-email: Michael Chow <mc_al_github@fastmail.com>
License: MIT
Keywords: template,packaging
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sqlalchemy
Requires-Dist: tabulate
Provides-Extra: siuba
Requires-Dist: siuba>=0.4.4; extra == "siuba"
Dynamic: license-file

# dbcooper-py

[![CI](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml/badge.svg)](https://github.com/machow/dbcooper-py/actions/workflows/ci.yml)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/dbcooper-py/HEAD)

The dbcooper package turns a database connection into a collection of functions,
handling logic for keeping track of connections and letting you take advantage of
autocompletion when exploring a database.

It's especially helpful to use when authoring database-specific Python packages,
for instance in an internal company package or one wrapping a public data source.

For the R version see [dgrtwo/dbcooper](https://github.com/dgrtwo/dbcooper).

## Installation

```
pip install dbcooper
```

## Example

### Initializing the functions

The dbcooper package asks you to create the connection first.
As an example, we'll use the Lahman baseball database package (`lahman`).


```python
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite

# connect to sqlite
engine = create_engine("sqlite://")

# load the lahman data into the "lahman" schema
lahman_sqlite(engine)
```

Next we'll set up dbcooper


```python
from dbcooper import DbCooper

dbc = DbCooper(engine)
```

The `DbCooper` object contains two important things:

* Accessors to fetch specific tables.
* Functions for interacting with the underlying database.

### Using table accessors

In the example below, we'll use the `"Lahman"."Salaries"` table as an example.
By default, dbcooper makes this accessible as `.lahman_salaries`.

**Plain** `.lahman_salaries` prints out table and column info, including types and descriptions.


```python
# show table and column descriptions
dbc.lahman_salaries
```




<h3> salaries </h3>
<p> (No table description.) </p>
<table>
<thead>
<tr><th>name    </th><th>type  </th><th>description  </th></tr>
</thead>
<tbody>
<tr><td>index   </td><td>BIGINT</td><td>             </td></tr>
<tr><td>yearID  </td><td>BIGINT</td><td>             </td></tr>
<tr><td>teamID  </td><td>TEXT  </td><td>             </td></tr>
<tr><td>lgID    </td><td>TEXT  </td><td>             </td></tr>
<tr><td>playerID</td><td>TEXT  </td><td>             </td></tr>
<tr><td>salary  </td><td>BIGINT</td><td>             </td></tr>
</tbody>
</table>



Note that sqlite doesn't support table and columnn descriptions, so these sections
are empty.

**Calling** `.lahman_salaries()` fetches a lazy version of the data.



```python
dbc.lahman_salaries()
```




    # Source: lazy query
    # DB Conn: Engine(sqlite://)
    # Preview:
       index  yearID teamID lgID   playerID  salary
    0      0    1985    ATL   NL  barkele01  870000
    1      1    1985    ATL   NL  bedrost01  550000
    2      2    1985    ATL   NL  benedbr01  545000
    3      3    1985    ATL   NL   campri01  633333
    4      4    1985    ATL   NL  ceronri01  625000
    # .. may have more rows



Note that this data is a siuba `LazyTbl` object, which you can use to analyze the data.


```python
from siuba import _, count

dbc.lahman_salaries() >> count(over_100k = _.salary > 100_000)
```




    # Source: lazy query
    # DB Conn: Engine(sqlite://)
    # Preview:
       over_100k      n
    0       True  25374
    1      False   1054
    # .. may have more rows



### Using database functions

* `.list()`: Get a list of tables
* `.tbl()`: Access a table that can be worked with using `siuba`.
* `.query()`: Perform a SQL query and work with the result.
* `._engine`: Get the underlying sqlalchemy engine.

For instance, we could start by finding the names of the tables in the Lahman database.


```python
dbc.list()
```




    ['lahman.allstar_full',
     'lahman.appearances',
     'lahman.awards_managers',
     'lahman.awards_players',
     'lahman.awards_share_managers',
     'lahman.awards_share_players',
     'lahman.batting',
     'lahman.batting_post',
     'lahman.college_playing',
     'lahman.fielding',
     'lahman.fielding_of',
     'lahman.fielding_ofsplit',
     'lahman.fielding_post',
     'lahman.hall_of_fame',
     'lahman.home_games',
     'lahman.managers',
     'lahman.managers_half',
     'lahman.parks',
     'lahman.people',
     'lahman.pitching',
     'lahman.pitching_post',
     'lahman.salaries',
     'lahman.schools',
     'lahman.series_post',
     'lahman.teams',
     'lahman.teams_franchises',
     'lahman.teams_half']



We can access one of these tables with `dbc.tbl()`, then put it through any kind
of siuba operation.


```python
dbc.tbl("Salaries")
```




    # Source: lazy query
    # DB Conn: Engine(sqlite://)
    # Preview:
       index  yearID teamID lgID   playerID  salary
    0      0    1985    ATL   NL  barkele01  870000
    1      1    1985    ATL   NL  bedrost01  550000
    2      2    1985    ATL   NL  benedbr01  545000
    3      3    1985    ATL   NL   campri01  633333
    4      4    1985    ATL   NL  ceronri01  625000
    # .. may have more rows




```python
from siuba import _, count
dbc.tbl("Salaries") >> count(_.yearID, sort=True)
```




    # Source: lazy query
    # DB Conn: Engine(sqlite://)
    # Preview:
       yearID     n
    0    1999  1006
    1    1998   998
    2    1995   986
    3    1996   931
    4    1997   925
    # .. may have more rows



If you'd rather start from a SQL query, use the `.query()` method.


```python
dbc.query("""
    SELECT
        playerID,
        sum(AB) as AB
    FROM Batting
    GROUP BY playerID
""")
```




    # Source: lazy query
    # DB Conn: Engine(sqlite://)
    # Preview:
        playerID     AB
    0  aardsda01      4
    1  aaronha01  12364
    2  aaronto01    944
    3   aasedo01      5
    4   abadan01     21
    # .. may have more rows



For anything else you might want to do, the sqlalchemy Engine object is available.
For example, the code below shows how you can set its `.echo` attribute, which
tells sqlalchemy to provide useful logs.


```python
dbc._engine.echo = True
table_names = dbc.list()
```

    2022-03-20 22:49:37,553 INFO sqlalchemy.engine.Engine PRAGMA database_list
    2022-03-20 22:49:37,554 INFO sqlalchemy.engine.Engine [raw sql] ()
    2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine SELECT name FROM "main".sqlite_master WHERE type='table' ORDER BY name
    2022-03-20 22:49:37,555 INFO sqlalchemy.engine.Engine [raw sql] ()
    2022-03-20 22:49:37,556 INFO sqlalchemy.engine.Engine SELECT name FROM "lahman".sqlite_master WHERE type='table' ORDER BY name
    2022-03-20 22:49:37,557 INFO sqlalchemy.engine.Engine [raw sql] ()


Note that the log messages above show that the `.list()` method executed two queries:
One to list tables in the "main" schema (which is empty), and one to list tables
in the "lahman" schema.

## Advanced Configuration

> ⚠️: These behaviors are well tested, but dbcooper's internals and API may change.

dbcooper can be configured in three ways, each corresponding to a class interface:

* **TableFinder**: Which tables will be used by `dbcooper`.
* **AccessorBuilder**: How table names are turned into accessors.
* **DbcDocumentedTable**: The class that defines what an accessor will return.


```python
from sqlalchemy import create_engine
from dbcooper.data import lahman_sqlite
from dbcooper import DbCooper, AccessorBuilder

engine = create_engine("sqlite://")
lahman_sqlite(engine)
```

### Excluding a schema


```python
from dbcooper import TableFinder

finder = TableFinder(exclude_schemas=["lahman"])
dbc_no_lahman = DbCooper(engine, table_finder=finder)
dbc_no_lahman.list()
```




    []



### Formatting table names


```python
from dbcooper import AccessorBuilder

# omits schema, and keeps only table name
# e.g. `salaries`, rather than `lahman_salaries`
builder = AccessorBuilder(format_from_part="table")

tbl_flat = DbCooper(engine, accessor_builder=builder)
tbl_flat.salaries()
```




    # Source: lazy query
    # DB Conn: Engine(sqlite://)
    # Preview:
       index  yearID teamID lgID   playerID  salary
    0      0    1985    ATL   NL  barkele01  870000
    1      1    1985    ATL   NL  bedrost01  550000
    2      2    1985    ATL   NL  benedbr01  545000
    3      3    1985    ATL   NL   campri01  633333
    4      4    1985    ATL   NL  ceronri01  625000
    # .. may have more rows



### Grouping tables by schema


```python
from dbcooper import AccessorHierarchyBuilder

tbl_nested = DbCooper(engine, accessor_builder=AccessorHierarchyBuilder())

# note the form: <schema>.<table>
tbl_nested.lahman.salaries()
```




    # Source: lazy query
    # DB Conn: Engine(sqlite://)
    # Preview:
       index  yearID teamID lgID   playerID  salary
    0      0    1985    ATL   NL  barkele01  870000
    1      1    1985    ATL   NL  bedrost01  550000
    2      2    1985    ATL   NL  benedbr01  545000
    3      3    1985    ATL   NL   campri01  633333
    4      4    1985    ATL   NL  ceronri01  625000
    # .. may have more rows



### Don't show table documentation


```python
from dbcooper import DbcSimpleTable

dbc_no_doc = DbCooper(engine, table_factory=DbcSimpleTable)
dbc_no_doc.lahman_salaries
```




    DbcSimpleTable(..., 'salaries', 'lahman')



Note that sqlalchemy dialects like `snowflake-sqlalchemy` cannot look up things
like table and column descriptions as well as other dialects, so `DbcSimpleTable`
may be needed to connect to snowflake (see [this issue](https://github.com/snowflakedb/snowflake-sqlalchemy/issues/276)).

## Developing

```shell
# install with development dependencies
pip install -e .[dev]

# or install from requirements file
pip install -r requirements/dev.txt
```

### Test

```shell
# run all tests, see pytest section of pyproject.toml
pytest

# run specific backends
pytest -m 'not snowflake and not bigquery'

# stop on first failure, drop into debugger
pytest -x --pdb
```

### Release

```shell
# set version number
git tag v0.0.1

# (optional) push to github
git push origin --tags

# check version
python -m setuptools_scm
```
