Metadata-Version: 2.3
Name: tugboat-py
Version: 0.1.1
Summary: Simple utilities to generate a Dockerfile from a directory or project, build the corresponding Docker image, push the image to DockerHub, and publicly share the project via Binder.
Author: Daniel Molitor
Author-email: Daniel Molitor <molitdj97@gmail.com>
Requires-Dist: great-docs>=0.14.0
Requires-Dist: pigar>=2.2.0
Requires-Dist: pygit2>=1.19.3
Requires-Dist: pyperclip>=1.11.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Python: >=3.11
Description-Content-Type: text/markdown

<p align="center">
<a href="https://posit-dev.github.io/great-docs/">
<img src="assets/tugboat-logo-py.png" alt="tugboat" width="350">
</a>
</p>

# tugboat

<!-- badges: start -->
<!-- badges: end -->

A simple Python package to generate a Dockerfile and corresponding Docker image
from an analysis directory. tugboat also prepares your analysis repository to be
shared via [Binder](https://mybinder.readthedocs.io/en/latest/index.html).

tugboat uses the [pigar](https://github.com/damnever/pigar) package to automatically
detect all the packages necessary to replicate your analysis and will generate
a Dockerfile that contains an exact copy of your entire directory with all
the packages installed. tugboat turns an unstructured analysis folder into a
`requirements.txt` file and builds a Docker image that includes all your
essential Python packages based on this file. tugboat uses
[uv](https://docs.astral.sh/uv/) under the hood, so projects that already use uv
should be directly compatible with no additional setup.

tugboat may be of use, for example, when preparing a replication package for
research. With tugboat, you can take a directory on your local computer
and quickly generate a corresponding Dockerfile and Docker image that contains all the
code and the necessary software to reproduce your findings.

## Installation

Install tugboat from PyPI:
```shell
pip install tugboat-py
```

or install tugboat from GitHub:
```shell
pip install git+https://github.com/dmolitor/tugboat-py
```

## Usage

tugboat has three primary functions: one to create a Dockerfile from your
analysis directory, one to build the corresponding Docker image, and one to make
your project ready to share and run in an online, interactive compute environment
via [Binder](https://mybinder.readthedocs.io/en/latest/index.html).

### Create the Dockerfile

The primary function from tugboat is `create()`. This function converts 
your analysis directory into a Dockerfile that includes all your code 
and essential Python packages.

This function scans all files in the current analysis directory,
attempts to detect all Python packages, and installs these packages in
the resulting Docker image. It also copies the entire contents of the
analysis directory into the Docker image. For example, if
your analysis directory is named `incredible_analysis`, the corresponding
location of your code and data files in the generated Docker image will
be `/incredible_analysis`.

For the most common use-cases, there are a couple of arguments in this
function that are particularly important:

- `project`: This argument tells tugboat which directory is the one to generate
the Dockerfile from. You can set this value yourself, or you can just use
the default value. By default, tugboat uses the working directory to
determine the analysis directory.
- `exclude`: A list of files or sub-directories in your analysis directory
that should ***NOT*** be included in the Docker image. This is particularly
important when you have, for example, a sub-directory with large data files
that would make the resulting Docker image extremely large if included. You
can tell tugboat to exclude this sub-directory and then simply mount it to
a Docker container as needed.

Below are a few examples.
```python
from tugboat import create

# The simplest scenario where your analysis directory is your current
# working directory, you are fine with the default base "python:3.x-slim"
# Docker image, and you want to include all files/directories:
create()

# Suppose your analysis directory is actually a sub-directory of your
# main project directory:
create(project="./sub-directory")

# Suppose that you specifically need a Docker base image that has uv
# installed. To do this, we will explicitly specify a different Docker
# base image using the `FROM` argument.
create(FROM="ghcr.io/astral-sh/uv:latest")

# Finally, suppose that we want to include all files except a couple
# particularly data-heavy sub-directories:
create(exclude=["data/big_directory_1", "data/big_directory_2"])
```
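
If you exclude a data-heavy sub-directory, you can make it available again at run time with a Docker bind mount. The sketch below only assembles a standard `docker run -v` command; `docker_run_command` is a hypothetical helper, not part of tugboat, and the image name and container path are illustrative assumptions based on the `/incredible_analysis` layout described above.

```python
from pathlib import Path


def docker_run_command(image: str, host_dir: str, container_dir: str) -> list[str]:
    """Build a `docker run` command that bind-mounts host_dir at container_dir."""
    host_path = Path(host_dir).resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_path}:{container_dir}",
        image,
    ]


# Assumed names: an image called "awesome_analysis" and an excluded data folder.
cmd = docker_run_command(
    "awesome_analysis",
    "data/big_directory_1",
    "/incredible_analysis/data/big_directory_1",
)
print(" ".join(cmd))
# To actually start the container: subprocess.run(cmd, check=True)
```

Mounting keeps the image small while still letting the code inside the container find the data at the path it expects.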

### Build the Docker image

Once the Dockerfile has been created, we can build the Docker image
with the `build()` function. By default this will assume the Dockerfile
is located in the current working directory. This function assumes a little knowledge
about Docker; if you aren't sure where to start,
[this is a great starting point](https://colinfay.me/docker-r-reproducibility/).

The following example will do the simplest thing and will build the
image locally.
```python
from tugboat import build

build(image_name="awesome_analysis")
```

Suppose that, like above, your analysis directory is a sub-directory of
your main project directory:
```python
build(
    dockerfile="./sub-directory",
    build_context="./sub-directory",
    image_name="awesome_analysis"
)
```

### Push to DockerHub

If, instead of just building the Docker image locally, you want to build
the image and then push to DockerHub, you can make a couple small additions
to the code above:
```python
import os
from dotenv import load_dotenv
from tugboat import build

load_dotenv()

build(
    dockerfile="./sub-directory",
    build_context="./sub-directory",
    image_name="awesome_analysis",
    push=True,
    dh_username=os.environ["DOCKERHUB_USERNAME"],
    dh_password=os.environ["DOCKERHUB_PASSWORD"]
)
```

Note: If you choose to push, you also need to provide your DockerHub
username and password. Avoid hard-coding these in your script; read them
from environment variables (or a similar mechanism) instead.
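
One way to make a missing credential fail early, with a clear message, is a small wrapper around `os.environ`. This is only a sketch: `dockerhub_credentials` is a hypothetical helper, and the variable names `DOCKERHUB_USERNAME` and `DOCKERHUB_PASSWORD` are assumptions that should match whatever your `.env` file or shell defines.

```python
import os


def dockerhub_credentials(env=os.environ) -> tuple[str, str]:
    """Read DockerHub credentials from environment variables, failing early if unset."""
    required = ("DOCKERHUB_USERNAME", "DOCKERHUB_PASSWORD")  # assumed variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["DOCKERHUB_USERNAME"], env["DOCKERHUB_PASSWORD"]
```

The returned pair can then be passed to `build()` as `dh_username` and `dh_password`. If the credentials live in a `.env` file inside the analysis directory, consider listing that file in `exclude` when calling `create()` so it is not copied into the image.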

### Share your project via Binder

Binder lets others instantly launch and interact with your project in a
live, cloud-based environment with no local setup required. tugboat will
prepare your project to be shared with Binder. The process is simple:

- First, create the Dockerfile from your analysis directory:

    ``` python
    create(
        project=".",
        exclude=["data/big_directory_1", "data/big_directory_2"]
    )
    ```

- Then, prep your directory for Binder. Your analysis directory _must_ be
a GitHub repository:

    ``` python
    binderize(branch="main")
    ```

    By default this will add a Binder badge to your README.md file if it
    already has a section for badges:

    ```
    Added badge to /.../README.md
    ```

    If your README file does _not_ have a section for badges, it will
    automatically save the badge to your clipboard and you will need to
    manually insert it into the README.

    ```
    Add the following to your README.md file:

    <!-- badges: start -->
    [![Launch RStudio Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/{username}/{repo}/{branch}?urlpath=rstudio)
    <!-- badges: end -->
    ```

After running `binderize()` you will see the following message:
```
Your repository has been configured for Binder.
[x] Commit and push all changes
[x] Launch Binder at: https://mybinder.org/v2/gh/{username}/{repo}/{branch}?urlpath=rstudio
```

You must commit and push all changes _before_ visiting the Binder link;
otherwise the launch will likely fail. Binder automatically detects changes
to the repository and rebuilds as necessary, so the Binder environment
stays up to date.
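
The Binder link shown in the messages above follows a fixed pattern, so it can be reconstructed by hand if you need it again later. The helper below is a hypothetical illustration of that pattern, not a tugboat function; the repository name in the usage line is only an example.

```python
def binder_url(username: str, repo: str, branch: str = "main", urlpath: str = "rstudio") -> str:
    """Format a mybinder.org launch URL for a GitHub repository."""
    return f"https://mybinder.org/v2/gh/{username}/{repo}/{branch}?urlpath={urlpath}"


# Example with an illustrative username/repository pair:
print(binder_url("dmolitor", "tugboat-py"))
```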

## R package

This package has a sibling [R package](https://github.com/dmolitor/tugboat)!