Metadata-Version: 2.4
Name: jupyterlab-minio
Version: 2.4.1
Summary: JupyterLab extension for browsing Minio object storage
Project-URL: Homepage, https://github.com/aristide/jupyterlab-minio
Project-URL: Bug Tracker, https://github.com/aristide/jupyterlab-minio/issues
Project-URL: Repository, https://github.com/aristide/jupyterlab-minio.git
Author-email: Aristide Mendo'o <mendoo.aristide@gmail.com>
License: BSD 3-Clause License
        
        Copyright (c) 2024, Aristide Mendo'o
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE
Classifier: Framework :: Jupyter
Classifier: Framework :: Jupyter :: JupyterLab
Classifier: Framework :: Jupyter :: JupyterLab :: 4
Classifier: Framework :: Jupyter :: JupyterLab :: Extensions
Classifier: Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Requires-Dist: boto3
Requires-Dist: colorlog
Requires-Dist: jupyter-server<3,>=2.0.1
Requires-Dist: jupyterlab<5,>=4.0.0
Requires-Dist: pyarrow>=14.0
Requires-Dist: s3fs>=2021.10.1
Requires-Dist: singleton-decorator
Provides-Extra: test
Requires-Dist: coverage; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-asyncio; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: pytest-jupyter[server]>=0.6.0; extra == 'test'
Description-Content-Type: text/markdown

# jupyterlab-minio

[![Github Actions Status](https://github.com/aristide/jupyterlab-minio/workflows/Build/badge.svg)](https://github.com/aristide/jupyterlab-minio/actions/workflows/build.yml)
[![PyPI version](https://badge.fury.io/py/jupyterlab-minio.svg)](https://badge.fury.io/py/jupyterlab-minio)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/aristide/jupyterlab-minio/master?urlpath=lab)

JupyterLab extension for browsing Minio object storage.

This extension is composed of a Python package named `jupyterlab-minio`.

## Screenshots

<table>
  <tr>
    <td align="center" width="50%">
      <img src="https://raw.githubusercontent.com/aristide/jupyterlab-minio/master/img/login.png" alt="Branded auth form" width="320"><br/>
      <sub><b>Branded auth form</b> — connect to any S3/MinIO/lakehouse endpoint with Use&nbsp;TLS / Path&nbsp;style toggles.</sub>
    </td>
    <td align="center" width="50%">
      <img src="https://raw.githubusercontent.com/aristide/jupyterlab-minio/master/img/buckets.png" alt="Bucket list with zone classification" width="320"><br/>
      <sub><b>Bucket list</b> — per-bucket object counts, sizes, last-modified, and zone classification (raw / anonymized / staging / aggregated / archive).</sub>
    </td>
  </tr>
  <tr>
    <td align="center" width="50%">
      <img src="https://raw.githubusercontent.com/aristide/jupyterlab-minio/master/img/gridview.png" alt="Grid view inside a bucket" width="320"><br/>
      <sub><b>Grid view inside a bucket</b> — file-type icon badges (parquet, csv, json, ipynb, yaml, md, log), breadcrumb navigation, selection bar with Preview / Copy URI / Open.</sub>
    </td>
    <td align="center" width="50%">
      <img src="https://raw.githubusercontent.com/aristide/jupyterlab-minio/master/img/parquetprev.png" alt="First-class parquet preview" width="320"><br/>
      <sub><b>First-class parquet preview</b> — schema, first rows, and per-column histograms with min/max badges; one click to open in a notebook.</sub>
    </td>
  </tr>
</table>

## Requirements

- JupyterLab >= 4.0.0
- Python >= 3.8
- Node.js >= 18 (for development only)
- `pyarrow >= 14.0` (pulled in automatically; needed for parquet preview)
- A modern browser with WebSocket support (any released in the last ~5 years)

## Installation

To install:

```bash
pip install jupyterlab-minio
```

You may also need to run:

```bash
jupyter server extension enable jupyterlab-minio
```

to make sure the server extension is enabled. Then, restart (stop and start) JupyterLab.

## Features

### Custom Data4Now file browser

- **Two-screen browser** in a dedicated sidebar — a bucket list (with zone-coloured stripes, monospace bucket names, per-bucket object counts + sizes + last-modified), and an inside-bucket object view with file-type icon badges (parquet=gold, csv=teal, json=slate, ipynb=magenta, …)
- **List and grid view** toggle in the object view, persisted across sessions via `localStorage`
- **Multi-select** with click, Shift-click range, Cmd/Ctrl-click toggle; selection bar shows "N selected · X MB" with Preview / Copy URI / Delete
- **Custom row menu** (kebab ⋮ button on every row + right-click) with Open in notebook, Preview, Copy URI, Copy to S3 Path…, Move to S3 Path…, Copy to Local…, Delete
- **Path persistence** across sessions via `IStateDB` — re-opens at whatever path you were last browsing
- **Live in-app search** with optional **Recursive** toggle that walks all objects under the current path via the server's paginator
- **Sort menu** (Name / Size / Last modified / Type · Ascending / Descending), persisted in `localStorage`
- **Bucket zone classification** (raw / anonymized / staging / aggregated / archive) with stripe + badge; configurable prefix table per zone
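
The zone classification described above maps bucket-name prefixes to zones. A minimal sketch of how such a lookup could work is shown here, assuming a table shaped like the `zonePrefixes` setting. The prefix values and the `classify_bucket` helper are hypothetical illustrations, not the extension's shipped defaults or code.

```python
from typing import Dict, List, Optional

# Hypothetical prefix table; the extension's real defaults may differ.
ZONE_PREFIXES: Dict[str, List[str]] = {
    "raw": ["raw-"],
    "anonymized": ["anon-"],
    "staging": ["stg-", "staging-"],
    "aggregated": ["agg-"],
    "archive": ["archive-"],
}


def classify_bucket(
    name: str, prefixes: Dict[str, List[str]] = ZONE_PREFIXES
) -> Optional[str]:
    """Return the zone whose longest prefix matches the bucket name, or None."""
    best_zone: Optional[str] = None
    best_len = -1
    for zone, zone_prefixes in prefixes.items():
        for prefix in zone_prefixes:
            # Prefer the longest matching prefix so overlapping entries stay predictable.
            if name.startswith(prefix) and len(prefix) > best_len:
                best_zone, best_len = zone, len(prefix)
    return best_zone
```

Picking the longest matching prefix is a design choice of this sketch; it keeps the result stable if two zones share a common prefix.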

### Preview pane

- **First-class preview** for parquet (schema + first rows + per-column histograms with min/max badges), CSV/TSV (raw lines + subtle horizontal scrollbar), JSON (syntax-highlighted snippet), YAML/TOML/XML/HTML/MD/text (snippet capped at ~12 lines or 1200 chars with `....` truncation marker), images (with dimensions), and an "unsupported" fallback with a download button
- **Object metadata** block (ETag, Storage class, Content type, Encryption, Owner) populated from `head_object`
- **Refresh button** in the preview header busts the 60-second server cache
- **Responsive footer** — Copy URI / Download / Open in notebook; buttons collapse to icon-only at narrow panel widths
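
The text-snippet cap mentioned above (about 12 lines or 1200 characters, with a `....` marker) can be sketched as follows. The `make_snippet` helper and its exact cut-off behaviour are assumptions for illustration; the extension's own truncation logic may differ in detail.

```python
def make_snippet(
    text: str, max_lines: int = 12, max_chars: int = 1200, marker: str = "...."
) -> str:
    """Cap text at max_lines lines and max_chars characters.

    A marker line is appended only when something was actually cut.
    """
    lines = text.splitlines()
    truncated = len(lines) > max_lines
    snippet = "\n".join(lines[:max_lines])
    if len(snippet) > max_chars:
        snippet = snippet[:max_chars]
        truncated = True
    return snippet + ("\n" + marker if truncated else "")
```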

### Transfer manager

- **Pill indicator** in the bottom toolbar appears the moment any transfer starts: spinning teal ring while active, green check when done, red alert if any failed, 2-px progress strip showing aggregate progress
- **Full manager view** (4th sidebar mode) listing every in-flight, completed, and failed file with per-file progress bar, percent, bytes done/total, **speed** and **ETA**, error message
- **Per-file actions**: Pause / Resume / Cancel / Retry / Remove / Reveal-in-bucket
- **Real Pause / Resume** for uploads + downloads — multipart `UploadId` + completed parts are checkpointed to disk, so resumed transfers skip already-uploaded parts (no re-upload from byte 0)
- **Streaming WebSocket** transport — sub-second per-file progress updates pushed from the server; reconnects with exponential backoff; falls back to per-job REST polling if WS is blocked
- **Concurrent file uploads** inside a single job (default 3 parallel workers, per-bucket override available)
- **Bandwidth cap** per job (per-bucket override available); 0 = unlimited
- **Checksum verification** — per-part MD5 sent to S3 as `ContentMD5`, final ETag cross-checked after `complete_multipart_upload`
- **Drag-and-drop** in 4 directions: S3 → default file browser (download), default browser → S3 (upload), within S3 (move; Ctrl/Cmd-drag = copy), and onto the preview pane (swap)
- **Drag from OS into S3** — drop one or more files and/or whole folders straight from Finder / Explorer / Nautilus onto a bucket row (bucket list), a folder row (object list / grid), or the empty space of the current pane. Multi-selection and nested sub-folders are preserved as object key prefixes. Single-file drops produce one job; drops containing two or more files are bundled into a single bulk job via a stage-then-commit flow (`/upload-stream?group_id=…` + `/upload-stream/commit`) so the upload manager shows one row per drop with combined progress.
- **"Recently removed" undo buffer** — Remove from list is reversible for 5 minutes via a per-row Restore button + Restore-all bulk action; survives server restart (persisted to disk)
- **Resume across server restart** — `_resume_bulk_jobs_once` re-loads the per-file state and each in-flight file continues from the last persisted chunk
- **Background-tab smoothing** — when a backgrounded tab refocuses, the store requests a fresh snapshot instead of replaying the burst of buffered updates
- **Streaming uploads** to a server-side tempfile (`/upload-stream` endpoint) supports files up to 50 GB
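
The checksum bullet above can be illustrated with the convention S3-compatible stores commonly use for multipart uploads: each part's `Content-MD5` value is the base64 encoding of its MD5 digest, and the final ETag is the MD5 of the concatenated per-part digests followed by `-<part count>`. This is a sketch of that convention, not the extension's code. The helper names are hypothetical, and stores using some server-side encryption modes may report ETags that do not follow it.

```python
import base64
import hashlib
from typing import Iterable


def part_content_md5(part: bytes) -> str:
    """Base64-encoded MD5 digest, the format used by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(part).digest()).decode("ascii")


def expected_multipart_etag(parts: Iterable[bytes]) -> str:
    """ETag commonly reported for a multipart object built from these parts."""
    digests = [hashlib.md5(part).digest() for part in parts]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

Comparing `expected_multipart_etag(...)` with the ETag returned after `complete_multipart_upload` (with surrounding quotes stripped) is one way to perform the final cross-check.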

### File browser operations

- **Bucket management**: Create and delete buckets
- **File operations**: Upload (single + folders), download, rename, copy, move, and delete files and folders
- **Cross-bucket copy/move** via a path picker dialog
- **S3 ↔ Local transfer** between S3 and the local JupyterLab filesystem
- **Recursive deletion** for folders + bulk delete from the selection bar (Delete or Backspace key bound)
- **Copy to S3** — right-click files in the default JupyterLab file browser
- **Open in JupyterLab** — double-click any file (notebooks render as notebooks, text as the editor, images in the image viewer) via the registered `S3Drive`

### Authentication & branding

- **Authentication**: Configure credentials via environment variables or `~/.mc/config.json` (single-connection mode), or via the built-in **connection manager** when it is enabled (save, switch, edit and delete multiple S3/MinIO connections; credentials persisted server-side)
- **Connection chip** at the top of the sidebar with LIVE badge, endpoint, and a `←` back arrow when you're inside a bucket
- **Bottom bar** with mount-path code chip + Disconnect (connection-manager mode) + a **Buckets** button that jumps to the bucket list
- **Data4Now design tokens** — Navy / Teal / Magenta brand colors, Montserrat / Roboto / JetBrains Mono fonts bundled locally
- **Light + Dark mode** — every surface adapts; brand teal stays teal
- **i18n** — English and French translations for every visible string
- **Theme-aware sidebar icon** (Lakebed mark) adapts to JupyterLab Light, Dark, and Dark High Contrast

## Usage

### Configuration

The extension runs in one of two modes, selected by the
`MINIO_ENABLE_CONNECTION_MANAGER` environment variable. **It defaults to the
connection manager (on).** Set the variable to a falsy value
(`false` / `0` / `no` / `off`) to opt into single-connection mode.
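
As an illustration of the rule above, the flag could be interpreted like this. It is a sketch only: the `connection_manager_enabled` helper is hypothetical, and treating values case-insensitively and treating any unrecognized value as "on" are assumptions rather than documented behaviour.

```python
import os
from typing import Mapping, Optional

# Falsy spellings listed in the documentation above.
FALSY_VALUES = {"false", "0", "no", "off"}


def connection_manager_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless MINIO_ENABLE_CONNECTION_MANAGER is set to a falsy value."""
    env = os.environ if env is None else env
    raw = env.get("MINIO_ENABLE_CONNECTION_MANAGER")
    if raw is None:
        return True  # unset: the connection manager is the default
    return raw.strip().lower() not in FALSY_VALUES
```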

#### Connection manager (multi-connection) — default

By default (or with `MINIO_ENABLE_CONNECTION_MANAGER=true`), the interactive
login + multi-connection manager is enabled:

- The **connection list** becomes the extension's home page. From there you can
  add, edit, duplicate, delete and switch between any number of S3/MinIO
  connections. Each connection has a name, colour tag, endpoint, region,
  credentials, and TLS / path-style toggles.
- **Open a connection** (list its buckets) by **double-clicking** its row or
  picking **Open** from the row's right-click / ⋮ menu; a single click just
  selects. The active connection can be re-opened the same way.
- **The connection name is the `mc` alias.** It must be a valid alias (letters,
  digits, `-` and `_`) and unique across your connections. Names are also
  repaired (sanitized + de-duplicated) on server start, in case
  `~/.jupyter/minio_connections.json` was hand-edited.
- Row badges show reachability: the list runs a quick **background health check**
  and marks unreachable connections with an **Error** badge.
- **Duplicate** copies a connection server-side (including its secret) and opens
  the copy for editing — no need to re-enter credentials.
- The add/edit form surfaces errors inline: red field markers for missing /
  malformed inputs plus a footer status banner for **Test** / **Save** results
  (testing…, connection successful, auth failed, endpoint unreachable, missing
  fields, invalid URL).
- The full connection list is persisted to `~/.jupyter/minio_connections.json`
  (secrets in plaintext, consistent with `~/.mc/config.json`).
- **`~/.mc/config.json` is kept in sync (full overwrite).** Its `aliases` section
  is rewritten to contain exactly one alias per saved connection, so the server's
  `mc` CLI can reach every connection (`mc ls <name>`). **Aliases created outside
  the extension are removed** while the manager is enabled.
- The **active** connection feeds the browser and propagates its credentials to
  kernels and terminals (via `~/.jupyter/minio_env.json` + startup hooks), so
  notebooks see the matching `MINIO_*` environment variables. Use **Switch** in
  the connection chip to return to the list while keeping it active; use
  **Disconnect** in the bottom bar to **deactivate** it (clears the active
  connection and unsets the kernel/terminal credentials on restart).
- On first launch, an existing single connection (`~/.mc/config.json` "storage"
  alias, or `MINIO_*` env vars) is migrated into the store automatically so you
  keep your current connection.
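
The name repair mentioned in the list above (sanitize, then de-duplicate) could look roughly like the following. The replacement character, the numeric suffix scheme, and the `storage` fallback for empty names are assumptions chosen for this sketch; only the allowed character set (letters, digits, `-` and `_`) comes from the description above.

```python
import re
from typing import Iterable, List


def sanitize_alias(name: str, fallback: str = "storage") -> str:
    """Replace runs of characters outside [A-Za-z0-9_-] with a single '-'."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "-", name).strip("-")
    return cleaned or fallback


def repair_aliases(names: Iterable[str]) -> List[str]:
    """Sanitize every name and append -2, -3, ... to later duplicates."""
    seen = set()
    repaired = []
    for name in names:
        base = sanitize_alias(name)
        candidate, suffix = base, 2
        while candidate in seen:
            candidate = f"{base}-{suffix}"
            suffix += 1
        seen.add(candidate)
        repaired.append(candidate)
    return repaired
```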

#### Single-connection mode

Opt out of the manager to run with a single, externally-provisioned connection:

```bash
export MINIO_ENABLE_CONNECTION_MANAGER=false
```

In this mode there is **no login or management UI**. The one connection is read
from `~/.mc/config.json` (if present) or from environment variables, and the
panel opens straight to the bucket browser. If neither is configured, the panel
shows a short "not configured" message instead of a login form.

If you have a `~/.mc/config.json` file, no further configuration is necessary.

To configure using environment variables, set:

```bash
export MINIO_ENDPOINT="https://s3.us.cloud-object-storage.appdomain.cloud"
export MINIO_ACCESS_KEY="my-access-key-id"
export MINIO_SECRET_KEY="secret"
# optional
export MINIO_CONNECTION_NAME="storage"   # doubles as the mc alias name
export MINIO_REGION="us-east-1"
export MINIO_USE_TLS="true"
export MINIO_PATH_STYLE="false"
```

`MINIO_CONNECTION_NAME` names the connection's `mc` alias (default `storage`;
sanitized to letters, digits, `-` and `_`). When the `MINIO_*` credentials are
set, the extension writes/refreshes that alias in `~/.mc/config.json` on startup,
so the server's `mc` CLI resolves the same connection (`mc ls <name>`).
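
To make the alias refresh concrete, here is a sketch of adding or updating one alias in an already-parsed `~/.mc/config.json` document. It assumes the usual `mc` layout (a top-level `aliases` object whose entries carry `url`, `accessKey`, `secretKey`, `api` and `path`); the `upsert_mc_alias` helper and the default values it writes are hypothetical and are not taken from the extension's source.

```python
import copy
from typing import Any, Dict


def upsert_mc_alias(
    config: Dict[str, Any],
    name: str,
    url: str,
    access_key: str,
    secret_key: str,
    path_style: bool = False,
) -> Dict[str, Any]:
    """Return a copy of the mc config with one alias added or replaced."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("version", "10")  # assumed config version
    aliases = updated.setdefault("aliases", {})
    aliases[name] = {
        "url": url,
        "accessKey": access_key,
        "secretKey": secret_key,
        "api": "S3v4",
        "path": "on" if path_style else "auto",
    }
    return updated
```

Other aliases are left in place here, which matches the single-connection "write/refresh that alias" behaviour rather than the connection manager's full overwrite.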

> **Migration note:** the interactive login form only appears when the connection
> manager is enabled — which is now the default. Deployments that provision a
> single connection via env vars or `~/.mc/config.json` and want to keep the old
> no-login behaviour should set `MINIO_ENABLE_CONNECTION_MANAGER=false`.
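
Code that runs where the `MINIO_*` variables are visible (for example a notebook kernel) can turn them into S3 client settings. The sketch below only builds the arguments, using the variable names documented above; the helper names are hypothetical, `MINIO_USE_TLS` is deliberately not interpreted, and the truthy spellings accepted for `MINIO_PATH_STYLE` are an assumption.

```python
import os
from typing import Any, Dict, Mapping, Optional

TRUTHY_VALUES = {"true", "1", "yes", "on"}  # assumed spellings


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Keyword arguments suitable for boto3.client("s3", **kwargs)."""
    env = os.environ if env is None else env
    return {
        "endpoint_url": env["MINIO_ENDPOINT"],
        "aws_access_key_id": env["MINIO_ACCESS_KEY"],
        "aws_secret_access_key": env["MINIO_SECRET_KEY"],
        "region_name": env.get("MINIO_REGION") or None,
    }


def addressing_style(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "path" when MINIO_PATH_STYLE is truthy, otherwise "auto"."""
    env = os.environ if env is None else env
    flag = env.get("MINIO_PATH_STYLE", "false").strip().lower()
    return "path" if flag in TRUTHY_VALUES else "auto"


# Usage inside a notebook (requires boto3, which this package depends on):
#   import boto3
#   from botocore.config import Config
#   s3 = boto3.client(
#       "s3",
#       config=Config(s3={"addressing_style": addressing_style()}),
#       **client_kwargs(),
#   )
```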

### S3 Browser Toolbar

Toolbar buttons vary between the bucket-list view (at root) and the inside-a-bucket view.

**Bucket list view:**

| Button       | Action                                                    |
| ------------ | --------------------------------------------------------- |
| **+**        | Create a new bucket                                       |
| **Folder+**  | Create a new folder (after navigating into a bucket)      |
| **Upload**   | Stream-upload files to the current path                   |
| **Search**   | Toggle the inline search bar                              |
| **Sort**     | Open the sort menu: Name / Size / Last modified, asc/desc |
| **Refresh**  | Refresh the bucket listing                                |
| **Settings** | Open the extension settings editor                        |

**Inside-bucket view** swaps in: Download (multi-select), Filter & sort divider, and a List ⇄ Grid view toggle on the right.

### Connection chip + bottom bar

- **Connection chip** (top): connection name + LIVE badge + endpoint, plus a `←` back arrow whenever you're inside a bucket, and a **Switch** button (connection-manager mode only) to return to the connection list
- **Bottom bar**: green dot · `Mounted at s3://` · **upload-progress pill** (appears when transfers run) · **Buckets** (jump to the bucket list). In connection-manager mode it also shows **Disconnect** (deactivates the connection); in single-connection mode there is no Switch or Disconnect.

### Context Menu / Row kebab ⋮

Right-click any row, or click its kebab ⋮ button. The same popup appears in both cases.

- **Preview** — Open the file in the in-app preview pane (parquet, CSV, JSON, YAML, TOML, MD, XML, HTML, text, images)
- **Open in notebook** — Open in a JupyterLab editor / notebook / image viewer tab
- **Copy URI** — Copy `s3://bucket/key` to the clipboard
- **Copy to S3 Path…** — Copy to another S3 location
- **Move to S3 Path…** — Move to another S3 location
- **Copy to Local…** — Download to the local JupyterLab filesystem
- **Delete** (or **Delete Bucket** when right-clicking a bucket row at root)

Keyboard: `Delete` and `Backspace` inside the sidebar trigger the delete command on the current selection.

### Transfer manager

- A **pill** appears in the bottom bar the moment any transfer kicks off. Click it to open the **Transfers** view (a 4th sidebar mode that takes over the panel; the `←` back arrow returns you to the file browser).
- The view groups files into **Uploading / Done / Failed** plus a collapsible **Recently removed (N)** section with per-row Restore + a Restore-all bulk action.
- The empty state is a dashed drop zone — drag files **or whole OS folders** onto it to start uploads to the current path.
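
When a whole folder is dropped, nested sub-folders end up as object key prefixes under the current path. A rough sketch of that mapping follows; the `object_keys_for_drop` helper is hypothetical and ignores details such as name collisions or how the browser reports relative paths.

```python
import posixpath
from typing import Dict, Iterable


def object_keys_for_drop(
    current_prefix: str, relative_paths: Iterable[str]
) -> Dict[str, str]:
    """Map each dropped file's relative path to the object key it would receive."""
    prefix = current_prefix.strip("/")
    keys: Dict[str, str] = {}
    for rel in relative_paths:
        # Normalise Windows separators so keys always use '/'.
        rel_posix = rel.replace("\\", "/").lstrip("/")
        keys[rel] = posixpath.join(prefix, rel_posix) if prefix else rel_posix
    return keys
```

For example, dropping a `report` folder while browsing `data/2024/` would yield keys such as `data/2024/report/a.csv`.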

### Settings (Advanced Settings Editor)

| Setting                       | Type            | Default | Notes                                              |
| ----------------------------- | --------------- | ------- | -------------------------------------------------- |
| `defaultConnectionName`       | string          | `""`    | Pre-fills the auth form's connection-name field    |
| `zonePrefixes`                | object<zone,[]> | (incl.) | Maps bucket-name prefixes to medallion zones       |
| `transferConcurrency`         | int (1..16)     | `3`     | Parallel files per bulk job                        |
| `transferConcurrencyByBucket` | object<str,int> | `{}`    | Per-bucket override                                |
| `bandwidthLimitMbps`          | int (0..10000)  | `0`     | 0 = unlimited                                      |
| `bandwidthLimitMbpsByBucket`  | object<str,int> | `{}`    | Per-bucket override                                |
| `verifyChecksums`             | bool            | `true`  | Per-part MD5 + final ETag check on upload/download |
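
The per-bucket overrides in the table can be read as "use the bucket's value if present, otherwise the global one". The sketch below shows one way to resolve them; the `effective_transfer_settings` helper is hypothetical, and clamping to the ranges listed in the table is an assumption about how out-of-range values are handled.

```python
from typing import Any, Dict, Optional

# Defaults taken from the settings table above.
DEFAULTS: Dict[str, Any] = {
    "transferConcurrency": 3,
    "transferConcurrencyByBucket": {},
    "bandwidthLimitMbps": 0,
    "bandwidthLimitMbpsByBucket": {},
}


def effective_transfer_settings(
    settings: Dict[str, Any], bucket: str
) -> Dict[str, Optional[int]]:
    """Resolve concurrency and bandwidth for one bucket, applying overrides."""
    merged = {**DEFAULTS, **settings}
    concurrency = merged["transferConcurrencyByBucket"].get(
        bucket, merged["transferConcurrency"]
    )
    bandwidth = merged["bandwidthLimitMbpsByBucket"].get(
        bucket, merged["bandwidthLimitMbps"]
    )
    concurrency = max(1, min(16, int(concurrency)))
    bandwidth = max(0, min(10000, int(bandwidth)))
    # 0 means unlimited, reported here as None.
    return {"concurrency": concurrency, "bandwidth_mbps": bandwidth or None}
```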

## Development

### Development Installation

> **Note:** You will need Node.js >= 18 to build the extension package.

The `jlpm` command is JupyterLab's pinned version of [yarn](https://yarnpkg.com/), but you may also use `yarn` or `npm` as an alternative.

To install the development environment:

```bash
# Clone the repository and navigate to the project folder
git clone https://github.com/aristide/jupyterlab-minio.git
cd jupyterlab-minio

# Set up a virtual environment
virtualenv .venv
source .venv/bin/activate

# Install the package in development mode
pip install -e ".[test]"

# Link the development version of the extension with JupyterLab
jupyter labextension develop . --overwrite

# Enable the server extension
jupyter server extension enable jupyterlab-minio

# Build the extension TypeScript source files
jlpm build
```

To continuously watch the source directory and rebuild the extension on changes, run:

```bash
# Watch the source directory in one terminal
jlpm watch

# In another terminal, run JupyterLab in debug mode
jupyter lab --debug
```

To ensure source maps are generated for easier debugging:

```bash
jlpm build:lib && jlpm build:labextension:dev
```

### Development Uninstallation

```bash
# Disable the server extension in development mode
jupyter server extension disable jupyterlab-minio

# Uninstall the package
pip uninstall jupyterlab-minio
```

In development mode, you may also need to remove the symlink created by `jupyter labextension develop`. To find its location, use `jupyter labextension list` to locate the `labextensions` folder, then remove the `jupyterlab-minio` symlink within it.

### Testing the Extension

#### Server Tests

To install test dependencies and execute server tests:

```bash
pip install -e ".[test]"
jupyter labextension develop . --overwrite
pytest -vv -r ap --cov jupyterlab-minio
```

#### Frontend Tests

To execute frontend tests using [Jest](https://jestjs.io/):

```bash
jlpm
jlpm test
```

#### Integration Tests

This extension uses [Playwright](https://playwright.dev/docs/intro/) with the JupyterLab helper [Galata](https://github.com/jupyterlab/jupyterlab/tree/master/galata) for integration tests.

Refer to the [ui-tests README](./ui-tests/README.md) for further details.

## Running the Devcontainer in Visual Studio Code

1. **Install Docker**: Ensure Docker is installed and running on your machine. You can download it from [Docker's official site](https://www.docker.com/products/docker-desktop).

2. **Install Visual Studio Code**: Download and install [Visual Studio Code](https://code.visualstudio.com/).

3. **Install the Dev Containers Extension**:
   - In Visual Studio Code, go to the Extensions view (`Ctrl+Shift+X` or `Cmd+Shift+X` on Mac).
   - Search for and install the "Dev Containers" extension by Microsoft.

4. **Open the Project in a Devcontainer**:
   - Open the `jupyterlab-minio` project folder in Visual Studio Code.
   - You should see a prompt to reopen the folder in a devcontainer. Click "Reopen in Container." If you don't see the prompt, use the **Command Palette** (`Ctrl+Shift+P` or `Cmd+Shift+P` on Mac), type "Dev Containers: Reopen in Container," and select it.

5. **Wait for the Container to Build**:
   - VS Code will build the devcontainer using the `.devcontainer/Dockerfile` or `.devcontainer/devcontainer.json` configuration. This setup may take a few minutes as it installs dependencies and configures the environment.

6. **Access the Development Environment**:
   - Once the container is running, you can access the terminal (``Ctrl+` `` or ``Cmd+` `` on Mac) and use the VS Code editor as usual. The devcontainer has all necessary tools pre-installed for working on `jupyterlab-minio`.

7. **Run the Extension**:
   - To run and test the extension in JupyterLab, use the development commands from above, such as `jlpm watch` and `jupyter lab --debug --ServerApp.token='' --ip=0.0.0.0 --notebook-dir=notebooks`.

This setup allows you to develop in a consistent, isolated environment that replicates the project dependencies and configurations, making collaboration easier.
