Metadata-Version: 2.3
Name: mascope-sdk
Version: 2026.4.27
Summary: Mascope's public SDK library, wrapping the app's REST API.
Author: Oskari Kausiala, Philip Chernonog
Author-email: Oskari Kausiala <oskari.kausiala@karsa.fi>, Philip Chernonog <philip.chernonog@karsa.fi>
License: MIT
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: requests>=2.32.3
Requires-Dist: loguru>=0.7.3
Requires-Dist: pandas>=2.0.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: python-dotenv>=1.0.0
Maintainer: Oskari Kausiala, Philip Chernonog
Maintainer-email: Oskari Kausiala <oskari.kausiala@karsa.fi>, Philip Chernonog <philip.chernonog@karsa.fi>
Requires-Python: >=3.10
Project-URL: Bug Tracker, https://github.com/Karsa-Oy/Mascope/issues
Project-URL: documentation, https://github.com/Karsa-Oy/Mascope
Project-URL: homepage, https://karsa.fi/
Project-URL: repository, https://github.com/Karsa-Oy/Mascope
Description-Content-Type: text/markdown

# Mascope SDK

Python SDK for the Mascope mass spectrometry data analysis platform. Designed for researchers who want to load and analyze data from a Mascope server in Jupyter notebooks or Python scripts, or export it for use in other environments.

## Contents

- [Installation](#installation)
- [Tutorial Notebooks](#tutorial-notebooks)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [High-Level Loaders](#high-level-loaders)
- [Caching](#caching)
- [API Reference](#api-reference)
- [Examples](#examples)
- [Migration from Legacy API](#migration-from-legacy-api)
- [For Developers](#for-developers)

## Installation

### Prerequisites

- **Python 3.10+**: [python.org/downloads](https://www.python.org/downloads/)
- **An IDE that supports Jupyter notebooks**: [VS Code](https://code.visualstudio.com/) with [Jupyter](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) and [Data Wrangler](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler) extensions is recommended

### Set up a virtual environment

```bash
python -m venv .venv
```

Activate it (Windows):

```bash
.venv\Scripts\activate
```

Or on macOS/Linux:

```bash
source .venv/bin/activate
```

### Install the SDK

```bash
pip install mascope_sdk
```

Or with [uv](https://github.com/astral-sh/uv):

```bash
uv add mascope_sdk
```

## Tutorial Notebooks

The best way to learn the SDK is to walk through the bundled example notebooks. They cover everything from basic setup to advanced analysis workflows.

Copy them to your project directory:

```python
import mascope_sdk
mascope_sdk.copy_examples("./tutorials")
```

This creates a `tutorials/` folder with the following notebooks:

| #   | Notebook                           | Topic                                                   |
| --- | ---------------------------------- | ------------------------------------------------------- |
| 1   | `01_getting_started.ipynb`         | Connect, list datasets/batches/samples, view a spectrum |
| 2   | `02_batch_timeseries.ipynb`        | Load peaks across batches, filter, and plot             |
| 3   | `03_intra_sample_timeseries.ipynb` | Per-scan intensity timeseries for specific compounds    |
| 4   | `04_mass_defect_plot.ipynb`        | Mass defect visualization                               |
| 5   | `05_peaks_by_stage.ipynb`          | Compare measurement stages within a single sample       |

Open them in VS Code (or any Jupyter-compatible IDE) and run the cells. Each notebook is self-contained; just make sure your `.env` credentials are set up first (see [Configuration](#configuration)).

> Existing files are never overwritten, so you can safely re-run `copy_examples` after an SDK update to get new notebooks.
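
The no-overwrite behaviour described above can be pictured with a small standard-library sketch. It is not the SDK's implementation: `copy_missing` and the `06_new.ipynb` file name are hypothetical, and the real `copy_examples` may differ in detail.

```python
import shutil
import tempfile
from pathlib import Path


def copy_missing(src_dir: Path, dest_dir: Path) -> list:
    """Copy files from src_dir into dest_dir, skipping names that already exist."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.iterdir()):
        target = dest_dir / src.name
        if src.is_file() and not target.exists():
            shutil.copy2(src, target)
            copied.append(target)
    return copied


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src, dest = Path(tmp) / "bundled", Path(tmp) / "tutorials"
    src.mkdir()
    (src / "01_getting_started.ipynb").write_text("{}")
    first_names = [p.name for p in copy_missing(src, dest)]

    # Simulate local edits plus a notebook added by a later SDK release
    (dest / "01_getting_started.ipynb").write_text("my edits")
    (src / "06_new.ipynb").write_text("{}")  # hypothetical new notebook
    second_names = [p.name for p in copy_missing(src, dest)]
    kept = (dest / "01_getting_started.ipynb").read_text()
```

In this sketch the second call copies only the new file and leaves the edited notebook untouched.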

## Quick Start

### 1. Configure credentials

Create a `.env` file in your working directory (or any parent directory):

```env
MASCOPE_URL=https://example.mascope.app
MASCOPE_ACCESS_TOKEN=your-api-token
```

> **Tip:** Generate the API token in your Mascope instance's user settings.
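
To illustrate what "working directory or any parent directory" means for the `.env` lookup, here is a minimal sketch of an upward search. `find_env_file` is a hypothetical helper written for this example; how the SDK actually locates the file is not shown in this document.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_env_file(start: Path, filename: str = ".env") -> Optional[Path]:
    """Return the nearest `filename` in `start` or one of its parents, else None."""
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


# Demo: the .env sits two levels above the notebook directory
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    notebooks = project / "analysis" / "notebooks"
    notebooks.mkdir(parents=True)
    (project / ".env").write_text("MASCOPE_URL=https://example.mascope.app\n")

    found = find_env_file(notebooks)
    found_in_project = found == project / ".env"
```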

### 2. Use the SDK

```python
from mascope_sdk import MascopeClient

# Auto-loads credentials from .env
mascope = MascopeClient()

# List datasets (returns a DataFrame)
datasets = mascope.datasets.list()

# List batches by dataset name
batches = mascope.batches.list("My Dataset")

# List samples from a single batch (raises if ambiguous)
samples = mascope.samples.list(batch="My Batch")

# List samples from all matching batches (containing the given keyword)
samples = mascope.samples.list(batches="Uronium")

# Load peaks across all samples in matching batches
peaks = mascope.load_peaks(dataset="My Dataset", batches="Uronium")

# Load peaks across a subset of samples, within matching batches
peaks = mascope.load_peaks(dataset="My Dataset", batches="Uronium", samples="12:")

# Load a sample spectrum by sample ID, then plot it
spectrum = mascope.samples.get_spectrum(sample_id=samples.iloc[0]["sample_item_id"])

import matplotlib.pyplot as plt
plt.scatter(spectrum["mz"], spectrum["intensity"])
plt.xlabel("m/z")
plt.ylabel("Intensity")
plt.show()
```

See more examples [below](#examples).

## Configuration

The `MascopeClient` can be configured in three ways, listed from highest to lowest priority:

1. **Constructor parameters** (highest priority):

   Initialize client with parameters:

   ```python
   mascope = MascopeClient(
       url="https://example.mascope.app",
       access_token="your-token"
   )
   ```

2. **Environment variables**:

   Set environment variables:

   ```bash
   export MASCOPE_URL=https://example.mascope.app
   export MASCOPE_ACCESS_TOKEN=your-token
   ```

   Initialize client (parameters are read from the environment unless overridden):

   ```python
   mascope = MascopeClient()
   ```

3. **`.env` file** (**_recommended for notebooks_**):

   Create a `.env` file in the project directory:

   ```env
   MASCOPE_URL=https://example.mascope.app
   MASCOPE_ACCESS_TOKEN=your-token
   ```

   Initialize client (parameters are read from the `.env` file unless overridden):

   ```python
   mascope = MascopeClient()
   ```
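
The priority order above can be sketched as a small lookup function. This is an illustration only: `resolve_setting` and the example URLs are hypothetical, and the SDK's internal resolution code is not shown here.

```python
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    explicit: Optional[str],
    environ: Mapping[str, str],
    dotenv_values: Mapping[str, str],
) -> Optional[str]:
    """Pick a setting: constructor argument, then environment, then .env file."""
    if explicit is not None:
        return explicit
    if name in environ:
        return environ[name]
    return dotenv_values.get(name)


# Plain dicts stand in for os.environ and the parsed .env file
environ = {"MASCOPE_URL": "https://env.example"}
dotenv_values = {
    "MASCOPE_URL": "https://dotenv.example",
    "MASCOPE_ACCESS_TOKEN": "token-from-dotenv",
}

url_explicit = resolve_setting("MASCOPE_URL", "https://explicit.example", environ, dotenv_values)
url_env = resolve_setting("MASCOPE_URL", None, environ, dotenv_values)
token = resolve_setting("MASCOPE_ACCESS_TOKEN", None, environ, dotenv_values)
```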

## High-Level Loaders

The SDK provides three convenience loaders that handle dataset/batch/sample resolution, concurrent requests, and progress bars automatically. These are the recommended way to load data for analysis.

---

### `load_peaks`: Peaks across batches

Load averaged peaks ("sum spectrum") for all samples across one or more batches, returned as a single DataFrame enriched with batch and sample metadata.

```python
# All peaks from batches matching "Uronium"
peaks = mascope.load_peaks(dataset="My Dataset", batches="Uronium")

# Filter by sample name
peaks = mascope.load_peaks(dataset="My Dataset", samples="blank")

# All peaks from every batch (skip confirmation prompt)
peaks = mascope.load_peaks(dataset="My Dataset", confirm_above=None)

# Skip match data and peak heights (areas only)
peaks = mascope.load_peaks(dataset="My Dataset", matches=False, heights=False)
```

---

### `load_peak_timeseries`: Intra-sample timeseries

Load per-scan intensity timeseries for peaks matching a compound, ion, or isotope across batches. Provide exactly one of `compound`, `ion`, or `isotope`; the value can be a formula or a compound name. Pass a list to load multiple targets in a single call (peaks are discovered once per sample).

```python
# Timeseries for all peaks matched to Urea (by name or formula)
ts = mascope.load_peak_timeseries(
    dataset="My Dataset",
    batches="Uronium",
    compound="Urea",       # or compound="CH4N2O"
)

# Multiple compounds in one call
ts = mascope.load_peak_timeseries(
    dataset="My Dataset",
    compound=["Urea", "Lactic acid"],
)

# Plot per-sample timeseries
import matplotlib.pyplot as plt
for name, group in ts.groupby("sample_item_name"):
    plt.plot(group["time"], group["height"], label=name)
plt.legend()
plt.show()
```
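
The "exactly one of `compound`, `ion`, or `isotope`" rule, together with the option to pass a single value or a list, could be validated along these lines. `pick_target` is a hypothetical helper for illustration; the SDK's own checks and error messages may differ.

```python
def pick_target(compound=None, ion=None, isotope=None):
    """Return (kind, targets) after checking that exactly one argument is set."""
    candidates = {"compound": compound, "ion": ion, "isotope": isotope}
    given = {kind: value for kind, value in candidates.items() if value is not None}
    if len(given) != 1:
        raise ValueError("Provide exactly one of compound, ion, or isotope")
    kind, value = next(iter(given.items()))
    # Accept a single string or any iterable of strings
    targets = [value] if isinstance(value, str) else list(value)
    return kind, targets


single = pick_target(compound="Urea")
several = pick_target(compound=["Urea", "Lactic acid"])
```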

---

### `load_peaks_by_stage`: Stage-based peak loading

Load averaged peaks for distinct time-range stages of a single sample. Useful when a measurement has phases (e.g. blank, sample introduction, wash).

```python
stages = [
    (0, 30, "blank"),
    (30, 120, "sample"),
    (120, 180, "wash"),
]

peaks = mascope.load_peaks_by_stage(sample="My Sample", stages=stages)

# Compare areas between stages
peaks.groupby("stage_name")["area"].sum()
```

The `sample` parameter accepts a sample name or ID. Stage tuples can be `(t_min, t_max)` or `(t_min, t_max, name)`.

Key columns: `stage`, `stage_name`, `t_min`, `t_max`, plus all columns from `get_peaks`.
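
As a rough picture of how the two tuple forms might map onto the `stage`, `stage_name`, `t_min`, and `t_max` columns, consider the sketch below. `normalize_stages`, the fallback name `stage_<index>`, and the `t_min < t_max` check are assumptions made for this example, not documented SDK behaviour.

```python
def normalize_stages(stages):
    """Turn (t_min, t_max) or (t_min, t_max, name) tuples into uniform records."""
    normalized = []
    for index, stage in enumerate(stages):
        if len(stage) == 2:
            t_min, t_max = stage
            name = f"stage_{index}"  # assumed fallback for unnamed stages
        elif len(stage) == 3:
            t_min, t_max, name = stage
        else:
            raise ValueError(f"Stage {index} must have 2 or 3 elements")
        if t_min >= t_max:
            raise ValueError(f"Stage {index}: t_min must be smaller than t_max")
        normalized.append(
            {"stage": index, "stage_name": name, "t_min": t_min, "t_max": t_max}
        )
    return normalized


records = normalize_stages([(0, 30, "blank"), (30, 120)])
```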

---

### Confirmation prompt

`load_peaks` and `load_peak_timeseries` show an interactive confirmation prompt when the number of samples exceeds `confirm_above`. Defaults are 100 for `load_peaks` and 20 for `load_peak_timeseries`. This prevents accidentally launching hundreds of concurrent requests from a notebook cell. Set `confirm_above=None` to disable.
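
The threshold logic can be summarised in one line. The helper below is a hypothetical sketch that reads "exceeds" as strictly greater than; it does not reproduce the SDK's prompt itself.

```python
from typing import Optional


def needs_confirmation(n_samples: int, confirm_above: Optional[int]) -> bool:
    """Return True when a prompt would be shown for this many samples."""
    return confirm_above is not None and n_samples > confirm_above


at_limit = needs_confirmation(100, confirm_above=100)    # not above the limit
over_limit = needs_confirmation(101, confirm_above=100)  # above the limit
disabled = needs_confirmation(5000, confirm_above=None)  # prompt switched off
```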

## Caching

Dataset, batch, sample, and ionization mechanism listings are cached in memory automatically after the first call. This speeds up repeated name resolution and avoids redundant API calls. When data on the server changes (e.g. a new batch is created), clear the cache so that the next call reloads the listings. The cache is not persisted to disk, so restarting the kernel always clears it.

```python
# Clear the cache when server data changes
mascope.clear_cache()
```
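
For readers who want a mental model of this behaviour, here is a minimal in-memory cache sketch. `ListingCache` and `fake_fetch` are hypothetical names used only for illustration; the SDK's actual cache structure is not documented here.

```python
class ListingCache:
    """Cache results of a fetch function in a plain dict (lost on restart)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, key):
        # Only call the fetch function the first time a key is requested
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]

    def clear(self):
        self._store.clear()


calls = []


def fake_fetch(key):
    """Stand-in for an API request; records every call."""
    calls.append(key)
    return f"listing for {key}"


cache = ListingCache(fake_fetch)
cache.get("datasets")
cache.get("datasets")          # served from memory, no second fetch
calls_before_clear = len(calls)
cache.clear()
cache.get("datasets")          # fetched again after clearing
calls_after_clear = len(calls)
```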

## API Reference

### MascopeClient

```python
from mascope_sdk import MascopeClient

mascope = MascopeClient()
```

### Resources

All `list()` methods accept names (or substrings) instead of IDs and return `pd.DataFrame | None`.
Name filters use `pandas.Series.str.contains` under the hood, so they accept
plain substrings **or** regular expressions (case-insensitive). For example,
`batches="2025|2026"` matches batch names containing "2025" or "2026".
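
The matching rule can be tried out without a server. The sketch below uses the standard-library `re` module to mimic a case-insensitive "contains" search; `filter_names` and the batch names are made up for this example, and the SDK itself is described above as using `pandas.Series.str.contains`.

```python
import re


def filter_names(names, pattern):
    """Keep names in which `pattern` matches anywhere, ignoring case."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    return [name for name in names if regex.search(name)]


batch_names = ["Uronium 2025-03", "uronium 2026-01", "Nitrate 2024-11"]

by_substring = filter_names(batch_names, "uronium")
by_alternation = filter_names(batch_names, "2025|2026")
# Escape literal text that contains regex metacharacters such as "(" or "+"
by_literal = filter_names(["Batch (A)", "Batch B"], re.escape("(A)"))
```

Because the filter is interpreted as a regular expression, names containing characters such as `(`, `+`, or `.` are safest to match after escaping them.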

#### `mascope.datasets`

| Method   | Description                  | Returns             |
| -------- | ---------------------------- | ------------------- |
| `list()` | List all accessible datasets | `pd.DataFrame│None` |

#### `mascope.batches`

| Method          | Description                         | Returns             |
| --------------- | ----------------------------------- | ------------------- |
| `list(dataset)` | List batches in a dataset (by name) | `pd.DataFrame│None` |

#### `mascope.samples`

| Method                                          | Description                                          | Returns             |
| ----------------------------------------------- | ---------------------------------------------------- | ------------------- |
| `list(batch=, batches=, dataset=, samples=)`    | List samples from one or more batches                | `pd.DataFrame│None` |
| `get(sample_id)`                                | Get sample details                                   | `dict│None`         |
| `get_peaks(sample_id, ...)`                     | Get peak data with optional match/filter/time params | `pd.DataFrame│None` |
| `get_peak_timeseries(sample_id, mz=, peak_id=)` | Get intensity over time for a peak                   | `pd.DataFrame│None` |
| `get_spectrum(sample_id, ...)`                  | Get averaged spectrum                                | `pd.DataFrame│None` |
| `get_spectra(sample_ids, ...)`                  | Get spectra for multiple samples                     | `pd.DataFrame│None` |
| `get_centroids(sample_ids)`                     | Get centroid data                                    | `dict│None`         |

`list` accepts exactly one of `batch` (must match a single batch; raises if ambiguous) or `batches` (returns samples from all matching batches, with an added `sample_batch_name` column).

#### `mascope.matching`

| Method                                 | Description                  | Returns     |
| -------------------------------------- | ---------------------------- | ----------- |
| `match_compound(sample_id, formula)`   | Match a compound in a sample | `dict│None` |
| `match_compounds(sample_id, formulas)` | Match multiple compounds     | `dict│None` |

#### `mascope.ionization`

| Method   | Description                          | Returns             |
| -------- | ------------------------------------ | ------------------- |
| `list()` | List available ionization mechanisms | `pd.DataFrame│None` |

Columns: `ionization_mechanism_id`, `ionization_mechanism` (human-readable name), `ionization_mechanism_polarity`.

#### `mascope.cheminfo`

| Method                                | Description                               | Returns      |
| ------------------------------------- | ----------------------------------------- | ------------ |
| `query_by_mz(mz, mechanism_ids, ...)` | Query potential formulas for an m/z value | `list[dict]` |

## Examples

### Load peaks and plot by compound

```python
from mascope_sdk import MascopeClient

mascope = MascopeClient()

peaks = mascope.load_peaks(dataset="My Dataset", batches="Uronium")

# Filter to matched peaks and summarise by compound
matched = peaks[peaks["target_compound_formula"].notna()]
summary = matched.groupby("target_compound_formula")["area"].mean()
summary.sort_values(ascending=False).head(10).plot.barh()
```

### Intra-sample timeseries

```python
import matplotlib.pyplot as plt
from mascope_sdk import MascopeClient

mascope = MascopeClient()

ts = mascope.load_peak_timeseries(
    dataset="My Dataset",
    compound="Urea",
)

# Plot per-isotope timeseries for one sample
sample = ts[ts["sample_item_name"] == ts["sample_item_name"].iloc[0]]
for isotope, group in sample.groupby("target_isotope_formula"):
    plt.plot(group["time"], group["height"], label=isotope)
plt.xlabel("Time (s)")
plt.ylabel("Intensity")
plt.legend()
plt.show()
```

### Compare stages within a sample

```python
from mascope_sdk import MascopeClient

mascope = MascopeClient()

# First load samples so the name is cached for resolution
mascope.samples.list(batch="My Batch")

stages = [
    (0, 30, "blank"),
    (30, 120, "sample"),
    (120, 180, "wash"),
]

peaks = mascope.load_peaks_by_stage(sample="My Sample", stages=stages)
peaks.groupby("stage_name")[["area", "height"]].mean()
```

### Low-level peak timeseries

```python
import matplotlib.pyplot as plt
from mascope_sdk import MascopeClient

mascope = MascopeClient()

ts = mascope.samples.get_peak_timeseries(
    sample_id="sample-123",
    mz=180.063,
    mz_tolerance_ppm=5.0,
)

if ts is not None:
    plt.plot(ts["time"], ts["height"])
    plt.xlabel("Time (s)")
    plt.ylabel("Intensity")
    plt.title(f"Peak at m/z {ts['mz'].iloc[0]:.3f}")
    plt.show()
```

## Migration from Legacy API

| Old API                                                       | New API                                               |
| ------------------------------------------------------------- | ----------------------------------------------------- |
| `get_workspaces(url, token)`                                  | `mascope.datasets.list()`                             |
| `get_sample_batches(url, token, ws_id)`                       | `mascope.batches.list("Dataset Name")`                |
| `get_samples(url, token, batch_id)`                           | `mascope.samples.list(batch="Batch Name")`            |
| `get_sample(url, token, sample_id)`                           | `mascope.samples.get(sample_id)`                      |
| `get_sample_peaks(url, token, sample_id)`                     | `mascope.samples.get_peaks(sample_id)`                |
| `get_sample_spectrum(url, token, sample_id)`                  | `mascope.samples.get_spectrum(sample_id)`             |
| `get_sample_peak_timeseries(url, token, sample_id, mz)`       | `mascope.samples.get_peak_timeseries(sample_id, mz)`  |
| `get_sample_compound_matches(url, token, sample_id, formula)` | `mascope.matching.match_compound(sample_id, formula)` |
| `get_ionization_mechanisms(url, token)`                       | `mascope.ionization.list()`                           |
| `get_cheminfo_by_mz(url, token, mz, mech_ids)`                | `mascope.cheminfo.query_by_mz(mz, mech_ids)`          |

---

## For Developers

### Logging

The SDK logs operational info (batch resolution, request counts, etc.) via [loguru](https://github.com/Delgan/loguru). The default level is `INFO`.

Set the `MASCOPE_SDK_LOG_LEVEL` environment variable to change it:

```env
MASCOPE_SDK_LOG_LEVEL=DEBUG    # verbose (HTTP requests, cache hits, etc.)
MASCOPE_SDK_LOG_LEVEL=WARNING  # quiet (only warnings and errors)
```

Or in Python before importing the SDK:

```python
import os
os.environ["MASCOPE_SDK_LOG_LEVEL"] = "DEBUG"

from mascope_sdk import MascopeClient
```

### SSL Verification

By default the SDK verifies SSL certificates. To disable verification (e.g. for local development with a self-signed certificate), set:

```env
MASCOPE_SDK_VERIFY_SSL=false
```

Or pass it to the constructor:

```python
mascope = MascopeClient(verify_ssl=False)
```

### Error Handling

```python
from mascope_sdk import MascopeClient
from mascope_sdk.exceptions import (
    AuthenticationError,
    NotFoundError,
    ConfigurationError,
)

try:
    mascope = MascopeClient()
    sample = mascope.samples.get("invalid-id")
except ConfigurationError:
    print("Missing MASCOPE_URL or MASCOPE_ACCESS_TOKEN")
except AuthenticationError:
    print("Invalid API token")
except NotFoundError:
    print("Sample not found")
```

#### Exception Hierarchy

- `MascopeError` — Base exception
  - `ConfigurationError` — Missing/invalid configuration
  - `MascopeConnectionError` — Network/connection issues
    - `MascopeTimeoutError` — Request timeout
  - `MascopeAPIError` — API errors (includes `status_code`, `message`, `url`)
    - `AuthenticationError` — 401/403 responses
    - `NotFoundError` — 404 responses
    - `ValidationError` — 422 responses
    - `ServerError` — 5xx responses
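
One practical consequence of this hierarchy is that catching `MascopeAPIError` also catches its more specific subclasses. The sketch below rebuilds part of the tree with local stand-in classes so it runs without the SDK installed; the constructor signature and the `error_class_for` helper are assumptions, while the status-code mapping follows the list above.

```python
class MascopeError(Exception):
    """Stand-in for the SDK's base exception."""


class MascopeAPIError(MascopeError):
    def __init__(self, status_code, message="", url=""):
        super().__init__(f"{status_code} {message}".strip())
        self.status_code = status_code
        self.message = message
        self.url = url


class AuthenticationError(MascopeAPIError):
    pass


class NotFoundError(MascopeAPIError):
    pass


class ValidationError(MascopeAPIError):
    pass


class ServerError(MascopeAPIError):
    pass


def error_class_for(status_code):
    """Map an HTTP status code to the matching stand-in exception class."""
    if status_code in (401, 403):
        return AuthenticationError
    if status_code == 404:
        return NotFoundError
    if status_code == 422:
        return ValidationError
    if 500 <= status_code <= 599:
        return ServerError
    return MascopeAPIError


try:
    raise error_class_for(404)(404, "Sample not found", "https://example.mascope.app")
except MascopeAPIError as exc:  # the base class catches the subclass
    caught = (type(exc).__name__, exc.status_code)
```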

### Project Structure

```
mascope_sdk/
├── __init__.py          # Public exports (MascopeClient, exceptions)
├── client.py            # MascopeClient: main entry point and high-level loader methods
├── exceptions.py        # Exception hierarchy
├── _http.py             # Low-level HTTP session (requests wrapper)
├── _resolve.py          # Name-to-ID resolution helpers
├── _loaders.py          # High-level loaders (load_peaks, load_peak_timeseries, load_peaks_by_stage)
├── _concurrent.py       # ThreadPoolExecutor wrapper with progress bars and cancellation
├── _legacy.py           # Deprecated functional API (get_datasets, get_samples, etc.)
├── resources/
│   ├── _base.py         # BaseResource: shared HTTP helpers and datetime coercion
│   ├── datasets.py      # DatasetsResource
│   ├── batches.py       # BatchesResource
│   ├── samples.py       # SamplesResource (peaks, spectra, timeseries)
│   ├── matching.py      # MatchingResource (compound matching)
│   ├── ionization.py    # IonizationResource
│   └── cheminfo.py      # CheminfoResource (m/z queries)
└── examples/            # Jupyter notebook examples
```

**Key patterns:**

- **`client.py`** owns the public API. High-level loaders (`load_peaks`, etc.) are thin wrappers that delegate to `_loaders.py`.
- **`resources/`** contains one class per API domain. Each resource inherits `BaseResource` which provides `_get()` / `_post()` helpers and automatic datetime column coercion.
- **`_concurrent.py`** centralises `ThreadPoolExecutor` usage with `run_concurrent()`, which handles progress bars (tqdm), `None`-filtering, future cancellation on error, and the `max_workers <= 8` guard.
- **`_resolve.py`** handles name -> ID resolution with substring/regex matching (used by resources and loaders).
- **Underscore-prefixed modules** (`_http`, `_loaders`, `_concurrent`, `_resolve`, `_legacy`) are internal and are not part of the public API.
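
To make the `_concurrent.py` pattern more concrete, here is a simplified sketch of a thread-pool helper with `None`-filtering, cancellation on error, and a worker limit. It is not the SDK's `run_concurrent()`: the name `run_concurrent_sketch`, its signature, and the choice to preserve input order are assumptions, and the tqdm progress bar is left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS_LIMIT = 8  # mirrors the `max_workers <= 8` guard described above


def run_concurrent_sketch(func, items, max_workers=4):
    """Apply func to each item in threads, dropping None results."""
    if not 1 <= max_workers <= MAX_WORKERS_LIMIT:
        raise ValueError(f"max_workers must be between 1 and {MAX_WORKERS_LIMIT}")

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(func, item): index for index, item in enumerate(items)}
        try:
            for future in as_completed(futures):
                result = future.result()  # re-raises any worker exception
                if result is not None:
                    results[futures[future]] = result
        except Exception:
            # Cancel work that has not started yet, then propagate the error
            for pending in futures:
                pending.cancel()
            raise
    # Return results in input order (an assumption of this sketch)
    return [results[index] for index in sorted(results)]


def double_if_odd(value):
    """Toy task: return None for even numbers to show the filtering."""
    return value * 2 if value % 2 else None


doubled = run_concurrent_sketch(double_if_odd, [1, 2, 3, 4, 5])
```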
