Metadata-Version: 2.4
Name: spectrl
Version: 0.1.0
Summary: Inline spectrum URL encoder — embeds a complete mass spectrum in a compact, URL-safe token.
Project-URL: Homepage, https://github.com/pgarrett-scripps/spectrl
Project-URL: Repository, https://github.com/pgarrett-scripps/spectrl
Project-URL: Issues, https://github.com/pgarrett-scripps/spectrl/issues
Project-URL: Specification, https://github.com/pgarrett-scripps/spectrl/blob/main/SPECIFICATION.md
Project-URL: Changelog, https://github.com/pgarrett-scripps/spectrl/blob/main/CHANGELOG.md
Author-email: Patrick Garrett <pgarrett@scripps.edu>
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: hupo-psi,mass-spectrometry,mzml,proforma,proteomics,usi
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Requires-Python: >=3.12
Requires-Dist: msgpack>=1.1.0
Requires-Dist: mzmlpy>=0.4.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pynumpress>=0.0.4
Provides-Extra: proforma
Requires-Dist: pyteomics>=4.7; extra == 'proforma'
Description-Content-Type: text/markdown

# spectrl

**Inline spectrum URL encoder for mass spectrometry.**

Encodes a complete mass spectrum — peaks, metadata, precursor info — into a single compact, URL-safe token. The entire spectrum lives in the string. No backend required.

```
spectrl1.<header>.<mz_array>.<intensity_array>[.<extra_arrays>]
```

## Why

A [USI](https://www.psidev.info/usi) *references* a spectrum stored in a repository. `spectrl` *embeds* it. Use `spectrl` when you want to share a spectrum directly in a URL, QR code, notebook, or paper — without requiring the reader to have access to the original file.

The two are complementary: a `spectrl` token can carry a USI back-link, and spectra too large to embed fall back to a USI reference.

## Install

```bash
pip install spectrl
```

Requires Python 3.12+.

## Quick start

### Encode from mzmlpy

```python
from mzmlpy.run import Mzml
from spectrl import encode_spectrum, from_mzmlpy

with Mzml("data.mzML") as mzml:
    spec = mzml.spectra[0]
    token = encode_spectrum(from_mzmlpy(spec))

print(token)
# spectrl1.hQ...
```

### Encode manually

```python
import numpy as np
from spectrl import encode_spectrum
from spectrl.model import InlineSpectrum, SpectrlCvParam

spec = InlineSpectrum(
    default_array_length=3,
    mz=np.array([147.0, 175.1, 246.2]),
    intensity=np.array([1e5, 8e4, 3e4]),
    id="scan=42",
    params=[
        SpectrlCvParam(accession="MS:1000511", value=2),   # ms level
        SpectrlCvParam(accession="MS:1000130"),             # positive scan
        SpectrlCvParam(accession="MS:1000127"),             # centroid
    ],
)

token = encode_spectrum(spec)
```

### Decode

```python
from spectrl import decode_token

decoded = decode_token(token)
print(decoded.mz)        # numpy array
print(decoded.intensity) # numpy array
print(decoded.id)        # "scan=42"
```

### URL bindings

```python
from spectrl import to_fragment, to_query, to_data_uri, extract_token

# Embed in a URL fragment (recommended — never sent to server)
url = to_fragment(token, "https://viewer.example.com/spectrum")
# https://viewer.example.com/spectrum#spectrl1.hQ...

# Or as a query parameter
url = to_query(token, "https://viewer.example.com/spectrum")
# https://viewer.example.com/spectrum?d=spectrl1.hQ...

# Or as a data URI
uri = to_data_uri(token)
# data:application/vnd.spectrl;v=1,spectrl1.hQ...

# Extract token back from any of the above
token = extract_token(url)
```
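
To show what the fragment binding amounts to, here is a minimal standard-library sketch. It assumes the token is the entire fragment, as in the example above. `token_from_fragment` is a hypothetical helper, not part of spectrl, and `extract_token` may well handle more cases than this.

```python
from typing import Optional
from urllib.parse import urlsplit


def token_from_fragment(url: str) -> Optional[str]:
    """Return the URL fragment if it looks like a spectrl token (hypothetical helper)."""
    fragment = urlsplit(url).fragment
    # Assumption: the token occupies the whole fragment and starts with the "spectrl1." magic.
    return fragment if fragment.startswith("spectrl1.") else None


# Placeholder token; the segments here are not a real spectrum.
url = "https://viewer.example.com/spectrum#spectrl1.AAAA.BBBB.CCCC"
print(token_from_fragment(url))
```

Browsers do not include the fragment in the HTTP request, which is why this binding keeps the spectrum on the client.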

### Trim large spectra

```python
from spectrl import top_n

# Keep the 50 most intense peaks before encoding
trimmed = top_n(spec, 50)
token = encode_spectrum(trimmed)
```
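
The idea behind trimming can be sketched in plain Python. This illustrates top-N peak selection in general and is not spectrl's implementation: `keep_top_n` is a hypothetical name, and the sketch assumes the trimmed spectrum should stay in ascending m/z order.

```python
def keep_top_n(mz, intensity, n):
    """Keep the n most intense peaks, returned in ascending m/z order (hypothetical helper)."""
    # Rank peak indices by intensity, highest first, and keep the first n.
    ranked = sorted(range(len(mz)), key=lambda i: intensity[i], reverse=True)[:n]
    # Restore m/z order so the trimmed spectrum is still sorted.
    kept = sorted(ranked, key=lambda i: mz[i])
    return [mz[i] for i in kept], [intensity[i] for i in kept]


mz, inten = keep_top_n([147.0, 175.1, 246.2, 300.4], [1e5, 8e4, 3e4, 9e4], 2)
print(mz, inten)  # [147.0, 300.4] [100000.0, 90000.0]
```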

### Lossless encoding

```python
# Default is lossy MS-Numpress (mean error ~0.003 mDa in m/z, ~0.007% in intensity)
# Pass lossless=True for bit-exact IEEE-754 doubles
token = encode_spectrum(spec, lossless=True)
```
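
To illustrate what "bit-exact" means here, the following standard-library sketch packs values as IEEE-754 doubles, compresses them with zlib, and base64url-encodes the result without padding. It follows the pipeline described under Token format but is not spectrl's encoder: the little-endian byte order, the default compression level, and both helper names are assumptions, and the output is not a valid spectrl segment.

```python
import base64
import struct
import zlib


def pack_lossless(values):
    """Doubles -> zlib -> unpadded base64url (illustrative; little-endian is an assumption)."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.urlsafe_b64encode(zlib.compress(raw)).rstrip(b"=").decode("ascii")


def unpack_lossless(segment):
    """Inverse of pack_lossless."""
    padded = segment + "=" * (-len(segment) % 4)  # restore the stripped padding
    raw = zlib.decompress(base64.urlsafe_b64decode(padded))
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


mz = [147.11280, 175.11895, 246.15607]
assert unpack_lossless(pack_lossless(mz)) == mz  # the round trip is bit-exact
```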

## Token format

```
spectrl1.<b64url(msgpack_header)>.<b64url(mz_array)>.<b64url(intensity_array)>[.<b64url(...)>]
```

- **`spectrl1`** — magic prefix plus format version, so a future format revision is signalled by a new prefix.
- Segments separated by `.`; each is base64url without padding (RFC 4648 §5).
- **Header** — msgpack map with integer keys mirroring mzML structure: ms level, polarity, scan times, precursor isolation window, activation method, collision energy, ProForma interpretation, and a truncated SHA-256 content hash.
- **Array segments** — one per array type (m/z, intensity, charge, ion mobility). Each is encoded as MS-Numpress (lossy) or raw IEEE-754 (lossless) + zlib, matching mzML's own `binaryDataArray` pipeline.
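
The segment layout above can be sketched with the standard library alone. The helper below splits a token on `.`, checks the `spectrl1` prefix, and base64url-decodes each segment after restoring padding. It stops at raw bytes; decoding the msgpack header and the array codecs is left to spectrl. `split_token` is a hypothetical name, and the demo token carries made-up placeholder bytes, not a real spectrum.

```python
import base64


def b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url (RFC 4648 section 5)."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def split_token(token: str) -> list[bytes]:
    """Return the raw bytes of the header and array segments (hypothetical helper)."""
    magic, *segments = token.split(".")
    if magic != "spectrl1":
        raise ValueError(f"unsupported magic/version: {magic!r}")
    if len(segments) < 3:
        raise ValueError("expected header, m/z, and intensity segments")
    return [b64url_decode(s) for s in segments]


# Demo with placeholder payloads; a real token carries msgpack and encoded arrays.
demo = "spectrl1." + ".".join(
    base64.urlsafe_b64encode(p).rstrip(b"=").decode("ascii")
    for p in (b"header", b"mz-bytes", b"intensity-bytes")
)
print(split_token(demo))
```

Because the base64url alphabet does not contain `.`, splitting on the separator cannot cut through a segment.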

## Encoding precision

Measured over 479,455 peaks from a real LC-MS/MS dataset (BSA, Orbitrap):

| Array | Mean error | Max error |
|---|---|---|
| m/z (MS-Numpress linear) | 0.0025 mDa / 0.006 ppm | 0.005 mDa / 0.056 ppm |
| Intensity (MS-Numpress slof) | 0.007% relative | 0.029% relative |
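
The m/z row reports each error both in millidaltons and in ppm. The two are related through the m/z at which the error occurs. A small sketch of the conversion follows; the m/z of 500 is an arbitrary illustrative value, not a figure from the dataset.

```python
def mda_to_ppm(error_mda: float, mz: float) -> float:
    """Convert an absolute m/z error in millidaltons to parts per million at a given m/z."""
    return (error_mda * 1e-3) / mz * 1e6


# An absolute error of 0.0025 mDa at an illustrative m/z of 500:
print(f"{mda_to_ppm(0.0025, 500.0):.4f} ppm")  # 0.0050 ppm
```

The same absolute error corresponds to a larger ppm value at low m/z, so the ppm and mDa maxima need not come from the same peak.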

## Size vs mzML

Measured on the same BSA dataset (1,684 spectra):

| Format | MS1 avg (545 peaks) | MS2 avg (109 peaks) |
|---|---|---|
| Raw mzML XML | 12,876 B | 6,004 B |
| **spectrl (lossy)** | **4,241 B** | **1,340 B** |
| spectrl (lossless) | 10,302 B | 1,909 B |
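
For a quick read of the table, the sizes can be expressed as a fraction of the raw mzML size. The numbers below are copied from the table; the helper name is illustrative only.

```python
# Average sizes in bytes, copied from the table above.
sizes = {
    "MS1": {"mzML": 12_876, "lossy": 4_241, "lossless": 10_302},
    "MS2": {"mzML": 6_004, "lossy": 1_340, "lossless": 1_909},
}


def relative_size(level: str, mode: str) -> float:
    """Token size as a fraction of the raw mzML XML size."""
    return sizes[level][mode] / sizes[level]["mzML"]


for level in sizes:
    for mode in ("lossy", "lossless"):
        print(f"{level} {mode}: {relative_size(level, mode):.0%} of mzML")
# MS1 lossy: 33% of mzML
# MS1 lossless: 80% of mzML
# MS2 lossy: 22% of mzML
# MS2 lossless: 32% of mzML
```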

## CLI

```bash
# Encode from JSON
echo '{"mz":[147.0,175.1],"intensity":[1e5,8e4]}' | spectrl encode

# Decode a token
echo "spectrl1.hQ..." | spectrl decode

# Inspect the header as readable JSON
echo "spectrl1.hQ..." | spectrl inspect
```

## Design

- **mzML-faithful** — metadata is carried as CV accession maps (MS: ontology), mirroring mzML `cvParam` semantics. No invented field names.
- **CV binding** — all accession constants come from [mzmlpy](https://github.com/tacular-omics/mzmlpy)'s `StrEnum` classes; no hardcoded integers.
- **Deterministic (within an implementation)** — canonical form (m/z-ascending, fixed numpress scale factors) yields a stable token from a given implementation. A truncated SHA-256 content hash (key 9) is verified on decode as a transport-integrity check. Token bytes are not guaranteed to be identical across implementations, because DEFLATE and msgpack output is not canonical; see [SPECIFICATION.md](SPECIFICATION.md#8-canonical-form-and-content-hash).
- **ProForma** — carries a ProForma 2.0 peptide interpretation string (key 8), the same mechanism used by USI.
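
The integrity check described above can be illustrated with `hashlib`. This is a sketch of the general technique (a truncated SHA-256 recomputed and compared on decode), not the normative procedure: which bytes are hashed and the 8-byte truncation length are assumptions made for this example, and SPECIFICATION.md defines the real ones.

```python
import hashlib

DIGEST_LEN = 8  # assumption for illustration; the specification defines the real length


def content_hash(*segments: bytes) -> bytes:
    """Truncated SHA-256 over the concatenated payload segments (illustrative)."""
    h = hashlib.sha256()
    for seg in segments:
        h.update(seg)
    return h.digest()[:DIGEST_LEN]


def verify(expected: bytes, *segments: bytes) -> bool:
    """Recompute the hash on decode and compare it with the stored value."""
    return expected == content_hash(*segments)


stored = content_hash(b"mz-bytes", b"intensity-bytes")
assert verify(stored, b"mz-bytes", b"intensity-bytes")
assert not verify(stored, b"mz-bytes", b"intensity-bytez")  # corruption is detected
```

A truncated hash of this kind catches accidental corruption in transit; it is not a signature and does not authenticate the sender.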

## Specification

The normative token format is specified in [SPECIFICATION.md](SPECIFICATION.md)
(draft, intended for submission to HUPO-PSI). This README is a tutorial; the
specification is the contract. A machine-readable CV/codec/key registry lives in
[schema/registry.json](schema/registry.json).

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) and the [Code of Conduct](CODE_OF_CONDUCT.md).
Changes to the on-the-wire token format are governed more strictly — see the
*Format changes* section of the contributing guide.

## License

Licensed under the [Apache License 2.0](LICENSE). If you use spectrl in
research, please cite it via [CITATION.cff](CITATION.cff).

## Related

- [mzmlpy](https://github.com/tacular-omics/mzmlpy) — the mzML parser this library bridges from
- [PSI Universal Spectrum Identifier (USI)](https://www.psidev.info/usi) — references spectra in repositories; complementary to spectrl
- [ProForma 2.0](https://www.psidev.info/proforma) — peptidoform notation carried in the token
