Metadata-Version: 2.4
Name: cellarr-se
Version: 0.1.0
Summary: TileDB-backed SummarizedExperiment using cellarr objects
Home-page: https://github.com/CellArr/cellarr-se
Author: chanjd
Author-email: chan.donny1@gmail.com
License: MIT
Project-URL: Source, https://github.com/CellArr/cellarr-se
Project-URL: Tracker, https://github.com/CellArr/cellarr-se/issues
Project-URL: Changelog, https://github.com/CellArr/cellarr-se/blob/main/CHANGELOG.md
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
License-File: LICENSE.txt
Requires-Dist: pandas<3.0
Requires-Dist: numpy
Requires-Dist: cellarr-array
Requires-Dist: cellarr-frame
Requires-Dist: summarizedexperiment
Provides-Extra: testing
Requires-Dist: setuptools; extra == "testing"
Requires-Dist: pytest; extra == "testing"
Requires-Dist: pytest-cov; extra == "testing"
Dynamic: license-file

[![PyPI-Server](https://img.shields.io/pypi/v/cellarr-se.svg)](https://pypi.org/project/cellarr-se/)
[![CI](https://github.com/CellArr/cellarr-se/actions/workflows/run-tests.yml/badge.svg)](https://github.com/CellArr/cellarr-se/actions/workflows/run-tests.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# cellarr-se

`cellarr-se` is a read-only, out-of-core coordinator for TileDB-backed genomic datasets. It wraps the [`cellarr-array`](https://pypi.org/project/cellarr-array/) and [`cellarr-frame`](https://pypi.org/project/cellarr-frame/) primitives into a lazy, [`SummarizedExperiment`](https://pypi.org/project/summarizedexperiment/)-compatible interface, so you can slice large genomics datasets stored on disk without loading them into memory.

Single-cell and bulk RNA-seq datasets frequently exceed available RAM. `cellarr-se` keeps assay matrices and metadata tables on disk as TileDB arrays and reads data only when you request a slice, applying the same row and column selection to every assay and metadata table. The result is always a standard in-memory `SummarizedExperiment` object.
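
The idea of a synchronized slice can be illustrated with a small pure-Python sketch. This is not how `cellarr-se` is implemented: `LazyCoordinator` and `materialize` are hypothetical names, and nested lists stand in for the on-disk TileDB arrays. It only shows one row/column selection being applied consistently to every component at the moment data is requested.

```python
class LazyCoordinator:
    """Toy stand-in for a coordinator over 'on-disk' assays and metadata (hypothetical)."""

    def __init__(self, assays, row_data, col_data):
        self._assays = assays      # name -> 2D list (rows = genes, columns = samples)
        self._row_data = row_data  # one dict of annotations per gene
        self._col_data = col_data  # one dict of annotations per sample
        self.reads = 0             # how many times the backing data was touched

    def materialize(self, rows, cols):
        """Apply the same row and column positions to every component."""
        self.reads += 1
        return {
            "assays": {
                name: [[matrix[r][c] for c in cols] for r in rows]
                for name, matrix in self._assays.items()
            },
            "row_data": [self._row_data[r] for r in rows],
            "col_data": [self._col_data[c] for c in cols],
        }


coordinator = LazyCoordinator(
    assays={"counts": [[1, 2, 3], [4, 5, 6]]},
    row_data=[{"gene": "BRCA1"}, {"gene": "TP53"}],
    col_data=[{"sample": "s1"}, {"sample": "s2"}, {"sample": "s3"}],
)

# Nothing is read until a slice is requested.
subset = coordinator.materialize(rows=[1], cols=[0, 2])
```

Because the assay values and both metadata tables are cut with the same positions, the returned pieces always line up with each other.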

## Install

```bash
pip install cellarr-se
```
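
To confirm the installation, a short check using only the standard library might look like the sketch below. `installed_version` is a hypothetical helper written for this example; the Python version threshold comes from the package's `Requires-Python: >=3.10` metadata.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# cellarr-se declares Requires-Python >= 3.10.
python_ok = sys.version_info >= (3, 10)

print("Python >= 3.10:", python_ok)
print("cellarr-se version:", installed_version("cellarr-se"))
```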

## Usage

### Construction

`CellArraySE` wraps existing TileDB arrays and frames; it does not create them. Use `cellarr-array` and `cellarr-frame` to build the backing stores first.

```python
from cellarr_se import CellArraySE

se = CellArraySE(
    assays={"counts": my_cell_array, "tpm": my_tpm_array},
    row_data=my_row_frame,   # gene annotations (CellArrayFrame)
    col_data=my_col_frame,   # sample annotations (CellArrayFrame)
)
```

### Inspection

```python
se.shape          # (n_genes, n_samples)
se.assay_names    # ["counts", "tpm"]
se.row_names      # pd.Index of gene identifiers
se.col_names      # pd.Index of sample identifiers
se.row_columns    # list of gene metadata fields
se.col_columns    # list of sample metadata fields

se.show()         # print a summary with the first 5 rows of each metadata table
repr(se)          # <CellArraySE: 20000x500 | counts, tpm>
```

### Slicing

Bracket notation accepts integer positions, slices, row or column names, and lists of positions or names:

```python
# Positional slice
subset = se[0:100, 0:50]

# Single row and single column
single = se[5, 3]

# Lists of indices or names
subset = se[["BRCA1", "TP53"], ["sample_001", "sample_042"]]
```
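
One way to think about these index forms is that each is reduced to a list of integer positions before any data is read. The sketch below illustrates that reduction in plain Python. `normalize_index` is a hypothetical helper, not part of the `cellarr-se` API, and its handling of negative integers is a choice made for this example rather than documented library behavior.

```python
def normalize_index(key, names):
    """Convert an int, slice, name, or list of ints/names into integer positions."""
    lookup = {name: pos for pos, name in enumerate(names)}
    n = len(names)

    def one(item):
        if isinstance(item, str):
            if item not in lookup:
                raise KeyError(f"unknown name: {item!r}")
            return lookup[item]
        if isinstance(item, bool) or not isinstance(item, int):
            raise TypeError(f"unsupported index type: {type(item).__name__}")
        pos = item + n if item < 0 else item  # assumption: Python-style negatives
        if not 0 <= pos < n:
            raise IndexError(f"index {item} out of range for length {n}")
        return pos

    if isinstance(key, slice):
        return list(range(*key.indices(n)))
    if isinstance(key, (list, tuple)):
        return [one(item) for item in key]
    return [one(key)]


gene_names = ["BRCA1", "TP53", "EGFR", "MYC"]

by_slice = normalize_index(slice(0, 2), gene_names)
by_name = normalize_index("TP53", gene_names)
by_list = normalize_index(["MYC", "BRCA1"], gene_names)
```

Resolving names to positions up front means the row and column selections can be handed to every assay and metadata table in the same form.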

For attribute-filtered access, use `slice()` with TileDB query strings:

```python
# Filter rows and columns by metadata attributes
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_query="tissue == 'liver'",
)

# Combine a row query with a positional column subset,
# and limit the result to selected assays and row metadata fields
subset = se.slice(
    row_query="gene_type == 'protein_coding'",
    col_subset=slice(0, 50),
    assays=["counts"],
    row_columns=["gene_id", "gene_name"],
)
```
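
When filters come from user input or configuration, the query strings can be assembled programmatically. The sketch below is a hypothetical helper, not part of `cellarr-se`. It assumes that query strings accept equality comparisons in the style shown above and that several of them can be joined with `and`; check the TileDB query condition documentation before relying on that. The `age` attribute is invented for the example.

```python
def build_query(filters):
    """Join attribute equality checks with `and` (hypothetical helper)."""
    clauses = []
    for attr, value in filters.items():
        if isinstance(value, str):
            if "'" in value:
                raise ValueError(f"quote characters are not supported: {value!r}")
            clauses.append(f"{attr} == '{value}'")
        else:
            # Non-string values (e.g. numbers) are written without quotes.
            clauses.append(f"{attr} == {value}")
    return " and ".join(clauses)


row_query = build_query({"gene_type": "protein_coding"})
col_query = build_query({"tissue": "liver", "age": 42})  # `age` is a made-up field
```

The resulting strings could then be passed as `row_query` and `col_query` to `se.slice(...)`.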

Both `se[...]` and `se.slice(...)` return a standard in-memory `SummarizedExperiment`.

### Assay metadata

```python
se.is_sparse("counts")        # True if backed by SparseCellArray
se.get_assay_type("counts")   # numpy dtype of the assay
```

## Demo

A worked example covering construction, inspection, and slicing is available in the [demo notebook](https://cellarr-se.readthedocs.io/en/latest/demo.html).

## Note

This project has been set up using [BiocSetup](https://github.com/biocpy/biocsetup)
and [PyScaffold](https://pyscaffold.org/).
