Metadata-Version: 2.4
Name: zombie-squirrel
Version: 0.28.6
Summary: Generated from aind-library-template
Author: Allen Institute for Neural Dynamics
License: MIT
Classifier: Programming Language :: Python :: 3
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: duckdb
Requires-Dist: pyarrow
Requires-Dist: boto3
Requires-Dist: pandas>=2.2.0
Requires-Dist: aind-data-access-api[docdb]
Dynamic: license-file

# ZOMBIE Squirrel

[![License](https://img.shields.io/badge/license-MIT-brightgreen)](LICENSE)
![Code Style](https://img.shields.io/badge/code%20style-black-black)
[![semantic-release: angular](https://img.shields.io/badge/semantic--release-angular-e10079?logo=semantic-release)](https://github.com/semantic-release/semantic-release)
![Interrogate](https://img.shields.io/badge/interrogate-37.4%25-red)
![Coverage](https://img.shields.io/badge/coverage-24%25-red)
![Python](https://img.shields.io/badge/python->=3.10,<3.14-blue?logo=python)

<img src="zombie-squirrel_logo.png" width="400" alt="Logo (image from ChatGPT)">

`zombie-squirrel` is a set of one-line functions that handle the entire process of caching and retrieving data (and metadata) from AIND data assets.

In the background, the ZOMBIE squirrel repackages data/metadata into dataframes and stores them on S3 in versioned folders (`data-asset-cache/zs-v{version}/`), or in memory for testing. Each release writes to its own versioned folder, so older versions of the website remain accessible while new versions are deployed. A top-level `data-asset-cache/zombie-squirrels.json` index lists all available version folders.

Important: this package has not reached 1.0. It is changing *fast*, and breaking changes still occur, although rarely. To reduce the impact on your code, the cache tables are versioned. This means you need to keep `zombie-squirrel` up to date if you want the latest version of the tables, but it also means your code won't immediately break when I change the way the tables work.

## Installation

```bash
pip install zombie-squirrel
```

## Usage

### Set backend

```bash
export FOREST_TYPE='S3'
```

Options are `'S3'` and `'MEMORY'`.
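The backend can also be selected from Python, which is convenient in tests and notebooks. The snippet below is a minimal sketch that assumes the package reads `FOREST_TYPE` from the process environment when it is imported, so the variable is set before any `zombie_squirrel` import. That timing is an assumption, not something this README confirms.

```python
import os

# Select the in-memory backend for this process.
# Assumption: zombie_squirrel reads FOREST_TYPE at import time, so this
# line should run before `import zombie_squirrel`.
os.environ["FOREST_TYPE"] = "MEMORY"

backend = os.environ["FOREST_TYPE"]
print(f"Using backend: {backend}")
```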

### Scurry (fetch) data

```python
from zombie_squirrel import unique_project_names

project_names = unique_project_names()
```

#### Acorns

`get_squirrel_info` returns the following information about all available acorns. Paths are versioned: `{version}` is the installed `zombie-squirrel` package version (e.g. `0.27.3`).

| Acorn | Description | Location | Type | Partitioned | Columns |
| ----- | ----------- | -------- | ---- | ----------- | ------- |
| `unique_project_names` | Unique project names across all assets | `s3://allen-data-views/data-asset-cache/zs-v{version}/unique_project_names.pqt` | metadata | False | `project_name` |
| `unique_subject_ids` | Unique subject_ids across all assets | `s3://allen-data-views/data-asset-cache/zs-v{version}/unique_subject_ids.pqt` | metadata | False | `subject_id` |
| `unique_genotypes` | Unique genotypes across all assets where `subject.subject_details.genotype` is present | `s3://allen-data-views/data-asset-cache/zs-v{version}/unique_genotypes.pqt` | metadata | False | `genotype` |
| `asset_basics` | Commonly used asset metadata, one row per data asset | `s3://allen-data-views/data-asset-cache/zs-v{version}/asset_basics.pqt` | metadata | False | `_id`, `_last_modified`, `modalities`, `project_name`, `data_level`, `subject_id`, `acquisition_start_time`, `acquisition_end_time`, `code_ocean`, `process_date`, `genotype`, `age`, `acquisition_type`, `location`, `name`, `experimenters`, `experimenters_normalized`, `instrument_id`, `instrument_id_normalized`, `investigators`, `investigators_normalized` |
| `source_data` | Mapping from derived asset names to their source raw asset names | `s3://allen-data-views/data-asset-cache/zs-v{version}/source_data.pqt` | metadata | False | `name`, `source_data`, `pipeline_name`, `processing_time` |
| `quality_control` | Quality control table with one row per QC metric, partitioned by subject_id | `s3://allen-data-views/data-asset-cache/zs-v{version}/qc/` | asset | True (by `subject_id`) | `name`, `stage`, `modality`, `value`, `status`, `asset_name` |
| `platform_qc` | Tag-level QC statuses aggregated per platform, one row per asset/tag combination | `s3://allen-data-views/data-asset-cache/zs-v{version}/platform_qc/` | platform | True (by `platform`) | `asset_name`, `tag`, `status`, `timestamp`, `instrument_id_normalized`, `experimenters_normalized` |
| `assets_smartspim` | SmartSPIM assets with processing status and neuroglancer links, one row per (asset, channel) | `s3://allen-data-views/data-asset-cache/zs-v{version}/assets_smartspim.pqt` | metadata | False | `name`, `raw_name`, `processed`, `institution`, `processing_end_time`, `stitched_link`, `raw_link`, `channel`, `segmentation_link`, `quantification_link`, `alignment_link` |
| `platform_fib` | Fiber photometry assets in long form, one row per asset/fiber/channel combination | `s3://allen-data-views/data-asset-cache/zs-v{version}/platform_fib.pqt` | metadata | False | `asset_name`, `fiber`, `patch_cord`, `channel`, `intended_measurement`, `targeted_structure` |
| `foraging_sessions` | Foraging behavior sessions with key performance metrics, one row per session | `s3://allen-data-views/data-asset-cache/zs-v{version}/foraging_sessions.pqt` | metadata | False | `subject_id`, `session_date`, `session`, `nwb_suffix`, `rig`, `trainer`, `trainer_normalized`, `task`, `curriculum_name`, `curriculum_version`, `current_stage_actual`, `foraging_eff`, `foraging_eff_random_seed`, `finished_trials`, `finished_rate`, `total_trials`, `bias_naive` |
| `behavior_curriculum` | Behavior assets with curriculum name and stage, one row per behavior asset | `s3://allen-data-views/data-asset-cache/zs-v{version}/behavior_curriculum.pqt` | asset | False | `asset_name`, `curriculum_name`, `stage_name`, `stage_node_id` |

Hive-partitioned acorns use `key=value` directory segments, enabling DuckDB queries like:

```python
import duckdb
duckdb.query("""
    SELECT * FROM read_parquet(
        's3://allen-data-views/data-asset-cache/zs-v0.27.3/qc/**',
        hive_partitioning=true,
        union_by_name=true
    )
""")
```
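Because the `quality_control` acorn is partitioned by `subject_id`, a filter on that column should let DuckDB skip unrelated partition folders. The helper below is a hypothetical sketch (`qc_query_for_subject` is not part of `zombie-squirrel`) that only builds the SQL string. The subject ID in the usage line is a made-up placeholder.

```python
def qc_query_for_subject(version: str, subject_id: str) -> str:
    """Build a DuckDB query for one subject's QC rows (illustrative helper)."""
    # Reject anything that is not a plain alphanumeric ID, since the value
    # is interpolated directly into the SQL text.
    if not subject_id.isalnum():
        raise ValueError(f"unexpected subject_id: {subject_id!r}")
    path = f"s3://allen-data-views/data-asset-cache/zs-v{version}/qc/**"
    return (
        "SELECT * FROM read_parquet("
        f"'{path}', hive_partitioning=true, union_by_name=true) "
        f"WHERE subject_id = '{subject_id}'"
    )


# Placeholder subject ID; pass the resulting string to duckdb.query(...).
sql = qc_query_for_subject("0.27.3", "123456")
print(sql)
```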

The `squirrel.json` registry lives at `s3://allen-data-views/data-asset-cache/zs-v{version}/squirrel.json`. The top-level `s3://allen-data-views/data-asset-cache/zombie-squirrels.json` lists all available version folders as a JSON array.
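The path layout described above can be sketched as a few string helpers. These functions are hypothetical and not part of the `zombie-squirrel` API. They reproduce the documented locations, and `acorn_path` applies only to unpartitioned acorns stored as a single `.pqt` file.

```python
BUCKET = "allen-data-views"
CACHE_ROOT = "data-asset-cache"

# Top-level index listing all available version folders.
INDEX_PATH = f"s3://{BUCKET}/{CACHE_ROOT}/zombie-squirrels.json"


def version_prefix(version: str) -> str:
    """Return the versioned cache folder for a package version."""
    return f"s3://{BUCKET}/{CACHE_ROOT}/zs-v{version}/"


def acorn_path(version: str, acorn: str) -> str:
    """Return the parquet path of an unpartitioned acorn."""
    return f"{version_prefix(version)}{acorn}.pqt"


def registry_path(version: str) -> str:
    """Return the path of the per-version squirrel.json registry."""
    return f"{version_prefix(version)}squirrel.json"


print(acorn_path("0.27.3", "asset_basics"))
print(registry_path("0.27.3"))
```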

The `raw_to_derived` function is not backed by a table stored in S3. Instead, you call it with an asset name (or a list of asset names) and a modality, and it returns the latest derived asset matching the requested pattern.

### Custom acorn

The `custom` function allows you to store and retrieve your own user-defined DataFrames in the cache by name. This requires write authentication to the active backend.

```python
from zombie_squirrel import custom
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3]})
custom("my_data", df)

retrieved_df = custom("my_data")
```

### Hide all the acorns

We run a nightly capsule on Code Ocean with this code to hide all acorns (not the custom ones).

```python
from zombie_squirrel.sync import hide_acorns
hide_acorns()
```
