Metadata-Version: 2.4
Name: radiens-drive-catalog
Version: 0.0.8
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: google-api-python-client>=2.188.0
Requires-Dist: google-auth-oauthlib>=1.2.4
Requires-Dist: google-auth>=2.47.0
Requires-Dist: pandas>=2.3.3
Requires-Dist: tqdm>=4.67.3
Description-Content-Type: text/markdown

# radiens-drive-catalog

A Python package for programmatically managing large neural datasets stored on Google Drive. It handles Drive scanning, local cataloging, and selective dataset download. Analysis is done locally — this package is purely about data management.

Documentation: <https://neuronexus.github.io/radiens-drive-catalog/latest/>

## Overview

Neural data is stored as xdat filesets (NeuroNexus format) on a shared Google Drive. Each dataset consists of 3 files sharing a common `base_name`:

```
{base_name}_data.xdat
{base_name}.xdat.json
{base_name}_timestamp.xdat
```

`radiens-drive-catalog` scans the Drive hierarchy, builds a local catalog indexed by `base_name`, and lets you query and download datasets selectively. Non-xdat content found alongside datasets — logs directories, PowerPoints, writeups — is also discovered and tracked as **assets**.

## Usage

### Datasets

```python
from radiens_drive_catalog import Catalog, Config

config = Config.from_file("config.json")
catalog = Catalog(config)

# Scan Drive and build the catalog (discovers datasets and assets)
catalog.scan()

# Query datasets using pandas directly
catalog.df
catalog.list()                                        # everything
catalog.list(drive_path="2026-02-15_batch/reaching")  # exact folder
catalog.list(drive_path_prefix="2026-02-15_batch")    # full date subtree
catalog.list(drive_path_contains="reaching")          # any depth

# Download a dataset (3 xdat files)
catalog.download("2026-02-15_batch/reaching", "rat01_session3")

# Get the local path, downloading automatically if needed
path = catalog.get_path("2026-02-15_batch/reaching", "rat01_session3")
```
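Because `catalog.df` is a plain pandas DataFrame, standard pandas filtering works alongside `list()`. The snippet below is a hypothetical illustration: the column names (`drive_path`, `base_name`) are assumptions, not documented API.

```python
import pandas as pd

# Stand-in for catalog.df; real columns may differ.
df = pd.DataFrame(
    [
        {"drive_path": "2026-02-15_batch/reaching", "base_name": "rat01_session3"},
        {"drive_path": "2026-02-15_batch/grasping", "base_name": "rat02_session1"},
    ]
)

# Roughly equivalent to catalog.list(drive_path_contains="reaching"):
hits = df[df["drive_path"].str.contains("reaching")]
print(hits["base_name"].tolist())  # ['rat01_session3']
```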

### Assets (non-xdat content)

Non-xdat files and folders (e.g. `logs/`, PowerPoints, writeups) found inside experiment folders are automatically cataloged as assets during `scan()`.

```python
# Query assets using pandas directly
catalog.assets_df
catalog.assets_df[catalog.assets_df["drive_path"].str.startswith("2026-02-15_batch")]
catalog.assets_df[catalog.assets_df["asset_type"] == "folder"]

# Download an asset (drive_path is the slash-joined path to the asset's parent folder)
catalog.download_asset("2026-02-15_batch/reaching", "logs")

# Get the local path, downloading automatically if needed
path = catalog.get_asset_path("2026-02-15_batch/reaching", "logs")
```

Assets land under `local_data_dir/assets/{drive_path}/{asset_name}`, separate from the path-mirrored xdat dataset files.

## Configuration

Create a `config.json` (outside your repo — do not commit it):

```json
{
    "credentials_path": "/path/to/service_account.json",
    "root_folder_id": "your-drive-folder-id",
    "local_data_dir": "/path/to/local/data",
    "catalog_path": "/path/to/local/data/catalog.json"
}
```

`Config.from_file()` locates the config file using this resolution order:

1. Explicit `path` argument.
2. `RADIENS_DRIVE_CATALOG_CONFIG` environment variable.
3. `.secrets/config.json` in the current working directory.
4. `config.json` in the current working directory.
5. `~/.config/radiens-drive/config.json`.
6. `/etc/radiens-drive/config.json`.

```python
# Automatic discovery (env var or well-known paths)
config = Config.from_file()

# Explicit path
config = Config.from_file("/path/to/config.json")
```

The `root_folder_id` is the final path segment of the Drive URL when you're inside the root data folder: `https://drive.google.com/drive/folders/<root_folder_id>`.

## Authentication

This package uses a Google service account for shared access among collaborators. To set it up:

1. Create a project in [Google Cloud Console](https://console.cloud.google.com)
2. Enable the Google Drive API
3. Create a service account and download its JSON credentials file
4. Share your root Drive data folder with the service account's email address (Viewer access is sufficient)
5. Point `credentials_path` in your config at the downloaded JSON file

Distribute the credentials file to collaborators securely — treat it like a password.

## Installation

This project uses `uv` for dependency management. If you don't have it:

**macOS / Linux:**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows:**

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Then install the project:

```bash
uv sync
```

## Development

```bash
uv run pytest          # run tests
uv run mypy .          # type checking
uv run ruff check .    # linting
uv run ruff format .   # formatting
```
