Metadata-Version: 2.4
Name: radiens-drive-catalog
Version: 0.0.1
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: google-api-python-client>=2.188.0
Requires-Dist: google-auth-oauthlib>=1.2.4
Requires-Dist: google-auth>=2.47.0
Requires-Dist: pandas>=2.3.3
Description-Content-Type: text/markdown

# radiens-drive-catalog

A Python package for programmatically managing large neural datasets stored on Google Drive. It handles Drive scanning, local cataloging, and selective dataset download. Analysis is done locally — this package is purely about data management.

## Overview

Neural data is stored as xdat filesets (NeuroNexus format) on a shared Google Drive. Each dataset consists of three files sharing a common `base_name`:

```
{base_name}_data.xdat
{base_name}.xdat.json
{base_name}_timestamp.xdat
```

`radiens-drive-catalog` scans the Drive hierarchy, builds a local catalog indexed by `base_name`, and lets you query and download datasets selectively.
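The three-file convention means a scan can group a flat Drive listing into datasets by stripping the known suffixes. A minimal sketch of that grouping idea (the `group_filesets` helper and its behavior for incomplete filesets are illustrative, not part of the package API):

```python
from collections import defaultdict

# The three suffixes that make up one xdat fileset (see the layout above).
SUFFIXES = ("_data.xdat", ".xdat.json", "_timestamp.xdat")

def group_filesets(filenames):
    """Group a flat list of filenames into {base_name: [files]}."""
    groups = defaultdict(list)
    for name in filenames:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                groups[name[: -len(suffix)]].append(name)
                break
    # Keep only complete datasets: all three files present.
    return {base: files for base, files in groups.items() if len(files) == 3}

files = [
    "rat01_session3_data.xdat",
    "rat01_session3.xdat.json",
    "rat01_session3_timestamp.xdat",
    "rat02_session1_data.xdat",  # incomplete: its two companions are missing
]
print(sorted(group_filesets(files)))  # → ['rat01_session3']
```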

## Usage

```python
from radiens_drive_catalog import Catalog, Config

config = Config.from_file("config.json")
catalog = Catalog(config)

# Scan Drive and build the catalog
catalog.scan()

# Query datasets
catalog.list()                                      # everything
catalog.list(date="2026-02-15_batch")               # all datasets in a date folder
catalog.list(date="2026-02-15_batch", experiment="reaching")  # narrowed to an experiment

# Access the raw DataFrame
catalog.df

# Check what's available locally
catalog.status()

# Download a dataset
catalog.download("rat01_session3")

# Get the local path, downloading automatically if needed
path = catalog.get_path("rat01_session3")
```

## Configuration

Create a `config.json` (outside your repo — do not commit it):

```json
{
    "credentials_path": "/path/to/service_account.json",
    "root_folder_id": "your-drive-folder-id",
    "local_data_dir": "/path/to/local/data",
    "catalog_path": "/path/to/local/data/catalog.json"
}
```

Or set a single environment variable pointing at the config file, and call `from_file()` with no arguments:

```bash
export RADIENS_DRIVE_CATALOG_CONFIG=/path/to/config.json
```

```python
config = Config.from_file()
```

The `root_folder_id` is the alphanumeric string at the end of the Drive URL when you're inside the root data folder, i.e. the `<folder-id>` part of `https://drive.google.com/drive/folders/<folder-id>`.
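If you prefer to extract it programmatically, a small helper can pull the ID out of a pasted folder URL (this `folder_id_from_url` function is illustrative, not part of the package):

```python
from urllib.parse import urlparse

def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a Drive folder URL.

    Folder URLs look like:
    https://drive.google.com/drive/folders/<folder-id>?usp=sharing
    """
    path = urlparse(url).path  # the query string (e.g. ?usp=sharing) is dropped
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[-2] != "folders":
        raise ValueError(f"not a Drive folder URL: {url}")
    return parts[-1]

url = "https://drive.google.com/drive/folders/1AbC_dEfGhIjKlMnOp?usp=sharing"
print(folder_id_from_url(url))  # → 1AbC_dEfGhIjKlMnOp
```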

## Authentication

This package uses a Google service account for shared access among collaborators. To set it up:

1. Create a project in [Google Cloud Console](https://console.cloud.google.com)
2. Enable the Google Drive API
3. Create a service account and download its JSON credentials file
4. Share your root Drive data folder with the service account's email address (Viewer access is sufficient)
5. Point `credentials_path` in your config at the downloaded JSON file

Distribute the credentials file to collaborators securely — treat it like a password.

## Installation

This project uses `uv` for dependency management. If you don't have it:

**macOS / Linux:**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows:**

```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Then install the project:

```bash
uv sync
```

## Development

```bash
uv run pytest          # run tests
uv run mypy            # type checking
uv run ruff check .    # linting
uv run ruff format .   # formatting
```
