Metadata-Version: 2.4
Name: physoce-datasets
Version: 0.1.0
Summary: A Python package and CLI for downloading various physical oceanographic datasets.
Requires-Python: >=3.12
Requires-Dist: cdsapi>=0.7.7
Requires-Dist: cfgrib>=0.9.15.1
Requires-Dist: click>=8.2.1
Requires-Dist: copernicusmarine>=2.3.0
Requires-Dist: ecmwf-datastores-client>=0.5.1
Requires-Dist: gcsfs>=2026.3.0
Requires-Dist: h5py>=3.16.0
Requires-Dist: ipykernel>=7.2.0
Requires-Dist: metpy>=1.7.1
Requires-Dist: numpy>=2.4.4
Requires-Dist: pycoare>=0.4.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: xarray[accel,io,parallel]>=2026.2.0
Description-Content-Type: text/markdown

# physoce-datasets

A Python package and CLI for downloading various physical oceanographic datasets.

## Install

Clone the package and navigate to the directory:

```bash
git clone https://github.com/andrew-s28/physoce-datasets.git
cd physoce-datasets
```

### uv

`uv` is the preferred way to manage this package; see the [uv installation instructions](https://docs.astral.sh/uv/getting-started/installation/). Once `uv` is installed, you can either create the virtual environment up front with `uv sync`, or run commands directly with `uv run`, which handles creating and activating the `venv` auto-magically.

### pip

Of course, you can also use the classic `pip`, but you have to handle creating and activating the `venv` yourself:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

## Credentials

When you run a command, this package may prompt for credentials for the relevant data store. Refer to each provider's documentation for how to obtain credentials and where they are stored:

- [Copernicus Marine Services](https://toolbox-docs.marine.copernicus.eu/en/stable/usage/login-usage.html)
- [Copernicus Climate Data Store](https://cds.climate.copernicus.eu/how-to-api)

## CLI usage

Run the CLI with uv:

```bash
uv run datasets.py --help
```

If you manage the environment with `pip` instead, activate it as described above and replace `uv run` with `python` in every command, e.g.:

```bash
python datasets.py --help
```

Available commands:

- `eke`: Download altimetry-derived geostrophic velocities and compute eddy kinetic energy from Copernicus Marine Services.
- `era5`: Command group for the ERA5 workflow, with `submit` (queue remote requests) and `download` (fetch and process completed requests) subcommands.
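Eddy kinetic energy is conventionally defined from velocity anomalies as `EKE = 0.5 * (u'^2 + v'^2)`. The minimal NumPy sketch below assumes axis 0 is time and takes anomalies relative to the time mean at each grid point; it illustrates the formula and is not necessarily how the `eke` command computes EKE. The function name `eddy_kinetic_energy` is hypothetical.

```python
import numpy as np


def eddy_kinetic_energy(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Hypothetical helper: EKE = 0.5 * (u'**2 + v'**2).

    Assumes axis 0 is time; anomalies are taken relative to the time mean,
    ignoring NaNs (e.g. land or missing altimetry cells).
    """
    # Velocity anomalies relative to the time mean at each grid point
    u_anom = u - np.nanmean(u, axis=0, keepdims=True)
    v_anom = v - np.nanmean(v, axis=0, keepdims=True)
    # Kinetic energy per unit mass of the anomalies (m^2 s^-2 if u, v are in m s^-1)
    return 0.5 * (u_anom**2 + v_anom**2)


# Example: two time steps at a single grid point
u = np.array([[0.1], [0.3]])  # m/s
v = np.array([[0.0], [0.0]])  # m/s
print(eddy_kinetic_energy(u, v))
```

Using the time mean as the reference is one common choice; other baselines (e.g. a climatology) give different anomalies.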

Show command help:

```bash
uv run datasets.py eke --help
```

### Options

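All commands share the following base options (some subcommands accept additional options, such as `--max-active-requests` for `era5 submit`):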

- `--save-dir`: Directory where the dataset file is written. If not set, defaults to the package `data/` directory.
- `--start-date`: Start date in `YYYY-MM-DD` format. If not set, uses the earliest available date.
- `--end-date`: End date in `YYYY-MM-DD` format. If not set, uses the latest available date.

Run the `eke` download with the default options:

```bash
uv run datasets.py eke
```

Submit ERA5 jobs with explicit options:

```bash
uv run datasets.py era5 submit --save-dir data --start-date 2020-01-01 --end-date 2020-01-31 --max-active-requests 10
```
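The `--max-active-requests` option presumably caps how many requests are in flight on the remote service at once. The sketch below shows that general throttling pattern only; `submit_throttled`, `submit`, and `count_active` are hypothetical names, and the real command may poll and submit differently.

```python
import time
from collections.abc import Callable, Iterable


def submit_throttled(
    jobs: Iterable[str],
    submit: Callable[[str], None],
    count_active: Callable[[], int],
    max_active_requests: int = 10,
    poll_seconds: float = 60.0,
) -> None:
    """Hypothetical sketch: submit jobs while keeping at most `max_active_requests` active."""
    for job in jobs:
        # Block until the remote queue reports a free slot
        while count_active() >= max_active_requests:
            time.sleep(poll_seconds)
        submit(job)


# Example with an in-memory stand-in for the remote service (job names are illustrative)
submitted: list[str] = []
submit_throttled(["2020-01", "2020-02"], submitted.append, count_active=lambda: 0, poll_seconds=0)
print(submitted)
```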

Once all remote jobs have completed successfully, download and process the ERA5 files:

```bash
uv run datasets.py era5 download --save-dir data
```

The ERA5 workflow writes its progress to `submitted_requests.csv` inside the save directory. If the script exits or is interrupted, rerun the same command to resume from the saved state.
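Resuming works because finished work is checkpointed to disk and skipped on the next run. The standard-library sketch below illustrates that general pattern; the column names `request_id` and `status` and the `"successful"` status value are assumptions for illustration, not the actual schema of `submitted_requests.csv`.

```python
import csv
import tempfile
from pathlib import Path


def load_state(state_file: Path) -> dict[str, str]:
    """Read a hypothetical request_id -> status mapping; empty if no state exists yet."""
    if not state_file.exists():
        return {}
    with state_file.open(newline="") as f:
        return {row["request_id"]: row["status"] for row in csv.DictReader(f)}


def save_state(state_file: Path, state: dict[str, str]) -> None:
    """Rewrite the whole state file so a rerun sees every recorded request."""
    with state_file.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["request_id", "status"])
        writer.writeheader()
        writer.writerows({"request_id": k, "status": v} for k, v in state.items())


def pending(all_ids: list[str], state: dict[str, str]) -> list[str]:
    """Requests not yet recorded, i.e. the ones a resumed run still needs to submit."""
    return [i for i in all_ids if i not in state]


def ready_to_download(state: dict[str, str]) -> bool:
    """True only when every recorded request has the (assumed) 'successful' status."""
    return bool(state) and all(s == "successful" for s in state.values())


# Example: simulate an interrupted run, then resume
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "submitted_requests.csv"
    save_state(state_file, {"2020-01": "running"})  # first run stopped here
    state = load_state(state_file)
    todo = pending(["2020-01", "2020-02"], state)
    print(todo)  # ['2020-02']
```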
