Metadata-Version: 2.4
Name: aef-download
Version: 0.1.0
Summary: Download yearly AlphaEarth embeddings for arbitrary regions.
Author-email: JunCao <caojun@whu.edu.cn>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Caojun-whu/aef-download
Project-URL: Documentation, https://github.com/Caojun-whu/aef-download#readme
Project-URL: Repository, https://github.com/Caojun-whu/aef-download
Project-URL: Source, https://github.com/Caojun-whu/aef-download
Project-URL: Issues, https://github.com/Caojun-whu/aef-download/issues
Project-URL: Changelog, https://github.com/Caojun-whu/aef-download/blob/main/CHANGELOG.md
Keywords: alphaearth,earth observation,geospatial,remote sensing,zarr
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: geopandas>=0.14
Requires-Dist: zarr>=3.0.0
Requires-Dist: affine>=2.4
Requires-Dist: obstore>=0.4
Requires-Dist: rasterio>=1.3
Requires-Dist: shapely>=2.0
Requires-Dist: tqdm>=4.66
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Dynamic: license-file

# aef-download

`aef-download` is a Python package and CLI for downloading yearly
**AlphaEarth Foundations** embeddings for arbitrary regions from the public
AlphaEarth mosaic hosted on Source Cooperative.

It is designed for workflows where you want to:

- download one or more years for a custom region
- use a boundary file or a direct bbox
- write local yearly `Zarr` stores
- optionally mask pixels outside the requested polygon
- resume interrupted downloads safely

## Features

- boundary files readable by `geopandas` (`GeoJSON`, `GPKG`, `SHP`, and more)
- direct bbox input in `EPSG:4326`
- yearly downloads for `2017-2025`
- local yearly `Zarr` outputs
- optional boundary mask
- optional dequantized `float32` output
- resumable downloads via `.progress.json`
- both CLI and Python API

## Install

### From PyPI

After the package is published on PyPI:

```bash
pip install aef-download
```

### From GitHub

```bash
pip install git+https://github.com/Caojun-whu/aef-download.git
```

### From a local clone

```bash
pip install -e .
```

## Data source

The package reads from the public AlphaEarth mosaic:

- [https://source.coop/tge-labs/aef-mosaic](https://source.coop/tge-labs/aef-mosaic)

This package downloads **regional subsets** from that public mosaic. It does
not download the full global dataset.

By default, downloaded embeddings are stored as quantized `int8`. Pass
`dequantize=True` (Python API) or `--dequantize` (CLI) to write `float32`
outputs instead.

## CLI quickstart

Download one year using a boundary file:

```bash
aef-download download-year \
  --boundary region.geojson \
  --year 2022 \
  --output-dir ./out \
  --mask-boundary
```

Download multiple years:

```bash
aef-download download-years \
  --boundary region.geojson \
  --year-start 2017 \
  --year-end 2025 \
  --output-dir ./out \
  --mask-boundary
```

Download using a bbox:

```bash
aef-download download-year \
  --bbox 112.95 22.56 114.05 23.93 \
  --year 2022 \
  --output-dir ./out
```

Estimate storage without downloading:

```bash
aef-download estimate-years \
  --boundary region.geojson \
  --year-start 2017 \
  --year-end 2025
```

## Python API quickstart

```python
from aef_download import download_year, estimate_year

estimate = estimate_year(
    boundary_path="region.geojson",
    year=2022,
)
print(estimate["estimated_storage"])

output = download_year(
    boundary_path="region.geojson",
    year=2022,
    output_dir="./out",
    mask_boundary=True,
)
print(output)
```

Using a bbox:

```python
from aef_download import download_year

output = download_year(
    bbox=(112.95, 22.56, 114.05, 23.93),
    year=2022,
    output_dir="./out",
)
print(output)
```

## Output format

Each year is written as:

- `{prefix}_{year}.zarr`
- `{prefix}_{year}.zarr.progress.json`
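
If a script needs to know where a given year will land, the naming pattern
above can be reproduced with `pathlib`. This is a small sketch: the
`expected_outputs` helper and the `region` prefix are hypothetical and are not
part of the package API.

```python
from pathlib import Path


def expected_outputs(output_dir, prefix, year):
    """Return (zarr_store_path, progress_file_path) for one year."""
    store = Path(output_dir) / f"{prefix}_{year}.zarr"
    # The progress file sits next to the store, with a suffix appended.
    progress = store.with_name(store.name + ".progress.json")
    return store, progress


store, progress = expected_outputs("./out", "region", 2022)
print(store.name)     # region_2022.zarr
print(progress.name)  # region_2022.zarr.progress.json
```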

The Zarr store contains:

- `embeddings`
- `x`
- `y`
- `mask` when `mask_boundary=True`

Typical Zarr attributes include:

- `year`
- `bbox`
- `boundary_source`
- `mask_boundary`
- `dequantized`
- source mosaic settings

## Resume behavior

Downloads are written chunk by chunk. If interrupted, rerun the same command
without `--overwrite`; completed chunks will be skipped and the download will
resume from the progress file.
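
The general pattern behind this kind of resume can be sketched as follows.
This is an illustration, not the package's implementation: the
`{"done": [...]}` layout of the progress file, the chunk IDs, and the
`fetch_chunk` callback are all assumptions.

```python
import json
from pathlib import Path


def load_done(progress_path: Path) -> set[str]:
    """Return chunk IDs already recorded as complete (empty if no file)."""
    if not progress_path.exists():
        return set()
    return set(json.loads(progress_path.read_text())["done"])


def mark_done(progress_path: Path, done: set[str]) -> None:
    """Write via a temp file and rename, so a crash cannot truncate the record."""
    tmp = progress_path.with_name(progress_path.name + ".tmp")
    tmp.write_text(json.dumps({"done": sorted(done)}))
    tmp.replace(progress_path)


def run(chunk_ids, fetch_chunk, progress_path: Path) -> list[str]:
    """Fetch every chunk not yet recorded; return the IDs fetched in this run."""
    done = load_done(progress_path)
    fetched = []
    for chunk_id in chunk_ids:
        if chunk_id in done:
            continue  # completed in an earlier run
        fetch_chunk(chunk_id)  # may raise; nothing is recorded in that case
        done.add(chunk_id)
        mark_done(progress_path, done)  # record only after the chunk succeeds
        fetched.append(chunk_id)
    return fetched
```

Recording a chunk only after it has been fetched means an interrupted run
repeats at most the chunk that was in flight.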

## Notes

- Supported years are `2017-2025`.
- Bboxes are interpreted as `(xmin, ymin, xmax, ymax)` in `EPSG:4326`.
- Boundary files are normalized to `EPSG:4326` before slicing the AlphaEarth
  mosaic.
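
Swapped latitude and longitude are a common source of bbox errors, so it can
help to validate inputs before starting a download. The helpers below are a
hypothetical sketch and are not part of the package; they reject bboxes that
cross the antimeridian, which the package may or may not support.

```python
def validate_bbox(bbox):
    """Check an (xmin, ymin, xmax, ymax) bbox given in EPSG:4326 degrees."""
    if len(bbox) != 4:
        raise ValueError("bbox must have four values: xmin, ymin, xmax, ymax")
    xmin, ymin, xmax, ymax = (float(v) for v in bbox)
    if not (-180 <= xmin < xmax <= 180):
        raise ValueError("expected -180 <= xmin < xmax <= 180 (longitude)")
    if not (-90 <= ymin < ymax <= 90):
        raise ValueError("expected -90 <= ymin < ymax <= 90 (latitude)")
    return xmin, ymin, xmax, ymax


def validate_year(year, first=2017, last=2025):
    """Check that a year falls inside the supported range."""
    if not (first <= year <= last):
        raise ValueError(f"year must be between {first} and {last}")
    return year


print(validate_bbox((112.95, 22.56, 114.05, 23.93)))

try:
    # Latitude and longitude swapped: 112.95 is not a valid latitude.
    validate_bbox((22.56, 112.95, 23.93, 114.05))
except ValueError as exc:
    print("rejected:", exc)
```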

## Development

Run tests:

```bash
python -m pytest tests -q
```

Check the CLI:

```bash
aef-download --help
```

Build a release:

```bash
python -m build
python -m twine check dist/*
```
