Metadata-Version: 2.4
Name: olmoearth-pretrain
Version: 0.0.2
Summary: A library for developing earth system foundation models
Author: OlmoEarth Team
License: OlmoEarth Artifact License
        
        By exercising the rights granted to you under this OlmoEarth Artifact License
        ("Agreement"), you accept and agree to its terms and conditions and enter into this
        Agreement with The Allen Institute for Artificial Intelligence ("Ai2"). All references
        to "you" herein means both an individual and legal entity that an individual is acting
        on behalf of.
        
        Subject to your compliance with this Agreement, Ai2 grants you permission, free of
        charge, to use the machine learning artifacts, materials, and documentation provided by
        Ai2 under this Agreement as follows (collectively, "Artifacts"):
        
        - Model weights, including architecture and parameters ("Model");
        - Associated dataset or collection of data in connection with the Model ("Dataset"); and/or
        - Associated software to process and run the Dataset and Model, including code in
          source or binary form for training, inference, and evaluation ("Code").
        
        1. Use Rights
        
        Subject to the terms in Section 2 and 3 below, you may:
        
        (a) Use, reproduce, modify, display and distribute the Artifacts, in whole or in part;
        (b) Create any other machine learning models, datasets, and derivative works that are
            derived from or based on the Artifacts, including by (i) transfer of patterns of
            the weights, parameters, operations, or outputs of the Model, (ii) generating
            outputs of the Model to produce synthetic data, and (iii) using Code to prepare any
            work of authorship (collectively, "Derivatives"); and
        (c) Publish and share Derivatives.
        
        2. Use Restrictions
        
        You will not (and will not encourage, permit, or facilitate any others to) use any
        portion of the Artifacts or Derivatives for the following purposes:
        
        (a) Any military and defense-related applications and use cases, including without
            limitation, for weapons development, military operations, intelligence gathering,
            or human surveillance and policing activities.
        (b) Any extractive activities, operations and use cases involving the removal of raw
            materials from the earth, including without limitation, to plan or facilitate the
            extraction of oil, natural gas and minerals through activities such as drilling,
            mining, and deforestation.
        
        3. Distribution
        
        In any distribution of the Artifacts or Derivatives:
        
        (a) You will cite Ai2 as the source of the Artifacts in any distribution of Artifacts
            or Derivatives.
        (b) If you distribute any portion of the Artifacts, you will either link to or provide
            a copy of this Agreement to all third party recipients.
        (c) If you distribute any Derivatives, you may add your own intellectual property
            notices and apply other licenses and terms of use, provided that you include and
            require the use restrictions in Section 2 in all downstream distribution unless Ai2
            provides written approval otherwise.
        
        4. Termination
        
        This Agreement will automatically terminate with immediate effect and without notice to
        you in the following circumstances:
        
        (a) Your breach of the use restrictions in Section 2 and any other terms and conditions
            herein; or
        (b) If you file, maintain, or voluntarily participate in a lawsuit against any person
            or entity asserting that the Artifacts or any portion thereof directly or
            indirectly infringe any patent, except where a lawsuit is filed in response to a
            corresponding lawsuit first brought against you.
        
        For the avoidance of doubt, Ai2 may also offer the Artifacts under separate terms and
        conditions or stop distributing the Artifacts at any time; however, doing so will not
        terminate this Agreement. You may continue to use the Artifacts under this Agreement
        unless it is terminated in accordance with the circumstances expressly stated herein.
        
        5. Rights Not Covered
        
        This Agreement does not cover any patents or trademarks associated with the Artifacts,
        including with respect to any individual items of information and materials that are
        included or incorporated within a Dataset ("Contents"). Such Contents may be factual
        data or independent works such as text, images, audio, and audio visual material.
        Contents may be subject to other rights, including copyright, patent, data protection,
        privacy, or personality rights, and this Agreement does not cover such rights. The use
        rights in Section 1 expressly exclude any and all other rights that may apply to the
        Contents of a Dataset.
        
        6. Disclaimer and Limitation of Liability
        
        (a) THE ARTIFACTS ARE PROVIDED "AS IS", "AS AVAILABLE", AND "WITH ALL FAULTS", WITHOUT
            ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, IMPLIED
            WARRANTIES OF MERCHANTABILITY, TITLE, NON-INFRINGEMENT, FITNESS FOR A PARTICULAR
            PURPOSE, ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OR ABSENCE
            OF ERRORS, WHETHER OR NOT KNOWN OR DISCOVERABLE. AI2 MAKES NO REPRESENTATIONS OR
            WARRANTIES AS TO THE RELIABILITY, COMPLETENESS, QUALITY, PERFORMANCE,
            FUNCTIONALITY, OR UTILITY OF ANY ARTIFACTS. ANY USE OF THE ARTIFACTS IS AT YOUR
            SOLE RISK AND DISCRETION. YOU ARE SOLELY RESPONSIBLE FOR (1) CLEARING ANY THIRD
            PARTY RIGHTS THAT MAY APPLY TO OR BE EMBODIED IN ANY ARTIFACTS, INCLUDING ANY
            CONTENTS IN A DATASET; (2) OBTAINING ANY NECESSARY RIGHTS, LICENSES, CONSENTS, OR
            PERMISSIONS REQUIRED FOR YOUR USE OF THE ARTIFACTS; AND (3) PERFORMING ANY DUE
            DILIGENCE ON THE ARTIFACTS TO VERIFY SUITABILITY FOR YOUR INTENDED USE.
        
        (b) TO THE MAXIMUM EXTENT PERMITTED UNDER APPLICABLE LAWS, IN NO EVENT WILL AI2 BE
            LIABLE FOR ANY CLAIM, DAMAGES, LOSSES, OR OTHER LIABILITY OF ANY KIND WHATSOEVER,
            WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM OR IN CONNECTION
            WITH THE ARTIFACTS OR YOUR USE THEREOF.
        
        (c) The disclaimers and limitations of liability set forth above will be interpreted in
            a manner that, to the greatest extent possible, constitute an absolute disclaimer
            and waiver of all liability.
        
        7. Other Agreements
        
        If you have entered into a separate written agreement with Ai2 regarding your use of
        the specific Artifacts that are subject to this Agreement ("Other Agreement"), the
        terms of such Other Agreement will supplement the terms herein. To the extent of any
        conflict between the terms of the Other Agreement and this Agreement, the Other
        Agreement will take precedence.
        
        8. Miscellaneous
        
        If any term or provision of this Agreement is deemed invalid or unenforceable, it will
        automatically be reformed to the minimum extent necessary to make it enforceable. To
        the extent it cannot be reformed, it will be severed from this Agreement, and the
        remaining terms and conditions will remain in full force and effect. Any delay or
        failure by Ai2 to take any action or enforce any breach of this Agreement will not be
        deemed as a waiver or consent by Ai2. No term or provision of this Agreement will be
        waived by Ai2 unless expressly agreed in writing. Nothing in this Agreement constitutes
        as a limitation of any privileges, immunities, and rights that apply to you or Ai2
        under applicable laws, including from the legal processes of any jurisdiction or
        authority.
        
Requires-Python: <3.14,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: einops>=0.7.0
Requires-Dist: huggingface_hub
Requires-Dist: numpy>=1.26.4
Requires-Dist: torch<2.8,>=2.7
Requires-Dist: universal-pathlib>=0.2.5
Dynamic: license-file

<div align="center">
  <img src="https://raw.githubusercontent.com/allenai/olmoearth_pretrain/main/assets/OlmoEarth-logo.png" alt="OlmoEarth Logo" style="width: 600px; margin-left: auto; margin-right: auto; display: block;"/>
  <br>
  <br>
</div>
<p align="center">
  <a href="https://github.com/allenai/olmoearth_pretrain/blob/main/LICENSE">
    <img alt="GitHub License" src="https://img.shields.io/badge/license-OlmoEarth-green">
  </a>
  <a href="https://huggingface.co/collections/allenai/olmoearth">
    <img alt="Model Checkpoints" src="https://img.shields.io/badge/%F0%9F%A4%97%20HF-Models-yellow">
  </a>
  <a href="https://allenai.org/papers/olmoearth">
    <img alt="Paper PDF" src="https://img.shields.io/badge/OlmoEarth-pdf-blue">
  </a>
</p>

OlmoEarth is a family of flexible, multi-modal, spatio-temporal foundation models for Earth observation.

The OlmoEarth models are part of the [OlmoEarth Platform](https://olmoearth.allenai.org/), an end-to-end solution for scalable planetary intelligence that provides everything needed to go from raw data through R&D to fine-tuning and production deployment.

## Installation

We recommend Python 3.12 (the package supports Python 3.11 through 3.13) and [uv](https://docs.astral.sh/uv/getting-started/installation/) for managing the environment.
To install dependencies with uv, run:

```bash
git clone git@github.com:allenai/olmoearth_pretrain.git
cd olmoearth_pretrain
uv sync --locked --all-groups --python 3.12
# only necessary for development
uv tool install pre-commit --with pre-commit-uv --force-reinstall
```

uv installs everything into a virtual environment. To keep using plain `python` commands, activate it with `source .venv/bin/activate`; otherwise, run commands through `uv run python`.
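
To confirm that the active interpreter falls inside the range declared in the package metadata (`>=3.11,<3.14`), a small check like the following can be run inside the environment. This is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of OlmoEarth.

```python
import sys

# Supported range taken from the package metadata: >=3.11,<3.14.
MIN_VERSION = (3, 11)
MAX_VERSION_EXCLUSIVE = (3, 14)


def is_supported_python(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) lies inside the declared range."""
    return MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported_python(current) else "not supported"
    print(f"Python {current[0]}.{current[1]} is {status}")
```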

### Inference-Only Installation

For inference and model loading without training dependencies:
```bash
uv sync --locked
```

OlmoEarth is built using [OLMo-core](https://github.com/allenai/OLMo-core.git). OLMo-core's published [Docker images](https://github.com/orgs/allenai/packages?repo_name=OLMo-core) contain all core and optional dependencies.


## Model Summary

<img src="https://raw.githubusercontent.com/allenai/olmoearth_pretrain/main/assets/model.png" alt="Model Architecture Diagram" style="width: 800px; margin-left: auto; margin-right: auto; display: block;"/>

The OlmoEarth models are trained on three satellite modalities (Sentinel-2, Sentinel-1, and Landsat) and six derived maps (OpenStreetMap, WorldCover, USDA Cropland Data Layer, SRTM DEM, WRI Canopy Height Map, and WorldCereal).

| Model Size | Weights | Encoder Params | Decoder Params |
| --- | --- | --- | --- |
| Nano | [link](https://huggingface.co/allenai/OlmoEarth-v1-Nano) | 1.4M | 800K |
| Tiny | [link](https://huggingface.co/allenai/OlmoEarth-v1-Tiny) | 6.2M | 1.9M |
| Base | [link](https://huggingface.co/allenai/OlmoEarth-v1-Base) | 89M | 30M |
| Large | [link](https://huggingface.co/allenai/OlmoEarth-v1-Large) | 308M | 53M |
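
The checkpoints linked in the table can also be fetched programmatically. The sketch below uses `snapshot_download` from `huggingface_hub` (already a dependency of this package) and only downloads the repository files; it does not load the model, which the Inference Quickstart covers. The helper names `repo_id_for` and `download_weights` are hypothetical and not part of OlmoEarth.

```python
# Repository IDs copied from the "Weights" column of the table above.
REPO_IDS = {
    "nano": "allenai/OlmoEarth-v1-Nano",
    "tiny": "allenai/OlmoEarth-v1-Tiny",
    "base": "allenai/OlmoEarth-v1-Base",
    "large": "allenai/OlmoEarth-v1-Large",
}


def repo_id_for(size: str) -> str:
    """Map a model size name (case-insensitive) to its Hugging Face repo ID."""
    key = size.strip().lower()
    if key not in REPO_IDS:
        raise ValueError(f"Unknown model size {size!r}; expected one of {sorted(REPO_IDS)}")
    return REPO_IDS[key]


def download_weights(size: str) -> str:
    """Download a checkpoint snapshot and return the local directory path."""
    # Imported lazily so the lookup helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_for(size))
```

Calling `download_weights("nano")` requires network access and stores the files in the local Hugging Face cache.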

## Using OlmoEarth

The [Inference Quickstart](docs/Inference-Quickstart.md) shows how to initialize the
OlmoEarth model and apply it to a satellite image.

We also have several more in-depth tutorials for computing OlmoEarth embeddings and fine-tuning OlmoEarth on downstream tasks:

- [Fine-tuning OlmoEarth for Segmentation](https://github.com/allenai/olmoearth_projects/blob/main/docs/tutorials/FinetuneOlmoEarthSegmentation.md)
- [Computing Embeddings using OlmoEarth](https://github.com/allenai/rslearn/blob/master/docs/examples/OlmoEarthEmbeddings.md)
- [Fine-tuning OlmoEarth in rslearn](https://github.com/allenai/rslearn/blob/master/docs/examples/FinetuneOlmoEarth.md)

Additionally, [`olmoearth_projects`](https://github.com/allenai/olmoearth_projects) has several examples of active OlmoEarth deployments.

## Data Summary

Our pretraining dataset contains 285,288 samples from around the world, each covering a 2.56 km × 2.56 km region, although many samples contain only a subset of the timesteps and modalities.
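
For a sense of scale, the figures above can be turned into a back-of-the-envelope area estimate. The sketch below sums the per-sample areas, which gives only an upper bound on the covered area because regions may overlap. The 10 m pixel size is an assumption for illustration and is not stated in this README.

```python
SAMPLE_COUNT = 285_288        # number of samples, from the text above
SIDE_KM = 2.56                # side length of each square region, in km
ASSUMED_PIXEL_SIZE_M = 10.0   # ASSUMPTION: 10 m per pixel, for illustration only

# Area of one square region and the sum over all samples.
area_per_sample_km2 = SIDE_KM ** 2
total_area_km2 = SAMPLE_COUNT * area_per_sample_km2

# Pixels along one side of a region at the assumed resolution.
pixels_per_side = round(SIDE_KM * 1000 / ASSUMED_PIXEL_SIZE_M)

print(f"Area per sample: {area_per_sample_km2:.4f} km^2")
print(f"Upper bound on total area: {total_area_km2:,.0f} km^2")
print(f"Pixels per side at {ASSUMED_PIXEL_SIZE_M:g} m/pixel: {pixels_per_side}")
```

Under these assumptions, each sample covers roughly 6.55 km², and the samples together cover at most about 1.87 million km².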

The distribution of the samples is available below:

<img src="https://raw.githubusercontent.com/allenai/olmoearth_pretrain/main/assets/datamap.png" alt="Training sample distribution" style="width: 500px; margin-left: auto; margin-right: auto; display: block;"/>

The dataset can be downloaded [here](https://huggingface.co/datasets/allenai/olmoearth_pretrain_dataset).

Detailed instructions on how to make your own pretraining dataset are available in [the dataset README](docs/Dataset-Creation.md).

## Training scripts

Detailed instructions on how to pretrain your own OlmoEarth model are available in [Pretraining.md](docs/Pretraining.md).

## Evaluations

Detailed instructions on how to replicate our evaluations are available here:

- [Evaluations on Research Benchmarks](docs/Evaluation.md)
- [Evaluations on Partner Tasks](https://github.com/allenai/rslearn_projects/blob/master/rslp/olmoearth_evals/README.md)

## Running Tests

Tests can be run with different dependency configurations using `uv run`:

```bash
# Full test suite (all dependency groups except flash-attn; includes olmo-core)
uv run --all-groups --no-group flash-attn pytest tests/

# Model loading tests with full deps (with olmo-core)
uv run --all-groups --no-group flash-attn pytest tests_minimal_deps/

# Model loading tests with minimal deps only (no olmo-core)
uv run --group dev pytest tests_minimal_deps/
```

The `tests_minimal_deps/` directory contains tests that verify model loading works both with and without `olmo-core` installed. These tests run twice in CI, once with each configuration, to ensure compatibility.
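
One way to tell which of the two configurations is active is to check whether `olmo-core` can be imported. The sketch below is illustrative only and not necessarily how this repository's tests do it. It assumes the package is importable as `olmo_core`, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# ASSUMPTION: olmo-core is importable under the module name `olmo_core`.
HAS_OLMO_CORE = is_installed("olmo_core")

if __name__ == "__main__":
    mode = "full" if HAS_OLMO_CORE else "minimal"
    print(f"Running with {mode} dependencies (olmo-core installed: {HAS_OLMO_CORE})")
```

A flag like `HAS_OLMO_CORE` could then gate individual tests, for example through `pytest.mark.skipif`.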

## License

This code is licensed under the [OlmoEarth Artifact License](LICENSE).
