Metadata-Version: 2.4
Name: drb-chunk-sentinel2
Version: 0.1.0
Summary: Sentinel-2 product chunk descriptors (drb-chunk)
Author: GAEL Systems
Author-email: drb-python@gael.fr
License: LGPLv3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
Requires-Python: <3.14,>=3.11
Description-Content-Type: text/markdown
License-File: LICENCE.txt
Requires-Dist: rdflib
Dynamic: license-file

# drb-chunk-sentinel2

Chunk descriptors for the Sentinel-2 **L2A** product type, built on top of the
[drb-chunk](../../core/README.md) add-on.

## What it is

`drb-chunk-sentinel2` ships a generated `cortex.ttl` that declares **68 chunk
descriptors**, grouped into **4 collections** (`R10m`, `R20m`, `R60m`, `QI`),
on the Sentinel-2 L2A class URI:

```
http://knowledge-base.gael.fr/drb/sentinel-2/product_user_level-2a
```

Each descriptor is a `drb:chunk` entry linking a logical chunk name (e.g.
`B04_10m`) and a `drb:collection` (e.g. `R10m`) to an XQuery expression that
navigates the product tree to the corresponding JP2 raster node.

The package does **not** modify the Sentinel-2 topic TTL and registers **no
entry-point group**. It is a pure descriptor extension merged into the topic
graph at runtime (see [How attachment works](#how-attachment-works)).

**L1C** support is planned — same pattern, different `IMG_DATA` layout — but is
not present in this release.

---

## Chunks exposed

68 chunks in total, grouped into 4 **collections**, all tiled at 512 × 512
pixels. The three `IMG_DATA` resolution collections hold the spectral and
ancillary bands (named `<LAYER>_<RES>`); the `QI` collection holds the
`QI_DATA` quality masks and the preview image.

| Collection | Count | Contents |
|------------|-------|----------|
| `R10m`     | 7     | B02 B03 B04 B08 AOT WVP (uint16) · TCI (uint8 RGB) |
| `R20m`     | 14    | B01–B07 B8A B11 B12 AOT WVP (uint16) · SCL TCI (uint8) |
| `R60m`     | 15    | B01–B07 B8A B09 B11 B12 AOT WVP (uint16) · SCL TCI (uint8) |
| `QI`       | 32    | CLDPRB_{20m,60m}, SNWPRB_{20m,60m}, CLASSI_B00, DETFOO_B01–B12+B8A, QUALIT_B01–B12+B8A, PVI (all uint8) |
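
The per-collection counts can be cross-checked by expanding the layer lists
into chunk names. The sketch below is plain Python transcribed from the tables
in this README; it does not read `cortex.ttl`, and its variable names are
illustrative only.

```python
# Layer lists transcribed from the tables in this README.
R10M = ["B02", "B03", "B04", "B08", "AOT", "WVP", "TCI"]
R20M = ["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B8A",
        "B11", "B12", "AOT", "WVP", "SCL", "TCI"]
R60M = ["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B8A",
        "B09", "B11", "B12", "AOT", "WVP", "SCL", "TCI"]

# Bands that carry per-band QI masks: B01..B12 plus B8A (13 bands).
QI_BANDS = [f"B{n:02d}" for n in range(1, 13)] + ["B8A"]

collections = {
    "R10m": [f"{layer}_10m" for layer in R10M],
    "R20m": [f"{layer}_20m" for layer in R20M],
    "R60m": [f"{layer}_60m" for layer in R60M],
    "QI": (
        [f"{mask}_{res}" for mask in ("CLDPRB", "SNWPRB")
         for res in ("20m", "60m")]
        + ["CLASSI_B00"]
        + [f"DETFOO_{band}" for band in QI_BANDS]
        + [f"QUALIT_{band}" for band in QI_BANDS]
        + ["PVI"]
    ),
}

counts = {name: len(chunks) for name, chunks in collections.items()}
total = sum(counts.values())
print(counts)  # {'R10m': 7, 'R20m': 14, 'R60m': 15, 'QI': 32}
print(total)   # 68
```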

### R10m — 10 980 × 10 980 pixels, tile 512 × 512

| Chunk name | Band  | dtype  |
|------------|-------|--------|
| `B02_10m`  | B02   | uint16 |
| `B03_10m`  | B03   | uint16 |
| `B04_10m`  | B04   | uint16 |
| `B08_10m`  | B08   | uint16 |
| `AOT_10m`  | AOT   | uint16 |
| `WVP_10m`  | WVP   | uint16 |
| `TCI_10m`  | TCI   | uint8 (RGB) |

### R20m — 5 490 × 5 490 pixels

| Chunk name | Band  | dtype  |
|------------|-------|--------|
| `B01_20m`  | B01   | uint16 |
| `B02_20m`  | B02   | uint16 |
| `B03_20m`  | B03   | uint16 |
| `B04_20m`  | B04   | uint16 |
| `B05_20m`  | B05   | uint16 |
| `B06_20m`  | B06   | uint16 |
| `B07_20m`  | B07   | uint16 |
| `B8A_20m`  | B8A   | uint16 |
| `B11_20m`  | B11   | uint16 |
| `B12_20m`  | B12   | uint16 |
| `AOT_20m`  | AOT   | uint16 |
| `WVP_20m`  | WVP   | uint16 |
| `SCL_20m`  | SCL   | uint8  |
| `TCI_20m`  | TCI   | uint8 (RGB) |

### R60m — 1 830 × 1 830 pixels

| Chunk name | Band  | dtype  |
|------------|-------|--------|
| `B01_60m`  | B01   | uint16 |
| `B02_60m`  | B02   | uint16 |
| `B03_60m`  | B03   | uint16 |
| `B04_60m`  | B04   | uint16 |
| `B05_60m`  | B05   | uint16 |
| `B06_60m`  | B06   | uint16 |
| `B07_60m`  | B07   | uint16 |
| `B8A_60m`  | B8A   | uint16 |
| `B09_60m`  | B09   | uint16 |
| `B11_60m`  | B11   | uint16 |
| `B12_60m`  | B12   | uint16 |
| `AOT_60m`  | AOT   | uint16 |
| `WVP_60m`  | WVP   | uint16 |
| `SCL_60m`  | SCL   | uint8  |
| `TCI_60m`  | TCI   | uint8 (RGB) |

### QI — quality masks & preview (`QI_DATA`)

All `uint8`. Probability masks carry a resolution suffix; the per-band
detector-footprint and quality masks are named after their band.

| Chunk name | Layer |
|------------|-------|
| `CLDPRB_20m`, `CLDPRB_60m` | cloud probability |
| `SNWPRB_20m`, `SNWPRB_60m` | snow probability |
| `CLASSI_B00` | classification mask |
| `DETFOO_B01`…`DETFOO_B12`, `DETFOO_B8A` | detector footprint (per band) |
| `QUALIT_B01`…`QUALIT_B12`, `QUALIT_B8A` | quality mask (per band) |
| `PVI` | preview image |

Each chunk's `drb:source` is an XQuery expression of the form:

```
GRANULE/*/IMG_DATA/R10m/*[fn:matches(fn:name(),'.*_B04_10m\.jp2$')]
```

> **Note:** the leading `.*` is required because `fn:matches` in drb's XQuery
> engine uses full-match semantics (`re.fullmatch`), not search semantics.
> The pattern must match the entire filename, not just a suffix.
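
The difference between the two semantics can be reproduced with Python's `re`
module alone. This is a minimal sketch, not drb's XQuery engine:
`source_pattern` is a hypothetical helper, and the filename is only an
illustrative example of the L2A naming.

```python
import re


def source_pattern(chunk_name: str) -> str:
    """Hypothetical helper: filename regex for one IMG_DATA chunk."""
    return rf".*_{re.escape(chunk_name)}\.jp2$"


# Illustrative filename, not taken from a real product.
filename = "T19VCG_20220101T000000_B04_10m.jp2"

pattern = source_pattern("B04_10m")            # .*_B04_10m\.jp2$
suffix_only = r"_B04_10m\.jp2$"                # same pattern without the leading .*

full_with_prefix = re.fullmatch(pattern, filename) is not None
full_suffix_only = re.fullmatch(suffix_only, filename) is not None
search_suffix_only = re.search(suffix_only, filename) is not None

print(full_with_prefix)    # True: the leading .* absorbs the tile/date prefix
print(full_suffix_only)    # False: fullmatch must consume the whole name
print(search_suffix_only)  # True: search semantics would accept the suffix
```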

---

## How attachment works

`cortex.ttl` extends the L2A class **by URI**, targeting
`sentinel-2:product_user_level-2a`. Because the package declares no entry-point
group in `setup.cfg`, the resolver does not discover the TTL on its own.

At runtime the chunk TTL must be **merged into the topic graph** in one of two
ways (see [Bootstrap](#bootstrap)):

- In a **Fuseki** deployment, add the packaged TTL to the named-graph list
  loaded into the dataset.
- In an **offline** setup, compose it together with the S2-SAFE topic TTL in a
  single `RDFDao([s2_ttl, str(cortex_path())])`.

Once merged, `drb-chunk` reads the descriptors via `get_dao(topic).graph` and
makes them available through the standard chunk API.

The packaged TTL path is returned by:

```python
from drb.addons.chunk.sentinel2 import cortex_path
print(cortex_path())   # /path/to/drb/addons/chunk/sentinel2/cortex.ttl
```

---

## Bootstrap

### Mode 1 — Fuseki (recommended for production)

Set the environment variables before resolving any node:

```bash
export FUSEKI_URL=http://localhost:3030
export DATASET=drb
export DRB_FUSEKI_GRAPHS="http://drb.gael.fr/graph/kb/drbx-kb-topics-sentinel-2-safe/latest,http://drb.gael.fr/graph/kb/drbx-kb-topics-safe/latest"
```

`DRB_FUSEKI_GRAPHS` is a comma-separated list of Fuseki named-graph URIs. The
value above lists only the topic graphs; the chunk-descriptor graph (loaded
from this package's `cortex.ttl`) must also be merged into the same
dataset/graph set.
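
One possible way to load the chunk graph is an upload through Fuseki's Graph
Store Protocol. The sketch below only builds the HTTP request and the
environment value, without sending anything. It assumes the dataset exposes the
read-write `data` service, and the chunk graph URI under `example.org` is a
placeholder, not a URI defined by this package.

```python
import urllib.parse
import urllib.request


def build_graph_upload(fuseki_url: str, dataset: str, graph_uri: str,
                       turtle: bytes) -> urllib.request.Request:
    """Build (but do not send) a request that replaces one named graph."""
    query = urllib.parse.urlencode({"graph": graph_uri})
    url = f"{fuseki_url.rstrip('/')}/{dataset}/data?{query}"
    return urllib.request.Request(
        url, data=turtle, method="PUT",
        headers={"Content-Type": "text/turtle"})


# Placeholder graph URI (assumption): choose one that fits your deployment.
CHUNK_GRAPH = "http://example.org/graph/drb-chunk-sentinel2"

# In a real setup the payload would be the bytes of the file at cortex_path().
request = build_graph_upload(
    "http://localhost:3030", "drb", CHUNK_GRAPH, b"# cortex.ttl content\n")

# Comma-separated value for DRB_FUSEKI_GRAPHS, chunk graph included.
graphs = ",".join([
    "http://drb.gael.fr/graph/kb/drbx-kb-topics-sentinel-2-safe/latest",
    "http://drb.gael.fr/graph/kb/drbx-kb-topics-safe/latest",
    CHUNK_GRAPH,
])

print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would perform the upload.
```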

### Mode 2 — Offline vendored TTL

Compose the two TTL files into a single `RDFDao` and register it before
resolving:

```python
from drb.topics.dao import ManagerDao
from drb.topics.dao.rdf_dao import RDFDao
from drb.addons.chunk.sentinel2 import cortex_path

s2_ttl = "/path/to/vendored/sentinel-2-safe-topics.ttl"
# Merge the S2 topic descriptors and the chunk descriptor into one graph.
ManagerDao().add_dao_instance(RDFDao([s2_ttl, str(cortex_path())]))
```

See [`examples/demo_s2_l2a_chunk.py`](examples/demo_s2_l2a_chunk.py) for a
complete runnable demonstration of both modes.

---

## Worked example

```python
import numpy
from drb.topics import resolver
from drb.addons.addon import AddonManager
from drb.chunk.selection import WindowSelection

# 1. Resolve the Sentinel-2 product hosted on S3 (bootstrap the KB topics first; see Bootstrap)
topic, node = resolver.resolve(
    "s3://my-sentinel2-bucket/"
    "S2A_MSIL2A_20220101T000000_N0400_R000_T19VCG_20220101T000000.SAFE")

# 2. Discover the chunks, by collection, declared for this topic
addon = AddonManager().get_addon("chunk")
print(addon.available_collections(topic))
#   {'R10m': ['B02_10m', ...], 'R20m': [...], 'R60m': [...], 'QI': [...]}

# 3a. Build one chunk and read a 512x512 window
chunk = addon.apply(node, chunk_name="B04_10m")
window = WindowSelection(x=0, y=0, w=512, h=512)
array = chunk.select(window).get_impl(numpy.ndarray)
print(array.shape)          # (1, 512, 512) — single band, uint16

# 3b. Or build every chunk of a whole collection at once
r10m_chunks = addon.apply(node, collection="R10m")   # list[Chunk] (7 here)
```
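
To cover a whole band rather than a single window, the window origins and
sizes can be enumerated on the 512 × 512 tile grid. This is a minimal sketch in
plain Python: `tile_windows` is a hypothetical helper, and the tuples it yields
mirror the `x`, `y`, `w`, `h` arguments of `WindowSelection` used above.

```python
from typing import Iterator, Tuple


def tile_windows(width: int, height: int,
                 tile: int = 512) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) windows covering a raster, row by row."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # Edge windows are clipped to the raster bounds.
            yield x, y, min(tile, width - x), min(tile, height - y)


# R10m rasters are 10 980 x 10 980 pixels.
windows = list(tile_windows(10980, 10980))
print(len(windows))   # 484 (22 x 22 windows)
print(windows[0])     # (0, 0, 512, 512)
print(windows[-1])    # (10752, 10752, 228, 228)
```

The same helper gives 11 × 11 windows for the 5 490-pixel `R20m` rasters and
4 × 4 for the 1 830-pixel `R60m` rasters.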

---

## Limitations

- **Windowing is correct, but there is no partial fetch:** `WindowSelection`
  returns the right pixel values, but the current implementation materialises
  the **full JP2 member** from the archive before slicing. No partial network
  fetch is performed (e.g. an HTTP range request on a cloud-hosted SAFE). This
  is a drb-chunk v1 limitation.
- **Single-tile materialisation:** each chunk built by `apply` and read by
  `select` opens one JP2 node. Multi-tile mosaicking is not supported in this
  release.
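
To give a sense of scale for the first limitation, the decoded size of a full
band can be compared with the size of one window. This is a rough sketch that
assumes samples are held uncompressed in memory; actual peak usage depends on
the JP2 decoder, and `decoded_bytes` is a hypothetical helper.

```python
def decoded_bytes(width: int, height: int, bands: int = 1,
                  bytes_per_sample: int = 2) -> int:
    """Size in bytes of an uncompressed raster (uint16 by default)."""
    return width * height * bands * bytes_per_sample


full_band = decoded_bytes(10980, 10980)   # one R10m uint16 band
one_window = decoded_bytes(512, 512)      # one 512 x 512 uint16 window

print(full_band)                # 241120800 bytes, roughly 230 MiB
print(one_window)               # 524288 bytes, exactly 512 KiB
print(full_band // one_window)  # 459
```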

See [`docs/evolution-requests.md`](docs/evolution-requests.md) for tracked
improvement requests.

---

## Regenerating the descriptor

The `cortex.ttl` is generated from a band×resolution table embedded in
`_generate.py`. To regenerate after editing the table:

```bash
python -m drb.addons.chunk.sentinel2._generate
```

The file is written next to the module (i.e. `drb/addons/chunk/sentinel2/cortex.ttl`).
Commit the result.

---

## Installation

```bash
pip install drb-chunk-sentinel2
```

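Requires `drb-chunk` at runtime. The package metadata declares only `rdflib`
as a dependency, so install `drb-chunk` separately if it is not already
present. A working drb-chunk setup with either a Fuseki instance or a vendored
S2 topic TTL is also needed (see [Bootstrap](#bootstrap)).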
