Metadata-Version: 2.4
Name: drb-chunk-sentinel1
Version: 0.1.0
Summary: Sentinel-1 product chunk descriptors (drb-chunk)
Author: GAEL Systems
Author-email: drb-python@gael.fr
License: LGPLv3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
Requires-Python: <3.14,>=3.11
Description-Content-Type: text/markdown
License-File: LICENCE.txt
Requires-Dist: rdflib
Dynamic: license-file

# drb-chunk-sentinel1

Chunk descriptors for Sentinel-1 **Level-1** product types: **GRD** (detected
amplitude) and **SLC** (Single Look Complex), built on top of the
[drb-chunk](../../core/README.md) add-on.

## What it is

`drb-chunk-sentinel1` ships a generated `cortex.ttl` that declares chunk
descriptors for Level-1 GRD and SLC products, attached **once** to the
existing Sentinel-1 Level-1 class URI:

```
http://knowledge-base.gael.fr/drb/sentinel-1/product_level-1
```

Every Level-1 product — whatever its acquisition mode (IW/EW/SM) or satellite
(S1A/B/C/D) — resolves to a subclass of `product_level-1` and **inherits**
the chunks through the topic graph's `subClassOf` chain. There is no
product-type-specific topic class in the knowledge base, and none was added:
each chunk's `drb:source` XQuery self-selects on the filename token
(`-grd-<pol>-` for GRD, `-slc-<pol>-` for SLC) — products of the non-matching
type yield an empty source for that chunk.
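The self-selection rule can be modelled in a few lines of Python. This sketch covers only the matching logic. The real descriptor expresses it as an XQuery in `drb:source`, and `select_measurements` and the filenames below are hypothetical.

```python
def select_measurements(filenames, product_type, pol):
    """Keep only measurement files whose name carries `-<type>-<pol>-`.

    Hypothetical Python model of the filename-token filter; the package
    itself performs this selection in XQuery.
    """
    token = f"-{product_type.lower()}-{pol.lower()}-"
    return [name for name in filenames if token in name.lower()]


# Hypothetical GRD measurement filenames, for illustration only.
grd_files = [
    "s1c-iw-grd-vv-example-001.tiff",
    "s1c-iw-grd-vh-example-002.tiff",
]
print(select_measurements(grd_files, "GRD", "VV"))  # ['s1c-iw-grd-vv-example-001.tiff']
print(select_measurements(grd_files, "SLC", "VV"))  # [] (empty source)
```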

The package does **not** modify the Sentinel-1 topic TTL and registers **no
entry-point group**. It is a pure descriptor extension merged into the topic
graph at runtime (see [How attachment works](#how-attachment-works)).

---

## Chunks exposed

GRD products expose one `measurement` chunk per polarization: 512 × 512 tiles,
`uint16`, one band. The array shape is read at runtime from the raster itself
(GRD dimensions vary per product and slice) and is never hard-coded in the
descriptor.

| Chunk name | Content | dtype | Tile |
|------------|---------|-------|------|
| `VV` | VV-polarization detected amplitude | uint16 | 512×512 |
| `VH` | VH-polarization detected amplitude | uint16 | 512×512 |
| `HH` | HH-polarization detected amplitude | uint16 | 512×512 |
| `HV` | HV-polarization detected amplitude | uint16 | 512×512 |

### Mode × polarization coverage

The four chunk descriptors are declared once, mode-agnostic, and apply
uniformly across every Level-1 GRD product type. Which polarizations are
actually present on a given product depends on its acquisition mode and
polarization scheme; absent polarizations resolve to an empty XQuery source
(see [Limitations](#limitations)).

| Mode | Product type | VV | VH | HH | HV | Validated live |
|------|--------------|:--:|:--:|:--:|:--:|-----------------|
| IW (Interferometric Wide) | GRDH | ✅ | ✅ | ✅ | ✅ | ✅ **yes** (local S1C fixture, SDV) |
| EW (Extra Wide) | GRDM | ✅ | ✅ | ✅ | ✅ | ⚠️ **no** — no local EW fixture |
| SM (Stripmap) | GRDH / GRDM | ✅ | ✅ | ✅ | ✅ | ⚠️ **no** — no local SM fixture |

A single product carries either a dual-pol pair (e.g. SDV → VV+VH, SDH →
HH+HV) or a single polarization (SSV → VV, SSH → HH). `available_chunks`
still lists all four chunk names for every GRD product, because they are
declared on the shared `product_level-1` ancestor; `apply()` on an absent
polarization raises `DrbChunkError` rather than returning silently wrong data.
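To avoid triggering `DrbChunkError`, a caller could pre-filter the listed chunks using the polarization scheme encoded in the product name. The sketch below applies the mapping stated above to the `_1SDV_`-style token. `expected_chunks` is a hypothetical helper, not part of drb-chunk.

```python
import re

# Scheme -> polarizations, as stated in this README.
POLARIZATIONS_BY_SCHEME = {
    "SDV": ["VV", "VH"],
    "SDH": ["HH", "HV"],
    "SSV": ["VV"],
    "SSH": ["HH"],
}


def expected_chunks(product_name):
    """Return the polarization chunks a product should materialise.

    Hypothetical helper: reads the `_1S??_` token of the SAFE product name.
    """
    match = re.search(r"_1(SDV|SDH|SSV|SSH)_", product_name)
    if match is None:
        raise ValueError(f"no polarization scheme found in {product_name!r}")
    return POLARIZATIONS_BY_SCHEME[match.group(1)]


print(expected_chunks("S1C_IW_GRDH_1SDV_..._7B0C.SAFE"))  # ['VV', 'VH']
```

A caller might then keep only the names from `available_collections(topic)["measurement"]` that also appear in `expected_chunks(...)` before calling `apply()`.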

---

## Collection taxonomy

`drb:collection` carries the SAFE **data family**; `drb:chunkName` carries
the addressing within it (polarization for GRD; swath×pol×burst-index for
SLC). This release populates only `measurement`; the other collections are
reserved names for future increments (see the design spec §5 for the full
rationale).

| Collection | Content | Status | Increment |
|------------|---------|--------|-----------|
| `measurement` | SAR image raster — GRD: 1 chunk/pol (detected, ground-range, uint16, regular 512² tiling) | ✅ **implemented** (this release) | 1 |
| `bursts` | SLC measurement re-viewed per burst (per swath), geometry from `swathTiming/linesPerBurst` + `burstList` | reserved — new annotation-driven `TilingScheme` | 2 |
| `calibration` | Radiometric LUTs (sigma0/beta0/gamma/dn), coarse azimuth×range grid, per pol | reserved — candidate LUT/coarse-grid reader | future / RFE |
| `noise` | Thermal-noise LUTs (range + azimuth), per pol | reserved | future / RFE |
| `geolocation` | `geolocationGridPoint` (lat/lon/height/incidence…) coarse grid | reserved | future / RFE |
| `rfi` | RFI detection/mitigation reports (flags) | out of scope — no regular array | — |
| `preview` | Reduced-resolution quicklook/browse | reserved | future |

---

## How attachment works

`cortex.ttl` extends the Level-1 class **by URI**, targeting
`sentinel-1:product_level-1`, the Level-1 root that every GRD (and SLC/WV)
product inherits from. The package declares no `setup.cfg` entry-point group,
so the resolver does not auto-discover the TTL on its own.

At runtime the chunk TTL must be **merged into the topic graph** in one of
two ways (see [Bootstrap](#bootstrap)):

- In a **Fuseki** deployment, add the packaged TTL to the named-graph list
  loaded into the dataset.
- In an **offline** setup, compose it together with the S1-SAFE topic TTLs
  in a single `RDFDao([...])`.

Once merged, `drb-chunk` reads the descriptors via `get_dao(topic).graph`
(walking `subClassOf` upward from the resolved topic) and makes them
available through the standard chunk API.
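The inheritance that makes a single attachment sufficient can be pictured with a plain-Python model. This is an illustration, not drb-chunk's implementation. The class hierarchy is simplified and hypothetical, each class has a single parent (RDF allows several), and descriptors are reduced to chunk names.

```python
# Simplified, hypothetical hierarchy (child -> parent).
SUBCLASS_OF = {
    "product_level-1_iw_s_a": "product_level-1_iw_s",
    "product_level-1_iw_s": "product_level-1",
}

# Descriptors attached once, on the Level-1 root only.
CHUNKS_DECLARED_ON = {
    "product_level-1": {"measurement": ["VV", "VH", "HH", "HV"]},
}


def ancestors(cls):
    """Yield `cls` then each superclass, walking subClassOf upward."""
    seen = set()
    while cls is not None and cls not in seen:
        seen.add(cls)
        yield cls
        cls = SUBCLASS_OF.get(cls)


def inherited_collections(cls):
    """Merge the collections declared on `cls` and all its ancestors."""
    merged = {}
    for klass in ancestors(cls):
        for collection, names in CHUNKS_DECLARED_ON.get(klass, {}).items():
            target = merged.setdefault(collection, [])
            for name in names:
                if name not in target:
                    target.append(name)
    return merged


print(inherited_collections("product_level-1_iw_s"))
# {'measurement': ['VV', 'VH', 'HH', 'HV']}
```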

The packaged TTL path is returned by:

```python
from drb.addons.chunk.sentinel1 import cortex_path
print(cortex_path())   # /path/to/drb/addons/chunk/sentinel1/cortex.ttl
```

---

## Bootstrap

### Mode 1 — Fuseki (recommended for production)

Set the environment variables before resolving any node:

```bash
export FUSEKI_URL=http://localhost:3030
export DATASET=drb
export DRB_FUSEKI_GRAPHS="http://drb.gael.fr/graph/kb/drbx-kb-topics-sentinel-1-safe/latest,http://drb.gael.fr/graph/kb/drbx-kb-topics-safe/latest"
```

`DRB_FUSEKI_GRAPHS` is a comma-separated list of Fuseki named-graph URIs; the
chunk-descriptor graph (loaded from this package's `cortex.ttl`) must also
be merged into the same dataset/graph set. **No change to `drb-fuseki`
itself is required** — the chunk descriptor attaches to the pre-existing
`product_level-1` URI, so the KB is used exactly as shipped.
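One way to get the descriptor graph into the dataset is the SPARQL 1.1 Graph Store HTTP Protocol. The sketch below makes several assumptions: the dataset exposes Fuseki's read-write Graph Store endpoint at `/<dataset>/data`, no authentication is required, and listing the new graph in `DRB_FUSEKI_GRAPHS` is how your deployment selects it. `CHUNK_GRAPH`, `build_graph_upload` and `with_graph` are hypothetical names.

```python
import urllib.parse
import urllib.request

# Hypothetical graph URI; use whatever name you load the TTL under.
CHUNK_GRAPH = "http://drb.gael.fr/graph/kb/drb-chunk-sentinel1/latest"


def build_graph_upload(fuseki_url, dataset, graph_uri, ttl_bytes):
    """Build a Graph Store Protocol PUT that replaces `graph_uri` with the TTL."""
    query = urllib.parse.urlencode({"graph": graph_uri})
    url = f"{fuseki_url.rstrip('/')}/{dataset}/data?{query}"
    return urllib.request.Request(
        url, data=ttl_bytes, method="PUT",
        headers={"Content-Type": "text/turtle"})


def with_graph(graph_list, graph_uri):
    """Append `graph_uri` to a comma-separated graph list if it is missing."""
    graphs = [g.strip() for g in graph_list.split(",") if g.strip()]
    if graph_uri not in graphs:
        graphs.append(graph_uri)
    return ",".join(graphs)


# Usage (requires a running Fuseki and this package installed):
#   from drb.addons.chunk.sentinel1 import cortex_path
#   with open(cortex_path(), "rb") as f:
#       req = build_graph_upload(os.environ["FUSEKI_URL"],
#                                os.environ["DATASET"], CHUNK_GRAPH, f.read())
#   urllib.request.urlopen(req)
print(with_graph("g1,g2", CHUNK_GRAPH))
```

A `PUT` replaces the graph's content, so re-running the upload after regenerating `cortex.ttl` does not duplicate triples.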

### Mode 2 — Offline vendored TTL

Compose **three** files into a single `RDFDao` and register it before
resolving. The base `safe-topics.ttl` is the `owl:imports` root that carries
the `subClassOf+ drb:item` closure — loading only the Sentinel-1-specific TTL
loses that closure and every `product*` class vanishes from the DAO:

```python
from drb.topics.dao import ManagerDao
from drb.topics.dao.rdf_dao import RDFDao
from drb.addons.chunk.sentinel1 import cortex_path

safe_topics_ttl = "/path/to/vendored/safe-topics.ttl"
s1_safe_ttl = "/path/to/vendored/sentinel-1-safe-topics.ttl"
ManagerDao().add_dao_instance(
    RDFDao([safe_topics_ttl, s1_safe_ttl, cortex_path()]))
```

See [`examples/demo_s1_grd_chunk.py`](examples/demo_s1_grd_chunk.py) for a
complete runnable demonstration of both modes, and
[`examples/RESULTS-grd.md`](examples/RESULTS-grd.md) for a captured run
against the local S1C IW GRDH fixture.

---

## Worked example

```python
import numpy
from drb.topics import resolver
from drb.addons.addon import AddonManager
from drb.chunk.selection import WindowSelection

# 1. Resolve the product (bootstrap the KB topics first; see Bootstrap).
#    resolver.create() path-walks a nested "x.SAFE.zip/x.SAFE" URL directly.
node = resolver.create(
    "/data/S1C_IW_GRDH_1SDV_..._7B0C.SAFE.zip/"
    "S1C_IW_GRDH_1SDV_..._7B0C.SAFE")
topic = resolver.resolve(node)[0]
print(topic.uri)
#   http://knowledge-base.gael.fr/drb/sentinel-1/product_level-1_iw_s

# 2. Discover the chunks, by collection, declared for this topic
#    (inherited from product_level-1 -- no GRD-specific topic class exists)
addon = AddonManager().get_addon("chunk")
print(addon.available_collections(topic))
#   {'measurement': ['VV', 'VH', 'HH', 'HV']}

# 3. Build a chunk and read a 512x512 window
chunk = addon.apply(node, chunk_name="VV", topic=topic)
window = WindowSelection(x=0, y=0, w=512, h=512)
array = chunk.select(window).get_impl(numpy.ndarray)
print(array.shape, array.dtype)   # (1, 512, 512) uint16
```
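Reading a whole raster tile by tile only requires the window grid. This minimal sketch assumes `WindowSelection` accepts windows clipped at the right and bottom edges, and that the raster's height and width are known (for example from rasterio's `dataset.height` / `dataset.width`). `iter_windows` is a hypothetical helper.

```python
def iter_windows(height, width, tile=512):
    """Yield (x, y, w, h) windows covering a height x width raster.

    Windows are `tile` x `tile`, clipped at the right and bottom edges.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


windows = list(iter_windows(1000, 1100))
print(len(windows), windows[-1])  # 6 (1024, 512, 76, 488)

# Hypothetical use with the chunk from the example above:
# for x, y, w, h in iter_windows(height, width):
#     block = chunk.select(WindowSelection(x=x, y=y, w=w, h=h)).get_impl(numpy.ndarray)
```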

---

## SLC bursts

Single Look Complex (SLC) products resolve to `product_level-1_iw_s[_abc]`
(a satellite-letter class) and carry **burst-indexed chunks** in the `bursts`
collection. Each chunk is addressed by swath and polarization (e.g. `IW1_VV`,
`IW2_HH`); the bursts it contains are enumerated **dynamically** at runtime by
reading the `burstList` of the SLC annotation XML.

### Chunk addressing and geometry

An SLC burst chunk is addressed as `<SWATH>_<POL>` (e.g., `IW1_VV`):

| Chunk name | Content | Tiling scheme | dtype | Sample rate |
|------------|---------|---|-------|------|
| `IW1_VV` | Swath 1, VV polarization, per-burst complex SAR image | Burst-indexed (N bursts × 2D) | complex64 | native |
| `IW1_VH` | Swath 1, VH polarization, per-burst complex SAR image | Burst-indexed (N bursts × 2D) | complex64 | native |
| `IW2_VV` | Swath 2, VV polarization, per-burst complex SAR image | Burst-indexed (N bursts × 2D) | complex64 | native |
| `IW2_VH` | Swath 2, VH polarization, per-burst complex SAR image | Burst-indexed (N bursts × 2D) | complex64 | native |
| `IW3_VV` | Swath 3, VV polarization, per-burst complex SAR image | Burst-indexed (N bursts × 2D) | complex64 | native |
| `IW3_VH` | Swath 3, VH polarization, per-burst complex SAR image | Burst-indexed (N bursts × 2D) | complex64 | native |

Burst **geometry is dynamic**: `linesPerBurst` and `samplesPerBurst` are read
from the product's annotation XML, and burst windows are assembled from the
`burstList` offsets. All bursts within a swath carry the **same** sample count;
line counts may vary from burst to burst within a swath (nominally all equal,
with edge bursts occasionally shorter).
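If the bursts of a swath are stacked contiguously in the measurement raster in `burstList` order, their line windows follow from a running sum of the per-burst line counts. This is a sketch under that assumption. `burst_windows` is hypothetical, and the package itself derives the windows from the `burstList` offsets.

```python
from itertools import accumulate


def burst_windows(line_counts, samples_per_burst):
    """Return ((line_start, line_end), (0, samples)) for each burst.

    `line_counts` holds each burst's line count in acquisition order;
    bursts are assumed to be stacked contiguously in the raster.
    """
    ends = list(accumulate(line_counts))
    starts = [0] + ends[:-1]
    return [((start, end), (0, samples_per_burst))
            for start, end in zip(starts, ends)]


print(burst_windows([500, 500, 480], 1296))
# [((0, 500), (0, 1296)), ((500, 1000), (0, 1296)), ((1000, 1480), (0, 1296))]
```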

### Addressing bursts via `IselSelection(per_dim={"burst": i})`

The chunk is a multi-dimensional array with the **burst index as the first
dimension** (e.g., shape `(N, lines, samples)` for a complex64 image). Bursts
are **enumerated** via `chunk.tiles()` (yields `(0,), (1,), ..., (N-1,)` tuples)
and accessed individually via `chunk.tile((i,))` or via selection:

```python
from drb.chunk.selection import IselSelection
import numpy

# `addon`, `node` and `topic` are obtained as in the SLC worked example.
# Enumerate bursts for swath IW1, polarization VV
chunk = addon.apply(node, chunk_name="IW1_VV", topic=topic)
burst_count = len(list(chunk.tiles()))  # e.g. 9

# Read burst 3 as a complex64 numpy array
burst3 = chunk.select(
    IselSelection(per_dim={"burst": 3})).get_impl(numpy.ndarray)
print(burst3.shape)  # (1, lines, samples)
```

### Kept metadata per burst

Each burst reference holds **metadata** (accessed via `chunk.tile((i,)).info`):

| Key | Example | Type | Notes |
|-----|---------|------|-------|
| `burstIndex` | `3` | int | 0-based burst index within the swath |
| `burstIdAbsolute` | `900015` | int | Absolute burst ID, from `burstList/@absolute` |
| `azimuthTime` | `2026-01-20T00:10:05.123456` | str | Burst acquisition timestamp (ISO 8601) |
| `byteOffset` | `245760` | int | Byte offset of the burst's first line in the measurement file |
| `window` | `((0, 500), (0, 1296))` | tuple | Line and sample bounds of the burst in the measurement raster, as `((line_start, line_end), (sample_start, sample_end))` |
| `footprint` | `POLYGON((20 10, 21 10, ...))` | str | Burst's geolocation footprint (WKT POLYGON) derived from the coarse geolocation grid |
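The `azimuthTime` and `footprint` values are strings, so a consumer will usually parse them. This minimal stdlib sketch assumes a single-ring WKT polygon whose coordinate pairs are read as `(x, y)`. The `info` dict and its footprint are hypothetical values shaped like the table above, and `parse_polygon` / `parse_burst_info` are hypothetical helpers.

```python
import re
from datetime import datetime


def parse_polygon(wkt):
    """Parse a single-ring WKT POLYGON into a list of (x, y) float tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\((.*)\)\)\s*", wkt, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a single-ring WKT POLYGON: {wkt!r}")
    return [tuple(float(v) for v in point.split())
            for point in match.group(1).split(",")]


def parse_burst_info(info):
    """Convert the string-typed burst metadata into Python values."""
    return {
        "azimuthTime": datetime.fromisoformat(info["azimuthTime"]),
        "footprint": parse_polygon(info["footprint"]),
    }


# Hypothetical metadata, shaped like `chunk.tile((i,)).info`.
info = {"azimuthTime": "2026-01-20T00:10:05.123456",
        "footprint": "POLYGON((20 10, 21 10, 21 11, 20 11, 20 10))"}
parsed = parse_burst_info(info)
print(parsed["azimuthTime"].microsecond, parsed["footprint"][1])  # 123456 (21.0, 10.0)
```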

### Deferred metadata

The `firstValidSample` / `lastValidSample` arrays (per-line validity masks)
and the per-burst radiometric data (calibration, noise LUTs) are **not yet
exposed** as chunks — they are reserved for future increments. Descriptive
metadata (incidence angle, Doppler, etc.) is also deferred.

### Bootstrap (same as GRD)

SLC chunk descriptors are merged into the same topic graph as GRD chunks. The
bootstrap procedure (Fuseki or offline vendored TTL) is identical; see
[Bootstrap](#bootstrap).

### Worked example

```python
import numpy
from drb.topics import resolver
from drb.addons.addon import AddonManager
from drb.chunk.selection import IselSelection

# 1. Resolve an SLC product.
node = resolver.create(
    "s3://bucket/S1A_IW_SLC__1SDV_20260120T001002_..._1234.SAFE/"
    "measurement/s1a-iw1-slc-vv-20260120t001002-...-001.tiff")
topic = resolver.resolve(node)[0]
print(topic.uri)
#   http://knowledge-base.gael.fr/drb/sentinel-1/product_level-1_iw_s_a

# 2. Get the chunk, enumerating bursts at runtime.
addon = AddonManager().get_addon("chunk")
chunk = addon.apply(node, chunk_name="IW1_VV", topic=topic)
bursts = list(chunk.tiles())
print(f"IW1_VV: {len(bursts)} bursts")  # e.g. "IW1_VV: 9 bursts"

# 3. Read burst metadata and data.
ref = chunk.tile((0,))
print(ref.window)         # e.g. ((0, 500), (0, 1296))
print(ref.info['azimuthTime'])  # 2026-01-20T00:10:02.123456
print(ref.info['footprint'])    # POLYGON((20 10, 21 10, ...))

# 4. Read the burst's full data as a numpy array.
burst0 = chunk.select(
    IselSelection(per_dim={"burst": 0})).get_impl(numpy.ndarray)
print(burst0.shape, burst0.dtype)  # (1, 500, 1296) complex64
```

See [`examples/demo_s1_slc_burst.py`](examples/demo_s1_slc_burst.py) for a
complete runnable demonstration against the offline synthetic fixture, and
[`examples/RESULTS-slc.md`](examples/RESULTS-slc.md) for a captured run.

---

## Limitations

- **Chunks are listed even when not materialisable on a given product.**
  Because all four polarization chunks (and, from increment 2, the SLC
  bursts) are declared once on the shared `product_level-1` ancestor,
  `available_chunks`/`available_collections` on any Level-1 product (GRD,
  SLC, or WV) lists all of them, whether or not that specific product
  carries that polarization or product type. `apply(chunk_name=...)` raises
  `DrbChunkError` if the chunk's XQuery source matches nothing in the given
  product — never silently wrong data.
- **EW GRDM / SM GRD are not validated against a live product** in this
  release — only IW GRDH was exercised end-to-end (no local EW/SM fixture
  available). The chunk descriptors are mode-agnostic and unit-parsed
  (`tests/test_cortex_ttl.py`), so EW/SM are expected to work identically,
  but this is a documented gap, not a claim. See
  [`examples/RESULTS-grd.md`](examples/RESULTS-grd.md).
- **Windowing is correct but not a true partial network fetch on every
  medium.** Locally (plain file or zip member) a 512×512 window reads back
  in ~0.02 s with no measurable zip overhead. Over S3, GRD's raster layout
  (striped, 1-line blocks, no internal tiling, no overviews) caps
  efficiency at line-band granularity — see
  [`docs/evolution-requests.md`](docs/evolution-requests.md) RFE 1.
- **Single-tile materialisation:** each `apply` / `select` call opens one
  measurement GeoTIFF. Multi-tile mosaicking is not supported in this
  release (existing drb-chunk v1 deferral, tracked in
  `docs/evolution-requests.md`).
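The line-band cap on S3 can be quantified. Assuming uncompressed strips of exactly one full-width line and 2-byte (`uint16`) samples, any `w × h` window forces `h` full lines to be fetched, so the over-read factor is `raster_width / w`. The raster width below is a hypothetical value, and `strip_read_cost` is a hypothetical helper.

```python
def strip_read_cost(window_w, window_h, raster_w, bytes_per_sample=2):
    """Bytes needed vs bytes fetched for a window on a 1-line-strip raster.

    Assumes uncompressed strips of exactly one full-width line each, so a
    read touching a line must fetch that whole line.
    """
    needed = window_w * window_h * bytes_per_sample
    fetched = raster_w * window_h * bytes_per_sample
    return needed, fetched, fetched / needed


needed, fetched, ratio = strip_read_cost(512, 512, raster_w=25_600)
print(needed, fetched, ratio)  # 524288 26214400 50.0
```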

---

## Regenerating the descriptor

The `cortex.ttl` is generated from a small polarization table embedded in
`_generate.py`. To regenerate after editing the table:

```bash
python -m drb.addons.chunk.sentinel1._generate
```

The file is written next to the module (i.e.
`drb/addons/chunk/sentinel1/cortex.ttl`). Commit the result.
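As an illustration of table-driven generation (not the contents of `_generate.py`), the sketch below renders one descriptor per polarization as Turtle. The `drb:` namespace URI, the `drb:chunk` predicate and the XQuery placeholder are hypothetical. Only `drb:chunkName`, `drb:collection`, `drb:source` and the `product_level-1` target come from this README.

```python
# Hypothetical namespace for `drb:`; the real vocabulary may differ.
PREFIXES = (
    "@prefix drb: <http://example.org/drb#> .\n"
    "@prefix sentinel-1: <http://knowledge-base.gael.fr/drb/sentinel-1/> .\n"
)

POLARIZATIONS = ["VV", "VH", "HH", "HV"]


def chunk_block(pol):
    """Render one chunk descriptor as a Turtle blank node."""
    # Placeholder only: the real drb:source is an XQuery expression.
    source = f"(: XQuery selecting the -grd-{pol.lower()}- measurement :)"
    return (f'    [ drb:chunkName "{pol}" ;\n'
            f'      drb:collection "measurement" ;\n'
            f'      drb:source "{source}" ]')


def generate_ttl(pols=POLARIZATIONS):
    """Attach one descriptor per polarization to the Level-1 root class."""
    blocks = " ,\n".join(chunk_block(p) for p in pols)
    return f"{PREFIXES}\nsentinel-1:product_level-1 drb:chunk\n{blocks} .\n"


print(generate_ttl(["VV"]))
```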

---

## Installation

```bash
pip install drb-chunk-sentinel1
```

This package declares a single runtime dependency, `rdflib` (see
`requirements.txt`), and ships the generated `cortex.ttl` descriptor. The consuming
environment is responsible for providing a working drb stack — `drb`,
`drb-chunk`, `drb-extractor`, `drb-driver-image`, `rasterio`, `numpy`
(listed in `requirements-test.txt` for dev/CI) — plus either a Fuseki
instance or the vendored S1 topic TTLs at runtime (see
[Bootstrap](#bootstrap)). Resolving a `.SAFE.zip` or S3-hosted product
additionally requires `drb-driver-zip` / `drb-driver-s3`, which are not
declared dependencies of this package either.
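To see which parts of that stack are missing, you can query the installed distributions with `importlib.metadata`. The distribution names are the ones listed above. `missing_distributions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from this section; OPTIONAL is only needed for
# `.SAFE.zip` and S3-hosted products.
RUNTIME_STACK = ["drb", "drb-chunk", "drb-extractor", "drb-driver-image",
                 "rasterio", "numpy"]
OPTIONAL = ["drb-driver-zip", "drb-driver-s3"]


def missing_distributions(names):
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


print("missing:", missing_distributions(RUNTIME_STACK))
print("missing (optional):", missing_distributions(OPTIONAL))
```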
