Metadata-Version: 2.4
Name: atproto-sensorthings
Version: 0.1.0
Summary: Flatten dev.sensorthings ATProto repo (CAR) exports into tidy, geo-referenced tables.
Project-URL: Homepage, https://sensorthings.dev
Project-URL: Source, https://github.com/urschrei/atproto_sensorthings
Project-URL: Issues, https://github.com/urschrei/atproto_sensorthings/issues
Author-email: Stephan Hügel <urschrei@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: atproto,car,geopandas,ogc,sensors,sensorthings
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.10
Requires-Dist: atproto>=0.0.69
Requires-Dist: click>=8.1
Provides-Extra: geo
Requires-Dist: geopandas>=1.0; extra == 'geo'
Requires-Dist: pandas>=2.0; extra == 'geo'
Requires-Dist: shapely>=2.0; extra == 'geo'
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == 'parquet'
Description-Content-Type: text/markdown

# atproto-sensorthings

Flatten a `dev.sensorthings` ATProto repo export into tidy, geo-referenced
tables for analysis.

Sensor networks on [sensorthings.dev](https://sensorthings.dev) publish their
observations to an ATProto repository. You can download that repository as a
single CAR file (from `sensorthings.dev/yourdata`, or via
`com.atproto.sync.getRepo`). This package turns that CAR into three tables:

| Table          | One row per            | Geometry              |
| -------------- | ---------------------- | --------------------- |
| `stations`     | Thing (station)        | station point         |
| `datastreams`  | Datastream             | station point         |
| `observations` | reading (long / tidy)  | station point, or the observed position for a GPS fix |

Everything is derived from the CAR: no network access, no repo checkout, and the
lexicons are not needed at runtime.

## Install

The base install (parsing, the summary, and CSV output) depends only on
`atproto` and `click`. Install as a command-line tool:

```sh
uv tool install atproto-sensorthings         # or: pipx install atproto-sensorthings
```

or add it to a project as a library:

```sh
uv add atproto-sensorthings                  # or: pip install atproto-sensorthings
```

GeoDataFrame and GeoJSON output need the `geo` extra (geopandas / shapely /
pandas); GeoParquet also needs `parquet`:

```sh
uv add "atproto-sensorthings[geo]"           # or: pip install "atproto-sensorthings[geo]"
uv add "atproto-sensorthings[geo,parquet]"   # or: pip install "atproto-sensorthings[geo,parquet]"
```

## Command line

```sh
# record counts and reading time span
atproto-sensorthings summary export.car

# write stations.csv, datastreams.csv, observations.csv to ./sensorthings_export
atproto-sensorthings flatten export.car

# GeoJSON (needs the geo extra); one layer at a time is also supported
atproto-sensorthings flatten export.car --format geojson --out export
atproto-sensorthings flatten export.car --format geoparquet --layer observations
```

CSV rows carry `longitude` / `latitude` columns, so they load as points in any
GIS without the geo extra.

## Library

```python
import atproto_sensorthings as ast

repo = ast.read_car("export.car")

# pure core: plain dicts, streamed; no geopandas required
for row in ast.observation_rows(repo):
    ...

# with the geo extra: GeoDataFrames (imported lazily on first use)
stations = ast.stations_gdf(repo)        # 1 row / station
observations = ast.observations_gdf(repo)  # 1 row / reading, sorted, typed
```

## Data notes

- **Coordinates** are decoded from the lexicon's integer `latE7` / `lonE7`
  (degrees times 1e7) to WGS84 degrees; altitude from `altMm` to metres.
- **Numeric results** are scaled integers: the real value is
  `value * 10**-resultScaleFactor`, applied automatically. In the observations
  table, `value` holds the decoded native result (float, string, boolean or
  list) and `value_num` is a nullable float for the numeric and boolean cases.
- **Result types** covered: numeric, category, boolean, array, geoPoint.
- **Record types** covered: `observationBatch`, `observation` and
  `multiObservation`, so every network's export flattens the same way.
- The whole export is read into memory; for the largest agency repositories
  prefer GeoParquet output over GeoJSON.

## Licence

MIT. See [LICENSE](LICENSE).
