Metadata-Version: 2.4
Name: tree-registration-and-matching
Version: 0.1.0
Summary: Algorithms to register sets of trees or find correspondences between them
License: BSD-3
License-File: LICENSE
Author: David Russell
Author-email: djrussell@ucdavis.edu
Requires-Python: >=3.12
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: geopandas (>=1.1.1,<2.0.0)
Requires-Dist: ipykernel (>=7.1.0,<8.0.0)
Requires-Dist: matplotlib (>=3.10.7,<4.0.0)
Requires-Dist: rasterio (>=1.5.0,<2.0.0)
Requires-Dist: scikit-image (>=0.26.0,<0.27.0)
Requires-Dist: scipy (==1.15)
Description-Content-Type: text/markdown

# tree-registration-and-matching

## Install

```
conda create -n tree-registration-and-matching python=3.12 -y
conda activate tree-registration-and-matching
```
Install Poetry if it is not already available, then use it to install this package and its dependencies:
```
poetry install
```

## Example data
An example benchmark is provided to test registration algorithms and can be downloaded from Box at this [link](https://ucdavis.box.com/v/ofo-tree-registration). This dataset consists of data from 232 drone imagery collections and corresponding field reference information. The field reference information was manually registered to the CHM product for each dataset. The data should be downloaded and placed inside of the `data` folder in this repository (`tree-registration-and-matching/data/ofo-tree-registration`). You can download a subset of the CHM products to save space and time if desired; all other files are small.
* **CHMs:** The drone images were registered together using photogrammetry (Agisoft Metashape). This produces a digital surface model (DSM; top of canopy) and a digital terrain model (DTM; bare earth model) for each site. Subtracting the DTM from the DSM gives a raster representing the estimated canopy height model (CHM). This is cropped to a 50-meter buffer around the surveyed region. The data is provided as a GeoTIFF file (`.tif`), which encodes the spatial location of the data.
* **detected-trees.gpkg:** This is a geospatial vector file containing the point locations of individual detected trees. These were identified from the CHM using a variable radius maximum filter, implemented in the [Tree Detection Framework](https://github.com/open-forest-observatory/tree-detection-framework/blob/35c8020f86a6f51c582962298ee37ab9acdcfd21/tree_detection_framework/detection/detector.py#L478).
* **field_trees.gpkg:** The locations of trees observed from ground surveys. This is provided as a geospatial vector file where each point represents a single tree. Many attributes from the field survey are provided but are not used in most registration approaches. The `dataset_id` field denotes which CHM the tree should be paired with and the `height` field represents the field-measured height (m). The data has undergone some pre-processing to reduce it to only trees that are expected to be visible from above. First, all dead trees (`live_dead == 'D'`) are removed, since dead trees are reconstructed poorly by photogrammetry. Then trees are removed if they are likely to be under another, larger tree. Starting with the tallest tree, any shorter trees are removed if they are within `1 + 0.1 * height (m)` of the taller tree. This process is repeated for all trees from tallest to shortest.
Finally, height is imputed for all trees that do not have a field-measured value. This is done using an allometric equation fit on the diameter at breast height (DBH) value, which is recorded for all trees. Using the trees in the dataset which have both DBH and height, we fit the following allometric function to find the height in meters: $H = 1.3 + e^{-0.314 + 0.846 \ln(\mathrm{DBH})}$.
* **plot_bounds.gpkg:** The region surveyed in the field, represented as polygon vector data. The `dataset_id` column denotes which CHM and field trees the data should be paired with.
* **shift_quality.json:** This is a mapping from `dataset_id` to a quality score from 1 to 4, with 4 being the highest. The score takes into account how accurate the field survey appeared to be when compared to the CHM data, as well as how confident a human annotator was in finding the correct shift for that dataset.
* **shfts_per_dataset.json:** All of the field trees and plot bounds have been shifted so that they align as well as possible with the CHM, as determined by a human annotator. This file records the shift that was applied to each dataset as an (x, y) offset in meters in the `EPSG:26910` coordinate frame. Since the provided trees and plot bounds are already shifted, you must apply the negative of this value to recover the initial location of the trees and plots.
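
As a rough illustration of the conventions in the list above, the following sketch shows the height allometry, the removal of overtopped trees, and the reversal of the manual shift in plain Python. The function names, the `(x, y, height)` tuple layout, and the `(dx, dy)` shift tuple are assumptions made for illustration and are not part of this package. The sketch also assumes that the radius `1 + 0.1 * height` uses the height of the taller tree and that a tree which has been removed no longer suppresses other trees; neither detail is stated explicitly above.

```python
import math


def impute_height(dbh):
    """Height in meters from DBH, using the allometric coefficients quoted above.

    DBH must be in the same units that were used when the equation was fit
    (the units are not stated in this README).
    """
    return 1.3 + math.exp(-0.314 + 0.846 * math.log(dbh))


def drop_overtopped(trees):
    """Keep only trees that are not close to a taller tree.

    `trees` is a list of (x, y, height) tuples with coordinates in meters.
    Assumption: the exclusion radius is based on the taller tree's height,
    and only trees that were kept can suppress shorter ones.
    """
    kept = []
    # Visit trees from tallest to shortest.
    for x, y, h in sorted(trees, key=lambda t: t[2], reverse=True):
        overtopped = any(
            math.hypot(x - kx, y - ky) <= 1 + 0.1 * kh for kx, ky, kh in kept
        )
        if not overtopped:
            kept.append((x, y, h))
    return kept


def undo_shift(x, y, shift):
    """Recover the initial location by applying the negative of the shift.

    `shift` is the (dx, dy) offset in meters recorded for the dataset.
    """
    dx, dy = shift
    return x - dx, y - dy
```

With real data, the same shift reversal would typically be applied to every geometry in the field trees and plot bounds of a given `dataset_id`, using coordinates in `EPSG:26910`.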

### Running real-world examples
Using the data described here, you can run both the `examples/example_registratiration_real_data_CHM.ipynb` and `examples/example_registration_real_data_MEE.ipynb` notebooks. The first uses a canopy height model (CHM) approach, which computes the correlation between the field-measured heights and the CHM values at the corresponding locations. The second uses trees detected from the CHM and tries to match these to the observed trees.


