Metadata-Version: 2.4
Name: vectorizer-pro
Version: 0.2.0
Summary: Raster mask vectorization with topology-preserving simplification
Author: Wuhan University CVEO Team
License-Expression: MIT
Project-URL: Homepage, https://www.whu-cveo.com/
Project-URL: Repository, https://github.com/CVEO/vectorizer-pro
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rasterio>=1.3.0
Requires-Dist: shapely>=2.1.0
Requires-Dist: click>=8.1.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: fiona>=1.9.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Dynamic: license-file

# Vectorizer Pro

[English](README.md) | [中文](README_zh-CN.md)

A Python CLI tool and library that vectorizes raster mask files into polygon layers (Shapefile, GeoPackage, or GeoJSON) with topology-preserving simplification.

## Features

- Convert raster masks (int8/16/32 class IDs) to vector polygons
- Topology-preserving Visvalingam-Whyatt (TPVW) simplification
- Simplification is a self-contained pure Python implementation that does not call GEOS
- Support for large images (30000x30000+)
- 4-connectivity polygonization
- Output formats: Shapefile (.shp), GeoPackage (.gpkg), GeoJSON (.geojson)
- Preserve class ID attributes
- CRS preservation from input raster
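
To illustrate what 4-connectivity means for the resulting polygons, here is a minimal sketch, not the package's implementation. The helper `label_regions` is hypothetical and uses a plain flood fill: pixels belong to the same region only when they share an edge, so diagonal neighbors of the same class end up in separate regions.

```python
from collections import deque


def label_regions(grid):
    """Label 4-connected regions of equal value; returns (labels, count)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Flood-fill from this unlabeled pixel across edge neighbors only.
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and grid[ny][nx] == grid[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


mask = [
    [1, 0],
    [0, 1],
]
labels, count = label_regions(mask)
print(count)  # 4: the diagonal pixels of each class are not joined
```

With 8-connectivity the same mask would yield two regions that touch only at a corner, which tends to produce self-touching polygons.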

## Installation

From the root of a local checkout, install in editable mode:

```bash
pip install -e .
```

Or clone the repository first, then install:

```bash
git clone https://github.com/CVEO/vectorizer-pro.git
cd vectorizer-pro
pip install -e .
```

## Usage

### Basic Usage

```bash
vectorizer-pro input.tif output.shp
```

### With Options

```bash
# Specify nodata value to exclude
vectorizer-pro input.tif output.shp --nodata 0

# Remove small regions (merge regions smaller than 100 pixels)
vectorizer-pro input.tif output.shp --min-area 100

# Set simplification tolerance
vectorizer-pro input.tif output.shp --tolerance 0.1

# Output as GeoPackage
vectorizer-pro input.tif output.gpkg --format gpkg

# Simplify only internal edges (preserve boundary)
vectorizer-pro input.tif output.shp --no-simplify-boundary
```

### Command Line Options

| Option | Description |
|--------|-------------|
| `--nodata INT` | Nodata value to exclude from vectorization |
| `--min-area FLOAT` | Minimum polygon area threshold. Smaller polygons will be merged into their largest adjacent neighbor |
| `--tolerance FLOAT` | Simplification tolerance (default: half pixel size) |
| `--format, -f` | Output format: `shp`, `gpkg`, or `geojson` (default: `shp`) |
| `--simplify-boundary/--no-simplify-boundary` | Simplify exterior boundaries (default: yes) |
| `--detect-nodata` | Print nodata value and exit |
| `--list-classes` | List unique class IDs and exit |

## Python Package Usage

```python
from vectorizer_pro import vectorize, VectorizeResult

# Simple usage - writes to file
result = vectorize("input.tif", "output.shp", nodata=0)

# Remove small regions in Python API
result = vectorize("input.tif", "output.shp", nodata=0, min_area=100)

# Get geometries without writing
result = vectorize("input.tif", nodata=0, output_path=None)
polygons = result.polygons
class_ids = result.class_ids
crs = result.crs
```
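
A common next step is to group the returned geometries by class. The sketch below assumes that `result.polygons` and `result.class_ids` are parallel sequences of equal length, which the example above suggests but does not state. `group_by_class` is a hypothetical helper, shown here with placeholder strings instead of real geometries.

```python
from collections import defaultdict


def group_by_class(polygons, class_ids):
    """Group geometries by class ID, assuming the two sequences are parallel."""
    if len(polygons) != len(class_ids):
        raise ValueError("polygons and class_ids must have the same length")
    groups = defaultdict(list)
    for polygon, class_id in zip(polygons, class_ids):
        groups[class_id].append(polygon)
    return dict(groups)


# Placeholder values; in practice pass result.polygons and result.class_ids.
groups = group_by_class(["poly_a", "poly_b", "poly_c"], [1, 2, 1])
print({cid: len(items) for cid, items in groups.items()})  # {1: 2, 2: 1}
```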

## Examples

### Quick Start

```bash
# Check nodata value
vectorizer-pro sample/top_potsdam_2_13.tif --detect-nodata

# List class IDs
vectorizer-pro sample/top_potsdam_2_13.tif --list-classes

# Vectorize excluding class 0
vectorizer-pro sample/top_potsdam_2_13.tif output.shp --nodata 0
```

### Advanced Usage

```bash
# Stronger simplification for smoother polygons
vectorizer-pro input.tif output.shp --nodata 0 --tolerance 0.5

# Remove small regions before simplification
vectorizer-pro input.tif output.shp --nodata 0 --min-area 50 --tolerance 0.1

# Preserve exact boundary shape
vectorizer-pro input.tif output.shp --nodata 0 --no-simplify-boundary

# GeoPackage output with custom tolerance
vectorizer-pro input.tif output.gpkg --format gpkg --tolerance 0.05
```

## Requirements

- Python >= 3.10
- rasterio >= 1.3
- shapely >= 2.1
- click >= 8.1
- fiona >= 1.9
- numpy >= 1.24

## References

### Projects

- **GDAL** - Raster I/O and Polygonize algorithm  
  https://gdal.org/

- **Shapely** - Python geometry operations  
  https://shapely.readthedocs.io/

- **GEOS** - C/C++ geometry engine (reference implementation for the TPVW algorithm)  
  https://libgeos.org/

- **JTS (Java Topology Suite)** - Java library for geometry and topology processing  
  https://github.com/locationtech/jts

### Algorithms

- **GDAL Polygonize** - Two-arm chain edge tracing algorithm for 4-connectivity raster vectorization

- **Visvalingam-Whyatt** - Area-based vertex removal simplification that repeatedly drops the vertex whose triangle with its two neighbors has the smallest area. On its own it does not guarantee topology preservation.

- **TPVW (Topology-Preserving Visvalingam-Whyatt)** - Extension of the VW algorithm for polygonal coverages that ensures shared edges between adjacent polygons are simplified identically, preventing gaps and overlaps
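
The area-based vertex removal at the core of Visvalingam-Whyatt can be sketched as follows. This is plain VW on a single open polyline, not the topology-preserving variant used by this package, and it takes an area threshold directly. How `--tolerance` maps to such a threshold is not specified here, so `min_area` is an assumed parameter. The loop recomputes all areas on each pass for clarity; a production implementation would typically use a priority queue.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def visvalingam_whyatt(points, min_area):
    """Simplify an open polyline by dropping vertices with area below min_area."""
    pts = list(points)
    while len(pts) > 2:
        # Effective area of each interior vertex with its current neighbors.
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(areas)
        if smallest >= min_area:
            break
        # Remove the least significant vertex, then recompute.
        del pts[areas.index(smallest) + 1]
    return pts


line = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
print(visvalingam_whyatt(line, min_area=0.5))
# [(0, 0), (2, 0), (3, 2), (4, 0)]
```

The near-collinear vertex `(1, 0.1)` spans a triangle of area 0.1 and is removed, while the peak at `(3, 2)` is kept. A topology-preserving variant additionally has to treat each edge shared by two polygons as one line, simplified once, so both neighbors receive identical vertices.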

### Sample Data

- **`sample/top_potsdam_2_13.tif`** - Semantic labeling result generated by an AI model on the ISPRS Potsdam 2D Semantic Labeling Contest benchmark dataset. Used as a demonstration of vectorizing large raster masks.

- **`sample/small.tif`** - A smaller sample for quick testing.

The original Potsdam aerial imagery and ground truth are from the ISPRS benchmark: https://www.isprs.org/

## Authors

**Wuhan University CVEO Team** (武汉大学CVEO课题组)

Website: https://www.whu-cveo.com/

## License

MIT License - see [LICENSE](LICENSE) for details.
