Metadata-Version: 2.4
Name: esri-converter
Version: 0.1.0
Summary: Modern tools for converting ESRI proprietary formats to open source formats
Project-URL: Homepage, https://github.com/mihiarc/esri-converter
Project-URL: Documentation, https://esri-converter.readthedocs.io
Project-URL: Repository, https://github.com/mihiarc/esri-converter.git
Project-URL: Issues, https://github.com/mihiarc/esri-converter/issues
Project-URL: Changelog, https://github.com/mihiarc/esri-converter/blob/main/CHANGELOG.md
Author: Chris Mihiar
Maintainer: Chris Mihiar
License: MIT
License-File: LICENSE
Keywords: conversion,esri,gdb,geodatabase,geoparquet,geospatial,gis
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: fiona>=1.9.0
Requires-Dist: polars>=0.20.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: pyarrow>=15.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: shapely>=2.0.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: all
Requires-Dist: black>=23.0.0; extra == 'all'
Requires-Dist: build>=0.10.0; extra == 'all'
Requires-Dist: duckdb>=0.10.0; extra == 'all'
Requires-Dist: mkdocs-gen-files>=0.5.0; extra == 'all'
Requires-Dist: mkdocs-literate-nav>=0.6.2; extra == 'all'
Requires-Dist: mkdocs-material>=9.6.14; extra == 'all'
Requires-Dist: mkdocs-mermaid2-plugin>=1.2.1; extra == 'all'
Requires-Dist: mkdocs-section-index>=0.3.10; extra == 'all'
Requires-Dist: mkdocs>=1.6.1; extra == 'all'
Requires-Dist: mkdocstrings[python]>=0.29.1; extra == 'all'
Requires-Dist: mypy>=1.0.0; extra == 'all'
Requires-Dist: pre-commit>=3.0.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all'
Requires-Dist: pytest>=7.0.0; extra == 'all'
Requires-Dist: python-dotenv>=1.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: twine>=4.0.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: build>=0.10.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: python-dotenv>=1.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=4.0.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-gen-files>=0.5.0; extra == 'docs'
Requires-Dist: mkdocs-literate-nav>=0.6.2; extra == 'docs'
Requires-Dist: mkdocs-material>=9.6.14; extra == 'docs'
Requires-Dist: mkdocs-mermaid2-plugin>=1.2.1; extra == 'docs'
Requires-Dist: mkdocs-section-index>=0.3.10; extra == 'docs'
Requires-Dist: mkdocs>=1.6.1; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.29.1; extra == 'docs'
Provides-Extra: duckdb
Requires-Dist: duckdb>=0.10.0; extra == 'duckdb'
Description-Content-Type: text/markdown

# ESRI Converter

Tools for converting proprietary ESRI formats, currently File Geodatabases (`.gdb`), to open formats such as GeoParquet. Built on Polars, PyArrow, and Fiona, with Rich progress tracking.

## 🚀 Features

- **Large-Scale Processing**: Handle multi-GB GDB files with streaming and chunking
- **Modern Stack**: Built with Polars, Rich, and PyArrow for maximum performance
- **Beautiful UI**: Rich progress bars, tables, and visual feedback
- **Memory Efficient**: Process datasets larger than available RAM
- **Robust Error Handling**: Comprehensive validation and error recovery
- **Clean API**: Simple, well-documented functions for programmatic use
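The streaming-and-chunking idea can be sketched with the standard library alone. `iter_chunks` below is a hypothetical helper, not part of the `esri_converter` API. It assumes records arrive from a lazy iterator (in the package, features are presumably read through Fiona) and keeps only one chunk in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(records: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `chunk_size` records.

    Only the current chunk is materialized, so the source iterable
    (for example, a lazy feature reader) can be arbitrarily large.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Usage: a generator stands in for a lazy reader over a large layer.
features = ({"id": i} for i in range(35_000))
sizes = [len(chunk) for chunk in iter_chunks(features, 15_000)]
print(sizes)  # [15000, 15000, 5000]
```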

## 📦 Installation

```bash
# Install from PyPI (when published)
pip install esri-converter

# Or install from a local clone in editable (development) mode
pip install -e .

# With optional extras (quoted so shells such as zsh don't expand the brackets)
pip install "esri-converter[duckdb,dev]"
```

## 🔧 Requirements

- Python 3.10+
- Core dependencies (installed automatically): Fiona, Polars, PyArrow, Rich, Shapely, psutil, tqdm

## 🎯 Quick Start

### Basic Usage

```python
from esri_converter.api import convert_gdb_to_parquet

# Convert a single GDB file
result = convert_gdb_to_parquet("data.gdb")
print(f"Converted {result['total_records']:,} records")
print(f"Output size: {result['output_size_mb']:.1f} MB")
```

### Advanced Usage

```python
from esri_converter.api import (
    convert_gdb_to_parquet,
    convert_multiple_gdbs,
    discover_gdb_files,
    get_gdb_info
)

# Discover GDB files in a directory
gdb_files = discover_gdb_files("data/")
print(f"Found {len(gdb_files)} GDB files")

# Get information about a GDB without converting
info = get_gdb_info("large_dataset.gdb")
print(f"GDB has {info['total_layers']} layers with {info['total_records']:,} records")

# Convert specific layers with custom settings
result = convert_gdb_to_parquet(
    gdb_path="data.gdb",
    output_dir="my_output/",
    layers=["Parcels", "Buildings"],
    chunk_size=10000,
    show_progress=True
)

# Convert multiple GDB files
results = convert_multiple_gdbs(
    gdb_paths=["data1.gdb", "data2.gdb", "data3.gdb"],
    output_dir="batch_output/"
)
print(f"Successfully converted {results['gdbs_converted']}/{results['total_gdbs']} GDBs")
```

## 📚 API Reference

### Core Functions

#### `convert_gdb_to_parquet()`

Convert a File Geodatabase to GeoParquet format.

**Parameters:**
- `gdb_path` (str | Path): Path to the .gdb file
- `output_dir` (str | Path, optional): Output directory (default: "geoparquet_output")
- `layers` (List[str], optional): Specific layers to convert (default: all layers)
- `chunk_size` (int): Records to process at once (default: 15000)
- `show_progress` (bool): Show Rich progress bars (default: True)
- `log_file` (str, optional): Log file path

**Returns:**
```python
{
    'success': bool,
    'gdb_path': str,
    'output_dir': str,
    'layers_converted': [
        {
            'layer': str,
            'output_file': str,
            'record_count': int
        }
    ],
    'layers_failed': [str],
    'total_time': float,
    'total_records': int,
    'processing_rate': float,
    'output_size_mb': float
}
```
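Because a run can partially succeed, it can help to inspect the returned dictionary rather than only `success`. The sketch below uses only the documented keys; `summarize_conversion` is a hypothetical helper and the sample values (including the output path) are made up for illustration. Whether `success` stays `True` when some layers fail is not stated above, so the sketch checks `layers_failed` as well.

```python
from typing import Any, Dict


def summarize_conversion(result: Dict[str, Any]) -> str:
    """Build a one-line summary from a convert_gdb_to_parquet()-style result dict."""
    converted = result["layers_converted"]
    failed = result["layers_failed"]
    # Treat any failed layer as incomplete, regardless of the top-level flag.
    status = "OK" if result["success"] and not failed else "INCOMPLETE"
    return (
        f"{status}: {len(converted)} layer(s) converted, "
        f"{len(failed)} failed, {result['total_records']:,} records "
        f"in {result['total_time']:.1f}s"
    )


# Hypothetical sample values, shaped like the documented return dict.
sample = {
    "success": True,
    "gdb_path": "data.gdb",
    "output_dir": "geoparquet_output",
    "layers_converted": [
        # Hypothetical output path; the package's file naming is not documented here.
        {"layer": "Parcels", "output_file": "geoparquet_output/parcels.parquet", "record_count": 1200},
    ],
    "layers_failed": ["Buildings"],
    "total_time": 2.5,
    "total_records": 1200,
    "processing_rate": 480.0,
    "output_size_mb": 0.4,
}
print(summarize_conversion(sample))
```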

#### `convert_multiple_gdbs()`

Convert multiple GDB files in batch.

**Parameters:**
- `gdb_paths` (List[str | Path]): List of GDB file paths
- `output_dir` (str | Path, optional): Output directory
- `chunk_size` (int): Records to process at once (default: 15000)
- `show_progress` (bool): Show progress bars (default: True)
- `log_file` (str, optional): Log file path

**Returns:**
```python
{
    'success': bool,
    'total_gdbs': int,
    'gdbs_converted': int,
    'gdbs_failed': int,
    'results': [dict, ...],  # one result dict per input GDB
    'total_time': float,
    'total_records': int,
    'total_output_size_mb': float
}
```
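A common follow-up to a batch run is retrying the GDBs that failed. This sketch assumes each entry in `results` carries at least `gdb_path` and `success`, like a single `convert_gdb_to_parquet()` result; `failed_gdb_paths` and the sample data are hypothetical. The returned list could then be passed back as `gdb_paths` to `convert_multiple_gdbs()`.

```python
from typing import Any, Dict, List


def failed_gdb_paths(batch: Dict[str, Any]) -> List[str]:
    """Return the gdb_path of each per-GDB result whose 'success' flag is False."""
    return [r["gdb_path"] for r in batch["results"] if not r["success"]]


# Hypothetical batch result: only the keys used here are filled in.
batch = {
    "total_gdbs": 3,
    "gdbs_failed": 1,
    "results": [
        {"success": True, "gdb_path": "data1.gdb"},
        {"success": False, "gdb_path": "data2.gdb"},
        {"success": True, "gdb_path": "data3.gdb"},
    ],
}
retry = failed_gdb_paths(batch)
print(retry)  # ['data2.gdb']
```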

#### `discover_gdb_files()`

Find all File Geodatabases (`.gdb` directories) in a directory.

**Parameters:**
- `directory` (str | Path): Directory to search (default: current directory)

**Returns:**
- `List[Path]`: Sorted list of GDB file paths
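A File Geodatabase is a directory whose name ends in `.gdb`, not a single file, so discovery amounts to matching directories by suffix. The sketch below is an illustration, not the package's implementation: `find_gdb_dirs` is hypothetical, and it searches recursively with `Path.rglob`, whereas whether `discover_gdb_files()` recurses is not documented here.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def find_gdb_dirs(directory: Union[str, Path] = ".") -> List[Path]:
    """Return a sorted list of directories named *.gdb under `directory`.

    Recursive search is an assumption of this sketch. Plain files that
    happen to end in .gdb are skipped, since a geodatabase is a folder.
    """
    return sorted(p for p in Path(directory).rglob("*.gdb") if p.is_dir())


# Usage against a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.gdb").mkdir()
    (root / "sub" / "b.gdb").mkdir(parents=True)
    (root / "stray.gdb").write_text("not a geodatabase")
    found = [p.relative_to(root).as_posix() for p in find_gdb_dirs(root)]
print(found)  # ['a.gdb', 'sub/b.gdb']
```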

#### `get_gdb_info()`

Get information about a GDB file without converting it.

**Parameters:**
- `gdb_path` (str | Path): Path to the .gdb file

**Returns:**
```python
{
    'gdb_path': str,
    'layers': [
        {
            'name': str,
            'record_count': int,
            'geometry_type': str,
            'crs': str,
            'field_count': int,
            'bounds': [float, float, float, float]  # minx, miny, maxx, maxy
        }
    ],
    'total_records': int,
    'total_layers': int
}
```
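The per-layer `bounds` can be combined into one extent for the whole GDB, for example to set an initial map view. A minimal sketch, assuming `bounds` is ordered `[minx, miny, maxx, maxy]` as documented and that extents are only comparable when layers share a CRS; `combined_bounds` and the sample dictionary are hypothetical.

```python
from typing import Any, Dict, List, Optional


def combined_bounds(info: Dict[str, Any]) -> Optional[List[float]]:
    """Union of per-layer [minx, miny, maxx, maxy] boxes; None if no layer has bounds."""
    layers = [layer for layer in info["layers"] if layer.get("bounds")]
    if not layers:
        return None
    # Boxes in different coordinate systems cannot be merged meaningfully.
    if len({layer.get("crs") for layer in layers}) > 1:
        raise ValueError("layers use different CRSs; reproject before combining bounds")
    boxes = [layer["bounds"] for layer in layers]
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


# Hypothetical get_gdb_info()-style result with two layers in the same CRS.
info = {
    "gdb_path": "data.gdb",
    "layers": [
        {"name": "Parcels", "crs": "EPSG:4326", "bounds": [-80.0, 35.0, -79.0, 36.0]},
        {"name": "Buildings", "crs": "EPSG:4326", "bounds": [-79.5, 34.5, -78.5, 35.5]},
    ],
}
print(combined_bounds(info))  # [-80.0, 34.5, -78.5, 36.0]
```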

### Utility Functions

```python
from esri_converter.utils import (
    list_supported_formats,
    get_format_info,
    validate_gdb_file,
    validate_output_path,
    get_recommended_chunk_size,
    estimate_output_size
)

# Get supported formats
formats = list_supported_formats()
print(f"Input formats: {formats['input']}")
print(f"Output formats: {formats['output']}")

# Get format details
info = get_format_info('gdb')
print(f"Description: {info['description']}")

# Validate files
validate_gdb_file("data.gdb")  # Raises ValidationError if invalid
validate_output_path("output/")  # Creates directory if needed

# Get recommendations
chunk_size = get_recommended_chunk_size(1000000, 'complex')  # record count, geometry complexity
sizes = estimate_output_size(100000, 50, 'Polygon')  # record count, field count, geometry type
print(f"Estimated output size: {sizes['parquet']:.1f} MB")
```
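How `get_recommended_chunk_size()` chooses its value is not documented here. One generic way to reason about chunk size is to divide a memory budget by an estimated per-record size and clamp the result. The helper below is a hypothetical illustration of that arithmetic, not the package's heuristic; the clamp bounds are arbitrary assumptions.

```python
def chunk_size_for_budget(
    memory_budget_mb: float,
    avg_record_bytes: int,
    minimum: int = 1_000,
    maximum: int = 100_000,
) -> int:
    """Largest chunk that fits the memory budget, clamped to [minimum, maximum]."""
    if memory_budget_mb <= 0 or avg_record_bytes <= 0:
        raise ValueError("budget and record size must be positive")
    # Records that fit in the budget: bytes available / bytes per record.
    fits = int(memory_budget_mb * 1024 * 1024 // avg_record_bytes)
    return max(minimum, min(maximum, fits))


# 256 MB budget, ~4 KB per record (attributes plus WKT geometry):
print(chunk_size_for_budget(256, 4096))  # 65536
```

The lower clamp can exceed the budget for very large records; a real implementation would have to decide which constraint wins.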

## 🏗️ Architecture

### Package Structure

```
esri_converter/
├── __init__.py          # Main package exports
├── api.py               # Clean API functions
├── exceptions.py        # Custom exceptions
├── converters/
│   ├── __init__.py
│   └── gdb_converter.py # Core conversion logic
└── utils/
    ├── __init__.py
    ├── formats.py       # Format information
    └── validation.py    # Input validation
```

### Key Components

1. **API Layer** (`api.py`): Clean, simple functions for external use
2. **Converter Engine** (`converters/`): Core conversion logic with Rich UI
3. **Utilities** (`utils/`): Validation, format info, and helper functions
4. **Exception Handling** (`exceptions.py`): Comprehensive error types

## 🔧 Technical Details

### Performance Optimizations

- **Streaming Processing**: Handle files larger than RAM
- **Chunked Operations**: Configurable chunk sizes for optimal memory usage
- **Schema Normalization**: Handle mixed data types robustly
- **Compression**: Snappy compression, which favors fast reads and writes over the smallest possible files
- **Parallel Processing**: Multi-threaded operations where possible
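Schema normalization matters because different chunks of the same layer can disagree on a column's type (an integer field in one chunk, text in another), while a Parquet column has a single type. Below is a minimal sketch of one possible widening rule (int to float to string, nulls preserved); `unify_column` is hypothetical and the package's actual rules may differ.

```python
from typing import Any, List


def unify_column(values: List[Any]) -> List[Any]:
    """Coerce a column's values to one type, widening int -> float -> str.

    None is kept as a null. Bools are treated as ints here (a simplifying
    assumption; a real converter might keep a separate boolean type).
    """
    present = {type(v) for v in values if v is not None}
    if str in present or not present <= {bool, int, float}:
        target = str  # any text or unrecognized type forces a string column
    elif float in present:
        target = float
    else:
        target = int
    return [None if v is None else target(v) for v in values]


print(unify_column([1, 2.5, None]))     # [1.0, 2.5, None]
print(unify_column([7, "A-12", None]))  # ['7', 'A-12', None]
```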

### Data Handling

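- **Geometry Storage**: Geometries stored as WKT text, with per-feature spatial bounds for indexing. The GeoParquet specification defines WKB, not WKT, as its geometry encoding, so strict GeoParquet readers may treat these columns as plain strings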
- **Attribute Preservation**: All original attributes maintained
- **Type Safety**: Robust type normalization and error handling
- **CRS Preservation**: Coordinate reference system information retained
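Storing a bounding box next to each WKT geometry lets a reader reject features whose box cannot touch a query window before parsing any geometry. The stdlib sketch below shows both halves under stated assumptions: coordinates are 2D (no Z or M values), and `wkt_bounds` and `bbox_intersects` are hypothetical helpers. In practice Shapely, already a dependency, gives the same box via `shapely.wkt.loads(text).bounds`. The bounds column names used in the package's output are not documented here.

```python
import re
from typing import Tuple

Bounds = Tuple[float, float, float, float]

# Matches decimal or scientific-notation numbers inside WKT text.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def wkt_bounds(wkt: str) -> Bounds:
    """(minx, miny, maxx, maxy) of a 2D WKT geometry.

    Assumes XY coordinates only (no Z/M), so numbers alternate x, y.
    """
    numbers = [float(n) for n in _NUMBER.findall(wkt)]
    if not numbers or len(numbers) % 2:
        raise ValueError("expected a non-empty sequence of XY coordinate pairs")
    xs, ys = numbers[0::2], numbers[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_intersects(a: Bounds, b: Bounds) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


parcel = wkt_bounds("POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))")
print(parcel)  # (0.0, 0.0, 4.0, 3.0)
print(bbox_intersects(parcel, (3.5, 2.5, 10.0, 10.0)))  # True
```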

### Memory Management

- **Temporary Files**: Automatic cleanup of intermediate files
- **Lazy Loading**: Process data in streams without loading entire datasets
- **Resource Monitoring**: Track memory usage and processing rates
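Automatic cleanup of intermediate files is typically handled with a context manager, so temporary parts are removed even if conversion fails midway. This sketch uses `tempfile.TemporaryDirectory`; `write_via_temp_parts` is hypothetical, and it writes plain-text parts purely for illustration (the package's intermediate format is not documented here). It returns the scratch path only so the cleanup can be checked.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_via_temp_parts(chunks: Iterable[List[str]], output: Path) -> Path:
    """Write each chunk to a temporary part file, then merge the parts into `output`.

    TemporaryDirectory removes the scratch folder and every part file when the
    `with` block exits, including when an exception is raised part-way through.
    """
    with tempfile.TemporaryDirectory(prefix="parts_") as tmp:
        scratch = Path(tmp)
        parts: List[Path] = []
        for i, chunk in enumerate(chunks):
            part = scratch / f"part_{i:05d}.txt"
            part.write_text("".join(line + "\n" for line in chunk))
            parts.append(part)
        with output.open("w") as out:
            for part in parts:
                with part.open() as src:
                    shutil.copyfileobj(src, out)
    return scratch  # returned only so callers can confirm it was deleted


with tempfile.TemporaryDirectory() as outdir:
    target = Path(outdir) / "layer.txt"
    scratch_dir = write_via_temp_parts([["a", "b"], ["c"]], target)
    merged = target.read_text()
print(repr(merged), scratch_dir.exists())  # 'a\nb\nc\n' False
```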

## 🚨 Error Handling

The package provides comprehensive error handling with custom exception types:

```python
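from esri_converter.api import convert_gdb_to_parquet
from esri_converter.exceptions import (
    ESRIConverterError,      # Base exception
    ValidationError,         # Input validation errors
    ConversionError,         # Conversion failures
    UnsupportedFormatError,  # Format not supported
    SchemaError,             # Schema-related issues
    FileAccessError,         # File I/O problems
)

try:
    result = convert_gdb_to_parquet("data.gdb")
except ValidationError as e:
    print(f"Input validation failed: {e}")
except ConversionError as e:
    print(f"Conversion failed: {e}")
    # source_file may not be set, so read it defensively
    source_file = getattr(e, "source_file", None)
    if source_file:
        print(f"Source file: {source_file}")
except ESRIConverterError as e:
    # Any other package-specific error (schema, file access, unsupported format)
    print(f"ESRI Converter error: {e}")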
```
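For reference, here is a minimal sketch of how a hierarchy like this is commonly declared. It assumes, as the comments above suggest, that every class derives directly from `ESRIConverterError` and that `ConversionError` may optionally carry a `source_file`; it is an illustration, not the package's source. Because all subclasses share the base, `except ESRIConverterError` works as a catch-all for package-specific failures.

```python
from typing import Optional


class ESRIConverterError(Exception):
    """Base exception for all package-specific errors."""


class ValidationError(ESRIConverterError):
    """Input validation errors."""


class ConversionError(ESRIConverterError):
    """Conversion failures; optionally records the offending source file."""

    def __init__(self, message: str, source_file: Optional[str] = None) -> None:
        super().__init__(message)
        self.source_file = source_file


class UnsupportedFormatError(ESRIConverterError):
    """Format not supported."""


class SchemaError(ESRIConverterError):
    """Schema-related issues."""


class FileAccessError(ESRIConverterError):
    """File I/O problems."""


try:
    raise ConversionError("layer 'Parcels' failed", source_file="data.gdb")
except ESRIConverterError as exc:  # the base class also catches every subclass
    caught = (type(exc).__name__, str(exc), getattr(exc, "source_file", None))
print(caught)  # ('ConversionError', "layer 'Parcels' failed", 'data.gdb')
```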

## 📊 Performance Benchmarks

Typical performance on modern hardware:

| Dataset Size | Records | Processing Rate (records/s) | Memory Usage | Output Size |
|--------------|---------|-----------------------------|--------------|-------------|
| Small        | 10K     | 50K                         | 100 MB       | 2-5 MB      |
| Medium       | 100K    | 30K                         | 200 MB       | 20-50 MB    |
| Large        | 1M      | 20K                         | 300 MB       | 200-500 MB  |
| Very Large   | 10M+    | 15K                         | 400 MB       | 2-5 GB      |

*Performance varies based on geometry complexity and attribute count.*
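The processing rates in the table translate directly into rough wall-clock estimates (records ÷ rate). The snippet below simply does that arithmetic for each row; it ignores I/O variance and geometry complexity, and `estimated_seconds` is a hypothetical helper. At 15K records/s, 10M records take roughly 11 minutes.

```python
# Approximate record counts and processing rates (records/s) from the benchmark table.
RECORDS = {"Small": 10_000, "Medium": 100_000, "Large": 1_000_000, "Very Large": 10_000_000}
RATES = {"Small": 50_000, "Medium": 30_000, "Large": 20_000, "Very Large": 15_000}


def estimated_seconds(records: int, rate: float) -> float:
    """Wall-clock estimate: records divided by records per second."""
    return records / rate


for size, rate in RATES.items():
    secs = estimated_seconds(RECORDS[size], rate)
    print(f"{size:<10} ~{secs:,.1f} s")  # 0.2, 3.3, 50.0, 666.7 seconds
```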

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite: `pytest`
6. Submit a pull request

### Development Setup

```bash
# Clone your fork of the repository
git clone https://github.com/yourusername/esri-converter.git
cd esri-converter

# Install in development mode with all optional dependencies (the "all" extra includes "dev")
pip install -e ".[all]"

# Run tests
pytest

# Format, lint, and type-check
black esri_converter/
ruff check esri_converter/
mypy esri_converter/
```

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built with modern Python libraries: [Polars](https://pola.rs/), [Rich](https://rich.readthedocs.io/), [Fiona](https://fiona.readthedocs.io/)
- Inspired by the need for efficient geospatial data processing
- Designed for the open source geospatial community

## 📈 Roadmap

- [ ] Support for additional ESRI formats (Shapefile, MDB, etc.)
- [ ] Multiple output formats (GeoJSON, GeoPackage, CSV)
- [ ] Parallel processing with multiprocessing
- [ ] Cloud storage integration (S3, Azure, GCS)
- [ ] Docker containerization
- [ ] Web API service
- [ ] GUI application

---

**Made with ❤️ for the geospatial community** 