Metadata-Version: 2.4
Name: soildb
Version: 0.7.2
Summary: A Python client for accessing USDA Soil Data Access (SDA) web service
Project-URL: Homepage, https://github.com/brownag/py-soildb
Project-URL: Repository, https://github.com/brownag/py-soildb
Project-URL: Documentation, https://py-soildb.readthedocs.io
Project-URL: Bug Tracker, https://github.com/brownag/py-soildb/issues
Author-email: Andrew Brown <brown.andrewg@gmail.com>
License: MIT
License-File: LICENSE
Keywords: agriculture,gis,sda,soil,ssurgo,usda
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.9
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: all
Requires-Dist: bandit>=1.7.0; extra == 'all'
Requires-Dist: build>=1.0.0; extra == 'all'
Requires-Dist: editables>=0.3; extra == 'all'
Requires-Dist: geopandas>=0.13.0; extra == 'all'
Requires-Dist: hatchling>=1.0.0; extra == 'all'
Requires-Dist: ipython>=8.0.0; extra == 'all'
Requires-Dist: jupyter>=1.0.0; extra == 'all'
Requires-Dist: mypy>=1.0.0; extra == 'all'
Requires-Dist: nest-asyncio>=1.5.0; extra == 'all'
Requires-Dist: pandas>=1.5.0; extra == 'all'
Requires-Dist: polars>=0.18.0; extra == 'all'
Requires-Dist: pre-commit>=3.0.0; extra == 'all'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0.0; extra == 'all'
Requires-Dist: pytest-httpx>=0.21.0; extra == 'all'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Requires-Dist: quarto>=0.1.0; extra == 'all'
Requires-Dist: quartodoc>=0.7.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: safety>=2.3.0; extra == 'all'
Requires-Dist: shapely>=2.0.0; extra == 'all'
Requires-Dist: soilprofilecollection>=0.2.0; extra == 'all'
Provides-Extra: dataframes
Requires-Dist: pandas>=1.5.0; extra == 'dataframes'
Requires-Dist: polars>=0.18.0; extra == 'dataframes'
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: editables>=0.3; extra == 'dev'
Requires-Dist: hatchling>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest-httpx>=0.21.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: quarto>=0.1.0; extra == 'dev'
Requires-Dist: quartodoc>=0.7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: quarto>=0.1.0; extra == 'docs'
Requires-Dist: quartodoc>=0.7.0; extra == 'docs'
Provides-Extra: jupyter
Requires-Dist: ipython>=8.0.0; extra == 'jupyter'
Requires-Dist: jupyter>=1.0.0; extra == 'jupyter'
Requires-Dist: nest-asyncio>=1.5.0; extra == 'jupyter'
Provides-Extra: security
Requires-Dist: bandit>=1.7.0; extra == 'security'
Requires-Dist: safety>=2.3.0; extra == 'security'
Provides-Extra: soil
Requires-Dist: soilprofilecollection>=0.2.0; extra == 'soil'
Provides-Extra: spatial
Requires-Dist: geopandas>=0.13.0; extra == 'spatial'
Requires-Dist: shapely>=2.0.0; extra == 'spatial'
Provides-Extra: test
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'test'
Requires-Dist: pytest-cov>=4.0.0; extra == 'test'
Requires-Dist: pytest-httpx>=0.21.0; extra == 'test'
Requires-Dist: pytest-timeout>=2.1.0; extra == 'test'
Requires-Dist: pytest>=7.0; extra == 'test'
Description-Content-Type: text/markdown

# soildb


[![PyPI
version](https://badge.fury.io/py/soildb.svg)](https://pypi.org/project/soildb/)
[![License:
MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

Python client for the USDA-NRCS Soil Data Access (SDA) web service and other
National Cooperative Soil Survey data sources.

## Overview

`soildb` provides Python access to:

- **Soil Survey Data**: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO
- **Laboratory Data**: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data
- **Bulk Downloads**: Complete SSURGO/STATSGO datasets from Web Soil Survey
- **Multiple Backends**: Query data from SDA web service, local SQLite snapshots, or GeoPackage files

Query soil survey data via web service or local database, export to pandas/polars DataFrames,
and handle spatial queries.

## Installation

``` bash
pip install soildb
```

For spatial functionality:

``` bash
pip install soildb[spatial]
```

For all optional features support:

``` bash
pip install soildb[all]
```

## Features

### Soil Survey Data (SDA)

- Query SSURGO/STATSGO data from NRCS Soil Data Access web service
- Build custom SQL queries with fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination and chunking
- Export to pandas and polars DataFrames

### Laboratory Characterization Data

- Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data
- Query via SDA web service or local SQLite snapshot databases
- Full horizon-level data with lab analyses
- Structured object models for nested pedon data
- Support for flexible column selection

### Web Soil Survey Downloads

- Download complete SSURGO datasets as ZIP files
- Download STATSGO (general soil map) data
- Concurrent downloads with progress tracking
- Automatic file extraction and organization
- State-wide and custom area selections

### Multi-Backend Support

- Query from SDA web service (live data)
- Query from local SQLite snapshots (offline analysis)
- Support for GeoPackage files with spatial features
- Unified interface across all backends
- Async I/O for high performance and concurrency

## Quick Start

### Query Builder

Build and execute custom SQL queries with the fluent interface:

``` python
from soildb import Query

query = (Query()
        .select("mukey", "muname", "musym")
        .from_("mapunit")
        .inner_join("legend", "mapunit.lkey = legend.lkey")
        .where("areasymbol = 'IA109'")
        .limit(5))

# Inspect the generated SQL
print(query.to_sql())

# Execute and get results
import asyncio
from soildb import SDAClient

async def main():
    result = await SDAClient().execute(query)
    return result.to_pandas()

df = asyncio.run(main())
print(df.head())
```

    SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
        mukey                                             muname  musym
    0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
    1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
    2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
    3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
    4  408348          Harpster silt loam, 0 to 2 percent slopes   1595

## Async vs Synchronous Usage

All soildb functions have both async and synchronous versions. For most use cases, the synchronous `.sync()` version is simpler and easier to use.

### Synchronous Usage

For simple scripts and interactive use, soildb provides synchronous versions of all async functions:

``` python
from soildb import get_mapunit_by_areasymbol

# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

    Found 80 map units

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        vertical-align: right;
    }
</style>

|  | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|----|----|----|----|----|----|----|----|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |

</div>

The `.sync` methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:

``` python
from soildb import SDAClient, get_mapunit_by_areasymbol

client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()
```

### Convenience Functions

soildb provides high-level functions for common tasks:

``` python
from soildb import get_mapunit_by_areasymbol

mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
```

    Found 80 map units

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | mukey | musym | muname | mukind | muacres | areasymbol | areaname |
|----|----|----|----|----|----|----|----|
| 0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
| 1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
| 2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
| 3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
| 4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |

</div>

If you have suggestions for new convenience functions please file a
[feature request on
GitHub](https://github.com/brownag/py-soildb/issues/new).

### Spatial Queries

Query soil data by location with points, bounding boxes, or polygons:

``` python
from soildb import spatial_query

# Point query
response = spatial_query.sync(
    geometry="POINT(-93.6 42.0)",
    table="mupolygon"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
```

    Point query found 1 results

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|  | mukey | areasymbol | musym | nationalmusym | muname | mukind |
|----|----|----|----|----|----|----|
| 0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |

</div>

### Bulk Data Fetching

Retrieve large datasets efficiently with automatic pagination and chunking:

``` python
from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys for survey areas
areas = ["IA109", "IA113", "IA117"]
all_mukeys = get_mukey_by_areasymbol.sync(areas)

print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

# Fetch components in chunks automatically
response = fetch_by_keys.sync(
    all_mukeys, 
    "component", 
    key_column="mukey", 
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
```

    Found 410 mukeys across 3 areas
    Fetching 410 keys in 5 chunks of 100
    Fetched 1067 component records

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }
&#10;    .dataframe tbody tr th {
        vertical-align: top;
    }
&#10;    .dataframe thead th {
        text-align: right;
    }
</style>

|     | mukey  | cokey    | compname | localphase | comppct_r |
|-----|--------|----------|----------|------------|-----------|
| 0   | 408333 | 25562547 | Kingston | \<NA\>     | 2         |
| 1   | 408333 | 25562548 | Okoboji  | \<NA\>     | 5         |
| 2   | 408333 | 25562549 | Spicer   | \<NA\>     | 90        |
| 3   | 408333 | 25562550 | Madelia  | \<NA\>     | 3         |
| 4   | 408334 | 25562837 | Okoboji  | \<NA\>     | 5         |
| 5   | 408334 | 25562838 | Glencoe  | \<NA\>     | 3         |
| 6   | 408334 | 25562839 | Canisteo | \<NA\>     | 2         |
| 7   | 408334 | 25562840 | Webster  | \<NA\>     | 85        |
| 8   | 408334 | 25562841 | Nicollet | \<NA\>     | 5         |
| 9   | 408335 | 25562135 | Biscay   | \<NA\>     | 1         |

</div>

The `component` table has a hierarchical relationship:

- mukey (map unit key) is the parent
- cokey (component key) is the child

So when fetching components, you typically want to filter by mukey to
get all components for specific map units.

Use the `fetch_by_keys()` function with the `"mukey"` as the
`key_column` to achieve this with automatic pagination over chunks with
`100` rows each (or specify your own `chunk_size`).

### Bulk Downloads (Web Soil Survey)

Download complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:

``` python
from soildb import download_wss

# Download specific survey areas
paths = download_wss.sync(
    areasymbols=["IA109", "IA113"],
    dest_dir="./ssurgo_data",
    extract=True
)
print(f"Downloaded {len(paths)} survey areas")

# Download all survey areas for a state
paths = download_wss.sync(
    where_clause="areasymbol LIKE 'IA%'",
    dest_dir="./iowa_ssurgo",
    extract=True,
    remove_zip=True  # Clean up ZIP files after extraction
)

# Download STATSGO (general soil map) data
paths = download_wss.sync(
    areasymbols=["IA"],
    db="STATSGO",
    dest_dir="./iowa_statsgo",
    extract=True
)
```

Each extracted survey area directory contains:

- `tabular/` - Pipe-delimited TXT files with soil data tables
- `spatial/` - ESRI shapefiles with map unit polygons and boundaries

**Use Cases:**

- **SDA**: Live queries, filtered data, programmatic access to current data
- **WSS Downloads**: Complete offline datasets, bulk data for analysis, static snapshots updated annually

## Async Usage

For performance-critical applications, use async functions directly with concurrent requests:

``` python
import asyncio
from soildb import fetch_by_keys, get_mukey_by_areasymbol

async def concurrent_example():
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    all_mukeys = await get_mukey_by_areasymbol(areas)
    
    # Fetch components concurrently with automatic pagination
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "comppct_r"]
    )
    return response.to_pandas()

# Run async function
df = asyncio.run(concurrent_example())
```

For more async patterns, see the [Async Programming Guide](docs/async.qmd).

# Examples

See the [`examples/` directory](examples/) and [documentation](docs/)
for detailed usage patterns.

## License

This project is licensed under the MIT License. See the
[LICENSE](LICENSE) file for details.
