Metadata-Version: 2.4
Name: walkingpandas
Version: 0.1.1
Summary: Python tools for walk and path data on networks
License-File: LICENSE
Requires-Python: >=3.13
Requires-Dist: duckdb>=0.10.0
Requires-Dist: geopandas>=0.14.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: shapely>=2.0.0
Requires-Dist: tqdm>=4.67.3
Provides-Extra: dev
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs>=1.6; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Provides-Extra: mapmatch
Requires-Dist: osmium>=4.3.0; extra == 'mapmatch'
Requires-Dist: osmnx>=2.0; extra == 'mapmatch'
Requires-Dist: pyvalhalla>=3.6; extra == 'mapmatch'
Requires-Dist: requests>=2.28; extra == 'mapmatch'
Provides-Extra: networkx
Requires-Dist: networkx>=3.0; extra == 'networkx'
Provides-Extra: osmnx
Requires-Dist: osmnx>=2.0; extra == 'osmnx'
Description-Content-Type: text/markdown

# walkingpandas

A Python library for massive-scale walk and path analysis on network topologies.

**walkingpandas** bridges static graph theory (network topologies) with massive-scale sequential data: any process that can be modelled as a walk on a graph. DuckDB is the analytical engine and Parquet the storage layer. The API feels like pandas, but computation runs out of core, so datasets do not have to fit in memory.

**Use cases:** pedestrian and vehicle mobility, clickstream / user-journey analysis, supply-chain and logistics flows, trade and transaction routing, biological pathway traversal, communication network traces — anything where entities move step-by-step through a graph.

## Features

- **Lazy Evaluation**: Build up queries without executing until `.compute()` is called
- **Out-of-Core Processing**: Handle datasets that don't fit in memory using DuckDB and Parquet
- **Integer-Native IDs**: All node/edge IDs are stored as BIGINT for fast joins and compact storage; string IDs are auto-translated transparently
- **Map Matching** (optional): Turn raw GPS traces into walk data via [Valhalla](https://valhalla.github.io/valhalla/) (`pip install "walkingpandas[mapmatch]"`)
- **Spatial Filtering**: Reverse spatial filtering for fast geographic queries
- **Temporal Queries**: Handle time-less, single-timestamp, and dwell-time data scenarios
- **Graph Validation**: Filter simple paths, cycles, and complex walks
- **Pandas-like API**: Familiar interface for data scientists
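
The graph-validation terms in the list above follow common graph-theory usage. As a minimal plain-Python sketch of the terminology only (not how walkingpandas implements the check; the `classify_walk` helper and its labels are hypothetical), a walk given as an ordered list of node IDs could be classified like this:

```python
def classify_walk(nodes):
    """Classify a walk given as an ordered sequence of node IDs.

    Hypothetical helper for illustration; not part of the walkingpandas API.
    """
    if len(nodes) < 2:
        return "trivial"
    # A simple path never visits a node twice.
    if len(set(nodes)) == len(nodes):
        return "simple_path"
    # A cycle returns to its start and repeats no other node.
    interior = nodes[:-1]
    if nodes[0] == nodes[-1] and len(set(interior)) == len(interior):
        return "cycle"
    # Anything else revisits at least one node along the way.
    return "complex_walk"


print(classify_walk([1, 2, 3, 4]))     # simple_path
print(classify_walk([1, 2, 3, 1]))     # cycle
print(classify_walk([1, 2, 3, 2, 4]))  # complex_walk
```

The library may draw these boundaries differently (for example, around self-loops or single-node walks), so treat the labels as a reading aid rather than a specification.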

## Installation

```bash
pip install walkingpandas
```

## Quick Start

```python
import walkingpandas as wp

# 1. Network (static topology)
network = wp.Network(
    nodes="data/network/nodes.parquet",
    edges="data/network/edges.parquet"
)

# 2. WalkFrame (walks + network)
walks = wp.WalkFrame.from_parquet("data/walks/*.parquet", network=network)

# 3. Lazy query chain; nothing runs until .compute()
result = (
    walks
    .only_simple_paths()
    .passing_through([42])
    .filter(time_range=('08:00', '09:00'))
    .edge_frequencies()
    .compute()
)
```

Or in a single call when you have nodes, edges, and walks as separate Parquet paths:

```python
walks = wp.read_dataset(
    nodes="data/network/nodes.parquet",
    edges="data/network/edges.parquet",
    walks="data/walks/*.parquet"
)
traffic = walks.edge_frequencies().compute()
```
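
For intuition about what the Quick Start queries return, the sketch below mimics the `passing_through` and `edge_frequencies` steps in plain Python on a few in-memory walks. It rests on assumptions that the README does not confirm: each walk is an ordered list of integer node IDs, an edge is a directed pair of consecutive nodes, and `passing_through` keeps walks that visit at least one of the listed nodes. The helper functions are hypothetical stand-ins that borrow the method names; the library delegates the real work to DuckDB.

```python
from collections import Counter

# Toy walks: each is an ordered list of integer node IDs (assumed layout).
walks = [
    [1, 42, 7],
    [42, 7, 9],
    [3, 4, 5],
]


def passing_through(walks, nodes):
    """Keep walks that visit at least one of the given nodes (assumed semantics)."""
    wanted = set(nodes)
    return [w for w in walks if wanted.intersection(w)]


def edge_frequencies(walks):
    """Count how often each directed edge (u, v) is traversed across all walks."""
    counts = Counter()
    for w in walks:
        # Pair each node with its successor to get the traversed edges.
        counts.update(zip(w, w[1:]))
    return counts


freq = edge_frequencies(passing_through(walks, [42]))
for edge, n in freq.most_common():
    print(edge, n)
```

Here the third walk is dropped because it never visits node 42, and the edge `(42, 7)` is counted twice because two of the remaining walks traverse it.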

## Documentation

Full documentation is available in the `docs/` directory of the repository. To build and serve it locally from a checkout:

```bash
pip install "walkingpandas[docs]"
mkdocs serve
```

Then open [http://127.0.0.1:8000](http://127.0.0.1:8000) in your browser.

## License

MIT License - see [LICENSE](LICENSE) for details.

## Contributing

Contributions welcome! Please open an issue or submit a pull request.

## Citation

If you use walkingpandas in your research, please cite:

```bibtex
@software{walkingpandas,
  title = {walkingpandas: Massive-Scale Walk and Path Analysis on Network Topologies},
  author = {Jürgen Hackl},
  year = {2026},
  url = {https://github.com/cisgroup/walkingpandas}
}
```
