Metadata-Version: 2.4
Name: urban-worm
Version: 0.2.0
Summary: Workflow of reproducible multimodal inference for urban environment evaluation.
Author-email: Xiaohao Yang <xiaohaoy111@gmail.com>
License: MIT License
        
        Copyright (c) 2025 Xiaohao Yang
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/billbillbilly/urbanworm
Project-URL: Documentation, https://billbillbilly.github.io/urbanworm/
Project-URL: Repository, https://github.com/billbillbilly/urbanworm
Project-URL: Issues, https://github.com/billbillbilly/urbanworm/issues
Project-URL: Changelog, https://github.com/billbillbilly/urbanworm/blob/main/CHANGELOG.md
Keywords: urbanworm,urban-worm,street-view,mapillary,flickr,freesound,mllm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: huggingface_hub>=0.20
Requires-Dist: pandas>=2.0
Requires-Dist: geopandas>=0.14
Requires-Dist: numpy>=1.24
Requires-Dist: pydantic>=2.0
Requires-Dist: Pillow>=10.0
Requires-Dist: opencv-python>=4.8
Requires-Dist: matplotlib>=3.7
Requires-Dist: requests>=2.31
Requires-Dist: tqdm>=4.66
Requires-Dist: pyproj>=3.6
Requires-Dist: shapely>=2.0
Requires-Dist: mercantile>=1.2
Provides-Extra: ollama
Requires-Dist: ollama>=0.3; extra == "ollama"
Provides-Extra: audio
Requires-Dist: pydub>=0.25; extra == "audio"
Provides-Extra: llamacpp
Provides-Extra: unsloth
Requires-Dist: unsloth>=2025.1.1; extra == "unsloth"
Requires-Dist: transformers>=4.45; extra == "unsloth"
Requires-Dist: accelerate>=0.34; extra == "unsloth"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: pre-commit>=3.7; extra == "dev"
Provides-Extra: all
Requires-Dist: ollama>=0.3; extra == "all"
Requires-Dist: pydub>=0.25; extra == "all"
Dynamic: license-file

[![image](https://img.shields.io/pypi/v/urban-worm.svg)](https://pypi.python.org/pypi/urban-worm)
[![PyPI Downloads](https://static.pepy.tech/badge/urban-worm)](https://pepy.tech/project/urban-worm)
[![PyPI Downloads](https://static.pepy.tech/badge/urban-worm/week)](https://pepy.tech/projects/urban-worm)
[![Docs](https://img.shields.io/badge/docs-latest-blue)](https://billbillbilly.github.io/urbanworm/)
[![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/billbillbilly/urbanworm/blob/main/docs/example_colab.ipynb)

<picture>
  <img alt="logo" src="docs/images/urabn_worm_logo.png" width="100%">
</picture>

# Urban-WORM

## Introduction
Urban-**WORM** (**W**orkflow **O**f **R**eproducible **M**ultimodal Inference) is a user-friendly high-level interface that 
is designed for adding rich and meaningful captions for crowdsourced data with geotags using multimodal models. 
Urban-WORM can support the batched analysis of images and sounds for investigating urban environments at scales. 
The investigation may cover topics about building conditions, street appearance, people's activities, etc.

- Free software: MIT license
- Website/Documentation: [https://billbillbilly.github.io/urbanworm/](https://billbillbilly.github.io/urbanworm/)

<picture>
  <img alt="workflow" src="docs/images/urabn_worm_diagram.png" width="90%">
</picture>

## Features
- Collect geotagged data (Mapillary street views, Flickr photos, and Freesound audios) via APIs 
within the proximity of building footprints (or other POIs)
- Calibrate the orientation of the panorama street views to look at given locations
- Filter out personal photo using face detection
- Divide sound recording to multiple clips with given duration
- Support (batched) multiple data input with multimodal models

## Installation

### 1 install the package
The package `urban-worm` can be installed with `pip`:
```sh
pip install urban-worm
```

For optional features, install the appropriate extras:
```sh
pip install "urban-worm[ollama]"        # adds the Ollama Python client
pip install "urban-worm[audio]"         # adds pydub for audio slicing
pip install "urban-worm[all]"           # everything above
pip install "urban-worm[dev]"           # dev tools (pytest, ruff, build)
```

### 2 Inference with llama.cpp
To run more pre-quantized models with vision capabilities, please install pre-built version of llama.cpp:
``` sh
# Windows
winget install llama.cpp

# Mac and Linux
brew install llama.cpp
```
More information about the installation 
[here](https://github.com/ggml-org/llama.cpp/blob/master/docs/install.md)

More GGUF models can be found at the Hugging Face pages 
[here](https://huggingface.co/collections/ggml-org/multimodal-ggufs-68244e01ff1f39e5bebeeedc) and [here](https://huggingface.co/models?pipeline_tag=image-text-to-text&sort=trending&search=gguf)

### 3 Inference with Unsloth (fast local VLM)

For faster local VLM inference on a CUDA GPU (typically 2–4× faster than
Ollama, with optional GPU batching), install the `unsloth` extra:

```sh
pip install "urban-worm[unsloth]"
```

Then:

```python
from urbanworm import InferenceUnsloth

infer = InferenceUnsloth(
    llm="unsloth/Qwen3-VL-3B-Instruct",   # or Qwen3-VL-8B, gemma-3-4b-it, etc.
    load_in_4bit=True,
    schema={"answer": (bool, ...), "explanation": (str, ...)},
    images=["docs/data/img_1.jpg", "docs/data/img_2.jpg"],
)
df = infer.batch_inference(
    system="You are analyzing urban scenes.",
    prompt="Is there a tree?",
    batch_size=4,            # batch >1 trades VRAM for throughput
    max_new_tokens=256,
)
```

Tested small VLMs: `unsloth/Qwen3-VL-3B-Instruct`, `unsloth/Qwen3-VL-8B-Instruct`,
`unsloth/gemma-3-4b-it`, `unsloth/Qwen2-VL-2B-Instruct`,
`unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit`. Any vision checkpoint that
`unsloth.FastVisionModel` can load should work.

Audio inference is not supported via Unsloth.

### 4 Inference with Ollama client

Please make sure [Ollama](https://ollama.com/) is installed before using urban-worm if you plan to rely on Ollama

For Linux, users can also install ollama by running in the terminal:
```sh
curl -fsSL https://ollama.com/install.sh | sh
```
For MacOS, users can also install ollama using `brew`:
```sh
brew install ollama
```

To install `brew`, run in the terminal:
```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Windows users should directly install the [Ollama client](https://ollama.com/)

To install the development version from this repo:
``` sh
pip install -e git+https://github.com/billbillbilly/urbanworm.git#egg=urban-worm
```

## Usage

```python
from urbanworm.inference.llama import InferenceOllama

data = InferenceOllama(image = 'docs/data/img_1.jpg')
system = '''
    Your answer should be based only on your observation. 
    The format of your response must include answer (yes/True or no/False), explanation (within 50 words)
'''
prompt = '''
    Is there a tree?
'''

data.llm = "hf.co/ggml-org/InternVL3-8B-Instruct-GGUF:Q8_0"
data.schema = {
    "answer": (bool, ...),
    "explanation": (str, ...)
}
data.one_inference(system=system, prompt=prompt)
```
More examples can be found [here](docs/1_basic_inference.ipynb).

## To do
v0.1.x:
- [x] A module for collecting social media data (Flickr and Freesound)
- [x] A method for inferencing sound recordings

v0.2.x:
- [ ] A web UI providing interactive operation and data visualization

## Legal Notice
This repository and its content are provided for educational and research purposes only. By using the information and 
code provided, users acknowledge that they are using the APIs and models at their own risk and agree to comply with any 
applicable laws and regulations. 

## Acknowledgements
The package is heavily built on llama.cpp and Ollama. Credit goes to the developers of these projects.
- [llama.cpp](https://github.com/ggml-org/llama.cpp/tree/master)
- [ollama](https://github.com/ollama/ollama)
- [ollama-python](https://github.com/ollama/ollama-python)
- [unsloth]()


The functionality about sourcing and processing GIS data and image processing is built on the following open projects. 
Credit goes to the developers of these projects.
- [GlobalMLBuildingFootprints](https://github.com/microsoft/GlobalMLBuildingFootprints)
- [Equirec2Perspec](https://github.com/fuenwang/Equirec2Perspec)
- [Mapillary API](https://www.mapillary.com/developer/api-documentation)
- [Flickr API](https://www.flickr.com/services/api/)
- [Freesound API](https://freesound.org/apiv2/apply)

The development of this package is supported and inspired by the city of Detroit.
