Metadata-Version: 2.4
Name: safesignal-geo
Version: 0.2.0
Summary: Nigerian-language geo-tagging and gazetteer resolution (Pidgin, Hausa, Igbo, Yoruba, English).
Project-URL: Homepage, https://github.com/mr-tanta/safesignal-geo
Project-URL: Repository, https://github.com/mr-tanta/safesignal-geo
Project-URL: Issues, https://github.com/mr-tanta/safesignal-geo/issues
Author-email: Abraham Esandayinze Tanta <sir.tanta@gmail.com>
License: Apache-2.0
License-File: LICENSE
Keywords: african-languages,gazetteer,geo,geocoding,low-resource,ner,nigeria,nlp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Requires-Dist: rank-bm25>=0.2.2
Provides-Extra: tagger
Requires-Dist: huggingface-hub>=0.24; extra == 'tagger'
Requires-Dist: peft>=0.11; extra == 'tagger'
Requires-Dist: tokenizers>=0.15; extra == 'tagger'
Requires-Dist: torch>=2.0; extra == 'tagger'
Requires-Dist: transformers>=4.40; extra == 'tagger'
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == 'test'
Description-Content-Type: text/markdown

<div align="center">

# SafeSignal-Geo

**An open-source geolocation library for informal Nigerian place names.**

*"Oshodi under-bridge"* · *"behind Shoprite Surulere"* · *"Mile 2 last bus stop"* · *"Berger junction"*

[![License: Apache 2.0](https://img.shields.io/badge/Code-Apache%202.0-blue.svg)](LICENSE)
[![Data: CC-BY 4.0](https://img.shields.io/badge/Data-CC--BY%204.0-lightgrey.svg)](LICENSE-DATA)
[![Weights: OpenRAIL-M](https://img.shields.io/badge/Weights-OpenRAIL--M-orange.svg)](LICENSE-WEIGHTS)
[![Status: Pre-alpha](https://img.shields.io/badge/Status-Pre--alpha-red.svg)](#status)

</div>

---

## The problem

Across Nigerian apps — preventive safety, ride-hailing, last-mile logistics, journalism, humanitarian response — **the place names that matter most are not in any gazetteer**.

```text
Input:   "trouble for Berger this morning, dem don block road"
Want:    [{name: "Berger Junction", lat: 6.5841, lng: 3.3790, confidence: 0.91}]
```

Existing geoparsers ([Mordecai 3](https://github.com/ahalterman/mordecai), [Edinburgh Geoparser](https://www.ltg.ed.ac.uk/software/geoparser/), [Nominatim](https://nominatim.org)) collapse on Nigerian vernacular geography. They rely on [GeoNames](https://www.geonames.org), which is sparse below the LGA level and has zero entries for *under-bridges*, *junctions*, *last bus stops*, *behind-landmark* references, or named markets.

**SafeSignal-Geo is the first open library to treat informal Nigerian geolocation as the primary problem, not an edge case.**

## What you get

- **`safesignal-geo`**: a Python package, installed with `pip install safesignal-geo`. Ships today with a 42K-row Nigerian gazetteer, a BM25 resolver, and a context-aware reranker (Top-1 0.93, P95 65 ms on CPU).
- **`safesignal-geo-base`**: a ~270M-parameter AfroXLMR + LoRA span tagger, trained on the v0.2 corpus (partial F1 0.749 on the v0.2 dev gold set). Weights are slated for publication on Hugging Face with v0.2.0.
- **`safesignal-geo-gazetteer`**: the bundled Nigerian gazetteer, CC-BY 4.0. v0.2 is a 42,315-row national extract with an OSM-mined informal-place index (junctions, bus stops, motor parks, markets) for Lagos / FCT / Rivers / Kano.
- **A public eval leaderboard**: planned for v1.0; submit your own model and see how it ranks.

## Quickstart

```bash
pip install safesignal-geo
```

```python
from safesignal_geo import Geo

geo = Geo()  # bundled v0.2 Nigerian gazetteer (~42k records)

for hit in geo.resolve("Traffic don jam for Ojuelegba this morning."):
    print(hit.canonical_name, hit.admin_state, hit.admin_lga, hit.lat, hit.lng)

# Ojuelegba Lagos Surulere 6.5093742 3.3665407
```

Or from the command line:

```bash
safesignal-geo resolve "incident at Computer Village, Lagos"
safesignal-geo resolve --span "Bauchi" --context "today in Bauchi state"
```

The library bundles the gazetteer, BM25 resolver, and a context-aware
reranker that uses Nigerian-state priors and co-mention signals. The
fine-tuned span tagger is an optional `[tagger]` extra; weights publish
to Hugging Face in v0.2.0.

## Status

Pre-alpha. Project started **May 2026**. Target v1.0: **August 2026**.

| Milestone | Target | Status |
|---|---|---|
| Gazetteer schema + 500 Lagos seed | May 2026 | ✅ Done (42,315 rows in v0.2 — Lagos/FCT/Rivers/Kano informal-place index growing) |
| Annotation guidelines v0.1 | May 2026 | ✅ Done |
| Span tagger v0.2 (LoRA fine-tune) | June 2026 | ✅ Trained (partial F1 0.749); weights publishing in v0.2.0 |
| Resolver + reranker | July 2026 | ✅ Done (Top-1 0.93, P95 65 ms; heuristic reranker — learned cross-encoder is a v1.1 candidate) |
| Pip-installable package | July 2026 | ✅ Done (v0.1.0) |
| Public Gradio demo | July 2026 | 🚧 In progress (rewrite to call the package + HF Space deploy) |
| **v1.0 public release** | **August 2026** | ⏳ Planned |

### v0.2 numbers vs. spec

| Metric                              | Target | v0.2 actual | Status |
|-------------------------------------|-------:|------------:|--------|
| Span F1 (partial)                   | ≥ 0.82 | 0.749 | below target (−0.071) |
| Top-1 resolution accuracy           | ≥ 0.70 | **0.931** | pass (+0.231) |
| Top-3 recall                        | ≥ 0.88 | **0.970** | pass (+0.090) |
| Latency P95 (CPU)                   | ≤ 200 ms | **65.30 ms** | pass (3.1× headroom) |

Reproducibility commands and full slice breakdowns:
[`docs/benchmarks/v0.2.md`](docs/benchmarks/v0.2.md). Design doc:
[`docs/design-doc.md`](docs/design-doc.md).
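
For readers unfamiliar with these metrics, the sketch below shows one common way to compute top-k accuracy and a nearest-rank P95 latency. It assumes one gold gazetteer ID per mention, so top-3 accuracy and top-3 recall coincide. The function names and toy data are hypothetical, and the project's actual evaluation procedure is the one described in the benchmark document.

```python
import math


def top_k_accuracy(gold_ids, ranked_predictions, k):
    """Fraction of mentions whose gold ID is among the first k predictions."""
    hits = sum(
        1
        for gold, ranked in zip(gold_ids, ranked_predictions)
        if gold in ranked[:k]
    )
    return hits / len(gold_ids)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Toy data for illustration only.
gold = ["a", "b", "c", "d"]
preds = [["a", "x", "y"], ["x", "b", "y"], ["x", "y", "z"], ["d", "x", "y"]]
latencies_ms = [12.0, 15.0, 20.0, 18.0, 65.0]

print(top_k_accuracy(gold, preds, 1))             # 0.5
print(top_k_accuracy(gold, preds, 3))             # 0.75
print(percentile_nearest_rank(latencies_ms, 95))  # 65.0
```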

## How to contribute

You don't need to be an ML engineer. The single most valuable thing you can do is **add 10 places from your neighborhood** to the gazetteer. That takes 30 minutes and meaningfully moves v1.0 forward.

1. **Add a place** — open an issue with the [`Add a place`](.github/ISSUE_TEMPLATE/add-a-place.md) template, or submit a PR to `gazetteer/contributions/`.
2. **Annotate text** — once the [Label Studio](https://labelstud.io) instance is live (Month 1), pick up annotation tasks.
3. **Flag an error** — wrong coordinates? duplicate? missing alias? File an issue with [`Flag an error`](.github/ISSUE_TEMPLATE/flag-an-error.md).
4. **Code & model contributions** — see [CONTRIBUTING.md](CONTRIBUTING.md).

All contributions are credited in the dataset's per-row `source` field.
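
Before submitting a place, a quick sanity check can catch the most common mistake, swapped latitude and longitude. The sketch below is a hypothetical helper, not part of the package. The field names are borrowed from the Quickstart output plus the `source` field mentioned above, the bounding box is a rough approximation of Nigeria's extent, and the real contribution format is whatever the issue template and `CONTRIBUTING.md` specify.

```python
# Rough bounding box for Nigeria in degrees; approximate, for sanity checks only.
NIGERIA_BBOX = {"lat": (4.0, 14.0), "lng": (2.5, 15.0)}

# Assumed field names; the actual contribution schema may differ.
REQUIRED = ("canonical_name", "admin_state", "admin_lga", "lat", "lng", "source")


def check_row(row):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing field: {f}" for f in REQUIRED if row.get(f) in (None, "")]
    for axis in ("lat", "lng"):
        value = row.get(axis)
        if isinstance(value, (int, float)):
            low, high = NIGERIA_BBOX[axis]
            if not low <= value <= high:
                problems.append(f"{axis}={value} is outside {low}..{high}")
    return problems


good = {
    "canonical_name": "Ojuelegba",
    "admin_state": "Lagos",
    "admin_lga": "Surulere",
    "lat": 6.5093742,
    "lng": 3.3665407,
    "source": "community",  # hypothetical value
}
swapped = dict(good, lat=good["lng"], lng=good["lat"])

print(check_row(good))     # []
print(check_row(swapped))  # ['lat=3.3665407 is outside 4.0..14.0']
```

A bounding-box check is coarse: it flags swapped coordinates for southern locations such as Lagos but will not catch a point placed in the wrong town.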

## Coverage

**v0.2 (bundled):** Nigeria-wide gazetteer (42,315 rows). Informal-place counts in the four priority areas:

| city | junctions | bus_stops | motor_parks | markets |
|---|---:|---:|---:|---:|
| Lagos | 100 | 114 | 23 | 73 |
| FCT (Abuja) | 1 | 16 | 4 | 52 |
| Rivers (Port Harcourt) | 1 | 19 | 9 | 18 |
| Kano | 0 | 2 | 1 | 13 |

**v1.0 (August 2026):** grow informal-place rows via Nominatim retries and community contributions; OSM PBF mining was largely exhausted in v0.2.

**v1.1+:** Ibadan, Benin City, Onitsha, Aba, Enugu, Kaduna. Native Yoruba/Hausa/Igbo support in collaboration with [Masakhane](https://www.masakhane.io). A learned cross-encoder reranker.

## Scope: what this is *not*

- Not a routing engine.
- Not reverse geocoding (lat/lng → name).
- Not a safety / incident / crime model. SafeSignal-Geo is **only geolocation**. Domain logic lives downstream.
- Not trained on [ACLED](https://acleddata.com/eula) data (their EULA prohibits ML training).
- Not used for security-force tracking. The library does not include surveillance categories.

## License stack

| Artifact | License |
|---|---|
| Code | [Apache 2.0](LICENSE) |
| Data (gazetteer + spans) | [CC-BY 4.0](LICENSE-DATA) |
| Model weights | [OpenRAIL-M](LICENSE-WEIGHTS) |
| Documentation | CC-BY 4.0 |

OpenRAIL-M restricts use of the weights for surveillance, military, and discriminatory applications. We chose this deliberately given Nigeria's [surveillance context](https://www.amnesty.org/en/latest/news/2024/04/nigeria-authorities-must-stop-the-misuse-of-cybercrime-act-to-target-journalists-and-activists/).

## Acknowledgements

SafeSignal-Geo is incubated by [Jyv Tech LLC](https://chipon.io) and built around [Chipon](https://chipon.io)'s preventive-safety platform as the anchor user. We build on top of:

- [AfroXLMR](https://huggingface.co/Davlan/afro-xlmr-base) — Adelani et al., 2022
- [MasakhaNER](https://github.com/masakhane-io/masakhane-ner) — Adelani et al., 2021 (evaluation only)
- [OpenStreetMap](https://www.openstreetmap.org) Nigeria contributors (ODbL)
- The [Masakhane](https://www.masakhane.io), [Data Science Nigeria](https://www.datasciencenigeria.org), and [AfricaNLP](https://sites.google.com/view/africanlp2025) communities

## Citation

```bibtex
@software{safesignal_geo_2026,
  author  = {Tanta, Abraham Esandayinze},
  title   = {SafeSignal-Geo: An Open-Source Geolocation Library for Informal Nigerian Place Names},
  year    = {2026},
  url     = {https://github.com/mr-tanta/safesignal-geo},
  organization = {Jyv Tech LLC}
}
```

## Contact

- Maintainer: Abraham Esandayinze Tanta · [abraham@jyvtechllc.com](mailto:abraham@jyvtechllc.com)
- Issues: [github.com/mr-tanta/safesignal-geo/issues](https://github.com/mr-tanta/safesignal-geo/issues)
- Discussions: [github.com/mr-tanta/safesignal-geo/discussions](https://github.com/mr-tanta/safesignal-geo/discussions)
