Metadata-Version: 2.4
Name: indian-address-parser
Version: 0.1.2
Summary: Parse raw Indian address strings into structured fields using a fine-tuned Qwen3 LoRA adapter
Project-URL: Homepage, https://github.com/innerkorehq/indian-address-parser
Project-URL: Repository, https://github.com/innerkorehq/indian-address-parser
Project-URL: Model, https://huggingface.co/gagan1985/qwen3-0.6b-indian-address-parser
Author: Gagandeep
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: address-parsing,india,lora,named-entity-recognition,nlp,qwen
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Requires-Dist: huggingface-hub>=0.25.0
Requires-Dist: peft>=0.12.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.51.0
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == 'test'
Description-Content-Type: text/markdown

# indian-address-parser

Parse raw, unstructured Indian address strings into 13 structured fields using a
Qwen3-0.6B model fine-tuned with LoRA. Model weights are downloaded automatically from
[Hugging Face](https://huggingface.co/gagan1985/qwen3-0.6b-indian-address-parser) — this
package ships only inference code, no weights.

```
Input:  "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"
Output: {"houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": null,
         "street": "MG ROAD", "subsubLocality": null, "subLocality": null, "locality": null,
         "village": null, "subDistrict": null, "district": "Kamrup", "city": "GUWAHATI",
         "state": "AS", "pincode": "781029"}
```

## Install

```bash
pip install indian-address-parser
```

## Usage

### Python

```python
from indian_address_parser import AddressParser

parser = AddressParser()  # downloads model weights from HF on first use
result = parser.parse("FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029")
print(result)

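# Batch: pass a list of raw address strings
addresses = [
    "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029",
    # ...more raw address strings
]
results = parser.parse_batch(addresses)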
```
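For larger inputs, one option is to call `parse_batch` on fixed-size chunks so progress can be logged or checkpointed between calls. This is a minimal sketch that assumes `parse_batch` takes a list of strings and returns one result per input, in input order. The helpers `chunked` and `parse_in_chunks` and the `chunk_size` default are hypothetical, not part of the package. The batch function is passed in as a parameter, so the helper can be tried with a stand-in before loading the model.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def parse_in_chunks(
    parse_batch: Callable[[List[str]], List[T]],
    addresses: Sequence[str],
    chunk_size: int = 32,  # hypothetical default; tune for your hardware
) -> List[T]:
    """Hypothetical helper: call `parse_batch` once per chunk and concatenate.

    Assumes `parse_batch` returns exactly one result per input, in input order.
    """
    results: List[T] = []
    for i, chunk in enumerate(chunked(addresses, chunk_size)):
        batch = parse_batch(chunk)
        if len(batch) != len(chunk):
            raise RuntimeError(f"chunk {i}: expected {len(chunk)} results, got {len(batch)}")
        results.extend(batch)
    return results
```

With a real parser this would be called as `parse_in_chunks(parser.parse_batch, addresses, chunk_size=16)`. Whether chunking helps throughput depends on how `parse_batch` batches internally, which this sketch does not assume.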

### CLI

```bash
# Single address
indian-address-parser "FLAT NO.32, UTTARA TOWERS, MG ROAD GUWAHATI , Kamrup Unclassified AS 781029"

# Batch from stdin
cat addresses.txt | indian-address-parser --stdin

# Batch from a file, JSONL output
indian-address-parser --file addresses.txt --out results.jsonl
```
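The JSONL file written by `--out` can be read back with the standard library alone. This sketch assumes each non-blank line is one JSON object holding the fields listed under Fields (plus `_parse_error` when parsing failed); `load_results` is a hypothetical helper name, not part of the package.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def load_results(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a JSONL file: one JSON object per non-blank line."""
    records: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
    return records
```

For example, `results = load_results("results.jsonl")` followed by `sum("_parse_error" in r for r in results)` counts the addresses whose model output could not be parsed.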

## Fields

```
houseNumber, houseName, poi, street, subsubLocality, subLocality,
locality, village, subDistrict, district, city, state, pincode
```

Any field not present in the address is `null`. If the model output can't be parsed as
JSON, all fields are `null` and a `_parse_error` key holds the raw model output.
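When consuming results in Python, it helps to check for `_parse_error` before reading fields. This is a minimal sketch that assumes `parse()` returns a plain dict keyed by the 13 field names, with JSON `null` surfacing as `None`; `FIELDS`, `is_parse_failure`, and `present_fields` are hypothetical names, not package API.

```python
from typing import Any, Dict

# The 13 output fields, in the order listed above.
FIELDS = (
    "houseNumber", "houseName", "poi", "street", "subsubLocality", "subLocality",
    "locality", "village", "subDistrict", "district", "city", "state", "pincode",
)


def is_parse_failure(result: Dict[str, Any]) -> bool:
    """True if the model output could not be parsed as JSON (all fields null)."""
    return "_parse_error" in result


def present_fields(result: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the known fields whose value is not None (JSON null)."""
    return {name: result[name] for name in FIELDS if result.get(name) is not None}


# The example output from the top of this README, as a Python dict.
example = {
    "houseNumber": "FLAT NO.32", "houseName": "UTTARA TOWERS", "poi": None,
    "street": "MG ROAD", "subsubLocality": None, "subLocality": None, "locality": None,
    "village": None, "subDistrict": None, "district": "Kamrup", "city": "GUWAHATI",
    "state": "AS", "pincode": "781029",
}

if is_parse_failure(example):
    print("model output was not valid JSON:", example["_parse_error"])
else:
    print(present_fields(example))
```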

## Model details, evaluation metrics, and known limitations

See the [model card](https://huggingface.co/gagan1985/qwen3-0.6b-indian-address-parser)
for training data, LoRA config, per-field evaluation results (100% JSON parse rate, 82.4%
mean field accuracy on held-out test data), and known limitations (for example, ambiguous
boundaries between the locality, subLocality, subsubLocality, and village fields).
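As a rough illustration of what a per-field and mean field accuracy can look like, the sketch below scores exact string matches field by field (a null prediction against a null gold value counts as correct) and then takes the unweighted mean across fields. This is an assumption, not the model card's scoring code: the card's exact definition (null handling, string normalization) may differ, so this is not expected to reproduce the 82.4% figure. `field_accuracies` and `mean_field_accuracy` are hypothetical helpers.

```python
from typing import Any, Dict, Sequence


def field_accuracies(
    predictions: Sequence[Dict[str, Any]],
    gold: Sequence[Dict[str, Any]],
    fields: Sequence[str],
) -> Dict[str, float]:
    """Exact-match accuracy per field. A missing key counts as None (null),
    so a null prediction for a null gold value is scored as correct."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    return {
        name: sum(p.get(name) == g.get(name) for p, g in zip(predictions, gold)) / len(gold)
        for name in fields
    }


def mean_field_accuracy(per_field: Dict[str, float]) -> float:
    """Unweighted mean of the per-field accuracies."""
    return sum(per_field.values()) / len(per_field)
```

With `fields` set to the 13 names under Fields, `mean_field_accuracy(field_accuracies(preds, gold, fields))` gives a single score. When every record is scored on all fields, this unweighted mean equals the overall fraction of correct field slots.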

## Apple Silicon (MLX) users

This package uses `transformers` + `peft`, which run on CUDA, MPS, and CPU but are not the
fastest option on Apple Silicon. For MLX-native inference, use the `mlx/` subfolder of the
[Hugging Face repo](https://huggingface.co/gagan1985/qwen3-0.6b-indian-address-parser/tree/main/mlx)
instead.

## License

Apache 2.0 (matching the base model, Qwen/Qwen3-0.6B).
