Metadata-Version: 2.4
Name: name2nat-onnx
Version: 1.0.0
Summary: Nationality prediction from name using ONNX Runtime
Author: Shafayat Hossain Shifat
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/shhossain/name2nat-onnx
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: onnxruntime
Dynamic: license-file

[![image](https://img.shields.io/pypi/v/name2nat-onnx.svg)](https://pypi.org/project/name2nat-onnx/)
[![image](https://img.shields.io/pypi/l/name2nat-onnx.svg)](https://pypi.org/project/name2nat-onnx/)
[![image](https://img.shields.io/pypi/pyversions/name2nat-onnx.svg)](https://pypi.org/project/name2nat-onnx/)

# name2nat-onnx

`name2nat-onnx` is a lightweight inference-focused fork of `name2nat`.

The original project by Kyubyong Park predicts nationality from a name written in Roman letters. This fork keeps that same goal, but changes the deployment path so inference does not need Torch or Flair at runtime.

## Why this project exists

The original package is built around a Flair model checkpoint. That works well for experimentation and training, but it pulls in a heavy runtime stack for simple prediction.

This fork exists to make the model easier to deploy when you care about:

* smaller runtime dependencies
* faster startup
* CPU-friendly batch inference
* processing very large name lists without shipping a Torch stack

## What changed

This project converts the original trained bidirectional GRU classifier into an ONNX model and serves it through ONNX Runtime.

At a high level, inference now works like this:

1. Normalize a name into a character sequence.
2. Convert each character into an index from the original vocabulary.
3. Run the ONNX model on a batch of encoded names.
4. Return the top predicted nationalities.
5. If the name exists in the bundled lookup table, return the exact dictionary hit with score `1.0`.
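
The steps above can be sketched in plain NumPy. This is a minimal illustration, not the package's actual code. The normalization rules, the `PAD_IDX`/`UNK_IDX` indices, the `(name, [(label, score), ...])` result shape, the lookup table mapping a name to a single label, and the `run_model` callable are all assumptions made for this sketch. `run_model` stands in for the ONNX Runtime call, so the logic can be exercised without the model file.

```python
import unicodedata

import numpy as np

# Assumed special indices; the real vocabulary may reserve different ones.
PAD_IDX = 0
UNK_IDX = 1


def normalize(name: str) -> str:
    """Step 1 (assumed rules): Unicode NFKC, then collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", name).split())


def encode_batch(names, char2idx):
    """Step 2: map characters to indices and right-pad into one int64 matrix."""
    seqs = [[char2idx.get(ch, UNK_IDX) for ch in normalize(n)] for n in names]
    width = max([len(s) for s in seqs] + [1])
    batch = np.full((len(seqs), width), PAD_IDX, dtype=np.int64)
    for i, seq in enumerate(seqs):
        batch[i, : len(seq)] = seq
    lengths = np.array([len(s) for s in seqs], dtype=np.int64)
    return batch, lengths


def softmax(logits):
    """Row-wise softmax; assumes the model emits unnormalized logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def predict(names, run_model, char2idx, labels, lookup, top_n=3):
    """Steps 3-5. `run_model(batch, lengths)` must return logits of shape (N, len(labels))."""
    batch, lengths = encode_batch(names, char2idx)
    probs = softmax(run_model(batch, lengths))  # step 3: one model call per batch
    results = []
    for i, name in enumerate(names):
        key = normalize(name)
        if key in lookup:  # step 5: exact dictionary hit overrides the model
            results.append((name, [(lookup[key], 1.0)]))
            continue
        top = np.argsort(-probs[i])[:top_n]  # step 4: highest scores first
        results.append((name, [(labels[j], float(probs[i, j])) for j in top]))
    return results
```

With ONNX Runtime, `run_model` could wrap something like `session.run(None, {session.get_inputs()[0].name: batch})[0]`, where `session` is an `onnxruntime.InferenceSession`. Whether the exported model also takes sequence lengths, and what its input names are, depends on how it was exported. Checking the lookup after inference keeps batching simple; a real implementation might instead skip dictionary hits before calling the model.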

The shipped runtime is built for prediction only; the trained weights, and the training process that produced them, come from the original project.

## How it was created

The ONNX model in this repository was produced from the original `name2nat` checkpoint.

The conversion flow is:

1. Load the original Flair checkpoint.
2. Rebuild the equivalent model in plain PyTorch.
3. Copy the learned weights into the rebuilt model.
4. Export that model to ONNX.
5. Save the original vocabulary, labels, and dictionary lookup data in runtime-friendly formats.

The conversion script is in [convert_to_onnx.py](convert_to_onnx.py).
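
Step 5 of the conversion flow might look roughly like the following. The file names and the choice of JSON are assumptions for illustration; the actual formats written by the conversion script may differ. The point is that the runtime can read these artifacts with the standard library alone, without Torch or Flair. `save_runtime_artifacts` and `load_runtime_artifacts` are hypothetical helpers.

```python
import json
from pathlib import Path

# Hypothetical file layout; the real conversion script may use other names or formats.
ARTIFACT_FILES = ("vocab.json", "labels.json", "lookup.json")


def save_runtime_artifacts(out_dir, char2idx, labels, lookup):
    """Write vocabulary, ordered labels, and the name lookup table as UTF-8 JSON."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, data in zip(ARTIFACT_FILES, (char2idx, labels, lookup)):
        # ensure_ascii=False keeps non-ASCII characters readable in the files.
        (out / filename).write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return out


def load_runtime_artifacts(out_dir):
    """Read the three artifacts back; returns (char2idx, labels, lookup)."""
    out = Path(out_dir)
    return tuple(
        json.loads((out / filename).read_text(encoding="utf-8"))
        for filename in ARTIFACT_FILES
    )
```

Keeping `labels` as an ordered list matters: index `i` of the model's output has to map to `labels[i]`, and JSON arrays preserve that order.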

## Disclaimer

The original author's disclaimer still applies: this project is not intended as a political statement. It is a statistical name-classification model, not a definitive statement of identity.

## Installation

```bash
pip install name2nat-onnx
```

With `uv`:

```bash
uv init
uv add name2nat-onnx
```

## Usage

```python
from name2nat import Name2nat

predictor = Name2nat()

results = predictor(
    ["Kyubyong Park", "Takeshi Yamamoto", "Francois Dupont"],
    top_n=3,
)

for item in results:
    print(item)
```

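For large input sets, pass the whole list at once and set an explicit `batch_size`:

```python
# `names` is a list of name strings, e.g. read from a file.
results = predictor(names, top_n=1, batch_size=4096)
```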
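
If the names do not fit comfortably in memory as one list, a small wrapper can feed the predictor in fixed-size chunks. `iter_chunks` and `predict_stream` are hypothetical helpers, not part of `name2nat`. The sketch assumes only the call shown above (`predictor(names, top_n=..., batch_size=...)`) and that it returns one result per input name, in input order.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items without materializing the input."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def predict_stream(
    predictor: Callable[..., Iterable],
    names: Iterable[str],
    chunk_size: int = 100_000,
    top_n: int = 1,
    batch_size: int = 4096,
) -> Iterator:
    """Call `predictor` once per chunk and yield its results one by one."""
    for chunk in iter_chunks(names, chunk_size):
        yield from predictor(chunk, top_n=top_n, batch_size=batch_size)
```

For example, `predict_stream(Name2nat(), (line.strip() for line in open("names.txt", encoding="utf-8")))` would stream a file line by line, where `names.txt` is a placeholder path. `chunk_size` bounds how many names are held in memory at once, while `batch_size` is passed through unchanged to the predictor.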

## Project Scope

This fork is mainly about runtime packaging and deployment.

If you want the full background on:

* dataset construction
* training details
* the NaNa dataset
* the original research motivation
* the original Flair-based implementation

see the original repository:

* Original repo: [Kyubyong/name2nat](https://github.com/Kyubyong/name2nat)
* Dataset reference: [NaNa dataset on Kaggle](https://www.kaggle.com/bryanpark/nana-dataset)
* Training framework: [Flair](https://github.com/flairNLP/flair)

## Credit

Model idea, dataset creation, training pipeline, and original package design are from Kyubyong Park's `name2nat` project.

This fork focuses on converting that trained model into a faster, lighter ONNX Runtime package for inference.

