Metadata-Version: 2.4
Name: ne-embed
Version: 1.0.0
Summary: Multilingual text embeddings for Northeast Indian languages
Author-email: MWire Labs <connect@mwirelabs.com>
License: CC-BY-4.0
Project-URL: Homepage, https://mwirelabs.com
Project-URL: Repository, https://github.com/mwirelabs/ne-embed
Project-URL: HuggingFace, https://huggingface.co/MWirelabs/ne-embed
Keywords: embeddings,nlp,northeast-india,low-resource,multilingual,RAG
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: sentence-transformers>=2.7.0
Requires-Dist: numpy>=1.21.0

# ne-embed

Multilingual text embeddings for Northeast Indian languages.

**10 languages · 768 dimensions · Built on LaBSE · CC-BY-4.0**

## Install

```bash
pip install ne-embed
```

## Usage

```python
from ne_embed import NEEmbed

model = NEEmbed()  # downloads from HuggingFace on first run

sentences = [
    'Where is the nearest hospital?',     # English
    'Ngi la pynjot ia ki shnong baroh',   # Khasi
    'Pilakchin an senganiko man na.',     # Garo
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

model.languages()  # print supported languages
```
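The README does not show how to compare the resulting vectors. Suppose `model.encode` returns a NumPy array of shape `(n, 768)`, as the `print` above suggests. In that case, cross-lingual similarity and simple RAG-style retrieval need only plain NumPy. This is a minimal sketch, not part of `ne_embed`: `cosine_similarity_matrix` and `top_k` are hypothetical helpers, and random vectors stand in for real embeddings so the example runs without downloading the model.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a 2-D embedding array."""
    # L2-normalise each row so a dot product equals the cosine of the angle
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list:
    """Indices of the k corpus rows most similar to one query vector, best first."""
    q = query / max(np.linalg.norm(query), 1e-12)
    c = corpus / np.clip(np.linalg.norm(corpus, axis=1, keepdims=True), 1e-12, None)
    scores = c @ q
    # argsort is ascending; reverse it for highest-first, then keep k
    return np.argsort(scores)[::-1][:k].tolist()

# Hypothetical stand-in for `model.encode(sentences)`: random vectors, same shape
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768))

sims = cosine_similarity_matrix(embeddings)
print(sims.shape)  # (3, 3)
print(top_k(embeddings[0], embeddings, k=2))
```

With real embeddings, off-diagonal entries of `sims` close to 1 would indicate sentences the model places near each other, regardless of language.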

## Supported Languages

| Code | Language | Tier |
|------|----------|------|
| asm | Assamese | Supported |
| brx | Bodo | Supported |
| grt | Garo | Supported |
| kha | Khasi | Supported |
| lus | Mizo | Supported |
| mni | Meitei | Supported |
| njz | Nyishi | Supported |
| trp | Kokborok | Limited |
| pbv | Pnar | Limited |
| nag | Nagamese | Limited |
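You may want the table available in code, for example to flag input in a Limited-tier language before encoding. For that, a plain dictionary is enough. `LANGUAGES` and `languages_by_tier` are hypothetical names, not part of `ne_embed`. `model.languages()` may expose the same information, but its return format is not documented here.

```python
# Hypothetical mapping that mirrors the Supported Languages table; not part of ne_embed.
LANGUAGES = {
    "asm": ("Assamese", "Supported"),
    "brx": ("Bodo", "Supported"),
    "grt": ("Garo", "Supported"),
    "kha": ("Khasi", "Supported"),
    "lus": ("Mizo", "Supported"),
    "mni": ("Meitei", "Supported"),
    "njz": ("Nyishi", "Supported"),
    "trp": ("Kokborok", "Limited"),
    "pbv": ("Pnar", "Limited"),
    "nag": ("Nagamese", "Limited"),
}

def languages_by_tier(tier: str) -> list:
    """Return the sorted language codes whose tier equals `tier`."""
    return sorted(code for code, (_, t) in LANGUAGES.items() if t == tier)

print(languages_by_tier("Limited"))  # ['nag', 'pbv', 'trp']
```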

## Links

- [HuggingFace](https://huggingface.co/MWirelabs/ne-embed)
- [MWire Labs](https://mwirelabs.com)
