Metadata-Version: 2.4
Name: langchain-funasr
Version: 0.1.0
Summary: LangChain integration for FunASR (SenseVoice / Paraformer / Fun-ASR-Nano) speech-to-text
Author: FunASR
License: Apache-2.0
Project-URL: Homepage, https://github.com/modelscope/FunASR
Project-URL: Source, https://github.com/FunAudioLLM/langchain-funasr
Keywords: langchain,funasr,sensevoice,paraformer,speech-to-text,asr,audio
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: langchain-core>=0.3
Requires-Dist: funasr>=1.1.0
Requires-Dist: librosa

# 🦜🔗 langchain-funasr

[FunASR](https://github.com/modelscope/FunASR) integration for [LangChain](https://github.com/langchain-ai/langchain) — transcribe audio to LangChain `Document`s with **self-hosted** speech-to-text.

Powered by [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) / Paraformer / Fun-ASR-Nano. Everything runs **locally, with no cloud API**, and the models are strong on Chinese and cover 50+ languages.

## Install

```bash
pip install langchain-funasr
```

## Usage

```python
from langchain_funasr import FunASRLoader

loader = FunASRLoader("meeting.wav", model="iic/SenseVoiceSmall", device="cuda")
docs = loader.load()
print(docs[0].page_content)
```
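
The example above hard-codes `device="cuda"`, while the documented default is `cpu`. A minimal sketch of choosing the device at runtime follows. `pick_device` is a hypothetical helper, not part of `langchain-funasr`, and it assumes PyTorch is the backend that decides whether a GPU is usable:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of langchain-funasr.
    """
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The result can then be passed as the `device` argument, for example `FunASRLoader("meeting.wav", model="iic/SenseVoiceSmall", device=pick_device())`.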

Use the parser directly with blob pipelines:

```python
from langchain_core.document_loaders import Blob
from langchain_funasr import FunASRParser

parser = FunASRParser(model="FunAudioLLM/SenseVoiceSmall", hub="hf", device="cuda")
docs = list(parser.lazy_parse(Blob.from_path("audio.wav")))
```

## Options

| Arg | Default | Notes |
|-----|---------|-------|
| `model` | `iic/SenseVoiceSmall` | Any FunASR model (SenseVoice / Paraformer / Fun-ASR-Nano) |
| `hub` | `ms` | `ms` (ModelScope) or `hf` (HuggingFace) |
| `device` | `cpu` | e.g. `cuda`, `cuda:0` |
| `language` | `auto` | SenseVoice: `auto`/`zh`/`en`/`yue`/`ja`/`ko` |
| `vad_model` | `fsmn-vad` | Built-in VAD segments the input, so audio of any length can be transcribed |

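The options in the table can be collected and checked before the loader is constructed. The sketch below is illustrative only: `build_options` and `load_transcript` are hypothetical helpers, the defaults and allowed values are copied from the table, and it assumes `FunASRLoader` accepts every listed option as a keyword argument (the usage example only confirms `model` and `device`).

```python
from typing import Any, Dict

# Defaults and allowed values copied from the options table.
DEFAULT_OPTIONS: Dict[str, Any] = {
    "model": "iic/SenseVoiceSmall",
    "hub": "ms",
    "device": "cpu",
    "language": "auto",
    "vad_model": "fsmn-vad",
}
VALID_HUBS = {"ms", "hf"}
SENSEVOICE_LANGUAGES = {"auto", "zh", "en", "yue", "ja", "ko"}


def build_options(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    if options["hub"] not in VALID_HUBS:
        raise ValueError(f"hub must be one of {sorted(VALID_HUBS)}")
    # The language codes are documented for SenseVoice only, so they are
    # checked only when the model name looks like a SenseVoice model.
    if "SenseVoice" in options["model"] and options["language"] not in SENSEVOICE_LANGUAGES:
        raise ValueError(f"unsupported SenseVoice language: {options['language']}")
    return options


def load_transcript(path: str, **overrides: Any):
    """Assumes FunASRLoader takes every table option as a keyword argument."""
    from langchain_funasr import FunASRLoader  # lazy import; needs the package installed

    return FunASRLoader(path, **build_options(**overrides)).load()
```

For example, `load_transcript("meeting.wav", device="cuda:0", language="zh")` would keep the default model, hub, and VAD model while overriding the device and language.
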
## Why FunASR

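- **Self-hosted**: no API keys, and no data leaves your machine.
- **Fast**: SenseVoice is non-autoregressive; the SenseVoice project reports that SenseVoice-Small runs about 15x faster than Whisper-Large.
- **Multilingual**: strong on Chinese, with support for 50+ languages.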

⭐ If this helps, star [FunASR](https://github.com/modelscope/FunASR) and [SenseVoice](https://github.com/FunAudioLLM/SenseVoice).

## License

Apache-2.0
