Metadata-Version: 2.4
Name: fasr-asr-funasr
Version: 0.5.2
Summary: Fun ASR model for fasr
Author-email: fasr <wangmengdi06@58.com>
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: fasr
Requires-Dist: funasr

# fasr-asr-funasr

[Chinese documentation](README_ZH.md)

Fun-ASR-Nano speech recognition for fasr. This plugin wraps `funasr.AutoModel`
and writes whole-utterance text to `AudioSpan.raw_text`. It does not produce
word or character timestamps.

## Install

```bash
pip install fasr-asr-funasr
```

## Registered Model

| Registry name | Class | Best for |
|---|---|---|
| `fun_asr_nano` | `FunASRNano` | Lightweight FunASR recognition without timestamps |

The default checkpoint is `FunAudioLLM/Fun-ASR-Nano-2512`.

## Pipeline Usage

```python
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe(
        "recognizer",
        model="fun_asr_nano",
        device="cuda:0",
        language="中文",
        itn=True,
    )
    .add_pipe("sentencizer", model="ct_transformer")
)
```

Quick choices:

| Goal | Use | Result |
|---|---|---|
| CPU inference | `device="cpu"` | Runs without CUDA, usually slower |
| First GPU | `device="cuda:0"` | Uses GPU 0 |
| Disable number/date normalization | `itn=False` | Keeps raw spoken text form |
| Force language hint | `language="中文"` | Passes the language label to FunASR |
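Instead of hard-coding the device, you can pick it at runtime. A minimal sketch (assumes `torch` is importable, which funasr itself requires; falls back to CPU otherwise):

```python
# Pick a device string for the recognizer pipe at runtime:
# prefer the first GPU, fall back to CPU when CUDA is unavailable.
try:
    import torch
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
except ImportError:  # torch not installed: CPU only
    device = "cpu"

print(device)
```

Pass the resulting string as the `device=` argument when adding the recognizer pipe.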

## Confection Config

```toml
[asr_model]
@asr_models = "fun_asr_nano"
device = "cuda:0"
language = "中文"
itn = true
batch_size = 1
```

Inside a pipeline:

```toml
[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["recognizer"]

[pipeline.pipes]

[pipeline.pipes.recognizer]
@pipes = "thread_pipe"
batch_size = 4

[pipeline.pipes.recognizer.component]
@components = "recognizer"

[pipeline.pipes.recognizer.component.model]
@asr_models = "fun_asr_nano"
device = "cuda:0"
language = "中文"
itn = true
batch_size = 1
```

## Direct Model Usage

```python
from fasr.config import registry

model = registry.asr_models.get("fun_asr_nano")()
model.device = "cuda:0"

# `transcribe` mutates and returns the input AudioSpan list.
spans = model.transcribe(audio_spans, hotwords=["fasr"], language="中文")
for span in spans:
    print(span.text)
```
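The mutate-and-return contract noted in the comment can be illustrated with a toy stand-in (hypothetical `FakeSpan` and `fake_transcribe`, not the real fasr classes):

```python
# Sketch of the mutate-and-return contract, using a hypothetical
# stand-in class rather than fasr's real AudioSpan.
class FakeSpan:
    def __init__(self):
        self.raw_text = ""

def fake_transcribe(spans):
    # Like FunASRNano.transcribe: writes recognized text onto each
    # input span and returns the same list object.
    for span in spans:
        span.raw_text = "hello"
    return spans

inputs = [FakeSpan(), FakeSpan()]
outputs = fake_transcribe(inputs)
print(outputs is inputs)  # → True: the inputs themselves were mutated
```

Because the inputs are mutated in place, keep a copy beforehand if you need the pre-transcription spans.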

Use local weights:

```python
model.load_checkpoint("/path/to/Fun-ASR-Nano")
```
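Before pointing `load_checkpoint` at a local directory, a quick existence check gives a clearer failure than a load error deep inside funasr (plain `pathlib`; the path is a placeholder):

```python
from pathlib import Path

def checkpoint_exists(path: str) -> bool:
    """Return True when the local checkpoint directory is present."""
    return Path(path).is_dir()

# Fail fast with a clear message instead of a confusing load failure.
if not checkpoint_exists("/path/to/Fun-ASR-Nano"):
    print("checkpoint directory not found")
```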

## Parameters

| Parameter | Type / range | Default | Higher / true | Lower / false | Change when |
|---|---|---|---|---|---|
| `device` | `str` | `"cuda:0"` | Selects a CUDA device such as `"cuda:1"` | `"cpu"` runs on CPU | Deployment target changes |
| `language` | `str` | `"中文"` | Passes the label to FunASR as a decoding hint | Other supported labels switch the hint | Audio language is known |
| `itn` | `bool` | `True` | Normalizes numbers, dates, and similar text | Keeps the raw spoken form | You need spoken-form output |
| `batch_size` | `int >= 1` | `1` | Better throughput, more memory | Lower memory, less throughput | Batch inference needs tuning |
| `trust_remote_code` | `bool` | `True` | Allows custom checkpoint code | Safer, but some checkpoints may fail | Running untrusted checkpoints |

Generic checkpoint fields such as `checkpoint`, `cache_dir`, `endpoint`,
`revision`, and `force_download` are inherited from the base model.
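To picture what `batch_size` controls, here is a generic batching sketch in plain Python (the `batched` helper is illustrative, not fasr internals):

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

spans = ["s1", "s2", "s3", "s4", "s5"]
# batch_size=1 decodes each span alone; larger batches raise
# throughput at the cost of memory.
batches = list(batched(spans, 2))
print(batches)  # → [['s1', 's2'], ['s3', 's4'], ['s5']]
```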

## Output

- `span.raw_text` is populated with the whole-utterance transcript.
- `span.tokens` is not populated because Fun-ASR-Nano does not return timestamps.
- `span.text` returns the recognized text.

## Dependencies

- `fasr`
- `funasr`
- Python 3.10-3.12
