Metadata-Version: 2.3
Name: fasr-punc-ct-transformer
Version: 0.5.2
Summary: ct-transformer punctuation model for fasr
Author: osc
Author-email: osc <790990241@qq.com>
Requires-Dist: fasr
Requires-Dist: funasr
Requires-Python: >=3.10, <3.13
Description-Content-Type: text/markdown

# fasr-punc-ct-transformer

[Chinese documentation](README_ZH.md)

CT-Transformer punctuation restoration for fasr. Use it as the `sentencizer`
stage after ASR to split raw recognized text into punctuated `AudioSpan`
sentences.

## Install

```bash
pip install fasr-punc-ct-transformer
```

## Registered Model

| Registry name | Class | Best for |
|---|---|---|
| `ct_transformer` | `CTTransformerForPunc` | Chinese and mixed Chinese-English punctuation restoration |

The default checkpoint is
`iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch`.

## Pipeline Usage

```python
from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe("recognizer", model="paraformer")
    .add_pipe(
        "sentencizer",
        model="ct_transformer",
        disable_log=True,
        disable_pbar=True,
    )
)
```

## Confection Config

```toml
[punc_model]
@punc_models = "ct_transformer"
disable_update = true
disable_log = true
disable_pbar = true
```

Inside a pipeline:

```toml
[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["sentencizer"]

[pipeline.pipes]

[pipeline.pipes.sentencizer]
@pipes = "thread_pipe"

[pipeline.pipes.sentencizer.component]
@components = "sentencizer"

[pipeline.pipes.sentencizer.component.model]
@punc_models = "ct_transformer"
disable_update = true
disable_log = true
disable_pbar = true
```

## Direct Model Usage

```python
from fasr.config import registry

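# Look up the registered model by name and instantiate it with default settings.
model = registry.punc_models.get("ct_transformer")()

# restore() returns an AudioSpanList; each item exposes the punctuated text.
sentences = model.restore("今天天气真好我想出去玩你觉得呢")
for sentence in sentences:
    print(sentence.text)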
```

Use local weights:

```python
# `model` is the instance created in the previous snippet.
model.load_checkpoint("/path/to/ct-transformer")
```

## Parameters

| Parameter | Type | Default | Effect when `true` | Effect when `false` | When to change |
|---|---|---|---|---|---|
| `disable_update` | `bool` | `True` | Skips FunASR checkpoint update checks | Lets FunASR check for updates | Keep `true` for reproducible startup; set `false` when you want update checks |
| `disable_log` | `bool` | `True` | Suppresses backend logs | Shows backend logs | Set `false` when debugging model loading or inference |
| `disable_pbar` | `bool` | `True` | Hides progress bars | Shows progress bars | Set `false` in interactive scripts where progress output is useful |

Generic checkpoint fields such as `checkpoint`, `cache_dir`, `endpoint`,
`revision`, and `force_download` are inherited from the base model.
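
The three flags in the table above are often switched together: quiet in production, verbose while debugging. The sketch below builds the keyword arguments for both cases. `punc_flags` is a hypothetical helper, and passing the result to the registered model as keyword arguments is an assumption based on the config examples, not a documented signature.

```python
def punc_flags(debug: bool = False) -> dict[str, bool]:
    """Build the three documented flags; `debug=True` re-enables logs and progress bars."""
    return {
        "disable_update": True,      # keep startup reproducible in both modes
        "disable_log": not debug,    # show backend logs only when debugging
        "disable_pbar": not debug,   # show progress bars only when debugging
    }


print(punc_flags())            # quiet defaults, matching the table
print(punc_flags(debug=True))  # verbose settings for troubleshooting

# Assumed usage (requires fasr and this plugin to be installed):
# from fasr.config import registry
# model = registry.punc_models.get("ct_transformer")(**punc_flags(debug=True))
```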

## Notes

- `restore(text)` returns an `AudioSpanList`, not a plain string.
- Input text should already be recognized text. This plugin does not run ASR.
- For pipeline usage, put this model on the `sentencizer` component.
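
The first note matters when downstream code expects a single string. A minimal sketch of flattening the result follows, assuming only what the direct-usage example shows: the result is iterable and each item has a `.text` attribute. `join_sentences` and the `FakeSpan` stand-in are hypothetical names used for illustration and are not part of fasr.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an AudioSpan; only `.text` is modeled."""

    text: str


def join_sentences(spans: Iterable, sep: str = "") -> str:
    """Concatenate the `.text` of each span into one string."""
    return sep.join(span.text for span in spans)


# Illustrative spans, not real model output.
demo = [FakeSpan("今天天气真好，"), FakeSpan("我想出去玩，"), FakeSpan("你觉得呢？")]
print(join_sentences(demo))

# With a real model this would be: join_sentences(model.restore(raw_text))
```

For English or mixed-language output, passing `sep=" "` keeps sentences from running together.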

## Dependencies

- `fasr`
- `funasr`
- Python 3.10-3.12
