Metadata-Version: 2.4
Name: findsylls
Version: 3.1.0
Summary: Unsupervised syllable segmentation, evaluation, and embedding extraction toolkit for speech audio
Author: Héctor Javier Vázquez Martínez
License-Expression: MIT
Project-URL: Homepage, https://github.com/hjvm/findsylls
Project-URL: Issues, https://github.com/hjvm/findsylls/issues
Keywords: speech,syllable,segmentation,audio,phonetics,prosody,unsupervised
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: scipy>=1.9
Requires-Dist: librosa>=0.10
Requires-Dist: pandas>=1.5
Requires-Dist: soundfile>=0.12
Requires-Dist: matplotlib>=3.7
Requires-Dist: textgrid>=1.5
Requires-Dist: findpeaks>=2.5
Requires-Dist: gammatone>=1.0.0
Requires-Dist: joblib>=1.3
Requires-Dist: tqdm>=4.65
Provides-Extra: viz
Requires-Dist: matplotlib; extra == "viz"
Requires-Dist: seaborn; extra == "viz"
Provides-Extra: end2end
Requires-Dist: sylber>=0.1.2; extra == "end2end"
Requires-Dist: vg-hubert>=1.0.0; extra == "end2end"
Provides-Extra: storage
Requires-Dist: h5py>=3.8; extra == "storage"
Provides-Extra: embedding
Requires-Dist: torch>=2.0; extra == "embedding"
Requires-Dist: torchaudio>=2.0; extra == "embedding"
Requires-Dist: transformers>=4.30; extra == "embedding"
Provides-Extra: all
Requires-Dist: matplotlib; extra == "all"
Requires-Dist: seaborn; extra == "all"
Requires-Dist: sylber>=0.1.2; extra == "all"
Requires-Dist: vg-hubert>=1.0.0; extra == "all"
Requires-Dist: h5py>=3.8; extra == "all"
Requires-Dist: torch>=2.0; extra == "all"
Requires-Dist: torchaudio>=2.0; extra == "all"
Requires-Dist: transformers>=4.30; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Dynamic: license-file

# findsylls

[![PyPI version](https://img.shields.io/pypi/v/findsylls.svg)](https://pypi.org/project/findsylls/)
[![Python versions](https://img.shields.io/pypi/pyversions/findsylls.svg)](https://pypi.org/project/findsylls/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

Language-agnostic toolkit for unsupervised syllable-level speech segmentation, embedding extraction, and evaluation.

findsylls provides a full pipeline from raw audio to clustered syllable embeddings:

- **Envelope computation** — RMS, Hilbert, low-pass, SBS, theta, and neural pseudo-envelopes
- **Syllable segmentation** — classical peak detection and neural end-to-end methods (Sylber, VG-HuBERT)
- **Feature extraction** — MFCC, mel spectrogram, HuBERT, Sylber, VG-HuBERT
- **Syllable embedding** — pooled per-syllable vectors for downstream tasks
- **Unsupervised discovery** — k-means, mini-batch k-means, agglomerative clustering
- **Evaluation** — F1 against TextGrid annotations at phone, syllable, and word granularity
- **Visualization** — waveform, envelope, segmentation, and feature-matrix plots

---

## Install

```bash
pip install findsylls                  # core (classical methods)
pip install 'findsylls[embedding]'     # neural feature extraction (HuBERT, VG-HuBERT)
pip install 'findsylls[end2end]'       # neural segmenters (Sylber, VG-HuBERT)
pip install 'findsylls[viz]'           # plotting extras
pip install 'findsylls[storage]'       # HDF5 corpus storage
pip install 'findsylls[all]'           # everything
```

---

## Quick Start

### 1 — Segment audio into syllables

```python
from findsylls import segment_audio

# Classical: peak detection on an SBS amplitude envelope
syllables, envelope, times = segment_audio(
    "audio.wav",
    method="peakdetect",
    segmentation_kwargs={"envelope_method": "sbs"},
    return_envelope=True,
)

print(f"Found {len(syllables)} syllables")
# syllables: [(start_s, nucleus_s, end_s), ...]
```

![Syllable segmentation on a sample utterance](docs/images/quickstart_segmentation.png)

*Waveform (gray), SBS amplitude envelope (blue), syllable boundaries (green), and detected nuclei (red dots) for a sample utterance.*
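
Because each result is a plain `(start_s, nucleus_s, end_s)` tuple, basic bookkeeping needs nothing beyond the standard library. The sketch below uses made-up values in that format; `summarize` is a hypothetical helper, not part of findsylls.

```python
# Hypothetical output in the documented (start_s, nucleus_s, end_s) format;
# the numbers are invented for illustration.
syllables = [(0.10, 0.18, 0.31), (0.31, 0.42, 0.55), (0.70, 0.81, 0.95)]

def summarize(syllables):
    """Return per-syllable durations and the rate in syllables per second."""
    durations = [end - start for start, _, end in syllables]
    # Total span runs from the first onset to the last offset.
    span = syllables[-1][2] - syllables[0][0] if syllables else 0.0
    rate = len(syllables) / span if span > 0 else 0.0
    return durations, rate

durations, rate = summarize(syllables)
print([round(d, 2) for d in durations])
print(f"{rate:.2f} syllables/s")
```

Note that the rate here includes any pauses between syllables, since the span is measured from the first onset to the last offset.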

---

## Module Guide

### Envelope (`findsylls.envelope`)

The envelope module converts a raw audio waveform into a 1-D amplitude signal. Every envelope class implements `EnvelopeComputer.compute(audio, sr) → (envelope, times)`.

```python
from findsylls.audio.utils import load_audio
from findsylls.envelope import (
    RMSEnvelope, HilbertEnvelope, ThetaEnvelope, SBSEnvelope,
    LowpassEnvelope, CLSAttentionEnvelope, GreedyCosineEnvelope,
)
from findsylls.plotting import plot_multiple_envelopes

audio, sr = load_audio("audio.wav")

envelopes = {}
for name, computer in [
    ("RMS",     RMSEnvelope()),
    ("Hilbert", HilbertEnvelope()),
    ("Theta",   ThetaEnvelope()),
    ("SBS",     SBSEnvelope()),
]:
    env, times = computer.compute(audio, sr)
    envelopes[name] = (env, times)

fig = plot_multiple_envelopes(audio, sr, envelopes)
```

![Envelope method comparison](docs/images/envelope_comparison.png)

*Four classical envelope methods on the same utterance. SBS and Theta track syllabic rhythm most closely; Hilbert and RMS give a more continuous energy contour.*

You can also call the functional dispatch directly:

```python
from findsylls import get_amplitude_envelope

envelope, times = get_amplitude_envelope(audio, sr, method="theta")
```

**Available envelope methods:** `rms`, `hilbert`, `lowpass`, `sbs`, `theta`, `cls_attention`, `greedy_cosine`, `mincut`
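
To make the `(envelope, times)` contract concrete, here is a minimal, dependency-free sketch of a frame-wise RMS envelope. It is not the findsylls implementation: the function name `rms_envelope` and the `frame_len` and `hop` defaults are assumptions chosen for illustration, and a real implementation would be vectorized with NumPy.

```python
import math

def rms_envelope(audio, sr, frame_len=0.025, hop=0.010):
    """Frame-wise RMS of a mono signal.

    frame_len and hop (in seconds) are illustrative, not findsylls defaults.
    Returns (envelope, times); times[i] is the centre of frame i in seconds.
    """
    if not audio:
        return [], []
    n_frame = max(1, int(round(frame_len * sr)))
    n_hop = max(1, int(round(hop * sr)))
    envelope, times = [], []
    for start in range(0, max(1, len(audio) - n_frame + 1), n_hop):
        frame = audio[start:start + n_frame]
        envelope.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
        times.append((start + len(frame) / 2) / sr)
    return envelope, times

# One second of a 200 Hz tone whose amplitude is modulated at 5 Hz (8 kHz sampling).
sr = 8000
audio = [
    (0.5 + 0.5 * math.sin(2 * math.pi * 5 * n / sr)) * math.sin(2 * math.pi * 200 * n / sr)
    for n in range(sr)
]
envelope, times = rms_envelope(audio, sr)
print(len(envelope), len(times))
```

The envelope is sampled at the hop rate rather than the audio rate, which is why every computer returns its own `times` axis alongside the values.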

---

### Segmentation (`findsylls.segmentation`)

All segmenters return `List[(start_s, nucleus_s, end_s)]`.

#### Classical — peak detection

```python
from findsylls.plotting import plot_multiple_envelope_segmentations
from findsylls.audio.utils import load_audio
from findsylls.envelope import HilbertEnvelope, ThetaEnvelope, SBSEnvelope
from findsylls.segmentation import get_segmenter

audio, sr = load_audio("audio.wav")

# Display name -> (envelope_method string, matching envelope class)
methods = {
    "Hilbert": ("hilbert", HilbertEnvelope),
    "Theta":   ("theta",   ThetaEnvelope),
    "SBS":     ("sbs",     SBSEnvelope),
}

results = {}
for name, (env_method, env_cls) in methods.items():
    # Compute the envelope separately so it can be plotted next to the segments.
    env, times = env_cls().compute(audio, sr)
    segmenter = get_segmenter("peakdetect", envelope_method=env_method)
    segments = segmenter.segment(audio=audio, sr=sr)
    results[name] = (env, times, segments)

fig = plot_multiple_envelope_segmentations(audio, sr, results)
```

![Peak detection with three envelope methods](docs/images/peakdetect_segmentation.png)

*The same audio segmented by `peakdetect` using three different envelope methods. Each panel shows how the chosen envelope shape influences where boundaries fall.*

#### Preset segmenters (paper-replication configurations)

Preset classes replicate the exact configurations from published papers. Each carries a `REFERENCE` attribute and a `cite()` method — see [Preset Citations](#preset-citations) below.

```python
from findsylls.segmentation.presets import (
    ThetaOscillatorSegmenter,  # Räsänen et al. 2018 — gammatone + oscillator (no GPU)
    SylberSegmenter,           # Cho et al. 2025 — greedy cosine on Sylber HuBERT
    VGHubertMinCutSegmenter,   # Peng et al. 2023 — SSM MinCut on VG-HuBERT
    VGHubertCLSSegmenter,      # Peng & Harwath 2022 — CLS attention on VG-HuBERT
)
from findsylls.audio.utils import load_audio

audio, sr = load_audio("audio.wav")

# Theta oscillator (no model download, paper defaults: f=5, Q=0.5, N=8)
theta = ThetaOscillatorSegmenter()
syllables = theta.segment(audio, sr=sr)

# Sylber (requires findsylls[end2end])
sylber = SylberSegmenter()
syllables = sylber.segment(audio, sr=sr)

# VG-HuBERT MinCut (syllable mode, layer 8; requires findsylls[end2end])
vgh_mincut = VGHubertMinCutSegmenter(mode="syllable")
syllables = vgh_mincut.segment(audio, sr=sr)

# VG-HuBERT CLS attention (word mode, layer 9; requires findsylls[end2end])
vgh_cls = VGHubertCLSSegmenter(mode="word")
words = vgh_cls.segment(audio, sr=sr)
```

#### Speech Activity Detection and utterance boundaries

Real recordings often contain silence between utterances. Without SAD, peakdetect can
place spurious boundaries in silent regions; neural segmenters waste computation on
silence. Pass `sad=` to any segmenter to restrict segmentation to detected speech regions.

```python
from findsylls.segmentation.presets import ThetaOscillatorSegmenter, SylberSegmenter
from findsylls.audio.utils import load_audio

audio, sr = load_audio("recording_with_pauses.wav")

# Energy-based SAD: fast, no model download
theta = ThetaOscillatorSegmenter(sad="energy")
syllables = theta.segment(audio, sr)

# Silero VAD — more accurate on noisy audio (requires findsylls[end2end])
sylber = SylberSegmenter(sad="silero")
syllables = sylber.segment(audio, sr)
```

`sad="energy"` uses a simple energy threshold; `sad="silero"` uses a small neural VAD
model. Both chunk the audio into speech regions, run the segmenter on each chunk
independently, and reassemble the results with global timestamps.
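
The chunk-and-reassemble behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not the findsylls code: `energy_regions`, `segment_regions`, the fixed `threshold`, and the toy `one_segment` segmenter are all hypothetical.

```python
def energy_regions(envelope, times, threshold):
    """Return [(start_s, end_s)] runs where the envelope is >= threshold."""
    regions, start = [], None
    for value, t in zip(envelope, times):
        if value >= threshold and start is None:
            start = t                      # speech onset
        elif value < threshold and start is not None:
            regions.append((start, t))     # speech offset
            start = None
    if start is not None:                  # speech runs to the end of the file
        regions.append((start, times[-1]))
    return regions

def segment_regions(audio, sr, regions, segment_fn):
    """Run segment_fn on each speech chunk and shift results to global time."""
    out = []
    for start_s, end_s in regions:
        chunk = audio[int(round(start_s * sr)):int(round(end_s * sr))]
        for s, n, e in segment_fn(chunk, sr):
            # segment_fn reports chunk-local times; add the chunk offset back.
            out.append((s + start_s, n + start_s, e + start_s))
    return out

# Toy segmenter: treats every chunk as a single syllable.
def one_segment(chunk, sr):
    dur = len(chunk) / sr
    return [(0.0, dur / 2, dur)]

times = [i / 10 for i in range(9)]
envelope = [0, 0, 1, 1, 0, 0, 1, 1, 1]
regions = energy_regions(envelope, times, threshold=0.5)
result = segment_regions(list(range(9)), 10, regions, one_segment)
print(regions)
print(result)
```

The key step is adding each region's start time back onto the chunk-local timestamps, so the final list is expressed on the timeline of the original recording.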

**`add_utterance_boundaries`** controls whether the segmenter inserts boundary markers
at the onset and offset of each speech region. For envelope-based segmenters (SBS,
Theta) this is done in-algorithm — a valley is injected at both edges before peak
detection, so the algorithm can produce segments that cover the full speech region even
when there is no natural acoustic valley near the edge. Default is `True` for all
segmenters.

```python
# Without SAD but with boundaries: ensures the full audio is covered even when
# the first or last peak has no natural valley on its outer side.
theta = ThetaOscillatorSegmenter(sad=None, add_utterance_boundaries=True)

# With SAD + boundaries (recommended for multi-utterance recordings):
# each speech chunk gets boundary valleys, so no syllable is dropped at a
# region edge due to a missing valley.
theta = ThetaOscillatorSegmenter(sad="energy", add_utterance_boundaries=True)

# Disable if you are passing pre-chunked single-utterance audio and want
# strict valley-only boundaries.
theta = ThetaOscillatorSegmenter(add_utterance_boundaries=False)
```

For neural segmenters (`SylberSegmenter`, `VGHubertMinCutSegmenter`,
`VGHubertCLSSegmenter`) `add_utterance_boundaries` is accepted for API consistency
but has no effect — those algorithms already produce contiguous segmentation of
whatever chunk they receive.
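
For envelope-based segmenters, the effect of injecting valleys at both edges can be illustrated with a toy valley-to-valley segmenter. This is a simplified sketch, not the peak detector findsylls uses: `local_extrema` and `peaks_to_segments` are hypothetical helpers, and real detectors also apply smoothing and prominence thresholds.

```python
def local_extrema(env):
    """Indices of interior local maxima (peaks) and minima (valleys)."""
    inner = range(1, len(env) - 1)
    peaks = [i for i in inner if env[i - 1] < env[i] >= env[i + 1]]
    valleys = [i for i in inner if env[i - 1] > env[i] <= env[i + 1]]
    return peaks, valleys

def peaks_to_segments(env, times, add_utterance_boundaries=True):
    """Turn each peak into (start_s, nucleus_s, end_s) using flanking valleys."""
    peaks, valleys = local_extrema(env)
    if add_utterance_boundaries:
        # Inject a valley at the first and last sample of the region.
        valleys = sorted(set([0] + valleys + [len(env) - 1]))
    segments = []
    for p in peaks:
        left = [v for v in valleys if v < p]
        right = [v for v in valleys if v > p]
        if left and right:  # a peak needs a valley on both sides
            segments.append((times[left[-1]], times[p], times[right[0]]))
    return segments

env = [0.5, 1.0, 0.2, 0.9, 0.4]
times = [0.0, 0.1, 0.2, 0.3, 0.4]
strict = peaks_to_segments(env, times, add_utterance_boundaries=False)
padded = peaks_to_segments(env, times, add_utterance_boundaries=True)
print(strict)
print(padded)
```

In this toy envelope both peaks sit next to an edge with no natural valley on the outer side, so the strict variant returns no segments while the padded variant recovers both.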

![SAD and boundary insertion comparison](docs/images/sad_boundary_comparison.png)

*SBS (left) and Theta (right) on a Kono recording with 6 speech events separated by silence (~12.4 s total). **Top row:** no SAD, no boundaries — both methods miss entire utterances where no valley bridges the silence gap. **Middle row:** no SAD, boundaries enabled — adds coverage only at the very start and end of the full recording, not between utterances. **Bottom row:** SAD + boundaries (recommended) — each speech region is segmented independently with boundary valleys at its edges; all speech events are captured.*

#### Generic dispatch

```python
from findsylls.audio.utils import load_audio
from findsylls.segmentation import get_segmenter, list_segmenters, list_segmenter_presets

print(list_segmenters())
# ['peakdetect', 'cls_attention', 'mincut', 'greedy_cosine']

print(list_segmenter_presets())
# {'theta_oscillator': ThetaOscillatorSegmenter, 'sylber': SylberSegmenter, ...}

audio, sr = load_audio("audio.wav")
segmenter = get_segmenter("mincut")
syllables = segmenter.segment(audio, sr=sr)
```

---

### Feature Extraction (`findsylls.features`)

Feature extractors implement `FeatureExtractor.extract(audio, sr) → np.ndarray` (shape: `[T, D]`).

```python
from findsylls.audio.utils import load_audio
from findsylls.features import MFCCExtractor, MelSpectrogramExtractor, HuBERTExtractor
from findsylls.plotting import plot_multiple_feature_matrices
import numpy as np

audio, sr = load_audio("audio.wav")

mfcc    = MFCCExtractor(n_mfcc=13)
melspec = MelSpectrogramExtractor(n_mels=64)

mfcc_feat = mfcc.extract(audio, sr)
mel_feat  = melspec.extract(audio, sr)

feature_results = {
    "MFCC (13 coeffs)":        (mfcc_feat,  np.linspace(0, len(audio)/sr, mfcc_feat.shape[0])),
    "Mel Spectrogram (64 bins)": (mel_feat, np.linspace(0, len(audio)/sr, mel_feat.shape[0])),
}

fig = plot_multiple_feature_matrices(audio, sr, feature_results)
```

![Feature matrix comparison](docs/images/feature_matrices.png)

*MFCC and mel spectrogram feature matrices for the same utterance. Color encodes feature value; brighter = higher activation.*

**Available extractors:** `mfcc`, `melspectrogram`, `hubert`, `sylber`, `vghubert`

```python
from findsylls.audio.utils import load_audio
from findsylls.features import get_extractor

audio, sr = load_audio("audio.wav")

extractor = get_extractor("hubert")          # vanilla HuBERT base (layer 9)
features  = extractor.extract(audio, sr)     # shape: [T, 768]
```

---

### Embedding (`findsylls.embedding`)

Embedding wraps feature extraction + segmentation + pooling into a single call.

#### Single file

```python
from findsylls import embed_audio

embeddings, metadata = embed_audio(
    "audio.wav",
    segmentation="peakdetect",
    features="mfcc",
    pooling="mean",                          # mean | max | median | onc
    segmentation_kwargs={"envelope_method": "hilbert"},
    return_metadata=True,
)

print(embeddings.shape)                      # (n_syllables, 13)
print(metadata["num_syllables"])
print(metadata["boundaries"])                # [(start, end), ...]
```
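
Pooling is the step that turns a `[T, D]` frame matrix into one vector per syllable. The sketch below shows mean pooling over the frames that fall inside each span. It is an illustration only: `mean_pool`, the half-open `[start, end)` convention, and the zero-vector fallback for empty spans are assumptions, not documented findsylls behaviour.

```python
def mean_pool(features, frame_times, boundaries):
    """Average the frames whose timestamp falls inside each [start, end) span.

    features:    list of T frames, each a list of D floats
    frame_times: list of T frame timestamps in seconds
    boundaries:  list of (start_s, end_s) spans
    """
    dim = len(features[0]) if features else 0
    pooled = []
    for start, end in boundaries:
        rows = [f for f, t in zip(features, frame_times) if start <= t < end]
        if rows:
            # zip(*rows) iterates over feature dimensions (columns).
            pooled.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            pooled.append([0.0] * dim)  # span shorter than one frame hop
    return pooled

features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
frame_times = [0.0, 0.1, 0.2, 0.3]
boundaries = [(0.0, 0.2), (0.2, 0.4)]
pooled = mean_pool(features, frame_times, boundaries)
print(pooled)
```

The output has one row per span and the same width `D` as the input frames, which matches the `(n_syllables, 13)` shape printed above for 13 MFCCs.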

#### Corpus

```python
from findsylls import embed_corpus, save_embeddings

results = embed_corpus(
    audio_files=["a.wav", "b.wav", "c.wav"],
    segmentation="peakdetect",
    features="mfcc",
    pooling="mean",
    segmentation_kwargs={"envelope_method": "hilbert"},
    n_jobs=4,
)

save_embeddings(results, "embeddings.npz")
```

#### Storage-backed corpus (large datasets)

For datasets that don't fit in RAM, write embeddings directly to disk:

```python
from findsylls.embedding import embed_corpus_to_storage

bundle = embed_corpus_to_storage(
    audio_files=["a.wav", "b.wav", ...],
    output_dir="./embeddings",
    segmentation="peakdetect",
    features="mfcc",
    pooling="mean",
    segmentation_kwargs={"envelope_method": "hilbert"},
)

print(f"Embedded {bundle['num_success']}/{bundle['num_files']} files")
# Writes: ./embeddings/embedding_manifest.csv + ./embeddings/000000_*.npz
```

#### Preset-based embedding

```python
from findsylls.embedding import EmbeddingPipeline

pipeline = EmbeddingPipeline(preset="sylber", pooling="mean")
embeddings, metadata = pipeline.embed_audio("audio.wav", return_metadata=True)
```

**Available pooling methods:** `mean`, `max`, `median`, `onc`

---

### Discovery (`findsylls.discovery`)

Discovery clusters syllable embeddings into unsupervised categories.

```python
from findsylls import embed_corpus
from findsylls.discovery import DiscoveryPipeline
import numpy as np

# Embed a corpus
results = embed_corpus(audio_files=["a.wav", "b.wav", "c.wav"],
                       segmentation="peakdetect", features="mfcc", pooling="mean",
                       segmentation_kwargs={"envelope_method": "hilbert"})
embeddings = np.vstack([r["embeddings"] for r in results if r.get("success")])

# Cluster
pipeline = DiscoveryPipeline(method="kmeans", model_kwargs={"n_clusters": 50})
result   = pipeline.discover(embeddings)

print(result.labels)                          # cluster assignment per syllable
print(result.fit_metrics["silhouette"])
print(result.fit_metrics["davies_bouldin"])
```

#### Streaming clustering (corpus too large for RAM)

```python
from findsylls.embedding import embed_corpus_to_storage
from findsylls.discovery import DiscoveryPipeline

bundle = embed_corpus_to_storage(audio_files=[...], output_dir="./embeddings",
                                  segmentation="peakdetect", features="mfcc", pooling="mean",
                                  segmentation_kwargs={"envelope_method": "hilbert"})

pipeline = DiscoveryPipeline(method="minibatch_kmeans", model_kwargs={"n_clusters": 50})
result   = pipeline.discover_from_storage(manifest_path=bundle["manifest_path"])
```

**Memory comparison:**

| Approach | Peak RAM (~500K syllables × 768-D) |
|---|---|
| `embed_corpus` + `vstack` + `KMeans` | ~10 GB RAM |
| `embed_corpus_to_storage` + `discover_from_storage` | ~500 MB RAM |
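
As a rough sanity check on these figures, the raw size of the embedding matrix follows directly from its shape and dtype. The arithmetic below is a back-of-the-envelope sketch; `matrix_gib` is a hypothetical helper, and the 10,000-row batch size is an assumed value, not a findsylls default.

```python
def matrix_gib(n_rows, n_dims, bytes_per_value):
    """Size of a dense n_rows x n_dims matrix in GiB."""
    return n_rows * n_dims * bytes_per_value / 2**30

n, d = 500_000, 768
full_f32 = matrix_gib(n, d, 4)       # float32
full_f64 = matrix_gib(n, d, 8)       # float64
batch_f32 = matrix_gib(10_000, d, 4) # one assumed mini-batch of 10,000 rows

print(f"full matrix, float32: {full_f32:.2f} GiB")
print(f"full matrix, float64: {full_f64:.2f} GiB")
print(f"one batch,   float32: {batch_f32 * 1024:.1f} MiB")
```

The full matrix alone is about 1.43 GiB in float32 and 2.86 GiB in float64. The in-memory figure in the table is higher than that, plausibly because the per-file arrays, the stacked copy, and the clustering work buffers coexist, so the table is best read as an order-of-magnitude guide.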

**Available methods:** `kmeans`, `minibatch_kmeans`, `agglomerative`

---

### Full Corpus Workflow (`findsylls.pipeline`)

`FindSyllsOrchestrator` and `discover_corpus` run the entire pipeline — embed, discover, build manifests — in one call:

```python
from findsylls import discover_corpus

result = discover_corpus(
    audio_files="data/**/*.wav",
    output_dir="./output",
    segmentation_method="peakdetect",
    features_method="mfcc",
    pooling_method="mean",
    discovery_method="kmeans",
    segmentation_kwargs={"envelope_method": "hilbert"},
)

print(result["corpus_manifest"])             # joined DataFrame
print(result["discovery_manifest_path"])
print(result["discovery_metrics"])
```

Or use the class directly:

```python
from findsylls.pipeline.orchestrator import FindSyllsOrchestrator

orch = FindSyllsOrchestrator()

# Single file: segment + embed
embeddings, metadata = orch.segment_and_embed_audio(
    "audio.wav",
    segmentation_method="peakdetect",
    features_method="mfcc",
    pooling_method="mean",
    segmentation_kwargs={"envelope_method": "hilbert"},
)
```

---

### Evaluation (`findsylls.evaluation`)

#### Evaluate segmentation against TextGrid annotations

```python
from findsylls import segment_audio, evaluate_segmentation

syllables, _, _ = segment_audio(
    "audio.wav",
    method="peakdetect",
    segmentation_kwargs={"envelope_method": "hilbert"},
    return_envelope=True,   # returns (syllables, envelope, times), as in Quick Start
)

peaks = [nucleus for _, nucleus, _ in syllables]
spans = [(start, end) for start, _, end in syllables]

metrics = evaluate_segmentation(
    peaks=peaks,
    spans=spans,
    textgrid_path="annotations.TextGrid",
    tiers={"phone": 2, "syllable": 1, "word": 0},
)

# Keys: nuclei, syllable_boundaries, syllable_spans, word_boundaries, word_spans
print(metrics["syllable_boundaries"])
# {'TP': 12, 'Ins': 2, 'Del': 1, 'Sub': 0, 'Precision': ..., 'Recall': ..., 'F1': ...}
```
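
For intuition about what a boundary score measures, the sketch below matches predicted boundaries to reference boundaries within a time tolerance and derives precision, recall, and F1. It is a simplified illustration, not the scoring code in `evaluate_segmentation`: `boundary_prf` is hypothetical, the 50 ms tolerance is an assumed value, and the `Ins`/`Del`/`Sub` breakdown is not modelled.

```python
def boundary_prf(predicted, reference, tolerance=0.05):
    """Greedy one-to-one matching of boundaries within +/- tolerance seconds."""
    unmatched = sorted(reference)
    tp = 0
    for b in sorted(predicted):
        # Take the earliest still-unmatched reference boundary within tolerance.
        hit = next((r for r in unmatched if abs(r - b) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)  # each reference boundary can match only once
            tp += 1
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"TP": tp, "Precision": precision, "Recall": recall, "F1": f1}

scores = boundary_prf(
    predicted=[0.10, 0.52, 0.90],
    reference=[0.12, 0.50, 0.70, 1.20],
)
print(scores)
```

Here two of the three predictions land within 50 ms of a reference boundary, so precision is 2/3 and recall is 2/4.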

#### Batch evaluation over a corpus

```python
from findsylls import run_evaluation

df = run_evaluation(
    textgrid_paths="data/**/*.TextGrid",
    wav_paths="data/**/*.wav",
    tiers={"phone": 2, "syllable": 1, "word": 0},
    method="peakdetect",
    segmentation_kwargs={"envelope_method": "hilbert"},
)

print(df.groupby("method")[["syllable_boundaries_f1", "word_spans_f1"]].mean())
```
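
Before a batch run it can help to confirm that every audio file has an annotation. The check below pairs files by filename stem; this is an assumed convention for illustration (the README does not state how `run_evaluation` pairs files), and `pair_by_stem` is a hypothetical helper.

```python
from pathlib import Path

def pair_by_stem(wav_paths, textgrid_paths):
    """Pair each wav with the TextGrid sharing its filename stem.

    Returns (pairs, missing): matched (wav, textgrid) tuples and unmatched wavs.
    """
    grids = {Path(p).stem: p for p in textgrid_paths}
    pairs, missing = [], []
    for wav in sorted(wav_paths):
        stem = Path(wav).stem
        if stem in grids:
            pairs.append((wav, grids[stem]))
        else:
            missing.append(wav)
    return pairs, missing

pairs, missing = pair_by_stem(
    ["data/a.wav", "data/b.wav"],
    ["data/a.TextGrid"],
)
print(pairs)
print("no annotation for:", missing)
```

In practice the two input lists could come from `glob.glob("data/**/*.wav", recursive=True)` and the equivalent TextGrid pattern.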

#### Discovery label metrics

Connect cluster assignments to ground-truth TextGrid labels:

```python
from findsylls.evaluation import (
    attach_textgrid_labels_to_manifest,
    compute_discovery_label_metrics,
)

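# corpus_manifest and file_manifest_df are assumed to be DataFrames already in
# scope (for example, corpus_manifest = result["corpus_manifest"] from
# discover_corpus above).
labeled = attach_textgrid_labels_to_manifest(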
    manifest=corpus_manifest,
    file_manifest=file_manifest_df,
    wav_paths=["a.wav", "b.wav"],
    textgrid_paths=["a.TextGrid", "b.TextGrid"],
    textgrid_tier_index=2,                       # phone tier
)

metrics = compute_discovery_label_metrics(labeled)
print(f"Cluster purity:  {metrics['cluster_purity']:.3f}")
print(f"Label purity:    {metrics['label_purity']:.3f}")
print(f"Normalized MI:   {metrics['label_norm_mutual_info']:.3f}")
print(f"Macro F1:        {metrics['macro_f1']:.3f}")
```
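
As a reference point for reading these numbers, the textbook definition of cluster purity is the fraction of items that carry the majority label of their cluster. The sketch below implements that definition; `cluster_purity` here is a hypothetical stand-alone function, and findsylls may compute its `cluster_purity` and `label_purity` values differently.

```python
from collections import Counter, defaultdict

def cluster_purity(cluster_ids, labels):
    """Fraction of items whose label is the majority label of their cluster."""
    by_cluster = defaultdict(Counter)
    for c, y in zip(cluster_ids, labels):
        by_cluster[c][y] += 1
    # Sum the size of the largest label group within each cluster.
    majority = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority / len(labels) if labels else 0.0

cluster_ids = [0, 0, 0, 1, 1, 2]
labels = ["ba", "ba", "da", "da", "da", "ga"]
purity = cluster_purity(cluster_ids, labels)
print(f"{purity:.3f}")
```

Swapping the two arguments gives the converse quantity: the fraction of items that fall in the most common cluster for their label.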

#### Visualize evaluation results

```python
from findsylls import plot_segmentation_result

# df = output of run_evaluation(), file_id = stem of the audio file
fig, ax = plot_segmentation_result(
    df,
    file_id="SP20_117",
    envelope_fn="sbs",
    syll_tier=1,
    phone_tier=2,
    word_tier=0,
)
```

---

### Preset System (`findsylls.presets`)

Named presets bundle segmentation + feature + pooling configurations from published papers:

```python
from findsylls import get_preset, resolve_preset, list_presets

print(list_presets())
# ['sylber', 'vg_hubert_cls', 'vg_hubert_mincut']

cfg = get_preset("sylber")
# {'segmentation': 'greedy_cosine', 'features': 'sylber', 'pooling': 'mean', ...}

# Merge a preset with user overrides
cfg = resolve_preset("sylber", pooling="onc")

# Use directly with EmbeddingPipeline
from findsylls.embedding import EmbeddingPipeline
pipeline = EmbeddingPipeline(preset="sylber", pooling="mean")
```

---

## CLI

```bash
# Segment audio into syllable boundaries
findsylls segment audio.wav --envelope hilbert --method peakdetect --out syllables.json

# Batch evaluation against TextGrid annotations
findsylls evaluate "data/**/*.wav" "data/**/*.TextGrid" \
  --phone-tier 2 --syllable-tier 1 --word-tier 0 \
  --envelope hilbert --method peakdetect \
  --out results.csv --aggregate summary.csv
```

---

## Methods Reference

### Envelope methods
`rms` · `hilbert` · `lowpass` · `sbs` · `theta` · `cls_attention` · `greedy_cosine` · `mincut`

### Segmentation methods (dispatch strings)
`peakdetect` · `cls_attention` · `mincut` · `greedy_cosine`

### Preset segmenters (paper-replication classes)
`SBSPeakdetectSegmenter` · `ThetaOscillatorSegmenter` · `SylberSegmenter` · `VGHubertMinCutSegmenter` · `VGHubertCLSSegmenter`

### Feature extractors
`mfcc` · `melspectrogram` · `hubert` · `sylber` · `vghubert`

### Pooling methods
`mean` · `max` · `median` · `onc`

### Discovery methods
`kmeans` · `minibatch_kmeans` · `agglomerative`

---

## Preset Citations

Every preset segmenter ships with the full citation for its source paper. Access it programmatically without loading any model:

```python
from findsylls.segmentation.presets import list_segmenter_presets

for name, cls in list_segmenter_presets().items():
    print(f"[{name}]")
    print(cls.REFERENCE)
    print()
```

Or on an instance (useful when you already have the object):

```python
from findsylls.segmentation.presets import ThetaOscillatorSegmenter

seg = ThetaOscillatorSegmenter()
seg.cite()
```

---

**Theta Oscillator** — Räsänen, Doyle & Frank (2018)

> Räsänen, O., Doyle, G., & Frank, M. C. (2018). "Pre-linguistic segmentation of speech into syllable-like units." *Cognition*, 171, 130–150. https://doi.org/10.1016/j.cognition.2017.11.003
>
> MATLAB implementation: https://github.com/orasanen/thetaOscillator

**Sylber** — Cho et al. (2025)

> Cho, C. J., Lee, N., Gupta, A., Agarwal, D., Chen, E., Black, A. W., & Anumanchipalli, G. K. (2025). "Sylber: Syllabic Embedding Representation of Speech from Raw Audio." *ICLR 2025*. https://arxiv.org/abs/2410.07168

**VG-HuBERT MinCut** — Peng et al. (2023)

> Peng, P., Li, S.-W., Räsänen, O., Mohamed, A., & Harwath, D. (2023). "Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model." *Interspeech 2023*. https://doi.org/10.21437/Interspeech.2023-1430
>
> Code: https://github.com/jasonppy/syllable-discovery

**VG-HuBERT CLS Attention** — Peng & Harwath (2022)

> Peng, P., & Harwath, D. (2022). "Word Discovery in Visually Grounded, Self-Supervised Speech Models." *Interspeech 2022*. https://doi.org/10.21437/Interspeech.2022-10631
>
> Code: https://github.com/jasonppy/word-discovery

---

## Citation

```bibtex
@misc{martinez2026findsyllslanguageagnostictoolkitsyllablelevel,
  title={findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding},
  author={Héctor Javier Vázquez Martínez},
  year={2026},
  eprint={2603.26292},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.26292},
}
```

## License

MIT. See [LICENSE](LICENSE).
