Metadata-Version: 2.3
Name: sonata-asr
Version: 0.0.7
Summary: SONATA: SOund and Narrative Advanced Transcription Assistant
License: GPL-3.0-or-later
Keywords: audio,transcription,speech,emotion,ASR,whisper
Author: hwk06023
Author-email: hwk06023@github.com
Requires-Python: >=3.8,<4.0
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: hf_xet (>=0.1.0)
Requires-Dist: huggingface_hub (>=0.12.0)
Requires-Dist: librosa (>=0.9.0)
Requires-Dist: matplotlib (>=3.5.0)
Requires-Dist: numpy (>=1.20.0)
Requires-Dist: pydub (>=0.25.1)
Requires-Dist: scikit-learn (>=1.0.0)
Requires-Dist: scipy (>=1.7.0)
Requires-Dist: soundfile (>=0.10.3)
Requires-Dist: torch (>=1.10.0)
Requires-Dist: torchaudio (>=0.10.0)
Requires-Dist: tqdm (>=4.62.0)
Requires-Dist: transformers (>=4.25.1)
Requires-Dist: whisperx (>=3.1.0)
Project-URL: Bug Tracker, https://github.com/hwk06023/SONATA/issues
Project-URL: Documentation, https://github.com/hwk06023/SONATA
Project-URL: Repository, https://github.com/hwk06023/SONATA
Description-Content-Type: text/markdown

# SONATA 🎵🔊

[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version](https://badge.fury.io/py/sonata-asr.svg)](https://badge.fury.io/py/sonata-asr)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![GitHub stars](https://img.shields.io/github/stars/hwk06023/SONATA?style=social)](https://github.com/hwk06023/SONATA/stargazers)

**SOund and Narrative Advanced Transcription Assistant**

SONATA is an advanced Automatic Speech Recognition (ASR) system that captures the symphony of human expression by recognizing and transcribing both verbal content and emotive sounds.

## ✨ Features

- 🎙️ High-accuracy speech-to-text transcription using WhisperX
- 😀 Recognition of 523+ emotive sounds and non-verbal cues
- 🌍 Multi-language support with 10 languages
- 👥 Speaker diarization for multi-speaker transcription
- ⏱️ Rich timestamp information at the word level
- 🔄 Audio preprocessing capabilities

[📚 See detailed features documentation](docs/FEATURES.md)

## 🚀 Installation

Install the package from PyPI:

```bash
pip install sonata-asr
```

Or install from source:

```bash
git clone https://github.com/hwk06023/SONATA.git
cd SONATA
pip install -e .
```

## 📖 Quick Start

### Basic Transcription

```python
from sonata.core.transcriber import IntegratedTranscriber

# Initialize the transcriber
transcriber = IntegratedTranscriber(asr_model="large-v3", device="cpu")

# Transcribe an audio file
result = transcriber.process_audio("path/to/audio.wav", language="en")
print(result["integrated_transcript"]["plain_text"])
```

### CLI Usage

```bash
# Basic usage
sonata-asr path/to/audio.wav

# With speaker diarization
sonata-asr path/to/audio.wav --diarize --hf-token YOUR_HUGGINGFACE_TOKEN
```

[📚 See full usage documentation](docs/USAGE.md)  
[⌨️ See complete CLI documentation](docs/CLI.md)

## 🗣️ Supported Languages

SONATA supports 10 languages including English, Korean, Chinese, Japanese, French, German, Spanish, Italian, Portuguese, and Russian.

[🌐 See languages documentation](docs/LANGUAGES.md)

## 🔊 Audio Event Detection

SONATA can detect over 500 different audio events, from laughter and applause to ambient sounds and music.

[🎵 See audio events documentation](docs/AUDIO_EVENTS.md)

## 🛣️ Roadmap

- 🌐 Enhanced multilingual support
- 🧠 Advanced ASR model diversity
- 😢 Improved emotive detection
- 🔊 Better speaker diarization
- ⚡ Performance optimization

[📋 See full roadmap](docs/ROADMAP.md)

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

[📝 See contribution guidelines](docs/CONTRIBUTING.md)

## 📄 License

This project is licensed under the GNU General Public License v3.0.

## 🙏 Acknowledgements

- [WhisperX](https://github.com/m-bain/whisperX) - Fast speech recognition
- [AudioSet AST](https://github.com/YuanGongND/ast) - Audio event detection
- [PyAnnote Audio](https://github.com/pyannote/pyannote-audio) - Speaker diarization
- [HuggingFace Transformers](https://github.com/huggingface/transformers) - NLP tools
