Metadata-Version: 2.1
Name: fast-whisper-diarizer
Version: 0.1.2
Summary: A package for audio transcription and speaker diarization using Whisper and NeMo toolkit
Home-page: https://github.com/salimkt/fast-whisper-diarizer
Author: Salim
Author-email: salimkt25@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: faster-whisper==1.1.0
Requires-Dist: ctranslate2==4.4.0
Requires-Dist: nemo-toolkit==2.1.0rc0
Requires-Dist: torch==2.5.1
Requires-Dist: torchaudio==2.5.1
Requires-Dist: omegaconf==2.3.0
Requires-Dist: nltk==3.9.1
Requires-Dist: wget==3.2
Requires-Dist: deepmultilingualpunctuation==1.0.1
Requires-Dist: demucs==4.0.1
Requires-Dist: numpy==1.26.4

# Fast Whisper Diarizer

Fast Whisper Diarizer is a Python package that transcribes audio and labels who spoke when (speaker diarization). It is built on the Whisper speech-recognition model (via `faster-whisper`) and NVIDIA's NeMo toolkit.

## Installation

To install the package, run:

```sh
pip install fast-whisper-diarizer
```
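To confirm that the install succeeded, you can query the installed distribution. The following is a minimal sketch that uses the standard library's `importlib.metadata` (Python 3.8+). `installed_version` is a hypothetical helper, not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "fast-whisper-diarizer") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is missing.
print(installed_version())
```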

## Usage

The `process_audio` function transcribes an audio recording and labels each speaker (diarization). It accepts the audio either as a file path or as in-memory bytes, as the two calls below show.

### Example

```python
from fast_whisper_diarizer import process_audio

# Example usage with a file path
process_audio(
    audio_data="path/to/audio/file.wav",
    output_directory="path/to/output/directory",
    whisper_model_name="tiny.en",
    separate_vocals=True,
    processing_batch_size=8,
    language_code="en",
    suppress_numeric_tokens=True,
    computation_device="cuda"
)

# Example usage with in-memory bytes data
with open("path/to/audio/file.wav", "rb") as f:
    audio_bytes = f.read()

process_audio(
    audio_data=audio_bytes,
    output_directory="path/to/output/directory",
    whisper_model_name="tiny.en",
    separate_vocals=True,
    processing_batch_size=8,
    language_code="en",
    suppress_numeric_tokens=True,
    computation_device="cuda"
)
```
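The second call reads bytes from a file, but the same argument can also carry audio that never touched disk. The following is a minimal sketch that builds WAV bytes in memory with only the standard library. It assumes that `process_audio` expects the bytes of a complete WAV file, as in the example above, rather than raw PCM samples. `make_wav_bytes` is a hypothetical helper, and the 16 kHz rate is chosen because Whisper models operate on 16 kHz audio.

```python
import io
import math
import struct
import wave


def make_wav_bytes(samples, sample_rate=16000):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        # Clamp each sample to [-1, 1], scale to int16, and pack little-endian.
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wav.writeframes(frames)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()


# One second of a 440 Hz tone, standing in for real captured audio.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
audio_bytes = make_wav_bytes(tone)

# Then pass it exactly as in the bytes example above:
# process_audio(audio_data=audio_bytes, output_directory="path/to/output/directory")
```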

### Parameters
- **audio_data (str or bytes)**: The input audio, either as a file path (`str`) or as in-memory bytes (`bytes`).
- **output_directory (str, optional)**: The directory where output files are saved. If omitted, defaults to the directory containing the input file.
- **whisper_model_name (str, optional)**: The name of the Whisper model used for transcription. Defaults to `"tiny.en"`.
- **separate_vocals (bool, optional)**: Whether to separate the vocals from background music before processing. Defaults to `True`.
- **processing_batch_size (int, optional)**: The batch size used when processing the audio. Defaults to `8`.
- **language_code (str, optional)**: The language code of the speech to transcribe. Defaults to `"en"`.
- **suppress_numeric_tokens (bool, optional)**: Whether to suppress numeric tokens during transcription, so that numbers are written out as words. Defaults to `True`.
- **computation_device (str, optional)**: The device used for computation, either `"cuda"` or `"cpu"`. Defaults to `"cuda"` if a CUDA-capable GPU is available, otherwise `"cpu"`.
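The examples above hard-code `computation_device="cuda"`, which may fail on a machine without a GPU. The following is a minimal sketch that collects the documented defaults, picks the device at runtime with PyTorch's `torch.cuda.is_available()`, and checks the documented argument types before the call. `DOCUMENTED_DEFAULTS`, `default_device`, and `build_kwargs` are hypothetical names, not part of this package, and the checks cover only what the parameter list states.

```python
from typing import Any, Dict, Optional, Union

# Defaults as documented in the parameter list (hypothetical constant).
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "whisper_model_name": "tiny.en",
    "separate_vocals": True,
    "processing_batch_size": 8,
    "language_code": "en",
    "suppress_numeric_tokens": True,
}


def default_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, else "cpu" (also "cpu" if torch is missing)."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_kwargs(
    audio_data: Union[str, bytes],
    output_directory: Optional[str] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Assemble keyword arguments for process_audio and check the documented types."""
    if not isinstance(audio_data, (str, bytes)):
        raise TypeError("audio_data must be a file path (str) or bytes")
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS) - {"computation_device"}
    if unknown:
        raise TypeError(f"unexpected parameters: {sorted(unknown)}")
    kwargs = {**DOCUMENTED_DEFAULTS, "computation_device": default_device(), **overrides}
    if kwargs["computation_device"] not in ("cuda", "cpu"):
        raise ValueError('computation_device must be "cuda" or "cpu"')
    batch = kwargs["processing_batch_size"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(batch, bool) or not isinstance(batch, int) or batch < 1:
        raise ValueError("processing_batch_size must be a positive integer")
    kwargs["audio_data"] = audio_data
    if output_directory is not None:
        kwargs["output_directory"] = output_directory
    return kwargs


# Usage:
# process_audio(**build_kwargs("path/to/audio/file.wav",
#                              output_directory="path/to/output/directory"))
```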

`process_audio` returns `None`. Its results are written as files to `output_directory`.
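Because nothing is returned, the caller has to look in the output directory for the results. The following is a minimal sketch that snapshots the directory before a call and reports which files appeared afterwards. It assumes nothing about the output file names or formats, which this README does not specify. `new_files` is a hypothetical helper, and it does not detect files that already existed and were overwritten.

```python
from pathlib import Path
from typing import Callable, List


def new_files(directory: str, run: Callable[[], object]) -> List[Path]:
    """Run a callable and return the files that appeared under directory during the run."""
    root = Path(directory)
    # Create the directory up front so the "before" snapshot is well defined.
    root.mkdir(parents=True, exist_ok=True)
    before = {p for p in root.rglob("*") if p.is_file()}
    run()
    after = {p for p in root.rglob("*") if p.is_file()}
    return sorted(after - before)


# Usage:
# outputs = new_files(
#     "path/to/output/directory",
#     lambda: process_audio(audio_data="path/to/audio/file.wav",
#                           output_directory="path/to/output/directory"),
# )
# for path in outputs:
#     print(path)
```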
