Metadata-Version: 2.1
Name: tokensynth
Version: 0.0.3
Summary: tokensynth
Author-Email: Kyungsu Kim <kyungsu.kim@snu.ac.kr>
License: MIT
Project-URL: Repository, https://github.com/KyungsuKim42/tokensynth
Requires-Python: >=3.10
Requires-Dist: audiofile
Requires-Dist: appdirs
Requires-Dist: descript-audio-codec
Requires-Dist: huggingface-hub
Requires-Dist: laion-clap==1.1.6
Requires-Dist: librosa
Requires-Dist: numpy
Requires-Dist: pretty-midi
Requires-Dist: pytest
Requires-Dist: setuptools
Requires-Dist: torch<2.6.0,>=2.0.0
Requires-Dist: torchaudio>=2.0.0
Requires-Dist: torchvision>=0.15.0
Requires-Dist: tqdm
Requires-Dist: transformers
Description-Content-Type: text/markdown

# TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument


This is the **official implementation** of "TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument", accepted to **ICASSP 2025** (in press).

## Installation

To install TokenSynth, simply run:

```bash
pip install tokensynth
```

## Quickstart

```python
from tokensynth import TokenSynth, CLAP, DACDecoder
import audiofile
import torch

# Set file paths
ref_audio = "media/reference_audio.wav"
midi = "media/input_midi.mid"

# Initialize models
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
synth = TokenSynth.from_pretrained(aug=True)
clap = CLAP(device=device)
decoder = DACDecoder(device=device)

with torch.no_grad():
    # Extract timbre embeddings from audio and text
    timbre_audio = clap.encode_audio(ref_audio)
    timbre_text = clap.encode_text("warm smooth electronic bass")
    timbre_audio_text = 0.5 * timbre_audio + 0.5 * timbre_text

    # Generate audio tokens
    tokens_audio = synth.synthesize(timbre_audio, midi, top_k=10)
    tokens_text = synth.synthesize(timbre_text, midi, top_p=0.6, guidance_scale=1.6)
    tokens_audio_text = synth.synthesize(timbre_audio_text, midi, top_p=0.6, guidance_scale=1.6)

    # Decode tokens into audio waveforms
    audio_audio = decoder.decode(tokens_audio) 
    audio_text = decoder.decode(tokens_text)
    audio_audio_text = decoder.decode(tokens_audio_text)

# Save audio files
audiofile.write("media/output_audio.wav", audio_audio.cpu().numpy(), 16000)
audiofile.write("media/output_text.wav", audio_text.cpu().numpy(), 16000)
audiofile.write("media/output_audio_text.wav", audio_audio_text.cpu().numpy(), 16000)
```

You can also run `python quickstart.py` from the project root directory.

## Citation

A formal citation (BibTeX) will be available once this work is published.

For now, please cite this repository as:

```markdown
Kyungsu Kim, Junghyun Koo, Sungho Lee, Haesun Joung, Kyogu Lee.
TokenSynth: A Token-Based Neural Synthesizer for Instrument Cloning and Text-to-Instrument.
GitHub repository, 2024. Available at: https://github.com/kyungsukim42/tokensynth
```