Metadata-Version: 2.4
Name: vggsounder
Version: 0.1.1
Summary: A Python package for accessing VGGSounder dataset labels and metadata
Project-URL: Homepage, https://vggsounder.github.io/
Project-URL: Repository, https://github.com/Bizilizi/VGGSounder
Project-URL: Issues, https://github.com/Bizilizi/VGGSounder/issues
Author-email: Daniil Zverev <daniil.zverev@tum.de>
License-File: LICENCE
Keywords: audio,classification,dataset,machine-learning,vgg,video
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Multimedia :: Video :: Display
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Provides-Extra: dev
Requires-Dist: crowd-kit>=1.4.1; extra == 'dev'
Requires-Dist: jupyter>=1.1.1; extra == 'dev'
Requires-Dist: matplotlib>=3.10.5; extra == 'dev'
Requires-Dist: numpy>=2.3.2; extra == 'dev'
Requires-Dist: pandas>=2.3.1; extra == 'dev'
Requires-Dist: scikit-learn>=1.7.1; extra == 'dev'
Requires-Dist: tqdm>=4.67.1; extra == 'dev'
Description-Content-Type: text/markdown

<h1 align="center"><a href="https://vggsounder.github.io/static/workshop_paper.pdf">
VGGSounder: Audio-Visual Evaluations for Foundation Models</a></h1>
<h5 align="center"> If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏</h5>


<h5 align="center">

[![Project page](https://img.shields.io/badge/Project_page-https-blue)](https://vggsounder.github.io) 
<br>

[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/DAMO-NLP-SG/VideoLLaMA3/blob/main/LICENSE) 
![Badge](https://hitscounter.dev/api/hit?url=https%3A%2F%2Fgithub.com%2FBizilizi%2Fvggsounder&label=HITs&icon=fire&color=%23198754)
[![GitHub issues](https://img.shields.io/github/issues/Bizilizi/vggsounder?color=critical&label=Issues)](https://github.com/Bizilizi/vggsounder/issues?q=is%3Aopen+is%3Aissue)
[![GitHub closed issues](https://img.shields.io/github/issues-closed/Bizilizi/vggsounder?color=success&label=Issues)](https://github.com/Bizilizi/vggsounder/issues?q=is%3Aissue+is%3Aclosed)
</h5>

## 📰 News

* **[11.06.2025]**  📃 Released the VGGSounder technical report, which details how we built the first multimodal benchmark for video tagging with complete per-modality annotations for every class.


## 🌟 Introduction
**VGGSounder** is a re-annotated benchmark built upon the [VGGSound dataset](https://www.robots.ox.ac.uk/~vgg/data/vggsound/), designed to rigorously evaluate audio-visual foundation models and understand how they utilize modalities. VGGSounder introduces:

- 🔍 Per-label modality tags (audible / visible / both) for all classes in the sample
- 🎵 Meta labels for background music, voice-over, and static images
- 📊 Multiple classes per sample


## 🚀 Installation

The VGGSounder dataset is now available as a Python package! Install it via pip:

```bash
pip install vggsounder
```

Or install from source using uv:

```bash
git clone https://github.com/bizilizi/vggsounder.git
cd vggsounder
uv build
pip install dist/vggsounder-*.whl
```

## 🐍 Python Package Usage

### Quick Start

```python
import vggsounder

# Load the dataset
labels = vggsounder.VGGSounder()

# Access video data by ID
video_data = labels["--U7joUcTCo_000000"]
print(video_data.labels)        # List of labels for this video
print(video_data.meta_labels)   # Metadata (background_music, static_image, voice_over)
print(video_data.modalities)    # Modality for each label (A, V, AV)

# Get dataset statistics
stats = labels.stats()
print(f"Total videos: {stats['total_videos']}")
print(f"Unique labels: {stats['unique_labels']}")

# Search functionality
piano_videos = labels.get_videos_with_labels("playing piano")
voice_over_videos = labels.get_videos_with_meta(voice_over=True)
```

### Advanced Usage

```python
# Dict-like interface
print(len(labels))                    # Number of videos
print("video_id" in labels)           # Check if video exists
for video_id in labels:               # Iterate over video IDs
    video_data = labels[video_id]

# Get all unique labels
all_labels = labels.get_all_labels()

# Complex queries
static_speech_videos = labels.get_videos_with_meta(
    static_image=True, voice_over=True
)
```
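
As a rough sketch of how the dict-like interface could be combined with the per-label modality tags, the helper below tallies how often each modality occurs. It assumes that iterating the dataset yields video IDs and that `modalities` is a sequence of `A` / `V` / `AV` strings, one per label, as the Quick Start comments suggest. `count_modalities` is a hypothetical name, not part of the package, and a stub stands in for the dataset so the snippet runs without the package installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_modalities(dataset):
    """Tally modality tags (A, V, AV) over every label in a dict-like dataset.

    Assumption: iterating `dataset` yields video IDs, and
    `dataset[video_id].modalities` is a sequence of modality strings.
    Hypothetical helper, not part of the vggsounder package.
    """
    counts = Counter()
    for video_id in dataset:
        counts.update(dataset[video_id].modalities)
    return counts


# Stub standing in for vggsounder.VGGSounder(); values mirror the README example rows.
stub = {
    "---g-f_I2yQ_000001": SimpleNamespace(
        labels=["male singing", "people crowd", "playing timpani"],
        modalities=["A", "AV", "A"],
    ),
}
modality_counts = count_modalities(stub)
print(dict(modality_counts))
```

If those assumptions hold for the real package, passing `vggsounder.VGGSounder()` instead of the stub should work the same way.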

## 🏷️ Label Format

VGGSounder annotations are stored in a CSV file located at `data/vggsounder.csv`. Each row corresponds to a single label for a specific video sample. The dataset supports **multi-label**, **multi-modal** classification with additional **meta-information** for robust evaluation.

### Columns

- **`video_id`**: Unique identifier for a 10-second video clip.
- **`label`**: Human-readable label representing a sound or visual category (e.g. `male singing`, `playing timpani`).
- **`modality`**: The modality in which the label is perceivable:
  - `A` = Audible
  - `V` = Visible
  - `AV` = Both audible and visible
- **`background_music`**: `True` if the video contains background music.
- **`static_image`**: `True` if the video consists of a static image.
- **`voice_over`**: `True` if the video contains voice-over narration.

### Example

| video_id           | label             | modality | background_music | static_image | voice_over |
|--------------------|------------------|----------|------------------|--------------|------------|
| `---g-f_I2yQ_000001` | `male singing`     | A        | True             | False        | False      |
| `---g-f_I2yQ_000001` | `people crowd`     | AV       | True             | False        | False      |
| `---g-f_I2yQ_000001` | `playing timpani`  | A        | True             | False        | False      |
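
To illustrate the one-row-per-label layout, the sketch below parses rows in this format with the standard `csv` module and groups them by `video_id`. The inline CSV reproduces the example rows above. The header order and the literal `True` / `False` spelling are assumptions about the file, and `group_rows` is a hypothetical helper rather than part of the package.

```python
import csv
import io
from collections import defaultdict

# Inline sample reproducing the example rows; the real file layout is assumed to match.
SAMPLE_CSV = """video_id,label,modality,background_music,static_image,voice_over
---g-f_I2yQ_000001,male singing,A,True,False,False
---g-f_I2yQ_000001,people crowd,AV,True,False,False
---g-f_I2yQ_000001,playing timpani,A,True,False,False
"""

META_COLUMNS = ("background_music", "static_image", "voice_over")


def group_rows(csv_file):
    """Group one-row-per-label CSV data into one record per video."""
    videos = defaultdict(lambda: {"labels": [], "modalities": [], "meta": {}})
    for row in csv.DictReader(csv_file):
        record = videos[row["video_id"]]
        record["labels"].append(row["label"])
        record["modalities"].append(row["modality"])
        # In the example rows the meta flags repeat on every row of a video,
        # so the last row seen simply overwrites identical values.
        for column in META_COLUMNS:
            record["meta"][column] = row[column] == "True"
    return dict(videos)


grouped = group_rows(io.StringIO(SAMPLE_CSV))
sample = grouped["---g-f_I2yQ_000001"]
print(sample["labels"], sample["modalities"], sample["meta"])
```

For the actual file, the same function could be called on `open("data/vggsounder.csv", newline="")`, provided the header matches the columns listed above.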

## 📦 Publishing to PyPI

To publish this package to PyPI:

1. **Prepare your environment:**
   ```bash
   # Install uv if you haven't already
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. **Build the package:**
   ```bash
   uv build
   ```

3. **Set up PyPI credentials:**
   - Create a PyPI account at https://pypi.org
   - Generate an API token in your PyPI account settings
   - Set the token: `export UV_PUBLISH_TOKEN=your_pypi_token`

4. **Publish to PyPI:**
   ```bash
   # Test on Test PyPI first (recommended)
   uv publish --publish-url https://test.pypi.org/legacy/
   
   # Then publish to main PyPI
   uv publish
   ```

For more details, see the [uv publishing guide](https://docs.astral.sh/uv/guides/package/).


## 📑 Citation

If you find VGGSounder useful for your research and applications, please consider citing us using this BibTeX:

```bibtex
@article{zverevwiedemer2025vggsounder,
  author    = {Daniil Zverev and Thaddäus Wiedemer and Ameya Prabhu and Matthias Bethge and Wieland Brendel and A. Sophia Koepke},
  title     = {VGGSounder: Audio-Visual Evaluations for Foundation Models},
  year      = {2025},
}
```

## ❤️ Acknowledgement
The authors would like to thank [Felix Förster](https://www.linkedin.com/in/felix-f%C3%B6rster-316010235/?trk=public_profile_browsemap&originalSubdomain=de), [Sayak Mallick](https://scholar.google.fr/citations?user=L_0KSXUAAAAJ&hl=en), and [Prasanna Mayilvahanan](https://scholar.google.fr/citations?user=3xq1YcYAAAAJ&hl=en) for their help with data annotation, as well as [Thomas Klein](https://scholar.google.de/citations?user=3WfC0yMAAAAJ&hl=en) and [Shyamgopal Karthik](https://scholar.google.co.in/citations?user=OiVCfscAAAAJ&hl=en) for their help in setting up MTurk. They also thank the numerous MTurk workers for labelling. This work was supported in part by the [BMBF](https://www.bmbf.de/DE/Home/home_node.html) (FKZ: 01IS24060, 01I524085B), the [DFG](https://www.dfg.de/) (SFB 1233, TP A1, project number: 276693517), and the [Open Philanthropy Foundation](https://www.openphilanthropy.org/) funded by the [Good Ventures Foundation](https://www.goodventures.org/). The authors thank the IMPRS-IS for supporting TW.


## 👮 License

This project is released under the Apache 2.0 license as found in the LICENSE file. Please get in touch with us if you find any potential violations. 