Metadata-Version: 2.4
Name: audio_xai_fragility
Version: 0.0.5
Summary: A project about audio models and their fragility
Project-URL: bugs, https://github.com/cncPomper/Audio-XAI-Fragility/issues
Project-URL: changelog, https://github.com/cncPomper/Audio-XAI-Fragility/releases
Project-URL: documentation, https://cncPomper.github.io/Audio-XAI-Fragility/
Project-URL: homepage, https://github.com/cncPomper/Audio-XAI-Fragility
Author-email: Piotr Kitłowski <piotr.kitlowski@gmail.com>
Maintainer-email: Piotr Kitłowski <piotr.kitlowski@gmail.com>
License: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: ipython
Requires-Dist: jupyterlab
Requires-Dist: loguru
Requires-Dist: matplotlib
Requires-Dist: mkdocs
Requires-Dist: notebook
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pip
Requires-Dist: pre-commit
Requires-Dist: pytest
Requires-Dist: python-dotenv
Requires-Dist: rich
Requires-Dist: scikit-learn
Requires-Dist: tensorboard
Requires-Dist: torch
Requires-Dist: torchaudio
Requires-Dist: torchvision
Requires-Dist: tqdm
Requires-Dist: typer
Description-Content-Type: text/markdown

# Reference repo

The main repository is [Audio-XAI](https://github.com/cncPomper/Audio-XAI).

# Milestone

- 02.04.2026: Acquired access to the Athena supercomputer (A100 GPUs) 🌐🎉🎉

[![PyPI version](https://badge.fury.io/py/Audio-XAI-Fragility.svg)](https://badge.fury.io/py/Audio-XAI-Fragility)

A project about audio models and their fragility

* [GitHub](https://github.com/cncPomper/Audio-XAI-Fragility/) | [PyPI](https://pypi.org/project/Audio-XAI-Fragility/) | [Documentation](https://cncPomper.github.io/Audio-XAI-Fragility/)
* Created by [Piotr Kitłowski](https://github.com/cncPomper) | GitHub [@cncPomper](https://github.com/cncPomper) | PyPI [@pkitlo](https://pypi.org/user/pkitlo/)
* MIT License

## Features

* TODO

## Documentation

Documentation is built with [Zensical](https://zensical.org/) and deployed to GitHub Pages.

* **Live site:** https://cncPomper.github.io/Audio-XAI-Fragility/
* **Preview locally:** `just docs-serve` (serves at http://localhost:8000)
* **Build:** `just docs-build`

API documentation is auto-generated from docstrings using [mkdocstrings](https://mkdocstrings.github.io/).

Docs deploy automatically on push to `master` via GitHub Actions. To enable this, go to your repo's Settings > Pages and set the source to **GitHub Actions**.

## Development

To set up for local development:

```bash
# Clone your fork
git clone git@github.com:your_username/Audio-XAI-Fragility.git
cd Audio-XAI-Fragility

# Install in editable mode with live updates
uv tool install --editable .
```

This installs the CLI globally in editable mode, so any changes you make to the source code are immediately available when you run `audio_xai_fragility`.

Run tests:

```bash
uv run pytest
```

Run quality checks (format, lint, type check, test):

```bash
just qa
```

## Author

Audio XAI Fragility was created in 2026 by Piotr Kitłowski.

Built with [Cookiecutter](https://github.com/cookiecutter/cookiecutter) and the [audreyfeldroy/cookiecutter-pypackage](https://github.com/audreyfeldroy/cookiecutter-pypackage) project template.

## 1. General Information and Project Objective
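The main objective of the project is to investigate the perceptual fragility of explanations produced by XAI methods for deep learning models in the audio domain: how much an explanation can change under perturbations that are perceptually negligible and leave the model's prediction unchanged.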
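
One way to make this objective concrete, offered as a sketch rather than the project's final formulation: let $f$ be the classifier, $E$ an explanation method, $x$ an audio input, $\delta$ a perturbation, $D$ a dissimilarity measure between attribution maps, $Q$ a perceptual quality metric, and $\tau$ a quality threshold. All of these symbols are notation introduced here for illustration.

```latex
\max_{\delta}\; D\bigl(E(f, x + \delta),\; E(f, x)\bigr)
\quad \text{subject to} \quad
\arg\max_{k} f_k(x + \delta) = \arg\max_{k} f_k(x),
\qquad
Q(x,\; x + \delta) \ge \tau
```

The first constraint keeps the predicted class unchanged, and the second keeps the perturbed audio perceptually close to the original.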

## 2. Planned scope of experiments

- Datasets: Public datasets such as the Speech Commands Dataset (speech) and Sonics (synthetic/real music) will be used. The original data will be treated as strictly immutable.
- Models: Existing audio recognition architectures will be used and adapted: Audio Spectrogram Transformer, VGGish, Spectra, and ViT.
- XAI methods: The vulnerability of gradient-based methods such as Grad-CAM and Integrated Gradients will be investigated.
- Perceptual constraints: Instead of optimizing attacks against standard distance metrics, perceptual metrics will be used (PESQ and STOI for speech, PEAQ for music).
- Computational resources and training: The project will require hardware acceleration (GPUs with a minimum of 16 GB VRAM). The estimated training and fine-tuning time for the base models is approximately 15 hours, while the main process of optimizing perceptual perturbations (XAI attack) for the entire test set is estimated to take an additional 25–30 hours of computation.
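
As a rough illustration of one of the gradient-based methods named above, the sketch below approximates Integrated Gradients with a midpoint Riemann sum. Everything in it is an assumption made for illustration: `integrated_gradients`, `toy_model`, `toy_model_grad`, and `WEIGHTS` are hypothetical names, and the three-feature sigmoid model only stands in for a real audio classifier, whose gradients would normally come from a framework such as PyTorch.

```python
import math
from typing import Callable, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], list[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 64,
) -> list[float]:
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        # Point on the straight path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        total = [t + g for t, g in zip(total, grad)]
    # Average the gradients along the path, then scale by (input - baseline).
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Toy stand-in for a classifier output: sigmoid of a weighted sum (assumption).
WEIGHTS = [0.8, -1.5, 0.3]


def toy_model(x: Sequence[float]) -> float:
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def toy_model_grad(x: Sequence[float]) -> list[float]:
    # Analytic gradient of sigmoid(w . x) with respect to x.
    p = toy_model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


x = [1.0, 0.5, -2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model_grad, x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum approximately to `toy_model(x) - toy_model(baseline)`.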

## 3. Planned Program Features

- **Classification and Attribution Module**: Loading models and generating explanation maps for them.
- **Perturbation module**: Generating subtle modifications to the audio signal with optimization that preserves high perceptual metrics (e.g., maintaining a PESQ score above 4.0).
- **Deployment and Automation**: Scripted building, testing, and deployment of applications using tools such as just and Python scripts built with typer or argparse.
- **Final deliverables**: The project will include clear documentation, user instructions, and tests relevant to the project’s scope.
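
The constraints that the perturbation module has to respect can be sketched as a simple admissibility check: a perturbed signal is acceptable only if the predicted label is unchanged and a quality score stays above a threshold. This is a minimal sketch under stated assumptions. `snr_db`, `is_admissible`, and `toy_predict` are hypothetical names, and signal-to-noise ratio is used purely as a stand-in for the quality function. SNR is not a perceptual metric; in the project, PESQ, STOI, or PEAQ would take its place.

```python
import math
from typing import Callable, Sequence

Signal = Sequence[float]


def snr_db(clean: Signal, perturbed: Signal) -> float:
    """Signal-to-noise ratio in dB (a placeholder for a perceptual metric)."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((p - s) ** 2 for s, p in zip(clean, perturbed))
    if noise_power == 0.0:
        return math.inf  # identical signals
    if signal_power == 0.0:
        return -math.inf  # silent reference, any noise dominates
    return 10.0 * math.log10(signal_power / noise_power)


def is_admissible(
    clean: Signal,
    perturbed: Signal,
    predict: Callable[[Signal], int],
    quality: Callable[[Signal, Signal], float],
    min_quality: float,
) -> bool:
    """True if the label is preserved and quality stays above the threshold."""
    same_label = predict(clean) == predict(perturbed)
    return same_label and quality(clean, perturbed) >= min_quality


def toy_predict(signal: Signal) -> int:
    # Hypothetical classifier: the label depends only on the first sample's sign.
    return int(signal[0] > 0)


clean = [1.0, -1.0, 1.0, -1.0]
subtle = [1.01, -1.0, 1.0, -1.0]    # small change, label preserved
flipped = [-1.0, -1.0, 1.0, -1.0]   # large change, label flips

subtle_ok = is_admissible(clean, subtle, toy_predict, snr_db, min_quality=40.0)
flipped_ok = is_admissible(clean, flipped, toy_predict, snr_db, min_quality=40.0)
```

Passing the quality function as an argument keeps the check independent of the metric, so a PESQ- or STOI-based scorer could be swapped in without changing `is_admissible`.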

## 4. Planned Technology Stack

The project will implement a robust base structure, automatically generated by tools such as `cookiecutter` or `copier`.
- Environment management: Use of an isolated virtual environment managed by `uv` or `conda`.
- Code cleanliness: Enforced PEP8-compliant coding style with an increased line length limit. Formatting and style checks are handled by an autoformatter (e.g., `black` or `ruff`) and a linter (`ruff`).
- Version control: Rigorous use of a code repository with the `conventional commits` specification implemented.
- Frameworks and AI: Implementation of training logic in dedicated frameworks such as `PyTorch Lightning` in conjunction with `Hugging Face` libraries. Code used for experiments will be continuously exported from `Jupyter Lab` notebooks into structured library code.
- Experiments and configuration: Tracking progress, metrics, and logs using `Weights & Biases` or `TensorBoard`. The configuration of model parameters and experiments will be completely separated from the execution code.
- Documentation: Use of `mkdocs` to write documentation quickly and simply.
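
To illustrate what separating configuration from execution code might look like, here is a minimal sketch using only the standard library. The `AttackConfig` class, its field names and defaults, and the `load_config` helper are all hypothetical. Only the PESQ threshold of 4.0 is taken from the planned features above, and the project may well choose a different configuration tool.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AttackConfig:
    """Hypothetical experiment settings; field names are illustrative only."""

    model_name: str = "ast"
    xai_method: str = "integrated_gradients"
    min_pesq: float = 4.0
    steps: int = 100
    learning_rate: float = 1e-3


def load_config(text: str) -> AttackConfig:
    """Build a config from JSON text, rejecting keys the dataclass lacks."""
    raw = json.loads(text)
    known = {f.name for f in fields(AttackConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    # Keys missing from the JSON fall back to the dataclass defaults.
    return AttackConfig(**raw)


# In practice the text would be read from a file kept outside the source tree.
config = load_config('{"xai_method": "grad_cam", "steps": 250}')
```

Rejecting unknown keys catches typos in configuration files early, and the frozen dataclass prevents the settings from being mutated while an experiment runs.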


## 5. Project schedule
<div align="center">

| **Deadline dates** | **Planned scope of work and progress** |
| :---: | :--- |
| **30.03.2026 - 05.04.2026** | Repository configuration (Cookiecutter, Ruff, uv). Defining the directory structure and ensuring that audio files remain immutable. |
| **06.04.2026 - 12.04.2026** | Connecting W&B/TensorBoard. Training base classifiers using the PyTorch Lightning framework. (Estimated resource requirements: 15 hours of GPU computation) |
| **13.04.2026 - 19.04.2026** | Implementation of explanation-generating (XAI) modules in clean code, after first exporting experiments from notebooks. Writing the first tests. |
| **20.04.2026 - 26.04.2026** | Separating configuration from executable code. Preparing baseline attacks on attribution maps using standard distance metrics. |
| **27.04.2026 - 03.05.2026** | Implementation of PESQ/STOI/PEAQ metric approximations directly into the attack optimization loop (generation of perceptual perturbations). |
| **04.05.2026 - 10.05.2026** | Launch of the main research experiments on a dedicated cluster. (Estimated resource requirements: 25–30 hours of GPU computing for iterative processes). |
| **11.05.2026 - 17.05.2026** | Scripting the execution of the entire experiment using the `just` tool and CLI libraries (e.g., `typer`). Aggregating tables containing the results. |
| **18.05.2026 - 24.05.2026** | Finalization of the work: creating documentation and clear instructions for using the finished system. Organizing the code in accordance with PEP8. Preparation of the paper(?) |

</div>

---

## 6. Bibliography Review

| **Paper**                | **Notes**                                                                 |
|--------------------------|---------------------------------------------------------------------------|
| [Interpretation of neural networks is fragile](https://arxiv.org/pdf/1710.10547) | TODO |
| [Explanations can be manipulated and geometry is to blame](https://arxiv.org/pdf/1906.07983) | TODO |
| [Constructing adversarial examples to investigate the plausibility of explanations in deep audio and image classifiers](https://link.springer.com/article/10.1007/s00521-022-07918-7) | TODO |
| [Perceptual Coding In Python](https://github.com/stephencwelch/Perceptual-Coding-In-Python) | TODO |
| [Evaluating Fake Music Detection Performance Under Audio Augmentations](https://arxiv.org/pdf/2507.10447) | TODO |
