Metadata-Version: 2.4
Name: av-analysis-toolbox
Version: 0.1.0
Summary: Installable audio, video, and audio-visual analysis toolbox
Author-email: Peng <yanp@zhaw.ch>
Maintainer-email: Peng <yanp@zhaw.ch>
License-Expression: MIT
Project-URL: Homepage, https://github.com/yanpeng0520/av-toolbox
Project-URL: Repository, https://github.com/yanpeng0520/av-toolbox
Project-URL: Documentation, https://github.com/yanpeng0520/av-toolbox/tree/main/docs
Project-URL: Issues, https://github.com/yanpeng0520/av-toolbox/issues
Project-URL: Changelog, https://github.com/yanpeng0520/av-toolbox/releases
Keywords: audio,video,audio-visual,analysis,computer-vision,ffmpeg,denseav
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Provides-Extra: video
Requires-Dist: numpy>=1.24; extra == "video"
Requires-Dist: opencv-python-headless>=4.8; extra == "video"
Provides-Extra: audio
Requires-Dist: librosa>=0.10; extra == "audio"
Requires-Dist: soundfile>=0.12; extra == "audio"
Provides-Extra: transcription
Requires-Dist: faster-whisper>=1.0; extra == "transcription"
Provides-Extra: pose
Requires-Dist: mediapipe>=0.10; extra == "pose"
Provides-Extra: action
Requires-Dist: torch>=2.0; extra == "action"
Requires-Dist: torchvision>=0.15; extra == "action"
Requires-Dist: pytorchvideo>=0.1; extra == "action"
Requires-Dist: fvcore>=0.1.5; extra == "action"
Requires-Dist: iopath>=0.1.10; extra == "action"
Provides-Extra: vision-models
Requires-Dist: ultralytics>=8.0; extra == "vision-models"
Requires-Dist: pillow>=9.0; extra == "vision-models"
Requires-Dist: torch>=2.0; extra == "vision-models"
Requires-Dist: torchvision>=0.15; extra == "vision-models"
Requires-Dist: transformers>=4.40; extra == "vision-models"
Provides-Extra: cut-detection
Requires-Dist: transnetv2-pytorch>=1.0; extra == "cut-detection"
Requires-Dist: scenedetect>=0.6; extra == "cut-detection"
Provides-Extra: av
Requires-Dist: av>=12.0; extra == "av"
Provides-Extra: torch
Requires-Dist: torch>=2.0; extra == "torch"
Requires-Dist: torchvision>=0.15; extra == "torch"
Requires-Dist: torchaudio>=2.0; extra == "torch"
Provides-Extra: denseav
Requires-Dist: av>=12.0; extra == "denseav"
Requires-Dist: pillow>=9.0; extra == "denseav"
Requires-Dist: torch>=2.0; extra == "denseav"
Requires-Dist: torchvision>=0.15; extra == "denseav"
Requires-Dist: torchaudio>=2.0; extra == "denseav"
Provides-Extra: web
Requires-Dist: streamlit>=1.35; extra == "web"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: tomli>=2.0; python_version < "3.11" and extra == "dev"
Provides-Extra: all
Requires-Dist: numpy>=1.24; extra == "all"
Requires-Dist: opencv-python-headless>=4.8; extra == "all"
Requires-Dist: librosa>=0.10; extra == "all"
Requires-Dist: soundfile>=0.12; extra == "all"
Requires-Dist: faster-whisper>=1.0; extra == "all"
Requires-Dist: av>=12.0; extra == "all"
Requires-Dist: pillow>=9.0; extra == "all"
Requires-Dist: torch>=2.0; extra == "all"
Requires-Dist: torchvision>=0.15; extra == "all"
Requires-Dist: torchaudio>=2.0; extra == "all"
Requires-Dist: pytorchvideo>=0.1; extra == "all"
Requires-Dist: fvcore>=0.1.5; extra == "all"
Requires-Dist: iopath>=0.1.10; extra == "all"
Requires-Dist: ultralytics>=8.0; extra == "all"
Requires-Dist: transformers>=4.40; extra == "all"
Requires-Dist: mediapipe>=0.10; extra == "all"
Requires-Dist: transnetv2-pytorch>=1.0; extra == "all"
Requires-Dist: scenedetect>=0.6; extra == "all"
Requires-Dist: streamlit>=1.35; extra == "all"
Requires-Dist: pytest>=7.0; extra == "all"
Requires-Dist: ruff>=0.1; extra == "all"
Dynamic: license-file

# av-toolbox

[![PyPI version](https://badge.fury.io/py/av-analysis-toolbox.svg)](https://pypi.org/project/av-analysis-toolbox/)
[![Python Versions](https://img.shields.io/pypi/pyversions/av-analysis-toolbox.svg)](https://pypi.org/project/av-analysis-toolbox/)
[![Build Status](https://github.com/yanpeng0520/av-toolbox/actions/workflows/ci.yml/badge.svg)](https://github.com/yanpeng0520/av-toolbox/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Upload a video, get visual/audio/AV diagnostics with overlay videos.

**Live demo:** [demo.yan-peng.com](https://demo.yan-peng.com): choose a Video, Audio, or Audio-Visual tool; use the sample clip or upload a short non-sensitive file; then view or download the overlay MP4, metrics, and artifacts.

`av-toolbox` is an installable audio, video, and audio-visual analysis toolbox with one Python registry, one CLI, and a Streamlit demo UI.

PyPI distribution name: `av-analysis-toolbox` (the import package remains `av_toolbox`, and the CLI remains `av-toolbox`).
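Because the distribution, import, and CLI names differ, a quick post-install check can confirm all three. This is a minimal sketch, not a command shipped with the toolbox: `check_names` is a hypothetical helper that assumes `python3` is the interpreter from your activated environment, and it uses only the standard library (`importlib.metadata`, `importlib.util`) plus `command -v`.

```shell
# Hypothetical helper: confirms the distribution, import package, and CLI names.
check_names() {
  python3 - <<'PY'
from importlib import metadata, util

# Distribution name as published on PyPI.
try:
    print("distribution av-analysis-toolbox:", metadata.version("av-analysis-toolbox"))
except metadata.PackageNotFoundError:
    print("distribution av-analysis-toolbox: not installed")

# Import package name (located without importing it).
print("import av_toolbox:", "found" if util.find_spec("av_toolbox") else "not found")
PY
  # Console script name.
  if command -v av-toolbox >/dev/null 2>&1; then
    echo "cli av-toolbox: $(command -v av-toolbox)"
  else
    echo "cli av-toolbox: not on PATH"
  fi
}

check_names
```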

## Tool Catalog

See [docs/tool-catalog.md](docs/tool-catalog.md) for detailed per-tool
instructions, CLI and Python examples, UI notes, generated config files, input
types, output artifacts, optional dependency extras, and GPU/model requirements.

## Overlay Examples

The overlays below are rendered on demo footage from [this YouTube video](https://www.youtube.com/watch?v=THjXkQLy4wE). All rights to the original footage remain with its creator; it is included here for demonstration only. See [Credits](#credits).

**Video editing**

| [Cut Detection](docs/tool-catalog.md#video-cut-detection) | [Shot Type](docs/tool-catalog.md#video-shot-type) |
| --- | --- |
| ![Cut detection](docs/assets/gallery/video-cut-detection.gif) | ![Shot type](docs/assets/gallery/video-shot-type.gif) |

**Video quality**

| [Image Quality](docs/tool-catalog.md#video-image-quality) | [Camera Shake](docs/tool-catalog.md#video-camera-shake) |
| --- | --- |
| ![Image quality](docs/assets/gallery/video-quality.gif) | ![Camera shake](docs/assets/gallery/video-camera-shake.gif) |

**Motion detection**

| [Motion](docs/tool-catalog.md#video-motion) | [Optical Flow](docs/tool-catalog.md#video-optical-flow) | [Foreground Motion](docs/tool-catalog.md#video-foreground-motion) |
| --- | --- | --- |
| ![Motion](docs/assets/gallery/video-motion.gif) | ![Optical flow](docs/assets/gallery/video-optical-flow.gif) | ![Foreground motion](docs/assets/gallery/video-foreground-motion.gif) |

**Object and action understanding**

| [Object Detection](docs/tool-catalog.md#video-object-detection) | [Segmentation](docs/tool-catalog.md#video-segmentation) | [Action Recognition](docs/tool-catalog.md#video-action-recognition) | [Pose Detection](docs/tool-catalog.md#video-pose) |
| --- | --- | --- | --- |
| ![Object detection](docs/assets/gallery/video-object-detection.gif) | ![Segmentation](docs/assets/gallery/video-segmentation.gif) | ![Action recognition](docs/assets/gallery/video-action-recognition.gif) | ![Pose detection](docs/assets/gallery/video-pose.gif) |

**Audio tools**

| [Beat Detection](docs/tool-catalog.md#audio-beat-detection) | [Audio Energy](docs/tool-catalog.md#audio-energy) | [Audio Events](docs/tool-catalog.md#audio-event-detection) |
| --- | --- | --- |
| ![Beat detection](docs/assets/gallery/audio-beat-detection.gif) | ![Audio energy](docs/assets/gallery/audio-energy.gif) | ![Audio events](docs/assets/gallery/audio-event-detection.gif) |

**Audio-visual foundation model**

| [DenseAV](docs/tool-catalog.md#av-denseav) on CatFu |
| --- |
| ![DenseAV on CatFu](docs/assets/gallery/av-denseav.gif) |

## Happy Path: Local Install And Demo

This path works from a fresh clone without private media, cloud services, or model checkpoints. It requires Python 3.10+ and FFmpeg on `PATH`. The commands below install the local UI plus the lightweight audio and video tools, generate a small synthetic demo clip, run one CLI tool, and start the web UI.

```bash
git clone https://github.com/yanpeng0520/av-toolbox.git
cd av-toolbox
python -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e ".[web,audio,video]"

av-toolbox generate-demo-media --output-dir data_segments --duration 12

av-toolbox video motion \
  data_segments/synthetic_hiphop_60s.mp4 \
  --output outputs/motion_demo \
  --sample-fps 5 \
  --max-seconds 8

av-toolbox serve \
  --host 127.0.0.1 \
  --port 8501 \
  --output-root outputs/web_runs
```

Open `http://127.0.0.1:8501`, choose a tool, use the generated sample or upload a short local clip, and inspect the overlay, transcript/metrics, and downloadable artifacts in Results.
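If the install or the `video motion` step fails, first confirm the two prerequisites this path depends on. This is a minimal sketch: `preflight` is a hypothetical helper (not part of `av-toolbox`) that assumes `python3` is the interpreter you will use to create the virtual environment.

```shell
# Hypothetical helper: checks the happy-path prerequisites
# (Python >= 3.10 and an ffmpeg binary on PATH). Returns non-zero if either is missing.
preflight() {
  local status=0
  if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)' 2>/dev/null; then
    echo "python: $(python3 -c 'import platform; print(platform.python_version())') ok"
  else
    echo "python: 3.10+ not found as python3"
    status=1
  fi
  if command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg: $(command -v ffmpeg)"
  else
    echo "ffmpeg: not found on PATH"
    status=1
  fi
  return "$status"
}

preflight || echo "preflight: fix the items above before installing"
```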

## Optional Model Tools

Install heavier extras only for the tools you plan to run:

```bash
# YOLO object detection/segmentation/pose and shot-type classification
python -m pip install -e ".[vision-models]"

# TransNetV2/PySceneDetect cut detection backends
python -m pip install -e ".[cut-detection]"

# PyTorchVideo action recognition
python -m pip install -e ".[action]"

# faster-whisper transcription
python -m pip install -e ".[transcription]"
```
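To see which extras are already usable in the active environment, a standard-library check is enough. A minimal sketch: `check_extras` is a hypothetical helper, its extra-to-module mapping mirrors the `Requires-Dist` entries in this package's metadata, and the import names (for example `cv2` for `opencv-python-headless`, `PIL` for `pillow`, `transnetv2_pytorch` for `transnetv2-pytorch`) are assumptions rather than values read from the package.

```shell
# Hypothetical helper: reports which optional extras look importable.
# Import names are assumed from each distribution's usual top-level module.
check_extras() {
  python3 - <<'PY'
import importlib.util

EXTRAS = {
    "video": ["numpy", "cv2"],
    "audio": ["librosa", "soundfile"],
    "transcription": ["faster_whisper"],
    "vision-models": ["ultralytics", "PIL", "torch", "torchvision", "transformers"],
    "cut-detection": ["transnetv2_pytorch", "scenedetect"],
    "action": ["torch", "torchvision", "pytorchvideo", "fvcore", "iopath"],
    "web": ["streamlit"],
}

for extra, modules in EXTRAS.items():
    # find_spec locates a top-level module without importing it.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{extra}: {status}")
PY
}

check_extras
```

Because `find_spec` only locates modules, the check is fast but will not catch a package that is present yet fails at import time.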

DenseAV is a separate, heavyweight install that needs the DenseAV package from GitHub and explicit checkpoint setup:

```bash
python -m pip install -e ".[denseav]"
python -m pip install "git+https://github.com/mhamilton723/DenseAV.git"
```

## GPU And Model Cache

Classical tools run on CPU. Model-backed tools can use a GPU when their PyTorch or accelerator stack is installed and the tool supports it.
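To check whether PyTorch can see an accelerator in the environment where you installed the extras, you can query it directly. A minimal sketch: `check_torch_device` is a hypothetical helper that uses only `torch.cuda.is_available()`, `torch.cuda.get_device_name()`, and `torch.backends.mps.is_available()`; it says nothing about whether a particular tool will actually choose the GPU.

```shell
# Hypothetical helper: reports which accelerator PyTorch can see.
# It does not tell you which device a given av-toolbox tool will pick.
check_torch_device() {
  python3 - <<'PY'
import importlib.util

if importlib.util.find_spec("torch") is None:
    print("torch: not installed")
else:
    import torch

    mps = getattr(torch.backends, "mps", None)
    if torch.cuda.is_available():
        print(f"torch: cuda ({torch.cuda.get_device_name(0)})")
    elif mps is not None and mps.is_available():
        print("torch: mps")
    else:
        print("torch: cpu only")
PY
}

check_torch_device
```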

Recommended cache setup:

```bash
export AV_TOOLBOX_CACHE_DIR=/mnt/models/av_toolbox_cache
mkdir -p "$AV_TOOLBOX_CACHE_DIR/weights"
```

You can also pass `--cache-dir` on the CLI or through runtime options. The default cache lives under `~/.cache/av_toolbox/weights`; if that directory is root-owned or otherwise unwritable, set `AV_TOOLBOX_CACHE_DIR` to a writable path before running model-backed tools.
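The unwritable-cache fallback can be scripted so model-backed runs do not start with a cache they cannot write to. A minimal sketch, assuming (as the setup snippet above suggests) that the toolbox reads `AV_TOOLBOX_CACHE_DIR` and keeps weights in its `weights/` subfolder; `ensure_av_cache` and the `${TMPDIR:-/tmp}/av_toolbox_cache` fallback are hypothetical choices, not toolbox defaults. Source the snippet in your shell so the `export` persists.

```shell
# Hypothetical fallback path; pick a persistent location in practice.
AV_CACHE_FALLBACK="${TMPDIR:-/tmp}/av_toolbox_cache"

# True if a directory exists and is writable, or could be created
# (its nearest existing ancestor is a writable directory).
is_creatable() {
  local dir="$1"
  while [ ! -e "$dir" ]; do
    dir="$(dirname "$dir")"
  done
  [ -d "$dir" ] && [ -w "$dir" ]
}

ensure_av_cache() {
  local default_weights="$HOME/.cache/av_toolbox/weights"
  if [ -n "${AV_TOOLBOX_CACHE_DIR:-}" ]; then
    # Respect an explicit setting; just make sure the weights folder exists.
    mkdir -p "$AV_TOOLBOX_CACHE_DIR/weights"
  elif ! is_creatable "$default_weights"; then
    # Default location is root-owned or otherwise unwritable: redirect the cache.
    export AV_TOOLBOX_CACHE_DIR="$AV_CACHE_FALLBACK"
    mkdir -p "$AV_TOOLBOX_CACHE_DIR/weights"
  fi
  echo "cache: ${AV_TOOLBOX_CACHE_DIR:-$default_weights (default)}"
}

ensure_av_cache
```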

DenseAV checkpoints require explicit setup. See [docs/denseav.md](docs/denseav.md).

## Developer Docs

- [Developer README](docs/developer-readme.md): local development, CLI examples, tests, Docker, Python API, and web UI commands.
- [Tool catalog](docs/tool-catalog.md): registered tools, CLI wrappers, inputs, outputs, and runtime controls.
- [DenseAV setup](docs/denseav.md): optional DenseAV dependencies, checkpoint names, cache paths, and GPU flags.

## Contributing

Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for the dev
setup, how to add a tool, the overlay style guide, and PR expectations. To report
a security issue, follow [SECURITY.md](SECURITY.md) (please do not open a public
issue).

## Credits

- Demo/sample footage is sourced from [this YouTube video](https://www.youtube.com/watch?v=THjXkQLy4wE) and used solely to demonstrate the tools' overlays. All rights to the original footage belong to its creator. The `av-toolbox` source code is licensed separately under the [MIT License](LICENSE).
