Metadata-Version: 2.4
Name: gsclipper
Version: 0.5.1
Summary: Clip long gunshot audio samples into single or windowed audio clips
Author-email: Ryan Quinn <ryan.quinn@certusinnovations.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Stonewall-Defense/team-ml-gunshot-extractor
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy==2.4
Requires-Dist: scikit-maad==1.5
Requires-Dist: scipy==1.15
Requires-Dist: torch==2.10
Requires-Dist: AudioMlSpecTools==0.5
Requires-Dist: audio-tensor-plotter==0.4
Dynamic: license-file

# GSClipper: A naive gunshot audio parser to generate stand-alone clips

Clip long gunshot audio samples into single or windowed audio clips.

## Prerequisites

- Python 3.12 or later
- Pip for package installation

## Installation

Install the dependencies into the environment with [pip](https://pypi.org/project/pip/):

```bash
pip install -r requirements.txt
```

## Usage

### Picking an Extractor

This library offers three different methods for identifying candidate gunshots, each developed when the previous one failed on a new use case.

To process somewhat "dirty" audio, the `SpectrumGunshotExtractor` may work well. It has been tested on these datasets to great effect:

- [Cadre Forensics](https://cadreforensics.com/audio/)
- [Gunshot audio dataset](https://www.kaggle.com/datasets/emrahaydemr/gunshot-audio-dataset)
- [Gunshot/Gunfire Audio Dataset](https://doi.org/10.5281/zenodo.7004819)

For various reasons, the `SpectrumGunshotExtractor` did not work well on a dataset derived from the [Free Firearm Sounds Library](https://opengameart.org/content/the-free-firearm-sound-library), or on our own initial field data. The `AmplitudeGunshotExtractor` was developed for those datasets. It picks up about 99.9% of gunshots from the FFS library, but fails on many Cadre Forensics samples.

The `ImpulseGunshotExtractor` was added to address shortcomings of the `AmplitudeGunshotExtractor` on certain live data, particularly:

- Windy days where the audio levels vary widely for the same platform/cartridge at the same mic over time
- Quieter platform/cartridge combinations (e.g., .22 LR or suppressed rounds)

Initial testing has shown the `ImpulseGunshotExtractor` to be the most robust across the widest range of circumstances, but no extraction method will be 100% accurate.

### Preprocessing

A distinctive feature of gunshots relative to other sounds, even ones very loud in the time domain, is that they have significant high-frequency components. Compare the spectra of the same data before and after high-pass filtering:

#### Raw Data

![Raw Data](docs/raw.png)

#### High-Pass Filtered Data

![Filtered Data](docs/filtered.png)

#### Existing Preprocessors

Currently only a high-pass filter (`HPFilter`) is available. The parameters in `examples/compare.py` work well in most cases.
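
To illustrate the idea behind high-pass filtering, here is a minimal sketch using SciPy. It is not the library's `HPFilter` implementation, and it does not reproduce the parameters from `examples/compare.py`. The function name `high_pass`, the 2 kHz cutoff, the filter order, and the synthetic test tones are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import signal


def high_pass(samples, sample_rate, cutoff_hz=2000.0, order=4):
    """Zero-phase Butterworth high-pass filter.

    The cutoff and order are illustrative assumptions, not the
    values used by gsclipper's HPFilter.
    """
    sos = signal.butter(
        order, cutoff_hz, btype="highpass", fs=sample_rate, output="sos"
    )
    # sosfiltfilt runs the filter forward and backward, so the
    # output has no phase shift relative to the input.
    return signal.sosfiltfilt(sos, samples)


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate       # one second of audio

# Synthetic stand-ins: loud low-frequency noise (e.g. wind rumble)
# and a quieter high-frequency component.
rumble = np.sin(2 * np.pi * 100.0 * t)
hiss = 0.1 * np.sin(2 * np.pi * 8000.0 * t)

filtered_rumble = high_pass(rumble, sample_rate)
filtered_hiss = high_pass(hiss, sample_rate)

print(f"rumble RMS: {rms(rumble):.4f} -> {rms(filtered_rumble):.6f}")
print(f"hiss RMS:   {rms(hiss):.4f} -> {rms(filtered_hiss):.4f}")
```

In this sketch the 100 Hz tone is almost entirely removed while the 8 kHz tone passes nearly unchanged, which is why a sound that is loud but mostly low-frequency becomes much less prominent after filtering.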

### Post Processing

Real-world data will almost never be clean enough to automatically extract gunshots with perfect precision and recall. Setting extractor parameters too "tight" will lower recall and "leave data on the table." Conversely, setting them too leniently will lower precision by allowing incorrectly labeled data into the dataset.
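
As a reminder of how the two metrics trade off, the following sketch computes precision and recall from counts of true positives, false positives, and false negatives. The helper `precision_recall` and the counts are hypothetical and are not measurements from any dataset mentioned here.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of extracted clips.

    precision: fraction of extracted clips that are real gunshots
    recall:    fraction of real gunshots that were extracted
    """
    extracted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for a recording containing 100 real gunshots.
# "Tight" parameters: few false clips, but many gunshots are missed.
tight = precision_recall(true_positives=80, false_positives=2, false_negatives=20)
# "Lenient" parameters: almost every gunshot is found, with more false clips.
lenient = precision_recall(true_positives=98, false_positives=30, false_negatives=2)

print(f"tight:   precision={tight[0]:.3f}, recall={tight[1]:.3f}")
print(f"lenient: precision={lenient[0]:.3f}, recall={lenient[1]:.3f}")
```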

Post-processors allow a semi-automatic method for flagging potentially problematic data. They can be operated in two modes: `PRUNE` to automatically remove any potentially problematic data, and `FLAG` to provide a "vote of no confidence" on an extracted clip. The provided post-processors have unique strengths and weaknesses that are not easy to identify for a dataset _a priori_, and they often do not perfectly agree. Applying the post-processors as shown in `examples/compare.py` to a recent live dataset produced the following results:

| Votes | Count |
| ----- | ----- |
| 0 | 4914 |
| 1 | 341 |
| 2 | 651 |
| 3 | 383 |
| 4 | 345 |

Again, you will need to balance your requirements for precision and recall carefully.
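
One way to reason about that balance is to count how many clips survive at each vote threshold. The sketch below uses the counts from the table above and assumes that "Votes" is the number of post-processors that flagged a clip. The `kept_at_threshold` helper and the thresholding policy are hypothetical and are not part of the library.

```python
# Counts copied from the table above: votes -> number of clips.
vote_counts = {0: 4914, 1: 341, 2: 651, 3: 383, 4: 345}


def kept_at_threshold(vote_counts, max_votes):
    """Number of clips kept when discarding any clip with more than max_votes votes."""
    return sum(count for votes, count in vote_counts.items() if votes <= max_votes)


total = sum(vote_counts.values())

for max_votes in sorted(vote_counts):
    kept = kept_at_threshold(vote_counts, max_votes)
    print(f"max_votes={max_votes}: keep {kept} of {total} clips ({kept / total:.1%})")
```

Under these assumptions, keeping only clips with zero votes retains 4914 of 6634 clips, about 74% of the extracted data. Each additional vote you tolerate admits more clips at the cost of more potentially mislabeled ones.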

### Examples

Run the `examples/compare.py` script to compare the results of the extractors on audio collected under various conditions. The files in `audio/` come from three sources:

- `cadre-*.wav`: Cadre Forensics dataset (see above)
- `ffs-*.wav`: Free Firearm Sounds Library (see above)
- Everything else was collected by Certus Innovations

## Versioning

We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://github.com/Stonewall-Defense/team-ml-gunshot-extractor/tags).

## Authors

- **Ryan Quinn** - _Initial work_

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.
