Metadata-Version: 2.4
Name: nophi-av
Version: 0.1.1
Summary: Detect and redact PHI/PII from audio/video recordings, built on nophi
License: MIT
Project-URL: Homepage, https://github.com/kshen3778/no-phi
Project-URL: Repository, https://github.com/kshen3778/no-phi
Keywords: phi,pii,redaction,audio,video,speech,privacy
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nophi>=0.1
Requires-Dist: typer>=0.9
Requires-Dist: rich>=13
Requires-Dist: openpyxl>=3.1
Requires-Dist: whisperx==3.8.6
Requires-Dist: torch==2.8.0
Requires-Dist: torchaudio==2.8.0
Requires-Dist: pyannote.audio==4.0.4
Requires-Dist: pydub>=0.25
Requires-Dist: imageio-ffmpeg>=0.4
Requires-Dist: ultralytics>=8.3
Requires-Dist: opencv-python>=4.8
Provides-Extra: scrfd
Requires-Dist: insightface>=0.7; extra == "scrfd"
Requires-Dist: onnxruntime>=1.16; extra == "scrfd"
Dynamic: license-file

# nophi-av

Detect and redact **PHI/PII** in **audio/video recordings**, built on top of
[`nophi`](../nophi/).

`nophi-av` is an implementation of the [MedVidDeID](https://github.com/kbjohnson-penn/MedVidDeID)
pipeline with `nophi` as the text PHI-detection engine (in place of philter-ucsf).
It exposes the same CLI shape as `nophi` — `nophi-av scan …` instead of
`nophi scan …`.

## Pipeline

For each input recording:

1. **Extract audio** — for video, the audio track is demuxed to 16 kHz mono WAV
   (audio inputs are used directly).
2. **Transcribe** — [WhisperX](https://github.com/m-bain/whisperX) produces a
   transcript with **word-level timestamps** (Whisper + forced alignment).
3. **Detect PHI** — `nophi`'s text engine (`build_analyzer` / `scan_text`) runs
   over the transcript, so every `nophi` recognizer (names, drugs, addresses,
   dates, …) applies automatically.
4. **Map spans → time** — each detected character span is traced back to the
   words it covers, yielding a media time interval (see `nophi_av/detect.py`).
5. **Scrub audio** — each PHI interval is replaced with a **beep** (default) or
   **silence** (`--redact-mode`), preserving duration so video stays in sync.
6. **Mask faces** (video) — every face found by the detector is masked per
   `--mask-mode` (`box` solid fill by default, `pixelate`, `blur`, or `none` to
   skip), then the scrubbed audio is remuxed back onto the masked video. The
   detector is **swappable** via `--face-model` — YOLO by default, SCRFD as an
   alternative (see [Face detection models](#face-detection-models)).
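
Step 4 (mapping character spans to time) can be sketched as below. This is a minimal illustration, not the code in `nophi_av/detect.py`: the `Word` dataclass, `build_transcript`, and `span_to_interval` are hypothetical names, and it assumes the transcript is the aligned words joined by single spaces.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One aligned word: its text plus start/end time in seconds (assumed shape)."""
    text: str
    start: float
    end: float


def build_transcript(words):
    """Join words with single spaces and record each word's character range."""
    parts, offsets, pos = [], [], 0
    for w in words:
        offsets.append((pos, pos + len(w.text)))  # half-open [start, end)
        parts.append(w.text)
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(parts), offsets


def span_to_interval(span_start, span_end, words, offsets):
    """Return (start_s, end_s) covering every word that overlaps the span."""
    hits = [
        w for w, (c0, c1) in zip(words, offsets)
        if c0 < span_end and span_start < c1  # half-open ranges overlap
    ]
    if not hits:
        return None  # the span touched only whitespace
    return hits[0].start, hits[-1].end


words = [
    Word("My", 17.0, 17.2), Word("name", 17.2, 17.5), Word("is", 17.5, 17.7),
    Word("James", 17.7, 18.1), Word("Carter", 18.1, 18.6),
]
text, offsets = build_transcript(words)
start = text.index("James Carter")
interval = span_to_interval(start, start + len("James Carter"), words, offsets)
print(interval)  # (17.7, 18.6)
```

Because the lookup is based on overlap, a span that starts or ends mid-word still redacts the whole word, which errs on the side of removing more audio rather than less.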

Outputs per file: the redacted media (`<name>_redacted.<ext>`), a redacted
transcript (`<name>_transcript.txt`), and a styled Excel findings report with
entity type, replacement, and **time interval**.

> Only the audio/transcript path uses `nophi`. Face masking is a separate visual
> step that doesn't touch `nophi` — it masks every detected face (detection, not
> recognition), the conservative choice for PHI removal.
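
The duration-preserving scrub in step 5 can be illustrated with plain Python lists of samples. This is a sketch under stated assumptions rather than the package's pydub-based code: `scrub` and its `amplitude` parameter are hypothetical, and samples are floats in the range -1.0 to 1.0.

```python
import math


def scrub(samples, sample_rate, intervals, mode="beep", beep_hz=1000, amplitude=0.3):
    """Return a copy of `samples` with each (start_s, end_s) interval overwritten.

    Samples are replaced in place, never removed, so the output has exactly
    the same length (and therefore duration) as the input.
    """
    out = list(samples)
    for start_s, end_s in intervals:
        first = max(0, int(start_s * sample_rate))
        last = min(len(out), int(end_s * sample_rate))
        for i in range(first, last):
            if mode == "silence":
                out[i] = 0.0
            else:
                # Sine tone at beep_hz, evaluated at this sample's time.
                out[i] = amplitude * math.sin(2 * math.pi * beep_hz * i / sample_rate)
    return out


sample_rate = 16_000
audio = [0.5] * sample_rate  # one second of placeholder "speech"
beeped = scrub(audio, sample_rate, [(0.25, 0.5)])
muted = scrub(audio, sample_rate, [(0.25, 0.5)], mode="silence")
```

Overwriting samples instead of cutting them is what keeps a video's picture and sound aligned after redaction.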

## Supported formats

- **Audio**: `.mp3 .wav .m4a .flac .aac .ogg`
- **Video**: `.mp4 .mov .avi .mkv .webm`

## Install

A single install includes everything — audio scrubbing and video face masking:

```bash
pip install -e packages/nophi     # the text engine it depends on
pip install -e packages/nophi-av  # everything: whisperx, torch, pydub, ultralytics, opencv, pyannote
```

To use the SCRFD face detector (`--face-model scrfd`), install the optional
extra, which adds `insightface` + `onnxruntime`:

```bash
pip install -e 'packages/nophi-av[scrfd]'
```

> This is a heavy install — `torch`, `ultralytics`, and `opencv` are all pulled
> in. **ffmpeg** must also be on your `PATH` (e.g. `brew install ffmpeg`).

The first scan downloads model weights (Whisper, the alignment model, nophi's
spaCy/scispaCy models, and the YOLO face model). Warm every cache up front with:

```bash
nophi-av download-models --whisper-model small
```

### Requirements & caveats

- **ffmpeg** must be on `PATH` (e.g. `brew install ffmpeg`). WhisperX uses it to
  decode audio, and pydub uses it to encode `.mp3`/`.m4a` output. `.wav` in/out
  avoids any external ffmpeg dependency.
- **Face model** — see [Face detection models](#face-detection-models) below for
  the available detectors and how `--face-model` resolves them. Use
  `--mask-mode none` to skip face masking entirely.

## Face detection models

Face masking is **modular**: the pipeline only consumes per-frame bounding boxes,
so the detector behind `--face-model` is swappable. Run `nophi-av list-models`
to see the catalog, or pick from:

| `--face-model` | Backend | Notes |
| --- | --- | --- |
| `yolov12s-face.pt` *(default)* | YOLO (Ultralytics) | Community YOLOv12-s face weight — balanced. |
| `yolov8n-face.pt` | YOLO | Smallest/fastest, lower recall. |
| `yolov11m-face.pt` | YOLO | Larger, higher recall. |
| `scrfd` | SCRFD (insightface) | SCRFD-10G — strong on small/tilted/profile faces. |
| `scrfd-s` | SCRFD | Lighter SCRFD-500M variant. |

How a spec is resolved:

- **YOLO** — the default backend. Community face weights (`yolov{8,10,11,12}{n,s,m,l}-face.pt`)
  aren't official Ultralytics assets, so nophi-av fetches them from the
  [`yolo-face`](https://github.com/YapaLab/yolo-face) release and caches them in
  Ultralytics' weights dir on first use. You can also pass a **local checkpoint
  path** or an official Ultralytics name (e.g. `yolov8n.pt`, which detects people,
  not faces). The YOLOv12 architecture needs `ultralytics>=8.3`, which nophi-av already depends on.
- **SCRFD** — selected by `--face-model scrfd` (or `scrfd-s`). insightface
  downloads and caches the model pack on first use. Needs the optional extra:
  `pip install 'nophi-av[scrfd]'`.
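
The resolution rules above can be summarized in a small classifier. This is a hedged sketch, not the package's resolver: `classify_face_model`, the returned labels, and the order of the checks are assumptions made for illustration.

```python
import re
from pathlib import Path

# Community face weights follow the naming pattern given in the text above.
COMMUNITY_FACE = re.compile(r"^yolov(8|10|11|12)[nsml]-face\.pt$")


def classify_face_model(spec: str) -> str:
    """Map a --face-model value to a (hypothetical) resolution strategy label."""
    if spec in {"scrfd", "scrfd-s"}:
        return "scrfd"            # insightface fetches the model pack
    if COMMUNITY_FACE.match(spec):
        return "yolo-community"   # fetched from the yolo-face release, then cached
    if Path(spec).is_file():
        return "yolo-local"       # an existing local checkpoint
    return "yolo-official"        # left to Ultralytics to resolve by name


for spec in ("yolov12s-face.pt", "scrfd-s", "yolov8n.pt"):
    print(spec, "->", classify_face_model(spec))
```

Note that `yolov8n.pt` falls through to the official-name branch: it loads, but as described above it detects people rather than faces.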

For PHI de-identification, recall matters more than precision (a missed face is a
leak; an over-masked box is harmless), so SCRFD is a good upgrade when small or
non-frontal faces are being missed.

**Adding your own backend:** subclass `FaceDetector` in
[`nophi_av/detectors.py`](nophi_av/detectors.py), register it in `_BACKENDS`, and
add a `ModelInfo` entry to `FACE_MODELS` so it appears in `list-models`. Nothing
in `video.py` needs to change — it only calls `detector.detect(frame, conf)`.

## Usage

```bash
# Audio: bleep spoken PHI, write redacted audio + transcript + report
nophi-av scan interview.mp3 -o ./cleaned

# Audio, mute (instead of beep) PHI, larger Whisper model for accuracy
nophi-av scan interview.wav --redact-mode silence --whisper-model medium -o ./cleaned

# A whole folder; detect only names & dates; report without writing media
nophi-av scan ./recordings -e PERSON,DATE_TIME --dry-run

# Map known participants to stable IDs instead of <PERSON>
nophi-av scan interview.mp3 -m participants.csv -o ./cleaned

# Video: black-box faces + bleep spoken PHI (box is the default)
nophi-av scan visit.mp4 -o ./cleaned   # uses default yolov12s-face (auto-downloaded)

# Video, pixelate faces instead of a solid box
nophi-av scan visit.mp4 --mask-mode pixelate -o ./cleaned

# Video, use the SCRFD detector instead of YOLO (needs the [scrfd] extra)
nophi-av scan visit.mp4 --face-model scrfd -o ./cleaned

# Video, audio scrub + remux only (skip face masking)
nophi-av scan visit.mp4 --mask-mode none -o ./cleaned

# See the available face-detection models
nophi-av list-models
```

Bare invocation works too: `nophi-av interview.mp3` is shorthand for
`nophi-av scan interview.mp3`.
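
One way such a shorthand could work is to rewrite the argument list before dispatch. This is only an illustrative sketch: `normalize_argv` is a hypothetical helper, and the subcommand set is limited to the three commands mentioned in this README.

```python
# Subcommands named in this README; the real CLI may define others.
SUBCOMMANDS = {"scan", "download-models", "list-models"}


def normalize_argv(args):
    """Prepend 'scan' when the first argument is neither an option nor a subcommand."""
    if args and not args[0].startswith("-") and args[0] not in SUBCOMMANDS:
        return ["scan", *args]
    return list(args)


print(normalize_argv(["interview.mp3", "-o", "./cleaned"]))
# ['scan', 'interview.mp3', '-o', './cleaned']
```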

### Example

Scanning a ~49s clinical-intake recording with `--whisper-model base` on CPU:

```
Entity type    Count   Examples (with timestamps)
-----------    -----   --------------------------
PERSON           2     James Carter (00:17.7), Emily Rivera (00:24.7)
DATE_TIME        1     October 24th, 1999 (00:33.8)
LOCATION/ORG     1     455 University Avenue (00:41.8)
```

This produces `cleaned/interview_redacted.mp3` (PHI windows beeped, duration preserved),
`cleaned/interview_transcript.txt` (*"My name is Dr. `<PERSON>` … I live at
`<ORGANIZATION>`, apartment 3B."*), and `phi_av_report.xlsx`.

### Key options

| Option | Description |
| --- | --- |
| `--redact-mode beep\|silence` | How to remove PHI from audio (default `beep`) |
| `--beep-hz` | Beep tone frequency in Hz (default 1000) |
| `--mask-mode box\|pixelate\|blur\|none` | How to mask faces in video (default `box`; `none` skips) |
| `--face-model` | Face detector: a YOLO weight name/path, `scrfd`, or `scrfd-s` (see `list-models`) |
| `--whisper-model` | WhisperX model size: `tiny`, `base`, `small`, `medium`, or `large-v3` |
| `--device auto\|cpu\|cuda` | Compute device for ASR / face masking |
| `--padding` | Seconds added around each redacted interval (default 0.15) |
| `--entities`, `--mappings`, `--exclude`, `--dry-run` | Same semantics as `nophi` |
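
To make the effect of `--padding` concrete, the sketch below widens each interval and merges any that end up overlapping. It is an assumption-laden illustration: `pad_and_merge` is a hypothetical helper, and both the per-side padding and the merging behavior are guesses about how a padded scrub could be implemented, not a description of nophi-av's internals.

```python
def pad_and_merge(intervals, padding=0.15, duration=None):
    """Widen each (start_s, end_s) by `padding` per side, clamp, and merge overlaps."""
    merged = []
    for start, end in sorted(intervals):
        start = max(0.0, start - padding)
        end = end + padding
        if duration is not None:
            end = min(duration, end)  # never run past the end of the recording
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(pad_and_merge([(1.0, 2.0), (2.2, 3.0)], padding=0.25))
# [(0.75, 3.25)]
```

A little padding helps cover alignment error at word boundaries, at the cost of removing slightly more of the surrounding speech.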

## Development

The transcript↔time mapping core (`nophi_av/detect.py`) is ML-free and can be
unit-tested without torch/whisperx installed:

```bash
pytest packages/nophi-av/tests
```
