Metadata-Version: 2.4
Name: civic-ai-recap
Version: 0.11.0
Summary: Tools that capture public hearings, committee meetings, and symposiums from YouTube, then convert the recordings into searchable, analyzable transcripts.
Home-page: https://github.com/thoppe/Civic-AI-Recap/
Author: Travis Hoppe
Author-email: travis.hoppe+civic-ai-recap@gmail.com
License: CC-SA
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Other Audience
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Database
Classifier: Topic :: Office/Business
Classifier: Topic :: Scientific/Engineering
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: dspipe
Requires-Dist: wasabi
Requires-Dist: diskcache
Requires-Dist: numpy
Requires-Dist: yt-dlp
Requires-Dist: tiktoken
Requires-Dist: google-api-python-client
Requires-Dist: openai
Requires-Dist: isodate
Requires-Dist: rich
Requires-Dist: requests
Requires-Dist: tqdm
Requires-Dist: boto3
Requires-Dist: pydantic
Provides-Extra: transcription
Requires-Dist: openai-whisper; extra == "transcription"
Requires-Dist: faster-whisper; extra == "transcription"
Requires-Dist: silero-vad; extra == "transcription"
Requires-Dist: torch; extra == "transcription"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Civic-AI-Recap (CAIR)
Tools to digitize, transcribe, and analyze public hearings, committee meetings, and symposiums on YouTube.

CAIR also includes a small Granicus downloader for `MediaPlayer.php` clip URLs.

Install from PyPI:

```bash
pip install civic-ai-recap
```

Install with transcription dependencies:

```bash
pip install "civic-ai-recap[transcription]"
```

Granicus downloads require `ffmpeg` to be installed and available on `PATH`, or passed via `ffmpeg_path`.

Install from source:

```bash
git clone https://github.com/thoppe/Civic-AI-Recap/
cd Civic-AI-Recap
pip install .
```

The PyPI project name is `civic-ai-recap`, but the import remains `CAIR`.

Set required environment variables:

- `YOUTUBE_API_KEY` for fetching metadata via the YouTube Data API.
- `OPENAI_API_KEY` for LLM-powered analysis (used by `Analyze`).
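
A quick pre-flight check can catch a missing key before any API call is made. The sketch below uses only the standard library; `missing_env_vars` is an illustrative helper and not part of CAIR.

```python
import os

# The two variables listed above; trim the tuple if you only use part of the toolkit.
REQUIRED_VARS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```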

The `transcription` extra installs Whisper, faster-whisper, Silero VAD, and Torch support.

Resolve a YouTube channel ID from a handle URL:

``` python
from CAIR import channel_id_from_url

channel_id = channel_id_from_url("https://www.youtube.com/@hanovercountyva")
print(channel_id)

'''
UCg0poGd4dTMOKXEXL4xPi4g
'''
```

Fetch video and channel metadata, download the audio, transcribe it, and summarize the transcript:

``` python
from CAIR import Channel, Video, Transcription, Analyze

video_id = "P0rxq42sckU"
vid = Video(video_id)
channel = Channel(vid.channel_id)
uploads = channel.get_uploads()

print(vid.title)
print(channel.title, channel.n_videos)
print(uploads[["video_id", "title", "publishedAt"]].head())

'''
SEP 30, 2025 | City Council
City of San Jose, CA 1741
      video_id                                              title           publishedAt
0  h1sCi9oiBSc  NOV 6, 2025 | Police & Fire Department Retirem...  2025-11-08T07:05:34Z
1  4mvGLqa-G70                       NOV 18, 2025 | City Council  2025-11-05T22:27:04Z
2  BAvwrwjsnZM      18 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T22:24:28Z
3  KGeDIw6vUDo  NOV 5, 2025 | Rules & Open Government/Committe...  2025-11-05T22:16:10Z
4  itaRH6GLzBw       4 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T12:59:25Z
'''

f_audio = f"{video_id}.mp3"
vid.download_audio(f_audio)
df = Transcription().transcribe(f_audio, text_only=False)
print(df)

'''
      start    end                                               text
0      0.00  29.28                                         All right.
1     29.28  30.28                                    Good afternoon.
2     30.28  31.28                                 Welcome, everyone.
3     31.28  36.40  I'm pleased to call to order this meeting of t...
4     36.40  38.10                                 of September 30th.
...     ...    ...                                                ...
3682  19872.78  19874.34  I thought he was waiting to speak. Back to cou...
3683  19876.92  19879.92  Thank you. That concludes our meeting. Thank you.
3684  19881.48  19911.46                                         Thank you.
3685  19911.48  19912.20                                         Thank you.
'''

model = Analyze(model_name="gpt-5-mini")
text = model.preprocess_text(df)
summary = model(
    prompt=text,
    system_prompt="Provide a concise executive summary of this hearing.",
)
print(summary)

'''
1. Bottom Line Up Front (BLUF)
San Jose’s council advanced an ambitious, data-driven “Focus Area 2.0” performance model while
approving near-term actions with statewide implications: significant police labor concessions
to stabilize staffing, a city amicus joining litigation in defense of Planned Parenthood, an
ordinance limiting masked identities for law-enforcement/immigration agents, major downtown
land acquisition to preserve future convention/sports options, and a large subsidized downtown
workforce housing loan — all overlapping statewide priorities on public safety, homelessness,
housing affordability, labor enforcement, and immigrant/community trust.

2. Key State-Level Themes and Implications
- Homelessness and shelter operations are shifting from capacity-building to systems/integration
issues (throughput, CalAIM billing, HMIS integration, county coordination). San Jose’s
[...]
'''
```
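
Once a transcript is available as rows of `start`, `end`, and `text`, keyword lookups are straightforward. The following is a minimal sketch in plain Python, not a CAIR feature: `format_timestamp` and `search_segments` are hypothetical helpers, and the sample rows are copied from the transcript output above. With a real transcript, `df.to_dict("records")` should yield rows of the same shape.

```python
def format_timestamp(seconds):
    """Convert seconds from the start of the recording to HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def search_segments(rows, keyword):
    """Return (timestamp, text) pairs whose text contains keyword, ignoring case."""
    needle = keyword.lower()
    return [
        (format_timestamp(row["start"]), row["text"])
        for row in rows
        if needle in row["text"].lower()
    ]


# Sample rows taken from the transcript output shown above.
rows = [
    {"start": 29.28, "end": 30.28, "text": "Good afternoon."},
    {"start": 30.28, "end": 31.28, "text": "Welcome, everyone."},
    {
        "start": 19876.92,
        "end": 19879.92,
        "text": "Thank you. That concludes our meeting. Thank you.",
    },
]

for timestamp, text in search_segments(rows, "meeting"):
    print(timestamp, text)
```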

Download a Granicus clip:

``` python
from CAIR import download_granicus_video, resolve_granicus_video_url

url = "https://loudoun.granicus.com/MediaPlayer.php?view_id=92&clip_id=1366"
resolved = resolve_granicus_video_url(url)
print(resolved.media_playlist_url)

result = download_granicus_video(url, output_dir="downloads")
print(result.output_path)
```
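
The example URL carries a `view_id` and a `clip_id` in its query string. If you want to validate or index such URLs before downloading, the standard library is enough. This is a sketch that assumes clip URLs look like the one above; `parse_granicus_clip_url` is a hypothetical helper, not part of CAIR.

```python
from urllib.parse import parse_qs, urlparse


def parse_granicus_clip_url(url):
    """Split a MediaPlayer.php clip URL into host, view_id, and clip_id."""
    parts = urlparse(url)
    if not parts.path.endswith("MediaPlayer.php"):
        raise ValueError(f"Not a MediaPlayer.php URL: {url}")
    query = parse_qs(parts.query)
    if "clip_id" not in query:
        raise ValueError(f"URL has no clip_id parameter: {url}")
    return {
        "host": parts.netloc,
        # Treating view_id as optional is an assumption of this sketch.
        "view_id": query.get("view_id", [None])[0],
        "clip_id": query["clip_id"][0],
    }


info = parse_granicus_clip_url(
    "https://loudoun.granicus.com/MediaPlayer.php?view_id=92&clip_id=1366"
)
print(info)
```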

If `ffmpeg` is not installed, `download_granicus_video(...)` raises a `RuntimeError`
that explicitly tells you to install `ffmpeg` or pass `ffmpeg_path`.
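
To detect a missing `ffmpeg` before starting a download, you can look it up on `PATH` yourself. A minimal sketch using `shutil.which` from the standard library; `find_ffmpeg` is an illustrative name, and handing its result to `ffmpeg_path` assumes that parameter accepts a path to the executable.

```python
import shutil


def find_ffmpeg(explicit_path=None):
    """Return a usable ffmpeg path, or None if none can be found."""
    if explicit_path:
        # shutil.which also accepts a path with a directory part and
        # returns it only if it points to an executable file.
        return shutil.which(explicit_path)
    return shutil.which("ffmpeg")


ffmpeg = find_ffmpeg()
if ffmpeg is None:
    print("ffmpeg not found; install it or pass ffmpeg_path explicitly.")
else:
    print(f"Using ffmpeg at {ffmpeg}")
```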

Minimal local demo:

```bash
python demo_granicus_download.py
```

List every upload for a channel, reusing the `channel` object from the example above:

``` python
uploads = channel.get_uploads()
print(uploads[['video_id', 'title', 'publishedAt']])

'''
         video_id                                              title           publishedAt
0     h1sCi9oiBSc  NOV 6, 2025 | Police & Fire Department Retirem...  2025-11-08T07:05:34Z
1     4mvGLqa-G70                       NOV 18, 2025 |  City Council  2025-11-05T22:27:04Z
2     BAvwrwjsnZM       18 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T22:24:28Z
3     KGeDIw6vUDo  NOV 5, 2025 | Rules & Open Government/Committe...  2025-11-05T22:16:10Z
4     itaRH6GLzBw        4 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T12:59:25Z
...           ...                                                ...                   ...
1747  BV2WEzVDrLw                    Fireworks Prevention en Español  2016-11-04T16:43:10Z
1748  nQWZLit5Kn0       Fireworks Prevention with Firefighter Alfred  2016-11-04T16:41:21Z
1749  2jH3dEH8gK0              SJ Journey To Fiscal SustainabilityHD  2016-11-04T00:02:36Z
1750  i2I98YY8btQ                   Bike Sharing arrives in San José  2016-11-03T23:59:27Z
1751  BpJ911ynFN0                     Parks & Rec. 2013 Junior Games  2016-11-03T23:57:56Z

'''
```
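
The `publishedAt` values are UTC timestamps, so they can be parsed and filtered with the standard library. A small sketch on rows copied from the listing above, assuming every value follows the format shown there; `parse_published_at` and `published_since` are hypothetical helpers. With the real DataFrame, the same filter could be expressed with `pandas.to_datetime`.

```python
from datetime import datetime, timezone

# Format of the publishedAt strings shown in the listing above.
PUBLISHED_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_published_at(value):
    """Parse a publishedAt string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, PUBLISHED_FORMAT).replace(tzinfo=timezone.utc)


def published_since(rows, cutoff):
    """Keep rows published at or after the cutoff datetime."""
    return [row for row in rows if parse_published_at(row["publishedAt"]) >= cutoff]


# Rows copied from the uploads listing above.
rows = [
    {"video_id": "h1sCi9oiBSc", "publishedAt": "2025-11-08T07:05:34Z"},
    {"video_id": "itaRH6GLzBw", "publishedAt": "2025-11-05T12:59:25Z"},
    {"video_id": "BpJ911ynFN0", "publishedAt": "2016-11-03T23:57:56Z"},
]

cutoff = datetime(2025, 11, 6, tzinfo=timezone.utc)
recent = published_since(rows, cutoff)
print([row["video_id"] for row in recent])
```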

`channel.get_metadata()` returns the YouTube Data API channel response, with a `download_date` field added:

```json
{
  "kind": "youtube#channelListResponse",
  "etag": "I-t6Dq6TbsrHZb-C8Tvw3iLjn-0",
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#channel",
      "etag": "4WmqmG5PoRLHq5DgHM_Iix4UEJE",
      "id": "UCeDiMzJEUbPgaruDcXnD4Cg",
      "snippet": {
        "title": "City of San Jose, CA",
        "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
        "customUrl": "@cityofsanjosecalifornia",
        "publishedAt": "2013-07-15T19:52:00Z",
        "localized": {
          "title": "City of San Jose, CA",
          "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world."
        },
        "country": "US"
      },
      "contentDetails": {
        "relatedPlaylists": {
          "likes": "",
          "uploads": "UUeDiMzJEUbPgaruDcXnD4Cg"
        }
      },
      "statistics": {
        "viewCount": "1428701",
        "subscriberCount": "5340",
        "hiddenSubscriberCount": false,
        "videoCount": "1741"
      },
      "topicDetails": {
        "topicIds": [
          "/m/098wr",
          "/m/05qt0"
        ],
        "topicCategories": [
          "https://en.wikipedia.org/wiki/Society",
          "https://en.wikipedia.org/wiki/Politics"
        ]
      },
      "status": {
        "privacyStatus": "public",
        "isLinked": true,
        "longUploadsStatus": "longUploadsUnspecified"
      },
      "brandingSettings": {
        "channel": {
          "title": "City of San Jose, CA",
          "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
          "unsubscribedTrailer": "mEd25UErtPw",
          "country": "US"
        },
        "image": {
          "bannerExternalUrl": "https://yt3.googleusercontent.com/Vp-n5GjLp9EkbgaWcJntExB2442KAHU3zYqo5NTMsJpiY2vCIxIlZwlLJxkeEE-EzvQ8oabm"
        }
      },
      "contentOwnerDetails": {}
    }
  ],
  "download_date": "2025-11-10T11:25:38.516298"
}
```
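
The counts under `statistics` arrive as strings, so convert them before doing arithmetic. A sketch on a trimmed copy of the response above; `channel_stats` is a hypothetical helper, not part of CAIR.

```python
def channel_stats(metadata):
    """Extract numeric channel statistics from a response shaped like the one above."""
    item = metadata["items"][0]
    stats = item["statistics"]
    return {
        "title": item["snippet"]["title"],
        "views": int(stats["viewCount"]),
        "subscribers": int(stats["subscriberCount"]),
        "videos": int(stats["videoCount"]),
    }


# Trimmed copy of the response shown above.
metadata = {
    "items": [
        {
            "snippet": {"title": "City of San Jose, CA"},
            "statistics": {
                "viewCount": "1428701",
                "subscriberCount": "5340",
                "hiddenSubscriberCount": False,
                "videoCount": "1741",
            },
        }
    ]
}

print(channel_stats(metadata))
```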

## Usage notes

Analyze module:

- `Analyze` wraps OpenAI chat calls and records per-call usage in `Analyze.usage`.
- Results are cached under `cache/<model_name>`. Set `force=True` to skip cache reads, and `cache_result=False`
  to skip cache writes; both can be overridden per call.
- Set `websearch=True` on `Analyze(...)` to include the OpenAI `web_search` tool in requests.
- Per-call overrides include `seed`, `timeout`, `force`, and `cache_result`.

``` python
from CAIR import Analyze

model = Analyze(model_name="gpt-5-mini", force=True, websearch=True)
content = model(
    prompt="Summarize the hearing in 5 bullets.",
    system_prompt="You are a concise analyst.",
    cache_result=True,
)
print(model.usage)
```
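
The interaction between `force` and `cache_result` can be pictured with a toy cache. The following is an illustrative sketch only, using an in-memory dict; it is not CAIR's implementation, and `cached_call` is a hypothetical name.

```python
def cached_call(key, compute, cache, force=False, cache_result=True):
    """Return compute() for key, honoring the read (force) and write (cache_result) flags."""
    if not force and key in cache:
        return cache[key]  # cache read
    result = compute()
    if cache_result:
        cache[key] = result  # cache write
    return result


cache = {}
calls = []


def expensive():
    """Stand-in for a slow API call; counts how often it runs."""
    calls.append(1)
    return f"result {len(calls)}"


first = cached_call("q", expensive, cache)               # computes and stores
second = cached_call("q", expensive, cache)              # served from the cache
third = cached_call("q", expensive, cache, force=True)   # recomputes and overwrites
print(first, second, third, sep=" | ")
```

With `force=True` and `cache_result=False` together, the call neither reads nor writes the cache.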

Transcription module:

- `Transcription.transcribe_s3(s3_location, text_only=...)` streams audio directly from S3 and reuses
  the same post-processing as `transcribe(...)`.
- `s3_location` must be a full S3 URI like `s3://my-bucket/path/to/audio.mp3`.
- `Transcription(method=...)` supports `whisper` and `faster_whisper`.
- `compute_vad=True` enables Silero VAD and adds `is_vad` to row-based transcript output.
- Silero VAD prefers CUDA when available and falls back to CPU automatically.
- Progress bars are enabled by default for Silero VAD, VAD stitching, and
  `faster_whisper` segment consumption. Set `vad_progress=False`,
  `stitch_progress=False`, or `output_progress=False` to disable them.
- `force=True` skips cache reads for that call while still writing fresh results.

``` python
from CAIR import Transcription

t = Transcription()
df = t.transcribe_s3("s3://my-bucket/path/to/audio.mp3", text_only=False)
print(df[["start", "end", "text"]].head())
```
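
Because `s3_location` must be a full S3 URI, it can help to validate user-supplied values before calling `transcribe_s3`. A minimal sketch with the standard library; `split_s3_uri` is a hypothetical helper and does not reflect how CAIR parses the URI internally.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split s3://bucket/key into (bucket, key), raising ValueError if malformed.

    Keys containing '?' or '#' are not handled by this sketch.
    """
    parts = urlparse(uri)
    key = parts.path.lstrip("/")
    if parts.scheme != "s3" or not parts.netloc or not key:
        raise ValueError(f"Expected s3://<bucket>/<key>, got: {uri!r}")
    return parts.netloc, key


bucket, key = split_s3_uri("s3://my-bucket/path/to/audio.mp3")
print(bucket, key)
```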

``` python
from CAIR import Transcription

t = Transcription(
    method="faster_whisper",
    model_size="distil-large-v3",
    compute_vad=True,
    vad_progress=True,
    stitch_progress=True,
    output_progress=True,
)
df = t.transcribe("meeting_audio.wav", text_only=False)
print(df[["start", "end", "text", "is_vad"]].head())
```
