Metadata-Version: 2.4
Name: transcribe-it
Version: 0.3.0
Summary: Lightweight CLI for ingesting, enriching, and storing meeting transcripts
Project-URL: Homepage, https://github.com/psousa50/transcribe-it
Project-URL: Repository, https://github.com/psousa50/transcribe-it
Project-URL: Issues, https://github.com/psousa50/transcribe-it/issues
Author-email: Pedro Sousa <pedronsousa@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: cli,gmail,llm,meetings,slack,transcripts
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business
Classifier: Topic :: Text Processing
Requires-Python: >=3.12
Requires-Dist: google-api-python-client>=2.150
Requires-Dist: google-auth-oauthlib>=1.2
Requires-Dist: litellm>=1.60
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: questionary>=2.1.1
Requires-Dist: slack-sdk>=3.41.0
Requires-Dist: typer>=0.15
Description-Content-Type: text/markdown

# transcribe-it

A lightweight CLI for ingesting meeting transcripts (Gmail or Slack), enriching them with an LLM, and storing the results as local files.

```
Source -> Extract -> LLM Enrich -> Local Files
```

## Prerequisites

- Python 3.12+
- A Google Cloud project with Gmail API + Google Drive API enabled (for the Gmail source), or a Slack bot token (for the Slack source)
- An API key for one of the supported LLM providers (Anthropic, OpenAI, or Groq) — only needed if you want LLM enrichment

## Install

```bash
uv tool install transcribe-it
```

Or with pipx:

```bash
pipx install transcribe-it
```

## Setup

Setup is split into two steps: a one-off global step for credentials, and a per-project step for what to ingest.

### Step 1: Configure credentials (once per machine)

```bash
transcribe-it setup
```

Choose which credentials to set up (Google OAuth for Gmail, a Slack bot token, an LLM provider, or any combination); the values are written to `~/.config/transcript/env`. Re-run the command at any time to add or rotate values. Existing values are preserved unless you confirm the overwrite or pass `--force`.
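The preserve-unless-forced behaviour can be pictured with a short sketch. This is not the tool's actual code: `parse_env` and `merge_env` are hypothetical helpers, the plain `KEY=value` file format is an assumption, and the interactive confirmation step is left out.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments (assumed format)."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_env(existing: dict[str, str], new: dict[str, str], force: bool = False) -> dict[str, str]:
    """Add new keys; replace keys that already exist only when force is True."""
    merged = dict(existing)
    for key, value in new.items():
        if force or key not in merged:
            merged[key] = value
    return merged


# Example values are placeholders, not real credentials.
current = parse_env("SLACK_BOT_TOKEN=xoxb-old\n")
update = {"SLACK_BOT_TOKEN": "xoxb-new", "GOOGLE_OAUTH_CLIENT_ID": "example-id"}

kept = merge_env(current, update)                 # existing token is preserved
rotated = merge_env(current, update, force=True)  # existing token is replaced
```

In both calls the new `GOOGLE_OAUTH_CLIENT_ID` is added; only the forced call replaces the existing Slack token.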

### Step 2: Initialise a project (per directory)

From the directory where you want transcripts to land:

```bash
transcribe-it init
```

This prompts for the sources to enable, source-specific settings (sender filter, channel ID, etc.), the output path, and the lookback window, then writes `.transcripts/config.yaml`. It does not prompt for secrets, but it warns if the credentials required by a chosen source are not set yet.

### Gmail credentials

`setup` asks for `GOOGLE_OAUTH_CLIENT_ID` and `GOOGLE_OAUTH_CLIENT_SECRET`. Two options:

1. **Reuse someone else's OAuth client** — ask a teammate for the values and have them add your Google account as a Test user on their OAuth consent screen.
2. **Create your own** — in Google Cloud Console, create an OAuth 2.0 Client ID of type *Desktop app*, then copy the client ID and secret from the resulting credentials.

After `setup` and `init`, authenticate:

```bash
transcribe-it auth gmail
```

### Slack credentials

`setup` asks for `SLACK_BOT_TOKEN` (`xoxb-...`). The bot needs to be a member of the channels you want to ingest from. The channel ID itself is configured per-project in `init`.
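To make the per-source credential requirements concrete, here is a minimal sketch of the kind of check that could back the warning `init` gives. The variable names come from the sections above; the `REQUIRED_VARS` mapping and the `missing_credentials` helper are hypothetical and not part of the tool's API.

```python
import os

# Variable names are taken from this README; the mapping itself is illustrative.
REQUIRED_VARS = {
    "gmail": ["GOOGLE_OAUTH_CLIENT_ID", "GOOGLE_OAUTH_CLIENT_SECRET"],
    "slack": ["SLACK_BOT_TOKEN"],
}


def missing_credentials(sources, env=None):
    """Return {source: [missing variable names]} for the chosen sources."""
    env = os.environ if env is None else env
    missing = {}
    for source in sources:
        absent = [name for name in REQUIRED_VARS.get(source, []) if not env.get(name)]
        if absent:
            missing[source] = absent
    return missing


# Placeholder environment with only one of the Gmail values set.
example_env = {"GOOGLE_OAUTH_CLIENT_ID": "example-id"}
report = missing_credentials(["gmail", "slack"], example_env)
```

An empty result means every chosen source has the values it needs.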

## Usage

By default, ingestion only extracts the raw transcript — no LLM call, no API key required. Pass `--enrich` to also generate a summary, topics, and participants via LLM.

```bash
# Last N days, raw extraction only (default)
transcribe-it ingest gmail --days 7

# With LLM enrichment
transcribe-it ingest gmail --days 7 --enrich

# Enrichment + cleaned transcript variant (--clean implies --enrich)
transcribe-it ingest gmail --days 7 --clean

# Specific date range
transcribe-it ingest gmail --from 2026-04-01 --to 2026-04-05

# Preview matching emails without fetching or writing
transcribe-it ingest gmail --days 1 --dry-run

# Ingest a single transcript file directly
transcribe-it ingest file path/to/transcript.txt
```
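The `--days` and `--from`/`--to` flags both describe a date window. The sketch below shows one way such a window could be resolved and turned into a Gmail search string; it is not taken from the tool's source. `resolve_window` and `gmail_query` are hypothetical helpers, treating the two flag styles as mutually exclusive is an assumption, and so is the reading of Gmail's `after:` and `before:` operators (start day included, `before:` day excluded).

```python
from datetime import date, timedelta


def resolve_window(today, days=None, start=None, end=None):
    """Return an inclusive (start, end) pair of dates from either flag style."""
    if days is not None and (start or end):
        # Assumption: a day count and an explicit range are not combined.
        raise ValueError("use either a day count or an explicit range, not both")
    if days is not None:
        return today - timedelta(days=days), today
    if start is None:
        raise ValueError("a start date or a day count is required")
    end_date = date.fromisoformat(end) if end else today
    return date.fromisoformat(start), end_date


def gmail_query(start, end, sender=None):
    """Build a Gmail search string covering start..end inclusive."""
    # Assumption: before: excludes the named day, so push it one day forward.
    upper = end + timedelta(days=1)
    parts = [f"after:{start:%Y/%m/%d}", f"before:{upper:%Y/%m/%d}"]
    if sender:
        parts.insert(0, f"from:{sender}")
    return " ".join(parts)


window = resolve_window(date(2026, 4, 9), start="2026-04-01", end="2026-04-05")
query = gmail_query(*window, sender="notes@example.com")
```

Keeping the window inclusive on both ends and adjusting only when the query is built keeps the date arithmetic in one place.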

### Output

Raw mode (default) writes a single `.txt` file per transcript:

```
.transcripts/
  2026-04-09-ai-labs-daily.txt
```

With `--enrich`, each transcript becomes a folder:

```
.transcripts/
  2026-04-09-ai-labs-daily/
    raw.txt          # Original transcript (immutable)
    metadata.json    # Source, date, participants, topics, summary
```

With `--clean`, an additional `clean.md` is written to the same folder. It is a structured document containing the title, summary, topics, and cleaned transcript.
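The three layouts above can be summarised in a short sketch. It only mirrors the file names documented here; `write_transcript` is a hypothetical helper, and the metadata and cleaned text are placeholders rather than real LLM output.

```python
import json
import tempfile
from pathlib import Path


def write_transcript(root, slug, raw, metadata=None, clean=None):
    """Write one transcript under root and return the paths created."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    if metadata is None and clean is None:
        # Raw mode: a single .txt file per transcript.
        target = root / f"{slug}.txt"
        target.write_text(raw, encoding="utf-8")
        return [target]
    # Enriched mode: one folder per transcript.
    folder = root / slug
    folder.mkdir(exist_ok=True)
    raw_path = folder / "raw.txt"
    meta_path = folder / "metadata.json"
    raw_path.write_text(raw, encoding="utf-8")
    meta_path.write_text(json.dumps(metadata or {}, indent=2), encoding="utf-8")
    written = [raw_path, meta_path]
    if clean is not None:
        # Clean mode adds clean.md next to the enriched files.
        clean_path = folder / "clean.md"
        clean_path.write_text(clean, encoding="utf-8")
        written.append(clean_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / ".transcripts"
    raw_only = [
        p.relative_to(out).as_posix()
        for p in write_transcript(out, "2026-04-09-ai-labs-daily", "hello")
    ]
    enriched = [
        p.relative_to(out).as_posix()
        for p in write_transcript(
            out,
            "2026-04-10-ai-labs-daily",
            "hello",
            metadata={"summary": "placeholder"},
            clean="# Placeholder title\n",
        )
    ]
```

Passing `clean` without `metadata` still writes `metadata.json`, which matches the rule that `--clean` implies `--enrich`.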

### Prompts

LLM prompts are bundled with the package under `transcribe_it/prompts/`. To customise, fork the repo and edit `prompts/enrich.md`.

## Commands

| Command | Description |
|---------|-------------|
| `transcribe-it setup` | Configure global credentials (OAuth, LLM, Slack token) |
| `transcribe-it init` | Initialise project config (sources, output path, lookback) |
| `transcribe-it auth gmail` | Authenticate with Gmail (OAuth) |
| `transcribe-it ingest gmail` | Ingest transcripts from Gmail |
| `transcribe-it ingest file PATH` | Ingest a single transcript file |

### Ingest options (Gmail)

| Flag | Description |
|------|-------------|
| `--days N` | How many days back to search |
| `--from YYYY-MM-DD` | Start date |
| `--to YYYY-MM-DD` | End date |
| `--profile NAME` | Gmail auth profile |
| `--dry-run` | List matching emails without processing |
| `--enrich` | Run LLM enrichment (summary, topics, participants) |
| `--clean` | Also generate a cleaned version of the transcript (implies `--enrich`) |

## Configuration files

| Path | Purpose |
|------|---------|
| `.transcripts/config.yaml` | Per-project: sources, lookback, output destinations |
| `~/.config/transcript/env` | Global: API keys and OAuth credentials |
