Metadata-Version: 2.4
Name: mailsense
Version: 0.1.1
Summary: Automated mail intelligence pipeline: Gmail → image extraction → Gemini AI analysis
Author-email: Samapriya Roy <samapriya.roy@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/samapriya/mailsense
Project-URL: Bug Tracker, https://github.com/samapriya/mailsense/issues
Keywords: gmail,usps,informed-delivery,gemini,mail,ocr,ai
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Communications :: Email
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: tqdm>=4.66
Requires-Dist: Pillow>=10.0
Requires-Dist: google-generativeai>=0.7
Requires-Dist: rich>=13.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"

# mailsense

> **Automated mail intelligence pipeline.**
> Gmail → `.mbox` → image extraction → Gemini AI analysis — all from one CLI.

[![PyPI version](https://img.shields.io/pypi/v/mailsense.svg)](https://pypi.org/project/mailsense/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)

---

<p align="center">
  <img src="https://i.imgur.com/1sixczI.png" alt="mailsense icon" width="150">
</p>

## What is mailsense?

**mailsense** turns your USPS Informed Delivery emails (or any image-heavy Gmail label) into clean, structured JSON — automatically.

The pipeline has three stages:

```
Gmail label
    │
    ▼  mailsense download
.mbox files
    │
    ▼  mailsense extract
images + metadata.json
    │
    ▼  mailsense analyze
structured JSON (sender, recipient, postage, document type, summary…)
```

Run each stage independently, or fire them all at once with **`mailsense pipeline`**.

---

## Installation

### From PyPI

```bash
pip install mailsense
```

### From source

```bash
git clone https://github.com/samapriya/mailsense.git
cd mailsense
pip install -e .
```

### Build a wheel for local distribution

```bash
pip install build
python -m build          # produces dist/mailsense-*.whl and dist/mailsense-*.tar.gz
pip install dist/mailsense-*.whl
```

---

## Prerequisites

| Requirement | How to get it |
|---|---|
| **Python 3.10+** | [python.org](https://www.python.org/) |
| **Gmail IMAP enabled** | Gmail → Settings → See all settings → Forwarding and POP/IMAP → Enable IMAP |
| **Gmail App Password** | [myaccount.google.com](https://myaccount.google.com) → Security → 2-Step Verification → App Passwords |
| **Gemini API key** | [aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey) — free tier: 15 RPM / 1500 RPD |

---

## Quick Start

### 1. Store your credentials once

Run the interactive setup wizard — it walks through every setting, masks sensitive input with `*` as you type (paste works too), and lets you press Enter to keep the current or default value:

```bash
mailsense config configure
```

Or set values individually:

```bash
mailsense config set gmail_email     you@gmail.com
mailsense config set gmail_password  YOUR_APP_PASSWORD
mailsense config set gmail_label     "USPS Informed Delivery"
mailsense config set gemini_api_key  YOUR_GEMINI_KEY
```

Credentials are stored in `~/.mailsense` with mode `0600` (readable and writable by the owner only).
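
If you ever edit, copy, or restore `~/.mailsense` by hand, check that the restrictive mode survived. Here is a minimal standard-library sketch. It is not part of mailsense, the function names are hypothetical, and it only makes sense on POSIX systems because Windows largely ignores these permission bits:

```python
import os
import stat
from pathlib import Path


def check_config_permissions(path: Path) -> bool:
    """Return True if the file mode is exactly 0600 (owner read/write only)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode == 0o600


def fix_config_permissions(path: Path) -> None:
    """Restrict the file to owner read/write. Effective on POSIX systems only."""
    os.chmod(path, 0o600)


if __name__ == "__main__":
    cfg = Path.home() / ".mailsense"
    if cfg.exists():
        print(f"{cfg}: mode {oct(stat.S_IMODE(cfg.stat().st_mode))}, "
              f"ok={check_config_permissions(cfg)}")
    else:
        print(f"{cfg} not found; run `mailsense config configure` first")
```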

### 2. Run the full pipeline

```bash
# If gmail_label is saved in config, --label can be omitted:
mailsense pipeline --work-dir ./my_mail

# Or specify the label explicitly:
mailsense pipeline --label "USPS Informed Delivery" --work-dir ./my_mail
```

Results land in:

```
my_mail/
  mbox/          ← grouped .mbox files from Gmail
  images/        ← extracted mail images + metadata.json
  analyzed/      ← one .json per image with AI-extracted data
```

---

## Commands

### `mailsense config`

Manage credentials and defaults stored in `~/.mailsense`.

```bash
# Interactive wizard — prompts for all settings with masked input for secrets
mailsense config configure

# Reconfigure specific keys only
mailsense config configure gmail_email gmail_label

# Show current config (secrets masked)
mailsense config show

# Set a single value
mailsense config set gmail_label     "USPS Informed Delivery"
mailsense config set api_delay       2

# Remove a value
mailsense config unset gmail_password

# List all recognised keys and descriptions
mailsense config keys
```

**Available config keys:**

| Key | Description |
|-----|-------------|
| `gmail_email` | Gmail address used for IMAP |
| `gmail_password` | Gmail App Password |
| `gmail_label` | Gmail label to download (e.g. `USPS Informed Delivery`) |
| `gemini_api_key` | Google Gemini API key |
| `sender_filter` | From-header substring filter used during extraction (default: `usps`) |
| `gemini_model` | Gemini model name (default: `gemini-2.0-flash`) |
| `api_delay` | Seconds between Gemini requests (default: `4`) |

> **`gmail_label` vs `sender_filter`:** `gmail_label` is the Gmail label (folder) that `mailsense download` fetches. `sender_filter` is a substring matched against the `From:` header of each downloaded email during extraction to pick out the right messages; the default `usps` matches addresses such as `informeddelivery@usps.com`. You rarely need to change `sender_filter`.

---

### `mailsense download`

Download emails from a Gmail label to `.mbox` files.

```bash
# Uses gmail_label from config if --label is not specified
mailsense download --output-dir ./mbox

# Override the label explicitly
mailsense download --label "USPS Informed Delivery" --output-dir ./mbox

# Last 90 days only, grouped by week
mailsense download --start 90d --group-by week

# Specific date range, 14-day chunks
mailsense download \
  --start 01-01-2025 \
  --end   12-31-2025 \
  --group-by days \
  --days-per-file 14

# List all available Gmail labels
mailsense download --list-labels
```

**Options:**

| Flag | Default | Description |
|------|---------|-------------|
| `--label`, `-l` | config `gmail_label` / prompted | Gmail label to download |
| `--email`, `-e` | config `gmail_email` / prompted | Gmail address |
| `--password`, `-p` | config `gmail_password` / prompted (masked) | App Password |
| `--output-dir`, `-o` | `mbox_export` | Directory for `.mbox` files |
| `--start` | — | Start date (`MM-DD-YYYY`, `YYYY-MM-DD`, `MM/DD/YYYY`, or relative `90d`) |
| `--end` | — | End date (inclusive) |
| `--group-by` | `month` | `month` / `week` / `days` / `single` / `individual` |
| `--days-per-file` | `7` | Window size when `--group-by=days` |
| `--list-labels` | — | Print all Gmail labels and exit |
| `--no-resume` | — | Re-download everything, ignoring previous state |

**Date formats:** `02-25-2026` · `2026-02-25` · `02/25/2026` · `90d` (relative)

---

### `mailsense extract`

Extract images from `.mbox` files, writing `metadata.json`.

```bash
# Single .mbox file
mailsense extract --input-dir inbox.mbox --output-dir ./images

# Entire folder of .mbox files (batch mode — one subdirectory per mbox)
mailsense extract --input-dir ./mbox_export --output-dir ./images

# Dry run — scan without writing
mailsense extract --input-dir ./mbox_export --dry-run
```

**Options:**

| Flag | Default | Description |
|------|---------|-------------|
| `--input-dir`, `-i` | required | `.mbox` file or directory of `.mbox` files |
| `--output-dir`, `-o` | `img_extracts` | Root output directory |
| `--dry-run`, `-n` | — | Scan without writing |
| `--log-level` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
| `--log-file` | — | Write full debug log to a file |

The sender filter is read from config (`sender_filter`, default `usps`) and applied automatically — no flag needed.

**Image filtering rules (applied in order):**

1. Extension must be `.jpg` `.jpeg` `.png` `.gif` `.webp` `.bmp` `.tiff`
2. Filename must **not** start with `content`, `mailer`, or `ra` (email chrome/trackers)
3. File must be ≥ 1 KB (eliminates tracking pixels)
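
The three rules above translate directly into a small predicate. This is a sketch, not the code in `extract.py`. Case-insensitive extension and prefix checks, and 1 KB meaning 1024 bytes, are assumptions:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}
BLOCKED_PREFIXES = ("content", "mailer", "ra")  # email chrome / trackers
MIN_SIZE_BYTES = 1024  # "1 KB", assumed to mean 1024 bytes


def keep_image(filename: str, payload: bytes) -> bool:
    """Apply the three filtering rules in order; return True to keep the image."""
    name = Path(filename).name
    # Rule 1: extension allow-list.
    if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
        return False
    # Rule 2: drop known chrome/tracker filename prefixes.
    if name.lower().startswith(BLOCKED_PREFIXES):
        return False
    # Rule 3: drop tiny files such as tracking pixels.
    return len(payload) >= MIN_SIZE_BYTES
```

In this sketch, a plain prefix check on `ra` also drops names like `raw_scan.jpg`. The README doesn't say whether mailsense's rule is broader or narrower than that.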

---

### `mailsense analyze`

Analyze mail images with Gemini AI, producing structured JSON.

```bash
mailsense analyze \
  --input-dir  ./images \
  --output-dir ./analyzed

# Custom model and delay (paid tier — faster)
mailsense analyze \
  --input-dir  ./images \
  --output-dir ./analyzed \
  --model gemini-1.5-pro \
  --delay 1

# Dry run
mailsense analyze --input-dir ./images --output-dir ./analyzed --dry-run
```

**Options:**

| Flag | Default | Description |
|------|---------|-------------|
| `--input-dir`, `-i` | required | Output directory from the extract stage |
| `--output-dir`, `-o` | required | Directory for analyzed JSON files |
| `--api-key`, `-k` | config `gemini_api_key` / `GEMINI_API_KEY` env var | Gemini API key |
| `--model`, `-m` | config `gemini_model` (`gemini-2.0-flash`) | Gemini model |
| `--delay`, `-d` | config `api_delay` (`4`) | Seconds between API calls |
| `--dry-run`, `-n` | — | Show what would be processed |

**Rate limits:**

| Tier | RPM | RPD | Recommended `--delay` |
|------|-----|-----|-----------------------|
| Free | 15 | 1500 | `4` (default) |
| Paid | 1000+ | — | `1` or lower |
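
The recommended delays come from simple arithmetic: staying under a requests-per-minute limit needs at least `60 / RPM` seconds between calls, and a requests-per-day cap bounds how many images you can analyze per day. Here is a small sketch, assuming one Gemini request per image (the "one `.json` per image" output suggests this but the README doesn't state it):

```python
import math


def min_delay_seconds(rpm: int) -> float:
    """Smallest per-request delay that stays within a requests-per-minute limit."""
    return 60.0 / rpm


def days_needed(num_images: int, rpd: int) -> int:
    """Days needed to analyze num_images under a requests-per-day cap (one request per image assumed)."""
    return math.ceil(num_images / rpd)


print(min_delay_seconds(15))    # 4.0, the free-tier default --delay
print(days_needed(3200, 1500))  # 3
```

At the default 4 s delay, a full 1500-request free-tier day takes about 100 minutes of wall-clock time, plus the latency of each response.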

**Output format (per image):**

```json
{
  "status": "Processed",
  "is_marketing": false,
  "sender": {
    "name": "Capital One",
    "organization": "Capital One Bank",
    "address": { "street": "PO Box 30281", "city": "Salt Lake City", "state": "UT", "zip_code": "84130" }
  },
  "recipient": { "name": "Jane Smith", "address": { ... } },
  "postage_details": { "type": "First Class Mail", "status": "Delivered" },
  "document_info": { "document_type": "Credit Card Statement", "reference_numbers": ["..."] },
  "content_summary": "Monthly credit card statement showing account balance and minimum payment due.",
  "filename": "image_3_a1b2c3d4.jpg",
  "mail_metadata": {
    "date": "Thu, 12 Feb 2026 08:00:00 -0500",
    "subject": "Your USPS Informed Delivery Daily Digest",
    "from": "USPS Informed Delivery <InformedDelivery@informeddelivery.usps.com>"
  }
}
```
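
Because each result is plain JSON, post-processing needs nothing beyond the standard library. The sketch below is a hypothetical helper, not a mailsense command. It counts non-marketing mail by document type and relies only on fields shown in the sample above (`is_marketing`, `document_info.document_type`). It assumes results end in `.json` and may sit in subdirectories of the output directory:

```python
import json
from collections import Counter
from pathlib import Path


def summarize(analyzed_dir: Path) -> Counter:
    """Count non-marketing mail pieces by document type across analyzed JSON files."""
    counts: Counter = Counter()
    if not analyzed_dir.is_dir():
        return counts
    for path in sorted(analyzed_dir.rglob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip malformed or partially written files
        if not isinstance(record, dict) or record.get("is_marketing"):
            continue
        doc_type = (record.get("document_info") or {}).get("document_type") or "Unknown"
        counts[doc_type] += 1
    return counts


if __name__ == "__main__":
    for doc_type, n in summarize(Path("./analyzed")).most_common():
        print(f"{n:4d}  {doc_type}")
```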

---

### `mailsense pipeline`

Run all three stages end-to-end from a single command.

```bash
# Uses gmail_label from config — no --label needed
mailsense pipeline --work-dir ./mail_run

# Override the label explicitly
mailsense pipeline --label "USPS Informed Delivery" --work-dir ./mail_run

# Last 90 days; --skip-download reuses the .mbox files already in ./mail_run/mbox
mailsense pipeline \
  --start    90d \
  --work-dir ./mail_run \
  --skip-download

# Dry run of the entire pipeline
mailsense pipeline --dry-run
```

**Workflow output structure:**

```
mail_run/
  mbox/         ← grouped .mbox files
  images/       ← images + metadata.json per mbox
  analyzed/     ← one .json per image
```

**Resume behaviour:** Each stage is individually resumable — already-processed files are skipped automatically on re-runs. Use `--no-resume` to force a clean download, or `--skip-download` / `--skip-extract` / `--skip-analyze` to selectively re-run only the stages you need.
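
One common way to implement this kind of skip-if-done resume is to treat an image as processed once its result file exists. The sketch below shows that idea for the analyze stage. It is an illustration only, and the `<image stem>.json` naming it relies on is an assumption, not something the README specifies:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}


def pending_images(images_dir: Path, analyzed_dir: Path) -> list[Path]:
    """Return images that have no <stem>.json result in analyzed_dir yet.

    The one-JSON-per-image naming scheme is an assumption for this sketch.
    """
    done = {p.stem for p in analyzed_dir.rglob("*.json")} if analyzed_dir.is_dir() else set()
    return [
        p
        for p in sorted(images_dir.rglob("*"))
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS and p.stem not in done
    ]
```

Deleting a result file under this scheme would cause just that image to be re-analyzed on the next run.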

**All options:**

| Flag | Default | Description |
|------|---------|-------------|
| `--work-dir`, `-w` | `mailsense_run` | Root directory for all pipeline outputs |
| `--label`, `-l` | config `gmail_label` / prompted | Gmail label to download |
| `--email`, `-e` | config `gmail_email` / prompted | Gmail address |
| `--password`, `-p` | config `gmail_password` / prompted | App Password |
| `--start` | — | Start date |
| `--end` | — | End date (inclusive) |
| `--group-by` | `month` | `.mbox` grouping strategy |
| `--days-per-file` | `7` | Window size when `--group-by=days` |
| `--no-resume` | — | Re-download everything |
| `--api-key`, `-k` | config / `GEMINI_API_KEY` env var | Gemini API key |
| `--model` | config / `gemini-2.0-flash` | Gemini model |
| `--delay` | config / `4` | Seconds between Gemini requests |
| `--dry-run`, `-n` | — | Dry run all stages |
| `--skip-download` | — | Skip download, use existing .mbox files |
| `--skip-extract` | — | Skip extract, use existing images |
| `--skip-analyze` | — | Skip analyze |

---

## Building for Distribution

```bash
# Install build tools
pip install build twine

# Build sdist + wheel
python -m build

# Inspect the wheel contents
python -m zipfile -l dist/mailsense-0.1.1-py3-none-any.whl

# Upload to TestPyPI first (recommended)
twine upload --repository testpypi dist/*

# Then upload to PyPI
twine upload dist/*
```
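
Before uploading, you can automate the "inspect the wheel" step with a small script. This is a sketch: the script name (say, `check_wheel.py`) is hypothetical, and the expected paths are copied from the Project Structure listing in this README, so update them if the layout changes:

```python
import sys
import zipfile

# Module paths taken from the Project Structure listing in this README.
EXPECTED = {
    "mailsense/__init__.py",
    "mailsense/cli.py",
    "mailsense/config.py",
    "mailsense/commands/__init__.py",
    "mailsense/commands/config_cmd.py",
    "mailsense/commands/download.py",
    "mailsense/commands/extract.py",
    "mailsense/commands/analyze.py",
    "mailsense/commands/pipeline.py",
}


def missing_modules(wheel_path: str) -> set[str]:
    """Return expected module paths that are absent from the wheel archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return EXPECTED - set(wheel.namelist())


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = missing_modules(sys.argv[1])
    print("OK" if not missing else f"Missing: {sorted(missing)}")
    sys.exit(1 if missing else 0)
```

Run it as `python check_wheel.py dist/mailsense-0.1.1-py3-none-any.whl`. A non-zero exit status makes it easy to gate the `twine upload` step in a script.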

---

## Project Structure

```
mailsense/
├── mailsense/
│   ├── __init__.py          # version, metadata
│   ├── cli.py               # argparse root + dispatcher
│   ├── config.py            # ~/.mailsense read/write + interactive wizard
│   └── commands/
│       ├── __init__.py
│       ├── config_cmd.py    # mailsense config
│       ├── download.py      # mailsense download — Gmail IMAP → .mbox
│       ├── extract.py       # mailsense extract — .mbox → images
│       ├── analyze.py       # mailsense analyze — images → JSON
│       └── pipeline.py      # mailsense pipeline — end-to-end
├── tests/
│   └── test_config.py
├── pyproject.toml
├── LICENSE
└── README.md
```

---

## Environment Variables

| Variable | Equivalent config key |
|----------|-----------------------|
| `GEMINI_API_KEY` | `gemini_api_key` |

CLI flags always take precedence over both config file and environment variables.

---

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Make your changes and add tests
4. Run tests: `pytest`
5. Submit a pull request

---

## License

Copyright 2026 Samapriya Roy. Licensed under the [Apache License 2.0](LICENSE).
