Metadata-Version: 2.4
Name: wst-library
Version: 0.8.2
Summary: CLI tool for organizing books and PDFs with AI-powered metadata
Author: cnexans
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://github.com/cnexans/wst
Project-URL: Repository, https://github.com/cnexans/wst
Project-URL: Issues, https://github.com/cnexans/wst/issues
Keywords: pdf,books,library,metadata,cli,organizer
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0
Requires-Dist: pymupdf>=1.24
Requires-Dist: pydantic>=2.0
Requires-Dist: InquirerPy>=0.3
Requires-Dist: PyYAML>=6.0
Provides-Extra: ocr
Requires-Dist: ocrmypdf>=16.0; extra == "ocr"
Provides-Extra: s3
Requires-Dist: boto3>=1.28; extra == "s3"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Dynamic: license-file

# wst — Wan Shi Tong

<div align="center">

<img src="docs/images/wan-shi-tong.png" alt="Wan Shi Tong" width="300">

*"I am Wan Shi Tong, he who knows ten thousand things."*

<sub>Character from Avatar: The Last Airbender. Avatar: The Last Airbender is a trademark of Viacom International Inc. Image used for illustrative purposes only.</sub>

</div>

---

CLI tool for organizing books and PDFs with AI-powered metadata generation.

Named after **Wan Shi Tong**, the ancient spirit who collected every piece of knowledge in the world and guarded the great library in the desert. This tool aspires to do the same for your PDFs — just with less hostility toward humans.

## Features

- **AI-powered metadata**: Automatically extracts and completes metadata (title, author, type, year, summary, tags, etc.) using Claude CLI with web search for missing fields (year, ISBN, publisher)
- **OCR support**: Optionally OCR scanned PDFs before ingestion to extract text from image-based documents
- **Metadata enrichment**: Fill in missing fields (ISBN, table of contents, publisher, year) on existing documents using AI + web search, individually or in batch
- **Organized library**: Files sorted by type (`books/`, `papers/`, `notes/`, `exercises/`, `guides/`) with consistent naming (`Author - Title (Year).pdf`)
- **SQLite search index**: Full-text search across title, author, tags, subject, and summary via FTS5
- **Coverage stats**: See metadata completeness across your library, broken down by document type and field
- **Interactive browser**: Fuzzy-search your library and view or edit metadata interactively
- **Cloud backup**: Back up files to iCloud Drive or S3, with an extensible provider system
- **Extensible backends**: Abstract layers for AI (Claude CLI, Codex CLI; API/SDK backends planned) and storage (local filesystem, S3)

## Installation

### pipx (recommended, all platforms)

```bash
pipx install wst-library
```

### pip

```bash
pip install wst-library
```

### Homebrew (macOS/Linux)

```bash
brew tap cnexans/tap
brew install wst
```

### Chocolatey (Windows)

```powershell
choco install wst
```

### From source

```bash
git clone https://github.com/cnexans/wst.git
cd wst
make install
```

## Quick Start

```bash
# Ingest PDFs from a folder
wst ingest ~/Documents/papers/

# Ingest from default inbox (~/wst/inbox/)
wst ingest

# Ingest with OCR for scanned PDFs
wst ingest --ocr

# Ingest with manual confirmation for each file
wst ingest --confirm

# Re-ingest files with fresh AI metadata
wst ingest --reprocess

# Search
wst search "machine learning"
wst search --author "Knuth"
wst search --type textbook

# List and show
wst list
wst list --type paper --sort year
wst show 1

# Edit metadata
wst edit 1
wst edit "Player's Handbook"
wst edit 42 --enrich              # fill missing fields with AI + web search

# Enrich missing metadata in batch
wst fix --dry-run                 # preview what needs fixing
wst fix --type textbook           # fix all textbooks
wst fix --field isbn --field toc  # only fill ISBN and TOC
wst fix -y                        # auto-accept all changes

# Metadata coverage stats
wst stats
wst stats --type textbook

# Interactive browser
wst browse

# Backup
wst backup icloud
wst backup s3
```

## Commands

| Command | Description |
|---------|-------------|
| `wst ingest [PATH]` | Ingest PDFs, generate metadata with AI. Options: `--ocr`, `--confirm`, `--reprocess`, `--verbose` |
| `wst search <query>` | Full-text search. Options: `--author`, `--type`, `--subject` |
| `wst list` | List all documents. Options: `--type`, `--sort` |
| `wst show <id-or-title>` | Show complete metadata for a document |
| `wst edit <id-or-title>` | Edit metadata interactively, or use `--enrich` to fill missing fields with AI |
| `wst fix` | Batch enrich documents with missing metadata. Options: `--type`, `--field`, `--dry-run`, `-y` |
| `wst stats` | Show metadata coverage statistics. Options: `--type` |
| `wst browse` | Interactive TUI for browsing and editing documents |
| `wst ocr <id-or-path>` | Run OCR on a scanned PDF, given a library ID or a file path |
| `wst backup [provider]` | Back up files to iCloud Drive or S3 |
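
Several commands accept either a numeric ID or a title (`wst show 1`, `wst edit "Player's Handbook"`). The matching rules are not documented here, so the sketch below assumes one plausible scheme: an all-digit argument is an ID; anything else is matched case-insensitively against titles, exact match first, then a unique substring. `Doc` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: int
    title: str


def resolve(arg: str, docs: list[Doc]) -> Doc:
    """Resolve an <id-or-title> argument under the assumed rules above."""
    if arg.isdigit():
        for doc in docs:
            if doc.id == int(arg):
                return doc
        raise LookupError(f"no document with id {arg}")
    needle = arg.casefold()
    # Prefer an exact (case-insensitive) title match.
    exact = [d for d in docs if d.title.casefold() == needle]
    if exact:
        return exact[0]
    # Fall back to a substring match, but only if it is unambiguous.
    partial = [d for d in docs if needle in d.title.casefold()]
    if len(partial) == 1:
        return partial[0]
    if not partial:
        raise LookupError(f"no title matches {arg!r}")
    raise LookupError(f"{len(partial)} titles match {arg!r}; use an id instead")


library = [Doc(1, "Player's Handbook"), Doc(42, "Dungeon Master's Guide")]
print(resolve("42", library).title)               # Dungeon Master's Guide
print(resolve("player's handbook", library).id)   # 1
```

One consequence of this rule: a purely numeric title such as *1984* would be read as an ID, so a real implementation needs a tie-breaker for that case.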

## How Ingestion Works

```
PDF file → [OCR (optional)] → Extract text + PDF metadata → AI generates metadata → Store + Index
```

1. **OCR** (optional, `--ocr`): Scanned PDFs are processed with `ocrmypdf` to extract text from images before metadata generation.
2. **Text extraction**: Reads existing PDF metadata and text from the first pages using PyMuPDF.
3. **AI metadata generation**: Sends the text sample to the AI backend (Claude CLI by default), which analyzes the content and uses web search to find ISBN, publisher, year, and other fields.
4. **Storage**: Files are moved to the library, organized by document type with consistent naming (`Author - Title (Year).pdf`).
5. **Indexing**: Metadata is stored in SQLite with full-text search (FTS5).
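
Step 5 relies on SQLite's FTS5 extension, which ships with the `sqlite3` module in most Python builds. The sketch below shows the general technique: one FTS5 virtual table over the searchable fields, queried with `MATCH`. The table name and schema are assumptions, not wst's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one FTS5 table over the fields wst says it searches.
conn.execute(
    "CREATE VIRTUAL TABLE documents_fts USING fts5(title, author, tags, subject, summary)"
)
conn.executemany(
    "INSERT INTO documents_fts (rowid, title, author, tags, subject, summary) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "The Art of Computer Programming", "Donald Knuth", "algorithms",
         "computer science", "Classic multi-volume work on algorithms."),
        (2, "Pattern Recognition and Machine Learning", "Christopher Bishop",
         "machine learning, statistics", "computer science",
         "Bayesian treatment of machine learning."),
    ],
)


def search(match_expr: str) -> list[tuple[int, str]]:
    """Run an FTS5 MATCH query and return (rowid, title), best match first."""
    rows = conn.execute(
        "SELECT rowid, title FROM documents_fts WHERE documents_fts MATCH ? ORDER BY rank",
        (match_expr,),
    )
    return rows.fetchall()


print(search("machine learning"))  # [(2, 'Pattern Recognition and Machine Learning')]
print(search("author:knuth"))      # [(1, 'The Art of Computer Programming')]
```

FTS5's default tokenizer is case-insensitive, and the `author:` prefix is FTS5's column-filter syntax. A flag like `--author` could map onto it, though how wst builds its queries is not stated here.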

After ingestion, use `wst fix` to batch-enrich documents that are missing fields (ISBN, table of contents, etc.) — this is especially useful for scanned books where the initial AI pass may not have found all metadata.
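
Both `wst fix --dry-run` and `wst stats` come down to checking which fields are empty. A minimal sketch, assuming documents are plain dicts and that `None` or an empty string counts as missing; the rows and helper names are hypothetical, and wst would read from its SQLite index instead.

```python
from collections import defaultdict

# Hypothetical rows standing in for the SQLite index.
docs = [
    {"id": 1, "type": "textbook", "isbn": "978-0201896831", "toc": None, "publisher": "Addison-Wesley"},
    {"id": 2, "type": "textbook", "isbn": None, "toc": None, "publisher": None},
    {"id": 3, "type": "paper", "isbn": None, "toc": "1. Introduction", "publisher": None},
]
FIELDS = ["isbn", "toc", "publisher"]


def missing_fields(doc: dict, wanted: list[str]) -> list[str]:
    """Return the fields in `wanted` that are empty for this document."""
    return [f for f in wanted if not doc.get(f)]


def coverage_by_type(docs: list[dict], wanted: list[str]) -> dict[str, dict[str, float]]:
    """Percentage of documents of each type that have each field filled in."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for doc in docs:
        groups[doc["type"]].append(doc)
    return {
        doc_type: {f: 100 * sum(bool(d.get(f)) for d in group) / len(group) for f in wanted}
        for doc_type, group in groups.items()
    }


# Dry-run style preview: textbooks missing ISBN or TOC.
for doc in docs:
    if doc["type"] == "textbook":
        gaps = missing_fields(doc, ["isbn", "toc"])
        if gaps:
            print(doc["id"], gaps)
# 1 ['toc']
# 2 ['isbn', 'toc']

print(coverage_by_type(docs, FIELDS))
```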

## Library Structure

```
~/wst/
├── inbox/           # PDFs pending ingestion
└── library/
    ├── books/       # book, novel, textbook
    ├── papers/      # paper
    ├── notes/       # class-notes
    ├── exercises/   # exercises
    ├── guides/      # guide-theory, guide-practice
    └── wst.db       # SQLite index
```

## Documentation

See [docs/README.md](docs/README.md) for architecture details and diagrams.

## Requirements

- Python 3.11+
- AI backend (at least one):
  - `claude` CLI (authenticated) — default backend
  - `codex` CLI (authenticated) — use with `wst -b codex`
- macOS, Windows, or Linux
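
Selecting a backend with `-b` suggests a small registry keyed by backend name. The sketch below illustrates that pattern only. The class names are hypothetical, and the command lines (`claude -p`, `codex exec`) are assumed non-interactive invocations rather than the exact commands wst runs.

```python
import shlex
from typing import Protocol


class AIBackend(Protocol):
    name: str

    def command(self, prompt: str) -> list[str]:
        """Argument vector used to invoke the backend CLI."""
        ...


class ClaudeCLI:
    name = "claude"

    def command(self, prompt: str) -> list[str]:
        # Assumed invocation: Claude Code's non-interactive print mode.
        return ["claude", "-p", prompt]


class CodexCLI:
    name = "codex"

    def command(self, prompt: str) -> list[str]:
        # Assumed invocation: Codex CLI's non-interactive exec subcommand.
        return ["codex", "exec", prompt]


BACKENDS: dict[str, AIBackend] = {b.name: b for b in (ClaudeCLI(), CodexCLI())}


def get_backend(name: str = "claude") -> AIBackend:
    """Look up a backend by the value given to -b; claude is the default."""
    try:
        return BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; choose from {sorted(BACKENDS)}") from None


print(shlex.join(get_backend("codex").command("Extract metadata")))
# codex exec 'Extract metadata'
```

A caller would pass the returned list to `subprocess.run(..., capture_output=True, text=True, check=True)`. Keeping command construction separate from execution makes each backend easy to test without the CLI installed.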

## Releasing

To publish a new version to PyPI:

```bash
# 1. Bump version in pyproject.toml
# 2. Trigger the release workflow from GitHub Actions:
gh workflow run "Create Tag and Release" \
  --field version="X.Y.Z" \
  --field release_notes="Release notes here"
```

This creates a git tag and a GitHub Release, then publishes the package to PyPI automatically.

## License

MIT with Commons Clause — free to use, modify, and distribute. Commercial sale rights reserved to the author. See [LICENSE](LICENSE).
