Metadata-Version: 2.4
Name: cleanmybooks
Version: 0.1.0
Summary: Clean and standardize messy book filenames using LLM + Google Books
Author-email: Tenali Rama <tenalirama.krishna125@gmail.com>
License: MIT
Keywords: books,ebooks,filenames,rename,llm,cli
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Requires-Dist: tqdm>=4.64
Requires-Dist: python-dotenv>=1.0
Dynamic: license-file

# 📚 CleanMyBooks

**Clean and standardize messy book filenames using an LLM and the Google Books API.**

CleanMyBooks takes chaotic ebook filenames like `python.crash.course.2ndEd_FINAL_v2.pdf` and renames them to a clean, consistent format:

```
Eric Matthes - Python Crash Course (2019).pdf
```

---

## Features

- 🧠 **LLM-powered parsing** via OpenRouter (GPT-4o-mini by default)
- 🔍 **Google Books verification** for authoritative metadata
- 📊 **Confidence scoring**: falls back to the LLM result if the Google Books match is weak
- ⚡ **Parallel processing** with configurable thread workers
- 💾 **JSON caching**: skips files that were already processed on a previous run
- 🛡️ **Safe renaming**: dry-run mode, collision-safe, no overwrites
- 📝 **Supports**: `.pdf`, `.epub`, `.mobi`, `.azw`, `.azw3`
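
The collision-safe, no-overwrite behavior can be pictured with a small sketch. This is not CleanMyBooks' actual code: `unique_path` and the ` (1)`, ` (2)` suffix scheme are hypothetical, chosen only to show one way a rename can avoid clobbering an existing file.

```python
from pathlib import Path


def unique_path(target: Path) -> Path:
    """Return `target` if it is free, else the first free 'stem (n).ext' variant.

    Hypothetical helper; the numbering scheme is an assumption for illustration.
    """
    if not target.exists():
        return target
    counter = 1
    while True:
        candidate = target.with_name(f"{target.stem} ({counter}){target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```

A caller would rename to `unique_path(destination)` rather than to `destination` directly, so an existing file is never replaced.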

---

## Installation

### From source

```bash
git clone https://github.com/yourusername/cleanmybooks.git
cd cleanmybooks
pip install -e .
```

### From PyPI *(once published)*

```bash
pip install cleanmybooks
```

---

## Setup

1. Copy the example env file:

```bash
cp .env.example .env
```

2. Add your OpenRouter API key to `.env`:

```
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Get a free API key at [openrouter.ai/keys](https://openrouter.ai/keys).

---

## Usage

### Basic — rename all books in a folder

```bash
cleanmybooks /path/to/my/ebooks
```

### Dry run — preview changes without renaming

```bash
cleanmybooks /path/to/my/ebooks --dry-run
```

### Verbose output with more workers

```bash
cleanmybooks /path/to/my/ebooks --workers 8 --verbose
```
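
As a rough illustration of what `--workers` controls, the sketch below fans work out over a thread pool from the standard library. The names `process_all` and `process_one` are hypothetical, and using `ThreadPoolExecutor` is an assumption about how the threads might be managed, not a description of the tool's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(filenames, process_one, workers=4):
    """Apply `process_one` to each filename using a pool of `workers` threads.

    `process_one` stands in for the per-file work (LLM call, Google Books
    lookup); both names are placeholders for illustration.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(process_one, filenames))
```

Because the per-file work is mostly waiting on network responses, threads are a reasonable fit here, though more workers also means more concurrent API requests.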

### Adjust confidence threshold

```bash
cleanmybooks /path/to/my/ebooks --confidence-threshold 0.75
```

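A lower threshold accepts weaker Google Books matches, so the Google result is used more often. A higher threshold requires a stronger Google Books match and falls back to the LLM result more often.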
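
The threshold rule from the How It Works flow (score ≥ threshold uses the Google result, otherwise the LLM result) can be sketched as follows. `choose_metadata` and the dictionary shape are hypothetical; only the `0.6` default and the comparison direction come from this README.

```python
from typing import Optional


def choose_metadata(llm: dict, google: Optional[dict], score: float,
                    threshold: float = 0.6) -> dict:
    """Use the Google Books record only when its score reaches the threshold.

    Hypothetical helper: `llm` and `google` are assumed to be parsed metadata
    dicts, and `google` is None when the lookup returned nothing.
    """
    if google is not None and score >= threshold:
        return google
    return llm
```

With a score of `0.7`, the default threshold selects the Google record, while `--confidence-threshold 0.75` would fall back to the LLM result.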

### Clear the cache

```bash
cleanmybooks --clear-cache
```
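
For orientation, a JSON cache of this kind could look roughly like the sketch below. The function names and the idea of one JSON object per cache file are assumptions; only the default path `~/.cleanmybooks_cache.json` is taken from the CLI options.

```python
import json
from pathlib import Path

# Default location as listed for --cache-file.
DEFAULT_CACHE = Path.home() / ".cleanmybooks_cache.json"


def load_cache(path: Path = DEFAULT_CACHE) -> dict:
    """Return the cached mapping, or an empty dict if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_cache(cache: dict, path: Path = DEFAULT_CACHE) -> None:
    """Write the whole cache back to disk as pretty-printed JSON."""
    path.write_text(json.dumps(cache, indent=2), encoding="utf-8")


def clear_cache(path: Path = DEFAULT_CACHE) -> None:
    """Delete the cache file; do nothing if it does not exist."""
    path.unlink(missing_ok=True)
```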

---

## CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `folder` | *(required)* | Directory containing book files; not needed with `--clear-cache` |
| `--dry-run` | `False` | Preview changes without renaming |
| `--workers N` | `4` | Number of parallel threads |
| `--confidence-threshold FLOAT` | `0.6` | Minimum similarity score (0 to 1) required to use the Google Books result |
| `--verbose` | `False` | Enable debug logging |
| `--cache-file PATH` | `~/.cleanmybooks_cache.json` | Custom cache file location |
| `--clear-cache` | — | Clear cache and exit |

---

## Example Input/Output

| Original Filename | Cleaned Filename |
|---|---|
| `python.crash.course.2ndEd.pdf` | `Eric Matthes - Python Crash Course (2019).pdf` |
| `DUNE_frank_herbert_scanned.epub` | `Frank Herbert - Dune (1965).epub` |
| `clean_code_uncle_bob.pdf` | `Robert C. Martin - Clean Code (2008).pdf` |
| `atomic_habits_james_clear_2018.epub` | `James Clear - Atomic Habits (2018).epub` |
| `unknown_book_v3_FINAL.pdf` | `Unknown Author - Unknown Title.pdf` *(graceful fallback)* |

---

## Output Format

```
Author - Title (Year).ext
```

Multi-author books are collapsed to:

```
First Author et al. - Title (Year).ext
```
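
A minimal sketch of how such a name could be assembled is shown below. `format_filename` is a hypothetical helper, and the set of stripped characters is an assumption; the `et al.` collapsing and the `Unknown Author - Unknown Title` fallback follow the examples in this README.

```python
import re
from typing import Optional, Sequence

# Characters commonly rejected by filesystems (assumed set, for illustration).
_INVALID = re.compile(r'[<>:"/\\|?*]')


def format_filename(authors: Sequence[str], title: str,
                    year: Optional[int], ext: str) -> str:
    """Build 'Author - Title (Year).ext'; several authors become 'First et al.'."""
    if not authors:
        author = "Unknown Author"
    elif len(authors) == 1:
        author = authors[0]
    else:
        author = f"{authors[0]} et al."
    name = f"{author} - {title or 'Unknown Title'}"
    if year:
        name += f" ({year})"  # the year is omitted when it is unknown
    name = _INVALID.sub("", name)              # drop unsafe characters
    name = re.sub(r"\s+", " ", name).strip()   # collapse leftover whitespace
    return f"{name}{ext}"
```

For example, `format_filename(["Eric Matthes"], "Python Crash Course", 2019, ".pdf")` returns `Eric Matthes - Python Crash Course (2019).pdf`.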

---

## How It Works

```
filename.pdf
    │
    ▼
[LLM via OpenRouter]
    │  Parse: title, authors, year
    ▼
[Google Books API]
    │  Verify and enrich metadata
    ▼
[Confidence Score]
    │  Token-overlap similarity (Jaccard)
    │  ≥ threshold → use Google result
    │  < threshold → fall back to LLM result
    ▼
[Rename]
    │  Sanitize characters
    │  Resolve collisions
    │  Author - Title (Year).ext
    ▼
[Cache] → skip on next run
```
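
The confidence step names token-overlap (Jaccard) similarity. A minimal sketch of that measure follows; the tokenization rule and the choice of which strings to compare (here, two titles) are assumptions, since the README does not specify them.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty strings: treat as no evidence of a match
    return len(ta & tb) / len(ta | tb)
```

Comparing `"Python Crash Course"` with `"Python Crash Course 2nd Edition"` gives 3 shared tokens out of 5 distinct ones, a score of `0.6`, which just meets the default threshold.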

---

## Environment Variables

| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | **Required.** Your OpenRouter API key |

---

## License

MIT
