Metadata-Version: 2.4
Name: contextweave
Version: 0.2.0
Summary: ContextWeave: context-aware document translation with glossary management
Project-URL: Homepage, https://github.com/bot-32142/context-aware-translation
Project-URL: Documentation, https://bot-32142.github.io/context-aware-translation/
Project-URL: Repository, https://github.com/bot-32142/context-aware-translation
Project-URL: Issues, https://github.com/bot-32142/context-aware-translation/issues
License-Expression: GPL-3.0-only
License-File: LICENSE
Keywords: documents,glossary,llm,ocr,pdf,translation
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Environment :: X11 Applications :: Qt
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.12
Requires-Dist: defusedxml>=0.7
Requires-Dist: faiss-cpu>=1.8.0
Requires-Dist: google-genai>=1.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: jellyfish>=1.0.0
Requires-Dist: numpy<2.0.0,>=1.26.4
Requires-Dist: openai>=1.0.0
Requires-Dist: opencc-python-reimplemented>=0.1.7
Requires-Dist: pikepdf>=9.0.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: platformdirs>=4.5.1
Requires-Dist: protobuf>=4.25.0
Requires-Dist: pydantic>=2.9.2
Requires-Dist: pypandoc-binary>=1.13
Requires-Dist: pypdfium2>=4.0.0
Requires-Dist: pyside6>=6.6.0
Requires-Dist: pysubs2>=1.8.1
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: semchunk>=3.2.3
Requires-Dist: superqt>=0.8.0
Requires-Dist: tenacity>=8.0.0
Requires-Dist: tokenizers>=0.15.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.57.3
Description-Content-Type: text/markdown

**English** | [中文](README_ZH.md)

# ContextWeave

**Context-aware document translation / 语境感知文档翻译**

ContextWeave (语络译) is a fully automatic desktop translation app for long novels, books, PDFs, scanned documents, manga, and subtitles. It aims to preserve source formatting while keeping terminology and translation style consistent across the whole work.

[Advanced documentation](https://bot-32142.github.io/context-aware-translation/) covers glossary memory, previous-context injection, format preservation, CLI automation, and advanced use cases.

## Who ContextWeave Is For

- Novel, web novel, and light novel translation
- Long books and documents that need consistent naming and terminology
- Scanned books, PDFs, and manga that need OCR before translation
- People who want a desktop workflow instead of managing prompts by hand

## Why ContextWeave

- Builds a glossary from your source material
- Carries context forward across chapters and pages, with useful summaries injected alongside glossary context
- Preserves original formatting for text-native files
- Handles text, EPUB, PDF, scanned pages, manga, and subtitles in one app

## Install

Current desktop builds are unsigned, so the first launch may show an OS security warning.

### macOS

- Download the latest `.dmg`
- Open it and drag `ContextWeave.app` into `Applications`
- Launch `ContextWeave.app` from `Applications`
- If macOS blocks it because the developer cannot be verified, go to `System Settings` -> `Privacy & Security`
- In the `Security` section, click `Open Anyway` for `ContextWeave.app`, then confirm `Open`

### Windows

- Download the latest `.zip`
- Unzip it anywhere
- Run `ContextWeave.exe`
- If Windows SmartScreen warns that the app is unrecognized, click `More info` -> `Run anyway`

<details>
<summary><strong>Setup</strong></summary>

### 1. Open the projects screen and click `Setup Wizard`

This is the home screen for projects. Use `Setup Wizard` for the quickest first-time setup.

![Projects overview](docs/screenshots/EN/latest_projects_overview.png)

### 2. Choose providers and paste API keys

The wizard collects the providers it needs up front. For most users, `DeepSeek` + `Gemini` is the most practical starting point.

![Setup wizard provider selection](docs/screenshots/EN/latest_setup_wizard_provider_selection.png)

### 3. Review the workflow profile

The review step shows which connection and model ContextWeave will use for each workflow step.

![Workflow profile review](docs/screenshots/EN/latest_setup_wizard_workflow_profile_review.png)

`Quality` spends more for better reasoning and can become very expensive unless every step uses `DeepSeek`. `Balanced` is the safest default. `Budget` is the cheapest option.

</details>

## Translation

### 1. Create a project

Pick a project name, target language, and workflow profile.

![New project dialog](docs/screenshots/EN/latest_new_project_dialog.png)

### 2. Open the project work page

Import files in reading order so terminology and context stay consistent across the whole book, then click `Translate and Export` to start. Double-click a file if you want to inspect each step manually or retouch images.

![Project work page](docs/screenshots/EN/latest_project_work_overview.png)
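If your chapter files are numbered, a plain alphabetical sort puts `chapter10` before `chapter2`. The sketch below is one way to list files in reading order before importing them. It is a standalone helper, not part of ContextWeave, and the file names are hypothetical.

```python
import re
from pathlib import Path


def reading_order_key(path: Path) -> list:
    """Natural-sort key: digit runs compare as integers, the rest as text."""
    # re.split with a capturing group alternates text, digits, text, digits, ...
    parts = re.split(r"(\d+)", path.name.lower())
    # Odd positions are always the digit runs captured by the group.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def in_reading_order(paths):
    """Return the given paths sorted so that chapter2 comes before chapter10."""
    return sorted((Path(p) for p in paths), key=reading_order_key)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for path in in_reading_order(["chapter10.txt", "chapter2.txt", "chapter1.txt"]):
        print(path.name)
```

This only helps when the numbering in the file names matches the actual reading order, so check the result before importing.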

### 3. Optional: import existing term translations

Open `Terms`, then use `Import Terms` if you already have a terminology list you want ContextWeave to reuse. A simple JSON object like `{"original": "translated"}` is enough.

![Terms overview](docs/screenshots/EN/latest_terms_overview.png)
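As a minimal sketch, a flat term list of that shape can be written from Python like this. The sample terms, the `terms.json` file name, and the UTF-8 encoding are assumptions for illustration. The only thing taken from the description above is the flat `{"original": "translated"}` object.

```python
import json
from pathlib import Path


def write_terms(terms: dict[str, str], path: Path) -> None:
    """Write a flat {"original": "translated"} mapping as a JSON object."""
    # ensure_ascii=False keeps non-ASCII terms readable in the file.
    path.write_text(json.dumps(terms, ensure_ascii=False, indent=2), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical glossary: source term -> preferred translation.
    sample_terms = {
        "le comte de Monte-Cristo": "the Count of Monte Cristo",
        "château d'If": "Château d'If",
    }
    write_terms(sample_terms, Path("terms.json"))
```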

## Demo EPUBs

This sample EPUB was generated with `Translate and Export` directly from the French Project Gutenberg EPUB for [The Count of Monte Cristo, Tome I](https://www.gutenberg.org/ebooks/17989), using `DeepSeek`.

Quality can be dramatically better with `Gemini` or `GPT`, but the cost is also significantly higher.

- [The Count of Monte Cristo.epub](demo/The%20Count%20of%20Monte%20Cristo.epub) - English output. Cost: under `$2.5`.

## CLI

ContextWeave also includes a small CLI for config-driven one-shot translation and basic book management. From a source checkout, use `uv run contextweave-cli`; from an installed package, use `contextweave-cli`.

```bash
contextweave-cli config path
contextweave-cli config init
contextweave-cli config validate

contextweave-cli run ./book.epub --output ./translated/book.epub
contextweave-cli run ./chapter.txt --output ./translated/chapter.txt --json
contextweave-cli run ./episode.srt --output ./translated/episode.srt --no-polish

contextweave-cli books list
contextweave-cli books show BOOK_ID
contextweave-cli books delete BOOK_ID --yes
```
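For batch jobs, the `run` invocations above can be assembled from a script. The helper below is a hypothetical sketch that only builds the argument list. It uses just the `run`, `--output`, `--no-polish`, and `--json` arguments shown above, and assumes the two flags can be combined.

```python
from pathlib import Path


def build_run_command(source, output_dir, no_polish=False, as_json=False):
    """Build the argv for one `contextweave-cli run` call, keeping the file name."""
    source = Path(source)
    output = Path(output_dir) / source.name
    argv = ["contextweave-cli", "run", str(source), "--output", str(output)]
    if no_polish:
        argv.append("--no-polish")  # skip the polish pass
    if as_json:
        argv.append("--json")  # assumption: may be combined with --no-polish
    return argv


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(argv, check=True) to execute for real.
    for name in ["episode01.srt", "episode02.srt"]:  # hypothetical inputs
        print(" ".join(build_run_command(name, "translated", no_polish=True)))
```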

The CLI resolves config from `--config`, then `CONTEXTWEAVE_CONFIG`, then the nearest `contextweave.yaml`/`.contextweave.yaml` walking upward, then the platform default shown by `contextweave-cli config path`. The config mirrors the setup UI: `connections` define provider endpoints and `workflow_profiles` route each translation step. Prefer `api_key_env` so API keys stay in environment variables instead of config files or task snapshots. Use `--no-polish` when a one-shot run should skip the polish pass, which can be useful for timing-sensitive subtitle output.
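The lookup order described above can be sketched as follows. This illustrates the documented precedence and is not the CLI's actual implementation. The function name and parameters are hypothetical, and checking `contextweave.yaml` before `.contextweave.yaml` within the same directory is an assumption.

```python
import os
from pathlib import Path

CONFIG_NAMES = ("contextweave.yaml", ".contextweave.yaml")


def resolve_config(cli_path, start_dir, default_path, env=None):
    """Return the config path chosen by the documented precedence."""
    env = os.environ if env is None else env
    # 1. An explicit --config value wins.
    if cli_path:
        return Path(cli_path)
    # 2. Then the CONTEXTWEAVE_CONFIG environment variable.
    env_path = env.get("CONTEXTWEAVE_CONFIG")
    if env_path:
        return Path(env_path)
    # 3. Then the nearest config file, walking upward from start_dir.
    start = Path(start_dir).resolve()
    for directory in (start, *start.parents):
        for name in CONFIG_NAMES:  # assumption: undotted name is checked first
            candidate = directory / name
            if candidate.is_file():
                return candidate
    # 4. Finally the platform default, i.e. whatever
    #    `contextweave-cli config path` reports, passed in by the caller.
    return Path(default_path)
```

When a run picks up an unexpected config, walking through these four steps by hand is usually the quickest way to find which file is in effect.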

A commented starting point is available at [docs/examples/contextweave-cli.yaml](docs/examples/contextweave-cli.yaml).

## What To Know Before Using ContextWeave

- The setup wizard path is mainly tested with `DeepSeek` + `Gemini`. `Claude` and `GPT` should also work well, but I do not recommend going below `DeepSeek`-class models.
- Image editing is expensive, and hallucinations are still common. For image reembedding, `GPT Image 2` is recommended when available.
- OCR does not preserve the original layout of PDFs and scanned books. The output is rebuilt from the recognized content instead. Manga is the exception.
- Import in reading order if you want the glossary and context to build correctly.
- Samples are still limited because testing across formats is expensive. Bug reports are very welcome.

## Supported Formats

| Type | Import | Export | OCR needed before translation? |
| --- | --- | --- | --- |
| Text | `.txt`, `.md` | `txt` | No |
| PDF | `.pdf` | `epub`, `md` | Yes |
| Scanned book | image files or folders | `epub`, `md` | Yes |
| Manga | `.cbz`, image folders | `cbz` | Yes |
| EPUB | `.epub` | `epub`, `md`, `docx`, `html` | No, but image OCR is supported |
| Subtitle | `.srt`, `.vtt`, `.ass`, `.ssa` | `srt`, `vtt`, `ass`, `ssa` | No |
