Metadata-Version: 2.4
Name: novel-downloader
Version: 3.0.1
Summary: A command-line tool for downloading Chinese web novels from Qidian and similar platforms.
Author-email: Saudade Z <saudadez217@gmail.com>
License: MIT License
        
        Copyright (c) 2025 Saudade Z
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/saudadez21/novel-downloader
Project-URL: Source, https://github.com/saudadez21/novel-downloader
Project-URL: Issues, https://github.com/saudadez21/novel-downloader/issues
Keywords: novel,web novel,qidian,epub,light novel,crawler,scraper,ebook
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Chinese (Simplified)
Classifier: Natural Language :: Chinese (Traditional)
Classifier: Topic :: Utilities
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rich
Requires-Dist: requests
Requires-Dist: pycryptodome
Requires-Dist: aiohttp
Requires-Dist: lxml
Requires-Dist: platformdirs
Provides-Extra: web-ui
Requires-Dist: nicegui; extra == "web-ui"
Provides-Extra: image-utils
Requires-Dist: brotli; extra == "image-utils"
Requires-Dist: numpy; extra == "image-utils"
Requires-Dist: fonttools; extra == "image-utils"
Requires-Dist: pillow; extra == "image-utils"
Provides-Extra: httpx
Requires-Dist: httpx[http2]; extra == "httpx"
Provides-Extra: curl-cffi
Requires-Dist: curl_cffi; extra == "curl-cffi"
Provides-Extra: all-backends
Requires-Dist: httpx[http2]; extra == "all-backends"
Requires-Dist: curl_cffi; extra == "all-backends"
Provides-Extra: all
Requires-Dist: nicegui; extra == "all"
Requires-Dist: brotli; extra == "all"
Requires-Dist: numpy; extra == "all"
Requires-Dist: fonttools; extra == "all"
Requires-Dist: pillow; extra == "all"
Requires-Dist: httpx[http2]; extra == "all"
Requires-Dist: curl_cffi; extra == "all"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-asyncio; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Requires-Dist: pytest-mock; extra == "tests"
Requires-Dist: types-requests; extra == "tests"
Requires-Dist: types-lxml; extra == "tests"
Provides-Extra: ci
Requires-Dist: pytest; extra == "ci"
Requires-Dist: pytest-asyncio; extra == "ci"
Requires-Dist: pytest-cov; extra == "ci"
Requires-Dist: pytest-mock; extra == "ci"
Requires-Dist: types-requests; extra == "ci"
Requires-Dist: types-lxml; extra == "ci"
Requires-Dist: nicegui; extra == "ci"
Requires-Dist: black; extra == "ci"
Requires-Dist: mypy; extra == "ci"
Requires-Dist: ruff; extra == "ci"
Requires-Dist: pre-commit; extra == "ci"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-lxml; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: commitizen; extra == "dev"
Requires-Dist: babel; extra == "dev"
Dynamic: license-file

# novel-downloader

[![PyPI](https://img.shields.io/pypi/v/novel-downloader.svg)](https://pypi.org/project/novel-downloader/)
[![Python](https://img.shields.io/pypi/pyversions/novel-downloader.svg)](https://www.python.org/downloads/)
[![CI](https://github.com/saudadez21/novel-downloader/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/saudadez21/novel-downloader/actions/workflows/ci.yml)
[![Hits-of-Code](https://hitsofcode.com/github/saudadez21/novel-downloader?branch=main&label=Hits-of-Code)](https://hitsofcode.com/github/saudadez21/novel-downloader/view?branch=main&label=Hits-of-Code)

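An asynchronous novel download tool and library. It supports resumable crawling, ad filtering, and export to multiple formats, and offers both a CLI and a web GUI.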

> Requirements: **Python 3.11+** (developed on Python 3.13)

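## Features

### Downloading

* **Resumable downloads**: chapters that are already complete are detected automatically and not fetched again
* **Pluggable HTTP backends**: supports `aiohttp` (default), `httpx`, and `curl_cffi`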
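To make "resumable" concrete, here is a minimal sketch, assuming chapters are keyed by ID in some persistent store (an in-memory `dict` stands in for it here). `download_book` and `fetch_chapter` are hypothetical names used for illustration, not the project's API.

```python
import asyncio


async def fetch_chapter(chapter_id: str) -> str:
    """Placeholder for a real HTTP request (hypothetical)."""
    await asyncio.sleep(0)
    return f"text of {chapter_id}"


async def download_book(chapter_ids: list[str], store: dict[str, str]) -> list[str]:
    """Fetch only chapters missing from `store`; return the IDs fetched in this run."""
    fetched: list[str] = []
    for cid in chapter_ids:
        if cid in store:
            continue  # completed in an earlier run: skip it
        store[cid] = await fetch_chapter(cid)
        fetched.append(cid)
    return fetched


if __name__ == "__main__":
    store = {"c1": "saved earlier"}
    print(asyncio.run(download_book(["c1", "c2", "c3"], store)))  # ['c2', 'c3']
    print(asyncio.run(download_book(["c1", "c2", "c3"], store)))  # []
```

A second run over the same book does no network work, which is what lets an interrupted download pick up where it stopped.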

### Multi-format export

* **TXT**
* **EPUB**
* **HTML**

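### Content cleaning and enhancement

* Ad and promotion filtering
  * Chapter title filtering
  * Chapter content filtering
* Text processing pipeline (processors)
  * Regex cleanup
  * Traditional/Simplified Chinese conversion
  * Machine translation
* Image chapter / obfuscated chapter handling (optional `image-utils` extra)
  * Original image download
  * Watermark removal
  * Image preprocessing
  * Image chapters to text (requires `enable_ocr`)
  * Font obfuscation recovery (requires `enable_ocr`)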
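Title filtering and content filtering can be pictured as two passes over each chapter. The sketch below assumes the rules are plain regular expressions; the patterns (leave notices, announcements, vote-begging lines) and the name `filter_chapters` are hypothetical examples, not the project's built-in rules.

```python
import re

# Hypothetical rules; real filters are site- and configuration-specific
AD_TITLE = re.compile(r"请假|公告|活动")  # leave notice / announcement / promotion
AD_LINE = re.compile(r"求月票|求推荐票")  # lines begging for monthly / recommendation votes


def filter_chapters(chapters: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop chapters whose title looks like an ad, then strip ad lines from the rest."""
    kept: list[tuple[str, str]] = []
    for title, body in chapters:
        if AD_TITLE.search(title):
            continue  # title filter: skip the whole chapter
        lines = [ln for ln in body.splitlines() if not AD_LINE.search(ln)]
        kept.append((title, "\n".join(lines)))  # content filter
    return kept


if __name__ == "__main__":
    chapters = [("第一章 开端", "正文一\n求月票\n正文二"), ("请假一天", "抱歉")]
    print(filter_chapters(chapters))  # [('第一章 开端', '正文一\n正文二')]
```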

### Extensibility

* **Plugin system**: extend site parsing, text processors, exporters, and more
* **Pluggable download backends**: adapt to different HTTP clients
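The idea behind pluggable backends can be sketched with a `typing.Protocol`: download code depends only on a small interface, and a backend name selects the implementation. Everything here (`Fetcher`, `get_text`, `EchoFetcher`, `BACKENDS`) is a hypothetical illustration; the dummy backend stands in for real `aiohttp` / `httpx` / `curl_cffi` wrappers.

```python
import asyncio
from typing import Callable, Protocol


class Fetcher(Protocol):
    """Minimal interface a backend must provide (hypothetical)."""

    async def get_text(self, url: str) -> str: ...


class EchoFetcher:
    """Dummy backend standing in for an aiohttp/httpx/curl_cffi wrapper."""

    async def get_text(self, url: str) -> str:
        return f"<html>{url}</html>"


# Backend name -> factory; other backends would be registered the same way
BACKENDS: dict[str, Callable[[], Fetcher]] = {"echo": EchoFetcher}


def make_fetcher(name: str) -> Fetcher:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


async def fetch_page(backend: str, url: str) -> str:
    # The caller never touches a concrete HTTP client, only the Fetcher interface
    return await make_fetcher(backend).get_text(url)


if __name__ == "__main__":
    print(asyncio.run(fetch_page("echo", "https://example.com/book/1")))
```

Because the protocol is structural, a backend does not need to inherit from anything; it only has to provide a matching `get_text` coroutine.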

### Usage

* **Command line (CLI)**
* **Web GUI**

---

## Installation and updates

Install the latest stable release with `pip`:

```bash
pip install -U novel-downloader
```

To enable font decryption / image-to-text (`enable_ocr`), see: [Installation](https://github.com/saudadez21/novel-downloader/blob/main/docs/1-installation.md)

---

## Quick start

### 0. Set the language (optional)

```bash
# Set the language to Chinese
novel-cli config set-lang zh_CN

# 设置为英文
novel-cli config set-lang en_US
```

### 1. Initialize the configuration file

```bash
# Generate the default config ./settings.toml
novel-cli config init
```

Once `settings.toml` has been generated, you can edit parameters such as `request_interval` and `book_ids`.

For details, see: [settings.toml reference](https://github.com/saudadez21/novel-downloader/blob/main/docs/3-settings-schema.md)

### 2. Command line (CLI)

![cli_download](./docs/images/cli_download.gif)

Common examples:

```bash
# Download from a book page URL (site and book are resolved automatically)
novel-cli download https://www.hetushu.com/book/5763/index.html

# Start downloading the book_ids listed in the config file
novel-cli download --site qidian

# Start a download for a given site and book ID
novel-cli download --site n23qb 12282
```
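The first command implies a mapping from a page URL to the same `(site, book ID)` pair that the third command passes explicitly. One way such resolution can work is a per-host rule table, sketched below; the site key `hetushu`, the path pattern, and the name `resolve` are assumptions, not the project's actual resolver.

```python
import re
from urllib.parse import urlparse

# Hypothetical rules: host -> (site key, regex capturing the book ID from the path)
RULES: dict[str, tuple[str, re.Pattern[str]]] = {
    "www.hetushu.com": ("hetushu", re.compile(r"^/book/(\d+)/")),
}


def resolve(url: str) -> tuple[str, str]:
    """Map a book page URL to (site key, book ID)."""
    parsed = urlparse(url)
    rule = RULES.get(parsed.netloc)
    if rule is None:
        raise ValueError(f"unsupported host: {parsed.netloc}")
    site, pattern = rule
    match = pattern.match(parsed.path)
    if match is None:
        raise ValueError(f"no book ID in path: {parsed.path}")
    return site, match.group(1)


if __name__ == "__main__":
    print(resolve("https://www.hetushu.com/book/5763/index.html"))  # ('hetushu', '5763')
```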

More options:

```bash
novel-cli --help
novel-cli download --help
```

* Supported sites: [Supported sites](https://github.com/saudadez21/novel-downloader/blob/main/docs/4-supported-sites.md)
* More examples: [CLI usage examples](https://github.com/saudadez21/novel-downloader/blob/main/docs/5-cli-usage-examples.md)
* Press `CTRL+C` to cancel a running task

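### 3. Web GUI

The Web GUI depends on extra components (such as NiceGUI) that are not installed with the main program by default.

To use the Web GUI, install the corresponding optional dependencies first.

#### 3.1 Install the Web GUI dependencies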

```bash
pip install "novel-downloader[web-ui]"
```

> If you only need the CLI, you can skip this step.

#### 3.2 Start the Web GUI

```bash
novel-web
```

To allow access from the LAN or the internet (mind your own security and network environment):

```bash
novel-web --listen public
```

While it is running, press `CTRL+C` to stop the server.

#### 3.3 More resources

* Supported sites: [Supported sites](https://github.com/saudadez21/novel-downloader/blob/main/docs/4-supported-sites.md)
* More examples: [Web usage examples](https://github.com/saudadez21/novel-downloader/blob/main/docs/6-web-usage-examples.md)

### 4. Programmatic API

```python
import asyncio
from novel_downloader.plugins import registrar
from novel_downloader.schemas import BookConfig, ClientConfig

async def main() -> None:
    site = "n23qb"

    # Specify the book ID
    book = BookConfig(book_id="12282")

    # Create the client
    cfg = ClientConfig(request_interval=0.5)
    client = registrar.get_client(site, cfg)

    # Run the download inside the async context
    async with client:
        await client.download(book)

    # Export once the download has finished
    client.export(book, formats=["txt", "epub"])

if __name__ == "__main__":
    asyncio.run(main())
```

---

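## Text processing (`processors`)

Before export, a multi-stage processing pipeline can be run, including:

* Regex cleanup (custom ad/watermark removal)
* Traditional/Simplified Chinese conversion (based on [opencc-python](https://github.com/yichen0831/opencc-python))
* Automatic translation (supports translators such as `google` / `edge` / `youdao`)
* Text correction (based on [pycorrector](https://github.com/shibing624/pycorrector))

The processing order is configurable, and intermediate results can be produced for export.

> For detailed configuration examples, see: [processors configuration](./docs/3-settings-schema.md#processors-配置)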
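A pipeline with a configurable order and kept intermediate results can be modeled as an ordered list of named `str -> str` stages. This is a minimal sketch under that assumption; `regex_stage`, `run_pipeline`, and the stage names are hypothetical, and conversion or translation stages would slot in as further callables of the same shape.

```python
import re
from typing import Callable

Stage = Callable[[str], str]


def regex_stage(pattern: str, repl: str = "") -> Stage:
    """Build a stage that replaces every match of `pattern` with `repl`."""
    compiled = re.compile(pattern)
    return lambda text: compiled.sub(repl, text)


def run_pipeline(text: str, stages: list[tuple[str, Stage]]) -> dict[str, str]:
    """Apply stages in order, keeping every intermediate result under its stage name."""
    results = {"raw": text}
    for name, stage in stages:
        text = stage(text)
        results[name] = text
    return results


if __name__ == "__main__":
    out = run_pipeline(
        "Line one\n\n\n[AD] Line two",
        [
            ("strip_ads", regex_stage(r"\[AD\]\s*")),
            ("collapse_blank_lines", regex_stage(r"\n{2,}", "\n")),
        ],
    )
    print(out["collapse_blank_lines"])
```

Reordering the list changes the processing order, and any key of the returned dict could be handed to an exporter.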

---

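## Plugin system

Plugins extend site parsing, text processors, exporters, and more.

Once a plugin is enabled in `settings.toml` and the corresponding interface is implemented, it joins the download workflow automatically.

Example: to add a new site parser (e.g. "刺猬猫" (Ciweimao) -> `ciweimao`), implement fetching and parsing for its catalog page and chapter pages; you can then download directly: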

```bash
novel-cli download --site ciweimao 123456
```

> See: [Plugin system documentation](./docs/plugins.md)
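A common way for a site key such as `ciweimao` to become usable is a registry filled by a class decorator. The sketch below shows only that general pattern; `register_parser`, `PARSERS`, `parse_catalog`, `parse_chapter`, and the HTML markup are hypothetical and do not reproduce the project's real plugin interface, which is described in the plugin documentation.

```python
import re
from typing import Callable

# Hypothetical registry mapping a site key to its parser class
PARSERS: dict[str, type] = {}


def register_parser(site: str) -> Callable[[type], type]:
    """Class decorator that records a parser under `site`."""
    def decorator(cls: type) -> type:
        PARSERS[site] = cls
        return cls
    return decorator


@register_parser("ciweimao")
class CiweimaoParser:
    """Toy parser; the HTML markup it expects is invented for illustration."""

    def parse_catalog(self, html: str) -> list[str]:
        # Collect chapter IDs from the catalog page
        return re.findall(r'data-chapter-id="(\d+)"', html)

    def parse_chapter(self, html: str) -> str:
        # Join paragraph texts from a chapter page
        return "\n".join(re.findall(r"<p>(.*?)</p>", html))


if __name__ == "__main__":
    parser = PARSERS["ciweimao"]()
    print(parser.parse_catalog('<a data-chapter-id="1"></a><a data-chapter-id="2"></a>'))
```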

---

## Installing from source (development version)

```bash
git clone https://github.com/saudadez21/novel-downloader.git
cd novel-downloader

# Optional: compile translation files for multi-language support
# pip install babel
# pybabel compile -d src/novel_downloader/locales

pip install .
# Or install with optional features:
# pip install ".[image-utils]"
```

---

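## FAQ / Troubleshooting

* **Parsing fails after a site changes its structure**: update to the latest version, or adapt the parser yourself following the site documentation.
* **Sites that require login**: see [Copy Cookies](https://github.com/saudadez21/novel-downloader/blob/main/docs/copy-cookies.md).
* **Where exported files are saved**: see [File saving](https://github.com/saudadez21/novel-downloader/blob/main/docs/file-saving.md#downloads).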
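Cookies copied from a browser usually arrive as a single `Cookie` header value of `name=value` pairs separated by semicolons. The sketch below only shows how such a string breaks down into pairs; the function name is hypothetical, and how the tool itself expects cookies to be supplied is covered in the Copy Cookies guide.

```python
def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a copied Cookie header value ("a=1; b=2") into a name -> value dict."""
    cookies: dict[str, str] = {}
    for part in header.split(";"):
        # Split on the first "=" only, so values may themselves contain "="
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    print(parse_cookie_header("session=abc123; theme=dark"))
```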

---

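## Notes

* **Site structure changes**: if a target site's page structure changes or chapter fetching breaks, feel free to open an Issue or submit a PR
* **Login support**: login is limited by each site's policies and interfaces; in some cases you need to configure cookies manually or bind an account
* **Request rate**: set a reasonable request interval to avoid triggering anti-bot measures or getting your IP restricted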
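A setting like `request_interval` is typically enforced by making each request wait until enough time has passed since the previous one. The sketch below is one plausible asyncio implementation, assuming the interval is measured between request start times; `IntervalLimiter` and `demo` are hypothetical names, not the project's code.

```python
import asyncio
import time


class IntervalLimiter:
    """Ensure at least `interval` seconds between the starts of consecutive requests."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = asyncio.Lock()  # serializes concurrent callers
        self._last = float("-inf")   # start time of the previous request

    async def wait(self) -> None:
        async with self._lock:
            delay = self._last + self.interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()


async def demo(interval: float, n: int) -> list[float]:
    """Record the times at which n rate-limited 'requests' were allowed to start."""
    limiter = IntervalLimiter(interval)
    starts: list[float] = []
    for _ in range(n):
        await limiter.wait()  # a real request would be sent right after this
        starts.append(time.monotonic())
    return starts


if __name__ == "__main__":
    times = asyncio.run(demo(0.2, 3))
    print([round(b - a, 2) for a, b in zip(times, times[1:])])
```

Holding the lock while sleeping means concurrent tasks queue up behind each other, so the spacing holds even when many chapters are downloaded in parallel.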

---

## Documentation

* [Installation](https://github.com/saudadez21/novel-downloader/blob/main/docs/1-installation.md)
* [Configuration](https://github.com/saudadez21/novel-downloader/blob/main/docs/2-configuration.md)
* [settings.toml reference](https://github.com/saudadez21/novel-downloader/blob/main/docs/3-settings-schema.md)
* [Supported sites](https://github.com/saudadez21/novel-downloader/blob/main/docs/4-supported-sites.md)
* [CLI usage examples](https://github.com/saudadez21/novel-downloader/blob/main/docs/5-cli-usage-examples.md)
* [Web usage examples](https://github.com/saudadez21/novel-downloader/blob/main/docs/6-web-usage-examples.md)
* [Copy Cookies](https://github.com/saudadez21/novel-downloader/blob/main/docs/copy-cookies.md)
* [File saving](https://github.com/saudadez21/novel-downloader/blob/main/docs/file-saving.md)
* [Modules and API reference](https://github.com/saudadez21/novel-downloader/blob/main/docs/api.md)
* [TODO](https://github.com/saudadez21/novel-downloader/blob/main/docs/todo.md)

---

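## About this project

* This project is for learning and research only and **must not** be used for any commercial or illegal purpose; please respect the target sites' `robots.txt` and all applicable laws and regulations
* Users bear sole legal responsibility for any use of this project; the author accepts no liability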
