Metadata-Version: 2.4
Name: text-block-renderer
Version: 3.0.0
Summary: A library for parsing HTML text, splitting text into blocks, and rendering blocks to PNG images
Author: Text Driven Subtitle Team
License-Expression: MIT
Keywords: text,subtitle,html,rendering,png
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pillow>=9.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: arabic-reshaper>=3.0.0
Requires-Dist: python-bidi>=0.6.7
Provides-Extra: dev
Requires-Dist: pytest==9.0.2; extra == "dev"
Requires-Dist: pyarmor==9.2.3; extra == "dev"
Requires-Dist: mypy==1.19.1; extra == "dev"
Requires-Dist: build==1.4.0; extra == "dev"
Requires-Dist: twine==6.2.0; extra == "dev"
Dynamic: license-file

# Text Block Renderer

A Python library for generating timeline subtitles from HTML text. Parse styled HTML, split text into dimension-constrained blocks, render to PNG images, and calculate reading times for video subtitle production.

一个用于从 HTML 文本生成时间轴字幕的 Python 库。解析带样式的 HTML、按尺寸约束拆分文本块、渲染为 PNG 图片，并计算阅读时间以用于视频字幕制作。

------------------------------------------------------------------------

## Installation \| 安装

Install from PyPI / 从 PyPI 安装：

``` bash
pip install text-block-renderer
```

Or install from source / 从源码安装：

``` bash
cd textblockrenderer
pip install -e .
```

------------------------------------------------------------------------

## Features \| 功能特性

-   **HTML Parsing**: Parse HTML text with color and font-size styles\
    **HTML 解析**：支持解析带颜色与字号样式的 HTML 文本

-   **Text Splitting**: Split text into blocks within specified
    dimensions\
    **文本拆分**：根据尺寸约束拆分文本块

-   **PNG Rendering**: Render text blocks to PNG images with
    customizable styles\
    **PNG 渲染**：将文本块渲染为可自定义样式的 PNG 图片

-   **Newline Support**: Preserve `\n` as explicit line breaks\
    **换行支持**：保留 `\n` 作为显式换行

-   **Unlimited Height**: Use `max_height=0` to disable height limits\
    **无限高度模式**：设置 `max_height=0` 可取消高度限制

-   **O(n) Performance**: Incremental measurement algorithm for
    efficient splitting\
    **O(n) 性能**：使用增量测量算法提升拆分效率

-   **RTL Language Support**: Full support for Arabic and Hebrew (Right-to-Left)\
    **RTL 语言支持**：完整支持阿拉伯语和希伯来语（从右到左书写）

-   **Timeline Subtitles**: Generate timeline subtitles with read time calculation\
    **时间轴字幕**：生成带阅读时间计算的时间轴字幕

-   **Overlapping Frames**: Support multiple subtitle blocks in a single frame\
    **重叠帧**：支持单帧内显示多个字幕块

-   **Two-Step Pipeline**: Decouple timeline generation from rendering for flexible editing\
    **两步流水线**：时间轴生成与渲染解耦，支持灵活编辑

-   **Dual-Height Constraints**: `preferred_height` for optimal layout, `max_height` for absolute limits with auto-scaling\
    **双高度约束**：`preferred_height` 优化布局，`max_height` 绝对限制 + 自动缩放

-   **Translation Workflows**: Route A (re-split) and Route B (per-block translate + scale) with HTML round-trip\
    **翻译工作流**：路线 A（重新拆分）和路线 B（逐块翻译 + 缩放），支持 HTML 往返转换

-   **Long Image Export**: Generate scrolling long PNG images\
    **长图导出**：生成滚屏长图

-   **Adaptive Text Rendering**: Three adjust modes (WIDTH / HEIGHT_EXTEND / ADAPT) with vertical alignment\
    **自适应文本渲染**：三种调整模式（固定宽度 / 高度扩展 / 自适应缩放）及垂直对齐

------------------------------------------------------------------------

## Multi-language Support \| 多语言支持

Supports 19 languages with language-specific WPM (words per minute) and semantic splitting:

| Language | Code | WPM | Language | Code | WPM |
|----------|------|-----|----------|------|-----|
| English | EN | 150 | German | DE | 150 |
| French | FR | 150 | Spanish | ES | 150 |
| Portuguese | PT | 150 | Italian | IT | 150 |
| Dutch | NL | 150 | Polish | PL | 150 |
| Romanian | RO | 150 | Swedish | SV | 150 |
| Turkish | TR | 150 | Vietnamese | VI | 150 |
| Indonesian | ID | 150 | Greek | EL | 150 |
| Filipino | TL | 150 | Russian | RU | 130 |
| Hindi | HI | 100 | **Arabic** | AR | 120 |
| **Hebrew** | HE | 120 | | | |

**RTL Languages** (Arabic, Hebrew): Full support with Arabic reshaping and bidirectional text rendering.
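The WPM table suggests a simple reading-time estimate. The sketch below assumes a plain `words * 60 / WPM` formula with whitespace word counting and a fallback to 150 WPM for unknown language codes. `WPM_BY_LANGUAGE` and `estimate_read_time` are hypothetical names, and the library's own `calculate_read_time()` may apply additional rules.

```python
# Hypothetical sketch: WPM values copied from the table above.
WPM_BY_LANGUAGE = {
    "EN": 150, "DE": 150, "FR": 150, "ES": 150, "PT": 150, "IT": 150,
    "NL": 150, "PL": 150, "RO": 150, "SV": 150, "TR": 150, "VI": 150,
    "ID": 150, "EL": 150, "TL": 150, "RU": 130, "HI": 100, "AR": 120,
    "HE": 120,
}


def estimate_read_time(text: str, language: str = "EN") -> float:
    """Estimated reading time in seconds (assumed formula: words * 60 / WPM)."""
    # Assumption: unknown language codes fall back to 150 WPM.
    wpm = WPM_BY_LANGUAGE.get(language.upper(), 150)
    words = len(text.split())  # naive whitespace word count
    return words * 60 / wpm
```

For example, `estimate_read_time("one two three")` returns `1.2` (3 words at 150 WPM).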

------------------------------------------------------------------------

## Usage \| 使用示例

``` python
from textblockrenderer import (
    FontSpec, RenderConstraint, SplitConfig,
    TextMeasurer, split_html_paragraphs,
    split_html_to_colored_blocks, render_colored_block
)

# 1. Parse HTML paragraphs
html_text = '<p>Hello <span style="color: red;">World</span></p>'
paragraphs = split_html_paragraphs(html_text)

# Or combine paragraphs with newlines
combined = split_html_paragraphs(html_text, combine=True)

# 2. Configure font and constraints
font_spec = FontSpec(font_path="arialbd.ttf", font_size=36)
measurer = TextMeasurer(font_spec)
constraint = RenderConstraint(max_width=680, max_height=300)
# Configure language for better semantic splitting (default "EN")
config = SplitConfig(min_words_per_block=8, language="DE")

# 3. Split text into blocks
blocks = split_html_to_colored_blocks(paragraphs[0], measurer, constraint, config)

# 4. Render to PNG
for i, block in enumerate(blocks):
    img = render_colored_block(
        block,
        measurer,
        background_color=(0, 0, 0, 0),
        default_color="white",
        align="center"
    )
    img.save(f"block_{i}.png")
```

### Explanation \| 说明

-   Parse HTML into paragraphs\
    将 HTML 文本解析为段落

-   Configure font and layout constraints\
    配置字体与尺寸限制

-   Split paragraphs into renderable blocks\
    将段落拆分为可渲染文本块

-   Render blocks into PNG images\
    将文本块渲染为 PNG 图片

------------------------------------------------------------------------

## Newline Support \| 换行支持

``` python
text = "First line\nSecond line\n\nAfter empty line"
blocks = split_html_to_colored_blocks(text, measurer, constraint, config)
```

Result:

-   Line 1: First line
-   Line 2: Second line
-   Line 3: (empty line)
-   Line 4: After empty line

结果：

-   第1行：First line
-   第2行：Second line
-   第3行：空行
-   第4行：After empty line

------------------------------------------------------------------------

## Unlimited Height \| 无限高度模式

``` python
constraint = RenderConstraint(max_width=680, max_height=0)
blocks = split_html_to_colored_blocks(text, measurer, constraint, config)
```

Because no height limit applies, the text is never split by height: all of it stays in a single block, wrapped to `max_width`.

所有文本将保留在一个块中。
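The splitting behavior described above (explicit `\n` line breaks, `max_height=0` meaning no height limit, incremental width measurement) can be illustrated with a greedy wrapper. This is a simplified sketch, not the library's algorithm: `split_into_blocks`, the `measure` callback and the fixed `line_height` are hypothetical, and the real splitter additionally uses semantic break points and `min_words_per_block`.

```python
from typing import Callable


def split_into_blocks(
    text: str,
    measure: Callable[[str], int],  # hypothetical: pixel width of a string
    max_width: int,
    max_height: int,
    line_height: int,
) -> list[list[str]]:
    """Wrap words to max_width, then group lines into blocks of max_height.

    max_height <= 0 disables the height limit, so every line lands in one block.
    Each "\n" starts a new line; empty lines are kept as "".
    """
    space = measure(" ")
    lines: list[str] = []
    for raw_line in text.split("\n"):
        current: list[str] = []
        width = 0
        for word in raw_line.split():
            w = measure(word)
            # Incremental measurement: add this word's width (plus a space)
            # to the running total instead of re-measuring the whole line.
            extra = w if not current else space + w
            if current and width + extra > max_width:
                lines.append(" ".join(current))
                current, width = [word], w
            else:
                current.append(word)
                width += extra
        lines.append(" ".join(current))  # "" for an empty input line

    if max_height <= 0:
        return [lines]
    lines_per_block = max(1, max_height // line_height)
    return [lines[i:i + lines_per_block] for i in range(0, len(lines), lines_per_block)]
```

With a toy `measure = lambda s: 10 * len(s)`, the newline example above yields one block of four lines (the third one empty) when `max_height=0`.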

------------------------------------------------------------------------

## API Reference \| API 参考

### HTML Parsing \| HTML 解析

-   `parse_html_text(html_text, base_font_size, target_font_size)`
-   `split_html_paragraphs(html_text, combine=False)`

------------------------------------------------------------------------

### Text Splitting \| 文本拆分

-   `TextMeasurer(font_spec)`
-   `split_paragraph_to_blocks(paragraph, measurer, constraint, config)`
-   `split_html_to_colored_blocks(html_paragraph, measurer, constraint, config)`

------------------------------------------------------------------------

### PNG Rendering \| PNG 渲染

-   `render_colored_block(block, measurer, ...)`

------------------------------------------------------------------------

### Pipeline API \| 流水线 API

End-to-end functions for timeline subtitle generation:

| Function | Description |
|----------|-------------|
| `text_to_timeline_subtitles()` | Generate timeline subtitle frames from HTML / 从 HTML 生成时间轴字幕帧 |
| `text_to_overlapping_subtitles()` | Generate overlapping timeline subtitles / 生成可重叠时间轴字幕 |
| `build_overlapping_timeline()` | **Step 1**: Build timeline without rendering / 仅生成时间轴，不渲染 |
| `render_and_layout_frames()` | **Step 2**: Render PNG + auto-layout / 渲染 PNG + 自动布局 |
| `block_to_html()` | Convert block to HTML for translation / 将字幕块转为 HTML 用于翻译 |
| `rebuild_block_from_html()` | Rebuild block from translated HTML / 用翻译后的 HTML 重建字幕块 |
| `text_to_long_image()` | Generate a single long PNG image / 生成单张长图 |
| `render_text_image()` | Render text image with adaptive adjust modes / 自适应模式渲染文本图片 |
| `blocks_to_timeline_subtitles()` | Create subtitles from existing blocks / 从已有块创建字幕 |
| `calculate_total_duration()` | Calculate total duration of frames / 计算字幕帧总时长 |
| `get_timestamps()` | Get start/end timestamps for each frame / 获取每帧时间戳 |
| `export_to_srt()` | Export frames to SRT file / 导出为 SRT 文件 |
| `get_overlapping_total_duration()` | Calculate overlapping frames duration / 计算重叠帧总时长 |
| `export_overlapping_to_srt()` | Export overlapping frames to SRT / 导出重叠帧为 SRT |
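`get_timestamps()` and `export_to_srt()` are documented only by name here. As a hedged sketch of what such helpers typically compute, assuming frames play back to back with no gaps, the following derives start/end times from per-frame durations and formats them in the standard SRT layout (`HH:MM:SS,mmm --> HH:MM:SS,mmm`). All names are hypothetical; the library's actual output may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def timestamps(durations: list[float]) -> list[tuple[float, float]]:
    """Assumption: each frame starts exactly where the previous one ended."""
    result: list[tuple[float, float]] = []
    start = 0.0
    for d in durations:
        result.append((start, start + d))
        start += d
    return result


def to_srt(texts: list[str], durations: list[float]) -> str:
    """Build SRT text: index, time range, subtitle text, blank separator line."""
    entries = []
    for i, (text, (start, end)) in enumerate(zip(texts, timestamps(durations)), start=1):
        entries.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(entries)
```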

#### text_to_timeline_subtitles

Generate timeline subtitle frames, one block per frame.  
生成时间轴字幕帧，每帧一个字幕块。

``` python
from textblockrenderer import (
    text_to_timeline_subtitles, export_to_srt,
    FontSpec, RenderConstraint, SplitConfig, RenderStyle
)

html_text = "<p>First paragraph.</p><p>Second paragraph with more text.</p>"
font_spec = FontSpec(font_path="arialbd.ttf", font_size=36, line_spacing=-8)
constraint = RenderConstraint(max_width=555, max_height=400)
split_config = SplitConfig(min_words_per_block=8, language="EN")
style = RenderStyle(
    text_color="#FFFFFF",
    mask_mode=2,  # 0=none, 1=rounded rect, 2=per-line polygon
    mask_color=(255, 0, 0, 180),
    padding=(4, 4),
    mask_offset=(8, 8),
    stroke_size=2,
    stroke_color=(0, 0, 0, 255),
)

frames = text_to_timeline_subtitles(
    html_text, font_spec, constraint, "output/",
    split_config=split_config, style=style, wpm=160
)
export_to_srt(frames, "output/subtitles.srt")
```

#### text_to_overlapping_subtitles

Generate overlapping subtitle frames, in which multiple blocks can share one frame.  
生成重叠字幕帧，多个字幕块可以合并到同一帧。

``` python
from textblockrenderer import (
    text_to_overlapping_subtitles, export_overlapping_to_srt,
    FontSpec, RenderConstraint, SplitConfig, RenderStyle
)

# Reuses html_text, font_spec, constraint, split_config and style from the previous example
frames = text_to_overlapping_subtitles(
    html_text, font_spec, constraint, "output/",
    split_config=split_config,
    style=style,
    wpm=160,
    spacing=10,               # spacing between blocks in pixels
    max_blocks_per_frame=3,   # max blocks per frame, 0=unlimited
)
export_overlapping_to_srt(frames, "output/subtitles.srt")

# Each frame contains multiple BlockTiming objects
for frame in frames:
    print(f"Frame ends at {frame.frame_end_time}s, {len(frame.block_timings)} blocks")
    for timing in frame.block_timings:
        print(f"  {timing.start_time}s - {timing.end_time}s: {timing.block.plain_text[:30]}...")
```

#### Two-Step Pipeline \| 两步流水线

Decouple timeline generation from rendering for editing and translation workflows.  
将时间轴生成与渲染解耦，支持编辑和翻译工作流。

``` python
from textblockrenderer import (
    build_overlapping_timeline, render_and_layout_frames,
    FontSpec, RenderConstraint, SplitConfig, RenderStyle, TrackMode,
)

constraint = RenderConstraint(
    max_width=555,
    preferred_height=400,  # optimal layout height for frame merging
    max_height=600,        # absolute limit, triggers scaling if exceeded
)

# Step 1: Generate timeline (data only, no rendering)
frames = build_overlapping_timeline(
    html_text, font_spec, constraint,
    split_config=split_config, style=style, wpm=160,
    track_mode=TrackMode.SEQUENTIAL,       # or TrackMode.SYNCHRONIZED
)

# --- Edit block text here if needed ---

# Step 2: Render PNG + auto-layout
frame_layouts = render_and_layout_frames(
    frames, font_spec, constraint, "output/",
    style=style,
    spacing=10,
    align="center",
    scale_to_fit=False,                    # True to auto-scale when exceeding max_height
)

for layout in frame_layouts:
    print(f"Canvas: {layout.total_width}x{layout.total_height}, scale={layout.scale}")
    for p in layout.placements:
        print(f"  Block at ({p.x}, {p.y}) size {p.width}x{p.height}")
```

**Key Parameters / 关键参数:**

| Parameter | Description |
|-----------|-------------|
| `preferred_height` | Optimal height for frame merging (default 0 = use `max_height`) / 帧合并的最佳高度 |
| `max_height` | Absolute height limit; exceeding it triggers scaling / 绝对高度限制 |
| `scale_to_fit` | `True`: auto-scale the PNG when it exceeds `max_height` / 超出时自动缩放 |
| `track_mode=TrackMode.SEQUENTIAL` | Each block is shown at its own time / 按各自时间显示 |
| `track_mode=TrackMode.SYNCHRONIZED` | All blocks share the same start and end time / 所有轨道同时显示 |
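How `preferred_height`, `spacing`, `max_blocks_per_frame` and `max_height` might interact can be illustrated with a greedy sketch over block heights: blocks are stacked into a frame while the stack fits within `preferred_height`, and a frame that still exceeds `max_height` gets a scale factor below 1.0, analogous to `FrameLayout.scale`. This is an assumption about the merging strategy, not the library's implementation; `group_into_frames` and `frame_scale` are hypothetical, and a caller would pass `max_height` as `preferred_height` when the latter is 0.

```python
def group_into_frames(
    heights: list[int],
    preferred_height: int,
    spacing: int = 10,
    max_blocks_per_frame: int = 0,  # 0 = unlimited
) -> list[list[int]]:
    """Greedily stack block heights into frames that stay within preferred_height."""
    frames: list[list[int]] = []
    current: list[int] = []
    total = 0
    for h in heights:
        needed = h if not current else total + spacing + h
        full = max_blocks_per_frame > 0 and len(current) >= max_blocks_per_frame
        if current and (needed > preferred_height or full):
            frames.append(current)      # close the frame, start a new one
            current, total = [h], h
        else:
            current.append(h)           # an oversized block still gets its own frame
            total = needed
    if current:
        frames.append(current)
    return frames


def frame_scale(frame: list[int], max_height: int, spacing: int = 10) -> float:
    """Scale factor that fits the stacked frame inside max_height (1.0 if it fits)."""
    total = sum(frame) + spacing * (len(frame) - 1)
    return min(1.0, max_height / total) if total else 1.0
```

For example, heights `[100, 150, 200, 500]` with `preferred_height=400` group into `[[100, 150], [200], [500]]`, and the last frame would need `frame_scale([500], max_height=400) == 0.8`.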

#### Translation Workflows \| 翻译工作流

Two routes for translating subtitle content:  
两种字幕翻译路线：

**Route A: Translate first, then re-split / 路线 A：先翻译全文，再重新拆分**

``` python
translated_html = your_translate(original_html)  # your_translate: placeholder for your own translation call
frames = build_overlapping_timeline(translated_html, font_spec, constraint, ...)
layouts = render_and_layout_frames(frames, font_spec, constraint, "output/")
```

**Route B: Split first, translate per-block, then render / 路线 B：先拆分，逐块翻译，再渲染**

``` python
from textblockrenderer import (
    build_overlapping_timeline, render_and_layout_frames,
    block_to_html, rebuild_block_from_html, TextMeasurer,
)

# Step 1: Split by original text
frames = build_overlapping_timeline(original_html, font_spec, constraint, ...)
measurer = TextMeasurer(font_spec, stroke_size=style.stroke_size)

# Step 2: Translate each block (preserving HTML tags)
for frame in frames:
    for timing in frame.block_timings:
        html = block_to_html(timing.block)              # → <p><span>...</span>...</p>
        translated = your_translate(html)               # call translation API
        rebuild_block_from_html(
            timing.block, translated, measurer, constraint, style
        )

# Step 3: Render with scaling (translated text may be longer)
layouts = render_and_layout_frames(
    frames, font_spec, constraint, "output/",
    style=style, scale_to_fit=True,
)
```

| | Route A | Route B |
|--|---------|--------|
| Translate / 翻译对象 | Full HTML text / 整篇 HTML | Per-block HTML / 每个 block 的 HTML 片段 |
| Frame structure / 帧结构 | Regenerated from translation / 按译文重新生成 | Preserved from original / 保持原文结构 |
| Timeline / 时间轴 | Recalculated / 重新计算 | Unchanged / 不变 |
| `scale_to_fit` | Not needed / 不需要 | Recommended / 建议开启 |

#### text_to_long_image

Generate a single long scrolling image from HTML text.  
从 HTML 文本生成单张滚屏长图。

``` python
from textblockrenderer import (
    text_to_long_image,
    FontSpec, RenderConstraint, RenderStyle
)

block = text_to_long_image(
    html_text, font_spec, constraint, "output/long_image.png",
    style=style
)
print(f"Generated image: {block.image_width}x{block.image_height}")
```

#### render_text_image

Render a text image with adaptive adjust modes and vertical alignment.  
自适应模式渲染文本图片，支持三种调整模式和垂直对齐。

**Adjust Modes / 调整模式:**

| Mode | Description |
|------|-------------|
| `WIDTH` | Fixed width, height grows until all text is rendered / 固定宽度，高度自适应 |
| `HEIGHT_EXTEND` | Fixed height, width grows from `min_width` to `max_width`; extends height if still needed / 固定高度，宽度扩展；仍放不下则延伸高度 |
| `ADAPT` | Fixed area, auto-shrinks font size (binary search, min 16px) → reduces line spacing → truncates / 固定区域，二分缩小字号→缩小行距→截断 |

**Vertical Alignment / 垂直对齐:** `top` / `middle` / `bottom` (takes effect only when the content is shorter than the image height)

``` python
from textblockrenderer import (
    render_text_image,
    FontSpec, RenderConstraint, SplitConfig, RenderStyle,
    AdjustMode, VerticalAlignment,
)

html_text = "<p>Hello <span style='color: red;'>World</span></p>"
font_spec = FontSpec(font_path="arialbd.ttf", font_size=36, line_spacing=-10)
style = RenderStyle(
    text_color="#FFFFFF", alignment="center",
    padding=(10, 10), mask_mode=1, mask_color=(0, 0, 0, 180),
    stroke_size=2, stroke_color=(0, 0, 0, 255),
)

# WIDTH mode: fixed width, auto height
block = render_text_image(
    html_text, font_spec,
    RenderConstraint(max_width=555),
    "output/width.png",
    adjust="WIDTH", vertical_align="top", style=style,
)

# HEIGHT_EXTEND mode: try fit in height, expand width, then extend height
block = render_text_image(
    html_text, font_spec,
    RenderConstraint(max_width=555, max_height=400, min_width=300),
    "output/height_extend.png",
    adjust="HEIGHT_EXTEND", vertical_align="middle", style=style,
)

# ADAPT mode: fit in fixed area, auto-shrink font/spacing, truncate last
block = render_text_image(
    html_text, font_spec,
    RenderConstraint(max_width=555, max_height=400),
    "output/adapt.png",
    adjust=AdjustMode.ADAPT,
    vertical_align=VerticalAlignment.MIDDLE,
    style=style,
)
print(f"Final font size: {font_spec.font_size}, line spacing: {font_spec.line_spacing}")
```

### Timing API \| 时间计算 API

| Function | Description |
|----------|-------------|
| `calculate_read_time()` | Calculate read time for a block / 计算单个块的阅读时间 |
| `calculate_read_times()` | Calculate read times for multiple blocks / 批量计算阅读时间 |
| `create_frame_subtitles()` | Create subtitle frames with timing / 创建带时间的字幕帧 |
| `can_fit_next_block()` | Check if next block fits in frame / 检查下一块是否适合帧 |
| `create_overlapping_frame_subtitles()` | Create overlapping subtitle frames / 创建重叠字幕帧 |
| `get_wpm_for_language()` | Get WPM for a specific language / 获取特定语言的 WPM |

------------------------------------------------------------------------

### Data Models \| 数据模型

-   `FontSpec` --- Font configuration / 字体配置
-   `RenderConstraint` --- Size constraints (max_width, preferred_height, max_height) / 尺寸限制
-   `SplitConfig` --- Split configuration / 拆分配置
-   `RenderStyle` --- Render style configuration / 渲染样式配置
-   `ColoredWord` --- Styled word object (with `leading_space` for HTML round-trip) / 带样式的单词对象
-   `ColoredSubtitleBlock` --- Styled text block / 带样式文本块
-   `FrameSubtitle` --- Timeline subtitle frame / 时间轴字幕帧
-   `BlockTiming` --- Block timing info / 块时间信息
-   `OverlappingFrameSubtitle` --- Overlapping subtitle frame / 重叠字幕帧
-   `FrameLayout` --- Frame layout with placements and scale factor / 帧布局（坐标 + 缩放因子）
-   `BlockPlacement` --- Block position (x, y, width, height) / 块坐标
-   `AdjustMode` --- Adjust mode enum (WIDTH / HEIGHT_EXTEND / ADAPT) / 调整模式枚举
-   `VerticalAlignment` --- Vertical alignment enum (top / middle / bottom) / 垂直对齐枚举
-   `TrackMode` --- Track timing mode (SEQUENTIAL / SYNCHRONIZED) / 轨道时间模式枚举

------------------------------------------------------------------------

## Version History \| 版本历史

### 3.0.0

-   **Two-step pipeline**: `build_overlapping_timeline()` + `render_and_layout_frames()` / 两步流水线：时间轴生成与渲染解耦
-   **Dual-height constraints**: `preferred_height` for frame merging, `max_height` for scaling / 双高度约束
-   **Auto-scaling**: `scale_to_fit` parameter, `FrameLayout.scale` output / 自动缩放
-   **Track modes**: `TrackMode.SEQUENTIAL` / `TrackMode.SYNCHRONIZED` / 轨道时间模式
-   **Translation workflow**: `block_to_html()` + `rebuild_block_from_html()` for Route B / 翻译工作流工具函数
-   **HTML round-trip**: `ColoredWord.leading_space` preserves whitespace at tag boundaries / HTML 往返保留标签边界空格
-   **Pydantic V2**: Migrate from `class Config` to `ConfigDict` / 升级 Pydantic V2 配置

### 2.1.0

-   Add `render_text_image()` function with three adjust modes / 新增 `render_text_image()` 函数，支持三种调整模式
-   Add `AdjustMode` enum: `WIDTH`, `HEIGHT_EXTEND`, `ADAPT` / 新增 `AdjustMode` 枚举
-   Add `VerticalAlignment` enum: `top`, `middle`, `bottom` / 新增 `VerticalAlignment` 枚举
-   Add `min_width` field to `RenderConstraint` / `RenderConstraint` 新增 `min_width` 字段
-   Add `vertical_align` and `target_height` params to `render_colored_block()` / `render_colored_block()` 新增垂直对齐和目标高度参数
-   ADAPT mode: binary search font size (min 16px) → reduce line spacing → truncate / ADAPT 模式：二分缩小字号→缩小行距→截断

### 2.0.0

-   Add `max_blocks_per_frame` parameter to `text_to_overlapping_subtitles()` / 为 `text_to_overlapping_subtitles()` 添加每帧最大块数限制参数
-   Update README with Pipeline API usage examples / 更新 README 添加 Pipeline API 使用示例

### 1.4.4

-   Increase the size of generated images / 增大生成图片的尺寸
-   Fix imprecise rendered image sizes / 修复渲染尺寸不精准的问题

### 1.4.1

-   Add stroke style / 增加描边样式

### 1.4.0

-   Overlapping Frame Subtitles / 可重叠帧时间轴字幕
-   Single Long Image Export / 单张长图字幕导出
-   `text_to_overlapping_subtitles()` API
-   `text_to_long_image()` API
-   `BlockTiming` and `OverlappingFrameSubtitle` models

### 1.3.0

-   Encrypt the released code / 发布代码加密

### 1.2.0

-   RTL Language Support / 从右到左语言支持 (Arabic, Hebrew)
-   Arabic reshaping for connected letters / 阿拉伯语字母连接
-   Bidirectional text rendering / 双向文本渲染
-   RTL-aware semantic break detection / RTL 感知的语义断点检测
-   Performance optimization with batch RTL processing / 批量 RTL 处理性能优化

### 1.1.0

-   Multi-language Support / 多语言支持 (19 languages)
-   Language-specific semantic splitting / 特定语言的语义拆分
-   Greek question mark support / 希腊语问号支持

### 1.0.0

-   Stable release / 稳定版本发布
-   O(n) incremental measurement / O(n) 增量测量算法
-   Newline support / 支持换行
-   Unlimited height mode / 无限高度模式
-   Paragraph combination support / 支持段落合并

------------------------------------------------------------------------

## License \| 许可证

MIT License

------------------------------------------------------------------------

## Todo / Roadmap

-   [ ] Chinese, Japanese and Korean language support / 中日韩语言支持
-   [ ] Marker to force the end of a block / 区块强制结束标记
