Metadata-Version: 2.4
Name: text-block-renderer
Version: 2.2.0
Summary: A library for parsing HTML text, splitting text into blocks, and rendering blocks to PNG images
Author: Text Driven Subtitle Team
License-Expression: MIT
Keywords: text,subtitle,html,rendering,png
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pillow>=9.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: arabic-reshaper>=3.0.0
Requires-Dist: python-bidi>=0.6.7
Provides-Extra: dev
Requires-Dist: pytest==9.0.2; extra == "dev"
Requires-Dist: pyarmor==9.2.3; extra == "dev"
Requires-Dist: mypy==1.19.1; extra == "dev"
Requires-Dist: build==1.4.0; extra == "dev"
Requires-Dist: twine==6.2.0; extra == "dev"
Dynamic: license-file

# Text Block Renderer

A Python library for generating timeline subtitles from HTML text. Parse styled HTML, split text into dimension-constrained blocks, render to PNG images, and calculate reading times for video subtitle production.

一个用于从 HTML 文本生成时间轴字幕的 Python 库。解析带样式的 HTML、按尺寸约束拆分文本块、渲染为 PNG 图片，并计算阅读时间以用于视频字幕制作。

------------------------------------------------------------------------

## Installation \| 安装

Install from PyPI:

``` bash
pip install text-block-renderer
```

Or install from source / 从源码安装:

``` bash
cd textblockrenderer
pip install -e .
```

------------------------------------------------------------------------

## Features \| 功能特性

-   **HTML Parsing**: Parse HTML text with color and font-size styles\
    **HTML 解析**：支持解析带颜色与字号样式的 HTML 文本

-   **Text Splitting**: Split text into blocks within specified
    dimensions\
    **文本拆分**：根据尺寸约束拆分文本块

-   **PNG Rendering**: Render text blocks to PNG images with
    customizable styles\
    **PNG 渲染**：将文本块渲染为可自定义样式的 PNG 图片

-   **Newline Support**: Preserve `\n` as explicit line breaks\
    **换行支持**：保留 `\n` 作为显式换行

-   **Unlimited Height**: Use `max_height=0` to disable height limits\
    **无限高度模式**：设置 `max_height=0` 可取消高度限制

-   **O(n) Performance**: Incremental measurement algorithm for
    efficient splitting\
    **O(n) 性能**：使用增量测量算法提升拆分效率

-   **RTL Language Support**: Full support for Arabic and Hebrew (Right-to-Left)\
    **RTL 语言支持**：完整支持阿拉伯语和希伯来语（从右到左书写）

-   **Timeline Subtitles**: Generate timeline subtitles with read time calculation\
    **时间轴字幕**：生成带阅读时间计算的时间轴字幕

-   **Overlapping Frames**: Support multiple subtitle blocks in a single frame\
    **重叠帧**：支持单帧内显示多个字幕块

-   **Long Image Export**: Generate scrolling long PNG images\
    **长图导出**：生成滚屏长图

-   **Adaptive Text Rendering**: Three adjust modes (WIDTH / HEIGHT_EXTEND / ADAPT) with vertical alignment\
    **自适应文本渲染**：三种调整模式（固定宽度 / 高度扩展 / 自适应缩放）及垂直对齐
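
The incremental measurement idea from the feature list can be illustrated with a small greedy word wrap. This is only a sketch under stated assumptions, not the library's implementation: `wrap_words` and `toy_measure` are hypothetical names, and the toy measurer charges 10 pixels per character instead of querying a real font. Each word is measured once and added to a running line width, so the whole pass stays linear in the number of words.

```python
from typing import Callable, List


def wrap_words(words: List[str], max_width: int,
               measure: Callable[[str], int], space_width: int) -> List[List[str]]:
    """Greedy word wrap that keeps a running line width (hypothetical helper)."""
    lines: List[List[str]] = []
    current: List[str] = []
    current_width = 0
    for word in words:
        word_width = measure(word)  # each word is measured exactly once
        # The first word on a line needs no leading space.
        extra = word_width if not current else space_width + word_width
        if current and current_width + extra > max_width:
            # The word does not fit: close the line and start a new one.
            lines.append(current)
            current, current_width = [word], word_width
        else:
            current.append(word)
            current_width += extra
    if current:
        lines.append(current)
    return lines


def toy_measure(word: str) -> int:
    """Assumed stand-in for a font measurer: 10 pixels per character."""
    return 10 * len(word)


lines = wrap_words("the quick brown fox jumps".split(), 120, toy_measure, 10)
print(lines)
```

A real measurer would also have to account for kerning and per-word font sizes, which this sketch ignores.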

------------------------------------------------------------------------

## Multi-language Support \| 多语言支持

Supports 19 languages with language-specific WPM (words per minute) and semantic splitting:

| Language | Code | WPM | Language | Code | WPM |
|----------|------|-----|----------|------|-----|
| English | EN | 150 | German | DE | 150 |
| French | FR | 150 | Spanish | ES | 150 |
| Portuguese | PT | 150 | Italian | IT | 150 |
| Dutch | NL | 150 | Polish | PL | 150 |
| Romanian | RO | 150 | Swedish | SV | 150 |
| Turkish | TR | 150 | Vietnamese | VI | 150 |
| Indonesian | ID | 150 | Greek | EL | 150 |
| Filipino | TL | 150 | Russian | RU | 130 |
| Hindi | HI | 100 | **Arabic** | AR | 120 |
| **Hebrew** | HE | 120 | | | |

**RTL Languages** (Arabic, Hebrew): Full support with Arabic reshaping and bidirectional text rendering.
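
To show how a WPM value turns into a display duration, here is a minimal sketch that uses the values from the table above. `estimate_read_time` is a hypothetical helper, it counts whitespace-separated words, and the fallback of 150 WPM for unknown codes is an assumption; the library's own `calculate_read_time()` may apply different rules.

```python
# WPM values copied from the table above.
WPM_BY_LANGUAGE = {
    code: 150
    for code in ("EN", "DE", "FR", "ES", "PT", "IT", "NL", "PL",
                 "RO", "SV", "TR", "VI", "ID", "EL", "TL")
}
WPM_BY_LANGUAGE.update({"RU": 130, "HI": 100, "AR": 120, "HE": 120})

DEFAULT_WPM = 150  # assumption: fallback for codes not in the table


def estimate_read_time(text: str, language: str = "EN") -> float:
    """Return an estimated reading time in seconds (hypothetical helper)."""
    wpm = WPM_BY_LANGUAGE.get(language.upper(), DEFAULT_WPM)
    word_count = len(text.split())
    return word_count * 60.0 / wpm


sample = "one two three four five"
print(estimate_read_time(sample, "EN"))
print(estimate_read_time(sample, "HI"))
```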

------------------------------------------------------------------------

## Usage \| 使用示例

``` python
from textblockrenderer import (
    FontSpec, RenderConstraint, SplitConfig,
    TextMeasurer, split_html_paragraphs,
    split_html_to_colored_blocks, render_colored_block
)

# 1. Parse HTML paragraphs
html_text = '<p>Hello <span style="color: red;">World</span></p>'
paragraphs = split_html_paragraphs(html_text)

# Or combine paragraphs with newlines
combined = split_html_paragraphs(html_text, combine=True)

# 2. Configure font and constraints
font_spec = FontSpec(font_path="arialbd.ttf", font_size=36)
measurer = TextMeasurer(font_spec)
constraint = RenderConstraint(max_width=680, max_height=300)
# Configure language for better semantic splitting (default "EN")
config = SplitConfig(min_words_per_block=8, language="DE")

# 3. Split text into blocks
blocks = split_html_to_colored_blocks(paragraphs[0], measurer, constraint, config)

# 4. Render to PNG
for i, block in enumerate(blocks):
    img = render_colored_block(
        block,
        measurer,
        background_color=(0, 0, 0, 0),
        default_color="white",
        align="center"
    )
    img.save(f"block_{i}.png")
```

### Explanation \| 说明

-   Parse HTML into paragraphs\
    将 HTML 文本解析为段落

-   Configure font and layout constraints\
    配置字体与尺寸限制

-   Split paragraphs into renderable blocks\
    将段落拆分为可渲染文本块

-   Render blocks into PNG images\
    将文本块渲染为 PNG 图片

------------------------------------------------------------------------

## Newline Support \| 换行支持

``` python
text = "First line\nSecond line\n\nAfter empty line"
blocks = split_html_to_colored_blocks(text, measurer, constraint, config)
```

Result:

-   Line 1: First line
-   Line 2: Second line
-   Line 3: empty line
-   Line 4: After empty line

结果：

-   第1行：First line
-   第2行：Second line
-   第3行：空行
-   第4行：After empty line
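
The line layout listed above matches what a plain split on `\n` produces, which keeps the empty string between two consecutive newlines. The snippet below only illustrates that behavior with the standard library; it is not the library's splitting code.

```python
text = "First line\nSecond line\n\nAfter empty line"

# str.split("\n") keeps empty strings, so the blank line survives as its own entry.
lines = text.split("\n")
for number, line in enumerate(lines, start=1):
    print(f"Line {number}: {line or '(empty line)'}")
```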

------------------------------------------------------------------------

## Unlimited Height \| 无限高度模式

``` python
constraint = RenderConstraint(max_width=680, max_height=0)
blocks = split_html_to_colored_blocks(text, measurer, constraint, config)
```

With `max_height=0`, all text stays in a single block.

所有文本将保留在一个块中。
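
The effect of the height limit can be sketched with a simplified packer. The assumptions here are that every line has the same height and that `pack_lines` (a hypothetical name) is a stand-in for the real splitter, which also weighs word counts and semantic breaks.

```python
from typing import List


def pack_lines(lines: List[str], line_height: int, max_height: int) -> List[List[str]]:
    """Group lines into blocks no taller than max_height; 0 disables the limit."""
    if max_height <= 0:
        # Unlimited height: everything goes into one block.
        return [list(lines)] if lines else []
    # Always allow at least one line per block, even if it is taller than the limit.
    lines_per_block = max(1, max_height // line_height)
    return [lines[i:i + lines_per_block]
            for i in range(0, len(lines), lines_per_block)]


lines = ["a", "b", "c", "d", "e"]
limited = pack_lines(lines, line_height=40, max_height=100)
unlimited = pack_lines(lines, line_height=40, max_height=0)
print(limited)
print(unlimited)
```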

------------------------------------------------------------------------

## API Reference \| API 参考

### HTML Parsing \| HTML 解析

-   `parse_html_text(html_text, base_font_size, target_font_size)`
-   `split_html_paragraphs(html_text, combine=False)`
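
As a rough idea of what parsing inline color styles involves, the following sketch uses only the standard library's `html.parser`. `ColorSpanParser` is a hypothetical class written for illustration; it assumes well-formed, paired tags and does not reflect how `parse_html_text()` is implemented or what it returns.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Match a "color" declaration but not "background-color".
COLOR_RE = re.compile(r"(?<![-\w])color\s*:\s*([^;]+)")


class ColorSpanParser(HTMLParser):
    """Collect (word, color) pairs from inline style attributes (illustrative)."""

    def __init__(self) -> None:
        super().__init__()
        self.color_stack: List[Optional[str]] = [None]
        self.words: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        match = COLOR_RE.search(style)
        # Inherit the enclosing color when the tag does not set one.
        color = match.group(1).strip() if match else self.color_stack[-1]
        self.color_stack.append(color)

    def handle_endtag(self, tag):
        if len(self.color_stack) > 1:
            self.color_stack.pop()

    def handle_data(self, data):
        for word in data.split():
            self.words.append((word, self.color_stack[-1]))


parser = ColorSpanParser()
parser.feed('<p>Hello <span style="color: red;">World</span></p>')
parser.close()
words = parser.words
print(words)
```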

------------------------------------------------------------------------

### Text Splitting \| 文本拆分

-   `TextMeasurer(font_spec)`
-   `split_paragraph_to_blocks(paragraph, measurer, constraint, config)`
-   `split_html_to_colored_blocks(html_paragraph, measurer, constraint, config)`

------------------------------------------------------------------------

### PNG Rendering \| PNG 渲染

-   `render_colored_block(block, measurer, ...)`

------------------------------------------------------------------------

### Pipeline API \| 流水线 API

End-to-end functions for timeline subtitle generation:

| Function | Description |
|----------|-------------|
| `text_to_timeline_subtitles()` | Generate timeline subtitle frames from HTML / 从 HTML 生成时间轴字幕帧 |
| `text_to_overlapping_subtitles()` | Generate overlapping timeline subtitles / 生成可重叠时间轴字幕 |
| `text_to_long_image()` | Generate a single long PNG image / 生成单张长图 |
| `render_text_image()` | Render text image with adaptive adjust modes / 自适应模式渲染文本图片 |
| `blocks_to_timeline_subtitles()` | Create subtitles from existing blocks / 从已有块创建字幕 |
| `calculate_total_duration()` | Calculate total duration of frames / 计算字幕帧总时长 |
| `get_timestamps()` | Get start/end timestamps for each frame / 获取每帧时间戳 |
| `export_to_srt()` | Export frames to SRT file / 导出为 SRT 文件 |
| `get_overlapping_total_duration()` | Calculate overlapping frames duration / 计算重叠帧总时长 |
| `export_overlapping_to_srt()` | Export overlapping frames to SRT / 导出重叠帧为 SRT |

#### text_to_timeline_subtitles

Generate timeline subtitle frames, one block per frame.  
生成时间轴字幕帧，每帧一个字幕块。

``` python
from textblockrenderer import (
    text_to_timeline_subtitles, export_to_srt,
    FontSpec, RenderConstraint, SplitConfig, RenderStyle
)

html_text = "<p>First paragraph.</p><p>Second paragraph with more text.</p>"
font_spec = FontSpec(font_path="arialbd.ttf", font_size=36, line_spacing=-8)
constraint = RenderConstraint(max_width=555, max_height=400)
split_config = SplitConfig(min_words_per_block=8, language="EN")
style = RenderStyle(
    text_color="#FFFFFF",
    mask_mode=2,  # 0=none, 1=rounded rect, 2=per-line polygon
    mask_color=(255, 0, 0, 180),
    padding=(4, 4),
    mask_offset=(8, 8),
    stroke_size=2,
    stroke_color=(0, 0, 0, 255),
)

frames = text_to_timeline_subtitles(
    html_text, font_spec, constraint, "output/",
    split_config=split_config, style=style, wpm=160
)
export_to_srt(frames, "output/subtitles.srt")
```
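
For readers unfamiliar with SRT, the sketch below shows how per-frame durations become cumulative `HH:MM:SS,mmm` timestamps. `format_srt_time` and `build_srt` are hypothetical helpers that work on plain strings and durations; the actual `export_to_srt()` operates on frame objects and may format entries differently.

```python
from typing import List


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, for example 00:00:01,500."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(texts: List[str], durations: List[float]) -> str:
    """Lay subtitles back to back, each starting where the previous one ends."""
    entries = []
    start = 0.0
    for index, (text, duration) in enumerate(zip(texts, durations), start=1):
        end = start + duration
        entries.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
        start = end
    # Entries are separated by a blank line.
    return "\n".join(entries)


srt = build_srt(
    ["First paragraph.", "Second paragraph with more text."],
    [1.5, 2.25],
)
print(srt)
```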

#### text_to_overlapping_subtitles

Generate overlapping subtitle frames; multiple blocks can share one frame.  
生成重叠字幕帧，多个字幕块可以合并到同一帧。

``` python
from textblockrenderer import (
    text_to_overlapping_subtitles, export_overlapping_to_srt,
    FontSpec, RenderConstraint, SplitConfig, RenderStyle
)

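# Reuses html_text, font_spec, constraint, split_config and style
# from the text_to_timeline_subtitles example above.
frames = text_to_overlapping_subtitles(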
    html_text, font_spec, constraint, "output/",
    split_config=split_config,
    style=style,
    wpm=160,
    spacing=10,               # spacing between blocks in pixels
    max_blocks_per_frame=3,   # max blocks per frame, 0=unlimited
)
export_overlapping_to_srt(frames, "output/subtitles.srt")

# Each frame contains multiple BlockTiming objects
for frame in frames:
    print(f"Frame ends at {frame.frame_end_time}s, {len(frame.block_timings)} blocks")
    for timing in frame.block_timings:
        print(f"  {timing.start_time}s - {timing.end_time}s: {timing.block.plain_text[:30]}...")
```
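
One way to picture how `spacing` and `max_blocks_per_frame` interact is a greedy grouping over block heights. This sketch is an assumption about the general approach, not the library's algorithm: `group_blocks` is a hypothetical helper, and it ignores timing entirely.

```python
from typing import List


def group_blocks(block_heights: List[int], max_height: int,
                 spacing: int = 10, max_blocks_per_frame: int = 0) -> List[List[int]]:
    """Return block indices grouped into frames (0 means unlimited for both limits)."""
    frames: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, height in enumerate(block_heights):
        # Height of the frame if this block were stacked under the current ones.
        needed = height if not current else used + spacing + height
        too_tall = max_height > 0 and needed > max_height
        too_many = max_blocks_per_frame > 0 and len(current) >= max_blocks_per_frame
        if current and (too_tall or too_many):
            frames.append(current)
            current, needed = [], height
        current.append(index)
        used = needed
    if current:
        frames.append(current)
    return frames


heights = [100, 100, 100, 100, 100]
by_count = group_blocks(heights, max_height=400, spacing=10, max_blocks_per_frame=3)
by_height = group_blocks(heights, max_height=250, spacing=10, max_blocks_per_frame=0)
print(by_count)
print(by_height)
```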

#### text_to_long_image

Generate a single long scrolling image from HTML text.  
从 HTML 文本生成单张滚屏长图。

``` python
from textblockrenderer import (
    text_to_long_image,
    FontSpec, RenderConstraint, RenderStyle
)

# Reuses html_text, font_spec, constraint and style from the examples above.
block = text_to_long_image(
    html_text, font_spec, constraint, "output/long_image.png",
    style=style
)
print(f"Generated image: {block.image_width}x{block.image_height}")
```

#### render_text_image

Render a text image using one of three adjust modes, with optional vertical alignment.  
自适应模式渲染文本图片，支持三种调整模式和垂直对齐。

**Adjust Modes / 调整模式:**

| Mode | Description |
|------|-------------|
| `WIDTH` | Fixed width, height grows until all text is rendered / 固定宽度，高度自适应 |
| `HEIGHT_EXTEND` | Fixed height, width grows from `min_width` to `max_width`; extends height if still needed / 固定高度，宽度扩展；仍放不下则延伸高度 |
| `ADAPT` | Fixed area, auto-shrinks font size (binary search, min 16px) → reduces line spacing → truncates / 固定区域，二分缩小字号→缩小行距→截断 |

**Vertical Alignment / 垂直对齐:** `top` / `middle` / `bottom` (takes effect only when the content is shorter than the image height)

``` python
from textblockrenderer import (
    render_text_image,
    FontSpec, RenderConstraint, SplitConfig, RenderStyle,
    AdjustMode, VerticalAlignment,
)

html_text = "<p>Hello <span style='color: red;'>World</span></p>"
font_spec = FontSpec(font_path="arialbd.ttf", font_size=36, line_spacing=-10)
style = RenderStyle(
    text_color="#FFFFFF", alignment="center",
    padding=(10, 10), mask_mode=1, mask_color=(0, 0, 0, 180),
    stroke_size=2, stroke_color=(0, 0, 0, 255),
)

# WIDTH mode: fixed width, auto height
block = render_text_image(
    html_text, font_spec,
    RenderConstraint(max_width=555),
    "output/width.png",
    adjust="WIDTH", vertical_align="top", style=style,
)

# HEIGHT_EXTEND mode: try to fit within the height, widen up to max_width, then extend the height
block = render_text_image(
    html_text, font_spec,
    RenderConstraint(max_width=555, max_height=400, min_width=300),
    "output/height_extend.png",
    adjust="HEIGHT_EXTEND", vertical_align="middle", style=style,
)

# ADAPT mode: fit in a fixed area, shrink font size, then line spacing, truncate as a last resort
block = render_text_image(
    html_text, font_spec,
    RenderConstraint(max_width=555, max_height=400),
    "output/adapt.png",
    adjust=AdjustMode.ADAPT,
    vertical_align=VerticalAlignment.MIDDLE,
    style=style,
)
print(f"Final font size: {font_spec.font_size}, line spacing: {font_spec.line_spacing}")
```
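
The font-size step of ADAPT mode can be sketched as a binary search for the largest size that still fits. This is a sketch under stated assumptions: `measure_height` is a toy model (each character is half the font size wide, each line is the font size plus 20% tall) and `fit_font_size` is a hypothetical helper. Only the 16px minimum comes from the description above.

```python
from typing import Optional


def measure_height(text_length: int, font_size: int, max_width: int) -> int:
    """Toy layout model (assumption): fixed-width characters, uniform lines."""
    chars_per_line = max(1, (2 * max_width) // font_size)
    line_count = (text_length + chars_per_line - 1) // chars_per_line
    line_height = font_size + font_size // 5
    return line_count * line_height


def fit_font_size(text_length: int, max_width: int, max_height: int,
                  start_size: int = 36, min_size: int = 16) -> Optional[int]:
    """Return the largest size in [min_size, start_size] that fits, or None."""
    if measure_height(text_length, start_size, max_width) <= max_height:
        return start_size
    low, high = min_size, start_size - 1
    best = None
    while low <= high:
        mid = (low + high) // 2
        if measure_height(text_length, mid, max_width) <= max_height:
            best = mid       # fits: try a larger size
            low = mid + 1
        else:
            high = mid - 1   # too tall: try a smaller size
    return best


size = fit_font_size(text_length=200, max_width=300, max_height=150)
print(size)
```

The search relies on the measured height never decreasing as the font size grows. A `None` result corresponds to the point where, per the table above, the later fallbacks (reducing line spacing, then truncating) would take over.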

### Timing API \| 时间计算 API

| Function | Description |
|----------|-------------|
| `calculate_read_time()` | Calculate read time for a block / 计算单个块的阅读时间 |
| `calculate_read_times()` | Calculate read times for multiple blocks / 批量计算阅读时间 |
| `create_frame_subtitles()` | Create subtitle frames with timing / 创建带时间的字幕帧 |
| `can_fit_next_block()` | Check if next block fits in frame / 检查下一块是否适合帧 |
| `create_overlapping_frame_subtitles()` | Create overlapping subtitle frames / 创建重叠字幕帧 |
| `get_wpm_for_language()` | Get WPM for a specific language / 获取特定语言的 WPM |

------------------------------------------------------------------------

### Data Models \| 数据模型

-   `FontSpec` --- Font configuration / 字体配置
-   `RenderConstraint` --- Size constraints / 尺寸限制
-   `SplitConfig` --- Split configuration / 拆分配置
-   `RenderStyle` --- Render style configuration / 渲染样式配置
-   `ColoredWord` --- Styled word object / 带样式的单词对象
-   `ColoredSubtitleBlock` --- Styled text block / 带样式文本块
-   `FrameSubtitle` --- Timeline subtitle frame / 时间轴字幕帧
-   `BlockTiming` --- Block timing info / 块时间信息
-   `OverlappingFrameSubtitle` --- Overlapping subtitle frame / 重叠字幕帧
-   `AdjustMode` --- Adjust mode enum (WIDTH / HEIGHT_EXTEND / ADAPT) / 调整模式枚举
-   `VerticalAlignment` --- Vertical alignment enum (top / middle / bottom) / 垂直对齐枚举

------------------------------------------------------------------------

## Version History \| 版本历史

### 2.1.0

-   Add `render_text_image()` function with three adjust modes / 新增 `render_text_image()` 函数，支持三种调整模式
-   Add `AdjustMode` enum: `WIDTH`, `HEIGHT_EXTEND`, `ADAPT` / 新增 `AdjustMode` 枚举
-   Add `VerticalAlignment` enum: `top`, `middle`, `bottom` / 新增 `VerticalAlignment` 枚举
-   Add `min_width` field to `RenderConstraint` / `RenderConstraint` 新增 `min_width` 字段
-   Add `vertical_align` and `target_height` params to `render_colored_block()` / `render_colored_block()` 新增垂直对齐和目标高度参数
-   ADAPT mode: binary search font size (min 16px) → reduce line spacing → truncate / ADAPT 模式：二分缩小字号→缩小行距→截断

### 2.0.0

-   Add `max_blocks_per_frame` parameter to `text_to_overlapping_subtitles()` / 为 `text_to_overlapping_subtitles()` 添加每帧最大块数限制参数
-   Update README with Pipeline API usage examples / 更新 README 添加 Pipeline API 使用示例

### 1.4.4

-   Increase the size of the produced images / 增加生产的图片尺寸
-   Fix imprecise rendered image dimensions / 修复渲染尺寸不精准的问题

### 1.4.1

-   Add stroke style / 增加描边样式

### 1.4.0

-   Overlapping Frame Subtitles / 可重叠帧时间轴字幕
-   Single Long Image Export / 单张长图字幕导出
-   `text_to_overlapping_subtitles()` API
-   `text_to_long_image()` API
-   `BlockTiming` and `OverlappingFrameSubtitle` models

### 1.3.0

-   Encrypt the released package code / 发布代码加密


### 1.2.0

-   RTL Language Support / 从右到左语言支持 (Arabic, Hebrew)
-   Arabic reshaping for connected letters / 阿拉伯语字母连接
-   Bidirectional text rendering / 双向文本渲染
-   RTL-aware semantic break detection / RTL 感知的语义断点检测
-   Performance optimization with batch RTL processing / 批量 RTL 处理性能优化

### 1.1.0

-   Multi-language Support / 多语言支持 (19 languages)
-   Language-specific semantic splitting / 特定语言的语义拆分
-   Greek question mark support / 希腊语问号支持

### 1.0.0

-   Stable release / 稳定版本发布
-   O(n) incremental measurement / O(n) 增量测量算法
-   Newline support / 支持换行
-   Unlimited height mode / 无限高度模式
-   Paragraph combination support / 支持段落合并

------------------------------------------------------------------------

## License \| 许可证

MIT License

------------------------------------------------------------------------

## Todo / Roadmap

-   [ ] Chinese, Japanese and Korean language support / 中日韩语言支持
-   [ ] Block forced end marker / 区块强制结束标记
