3.4. uniseg.linebreak — Line Break

Unicode line breaking algorithm.

UAX #14: Unicode Line Breaking Algorithm (Unicode 16.0.0) https://www.unicode.org/reports/tr14/tr14-53.html

uniseg.linebreak.line_break(c: str, index: int = 0, /) → LineBreak

Return the Line_Break property for c.

c must be a single Unicode code point string.

>>> line_break('\x0d')
<LineBreak.CR: 'CR'>
>>> line_break(' ')
<LineBreak.SP: 'SP'>
>>> line_break('1')
<LineBreak.NU: 'NU'>
>>> line_break('\u1b44')
<LineBreak.VI: 'VI'>

If index is specified, this function treats c as a Unicode string and returns the Line_Break property of the code point at c[index].

>>> line_break('a\x0d', 1)
<LineBreak.CR: 'CR'>

uniseg.linebreak.line_break_boundaries(s: str, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None) → Iterator[int]

Iterate over the indices of the line break boundaries of s.

The yielded indices run from 0 to the end of the string (len(s)).
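The relationship between break opportunities and boundary indices can be sketched in plain Python, independently of uniseg. The helper boundaries_from_breakables below is hypothetical, not part of the library; following the description above, it assumes that 0 and len(s) are always boundaries of a non-empty string and that every other boundary is a position whose breakable value (as produced by line_break_breakables) is 1.

```python
from collections.abc import Iterable, Iterator


def boundaries_from_breakables(breakables: Iterable[int]) -> Iterator[int]:
    """Yield boundary indices for a breakables sequence (hypothetical helper).

    Assumes 0 and the length of a non-empty sequence are always boundaries,
    and that a 1 at position i marks a boundary before item i.
    """
    length = 0
    for i, breakable in enumerate(breakables):
        length = i + 1
        # Position 0 always starts a segment; elsewhere only where breakable is 1.
        if i == 0 or breakable:
            yield i
    if length:
        # The end of the text closes the last segment.
        yield length


# Breakables for 'Hello, world.' as listed in the line_break_breakables example.
hello_breakables = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(list(boundaries_from_breakables(hello_breakables)))  # [0, 7, 13]
```

Consecutive pairs of boundaries delimit the segments between break opportunities: here s[0:7] is 'Hello, ' and s[7:13] is 'world.'.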

uniseg.linebreak.line_break_breakables(s: str, legacy: bool = False, /) → Iterable[Literal[0, 1]]

Iterate over the line break opportunities for every position of s.

A value of 1 means "break" and 0 means "do not break" BEFORE the position. The iteration yields exactly len(s) values.

>>> list(line_break_breakables('ABC'))
[0, 0, 0]
>>> list(line_break_breakables('Hello, world.'))
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
>>> list(line_break_breakables(''))
[]
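Because each 1 marks a break opportunity before its position, a string can be cut into line break units directly from its breakables. The following is a minimal sketch using a hypothetical helper, units_from_breakables (not a uniseg function); it hard-codes the breakables for 'Hello, world.' from the example above so that it runs without uniseg installed.

```python
from collections.abc import Iterable, Iterator


def units_from_breakables(s: str, breakables: Iterable[int]) -> Iterator[str]:
    """Split s at every position marked 1 (hypothetical helper).

    A 1 at index i means a break is allowed before s[i]; a 1 at index 0 is
    ignored because it would produce an empty leading segment.
    """
    start = 0
    for i, breakable in enumerate(breakables):
        if breakable and i > start:
            yield s[start:i]
            start = i
    if s:
        # Whatever remains after the last break is the final segment.
        yield s[start:]


text = 'Hello, world.'
marks = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(list(units_from_breakables(text, marks)))  # ['Hello, ', 'world.']
```

Joining the yielded segments always reproduces the original string, since each character lands in exactly one segment.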
uniseg.linebreak.line_break_units(s: str, legacy: bool = False, tailor: Callable[[str, Iterable[Literal[0, 1]]], Iterable[Literal[0, 1]]] | None = None, /) → Iterator[str]

Iterate over the line break units of s, that is, the substrings between consecutive line break boundaries.

>>> s = 'The quick (\u201cbrown\u201d) fox can\u2019t jump 32.3 feet, right?'
>>> '|'.join(line_break_units(s)) == 'The |quick |(\u201cbrown\u201d) |fox |can\u2019t |jump |32.3 |feet, |right?'
True
>>> list(line_break_units(''))
[]
>>> list(line_break_units('\u03b1\u03b1')) == ['\u03b1\u03b1']
True
>>> list(line_break_units('\u03b1\u03b1', True)) == ['\u03b1', '\u03b1']
True
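The tailor parameter's type suggests that a tailor is called with s and its breakables and returns adjusted breakables of the same length. Under that assumption, the sketch below is a hypothetical tailor, break_after_slash (not part of uniseg), that adds a break opportunity after every '/' so that long paths can wrap. It only turns a 0 into a 1 after a slash and passes every other value through unchanged.

```python
from collections.abc import Iterable, Iterator
from typing import Literal

Breakable = Literal[0, 1]


def break_after_slash(s: str, breakables: Iterable[Breakable]) -> Iterator[Breakable]:
    """Hypothetical tailor: also allow a break after every '/' in s."""
    for i, breakable in enumerate(breakables):
        # Value i describes the position before s[i], so a slash at s[i - 1]
        # means a break is now allowed here.
        if i > 0 and s[i - 1] == '/':
            yield 1
        else:
            yield breakable


print(list(break_after_slash('a/b/c', [0, 0, 0, 0, 0])))  # [0, 0, 1, 0, 1]
```

Given the signatures above, such a function would be passed as the tailor argument, for example line_break_units(path, False, break_after_slash), since the parameters of line_break_units are positional-only.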