Metadata-Version: 2.4
Name: stream-replace
Version: 0.1.0
Summary: Streaming text replacement for AI token streams — handles partial matches across chunk boundaries
License-Expression: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing :: Filters
Classifier: Typing :: Typed
Requires-Python: >=3.13
Description-Content-Type: text/markdown

# stream-replace

Streaming text replacement for AI token streams — correctly handles partial matches across chunk boundaries.

## Install

```bash
pip install stream-replace
```

## Quick Start

```python
import re
from stream_replace import Replacer

r = Replacer([
    ("敏感词", "***"),                                        # string → string
    ("secret", lambda s: s[0] + "***"),                       # string → callable
    (re.compile(r"1[3-9]\d{9}"), "[PHONE]"),                  # regex  → string
    (re.compile(r"<think>[\s\S]*?</think>"), ""),              # regex  → remove
    (re.compile(r"(\d+)"), lambda m: str(int(m.group()) * 2)), # regex  → callable
])

for chunk in ai_stream:  # ai_stream: any iterable of str chunks
    safe_text = r.feed(chunk)
    print(safe_text, end="")

print(r.flush(), end="")
```

## Why?

AI models stream tokens incrementally. A word you want to replace may be split across chunks:

```
chunk 1: "hel"
chunk 2: "lo world"
```

Naive per-chunk replacement would miss `"hello"`. **stream-replace** buffers just enough text at chunk boundaries to detect partial matches, while emitting safe text as early as possible.
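A minimal illustration of the failure mode, using plain `str.replace` (no library code involved):

```python
# Why naive per-chunk replacement fails: the target word is split
# across two chunks, so neither chunk contains it in full.
chunks = ["hel", "lo world"]

# Naive: replace within each chunk independently.
naive = "".join(c.replace("hello", "***") for c in chunks)
assert naive == "hello world"  # "hel" + "lo" never matched

# Correct: the match must be found across the chunk boundary.
joined = "".join(chunks).replace("hello", "***")
assert joined == "*** world"
```

The library achieves the second result without waiting for the whole stream, by holding back only the suffix that could still become a match.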

## API

### `Replacer(rules)`

Create a replacer with a list of `(pattern, replacement)` tuples.

| Pattern | Replacement | Description |
|---|---|---|
| `str` | `str` | Exact string replacement |
| `str` | `callable(matched_str) → str` | Dynamic string replacement |
| `re.Pattern` | `str` | Regex replacement (supports `\1` backrefs) |
| `re.Pattern` | `callable(re.Match) → str` | Dynamic regex replacement |

#### `r.feed(chunk: str) → str`

Process one incoming chunk. Returns text that is safe to emit (fully resolved, no pending partial matches).

#### `r.flush() → str`

Flush the internal buffer after the stream ends. Call it once to retrieve any remaining held-back text.

#### `r.reset()`

Clear internal state so the replacer can be reused for another stream.

#### `r.wrap(iterable) → Iterable[str]`

Convenience wrapper for a sync chunk stream. Handles `feed` + `flush` automatically.

```python
for text in r.wrap(chunks):
    print(text, end="")
```

#### `r.wrap_async(async_iterable) → AsyncIterable[str]`

Same as `wrap`, but for async iterables.

```python
async for text in r.wrap_async(async_chunks):
    print(text, end="")
```

### Functional API

For one-off use without creating a `Replacer` instance:

```python
from stream_replace import stream_replace, astream_replace

# sync
for text in stream_replace(chunks, [("hello", "world")]):
    print(text, end="")

# async
async for text in astream_replace(async_chunks, [("hello", "world")]):
    print(text, end="")
```

## How It Works

1. **Buffer**: Incoming chunks accumulate in an internal buffer.
2. **Match**: On each `feed()`, the buffer is scanned for complete matches across all rules. The earliest match wins.
3. **Replace**: Matched text is replaced; scanning continues from after the replacement.
4. **Hold back**: After all matches, the buffer tail is checked for *potential* partial matches (a suffix that could be the start of a pattern). This tail is held back for the next `feed()`.
5. **Flush**: On `flush()`, the remaining buffer is processed without holding anything back.
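Step 4 can be sketched for plain-string rules. This is a hypothetical helper, not the library's internal code: it computes how many trailing characters of the buffer could still be the start of a pattern and must therefore be held back.

```python
def holdback_len(buffer: str, patterns: list[str]) -> int:
    """Length of the longest buffer suffix that is a proper prefix
    of some pattern (a full match would already have been replaced)."""
    best = 0
    for pat in patterns:
        max_k = min(len(buffer), len(pat) - 1)
        for k in range(max_k, 0, -1):
            if pat.startswith(buffer[-k:]):
                best = max(best, k)
                break  # longest suffix for this pattern found
    return best

holdback_len("say hel", ["hello"])  # → 3: hold back "hel", emit "say "
holdback_len("abc", ["hello"])      # → 0: nothing can start a match
```

Everything before the held-back tail is safe to emit immediately, which is what keeps latency low.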

For regex rules, the library automatically extracts literal prefixes from the pattern (e.g., `"<think>"` from `r"<think>[\s\S]*?</think>"`) to detect both partial prefix matches and open-but-unclosed matches spanning multiple chunks.
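One simple way to extract such a prefix (a hedged sketch, not the library's actual extraction logic) is to take literal characters from the pattern source until the first regex metacharacter. Note this naive version over-extends when the last literal is quantified (e.g. `ab*` should yield `"a"`, not `"ab"`), which a real implementation has to account for:

```python
# Characters that end the literal run of a regex pattern.
_META = set(".^$*+?{}[]\\|()")

def literal_prefix(pattern: str) -> str:
    """Leading literal characters of a regex pattern source string."""
    out = []
    for ch in pattern:
        if ch in _META:
            break
        out.append(ch)
    return "".join(out)

literal_prefix(r"<think>[\s\S]*?</think>")  # → "<think>"
literal_prefix(r"1[3-9]\d{9}")              # → "1"
```

With a literal prefix in hand, partial-match detection for a regex rule reduces to the same suffix/prefix check used for plain strings.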

## License

MIT
