Metadata-Version: 2.1
Name: flash-tokenizer
Version: 0.1.0
Summary: Flash BERT tokenizer implementation with C++ backend
Author-email: spring <springnode@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/springkim/flash-tokenizer
Project-URL: Issues, https://github.com/springkim/flash-tokenizer/issues
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# flash-tokenizer

Flash BERT tokenizer: a BERT tokenizer implementation for Python with a C++ backend.

## Installation

```bash
pip install flash-tokenizer
```

Or install from source:

```bash
git clone https://github.com/springkim/flash-tokenizer.git
cd flash-tokenizer
pip install .
```

## Usage

```python
from flash_tokenizer import FlashBertTokenizer

# Initialize the tokenizer with a vocabulary file
tokenizer = FlashBertTokenizer("path/to/vocab.txt", do_lower_case=True)

# Tokenize text
tokens = tokenizer.tokenize("Hello, world!")
print(tokens)

# Convert tokens to IDs
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

# Or call the tokenizer directly to go from text to IDs in one step
ids = tokenizer("Hello, world!")
print(ids)
```
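
For readers new to BERT tokenization, the sketch below illustrates the general idea behind `tokenize` and `convert_tokens_to_ids`: greedy longest-match-first WordPiece splitting, followed by a vocabulary lookup. It is a simplified pure-Python illustration, not this library's C++ implementation. The toy vocabulary and the helper names (`basic_split`, `wordpiece`) are hypothetical, and real BERT pre-tokenization also handles details such as accent stripping and CJK characters that are omitted here.

```python
# Illustrative sketch only: does NOT use flash_tokenizer and is not its implementation.
import re

# Hypothetical toy vocabulary. A real vocab.txt lists one token per line,
# and the line index serves as the token ID.
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world",
             ",", "!", "token", "##izer", "##s"]
VOCAB = {token: idx for idx, token in enumerate(TOY_VOCAB)}


def basic_split(text, do_lower_case=True):
    """Optionally lowercase, then split into words and punctuation marks."""
    if do_lower_case:
        text = text.lower()
    return re.findall(r"\w+|[^\w\s]", text)


def wordpiece(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab=VOCAB):
    """Split text into words, then each word into WordPiece tokens."""
    tokens = []
    for word in basic_split(text):
        tokens.extend(wordpiece(word, vocab))
    return tokens


def convert_tokens_to_ids(tokens, vocab=VOCAB):
    """Map each token to its vocabulary index, falling back to [UNK]."""
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


tokens = tokenize("Hello, world! Tokenizers")
ids = convert_tokens_to_ids(tokens)
print(tokens)  # ['hello', ',', 'world', '!', 'token', '##izer', '##s']
print(ids)     # [4, 6, 5, 7, 8, 9, 10]
```

With the real tokenizer, the tokens and IDs depend entirely on the `vocab.txt` file you supply, so the values above apply only to this toy vocabulary.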
