Metadata-Version: 2.4
Name: rwkv-emb
Version: 0.0.4
Summary: The EmbeddingRWKV Model
Author: Haowen HOU
Project-URL: Homepage, https://github.com/howard-hou/EmbeddingRWKV
Project-URL: Bug Tracker, https://github.com/howard-hou/EmbeddingRWKV/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

The EmbeddingRWKV Model

https://github.com/howard-hou/EmbeddingRWKV

## Tokenizer

The package ships with a byte-level trie tokenizer defined in `rwkv_emb.reference.utils.TRIE_TOKENIZER`, using the vocabulary file `rwkv_emb/reference/rwkv_vocab_v20230424.txt`. The tokenizer works on UTF-8 bytes and can be used to convert raw text into token IDs for the embedding model.

```python
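import os

import rwkv_emb.reference
from rwkv_emb.reference.utils import TRIE_TOKENIZER

# Locate the directory of the installed rwkv_emb.reference package
reference_dir = os.path.dirname(os.path.abspath(rwkv_emb.reference.__file__))

# Build the path to the vocabulary file bundled with the package
vocab_path = os.path.join(reference_dir, "rwkv_vocab_v20230424.txt")

tokenizer = TRIE_TOKENIZER(vocab_path)

# Convert raw text into a list of token IDs
text = "hello world"
tokens = tokenizer.encode(text)

# Append the end-of-sequence token before embedding inference
EOS_INDEX = 65535
tokens_with_eos = tokens + [EOS_INDEX]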
```

The `encode` method returns a list of integer token IDs. For embedding inference, append the end-of-sequence token (`65535`) as the last element of the list before passing the tokens to the model.
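
To illustrate what a byte-level trie tokenizer does, here is a minimal, self-contained sketch. It assumes greedy longest-match over UTF-8 bytes, which is a common design for trie tokenizers but is not confirmed here as the exact behaviour of `TRIE_TOKENIZER`. The `ByteTrie` class, the `toy_encode` function, and the toy vocabulary with its IDs are all hypothetical and are not part of `rwkv_emb`.

```python
class ByteTrie:
    """Toy byte-level trie (illustrative only, not the rwkv_emb implementation)."""

    def __init__(self):
        self.children = {}    # byte value -> ByteTrie
        self.token_id = None  # set when a vocabulary entry ends at this node

    def add(self, key: bytes, token_id: int) -> None:
        node = self
        for b in key:
            node = node.children.setdefault(b, ByteTrie())
        node.token_id = token_id

    def longest_match(self, data: bytes, start: int):
        """Return (end, token_id) for the longest entry starting at `start`, or None."""
        node, best = self, None
        for i in range(start, len(data)):
            node = node.children.get(data[i])
            if node is None:
                break
            if node.token_id is not None:
                best = (i + 1, node.token_id)
        return best


def toy_encode(trie: ByteTrie, text: str) -> list:
    """Greedily split the UTF-8 bytes of `text` into the longest known entries."""
    data = text.encode("utf-8")
    ids, pos = [], 0
    while pos < len(data):
        match = trie.longest_match(data, pos)
        if match is None:
            raise ValueError(f"no vocabulary entry covers the byte at offset {pos}")
        pos, token_id = match
        ids.append(token_id)
    return ids


# Hypothetical toy vocabulary; the IDs are made up for this example.
toy_vocab = {
    b"h": 1, b"e": 2, b"l": 3, b"o": 4, b"he": 5, b"hello": 6,
    b" ": 7, b"w": 8, b"r": 9, b"d": 10, b"wor": 11,
}
trie = ByteTrie()
for key, token_id in toy_vocab.items():
    trie.add(key, token_id)

EOS_INDEX = 65535
toy_ids = toy_encode(trie, "hello world")
toy_ids_with_eos = toy_ids + [EOS_INDEX]
print(toy_ids_with_eos)
```

With this toy vocabulary, `"hello"` is matched as a single entry rather than as `"he"` followed by shorter pieces, because the trie walk keeps the longest entry it has seen before the path runs out.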

```python
# !!! set this before importing rwkv_emb !!!
import os

os.environ["RWKV_CUDA_ON"] = '1'  # '1' to compile the CUDA kernel (10x faster), requires a C++ compiler & CUDA libraries

from rwkv_emb.model import EmbeddingRWKV

EOS_INDEX = 65535

# download models: to be announced
model = EmbeddingRWKV(model_path='path-to-model')

# !!! model.forward(tokens, state) will modify state in-place !!!
# single-sample inference
emb, state = model.forward([187, 510, 1563, 310, 247, EOS_INDEX], None)
print(emb.detach().cpu().numpy())                   # embedding as a NumPy array

# streaming a single sequence
emb, state = model.forward([187, 510], None)
emb, state = model.forward([1563], state)           # RNN has state (use deepcopy to clone states)
emb, state = model.forward([310, 247, EOS_INDEX], state)
print(emb.detach().cpu().numpy())                   # same result as above

# batch inference (all sequences must share the same length)
batch_tokens = [
    [187, 510, 1563, 310],
    [247, EOS_INDEX, 187, 310],
]
emb_batch, batch_state = model.forward(batch_tokens, None, full_output=False)
print(emb_batch.detach().cpu().numpy())             # one row per sequence in the batch
print(len(batch_state), batch_state[-2].shape)      # batched state shapes
print('\n')
```
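
Because batch inference requires every sequence in a batch to have the same length, inputs of mixed length need to be grouped before calling `model.forward`. The following is a minimal sketch of one way to do that, assuming an EOS token is appended to each sequence first. The helper `bucket_by_length` is hypothetical and not part of `rwkv_emb`; it only prepares the token lists and does not call the model.

```python
from collections import defaultdict

EOS_INDEX = 65535  # end-of-sequence ID used in the examples above


def bucket_by_length(token_lists, eos=EOS_INDEX):
    """Append EOS to each sequence, then group sequences of equal length.

    Returns a dict mapping length -> (original_indices, batch_tokens), so
    results can be mapped back to the original input order afterwards.
    """
    buckets = defaultdict(lambda: ([], []))
    for idx, tokens in enumerate(token_lists):
        seq = list(tokens) + [eos]
        indices, batch = buckets[len(seq)]
        indices.append(idx)
        batch.append(seq)
    return dict(buckets)


# Token IDs reused from the examples above; the grouping itself is illustrative.
example = [[187, 510], [1563], [310, 247], [187]]
buckets = bucket_by_length(example)
for length, (indices, batch) in sorted(buckets.items()):
    print(length, indices, batch)
```

Each `batch` list produced this way could then be passed as `batch_tokens` in a separate `model.forward` call, with `indices` used to place the resulting embeddings back in the original order.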
