Coverage for src/gentrie/__init__.py: 100%
6 statements
« prev ^ index » next coverage.py v7.6.10, created at 2025-08-17 12:20 -0700
« prev ^ index » next coverage.py v7.6.10, created at 2025-08-17 12:20 -0700
1# -*- coding: utf-8 -*-
2"""Package providing a generalized trie implementation.
4This package includes classes and functions to create and manipulate a generalized trie
5data structure. Unlike common trie implementations that only support strings as keys,
6this generalized trie can handle various types of tokens, as long as they are hashable.
8.. warning:: **gentrie IS NOT thread-safe**
10 It is not designed for concurrent use. If you need a thread-safe trie, you must
11 use an external locking mechanism to ensure that either only read-only threads are
12 accessing the trie at the same time or that only one write thread and no read threads
13 are accessing the trie at the same time.
15 One way to do this is to create TWO instances of the trie, one for read-only access and one for write access.
17 The read-only instance can be used by multiple threads at the same time, while the write instance can
18 be used by only one thread at a time. After a write operation, a new read-only instance can be created
19 from the write instance by forking it using :func:`copy.deepcopy()` which can then be used by the
20 read-only threads.
22Usage
23=======
25You can create a trie using the `GeneralizedTrie` class:
27.. code-block:: python
28 :linenos:
30 from gentrie import GeneralizedTrie
32 # Create a new trie instance
33 trie = GeneralizedTrie()
35There are three ways to add entries:
371. Using the `trie[key] = value` syntax
39This allows you to assign a value directly to a key and will create a
40new TrieEntry if the key does not already exist. If the key already exists,
41it will update the value associated with that key.
43.. code-block:: python
44 :linenos:
45 :caption: Examples of using the `trie[key] = value` syntax
47 # Assigns 'value' to 'key'
48 # (tokenized as characters 'k','e','y')
49 trie['key'] = 'value'
51 # Assigns value 'value2' to key 'another_key' (tokenized as
52 # 'a','n','o','t','h','e','r','_','k','e','y','2')
53 trie['another_key'] = 'another_value'
55 # Changes the value for 'key' (tokenized as 'k', 'e', 'y')
56 # to 'new_value'
57 trie['key'] = 'new_value'
59 # Assigns a tuple of int (tokenized as
60 # 128, 96, 160, 0) as a key with the value'value5'
61 trie[(128, 96, 160, 0)] = 'value5'
63 # Assigns a tuple with mixed value types (tokenized
64 # as 128, 'a') as a key with the value 'value5b'
65 trie[(128, 'a')] = 'value5b'
67 # Assigns a list of words (tokenized as 'hello', 'world')
68 # as a key with the value 'value6'
69 trie[['hello', 'world']] = 'value6'
712. Using the `trie.add(key, value)` method
73 This method adds a new entry to the trie and returns the TrieId
74 for the new entry. If the key already exists, it will throw an
75 error. The value argument is optional, and if not provided
76 the entry will be created with a value of `None`.
783. Using the `trie.update(key, value)` method
80 This method adds a new entry or updates an existing entry,
81 returning the TrieId for the entry and returns the TrieId of the entry.
83 This is the same as using the `trie[key] = value` syntax,
84 but it is more explicit about the intention to update or add
85 an entry and returns the TrieId of the entry.
87 The value argument is optional, and if not provided, the
88 entry will be created or updated with a value of `None`.
90You can use the `in` operator to check if a key exists in the trie,
91e.g., `if key in trie:`. This will return `True` if the key exists,
92and `False` otherwise.
94There are two ways to directly retrieve entries using their keys:
961. Using the `trie[key | TrieId]` syntax.
98 This retrieves the `TrieEntry` associated with the key or TrieId. It
99 will raise a `KeyError` if the key/TrieId does not exist. It will raise
100 a `TypeError` if the key is not either a valid `TrieId` or `GeneralizedKey`.
102 The returned `TrieEntry` contains the key, value, and an identifier
103 (ident - of type `TrieId`) that uniquely identifies the entry in the trie.
1052. Using the `trie.get(key | TrieId, [default])` method
106 This retrieves the `TrieEntry` associated with the key or TrieId,
107 returning `None` if the key/TrieId does not exist. This could be
108 preferable in cases where you want to avoid exceptions
109 for missing keys although it cannot distinguish between a key
110 that does not exist and a key that exists with a `None` value
111 in the trie by default.
113 You *can* provide a different default value to return if the key
114 does not exist, which can be useful for handling cases where
115 you want to return a specific value instead of `None`.
117You can also retrieve all entries that are prefixed by or prefixes for a given key:
119- `trie.prefixed_by(key)` returns a Generator of `TrieEntry` objects that
120 are prefixed_by of the given key.
121- `trie.prefixes(key)` returns a Generator of `TrieEntry` objects that
122 are prefixes of the given key.
124These methods are useful for searching and retrieving entries
125that match a specific pattern or structure in the trie.
127Example 1 - Basic Usage
128------------------------
130.. code-block:: python
131 :linenos:
133 from gentrie import GeneralizedTrie, TrieEntry
135 trie = GeneralizedTrie()
136 trie.add(['ape', 'green', 'apple'])
137 trie.add(['ape', 'green'])
138 matches: set[TrieEntry] = set(trie.prefixes(['ape', 'green']))
139 print(matches)
141Value of 'matches'::
143 {TrieEntry(ident=2, key=['ape', 'green'], value=None)}
145Example 2 - Trie of URLs
146--------------------------
148.. code-block:: python
149 :linenos:
151 from gentrie import GeneralizedTrie, TrieEntry
153 # Create a trie to store website URLs
154 url_trie = GeneralizedTrie()
156 # Add some URLs with different components (protocol, domain, path)
157 url_trie.add(["https", "com", "example", "www", "/", "products", "clothing"], value="Clothing Store")
158 url_trie.add(["http", "org", "example", "blog", "/", "2023", "10", "best-laptops"], value="Best Laptops 2023")
159 url_trie.add(["ftp", "net", "example", "ftp", "/", "data", "images"], value="FTP Data Images")
161 # Find all https URLs with "example.com" domain
162 prefixed_by: set[TrieEntry] = set(url_trie.prefixed_by(["https", "com", "example"]))
163 print(prefixed_by)
165Value of 'prefixed_by'::
167 {
168 TrieEntry(
169 ident=1,
170 key=['https', 'com', 'example', 'www', '/', 'products', 'clothing'],
171 value='Clothing Store')
172 }
174Example 3 - Entries prefixed by a key
175-------------------------------------
177.. code-block:: python
178 :linenos:
180 from gentrie import GeneralizedTrie, TrieEntry
182 trie = GeneralizedTrie()
183 trie.add('abcdef')
184 trie.add('abc')
185 trie.add('qrf')
186 matches: set[TrieEntry] = set(trie.prefixed_by('ab'))
187 print(matches)
189Value of 'matches'::
191 {
192 TrieEntry(ident=2, key='abc', value=None),
193 TrieEntry(ident=1, key='abcdef', value=None)
194 }
196"""
198# Import all public classes and functions
199from .exceptions import (
200 ErrorTag,
201 InvalidTrieKeyTokenError,
202 InvalidGeneralizedKeyError,
203 DuplicateKeyError,
204 TrieKeyError,
205 TrieTypeError,
206 TrieValueError
207)
208from .protocols import TrieKeyToken, Hashable, GeneralizedKey
209from .types import TrieId, TrieEntry, TRIE_IDENT, TRIE_KEY, TRIE_VALUE
210from .validation import is_triekeytoken, is_hashable, is_generalizedkey
211from .trie import GeneralizedTrie
213__all__ = [
214 # Core classes
215 'GeneralizedTrie',
216 'TrieEntry',
217 'TrieId',
219 # Protocols and types
220 'TrieKeyToken',
221 'GeneralizedKey',
222 'Hashable', # deprecated
224 # Validation functions
225 'is_generalizedkey',
226 'is_triekeytoken',
227 'is_hashable', # deprecated
229 # Exceptions
230 'ErrorTag',
231 'InvalidTrieKeyTokenError',
232 'InvalidGeneralizedKeyError',
233 'DuplicateKeyError',
234 'TrieKeyError',
235 'TrieTypeError',
236 'TrieValueError',
238 # Constants
239 'TRIE_IDENT',
240 'TRIE_KEY',
241 'TRIE_VALUE',
242]