Coverage for src/gentrie/__init__.py: 100%

6 statements  

« prev     ^ index     » next       coverage.py v7.6.10, created at 2025-08-17 12:20 -0700

1# -*- coding: utf-8 -*- 

2"""Package providing a generalized trie implementation. 

3 

4This package includes classes and functions to create and manipulate a generalized trie 

5data structure. Unlike common trie implementations that only support strings as keys, 

6this generalized trie can handle various types of tokens, as long as they are hashable. 

7 

8.. warning:: **gentrie IS NOT thread-safe** 

9 

10 It is not designed for concurrent use. If you need a thread-safe trie, you must 

11 use an external locking mechanism to ensure that either only read-only threads are 

12 accessing the trie at the same time or that only one write thread and no read threads 

13 are accessing the trie at the same time. 

14 

15 One way to do this is to create TWO instances of the trie, one for read-only access and one for write access. 

16 

17 The read-only instance can be used by multiple threads at the same time, while the write instance can 

18 be used by only one thread at a time. After a write operation, a new read-only instance can be created 

19 from the write instance by forking it using :func:`copy.deepcopy()` which can then be used by the 

20 read-only threads. 

21 

22Usage 

23======= 

24 

25You can create a trie using the `GeneralizedTrie` class: 

26 

27.. code-block:: python 

28 :linenos: 

29 

30 from gentrie import GeneralizedTrie 

31 

32 # Create a new trie instance 

33 trie = GeneralizedTrie() 

34 

35There are three ways to add entries: 

36 

371. Using the `trie[key] = value` syntax 

38 

39This allows you to assign a value directly to a key and will create a 

40new TrieEntry if the key does not already exist. If the key already exists, 

41it will update the value associated with that key. 

42 

43.. code-block:: python 

44 :linenos: 

45 :caption: Examples of using the `trie[key] = value` syntax 

46 

47 # Assigns 'value' to 'key' 

48 # (tokenized as characters 'k','e','y') 

49 trie['key'] = 'value' 

50 

51 # Assigns value 'value2' to key 'another_key' (tokenized as 

52 # 'a','n','o','t','h','e','r','_','k','e','y','2') 

53 trie['another_key'] = 'another_value' 

54 

55 # Changes the value for 'key' (tokenized as 'k', 'e', 'y') 

56 # to 'new_value' 

57 trie['key'] = 'new_value' 

58 

59 # Assigns a tuple of int (tokenized as 

60 # 128, 96, 160, 0) as a key with the value'value5' 

61 trie[(128, 96, 160, 0)] = 'value5' 

62 

63 # Assigns a tuple with mixed value types (tokenized 

64 # as 128, 'a') as a key with the value 'value5b' 

65 trie[(128, 'a')] = 'value5b' 

66 

67 # Assigns a list of words (tokenized as 'hello', 'world') 

68 # as a key with the value 'value6' 

69 trie[['hello', 'world']] = 'value6' 

70 

712. Using the `trie.add(key, value)` method 

72 

73 This method adds a new entry to the trie and returns the TrieId 

74 for the new entry. If the key already exists, it will throw an 

75 error. The value argument is optional, and if not provided 

76 the entry will be created with a value of `None`. 

77 

783. Using the `trie.update(key, value)` method 

79 

80 This method adds a new entry or updates an existing entry, 

81 returning the TrieId for the entry and returns the TrieId of the entry. 

82 

83 This is the same as using the `trie[key] = value` syntax, 

84 but it is more explicit about the intention to update or add 

85 an entry and returns the TrieId of the entry. 

86 

87 The value argument is optional, and if not provided, the 

88 entry will be created or updated with a value of `None`. 

89 

90You can use the `in` operator to check if a key exists in the trie, 

91e.g., `if key in trie:`. This will return `True` if the key exists, 

92and `False` otherwise. 

93 

94There are two ways to directly retrieve entries using their keys: 

95 

961. Using the `trie[key | TrieId]` syntax. 

97 

98 This retrieves the `TrieEntry` associated with the key or TrieId. It 

99 will raise a `KeyError` if the key/TrieId does not exist. It will raise 

100 a `TypeError` if the key is not either a valid `TrieId` or `GeneralizedKey`. 

101 

102 The returned `TrieEntry` contains the key, value, and an identifier 

103 (ident - of type `TrieId`) that uniquely identifies the entry in the trie. 

104 

1052. Using the `trie.get(key | TrieId, [default])` method 

106 This retrieves the `TrieEntry` associated with the key or TrieId, 

107 returning `None` if the key/TrieId does not exist. This could be 

108 preferable in cases where you want to avoid exceptions 

109 for missing keys although it cannot distinguish between a key 

110 that does not exist and a key that exists with a `None` value 

111 in the trie by default. 

112 

113 You *can* provide a different default value to return if the key 

114 does not exist, which can be useful for handling cases where 

115 you want to return a specific value instead of `None`. 

116 

117You can also retrieve all entries that are prefixed by or prefixes for a given key: 

118 

119- `trie.prefixed_by(key)` returns a Generator of `TrieEntry` objects that 

120 are prefixed_by of the given key. 

121- `trie.prefixes(key)` returns a Generator of `TrieEntry` objects that 

122 are prefixes of the given key. 

123 

124These methods are useful for searching and retrieving entries 

125that match a specific pattern or structure in the trie. 

126 

127Example 1 - Basic Usage 

128------------------------ 

129 

130.. code-block:: python 

131 :linenos: 

132 

133 from gentrie import GeneralizedTrie, TrieEntry 

134 

135 trie = GeneralizedTrie() 

136 trie.add(['ape', 'green', 'apple']) 

137 trie.add(['ape', 'green']) 

138 matches: set[TrieEntry] = set(trie.prefixes(['ape', 'green'])) 

139 print(matches) 

140 

141Value of 'matches':: 

142 

143 {TrieEntry(ident=2, key=['ape', 'green'], value=None)} 

144 

145Example 2 - Trie of URLs 

146-------------------------- 

147 

148.. code-block:: python 

149 :linenos: 

150 

151 from gentrie import GeneralizedTrie, TrieEntry 

152 

153 # Create a trie to store website URLs 

154 url_trie = GeneralizedTrie() 

155 

156 # Add some URLs with different components (protocol, domain, path) 

157 url_trie.add(["https", "com", "example", "www", "/", "products", "clothing"], value="Clothing Store") 

158 url_trie.add(["http", "org", "example", "blog", "/", "2023", "10", "best-laptops"], value="Best Laptops 2023") 

159 url_trie.add(["ftp", "net", "example", "ftp", "/", "data", "images"], value="FTP Data Images") 

160 

161 # Find all https URLs with "example.com" domain 

162 prefixed_by: set[TrieEntry] = set(url_trie.prefixed_by(["https", "com", "example"])) 

163 print(prefixed_by) 

164 

165Value of 'prefixed_by':: 

166 

167 { 

168 TrieEntry( 

169 ident=1, 

170 key=['https', 'com', 'example', 'www', '/', 'products', 'clothing'], 

171 value='Clothing Store') 

172 } 

173 

174Example 3 - Entries prefixed by a key 

175------------------------------------- 

176 

177.. code-block:: python 

178 :linenos: 

179 

180 from gentrie import GeneralizedTrie, TrieEntry 

181 

182 trie = GeneralizedTrie() 

183 trie.add('abcdef') 

184 trie.add('abc') 

185 trie.add('qrf') 

186 matches: set[TrieEntry] = set(trie.prefixed_by('ab')) 

187 print(matches) 

188 

189Value of 'matches':: 

190 

191 { 

192 TrieEntry(ident=2, key='abc', value=None), 

193 TrieEntry(ident=1, key='abcdef', value=None) 

194 } 

195 

196""" 

197 

198# Import all public classes and functions 

199from .exceptions import ( 

200 ErrorTag, 

201 InvalidTrieKeyTokenError, 

202 InvalidGeneralizedKeyError, 

203 DuplicateKeyError, 

204 TrieKeyError, 

205 TrieTypeError, 

206 TrieValueError 

207) 

208from .protocols import TrieKeyToken, Hashable, GeneralizedKey 

209from .types import TrieId, TrieEntry, TRIE_IDENT, TRIE_KEY, TRIE_VALUE 

210from .validation import is_triekeytoken, is_hashable, is_generalizedkey 

211from .trie import GeneralizedTrie 

212 

213__all__ = [ 

214 # Core classes 

215 'GeneralizedTrie', 

216 'TrieEntry', 

217 'TrieId', 

218 

219 # Protocols and types 

220 'TrieKeyToken', 

221 'GeneralizedKey', 

222 'Hashable', # deprecated 

223 

224 # Validation functions 

225 'is_generalizedkey', 

226 'is_triekeytoken', 

227 'is_hashable', # deprecated 

228 

229 # Exceptions 

230 'ErrorTag', 

231 'InvalidTrieKeyTokenError', 

232 'InvalidGeneralizedKeyError', 

233 'DuplicateKeyError', 

234 'TrieKeyError', 

235 'TrieTypeError', 

236 'TrieValueError', 

237 

238 # Constants 

239 'TRIE_IDENT', 

240 'TRIE_KEY', 

241 'TRIE_VALUE', 

242]