API Reference¶
This section provides detailed information about the API of gen-trie. It includes descriptions of the main classes, methods, and attributes available in the library.
- class gentrie.GeneralizedTrie(runtime_validation: bool = True)¶
A general purpose trie.
Unlike many trie implementations which only support strings as keys and token match only at the character level, it is agnostic as to the types of tokens used to key it and thus far more general purpose.
It requires only that the indexed tokens be hashable. This is verified at runtime using the
gentrie.TrieKeyTokenprotocol.Tokens in a key do NOT have to all be the same type as long as they can be compared for equality.
It can handle a
SequenceofTrieKeyTokenconforming objects as keys for the trie out of the box.You can ‘mix and match’ types of objects used as token in a key as long as they all conform to the
TrieKeyTokenprotocol.The code emphasizes robustness and correctness.
Warning
GOTCHA: Using User Defined Classes As Tokens In Keys
Objects of user-defined classes are conformant with the
TrieKeyTokenprotocol by default, but this will not work as naively expected. The hash value of an object is based on its memory address by default. This results in the hash value of an object changing every time the object is created and means that the object will not be found in the trie unless you have a reference to the original object.If you want to use a user-defined class as a token in a key to look up by value instead of the instance, you must implement the
__eq__()and__hash__()dunder methods in a content aware way (the hash and eq values must depend on the content of the object).Tip
Using `dataclasses.dataclass` For Content-Aware User Defined Classes
A simple way to implement a user-defined class that is content aware hashable is to use the
dataclasses.dataclassdecorator using thefrozen=Trueandeq=Trueoptions . This will automatically implement appropriate__eq__()and__hash__()methods for you.Example of a content-aware user-defined class¶1from dataclasses import dataclass 2 3from gentrie import TrieKeyToken 4 5@dataclass(frozen=True, eq=True) 6class MyTokenClass: 7 name: str 8 value: int 9 10# Create an instance of the token class 11token = MyTokenClass(name="example", value=42) 12 13# Check if the token is hashable 14if isinstance(token, TrieKeyToken): 15 print("token is usable as a TrieKeyToken") 16else: 17 print("token is not usable as a TrieKeyToken")- add(key: Sequence[TrieKeyToken], value: Any | None = None) TrieId¶
Adds the key to the trie.
Warning
Keys Must Be Immutable
Once a key is added to the trie, neither the key sequence itself nor any of its constituent tokens should be mutated. Modifying a key after it has been added can corrupt the internal state of the trie, leading to unpredictable behavior and making entries unreachable. The trie does not create a deep copy of keys for performance reasons.
If you need to modify a key, you should remove the old key and add a new one with the modified value.
- Parameters:
key (GeneralizedKey) – Must be an object that can be iterated and that when iterated returns elements conforming to the
TrieKeyTokenprotocol.value (Optional[Any], default=None) – Optional value to associate with the key.
- Raises:
InvalidGeneralizedKeyError – If key is not a valid
GeneralizedKey.DuplicateKeyError – If the key is already in the trie.
- Returns:
Id of the inserted key. If the key was not in the trie, it returns the id of the new entry. If the key was already in the trie, it raises a
DuplicateKeyError.- Return type:
- get(key: TrieId | Sequence[TrieKeyToken], default: Any | None = None) TrieEntry | Any | None¶
Returns the
TrieEntryfor the ident or key with the passed identifier.The identifier can be either the
TrieId(ident) or theGeneralizedKey(key) for the entry.If the key is not found, it returns the default value if provided or None if not provided.
- Parameters:
Returns:
TrieEntry: TrieEntry for the key with the passed identifier or the default value if not found.
- items() Generator[tuple[TrieId, TrieEntry], None, None]¶
Returns an iterator for the trie.
The keys are the TrieIds and the values are the TrieEntry instances.
The generator yields the
TrieIdandTrieEntryfor each key in the trie.- Returns:
Generator for the trie.
- Return type:
Generator[tuple[TrieId, TrieEntry], None, None]
- keys() Generator[TrieId, None, None]¶
Returns an iterator for all the TrieIds in the trie.
The generator yields the
TrieIdfor each key in the trie.It returns TrieIds instead of GeneralizedKeys because TrieIds are
Faster: Lookups using TrieIds are O(1) for time regardless of the length of the GeneralizedKey they are associated with vs O(n) to the length of keys for operations using GeneralizedKeys to look up entries.
More efficient memory usage: TrieIds are typically smaller in size compared to GeneralizedKeys, leading to reduced memory overhead when storing and processing keys in the trie.
Guaranteed stable even with key modifications: TrieIds remain consistent even if the underlying GeneralizedKey changes, making them more reliable for long-term storage and retrieval.
- Returns:
Generator for the trie.
- Return type:
Generator[TrieId, None, None]
- prefixed_by(key: Sequence[TrieKeyToken], depth: int = -1) Iterator[TrieEntry]¶
Yields all entries in the trie that are prefixed by the given key, up to a specified depth.
Searches the trie for all keys that start with the provided key and yields their
TrieEntryinstances.Note
The prefixed_by method finds all keys that start with the given prefix. For example, trie.prefixed_by(‘app’) will find entries for keys like ‘apple’ and ‘application’.
Warning
GOTCHA: Generators
Because generators are not executed until the first iteration, they may not behave as expected if not consumed properly. For example, exceptions will not be raised until the generator is iterated over.
- Parameters:
key (GeneralizedKey) – Key for matching.
depth (int, default=-1) – Depth starting from the matched key to include. The depth determines how many ‘layers’ deeper into the trie to look for prefixed_by.: * A depth of -1 (the default) includes ALL entries for the exact match and all children nodes. * A depth of 0 only includes the entries for the exact match for the key. * A depth of 1 includes entries for the exact match and the next layer down. * A depth of 2 includes entries for the exact match and the next two layers down.
- Yields:
- Raises:
InvalidGeneralizedKeyError – If key arg is not a valid GeneralizedKey.
TrieTypeError – If depth arg is not an int.
TrieValueError – If depth arg is less than -1.
Usage:
from gentrie import GeneralizedTrie, TrieEntry trie = GeneralizedTrie() keys: list[str] = ['abcdef', 'abc', 'a', 'abcd', 'qrs'] for entry in keys: trie.add(entry) matches_generator = trie.prefixed_by('abcd') for trie_entry in sorted(list(matches_generator)): print(f'{trie_entry.ident}: {trie_entry.key}') # 1: abcdef # 4: abcd
- prefixes(key: Sequence[TrieKeyToken]) Iterator[TrieEntry]¶
Yields TrieEntry instances for all keys in the trie that are a prefix of the passed key.
Searches the trie for all keys that are prefix matches for the key and yields their TrieEntry instances.
Note
The prefixes method finds all keys that are prefixes of the passed key. For example, trie.prefixes(‘apple’) will find entries for keys like ‘a’, ‘apple’ and ‘app’.
Warning
GOTCHA: Generators
Because generators are not executed until the first iteration, they may not behave as expected if not consumed properly. For example, exceptions will not be raised until the generator is iterated over.
- Parameters:
key (GeneralizedKey) – Key for matching.
- Yields:
- Raises:
InvalidGeneralizedKeyError – If key is not a valid
GeneralizedKey(is not aSequenceofTrieKeyTokenobjects).
Usage:
from gentrie import GeneralizedTrie, TrieEntry trie: GeneralizedTrie = GeneralizedTrie() keys: list[str] = ['abcdef', 'abc', 'a', 'abcd', 'qrs'] for entry in keys: trie.add(entry) matches_generator: Generator[TrieEntry, None, None] = trie.prefixes('abcd') for trie_entry in sorted(list(matches_generator)): print(f'{trie_entry.ident}: {trie_entry.key}') # 2: abc # 3: a # 4: abcd
- remove(key: TrieId | Sequence[TrieKeyToken]) None¶
Remove the specified key from the trie.
Removes the key from the trie. If the key is not found, it raises a KeyError. The key can be specified either as a
TrieIdor as aGeneralizedKey.- Parameters:
key (TrieId | GeneralizedKey) – identifier for key to remove.
- Raises:
TrieTypeError – if the key arg is not a
TrieIdor a validGeneralizedKey.TrieKeyError – if the key arg does not match the id or trie key of any entries in the trie.
- update(key: Sequence[TrieKeyToken], value: Any | None = None) TrieId¶
Updates the key/value pair in the trie.
Warning
Keys Must Be Immutable
Once a key is added to the trie, neither the key sequence itself nor any of its constituent tokens should be mutated. Modifying a key after it has been added can corrupt the internal state of the trie, leading to unpredictable behavior and making entries unreachable. The trie does not create a deep copy of keys for performance reasons.
If you need to modify a key, you should remove the old key and add a new one with the modified value.
- Parameters:
key (GeneralizedKey) – Must be an object that can be iterated and that when iterated returns elements conforming to the
TrieKeyTokenprotocol.value (Optional[Any], default=None) – Optional value to associate with the key.
- Raises:
InvalidGeneralizedKeyError – If key is not a valid
GeneralizedKey.- Returns:
Id of the inserted key. If the key was already in the trie with the same value it returns the id for the already existing entry. If the key was not already in the trie, it returns the id for a new entry.
- Return type:
- class gentrie.types.TrieEntry(ident: TrieId, key: Sequence[TrieKeyToken], value: Any | None = None)¶
A
TrieEntryis aNamedTuplecontaining the unique identifer and key for an entry in the trie.
- class gentrie.exceptions.InvalidTrieKeyTokenError(msg: str, tag: ErrorTag)¶
Raised when a token in a key is not a valid
TrieKeyTokenobject.This is a sub-class of
TrieTypeError.
- class gentrie.exceptions.InvalidGeneralizedKeyError(msg: str, tag: ErrorTag)¶
Raised when a key is not a valid
GeneralizedKeyobject.This is a sub-class of
TrieTypeError.
- class gentrie.exceptions.DuplicateKeyError(msg: str, tag: ErrorTag)¶
Raised when an attempt is made to add a key that is already in the trie with a different associated value.
This is a sub-class of
TrieKeyError.
- gentrie.protocols.GeneralizedKey¶
A
GeneralizedKeyis an object of any class that is aSequenceand that when iterated returns tokens conforming to theTrieKeyTokenprotocol.Warning
Keys Must Be Immutable
Once a key is added to the trie, neither the key sequence itself nor any of its constituent tokens should be mutated. Modifying a key after it has been added can corrupt the internal state of the trie, leading to unpredictable behavior and making entries unreachable. The trie does not create a deep copy of keys for performance reasons.
If you need to modify a key, you should remove the old key and add a new one with the modified value.
Examples of valid :class:`GeneralizedKey` types
- class gentrie.protocols.TrieKeyToken(*args, **kwargs)¶
TrieKeyTokenis a protocol that defines key tokens that are usable with aGeneralizedTrie.The protocol requires that a token object be hashable. This means that it implements both an
__eq__()method and a__hash__()method.Some examples of built-in types that are suitable for use as tokens in a key:
Note: frozensets and tuples are only hashable if their contents are hashable.
Usage:
1from gentrie import TrieKeyToken 2 3token = SomeTokenClass() 4if isinstance(token, TrieKeyToken): 5 print("supports the TrieKeyToken protocol") 6else: 7 print("does not support the TrieKeyToken protocol")Warning
Using User Defined Classes As Tokens In Keys
User-defined classes are hashable by default, but you should implement the
__eq__()and__hash__()dunder methods in a content-aware way (the hash and eq values must depend on the content of the object) if you want to use them as tokens in a key. The default implementation of__eq__()and__hash__()uses the memory address of the object, which means that two different instances of the same class will not be considered equal.