gentrie package
Module contents
Package providing a generalized trie implementation.
This package includes classes and functions to create and manipulate a generalized trie data structure. Unlike common trie implementations that only support strings as keys, this generalized trie can handle various types of tokens, as long as they are hashable.
Usage:
Example 1:
    from gentrie import GeneralizedTrie, TrieEntry

    trie = GeneralizedTrie()
    trie.add(['ape', 'green', 'apple'])
    trie.add(['ape', 'green'])
    matches: set[TrieEntry] = trie.prefixes(['ape', 'green'])
    print(matches)

Example 1 Output:
    {TrieEntry(ident=2, key=['ape', 'green'])}

Example 2:
    from gentrie import GeneralizedTrie, TrieEntry

    # Create a trie to store website URLs
    url_trie = GeneralizedTrie()

    # Add some URLs with different components (protocol, domain, path)
    url_trie.add(["https", "com", "example", "www", "/", "products", "clothing"])
    url_trie.add(["http", "org", "example", "blog", "/", "2023", "10", "best-laptops"])
    url_trie.add(["ftp", "net", "example", "ftp", "/", "data", "images"])

    # Find all https URLs with "example.com" domain
    suffixes: set[TrieEntry] = url_trie.suffixes(["https", "com", "example"])
    print(suffixes)

Example 2 Output:
    {TrieEntry(ident=1, key=['https', 'com', 'example', 'www', '/', 'products', 'clothing'])}

Example 3:
    from gentrie import GeneralizedTrie, TrieEntry

    trie = GeneralizedTrie()
    trie.add('abcdef')
    trie.add('abc')
    trie.add('qrf')
    matches: set[TrieEntry] = trie.suffixes('ab')
    print(matches)

Example 3 Output:
    {TrieEntry(ident=2, key='abc'), TrieEntry(ident=1, key='abcdef')}
- gentrie.GeneralizedKey
A GeneralizedKey is an object of any class that is a Sequence and that, when iterated, returns tokens conforming to the Hashable protocol.
Examples: a str such as 'abc', or a list of str such as ['ape', 'green', 'apple'].
- class gentrie.GeneralizedTrie
Bases: object
A general-purpose trie.
Unlike many trie implementations, which support only strings as keys and match tokens only at the character level, it is agnostic as to the types of tokens used to key it and is thus far more general purpose.
It requires only that the indexed tokens be hashable. This is verified at runtime using the gentrie.Hashable protocol.
Tokens in a key do NOT all have to be the same type, as long as they can be compared for equality.
It can handle a Sequence of Hashable-conforming objects as keys for the trie out of the box.
You can 'mix and match' the types of objects used as tokens in a key, as long as they all conform to the Hashable protocol.
The code emphasizes robustness and correctness.
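To make the "hashable tokens only" requirement concrete, here is a minimal sketch of how a trie keyed by arbitrary hashable tokens can be built on nested dict lookups. SketchTrie and _Node are hypothetical names used only for illustration; this is not the gentrie implementation, and it assumes ids are handed out sequentially starting at 1, as the examples above suggest.

```python
from collections.abc import Hashable, Sequence
from typing import Optional


class _Node:
    """One trie node: children keyed by token, plus an optional entry id."""

    def __init__(self) -> None:
        self.children: dict[Hashable, "_Node"] = {}
        self.ident: Optional[int] = None


class SketchTrie:
    """Hypothetical stand-in for illustration, not the gentrie implementation."""

    def __init__(self) -> None:
        self._root = _Node()
        self._next_ident = 1  # assumption: ids are sequential from 1

    def add(self, key: Sequence[Hashable]) -> int:
        node = self._root
        for token in key:
            # A dict lookup only needs the token to be hashable and comparable.
            node = node.children.setdefault(token, _Node())
        if node.ident is None:
            # First time this exact key is stored: assign a new id.
            node.ident = self._next_ident
            self._next_ident += 1
        return node.ident


trie = SketchTrie()
first = trie.add(["ape", 2, ("x", "y"), 3.5])  # mixed token types in one key
second = trie.add("abc")                       # a str is a Sequence of 1-character tokens
again = trie.add(["ape", 2, ("x", "y"), 3.5])  # adding the same key again
print(first, second, again)  # 1 2 1
```

Because each node stores its children in a dict, any object with consistent __eq__() and __hash__() methods can serve as a token, which is why the token type does not need to be uniform within a key.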
Warning
GOTCHA: Using User-Defined Classes As Tokens In Keys
Objects of user-defined classes are Hashable by default, but this will not work as naively expected. By default, the hash value of an object is based on its identity (in CPython, its memory address), so two separately created objects with identical contents have different hash values and do not compare equal. An object used as a token will therefore not be found in the trie unless you look it up with a reference to the original object.
If you want to use a user-defined class as a token in a key and look it up by value instead of by instance, you must implement the __eq__() and __hash__() dunder methods in a content-aware way (equality and the hash value must both depend on the content of the object).
- add(key: Sequence[Hashable | str]) -> int
Adds the key to the trie.
- Parameters:
key (GeneralizedKey) – Must be an object that can be iterated and that, when iterated, returns elements conforming to the Hashable protocol.
- Raises:
InvalidGeneralizedKeyError ([GTA001]) – If key is not a valid GeneralizedKey.
- Returns:
Id of the inserted key. If the key was already in the trie, the id of the existing entry is returned.
- Return type:
TrieId
- items() -> Generator[tuple[int, TrieEntry], None, None]
Returns an iterator for the trie.
The generator yields the TrieId and TrieEntry for each key in the trie.
- Returns:
Generator for the trie.
- Return type:
Generator[tuple[TrieId, TrieEntry], None, None]
- keys() -> Generator[int, None, None]
Returns an iterator for all the TrieId keys in the trie.
The generator yields the TrieId for each key in the trie.
- Returns:
Generator for the trie.
- Return type:
Generator[TrieId, None, None]
- prefixes(key: Sequence[Hashable | str]) -> set[TrieEntry]
Returns a set of TrieEntry instances for all keys in the trie that are a prefix of the passed key.
Searches the trie for all keys that are prefix matches for the key and returns their TrieEntry instances as a set.
- Parameters:
key (GeneralizedKey) – Key for matching.
- Returns:
set containing TrieEntry instances for keys that are prefixes of the key. This will be an empty set if there are no matches.
- Return type:
set[TrieEntry]
- Raises:
InvalidGeneralizedKeyError ([GTM001]) – If key is not a valid GeneralizedKey (is not a Sequence of Hashable objects).
Usage:
    from gentrie import GeneralizedTrie, TrieEntry

    trie: GeneralizedTrie = GeneralizedTrie()
    keys: list[str] = ['abcdef', 'abc', 'a', 'abcd', 'qrs']
    for entry in keys:
        trie.add(entry)
    matches: set[TrieEntry] = trie.prefixes('abcd')
    for trie_entry in sorted(matches):
        print(f'{trie_entry.ident}: {trie_entry.key}')

    # 2: abc
    # 3: a
    # 4: abcd
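The prefix search described above can be pictured as a single walk down the trie along the tokens of the passed key, collecting every stored key that ends at a visited node. The following is a minimal sketch under stated assumptions: the nested-dict layout, the END sentinel, and the build() and prefixes() helpers are all hypothetical and are not taken from gentrie; entries are plain (ident, key) tuples rather than TrieEntry instances.

```python
END = object()  # hypothetical sentinel: marks a node where a stored key ends


def build(keys):
    """Build a nested-dict trie; ids are assigned in insertion order from 1."""
    root = {}
    for ident, key in enumerate(keys, start=1):
        node = root
        for token in key:
            node = node.setdefault(token, {})
        node[END] = (ident, key)
    return root


def prefixes(root, key):
    """Collect the (ident, key) pairs of all stored keys that are prefixes of key."""
    found = set()
    node = root
    for token in key:
        node = node.get(token)
        if node is None:
            break  # no stored key continues along this path
        if END in node:
            found.add(node[END])  # a stored key ends here, so it is a prefix
    return found


root = build(['abcdef', 'abc', 'a', 'abcd', 'qrs'])
for ident, key in sorted(prefixes(root, 'abcd')):
    print(f'{ident}: {key}')

# 2: abc
# 3: a
# 4: abcd
```

The walk visits at most one node per token of the passed key, so in this sketch the cost depends on the length of the key rather than on the number of keys stored.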
- remove(ident: int) -> None
Remove the key with the passed ident from the trie.
- Parameters:
ident (TrieId) – id of the key to remove.
- Raises:
ValueError ([GTR002]) – if the ident arg is not a legal value.
KeyError ([GTR003]) – if the ident does not match the id of any keys.
- suffixes(key: Sequence[Hashable | str], depth: int = -1) -> set[TrieEntry]
Returns the TrieEntry instances of all keys in the trie that begin with the passed key, up to depth.
Searches the trie for all keys that are suffix matches for the key (keys that start with it), up to the specified depth below the key match, and returns their TrieEntry instances as a set.
- Parameters:
key (GeneralizedKey) – Key for matching.
depth (int, default=-1) – Depth starting from the matched key to include. The depth determines how many 'layers' deeper into the trie to look for suffixes:
* A depth of -1 (the default) includes ALL entries for the exact match and all children nodes.
* A depth of 0 only includes the entries for the exact match for the key.
* A depth of 1 includes entries for the exact match and the next layer down.
* A depth of 2 includes entries for the exact match and the next two layers down.
- Returns:
Set of TrieEntry instances for keys that are suffix matches for the key. This will be an empty set if there are no matches.
- Return type:
set[TrieEntry]
- Raises:
InvalidGeneralizedKeyError ([GTS001]) – If key arg is not a GeneralizedKey.
TypeError ([GTS002]) – If depth arg is not an int.
ValueError ([GTS003]) – If depth arg is less than -1.
InvalidGeneralizedKeyError ([GTS004]) – If a token in the key arg does not conform to the Hashable protocol.
Usage:
    from gentrie import GeneralizedTrie, TrieEntry

    trie = GeneralizedTrie()
    keys: list[str] = ['abcdef', 'abc', 'a', 'abcd', 'qrs']
    for entry in keys:
        trie.add(entry)
    matches: set[TrieEntry] = trie.suffixes('abcd')
    for trie_entry in sorted(matches):
        print(f'{trie_entry.ident}: {trie_entry.key}')

    # 1: abcdef
    # 4: abcd
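The depth parameter is easiest to understand as a limit on how many layers below the matched node are visited. The sketch below illustrates that reading of the documented semantics; the nested-dict layout, the END sentinel, and the build() and suffixes() helpers are hypothetical and are not gentrie's code, and entries are plain (ident, key) tuples rather than TrieEntry instances.

```python
END = object()  # hypothetical sentinel: marks a node where a stored key ends


def build(keys):
    """Build a nested-dict trie; ids are assigned in insertion order from 1."""
    root = {}
    for ident, key in enumerate(keys, start=1):
        node = root
        for token in key:
            node = node.setdefault(token, {})
        node[END] = (ident, key)
    return root


def suffixes(root, key, depth=-1):
    """Collect (ident, key) pairs for stored keys that start with key."""
    if depth < -1:
        raise ValueError("depth must be -1 or greater")
    node = root
    for token in key:
        node = node.get(token)
        if node is None:
            return set()  # the passed key is not a path in the trie
    found = set()
    stack = [(node, 0)]  # (node, layers below the matched node)
    while stack:
        current, level = stack.pop()
        if END in current:
            found.add(current[END])
        if depth == -1 or level < depth:
            # Descend one more layer, skipping the sentinel entry.
            for token, child in current.items():
                if token is not END:
                    stack.append((child, level + 1))
    return found


root = build(['abcdef', 'abc', 'a', 'abcd', 'qrs'])
print(sorted(suffixes(root, 'abcd')))           # [(1, 'abcdef'), (4, 'abcd')]
print(sorted(suffixes(root, 'abcd', depth=0)))  # [(4, 'abcd')]
print(sorted(suffixes(root, 'a', depth=2)))     # [(2, 'abc'), (3, 'a')]
```

With depth=0 only the exact match is reported, while 'abcdef' sits two layers below 'abcd' and so only appears once depth is 2 or greater, or -1.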
- class gentrie.Hashable(*args, **kwargs)
Bases: Protocol
Hashable is a protocol that defines key tokens that are usable with a GeneralizedTrie.
The protocol requires that a token object be hashable. This means that it implements both an __eq__() method and a __hash__() method.
Built-in types suitable for use as tokens in a key include str, bytes, int, float, bool, tuple, and frozenset.
Note: frozensets and tuples are only hashable if their contents are hashable.
User-defined classes are hashable by default.
Usage:
    from gentrie import Hashable

    token = 'ape'  # any candidate token
    if isinstance(token, Hashable):
        print("token supports the Hashable protocol")
    else:
        print("token does not support the Hashable protocol")
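The statement that user-defined classes are hashable by default deserves a concrete illustration, because the default hash is identity-based. In the sketch below, a plain dict stands in for the token lookup inside a trie node (an assumption for illustration only), and PlainToken and ValueToken are hypothetical classes. A frozen dataclass is one convenient way to get content-based __eq__() and __hash__() methods.

```python
from dataclasses import dataclass


class PlainToken:
    """Inherits object's identity-based __eq__ and __hash__."""

    def __init__(self, name: str) -> None:
        self.name = name


@dataclass(frozen=True)
class ValueToken:
    """frozen=True with the default eq=True generates content-based __eq__ and __hash__."""

    name: str


# A dict stands in for the token lookup a trie performs at each node.
index = {}
index[PlainToken("ape")] = 1
index[ValueToken("ape")] = 2

print(PlainToken("ape") in index)  # False: a new instance is a different object
print(ValueToken("ape") in index)  # True: equal contents, equal hash
```

Writing __eq__() and __hash__() by hand works equally well, provided both depend on the same fields.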
- exception gentrie.InvalidGeneralizedKeyError
Bases: TypeError
Raised when a key is not a valid GeneralizedKey object.
This is a subclass of TypeError.
- exception gentrie.InvalidHashableError
Bases: TypeError
Raised when a token in a key is not a valid Hashable object.
This is a subclass of TypeError.
- class gentrie.TrieEntry(ident: int, key: Sequence[Hashable | str])
Bases: NamedTuple
A TrieEntry is a NamedTuple containing the unique identifier and key for an entry in the trie.
- key: Sequence[Hashable | str]
GeneralizedKey
Key for an entry in the trie. Alias for field number 1.
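Since TrieEntry is a NamedTuple, its fields can be read by name or by position, and entries can be unpacked and sorted like ordinary tuples. The sketch below uses a hypothetical stand-in class, EntrySketch, with the same two field names; it assumes str keys for simplicity and is not the gentrie class itself.

```python
from typing import NamedTuple


class EntrySketch(NamedTuple):
    """Hypothetical stand-in with the same field names as TrieEntry."""

    ident: int
    key: str  # assumption: str keys only, to keep the sketch simple


entry = EntrySketch(ident=4, key="abcd")
print(entry.key == entry[1])    # True: key is an alias for field number 1
print(entry.ident == entry[0])  # True: ident is field number 0

ident, key = entry              # tuple unpacking works as usual
print(ident, key)               # 4 abcd

# Tuples compare field by field, so sorting orders entries by ident first.
matches = {EntrySketch(4, "abcd"), EntrySketch(2, "abc"), EntrySketch(3, "a")}
print([e.ident for e in sorted(matches)])  # [2, 3, 4]
```

This field-by-field ordering is why the usage examples for prefixes() and suffixes() print their results in ident order after sorting.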
- gentrie.TrieId
Unique identifier for a key in a trie.
- gentrie.is_generalizedkey(key: Sequence[Hashable | str]) -> bool
Tests key for whether it is a valid GeneralizedKey.
A valid GeneralizedKey is a Sequence that returns Hashable protocol conformant objects when iterated. It must have at least one token.
- Parameters:
key (GeneralizedKey) – Key for testing.
- Returns:
True if a valid GeneralizedKey, False otherwise.
- Return type:
bool
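The three conditions stated above (a Sequence, at least one token, every token hashable) can be approximated with the standard library alone. The function below, looks_like_generalized_key, is a hypothetical illustration of those rules and not the library's is_generalizedkey(); it assumes collections.abc.Sequence is the relevant notion of Sequence and checks hashability by calling hash() on each token.

```python
from collections.abc import Sequence


def looks_like_generalized_key(key: object) -> bool:
    """Approximate the documented rules for a valid key (illustrative only)."""
    if not isinstance(key, Sequence) or len(key) == 0:
        return False  # not a Sequence, or no tokens at all
    for token in key:
        try:
            hash(token)  # raises TypeError for unhashable tokens such as lists
        except TypeError:
            return False
    return True


print(looks_like_generalized_key("abc"))               # True
print(looks_like_generalized_key(["ape", 1, (2, 3)]))  # True
print(looks_like_generalized_key([]))                  # False: no tokens
print(looks_like_generalized_key([["nested"]]))        # False: a list token is unhashable
print(looks_like_generalized_key({"a", "b"}))          # False: a set is not a Sequence
```

Calling hash() on each token also rejects a tuple that contains an unhashable object, which a simple isinstance check against collections.abc.Hashable would accept.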