gentrie package

Subpackages

Submodules

gentrie.exceptions module

Custom exceptions for the gentrie package.

exception gentrie.exceptions.DuplicateKeyError(msg: str, tag: ErrorTag)

Bases: TrieKeyError

Raised when an attempt is made to add a key that is already in the trie with a different associated value.

This is a sub-class of TrieKeyError.

class gentrie.exceptions.ErrorTag(value)

Bases: str, Enum

Tags for error path identification for tests for the gentrie packages.

ErrorTags’ are used to identify specific error conditions in the gentrie package. Tests use these tags to assert specific error condition paths.

GETITEM_ID_NOT_FOUND = 'GETITEM_ID_NOT_FOUND'
GETITEM_INVALID_KEY_TYPE = 'GETITEM_INVALID_KEY_TYPE'
GETITEM_KEY_NOT_FOUND = 'GETITEM_KEY_NOT_FOUND'
GETITEM_NOT_TERMINAL = 'GETITEM_NOT_TERMINAL'
REMOVAL_INVALID_KEY_TYPE = 'REMOVAL_INVALID_KEY_TYPE'
REMOVAL_KEY_NOT_FOUND = 'REMOVAL_KEY_NOT_FOUND'
STORE_ENTRY_DUPLICATE_KEY = 'STORE_ENTRY_DUPLICATE_KEY'
STORE_ENTRY_INVALID_GENERALIZED_KEY = 'STORE_ENTRY_INVALID_GENERALIZED_KEY'
TRIE_PREFIXED_BY_BAD_DEPTH_TYPE = 'TRIE_PREFIXED_BY_BAD_DEPTH_TYPE'
TRIE_PREFIXED_BY_BAD_DEPTH_VALUE = 'TRIE_PREFIXED_BY_BAD_DEPTH_VALUE'
TRIE_PREFIXED_BY_INVALID_GENERALIZED_KEY = 'TRIE_PREFIXED_BY_INVALID_GENERALIZED_KEY'
TRIE_PREFIXES_INVALID_GENERALIZED_KEY = 'TRIE_PREFIXES_INVALID_GENERALIZED_KEY'
exception gentrie.exceptions.InvalidGeneralizedKeyError(msg: str, tag: ErrorTag)

Bases: TrieTypeError

Raised when a key is not a valid GeneralizedKey object.

This is a sub-class of TrieTypeError.

exception gentrie.exceptions.InvalidTrieKeyTokenError(msg: str, tag: ErrorTag)

Bases: TrieTypeError

Raised when a token in a key is not a valid TrieKeyToken object.

This is a sub-class of TrieTypeError.

exception gentrie.exceptions.TrieKeyError(msg: str, tag: ErrorTag)

Bases: KeyError

Base class for all trie-related key errors.

It differs from a standard KeyError by the addition of a tag code used to very specifically identify where the error was thrown in the code for testing and development support.

This tag code does not have a direct semantic meaning except to identify the specific code throwing the exception for tests.

Parameters:
  • msg (str) – The error message.

  • tag (ErrorTag) – The tag code.

exception gentrie.exceptions.TrieTypeError(msg: str, tag: ErrorTag)

Bases: TypeError

Base class for all trie-related type errors.

It differs from a standard TypeError by the addition of a tag code used to very specifically identify where the error was thrown in the code for testing and development support.

This tag code does not have a direct semantic meaning except to identify the specific code throwing the exception for tests.

Parameters:
  • msg (str) – The error message.

  • tag (ErrorTag) – The tag code.

exception gentrie.exceptions.TrieValueError(msg: str, tag: ErrorTag)

Bases: ValueError

Base class for all trie-related value errors.

It differs from a standard ValueError by the addition of a tag code used to very specifically identify where the error was thrown in the code for testing and development support.

This tag code does not have a direct semantic meaning except to identify the specific code throwing the exception for tests.

Parameters:
  • msg (str) – The error message.

  • tag (ErrorTag) – The tag code.

gentrie.nodes module

Internal node implementation for the gentrie package.

class gentrie.nodes.Node(token: TrieKeyToken, parent: GeneralizedTrie | Node, value: Any | None = None)

Bases: object

A node in the trie.

A node is a container for a key in the trie. It has a unique identifier and a reference to the key.

ident

Unique identifier for the key.

Type:

TrieId

token

Token for the key.

Type:

TrieKeyToken

parent

Reference to the parent node.

Type:

Optional[GeneralizedTrie | Node

children

Dictionary of child nodes.

Type:

dict[TrieKeyToken, Node]

children: dict[TrieKeyToken, 'Node']
ident: TrieId | None
parent: 'GeneralizedTrie | Node' | None
token: TrieKeyToken
value: Any | None

gentrie.protocols module

Protocols and type definitions for the gentrie package.

gentrie.protocols.GeneralizedKey

A GeneralizedKey is an object of any class that is a Sequence and that when iterated returns tokens conforming to the TrieKeyToken protocol.

Warning

Keys Must Be Immutable

Once a key is added to the trie, neither the key sequence itself nor any of its constituent tokens should be mutated. Modifying a key after it has been added can corrupt the internal state of the trie, leading to unpredictable behavior and making entries unreachable. The trie does not create a deep copy of keys for performance reasons.

If you need to modify a key, you should remove the old key and add a new one with the modified value.

Examples of valid :class:`GeneralizedKey` types

  • str

  • bytes

  • list[bool]

  • list[int]

  • list[bytes]

  • list[str]

  • list[Optional[str]]

  • tuple[int, int, str]

class gentrie.protocols.Hashable(*args, **kwargs)

Bases: TrieKeyToken, Protocol

The Hashable protocol is deprecated and will be removed in a future version.

This protocol is a sub-class of TrieKeyToken and is only provided for backward compatibility.

Use TrieKeyToken instead.

class gentrie.protocols.TrieKeyToken(*args, **kwargs)

Bases: Protocol

TrieKeyToken is a protocol that defines key tokens that are usable with a GeneralizedTrie.

The protocol requires that a token object be hashable. This means that it implements both an __eq__() method and a __hash__() method.

Some examples of built-in types that are suitable for use as tokens in a key:

Note: frozensets and tuples are only hashable if their contents are hashable.

Usage:

1from gentrie import TrieKeyToken
2
3token = SomeTokenClass()
4if isinstance(token, TrieKeyToken):
5    print("supports the TrieKeyToken protocol")
6else:
7    print("does not support the TrieKeyToken protocol")

Warning

Using User Defined Classes As Tokens In Keys

User-defined classes are hashable by default, but you should implement the __eq__() and __hash__() dunder methods in a content-aware way (the hash and eq values must depend on the content of the object) if you want to use them as tokens in a key. The default implementation of __eq__() and __hash__() uses the memory address of the object, which means that two different instances of the same class will not be considered equal.

gentrie.types module

Core data types for the gentrie package.

gentrie.types.TRIE_IDENT: int = 0

Alias for field number 0 (ident) in TrieEntry. It is faster to use this than accessing the field by name.

gentrie.types.TRIE_KEY: int = 1

Alias for field number 1 (key) in TrieEntry. It is faster to use this than accessing the field by name.

gentrie.types.TRIE_VALUE: int = 2

Alias for field number 2 (value) in TrieEntry. It is faster to use this than accessing the field by name.

class gentrie.types.TrieEntry(ident: TrieId, key: Sequence[TrieKeyToken], value: Any | None = None)

Bases: NamedTuple

A TrieEntry is a NamedTuple containing the unique identifer and key for an entry in the trie.

ident: TrieId

TrieId Unique identifier for a key in the trie. Alias for field number 0.

key: Sequence[TrieKeyToken]

GeneralizedKey Key for an entry in the trie. Alias for field number 1.

value: Any | None

Optional value for the entry in the trie. Alias for field number 2.

class gentrie.types.TrieId(value: int)

Bases: int

Unique identifier for a key in a trie.

gentrie.validation module

Validation functions for the gentrie package.

gentrie.validation.is_generalizedkey(key: Any) bool

Tests key for whether it is a valid GeneralizedKey.

A valid GeneralizedKey is a Sequence that returns TrieKeyToken protocol conformant objects when iterated. It must have at least one token.

Parameters:

key (Any) – Key for testing.

Returns:

True if a valid GeneralizedKey, False otherwise.

Return type:

bool

gentrie.validation.is_hashable(token: Any) bool

is_hashable is deprecated and will be removed in a future version.

This function is a wrapper for is_triekeytoken() and is only provided for backward compatibility.

Use is_triekeytoken() instead.

gentrie.validation.is_triekeytoken(token: Any) bool

Tests token for whether it is a valid TrieKeyToken.

A valid TrieKeyToken is a hashable object (implements both __eq__() and __hash__() methods).

Examples: bool, bytes, float, frozenset, int, str, None, tuple.

Parameters:

token (Any) – Object for testing.

Returns:

True if a valid TrieKeyToken, False otherwise.

Return type:

bool

Module contents

Package providing a generalized trie implementation.

This package includes classes and functions to create and manipulate a generalized trie data structure. Unlike common trie implementations that only support strings as keys, this generalized trie can handle various types of tokens, as long as they are hashable.

Warning

gentrie IS NOT thread-safe

It is not designed for concurrent use. If you need a thread-safe trie, you must use an external locking mechanism to ensure that either only read-only threads are accessing the trie at the same time or that only one write thread and no read threads are accessing the trie at the same time.

One way to do this is to create TWO instances of the trie, one for read-only access and one for write access.

The read-only instance can be used by multiple threads at the same time, while the write instance can be used by only one thread at a time. After a write operation, a new read-only instance can be created from the write instance by forking it using copy.deepcopy() which can then be used by the read-only threads.

Usage

You can create a trie using the GeneralizedTrie class:

1from gentrie import GeneralizedTrie
2
3# Create a new trie instance
4trie = GeneralizedTrie()

There are three ways to add entries:

  1. Using the trie[key] = value syntax

This allows you to assign a value directly to a key and will create a new TrieEntry if the key does not already exist. If the key already exists, it will update the value associated with that key.

Examples of using the trie[key] = value syntax
 1# Assigns 'value' to 'key'
 2# (tokenized as characters 'k','e','y')
 3trie['key'] = 'value'
 4
 5# Assigns value 'value2' to key 'another_key' (tokenized as
 6# 'a','n','o','t','h','e','r','_','k','e','y','2')
 7trie['another_key'] = 'another_value'
 8
 9# Changes the value for 'key' (tokenized as 'k', 'e', 'y')
10# to 'new_value'
11trie['key'] = 'new_value'
12
13# Assigns a tuple of int (tokenized as
14# 128, 96, 160, 0) as a key with the value'value5'
15trie[(128, 96, 160, 0)] = 'value5'
16
17# Assigns a tuple with mixed value types (tokenized
18# as 128, 'a') as a key with the value 'value5b'
19trie[(128, 'a')] = 'value5b'
20
21# Assigns a list of words (tokenized as 'hello', 'world')
22# as a key with the value 'value6'
23trie[['hello', 'world']] = 'value6'
  1. Using the trie.add(key, value) method

    This method adds a new entry to the trie and returns the TrieId for the new entry. If the key already exists, it will throw an error. The value argument is optional, and if not provided the entry will be created with a value of None.

  2. Using the trie.update(key, value) method

    This method adds a new entry or updates an existing entry, returning the TrieId for the entry and returns the TrieId of the entry.

    This is the same as using the trie[key] = value syntax, but it is more explicit about the intention to update or add an entry and returns the TrieId of the entry.

    The value argument is optional, and if not provided, the entry will be created or updated with a value of None.

You can use the in operator to check if a key exists in the trie, e.g., if key in trie:. This will return True if the key exists, and False otherwise.

There are two ways to directly retrieve entries using their keys:

  1. Using the trie[key | TrieId] syntax.

    This retrieves the TrieEntry associated with the key or TrieId. It will raise a KeyError if the key/TrieId does not exist. It will raise a TypeError if the key is not either a valid TrieId or GeneralizedKey.

    The returned TrieEntry contains the key, value, and an identifier (ident - of type TrieId) that uniquely identifies the entry in the trie.

  2. Using the trie.get(key | TrieId, [default]) method

    This retrieves the TrieEntry associated with the key or TrieId, returning None if the key/TrieId does not exist. This could be preferable in cases where you want to avoid exceptions for missing keys although it cannot distinguish between a key that does not exist and a key that exists with a None value in the trie by default.

    You can provide a different default value to return if the key does not exist, which can be useful for handling cases where you want to return a specific value instead of None.

You can also retrieve all entries that are prefixed by or prefixes for a given key:

  • trie.prefixed_by(key) returns a Generator of TrieEntry objects that are prefixed_by of the given key.

  • trie.prefixes(key) returns a Generator of TrieEntry objects that are prefixes of the given key.

These methods are useful for searching and retrieving entries that match a specific pattern or structure in the trie.

Example 1 - Basic Usage

1from gentrie import GeneralizedTrie, TrieEntry
2
3trie = GeneralizedTrie()
4trie.add(['ape', 'green', 'apple'])
5trie.add(['ape', 'green'])
6matches: set[TrieEntry] = set(trie.prefixes(['ape', 'green']))
7print(matches)

Value of ‘matches’:

{TrieEntry(ident=2, key=['ape', 'green'], value=None)}

Example 2 - Trie of URLs

 1from gentrie import GeneralizedTrie, TrieEntry
 2
 3# Create a trie to store website URLs
 4url_trie = GeneralizedTrie()
 5
 6# Add some URLs with different components (protocol, domain, path)
 7url_trie.add(["https", "com", "example", "www", "/", "products", "clothing"], value="Clothing Store")
 8url_trie.add(["http", "org", "example", "blog", "/", "2023", "10", "best-laptops"], value="Best Laptops 2023")
 9url_trie.add(["ftp", "net", "example", "ftp", "/", "data", "images"], value="FTP Data Images")
10
11# Find all https URLs with "example.com" domain
12prefixed_by: set[TrieEntry] = set(url_trie.prefixed_by(["https", "com", "example"]))
13print(prefixed_by)

Value of ‘prefixed_by’:

{
    TrieEntry(
        ident=1,
        key=['https', 'com', 'example', 'www', '/', 'products', 'clothing'],
        value='Clothing Store')
}

Example 3 - Entries prefixed by a key

1from gentrie import GeneralizedTrie, TrieEntry
2
3trie = GeneralizedTrie()
4trie.add('abcdef')
5trie.add('abc')
6trie.add('qrf')
7matches: set[TrieEntry] = set(trie.prefixed_by('ab'))
8print(matches)

Value of ‘matches’:

{
    TrieEntry(ident=2, key='abc', value=None),
    TrieEntry(ident=1, key='abcdef', value=None)
}
exception gentrie.DuplicateKeyError(msg: str, tag: ErrorTag)

Bases: TrieKeyError

Raised when an attempt is made to add a key that is already in the trie with a different associated value.

This is a sub-class of TrieKeyError.

class gentrie.ErrorTag(value)

Bases: str, Enum

Tags for error path identification for tests for the gentrie packages.

ErrorTags’ are used to identify specific error conditions in the gentrie package. Tests use these tags to assert specific error condition paths.

GETITEM_ID_NOT_FOUND = 'GETITEM_ID_NOT_FOUND'
GETITEM_INVALID_KEY_TYPE = 'GETITEM_INVALID_KEY_TYPE'
GETITEM_KEY_NOT_FOUND = 'GETITEM_KEY_NOT_FOUND'
GETITEM_NOT_TERMINAL = 'GETITEM_NOT_TERMINAL'
REMOVAL_INVALID_KEY_TYPE = 'REMOVAL_INVALID_KEY_TYPE'
REMOVAL_KEY_NOT_FOUND = 'REMOVAL_KEY_NOT_FOUND'
STORE_ENTRY_DUPLICATE_KEY = 'STORE_ENTRY_DUPLICATE_KEY'
STORE_ENTRY_INVALID_GENERALIZED_KEY = 'STORE_ENTRY_INVALID_GENERALIZED_KEY'
TRIE_PREFIXED_BY_BAD_DEPTH_TYPE = 'TRIE_PREFIXED_BY_BAD_DEPTH_TYPE'
TRIE_PREFIXED_BY_BAD_DEPTH_VALUE = 'TRIE_PREFIXED_BY_BAD_DEPTH_VALUE'
TRIE_PREFIXED_BY_INVALID_GENERALIZED_KEY = 'TRIE_PREFIXED_BY_INVALID_GENERALIZED_KEY'
TRIE_PREFIXES_INVALID_GENERALIZED_KEY = 'TRIE_PREFIXES_INVALID_GENERALIZED_KEY'
gentrie.GeneralizedKey

alias of Sequence[TrieKeyToken]

class gentrie.GeneralizedTrie(runtime_validation: bool = True)

Bases: TrieBase, TrieStorageMixin, TrieAccessMixin, TrieRemovalMixin, TrieTraversalMixin, TrieMutationMixin, TrieCollectionMixin

A general purpose trie.

Unlike many trie implementations which only support strings as keys and token match only at the character level, it is agnostic as to the types of tokens used to key it and thus far more general purpose.

It requires only that the indexed tokens be hashable. This is verified at runtime using the gentrie.TrieKeyToken protocol.

Tokens in a key do NOT have to all be the same type as long as they can be compared for equality.

It can handle a Sequence of TrieKeyToken conforming objects as keys for the trie out of the box.

You can ‘mix and match’ types of objects used as token in a key as long as they all conform to the TrieKeyToken protocol.

The code emphasizes robustness and correctness.

Warning

GOTCHA: Using User Defined Classes As Tokens In Keys

Objects of user-defined classes are conformant with the TrieKeyToken protocol by default, but this will not work as naively expected. The hash value of an object is based on its memory address by default. This results in the hash value of an object changing every time the object is created and means that the object will not be found in the trie unless you have a reference to the original object.

If you want to use a user-defined class as a token in a key to look up by value instead of the instance, you must implement the __eq__() and __hash__() dunder methods in a content aware way (the hash and eq values must depend on the content of the object).

Tip

Using `dataclasses.dataclass` For Content-Aware User Defined Classes

A simple way to implement a user-defined class that is content aware hashable is to use the dataclasses.dataclass decorator using the frozen=True and eq=True options . This will automatically implement appropriate __eq__() and __hash__() methods for you.

Example of a content-aware user-defined class
 1from dataclasses import dataclass
 2
 3from gentrie import TrieKeyToken
 4
 5@dataclass(frozen=True, eq=True)
 6class MyTokenClass:
 7    name: str
 8    value: int
 9
10# Create an instance of the token class
11token = MyTokenClass(name="example", value=42)
12
13# Check if the token is hashable
14if isinstance(token, TrieKeyToken):
15    print("token is usable as a TrieKeyToken")
16else:
17    print("token is not usable as a TrieKeyToken")
class gentrie.Hashable(*args, **kwargs)

Bases: TrieKeyToken, Protocol

The Hashable protocol is deprecated and will be removed in a future version.

This protocol is a sub-class of TrieKeyToken and is only provided for backward compatibility.

Use TrieKeyToken instead.

exception gentrie.InvalidGeneralizedKeyError(msg: str, tag: ErrorTag)

Bases: TrieTypeError

Raised when a key is not a valid GeneralizedKey object.

This is a sub-class of TrieTypeError.

exception gentrie.InvalidTrieKeyTokenError(msg: str, tag: ErrorTag)

Bases: TrieTypeError

Raised when a token in a key is not a valid TrieKeyToken object.

This is a sub-class of TrieTypeError.

class gentrie.TrieEntry(ident: TrieId, key: Sequence[TrieKeyToken], value: Any | None = None)

Bases: NamedTuple

A TrieEntry is a NamedTuple containing the unique identifer and key for an entry in the trie.

ident: TrieId

TrieId Unique identifier for a key in the trie. Alias for field number 0.

key: Sequence[TrieKeyToken]

GeneralizedKey Key for an entry in the trie. Alias for field number 1.

value: Any | None

Optional value for the entry in the trie. Alias for field number 2.

class gentrie.TrieId(value: int)

Bases: int

Unique identifier for a key in a trie.

exception gentrie.TrieKeyError(msg: str, tag: ErrorTag)

Bases: KeyError

Base class for all trie-related key errors.

It differs from a standard KeyError by the addition of a tag code used to very specifically identify where the error was thrown in the code for testing and development support.

This tag code does not have a direct semantic meaning except to identify the specific code throwing the exception for tests.

Parameters:
  • msg (str) – The error message.

  • tag (ErrorTag) – The tag code.

class gentrie.TrieKeyToken(*args, **kwargs)

Bases: Protocol

TrieKeyToken is a protocol that defines key tokens that are usable with a GeneralizedTrie.

The protocol requires that a token object be hashable. This means that it implements both an __eq__() method and a __hash__() method.

Some examples of built-in types that are suitable for use as tokens in a key:

Note: frozensets and tuples are only hashable if their contents are hashable.

Usage:

1from gentrie import TrieKeyToken
2
3token = SomeTokenClass()
4if isinstance(token, TrieKeyToken):
5    print("supports the TrieKeyToken protocol")
6else:
7    print("does not support the TrieKeyToken protocol")

Warning

Using User Defined Classes As Tokens In Keys

User-defined classes are hashable by default, but you should implement the __eq__() and __hash__() dunder methods in a content-aware way (the hash and eq values must depend on the content of the object) if you want to use them as tokens in a key. The default implementation of __eq__() and __hash__() uses the memory address of the object, which means that two different instances of the same class will not be considered equal.

exception gentrie.TrieTypeError(msg: str, tag: ErrorTag)

Bases: TypeError

Base class for all trie-related type errors.

It differs from a standard TypeError by the addition of a tag code used to very specifically identify where the error was thrown in the code for testing and development support.

This tag code does not have a direct semantic meaning except to identify the specific code throwing the exception for tests.

Parameters:
  • msg (str) – The error message.

  • tag (ErrorTag) – The tag code.

exception gentrie.TrieValueError(msg: str, tag: ErrorTag)

Bases: ValueError

Base class for all trie-related value errors.

It differs from a standard ValueError by the addition of a tag code used to very specifically identify where the error was thrown in the code for testing and development support.

This tag code does not have a direct semantic meaning except to identify the specific code throwing the exception for tests.

Parameters:
  • msg (str) – The error message.

  • tag (ErrorTag) – The tag code.

gentrie.is_generalizedkey(key: Any) bool

Tests key for whether it is a valid GeneralizedKey.

A valid GeneralizedKey is a Sequence that returns TrieKeyToken protocol conformant objects when iterated. It must have at least one token.

Parameters:

key (Any) – Key for testing.

Returns:

True if a valid GeneralizedKey, False otherwise.

Return type:

bool

gentrie.is_hashable(token: Any) bool

is_hashable is deprecated and will be removed in a future version.

This function is a wrapper for is_triekeytoken() and is only provided for backward compatibility.

Use is_triekeytoken() instead.

gentrie.is_triekeytoken(token: Any) bool

Tests token for whether it is a valid TrieKeyToken.

A valid TrieKeyToken is a hashable object (implements both __eq__() and __hash__() methods).

Examples: bool, bytes, float, frozenset, int, str, None, tuple.

Parameters:

token (Any) – Object for testing.

Returns:

True if a valid TrieKeyToken, False otherwise.

Return type:

bool