Using gen-trie¶
Installation¶
Via PyPI:
pip3 install gen-tries
From source:
git clone https://github.com/JerilynFranz/python-gen-tries
cd python-gen-tries
pip3 install .
Usage¶
The gentrie module provides a GeneralizedTrie class that allows you to create a trie structure for storing and searching sequences of items, such as strings, lists, or tuples.
Below are some examples of how to use the GeneralizedTrie class.
Examples¶
Assigning Values to Entries¶
#!/usr/bin/env python3
"""Assigning Values to Entries in a Generalized Trie.
This script demonstrates how to create a Generalized Trie, add
or update entries with values, and retrieve entries, suffixes,
and prefixes with their associated values.
There are three ways to add entries:
1. Using the `trie[key] = value` syntax
This allows you to assign a value directly to a key and
will create a new entry if the key does not already exist.
If the key already exists, it will update the value
associated with that key.
2. Using the `trie.add(key, value)` method
This method adds a new entry to the trie. If the key
already exists, it will throw an error. The value argument
is optional, and if not provided the entry will be created
with a `None` value.
3. Using the `trie.update(key, value)` method
This method adds a new entry or updates an existing entry.
This is the same as using the `trie[key] = value` syntax,
but it is more explicit about the intention to update or
add an entry.
The value argument is optional, and if not provided, the
entry will be created or updated with a `None` value.
You can use the `in` operator to check if a key exists in the
trie, e.g., `if key in trie:`. This will return `True` if the
key exists, and `False` otherwise.
There are two ways to directly retrieve entries using their keys:
1. Using the `trie[key | TrieId]` syntax.
This retrieves the `TrieEntry` associated with the key or
TrieId. It will raise a `KeyError` if the key/TrieId does
not exist. It will raise a `TypeError` if the key is not a
valid `TrieId` or `GeneralizedKey`.
The `TrieEntry` contains the key, value, and an identifier
(ident - of type `TrieId`) that uniquely identifies the
entry in the trie.
You can also use the `trie.get(key | TrieId)` method to
retrieve the `TrieEntry` associated with the key or TrieId,
which is similar to the `trie[key]` syntax but allows for a
default value to be specified if the key/TrieId does not exist
rather than raising an exception.
2. Using the `trie.get(key | TrieId, [default])` method
This retrieves the `TrieEntry` associated with the key or
TrieId, returning `None` if the key/TrieId does not exist.
This could be preferable in cases where you want to avoid
exceptions.
You *can* provide a default value to return if the key
does not exist, which can be useful for handling cases where
you want to return a specific value instead of `None`.
You can also retrieve all entries that are prefixes for or prefixed by
a given key:
- `trie.prefixed_by(key)` returns a set of `TrieEntry` objects that
are prefixed_by of the given key.
- `trie.prefixes(key)` returns a set of `TrieEntry` objects that
are prefixes of the given key.
These methods are useful for searching and retrieving entries
that match a specific pattern or structure in the trie.
"""
from gentrie import DuplicateKeyError, GeneralizedTrie, TrieEntry
trie = GeneralizedTrie()
# Adding entries using the trie[key] = value syntax
entries: list[tuple[str, str | list[str]]] = [
('abcdef', 'value1'),
('abc', 'value2'),
('abcd', 'value3'),
('qrf', 'value4'),
('xyz', ['lists', 'are', 'also', 'supported']),
('xy', 'value6'),
]
for key, value in entries:
trie[key] = value
prefixed_by: set[TrieEntry] = trie.prefixed_by('xy')
print(f'prefixed_by = {prefixed_by}')
prefixes: set[TrieEntry] = trie.prefixes('abcdefg')
print(f'prefixes = {prefixes}')
# prefixed_by = {
# TrieEntry(ident=6, key='xy', value='value6'),
# TrieEntry(ident=5, key='xyz', value=['lists','are', 'also', 'supported'])
# }
#
# prefixes = {
# TrieEntry(ident=3, key='abcd', value='value3'),
# TrieEntry(ident=2, key='abc', value='value2'),
# TrieEntry(ident=1, key='abcdef', value='value1')
# }
# Adding using the add method
# This will throw an error if the key already exists.
more_entries: list[tuple[str | tuple[int, ...], str]] = [
((128, 96, 160, 0), 'value5'),
((128, 90), 'value5b'),
('xy', 'value6'),
]
for key, value in more_entries:
try:
trie.add(key, value)
print(f'Added entry: {key} -> {value}')
except DuplicateKeyError:
print(f'DuplicateKeyError - entry already exists: {key}')
prefixed_by: set[TrieEntry] = trie.prefixed_by([128])
print(f'prefixed_by = {prefixed_by}')
prefixes: set[TrieEntry] = trie.prefixes([128, 90])
print(f'prefixes = {prefixes}')
# prefixed_by = {
# TrieEntry(ident=8, key=(128, 90), value='value5b'),
# TrieEntry(ident=7, key=(128, 96, 160, 0), value='value5')
# }
#
# prefixes = {
# TrieEntry(ident=8, key=(128, 90), value='value5b')
# }
# Updating or adding entries using the update method
# This will update the value if the key already exists,
# or create a new entry if it does not.
# This is the same as using the trie[key] = value syntax.
even_more_entries: list[tuple[str, str]] = [
('abcdefghi', 'value7'),
('abcd', 'value8'),
]
for key, value in even_more_entries:
trie.update(key, value)
prefixed_by = trie.prefixed_by('abcd')
print(f'prefixed_by = {prefixed_by}')
prefixes = trie.prefixes('abcdefg')
print(f'prefixes = {prefixes}')
# prefixed_by = {
# TrieEntry(ident=3, key='abcd', value='value8'),
# TrieEntry(ident=9, key='abcdefghi', value='value7'),
# TrieEntry(ident=1, key='abcdef', value='value1')
# }
#
# prefixes = {
# TrieEntry(ident=2, key='abc', value='value2'),
# TrieEntry(ident=3, key='abcd', value='value8'),
# TrieEntry(ident=1, key='abcdef', value='value1')
# }
Dictionary Syntax Examples¶
#!/usr/bin/env python3
"""This example demonstrates how to use the GeneralizedTrie
to store and retrieve entries using a dictionary-like syntax.
It includes adding, updating, retrieving, and deleting entries,
as well as checking for existence of keys or identifiers."""
from typing import Optional
from gentrie import GeneralizedTrie, TrieEntry, TrieId
trie = GeneralizedTrie()
# Add entries with different key types to demonstrate flexibility
key1: str = 'key1' # String key (tokenized as individual characters)
trie[key1] = 'value1'
key2: tuple[int, int, int] = (1, 2, 3) # Tuple key (tokenized as individual elements)
trie[key2] = 'value2'
key3: tuple[str, str] = ('hello', 'world') # Tuple of strings (tokenized as individual elements)
trie[key3] = 'value3'
# Retrieve the value for the key 'key1' (the returned value
# from the dictionary is a TrieEntry object - the actual value is in the
# 'value' attribute)
value1 = trie[key1].value
id1: TrieId = trie[key1].ident
print(f'ID for {key1}: {id1}')
print(f'Value for {key1}: {value1}')
# Retrieve the value for key2
value2 = trie[key2].value
print(f'Value for {key2}: {value2}')
# Retrieve the value for key3
value3 = trie[key3].value
print(f'Value for {key3}: {value3}')
# Check if the key key1 exists in the trie
exists_key1 = key1 in trie
print(f'Key {key1} exists: {exists_key1}')
# Check if ident id1 exists in the trie
# This checks if the identifier for key1 exists in the trie.
# This is useful for accessing entries by their identifiers.
#
# The idents and the keys are diffrent things.
# The idents are unique identifiers for the entries, while the keys
# are the actual keys used to access the entries.
# Either can be used to access the entries in the trie,
# and both can be used to check for existence of entries,
# but they are not interchangeable. Ids are faster (*O(1)*) than
# keys (*O(n)* to the length of the keys) for accessing entries.
exists_id1 = id1 in trie
print(f'Ident {id1} exists: {exists_id1}')
# Check if the key2 exists in the trie
exists_key2 = key2 in trie
print(f'Key {key2} exists: {exists_key2}')
# Check if key3 exists in the trie
exists_key3 = key3 in trie
print(f'Key {key3} exists: {exists_key3}')
# Update the value for the key key1
trie[key1] = 'new_value1'
updated_value1 = trie[key1].value
print(f'Updated value for {key1}: {updated_value1}')
# delete the entry for the key2
del trie[key2]
exists_key2_after_delete = key2 in trie
print(f'Key {key2} exists after deletion: {exists_key2_after_delete}')
print("\nDifferent approaches to handle missing keys:")
none_key: str = "none_value_key"
trie[none_key] = None
print(f"Added key '{none_key}' (as characters) with None value for demonstration")
# 1. Exception-based approach (most reliable)
try:
deleted_entry = trie[key2]
print(f'1. Exception approach - Value for {key2}: {deleted_entry.value}')
except KeyError as e:
print(f'1. Exception approach - Key {key2} not found: {e}')
# 2. get() approach - Clear and unambiguous:
# When a key is not found, it returns None.
# When a key exists, it returns the TrieEntry object.
# This makes it easy to distinguish between missing keys and existing None values.
print("\n2. get() approach:")
safe_entry: Optional[TrieEntry] = trie.get(key2) # Returns None for missing key
none_entry: Optional[TrieEntry] = trie.get(none_key) # Returns TrieEntry with None value
print(f'Missing key result: {safe_entry}')
print(f'Existing None value result: {none_entry}')
# Demonstrate the clear distinction:
print(f'Missing key is None: {safe_entry is None}')
print(f'Existing entry is TrieEntry: {isinstance(none_entry, TrieEntry)}')
# 3. Membership test first
print("\n3. Membership test approach:")
if key2 in trie:
print(f'Check first approach - Value: {trie[key2].value}')
else:
print(f'Check first approach - Key {key2} not found')
if none_key in trie:
print(f'Check first approach - Value for None key: {trie[none_key].value}')
else:
print('Check first approach - None key not found')
print("\nWhen to use each approach:")
print("1. Exception approach: When you expect the key to exist")
print("2. get() approach: When key might not exist (recommended)")
print("3. Membership test: When you only need to check existence")
# Clean up the demonstration entry
del trie[none_key]
# Show remaining entries
print("Remaining entries in trie:")
for entry in trie.values():
print(f" {entry.key} -> {entry.value}")
# Delete the key1 entry using the ident instead of the key
print(f"\nDeleting entry with ident {id1} (for key '{key1}')")
del trie[id1]
# Show remaining entry values after deletion
print(f"\nRemaining entry values after deleting entry with ident {id1}:")
for entry in trie.values():
print(f" {entry.key} -> {entry.value}")
# Demonstrate additional dictionary-like operations
print("\nAdditional dictionary-like operations:")
# Add a new entry
trie['new_key'] = 'new_value'
print("Added new entry: 'new_key' -> 'new_value'")
# Additional dictionary-like operations
print("\nTrie statistics:")
print(f"Number of entries: {len(trie)}")
# Iterate over idents. Keys returns the identifiers of the entries
# which can be used to access the entries directly.
print("All idents for entries in trie:")
for ident in trie.keys():
print(f" {ident}")
# Iterate over key-value pairs
print("All ident-value pairs:")
for ident, entry in trie.items():
print(f" {ident}: {entry.value}")
# Output:
# ID for key1: 1
# Value for key1: value1
# Value for (1, 2, 3): value2
# Value for ('hello', 'world'): value3
# Key key1 exists: True
# Key (1, 2, 3) exists: True
# Key ('hello', 'world') exists: True
# Updated value for key1: new_value1
# Key (1, 2, 3) exists after deletion: False
#
# Different approaches to handle missing keys:
# Added key 'none_value_key' (as characters) with None value for demonstration
# 1. Exception approach - Key (1, 2, 3) not found: '[GTGI001] key does not match any idents or keys in the trie'
#
# 2. get() approach:
# Missing key result: None
# Existing None value result: TrieEntry(ident=4, key='none_value_key', value=None)
# Missing key is None: True
# Existing entry is TrieEntry: True
#
# 3. Membership test approach:
# Check first approach - Key (1, 2, 3) not found
# Check first approach - Value for None key: None
#
# When to use each approach:
# 1. Exception approach: When you expect the key to exist
# 2. get() approach: When key might not exist (recommended)
# 3. Membership test: When you only need to check existence
# Remaining entries in trie:
# key1 -> new_value1
# ('hello', 'world') -> value3
#
# Deleting entry with ident 1 (for key 'key1')
#
# Remaining entry values after deleting entry with ident 1:
# ('hello', 'world') -> value3
#
# Additional dictionary-like operations:
# Added new entry: 'new_key' -> 'new_value'
#
# Trie statistics:
# Number of entries: 2
# All idents for entries in trie:
# 3
# 5
# All ident-value pairs:
# 3: value3
# 5: new_value
By Letter¶
#!/usr/bin/env python3
"""Example of using a GeneralizedTrie for 'by letter' string indexing.
"""
from gentrie import GeneralizedTrie, TrieEntry
trie = GeneralizedTrie()
entries: list[str] = [
'abcdef',
'abc',
'abcd',
'qrf',
]
for item in entries:
trie.add(item)
prefixed_by: set[TrieEntry] = trie.prefixed_by('abcd')
print(f'prefixed_by = {prefixed_by}')
prefixes: set[TrieEntry] = trie.prefixes('abcdefg')
print(f'prefixes = {prefixes}')
# prefixed_by = {
# TrieEntry(ident=TrieId(1), key='abcdef', value=None),
# TrieEntry(ident=TrieId(3), key='abcd', value=None)}
# prefixes = {
# TrieEntry(ident=TrieId(2), key='abc', value=None),
# TrieEntry(ident=TrieId(3), key='abcd', value=None),
# TrieEntry(ident=TrieId(1), key='abcdef', value=None)}
By Tuple¶
#!/usr/bin/env python3
"""Example of using a GeneralizedTrie for indexing sequences of tuples"""
from gentrie import GeneralizedTrie, TrieEntry
trie = GeneralizedTrie()
entries = [
[(1, 2), (3, 4), (5, 6)],
[(1, 2), (3, 4)],
[(5, 6), (7, 8)],
]
for item in entries:
trie.add(item)
prefixed_by: set[TrieEntry] = trie.prefixed_by([(1, 2)])
print(f'prefixed_by = {prefixed_by}')
prefixes: set[TrieEntry] = trie.prefixes([(1, 2), (3, 4), (5, 6), (7, 8)])
print(f'prefixes = {prefixes}')
# prefixed_by = {
# TrieEntry(ident=TrieId(1), key=[(1, 2), (3, 4), (5, 6)], value=None),
# TrieEntry(ident=TrieId(2), key=[(1, 2), (3, 4)], value=None)
# }
# prefixes = {
# TrieEntry(ident=TrieId(1), key=[(1, 2), (3, 4), (5, 6)], value=None),
# TrieEntry(ident=TrieId(2), key=[(1, 2), (3, 4)], value=None)
# }
By Word¶
#!/usr/bin/env python3
"""Example of using a GeneralizedTrie for indexing sequences of words
"""
from gentrie import GeneralizedTrie, TrieEntry
trie = GeneralizedTrie()
entries: list[list[str]] = [
['ape', 'green', 'apple'],
['ape', 'green'],
['ape', 'green', 'pineapple'],
]
for item in entries:
trie.add(item)
prefixes: set[TrieEntry] = trie.prefixes(['ape', 'green', 'apple'])
print(f'prefixes = {prefixes}')
prefixed_by: set[TrieEntry] = trie.prefixed_by(['ape', 'green'])
print(f'prefixed_by = {prefixed_by}')
# prefixes = {
# TrieEntry(ident=TrieId(1), key=['ape', 'green', 'apple'], value=None),
# TrieEntry(ident=TrieId(2), key=['ape', 'green'], value=None)
# }
# prefixed_by = {
# TrieEntry(ident=TrieId(1), key=['ape', 'green', 'apple'], value=None),
# TrieEntry(ident=TrieId(2), key=['ape', 'green'], value=None),
# TrieEntry(ident=TrieId(3), key=['ape', 'green', 'pineapple'], value=None)
# }
Key In Trie¶
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
This script demonstrates how to create a trie, add entries, and check for their existence
using both keys and TrieIds.
"""
from gentrie import GeneralizedTrie, TrieId
trie = GeneralizedTrie()
entries: list[str] = [
'abcdef',
'abc',
'abcd',
'qrf',
]
# Add entries to the trie and save their TrieIds
entry_ids: list[TrieId] = []
for item in entries:
ident: TrieId = trie.add(item)
print(f'Added {item} with {ident}')
entry_ids.append(ident)
if 'abc' in trie:
print('abc is in trie')
else:
print('error: abc is not in trie')
if 'abcde' not in trie:
print('abcde is not in trie')
else:
print('error: abcde is in trie')
if 'qrf' not in trie:
print('error: qrf is not in trie')
else:
print('qrf is in trie')
if 'abcdef' not in trie:
print('error: abcdef is not in trie')
else:
print('abcdef is in trie')
if entry_ids[0] not in trie:
print(f'error: {entry_ids[0]} is not in trie')
else:
print(f'{entry_ids[0]} is in trie')
# abc is in trie
# abcde is not in trie
# qrf is in trie
# abcdef is in trie
# TrieId(1) is in trie
Prefixes¶
#!/usr/bin/env python3
from gentrie import GeneralizedTrie, TrieEntry
trie: GeneralizedTrie = GeneralizedTrie()
keys: list[str] = ['abcdef', 'abc', 'a', 'abcd', 'qrs']
for entry in keys:
trie.add(entry)
matches: set[TrieEntry] = trie.prefixes('abcd')
for trie_entry in sorted(list(matches)):
print(f'{trie_entry.ident}: {trie_entry.key}')
# 2: abc
# 3: a
# 4: abcd
Prefixed By¶
#!/usr/bin/env python3
from gentrie import GeneralizedTrie, TrieEntry
trie = GeneralizedTrie()
keys: list[str] = ['abcdef', 'abc', 'a', 'abcd', 'qrs']
for entry in keys:
trie.add(entry)
matches: set[TrieEntry] = trie.prefixed_by('abcd')
for trie_entry in sorted(list(matches)):
print(f'{trie_entry.ident}: {trie_entry.key}')
# 1: abcdef
# 4: abcd
URL Suffixes¶
#!/usr/bin/env python3
"""Example of using a GeneralizedTrie for indexing website URLs by path"""
from gentrie import GeneralizedTrie, TrieEntry
# Create a trie to store website URLs
url_trie = GeneralizedTrie()
# Add some URLs with different components (protocol, domain, path)
url_trie.add(["https", ":", "//", "com", "example", "www", "/", "products", "clothing"],
"https://www.example.com/products/clothing")
url_trie.add(["http", ":", "//", "org", "/", "example", "blog", "/", "2023", "10", "best-laptops"],
"http://example.org/blog/2023/10/best-laptops") # DevSkim: ignore DS137138
url_trie.add(["ftp", ":", "//", "net", "example", "/", "ftp", "/", "data", "images"],
"ftp://example.net/ftp/data/images")
# Find https URLs with "example.com" domain or sub-domain
print("HTTPS URLs for domains or sub-domains of 'example.com'")
prefixed_by: set[TrieEntry] = url_trie.prefixed_by(["https", ":", "//", "com", "example"])
for entry in prefixed_by:
print(f"Found URL: {entry.value}")
# Find ftp protocol URLs
print("FTP URLs")
prefixed_by = url_trie.prefixed_by(["ftp"])
for entry in prefixed_by:
print(f"Found URL: {entry.value}")
# HTTPS URLs for domains or sub-domains of 'example.com'
# Found URL: https://www.example.com/products/clothing
# FTP URLs
# Found URL: ftp://example.net/ftp/data/images
Word Suggestions¶
#!/usr/bin/env python3
"""Example of using a GeneralizedTrie for word suggestions based on prefixes.
"""
from gentrie import GeneralizedTrie, TrieEntry
trie = GeneralizedTrie()
entries: list[str] = [
'hell',
'hello',
'help',
'dog',
'doll',
'dolly',
'dolphin',
'do'
]
for item in entries:
trie.add(item)
suggestions: set[TrieEntry] = trie.prefixed_by('do', depth=2)
print(f'+2 letter suggestions for "do" = {suggestions}')
suggestions = trie.prefixed_by('do', depth=3)
print(f'+3 letter suggestions for "do" = {suggestions}')
# +2 letter suggestions for "do" = {
# TrieEntry(ident=TrieId(5), key='doll', value=None),
# TrieEntry(ident=TrieId(4), key='dog', value=None),
# TrieEntry(ident=TrieId(8), key='do', value=None)
# }
# +3 letter suggestions for "do" = {
# TrieEntry(ident=TrieId(5), key='doll', value=None),
# TrieEntry(ident=TrieId(6), key='dolly', value=None),
# TrieEntry(ident=TrieId(4), key='dog', value=None),
# TrieEntry(ident=TrieId(8), key='do', value=None)
# }
Trie of numeric vectors¶
#!/usr/bin/env python3
"""Example of using a GeneralizedTrie for 'by number' indexing
of numeric sequences/vectors.
"""
from gentrie import GeneralizedTrie, TrieEntry
trie = GeneralizedTrie()
entries = [
[128, 256, 512],
[128, 256],
[512, 1024],
]
for item in entries:
trie.add(item)
prefixed_by: set[TrieEntry] = trie.prefixed_by([128])
print(f'prefixed_by = {prefixed_by}')
prefixes: set[TrieEntry] = trie.prefixes([128, 256, 512, 1024])
print(f'prefixes = {prefixes}')
# prefixed_by = {
# TrieEntry(ident=TrieId(1), key=[128, 256, 512], value=None),
# TrieEntry(ident=TrieId(2), key=[128, 256], value=None)
# }
# prefixes = {
# TrieEntry(ident=TrieId(1), key=[128, 256, 512], value=None),
# TrieEntry(ident=TrieId(2), key=[128, 256], value=None)
# }