fm_index
An FM-index for efficient full-text search on a single string.
The FM-index is a compressed text index based on the Burrows–Wheeler Transform (BWT).
It supports fast substring queries whose runtime depends only on the pattern length,
not on the size of the indexed text.
Internally, several independent stages of index construction and query processing
are optimized using data-parallel execution.
Construction
Time / Space Complexity
- Time:
O(N log σ) - Space:
O(N log σ)
where:
N= length of the indexed stringσ= size of the alphabet (2⁸ for UCS-1, 2¹⁶ for UCS-2, etc.)
from fm_index import FMIndex
fm = FMIndex("mississippi")
Convert the FMIndex back into the original string.
Complexity
- Time:
O(N log σ) - Space:
O(N)
Examples
fm.item()
# 'mississippi'
Check whether the indexed string contains the given pattern.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
fm.contains("issi")
# True
Count how many times a pattern appears in the indexed string.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
fm.count("issi")
# 2
Locate all starting positions of the pattern in the indexed string.
This operation may internally leverage parallel execution to efficiently
enumerate large result sets.
⚠️ Order of returned positions is not guaranteed.
Complexity
- Time:
O((|pattern| + |count|) log σ) - Space:
O(|pattern| + |count|)
Examples
fm.locate("issi")
# [4, 1]
Lazily locate all starting positions of the pattern in the indexed string.
This method yields the same positions as locate, but returns them
one by one as an iterator instead of allocating a list.
⚠️ Order of yielded positions is not guaranteed.
Complexity
- Time:
O(|pattern| log σ)for initialization,O(log σ)per yielded position - Space:
O(|pattern|)
Examples
iter = fm.iter_locate("issi")
next(iter)
# 4
next(iter)
# 1
A multi-document FM-index for fast substring search across multiple strings.
Internally, all strings are concatenated with separators and indexed as a single FM-index,
while preserving the ability to map matches back to their original documents.
Query processing across documents is internally parallelized where applicable,
making multi-document search efficient in practice.
Construction
Time / Space Complexity
- Time:
O(S log σ) - Space:
O(S log σ)
where:
S= total length of all indexed stringsσ= size of the alphabet (2⁸ for UCS-1, 2¹⁶ for UCS-2, etc.)
from fm_index import MultiFMIndex
mfm = MultiFMIndex(["abcabcabcabc", "xxabcabcxxabc", "abcababcabc"])
Convert the index back into the original list of strings.
Complexity
- Time:
O(S log σ) - Space:
O(S)
Examples
mfm.item()
# ['abcabcabcabc', 'xxabcabcxxabc', 'abcababcabc']
Check if the pattern exists as a full document in the index.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
mfm.contains("abcabcabcabc")
# True
Count total occurrences of a pattern across all documents.
Complexity
- Time:
O(|pattern| log σ) - Space:
O(|pattern|)
Examples
mfm.count_all("abc")
# 10
Count occurrences per document.
Returns {doc_index: count}.
Complexity
- Time:
O((|pattern| + |total_count|) log σ) - Space:
O(|pattern| + |output|)
Examples
mfm.count("abc")
# {0: 4, 1: 3, 2: 3}
Locate occurrences per document.
Internally, result enumeration and aggregation may be parallelized.
⚠️ Order is not guaranteed.
Complexity
- Time:
O((|pattern| + |total_count|) log σ) - Space:
O(|pattern| + |total_count|)
Examples
mfm.locate("abc")
# {0: [9, 6, 3, 0], 1: [10, 2, 5], 2: [8, 0, 5]}
Lazily locate all occurrences of the pattern across documents.
Yields (doc_index, position) pairs without constructing
an intermediate result dictionary.
⚠️ Order of yielded results is not guaranteed.
Complexity
- Time:
O(|pattern| log σ)to initialize, thenO(log σ)per yielded occurrence. - Space:
O(|pattern|)
Examples
iter = mfm.iter_locate("abc")
next(iter)
# (2, 8)
next(iter)
# (1, 10)
...