====================
Benchmarks
====================

.. toctree::


These are benchmark results for the `gen_tries` library, focusing on
various operations such as building tries, checking key existence,
prefix matching, and updating tries.


--------
Platform
--------

.. index:: platform

Hardware
--------

Mac Studio, Apple M2 Ultra, 24-Core CPU (approx 3.5 GHz), 128GB RAM 

Python Version
--------------

Python 3.13.7

Benchmark Configuration
-----------------------

- **gen-tries Version**: 0.9.4
- **Single Threaded**
- **Symbol Set**: '0123'
- **Key Generation**: All possible combinations of symbols up to a certain length, starting with all '0's and incrementing the last symbol until all combinations were generated.
- **Trie Depths**: 2 to 9
- **Runtime Validation**: Enabled and Disabled
- **Benchmarking Tool**: pytest-benchmark
- **Benchmarking Code**: bench/benchmark_performance.py


Running the Benchmarks
----------------------

To run all benchmarks with pytest-benchmark:

.. code-block:: shell

    cd bench
    pytest bench_performance.py \
        --benchmark-sort=name \
        --benchmark-group-by=func \
        --benchmark-histogram=histogram/benchmark \
        --benchmark-time-unit=ns

Benchmark Configuration
-----------------------

Benchmarks are configured with:

- **min_rounds**: 100
- **min_time**: 1s
- **max_time**: 10s
- **timer**: perf_counter_ns

See the source for details.

What is Measured
----------------

- **Add**: Time to add keys to the trie
- **Update**: Time to update existing keys
- **Lookup**: Time to check key presence (hit/miss)
- **Remove**: Time to remove keys
- **Traversal**: Time for `prefixes()` and `prefixed_by()`

Interpreting Results
--------------------

- **ops**: Operations per second (higher is better)
- **Scaling**: As trie depth increases, expect some slowdown; large jumps may indicate inefficiency.


Advanced Usage
--------------

- Compare runs: ``pytest ... --benchmark-compare=previous.json``
- Export results: ``pytest ... --benchmark-json=results.json``
- View histograms: ``pytest ... --benchmark-histogram=histogram/benchmark``

Discussion
----------

The benchmarks are designed to evaluate performance across different trie
depths and configurations.

Comparison of runtime validation settings reveals that disabling runtime validation
can lead to small performance improvements in some scenarios, particularly for deeper
tries. However, the performance gains are often minimal and may not justify the trade-offs
in terms of safety and correctness. Runtime validation helps catch errors and ensures
correctness by validating the structure and content of the trie during operations.

Turning it off may be appropriate in extremely performance-critical applications where the data is
well-defined and unlikely to change, allowing for optimizations that bypass some of the safety checks.

**HOWEVER**, it is crucial to thoroughly test and validate the trie implementation in such
scenarios to avoid potential issues. Absolutely NO assumptions should be made about
the data or its structure. The behavior of the trie with invalid data is undefined and may lead to
catastrophic failures.

In Summary: If you choose to disable runtime validation, do so with extreme caution and only after
thorough testing. The performance increase is relatively small, and the risks may outweigh the benefits.

For the tests, a symbol set of '0123' was used and keys were generated by creating
all possible combinations of these symbols up to a certain length starting
with all '0's and incrementing the last symbol until all combinations were generated.

So for a trie depth of 3, the keys would be '000', '001', '002', ..., '333'. Thus
covering all possible keys of length 3. This approach ensures a comprehensive
evaluation of the trie performance across different key patterns and depths and has a
branching factor (4) that allows for efficient exploration of the search space to
a substantial depth (9).

Depth 1 is generally omitted from the benchmarks due to its limited key space and minimal impact on performance.

The tests themselves are 'micro-benchmarks' designed to measure specific operations in isolation.
They measure the performance of the **SPECIFIC** call being tested and do not include the overhead taken
by other operations or the overall process.

This means that it measures the performance of something like 'trie.add(key)' or
'key in trie' in isolation, not including any overhead from say a containing 'for' loop or
other operations.

It is measuring the performance of code that looks like this:

.. code-block:: python
    :linenos:

    from gentrie import GeneralizedTrie

    trie = GeneralizedTrie()
    trie['example_key'] = 'example_value'
    trie['different_key'] = 'different_value'
    trie['yet_another_key'] = 'yet_another_value'


NOT code that looks like

.. code-block:: python
    :linenos:

    from gentrie import GeneralizedTrie

    trie = GeneralizedTrie()
    data = [('example_key', 'example_value'),
            ('different_key', 'different_value'),
            ('yet_another_key', 'yet_another_value')]
    for key, value in data:
        trie[key] = value

The reason for this is that the overhead of the loop and other operations are implementation
details that cannot be easily separated from the operation being measured, and so would skew
the results by including unrelated overhead from code NOT being targeted for measurement. 

The `for` loop in the above example would add significant overhead that would skew the results for
the specific operation being measured. The hash assignment (`__setitem__`) method in this case.

As such, the benchmarks focus on specific operations in isolation to provide a clearer picture of
performance BUT tend to over-estimate real-world performance due to the extremely controlled
nature of the tests (which are designed to eliminate as many variables as possible).

In practice, the performance characteristics of a trie may vary significantly based on the
specific use case, data distribution, and access patterns. As such, while synthetic benchmarks
provide valuable insights, they should be interpreted with caution and supplemented with
real-world testing whenever possible.

--------------------
Synthetic Benchmarks
--------------------

The following benchmarks were conducted using synthetically generated keys.

Synthetic keys were created by generating all possible combinations of the symbols
'0123' for a specific depth (length) of keys. This approach ensures a comprehensive
evaluation of the trie performance across different key patterns and depths - and has a
branching factor (4) that allows for efficient exploration of the search space to
a substantial depth (9).

This allows for a thorough examination of the trie’s performance characteristics
across various scenarios.

However, it is important to note that synthetic benchmarks may not fully capture
real-world performance characteristics, as they do not account for factors such as
data distribution, key collisions, and other complexities that may arise in
practical applications. It is designed to provide a controlled environment for
evaluating the trie implementation's baseline performance characteristics that is not
influenced by quirks of real-world data patterns.

Building trie using add()
--------------------------------
Table shows thousands of operations per second (Kops/sec) for different trie depths and
runtime key validation settings while building the trie using the add() method
using strings as keys. Trie Depth is equivalent to the number of symbols in each key
being added to the trie.

The operations being measured are the individual add() method calls adding keys to the trie.

=============  ======================  ==========
Trie Depth       Runtime Validation    Kops / sec
=============  ======================  ==========
2              False                   266.7
2              True                    266.7
3              False                   609.5
3              True                    587.2
4              False                   612.4
4              True                    587.2
5              False                   644.8
5              True                    612.4
6              False                   634.4
6              True                    601.8
7              False                   618.5
7              True                    591.6
8              False                   622.9
8              True                    600.3
9              False                   602.6
9              True                    584.5
=============  ======================  ==========

.. image:: _static/benchmarks/build_trie_with_add.png
    :align: center

Building trie using update()
-----------------------------------

This is the same as above but using the update() method to build the trie instead of add()

=============  ======================  ==========
Trie Depth       Runtime Validation    Kops / sec
=============  ======================  ==========
2              False                   400.0
2              True                    444.4
3              False                   547.0
3              True                    561.4
4              False                   651.4
4              True                    635.2
5              False                   592.6
5              True                    635.2
6              False                   599.2
6              True                    600.5
7              False                   602.3
7              True                    585.9
8              False                   608.8
8              True                    588.8
9              False                   598.1
9              True                    573.8
=============  ======================  ==========

.. image:: _static/benchmarks/build_trie_with_update.png
    :align: center

key in trie
-----------

=============  ======================  ==========
Key Depth       Runtime Validation     Kops / sec
=============  ======================  ==========
2              False                   4327.1
2              True                    3388.7
3              False                   3617.9
3              True                    2959.5
4              False                   3167.6
4              True                    2657.5
5              False                   2834.5
5              True                    2399.2
6              False                   2555.6
6              True                    2193.9
7              False                   2344.7
7              True                    2045.4
8              False                   2156.1
8              True                    1736.1
9              False                   1946.3
9              True                    1690.6
=============  ======================  ==========


.. image:: _static/benchmarks/key_in_trie.png
    :align: center

key not in trie
--------------- 

=============  ======================  ==========
Key Depth      Runtime Validation      Kops / sec
=============  ======================  ==========
2              False                   5243.8
2              True                    3636.4
3              False                   4024.1
3              True                    3200.0
4              False                   3403.7
4              True                    2849.0
5              False                   3027.6
5              True                    2548.4
6              False                   2642.7
6              True                    2325.0
7              False                   2403.3
7              True                    2143.2
8              False                   2268.6
8              True                    1992.0
9              False                   2072.1
9              True                    1806.7
=============  ======================  ==========

.. image:: _static/benchmarks/key_not_in_trie.png
    :align: center

get()
-----

The get() operation retrieves the value (the TrieEntry)associated with a key in the trie.

This benchmark measures the performance of the get() operation for existing keys
at various depths in the trie, with and without runtime validation enabled.

=============  ======================  ==========
Key Depth      Runtime Validation      Kops / sec
=============  ======================  ==========
2              False                   1460.3
2              True                    1460.3
3              False                   2103.8
3              True                    1852.3
4              False                   2299.3
4              True                    1992.7
5              False                   2151.9
5              True                    1929.5
6              False                   1987.7
6              True                    1778.5
7              False                   1881.2
7              True                    1667.6
8              False                   1698.2
8              True                    1552.3
9              False                   1640.2
9              True                    1495.8
=============  ======================  ==========

.. image:: _static/benchmarks/get_kops_per_second_by_key_depth_and_runtime_validation.png
    :align: center

update()
--------

This benchmark measures the performance of updating values for existing keys in the trie.

The results show the number of values updated per second (Kops) for different key depths
and runtime validation settings.

=============  ======================  ==========
Key Depth       Runtime Validation    Kops / sec
=============  ======================  ==========
2              False                   2106.6
2              True                    1845.0
3              False                   1862.2
3              True                    1683.5
4              False                   1703.6
4              True                    1565.4
5              False                   1633.9
5              True                    1463.0
6              False                   1512.5
6              True                    1396.9
7              False                   1385.5
7              True                    1319.3
8              False                   1371.1
8              True                    1271.2
9              False                   1306.6
9              True                    1205.8
=============  ======================  ==========

.. image:: _static/benchmarks/update_kops_per_second_by_key_depth_and_runtime_validation.png
    :align: center

remove()
--------

This benchmark is designed as a 'semi-worst-case' scenario for the remove() operation.

By generating only keys at the maximum depth and then removing them,
we can observe the performance impact on the trie structure as intermediate
nodes are frequently removed all the way to the root node (requiring the most work).

A true worst case scenario for the remove() operation would involve a degenerate trie,
where all keys share the same prefix, effectively behaving like a linked list. While this
is not the primary focus of the benchmark, it is worth noting that such a structure would
exhibit significantly different performance characteristics than typical trie use cases.

In practice, the performance impact of the remove() operation is usually mitigated by the
structure of the trie and the distribution of keys within it. A degenerate trie is an edge case and
is not representative of typical usage patterns (and probably indicative of a poorly chosen data structure
for the task at hand).

=============  ======================  ==========
Key Depth      Runtime Validation      Kops / sec
=============  ======================  ==========
2               False                  533.3
2               True                   457.1
3               False                  820.5
3               True                   633.7
4               False                  870.7
4               True                   795.0
5               False                  819.2
5               True                   782.3
6               False                  836.8
6               True                   751.1
7               False                  807.7
7               True                   727.4
8               False                  780.4
8               True                   709.3
9               False                  749.9
9               True                   685.8
=============  ======================  ==========

.. image:: _static/benchmarks/remove_kops_per_second_vs_key_depth_and_runtime_validation.png
    :align: center

prefixed_by()
-------------

Performance of the `prefixed_by()` method is more complex due to the nature of prefix matching.

Rather than a simple hit/miss scenario, `prefixed_by()` can return multiple, potentially
thousands or more (!), matches depending on the prefix length, the search depth, and the
number of keys present in the trie.

For this test, a completely filled trie 7 levels deep is used using the same keys as before. Matches are done
starting from key depths of 2 to 4 symbols to a search depth of 1 to 3 symbols from that starting point.

A search for the prefix "01" (a key depth of 2) with a search depth of 1 would match the keys
`"01"`, `"010"`, `"011"`, `"012"` and `"013"` (5 keys).

The number of matches increases exponentially with the search depth in the completely filled trie.
In this case, a search depth of 2 would match 21 keys (the key itself, 4 at the first level below it,
and 16 more at the second level). A search depth of 3 would match 85 keys (1 + 4 + 16 + 64).

At lower levels of a trie, the number of matches can be smaller due to there not being as many keys
because of reaching the maximum depth of the trie, so the performance impact of returning multiple
matches is less pronounced (it effectively limits the search depth). In this benchmark, we made sure
that the trie is fully populated at all levels up to the maximum depth being searched (4 + 3) to avoid
that issue.

As can be seen, while the performance impact of looking up keys is slightly more pronounced at higher
key depths, the performance impact of returning multiple matches is *far* more significant as the search
depth increases, as the number of potential matches grows exponentially.

Turning off runtime validation has a noticeable positive effect on performance, especially at higher search
depths, but is not as significant as the impact of returning multiple matches.

Note that the performance numbers here are for returning all matches as a list,
rather than stopping at the first match found. This is done to provide a more comprehensive
view of the performance characteristics of the `prefixed_by()` method.

Note that although, for example, a search depth of 3 from a key depth of 4 in a trie of depth 7
has a Kops/sec of 77.8 (Runtime Validation == False) or 85.5 (Runtime Validation == True),
it is returning 85 matches each time and so the number of prefixed keys matched per second
is actually 85 times higher (6.6 Mkeys/sec or 7.3 Mkeys/sec respectively).

Real world performance will vary based on the specific use case, data distribution, and other factors.

============  =========  ==========  ================  ==================  ==========
Search Depth  Key Depth  Trie Depth  Returned Matches  Runtime Validation  Kops / sec
============  =========  ==========  ================  ==================  ==========
1             2          7           5                 False               902.5
1             2          7           5                 True                825.8
1             3          7           5                 False               883.4
1             3          7           5                 True                824.4
1             4          7           5                 False               873.4
1             4          7           5                 True                834.0
2             2          7           21                False               308.5
2             2          7           21                True                303.1
2             3          7           21                False               316.8
2             3          7           21                True                304.2
2             4          7           21                False               307.3
2             4          7           21                True                300.6
3             2          7           85                False               86.1
3             2          7           85                True                84.9
3             3          7           85                False               85.8
3             3          7           85                True                83.4
3             4          7           85                False               77.8
3             4          7           85                True                85.5
============  =========  ==========  ================  ==================  ==========

.. image:: _static/benchmarks/prefixed_by_average_kops_per_second_by_runtime_validation.png
    :align: center

.. image:: _static/benchmarks/prefixed_by_search_depth_vs_matched_keys_per_second.png
    :align: center

prefixes()
----------

prefixes() performance is generally better than prefixed_by() as it stops searching as soon as it finds a match
and only returns the match and its parent keys. As the number of matches is limited to a maximum of the key depth,
and key depth generally grows logarithmically (for non-degenerate tries), the performance impact is *far*
less pronounced at greater key depths.

A fully populated trie with a fanout of 4 and a depth 9 contains 349524 nodes - but prefixes() will never
return more than 9 matches for it. With a fanout of 32, the number of nodes in a fully populated 9 level trie
increases exponentially (to more than 36 trillion keys), but the maximum number of prefixes() matches
returned remains unchanged at 9.

The exception here is degenerate tries (tries where all keys share the same prefix). In these cases, the number of
returned prefixes() matches can be much larger, as the trie may effectively behave like a linked list.

=========  ================  ==================  ==========
Key Depth  Returned Matches  Runtime Validation  Kops / sec
=========  ================  ==================  ==========
3          3                 False               1927.9
3          3                 True                1693.5
4          4                 False               1722.7
4          4                 True                1545.5
5          5                 False               1503.1
5          5                 True                1379.9
6          6                 False               1248.9
6          6                 True                1226.5
7          7                 False               1231.5
7          7                 True                1135.4
8          8                 False               1098.0
8          8                 True                993.4
9          9                 False               966.9
9          9                 True                947.1
=========  ================  ==================  ==========

.. image:: _static/benchmarks/prefixes_kops_per_second_by_key_depth_and_runtime_validation.png
    :align: center

------------------
Organic Benchmarks
------------------

Organic benchmarks use real-world data to evaluate trie performance in practical scenarios.

For example, measuring the time it takes create a trie populated
by a large dataset of English words from a list using a `for` loop.

As such, they provide a more realistic assessment of trie performance in typical use cases.

Creating a Trie Of English Words
----------------------------------------------------

The following benchmarks were conducted on a trie populated by a list of 274137 English words.

It is functionally equivalent to benchmarking the performance of this code:

.. code-block:: python
    :linenos:

    from typing import Sequence

    from gentrie import GeneralizedTrie

    def create_dictionary(words: Sequence[str], runtime_validation: bool) -> GeneralizedTrie:
        trie = GeneralizedTrie(runtime_validation=runtime_validation)
        for word in words:
            trie.update(word)
        return trie

As can be seen, the performance impact of runtime validation is minimal in this scenario
- it has less than a 1% impact on net performance.

==========  ==================  ================
Word Count  Runtime Validation  KWords Per Second
=========  ===================  ================
274137     False                234.7
274137     True                 236.4
=========  ===================  ================

Micro-benchmark of the update() method for the same real-world dataset
This shows performance of *just the update() calls*
rather than benchmarking the entire process.

The loop in the organic benchmark adds 

==========  ==================  ================
Word Count  Runtime Validation  KWords Per Second
=========  ===================  ================
274137     False                576.6
274137     True                 566.8
=========  ===================  ================
