{% if text_output %}
{{ text_output }}
{% endif %}

Analyses Included in Comparison

{% for result in comparison.results %}
{{ result.name }}

{{ result.themes|length }} themes

{% for t in result.themes %}
  • {{ t.name }}
{% endfor %}
{% endfor %}

Embedding Details

The actual strings that were embedded for similarity comparison. Labels are used in plots; embedded strings are used for calculating similarity.

{% for result in comparison.results %}
{{ result.name }}
Theme Name | Label (in plots) | Embedded String (for similarity)
{% for item in comparison.embedded_strings.get(result.name, []) %}
{{ item.theme_name }} | {{ item.label }} | {{ item.embedded_string }}
{% endfor %}

{% endfor %}

Theme Network (UMAP Projection)

2-D UMAP projection of theme embeddings from each analysis, shown in different colours. Each point represents a theme; proximity reflects semantic similarity in the original embedding space.

UMAP projection of theme embeddings
Interpreting this plot: UMAP is a non-linear dimensionality-reduction method that prioritises preserving local neighbourhood structure rather than global variance. Nearby points can be interpreted as closely related themes, while larger-scale distances and cluster shapes should be interpreted qualitatively rather than metrically. This plot is intended as an exploratory visualisation of thematic relationships and overlap between sets, not as a quantitative evaluation.

Pairwise Comparisons

Select a pair to view detailed comparison metrics.

{% for key, comp in comparison.by_comparisons().items() %}

{{ comp.a.name }} vs {{ comp.b.name }}

Angular Similarity

Angular distance is the angle between two embedding vectors (the arccos of their cosine similarity), normalised to [0,1]. Unlike cosine distance, it satisfies the triangle inequality, making it a proper metric for averaging and comparison.
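
As a rough illustration of the definition above, the sketch below computes angular distance for two plain Python vectors. It assumes the angle is divided by pi to reach [0,1] and that similarity is reported as 1 minus distance; the normalisation actually used for this report may differ, and the function names are hypothetical.

```python
import math


def angular_distance(u, v):
    """Angle between u and v, divided by pi (assumed normalisation) to lie in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] so floating-point drift cannot break acos
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine) / math.pi


def angular_similarity(u, v):
    """Assumed convention: similarity is 1 minus the normalised angular distance."""
    return 1.0 - angular_distance(u, v)
```

Under these assumptions, identical directions give a distance of 0, orthogonal vectors give 0.5, and opposite vectors give 1.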

Continuous Values
Angular distance heatmap
Binary Match (threshold={{ comparison.config.threshold }})
Thresholded heatmap

Summary Statistics

Thematic analysis doesn't have ground truth, so traditional precision/recall don't apply. Instead, we measure coverage (did themes find matches?) and fidelity (how close are the best matches?). Based on {{ comp.stats.similarity_metric }} similarity.

Coverage (Hit Rates)

Proportion of themes with at least one match above threshold ({{ comparison.config.threshold }})

  • Hit Rate A: {{ "%.1f"|format(comp.stats.hit_rate_a * 100) }}%
  • Hit Rate B: {{ "%.1f"|format(comp.stats.hit_rate_b * 100) }}%
  • Pair Match Rate: {{ "%.3f"|format(comp.stats.jaccard) }} (pairs above threshold / total pairs)

High hit rates indicate both analyses found similar conceptual territory. Pair match rate shows the density of above-threshold pairs across all possible theme combinations.

Fidelity (Match Quality)

How close are the best matches? (Mean of each theme's best match similarity)

  • A→B: {{ "%.3f"|format(comp.stats.mean_max_sim_a_to_b) }}
  • B→A: {{ "%.3f"|format(comp.stats.mean_max_sim_b_to_a) }}
  • Fidelity: {{ "%.3f"|format(comp.stats.fidelity) }}

Fidelity is the harmonic mean of directional scores. Higher = tighter semantic alignment.
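
The coverage and fidelity figures above can be sketched from a similarity matrix as follows. This is a minimal illustration, not the code behind this report: it assumes rows are themes in A, columns are themes in B, and that "above threshold" means strictly greater than the threshold. The function name is hypothetical.

```python
def coverage_and_fidelity(sim, threshold):
    """Hit rates, pair match rate and fidelity from a similarity matrix (rows = A, columns = B)."""
    rows = [list(r) for r in sim]
    cols = [list(c) for c in zip(*rows)]
    best_a = [max(r) for r in rows]  # best match in B for each A theme
    best_b = [max(c) for c in cols]  # best match in A for each B theme

    # Coverage: share of themes with at least one match above the threshold
    hit_rate_a = sum(1 for s in best_a if s > threshold) / len(best_a)
    hit_rate_b = sum(1 for s in best_b if s > threshold) / len(best_b)
    n_above = sum(1 for r in rows for s in r if s > threshold)
    pair_match_rate = n_above / (len(rows) * len(cols))

    # Fidelity: harmonic mean of the two directional mean-of-best-match scores
    a_to_b = sum(best_a) / len(best_a)
    b_to_a = sum(best_b) / len(best_b)
    total = a_to_b + b_to_a
    fidelity = 2 * a_to_b * b_to_a / total if total > 0 else 0.0

    return {
        "hit_rate_a": hit_rate_a,
        "hit_rate_b": hit_rate_b,
        "pair_match_rate": pair_match_rate,
        "fidelity": fidelity,
    }
```

The harmonic mean is pulled towards the weaker direction, so a high fidelity requires good best matches both from A to B and from B to A.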

Similarity Matrix
{{ comp.stats.similarity_matrix }}

Best Matches (1:1)

The Hungarian algorithm finds the optimal one-to-one pairing that maximises total similarity. Each theme maps to at most one theme in the other set -- no reuse allowed.

Intuition: "If I had to explain set B's themes to someone who only knew set A, which single theme in A would each B theme correspond to, with no reuse?"

What this enables: Hungarian matching removes ambiguity by assigning each theme to at most one partner. Coverage metrics show what proportion of each set found a good match.

Limitation: This penalises legitimate theme refinement (splitting one theme into two is treated as unmatched). Use OT if you want to allow many-to-many alignment.
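
To make the one-to-one constraint concrete, the sketch below finds the best assignment by brute force over permutations. This is a stand-in for the Hungarian algorithm, not the algorithm itself: it reaches the same optimum but is only practical for a handful of themes. The function name is hypothetical.

```python
from itertools import permutations


def best_one_to_one(sim):
    """Brute-force the one-to-one pairing with the highest total similarity.

    sim[i][j] is the similarity between theme i in A and theme j in B.
    Returns (pairs, total), where pairs is a list of (i, j) index tuples.
    """
    n_a, n_b = len(sim), len(sim[0])
    if n_a <= n_b:
        # Every A theme gets a distinct B theme
        candidates = (
            list(zip(range(n_a), perm)) for perm in permutations(range(n_b), n_a)
        )
    else:
        # Every B theme gets a distinct A theme
        candidates = (
            list(zip(perm, range(n_b))) for perm in permutations(range(n_a), n_b)
        )
    best = max(candidates, key=lambda pairs: sum(sim[i][j] for i, j in pairs))
    return best, sum(sim[i][j] for i, j in best)
```

For the matrix [[0.9, 0.8], [0.85, 0.1]], taking the single highest cell first (0.9) would force the second pair to 0.1. The optimal assignment instead pairs 0 with 1 and 1 with 0, for a total of 1.65.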
Mean Matched Similarity (primary metric)

{{ "%.3f"|format(comp.stats.hungarian.soft_metrics.soft_precision) }}

Average similarity of optimal pairs

Interpretation: "How good are the best one-to-one correspondences?" Higher = tighter semantic alignment between the two theme sets.

{% if comp.stats.hungarian.distribution.n_pairs > 0 %}

Distribution of {{ comp.stats.hungarian.distribution.n_pairs }} optimal pairs:

  • Median: {{ "%.3f"|format(comp.stats.hungarian.distribution.median) }}   (Q1: {{ "%.3f"|format(comp.stats.hungarian.distribution.q1) }}, Q3: {{ "%.3f"|format(comp.stats.hungarian.distribution.q3) }})
  • Range: {{ "%.3f"|format(comp.stats.hungarian.distribution.min) }} -- {{ "%.3f"|format(comp.stats.hungarian.distribution.max) }}
{% endif %}
Coverage & Set Overlap

Based on {{ comp.stats.hungarian.distribution.n_pairs }} matched pairs above threshold ({{ comparison.config.threshold }})

{% set cov_a = comp.stats.hungarian.thresholded_metrics.recall %} {% set cov_b = comp.stats.hungarian.thresholded_metrics.precision %} {% set mean_cov = (cov_a + cov_b) / 2 %}

{{ "%.0f"|format(cov_a * 100) }}%

Coverage A

(A themes matched)

{{ "%.0f"|format(cov_b * 100) }}%

Coverage B

(B themes matched)

{{ "%.3f"|format(comp.stats.hungarian.thresholded_metrics.true_jaccard) }}

Jaccard Index

(set overlap)


Jaccard Index = matched / (|A| + |B| - matched). Measures overlap between theme sets after 1:1 assignment. Higher = more themes found good partners.
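
The formula above can be checked by hand with a small sketch. It assumes that coverage for each set is the number of matched pairs divided by that set's theme count; the function name is hypothetical.

```python
def overlap_after_matching(n_matched, n_a, n_b):
    """Coverage and Jaccard index after a one-to-one assignment.

    n_matched: matched pairs above threshold; n_a, n_b: themes in A and B.
    """
    union = n_a + n_b - n_matched  # each matched pair is counted once
    return {
        "coverage_a": n_matched / n_a,
        "coverage_b": n_matched / n_b,
        "jaccard": n_matched / union if union > 0 else 0.0,
    }
```

For example, 3 matched pairs between 5 themes in A and 4 themes in B give a Jaccard index of 3 / (5 + 4 - 3) = 0.5.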

Optimal Matched Pairs ({{ comp.stats.hungarian.all_pairs|length }})
{% if comp.stats.hungarian.all_pairs|length > 0 %}

Hungarian algorithm finds the optimal one-to-one assignment.

Theme in {{ comp.a.name }} | Theme in {{ comp.b.name }} | Angular Similarity
{% for i, j, similarity in comp.stats.hungarian.all_pairs %}{% set theme_a = comp.embedded_a[i] %}{% set theme_b = comp.embedded_b[j] %}
{{ theme_a.theme_name }}
{{ theme_a.embedded_string }}
{{ theme_b.theme_name }}
{{ theme_b.embedded_string }}
{{ "%.3f"|format(similarity) }}
{% endfor %}
{% else %}

No optimal pairs found.

{% endif %}

Unbalanced Optimal Transport (Many-to-Many Alignment)

Unbalanced Optimal Transport allows themes to remain unmatched, representing genuinely novel or missing concepts. Unlike balanced OT (which forces all mass to transport), unbalanced OT permits themes to be left out when no good match exists. The reg_m (K) parameter controls the penalty for leaving mass unmatched.

■ Default K={{ "%.2f"|format(comp.stats.default_k) }}  |  ◆ Chord elbow K={{ "%.2f"|format(comp.stats.chord_k) }}  |  ▲ Dim. returns K={{ "%.2f"|format(comp.stats.diminishing_k) }} {% if not comp.stats.plateau_reached %}
⚠ Curve may not have plateaued -- elbow estimates may be less reliable{% endif %}

These plots show how shared mass and alignment change as K varies. Baseline curves show the paraphrase ceiling (green, best case) and word-salad floor (red, random baseline) -- both also vary with K because the OT mass penalty affects all comparisons equally.

Shared Mass vs K
Shared mass scree plot

How much thematic content is matched. Higher = more themes matched. Elbow markers: ◆ chord, ▲ dim. returns.

{% if comp.stats.alignment_scree %}
Semantic Alignment vs K
Alignment scree plot

Quality of matches (1 - cost). Higher = better semantic similarity between matched themes.

{% endif %} {% if comp.stats.splits_joins_scree %}
Splits/Joins vs K
Splits/joins scree plot

Average targets per theme. 1.0 = perfect 1:1 matching. Higher = more many-to-many relationships.
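
One way to read "average targets per theme" from a transport plan is sketched below. The cut-off `eps` for counting a cell as a real flow, and the choice to skip themes that moved no mass, are assumptions made for illustration; the function name is hypothetical.

```python
def mean_targets_per_theme(plan, eps=1e-9):
    """Average number of non-negligible targets per row of a transport plan.

    plan[i][j] is the mass moved from theme i in A to theme j in B.
    Pass the transposed plan to count sources per theme in B (joins).
    """
    counts = [sum(1 for mass in row if mass > eps) for row in plan]
    active = [c for c in counts if c > 0]  # assumed: unmatched themes are skipped
    return sum(active) / len(active) if active else 0.0
```

A plan of [[0.5, 0.0], [0.25, 0.25]] gives 1.5: the first theme flows to one target and the second splits across two.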

{% endif %}
Summary Table: Metrics Across K Values
K | Shared Mass | Alignment | Mass % ceiling | Align % ceiling | Splits/Joins
{% for k_val in comp.stats.k_values %}{% set ot_k = comp.stats.ot_by_k[k_val] %}
{{ "%.2f"|format(k_val) }}{% if k_val == comp.stats.default_k %} ■{% endif %}{% if k_val == comp.stats.chord_k %} ◆{% endif %}{% if k_val == comp.stats.diminishing_k %} ▲{% endif %} | {{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}% | {% if ot_k.ot.alignment_observed is defined %}{{ "%.2f"|format(ot_k.ot.alignment_observed) }}{% else %}{{ "%.2f"|format(1 - ot_k.ot.avg_cost) }}{% endif %} | {% if ot_k.ot.shared_mass_pct_of_ceiling is defined %}{{ "%.0f"|format(ot_k.ot.shared_mass_pct_of_ceiling * 100) }}%{% elif ot_k.ot.shared_mass_relative is defined %}{{ "%.0f"|format(ot_k.ot.shared_mass_relative * 100) }}%{% else %}-{% endif %} | {% if ot_k.ot.alignment_pct_of_ceiling is defined %}{{ "%.0f"|format(ot_k.ot.alignment_pct_of_ceiling * 100) }}%{% elif ot_k.ot.avg_cost_relative is defined %}{{ "%.0f"|format(ot_k.ot.avg_cost_relative * 100) }}%{% else %}-{% endif %} | {{ "%.1f"|format((ot_k.split_join_stats.splits_from_a.mean + ot_k.split_join_stats.joins_to_b.mean) / 2) }}
{% endfor %}

■ = Default K, ◆ = Chord elbow, ▲ = Dim. returns. % of ceiling = how close to paraphrase baseline (100% = as good as identical meaning). Higher K forces more transport; lower K allows more unmatched themes.
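
The two baseline-relative figures used in this report are simple ratios: % of ceiling is observed / ceiling, and % of possible improvement is (observed - floor) / (ceiling - floor). A minimal sketch, with hypothetical function names:

```python
def pct_of_ceiling(observed, ceiling):
    """Fraction of the paraphrase ceiling that was reached."""
    return observed / ceiling


def pct_of_possible_improvement(observed, floor, ceiling):
    """Progress from the word-salad floor towards the paraphrase ceiling."""
    return (observed - floor) / (ceiling - floor)
```

With an observed value of 0.6, a floor of 0.2 and a ceiling of 0.8, the first ratio is 0.75 and the second is about 0.67. The second is always the stricter figure when the floor is above zero, because it discounts what random text would have achieved anyway.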

{% if comp.stats.paraphrase_baseline %}
Paraphrase Baseline

LLM-generated paraphrases of each theme establish a realistic upper bound for alignment. Paraphrases capture the same meaning in different words -- this represents the best achievable similarity between semantically equivalent analyses.

Baseline Statistics
  • Mean self-similarity: {{ "%.3f"|format(comp.stats.paraphrase_baseline.paraphrase_similarity_mean) }}
  • Std dev: {{ "%.3f"|format(comp.stats.paraphrase_baseline.paraphrase_similarity_std) }}
  • Model: {{ comp.stats.paraphrase_baseline.metadata.model }}
  • Paraphrases per theme: {{ comp.stats.paraphrase_baseline.metadata.n_paraphrases }}
Interpretation

The paraphrase ceiling represents the best realistic case -- comparing themes to their own LLM paraphrases (same meaning, different words). If observed alignment reaches this level, the analyses are semantically equivalent.

{{ comp.a.name }} -- Sample Themes with Paraphrases
{% for sample in comp.stats.paraphrase_baseline.samples_a %}
Original:
{{ sample.original }}
Paraphrases (self-similarity: {{ "%.3f"|format(sample.similarity) }}):
{% for para in sample.paraphrases %}
{{ loop.index }} {{ para }}
{% endfor %}
{% endfor %}
{{ comp.b.name }} -- Sample Themes with Paraphrases
{% for sample in comp.stats.paraphrase_baseline.samples_b %}
Original:
{{ sample.original }}
Paraphrases (self-similarity: {{ "%.3f"|format(sample.similarity) }}):
{% for para in sample.paraphrases %}
{{ loop.index }} {{ para }}
{% endfor %}
{% endfor %}
{% endif %} {% if comp.stats.word_salad_samples %}
Word Salad Baseline (Random Floor)

Word salad is generated by randomly shuffling words from themes, destroying semantic meaning. This represents what you'd expect from random text with similar vocabulary -- a floor below which alignment cannot meaningfully fall.

Generation Method
  • Samples generated: {{ comp.stats.word_salad_samples|length }}
  • Themes per sample: {{ comp.stats.word_salad_samples[0]|length }}
  • Method: Words randomly shuffled while preserving theme length
Interpretation

If observed alignment is close to the word-salad floor, the themes may not share meaningful semantic content. The further above this baseline, the more genuine the semantic similarity.
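
A minimal sketch of how such a sample could be built is given below. It assumes words are pooled across all themes before shuffling and then dealt back out so that each scrambled theme keeps its original word count; shuffling within each theme separately would be an equally plausible reading. The function name and `seed` parameter are hypothetical.

```python
import random


def word_salad(themes, seed=None):
    """Shuffle the pooled words of all themes, preserving each theme's word count."""
    rng = random.Random(seed)
    lengths = [len(theme.split()) for theme in themes]
    pool = [word for theme in themes for word in theme.split()]
    rng.shuffle(pool)

    salad, start = [], 0
    for n in lengths:
        # Deal out the next n shuffled words as one scrambled "theme"
        salad.append(" ".join(pool[start:start + n]))
        start += n
    return salad
```

The output uses exactly the same vocabulary and lengths as the input, so any remaining similarity reflects word choice rather than meaning.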

All {{ comp.stats.word_salad_samples|length }} Word Salad Samples

Each sample contains {{ comp.stats.word_salad_samples[0]|length }} scrambled "themes" (matching B's theme count). Words from B's themes are randomly shuffled while preserving length.

{% for sample_idx, sample in enumerate(comp.stats.word_salad_samples) %}
Sample {{ sample_idx + 1 }}
{% for text in sample %}
{{ loop.index }} {{ text }}
{% endfor %}
{% endfor %}
{% endif %} {% for k_val in comp.stats.k_values %} {% set ot_k = comp.stats.ot_by_k[k_val] %}
Shared Mass
{% if ot_k.ot.shared_mass_pct_of_ceiling is defined %}

{{ "%.0f"|format(ot_k.ot.shared_mass_pct_of_ceiling * 100) }}%

of paraphrase ceiling

best-case: identical meaning, different words


  • Observed: {{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%
  • Paraphrase ceiling: {{ "%.1f"|format(ot_k.ot.paraphrase_upper_bound * 100) }}%
  • Word-salad floor: {{ "%.1f"|format(ot_k.ot.null_shared_mass_mean * 100) }}%
  • vs word-salad: {{ "%.0f"|format(ot_k.ot.shared_mass_improvement_vs_null * 100) }}% of possible improvement
  • Effect size: {{ "%.1f"|format(ot_k.ot.shared_mass_effect) }} MADs
{% elif ot_k.ot.shared_mass_relative is defined %}

{{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%

shared mass


  • Word-salad floor: {{ "%.1f"|format(ot_k.ot.null_shared_mass_mean * 100) }}%
  • vs word-salad: +{{ "%.1f"|format(ot_k.ot.shared_mass_excess * 100) }}pp
  • Effect size: {{ "%.1f"|format(ot_k.ot.shared_mass_effect) }} MADs

Paraphrase baseline not available -- showing raw metrics.

{% else %}

{{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%

Shared Mass

{% endif %}
{% if ot_k.ot.shared_mass_pct_of_ceiling is defined %}

% of paraphrase ceiling (headline) = observed / ceiling. Shows what fraction of the best-case shared mass was achieved. 100% would mean as much mass is shared as when comparing themes to their own paraphrases.

% of possible improvement (vs word-salad) = (observed - floor) / (ceiling - floor). Shows progress from random baseline toward the ceiling -- how much better than chance, relative to the best case.

Paraphrase ceiling is computed by generating LLM paraphrases of each theme (same meaning, different words) and measuring their self-similarity.

Word-salad floor is computed by randomly shuffling words from each theme, destroying semantic meaning. This represents what you'd expect from random text with similar vocabulary.

{% else %}

Word-salad comparison tests whether observed alignment exceeds what you'd expect from random text. Word-salad is generated by shuffling words from themes, destroying semantic meaning.

Effect size shows how many MADs (median absolute deviations) above the word-salad baseline the observed value falls.

{% endif %}
Semantic Alignment
{% if ot_k.ot.alignment_pct_of_ceiling is defined %}

{{ "%.0f"|format(ot_k.ot.alignment_pct_of_ceiling * 100) }}%

of paraphrase ceiling

quality of theme-to-theme matches


  • Observed: {{ "%.2f"|format(ot_k.ot.alignment_observed) }}
  • Paraphrase ceiling: {{ "%.2f"|format(ot_k.ot.alignment_paraphrase_ceiling) }}
  • Word-salad floor: {{ "%.2f"|format(ot_k.ot.alignment_null_floor) }}
  • vs word-salad: {{ "%.0f"|format(ot_k.ot.alignment_improvement_vs_null * 100) }}% of possible improvement
{% elif ot_k.ot.avg_cost_relative is defined %} {% set fallback_alignment = 1 - ot_k.ot.avg_cost %} {% set fallback_floor = 1 - ot_k.ot.null_avg_cost_mean %}

{{ "%.1f"|format(fallback_alignment * 100) }}%

alignment (1 - cost)


  • Word-salad floor: {{ "%.1f"|format(fallback_floor * 100) }}%
  • vs word-salad: +{{ "%.1f"|format((fallback_alignment - fallback_floor) * 100) }}pp better

Paraphrase baseline not available -- showing raw alignment vs word-salad floor.

{% else %}

{{ "%.3f"|format(ot_k.ot.avg_cost) }}

Average Cost

{% endif %}
{% if ot_k.ot.alignment_pct_of_ceiling is defined %}

Semantic alignment measures the quality of theme-to-theme matches (computed as 1 - transport cost). Higher alignment = better semantic similarity between matched themes.

% of paraphrase ceiling (headline) = observed / ceiling. Shows what fraction of the best-case alignment was achieved. 100% would mean matches as semantically close as paraphrases.

% of possible improvement (vs word-salad) = (observed - floor) / (ceiling - floor). Shows progress from random baseline toward the ceiling.

Paraphrase ceiling = alignment when comparing themes to their own paraphrases. Word-salad floor = alignment when comparing to randomly shuffled words.

{% else %}

Semantic alignment measures how well matched themes relate to each other. Higher = better semantic similarity.

{% endif %}
Splits from A

Themes in A flowing to multiple themes in B

  • Mean: {{ "%.2f"|format(ot_k.split_join_stats.splits_from_a.mean) }}
  • Median: {{ "%.1f"|format(ot_k.split_join_stats.splits_from_a.median) }}
  • Mode: {{ ot_k.split_join_stats.splits_from_a.mode }}
  • Max: {{ ot_k.split_join_stats.splits_from_a.max }}
  • Themes with >1 target: {{ ot_k.split_join_stats.splits_from_a.n_multiple }}/{{ ot_k.split_join_stats.splits_from_a.total }} ({{ "%.0f"|format(ot_k.split_join_stats.splits_from_a.pct_multiple * 100) }}%)
{% if ot_k.split_join_stats.splits_from_a.counts %}

Distribution (# targets → # themes)

{% set max_count = ot_k.split_join_stats.splits_from_a.counts.values()|max %} {% for n, count in ot_k.split_join_stats.splits_from_a.counts.items() %}
{{ n }} ({{ count }})
{% endfor %}
{% endif %}
Joins to B

Themes in B receiving from multiple themes in A

  • Mean: {{ "%.2f"|format(ot_k.split_join_stats.joins_to_b.mean) }}
  • Median: {{ "%.1f"|format(ot_k.split_join_stats.joins_to_b.median) }}
  • Mode: {{ ot_k.split_join_stats.joins_to_b.mode }}
  • Max: {{ ot_k.split_join_stats.joins_to_b.max }}
  • Themes with >1 source: {{ ot_k.split_join_stats.joins_to_b.n_multiple }}/{{ ot_k.split_join_stats.joins_to_b.total }} ({{ "%.0f"|format(ot_k.split_join_stats.joins_to_b.pct_multiple * 100) }}%)
{% if ot_k.split_join_stats.joins_to_b.counts %}

Distribution (# sources → # themes)

{% set max_count = ot_k.split_join_stats.joins_to_b.counts.values()|max %} {% for n, count in ot_k.split_join_stats.joins_to_b.counts.items() %}
{{ n }} ({{ count }})
{% endfor %}
{% endif %}
Baseline Comparison (K={{ "%.2f"|format(k_val) }})

How we calibrate alignment: To interpret the observed alignment, we compare against two reference points. The floor is a null baseline -- random "word-salad" sentences constructed from words in the themes, representing what we'd see by chance. The ceiling is a best-case baseline -- LLM-generated paraphrases that retain the original meaning but use different wording, representing the maximum similarity we'd expect between genuinely equivalent themes.

{% if comp.stats.paraphrase_baseline and ot_k.ot.shared_mass_pct_of_ceiling is defined %}
Shared Mass
{{ "%.0f"|format(ot_k.ot.shared_mass_pct_of_ceiling * 100) }}% of paraphrase ceiling

The observed shared mass ({{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}%) is {{ "%.0f"|format(ot_k.ot.shared_mass_pct_of_ceiling * 100) }}% of the paraphrase ceiling ({{ "%.1f"|format(ot_k.ot.paraphrase_upper_bound * 100) }}%).

vs word-salad ({{ "%.1f"|format(ot_k.ot.null_shared_mass_mean * 100) }}%): {{ "%.0f"|format(ot_k.ot.shared_mass_improvement_vs_null * 100) }}% of possible improvement over word-salad.

Alignment Quality
{% set alignment_obs = ot_k.ot.alignment_observed if ot_k.ot.alignment_observed is defined else (1 - ot_k.ot.avg_cost) %} {% set alignment_ceiling = ot_k.ot.alignment_paraphrase_ceiling if ot_k.ot.alignment_paraphrase_ceiling is defined else none %} {% set alignment_floor = ot_k.ot.alignment_null_floor if ot_k.ot.alignment_null_floor is defined else (1 - ot_k.ot.null_avg_cost_mean) %} {% if ot_k.ot.alignment_pct_of_ceiling is defined %}
{{ "%.0f"|format(ot_k.ot.alignment_pct_of_ceiling * 100) }}% of paraphrase ceiling

The observed alignment ({{ "%.1f"|format(alignment_obs * 100) }}%) is {{ "%.0f"|format(ot_k.ot.alignment_pct_of_ceiling * 100) }}% of the paraphrase ceiling ({{ "%.1f"|format(alignment_ceiling * 100) }}%).

{% if ot_k.ot.alignment_improvement_vs_null is defined %}

vs word-salad ({{ "%.1f"|format(alignment_floor * 100) }}%): {{ "%.0f"|format(ot_k.ot.alignment_improvement_vs_null * 100) }}% of possible improvement over word-salad.

{% endif %} {% else %}
{{ "%.1f"|format(alignment_obs * 100) }}% observed

Alignment data not available for this K value.

{% endif %}
{% endif %}
Raw Values (K={{ "%.2f"|format(k_val) }})
Baseline | Shared Mass | Alignment
{% if comp.stats.paraphrase_baseline %}
Paraphrase ceiling (best realistic) | {{ "%.1f"|format(ot_k.ot.paraphrase_upper_bound * 100) }}% | {% if ot_k.ot.alignment_paraphrase_ceiling is defined %}{{ "%.2f"|format(ot_k.ot.alignment_paraphrase_ceiling) }}{% else %}-{% endif %}
{% endif %}
Observed (A ↔ B) | {{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}% | {% if ot_k.ot.alignment_observed is defined %}{{ "%.2f"|format(ot_k.ot.alignment_observed) }}{% else %}{{ "%.2f"|format(1 - ot_k.ot.avg_cost) }}{% endif %}
Word-salad floor (random) | {{ "%.1f"|format(ot_k.ot.null_shared_mass_mean * 100) }}% | {% if ot_k.ot.alignment_null_floor is defined %}{{ "%.2f"|format(ot_k.ot.alignment_null_floor) }}{% else %}{{ "%.2f"|format(1 - ot_k.ot.null_avg_cost_mean) }}{% endif %}
{% if comp.stats.paraphrase_baseline %}
% of ceiling (observed/ceiling) | {% if ot_k.ot.shared_mass_pct_of_ceiling is defined %}{{ "%.0f"|format(ot_k.ot.shared_mass_pct_of_ceiling * 100) }}%{% else %}-{% endif %} | {% if ot_k.ot.alignment_pct_of_ceiling is defined %}{{ "%.0f"|format(ot_k.ot.alignment_pct_of_ceiling * 100) }}%{% else %}-{% endif %}
% of possible improvement (from word-salad floor) | {% if ot_k.ot.shared_mass_improvement_vs_null is defined %}{{ "%.0f"|format(ot_k.ot.shared_mass_improvement_vs_null * 100) }}%{% else %}-{% endif %} | {% if ot_k.ot.alignment_improvement_vs_null is defined %}{{ "%.0f"|format(ot_k.ot.alignment_improvement_vs_null * 100) }}%{% else %}-{% endif %}
{% endif %}

Shared Mass: How much thematic content could be matched between the two analyses. Higher = more overlap.

Alignment: Quality of theme-to-theme matches (1 - transport cost). Higher = better semantic similarity between matched themes.

Paraphrase ceiling: The best realistic case -- comparing themes to their own LLM paraphrases (same meaning, different words).

Word-salad floor: Random baseline -- comparing to shuffled words with no semantic meaning.

{% if comp.stats.word_salad_samples %}

Each sample contains {{ comp.stats.word_salad_samples[0]|length }} scrambled "themes" (matching B's theme count). Words from B's themes are randomly shuffled while preserving length.

{% for sample_idx, sample in enumerate(comp.stats.word_salad_samples) %}
Sample {{ sample_idx + 1 }}:
{% for text in sample %}
{{ loop.index }} {{ text }}
{% endfor %}
{% endfor %}
{% endif %} {% if comp.stats.paraphrase_baseline %}
Paraphrase Upper Bound

LLM-generated paraphrases of each theme establish a realistic upper bound for alignment. Paraphrases capture the same meaning in different words -- this represents the best achievable similarity between semantically equivalent analyses.

  • Mean self-similarity: {{ "%.3f"|format(comp.stats.paraphrase_baseline.paraphrase_similarity_mean) }} (paraphrase ceiling)
  • Std dev: {{ "%.3f"|format(comp.stats.paraphrase_baseline.paraphrase_similarity_std) }}
  • Model: {{ comp.stats.paraphrase_baseline.metadata.model }}
  • Paraphrases per theme: {{ comp.stats.paraphrase_baseline.metadata.n_paraphrases }}

Sample themes with their LLM-generated paraphrases. Self-similarity shown for each theme (similarity between original and paraphrases).

{{ comp.a.name }}:
{% for sample in comp.stats.paraphrase_baseline.samples_a[:3] %}
Original:
{{ sample.original }}
Paraphrases (self-sim: {{ "%.3f"|format(sample.similarity) }}):
{% for para in sample.paraphrases %}
{{ loop.index }} {{ para }}
{% endfor %}
{% endfor %}
{{ comp.b.name }}:
{% for sample in comp.stats.paraphrase_baseline.samples_b[:3] %}
Original:
{{ sample.original }}
Paraphrases (self-sim: {{ "%.3f"|format(sample.similarity) }}):
{% for para in sample.paraphrases %}
{{ loop.index }} {{ para }}
{% endfor %}
{% endfor %}
{% endif %}
Effect Sizes

How far is the observed value from the baselines? Measured in MADs (median absolute deviations).

Metric | MADs above floor | MADs below ceiling
Shared Mass | +{{ "%.1f"|format(ot_k.ot.shared_mass_effect) }} | {% if ot_k.ot.paraphrase_upper_bound is defined and comp.stats.paraphrase_baseline and comp.stats.paraphrase_baseline.paraphrase_similarity_std > 0 %}-{{ "%.1f"|format((ot_k.ot.paraphrase_upper_bound - ot_k.ot.shared_mass) / comp.stats.paraphrase_baseline.paraphrase_similarity_std) }}{% else %}-{% endif %}
Alignment | {% if ot_k.ot.avg_cost_effect is defined %}+{{ "%.1f"|format(ot_k.ot.avg_cost_effect) }}{% else %}-{% endif %} | {% if ot_k.ot.paraphrase_cost_lower_bound is defined and comp.stats.paraphrase_baseline and comp.stats.paraphrase_baseline.paraphrase_similarity_std > 0 %}{% set observed_alignment = 1 - ot_k.ot.avg_cost %}{% set ceiling_alignment = 1 - ot_k.ot.paraphrase_cost_lower_bound %}-{{ "%.1f"|format((ceiling_alignment - observed_alignment) / comp.stats.paraphrase_baseline.paraphrase_similarity_std) }}{% else %}-{% endif %}

MADs above floor: How many MADs above word-salad baseline (higher = more distinct from random).

MADs below ceiling: How many MADs below paraphrase ceiling (lower = closer to ideal).

Note: MAD (median absolute deviation) is a robust measure of spread, less sensitive to outliers than standard deviation. Do not compare effect sizes across analyses with different embedding lengths.
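
For readers unfamiliar with MAD-based effect sizes, a minimal sketch follows. It assumes the null distribution is centred on its median and that no consistency scaling constant is applied; the computation behind this report may centre or scale differently. Function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def effect_size_in_mads(observed, null_values):
    """How many MADs the observed value lies above the centre of the null samples."""
    spread = mad(null_values)
    if spread == 0:
        raise ValueError("null samples have zero spread; effect size is undefined")
    return (observed - median(null_values)) / spread
```

For null samples [1, 2, 3, 4, 5] the median is 3 and the MAD is 1, so an observed value of 6 sits 3 MADs above the baseline.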

Embedding Metadata
  • Mean embedding length A: {{ "%.1f"|format(comp.stats.mean_embedding_words_a) }} words
  • Mean embedding length B: {{ "%.1f"|format(comp.stats.mean_embedding_words_b) }} words
Transport Visualisations (K={{ "%.2f"|format(k_val) }})
Transport Flow (Sankey)

Width of links shows amount of mass transported between themes. Colour indicates alignment quality (green = high similarity, red = low similarity). Hover over links for details.

Colour scale calibration. Link colours represent the cosine similarity between connected themes, mapped to a green-amber-red gradient. To ensure comparability across K values, all plots use a shared colour scale derived from the default K={{ "%.2f"|format(comp.stats.default_k) }} transport plan.

The scale endpoints are set to the minimum and maximum similarity values observed among links in the default K solution (similarity range: {{ "%.2f"|format(comp.stats.color_sim_min) }}--{{ "%.2f"|format(comp.stats.color_sim_max) }}). Green indicates the highest-similarity matches; red indicates the lowest-similarity matches within this analysis.

This calibration means that as K increases and additional lower-quality matches are transported, these appear as progressively redder links. At low K values, only the best matches (greenest links) are transported; higher K values force the algorithm to include weaker alignments. The consistent scale across K values allows direct visual comparison of match quality.

Note: Because the scale is normalised to each comparison's observed range, colours are not directly comparable across different pairwise comparisons. Within a single comparison, however, the colour scale provides an intuitive representation of relative alignment quality across the full range of K values examined.

Transport Plan Heatmap

Each cell shows the percentage of transported mass flowing from a theme in A to a theme in B. Values sum to 100%.

Transport heatmap
Coverage by Theme (K={{ "%.2f"|format(k_val) }})

For each theme, how much of its mass was transported? Low coverage = theme is conceptually distinct from the other set.

{{ comp.a.name }} → {{ comp.b.name }}
Theme | Coverage
{% for i, theme in enumerate(comp.embedded_a) %}
{{ theme.theme_name }} | {{ "%.2f"|format(ot_k.ot.coverage_a[i]) }}
{% endfor %}
{{ comp.b.name }} → {{ comp.a.name }}
Theme | Coverage
{% for i, theme in enumerate(comp.embedded_b) %}
{{ theme.theme_name }} | {{ "%.2f"|format(ot_k.ot.coverage_b[i]) }}
{% endfor %}
{% endfor %}

Best Matches (many:many)

Shows the best match for each theme, allowing multiple themes to match the same target. OT columns show mass flow from optimal transport (default K={{ "%.2f"|format(comp.stats.default_k) }}).

{{ comp.a.name }} → {{ comp.b.name }}

For each theme in {{ comp.a.name }}, the most similar theme in {{ comp.b.name }}

Theme in {{ comp.a.name }} | Best Match in {{ comp.b.name }} | Sim | % Mass Transferred | Coverage
{% for match in comp.stats.best_matches_a_to_b %}{% set theme_a = comp.embedded_a[match.theme_a_index] %}{% set theme_b = comp.embedded_b[match.theme_b_index] %}
{{ theme_a.theme_name }}
{{ theme_a.embedded_string }}
{{ theme_b.theme_name }}
{{ theme_b.embedded_string }}
{{ "%.2f"|format(match.similarity) }} {{ "%.0f"|format(match.mass_pct) }}% {{ "%.1f"|format(match.mass_total * 100) }}%
{% endfor %}
{{ comp.b.name }} → {{ comp.a.name }}

For each theme in {{ comp.b.name }}, the most similar theme in {{ comp.a.name }}

Theme in {{ comp.b.name }} | Best Match in {{ comp.a.name }} | Sim | % Mass Transferred | Coverage
{% for match in comp.stats.best_matches_b_to_a %}{% set theme_b = comp.embedded_b[match.theme_b_index] %}{% set theme_a = comp.embedded_a[match.theme_a_index] %}
{{ theme_b.theme_name }}
{{ theme_b.embedded_string }}
{{ theme_a.theme_name }}
{{ theme_a.embedded_string }}
{{ "%.2f"|format(match.similarity) }} {{ "%.0f"|format(match.mass_pct) }}% {{ "%.1f"|format(match.mass_total * 100) }}%
{% endfor %}

Additional Distance Metrics

Alternative distance functions for specialised analyses.

Shepard Similarity (k={{ comp.stats.shepard_k_value }})

Exponential decay on angular distance. Cognitively realistic similarity function.
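
A minimal sketch of exponential decay on distance, assuming the form exp(-k * d) with d the angular distance and k the decay rate shown in the heading. The exact functional form used for this report is not stated here, so treat this as an illustration; the function name is hypothetical.

```python
import math


def shepard_similarity(distance, k):
    """Exponential decay of similarity with distance: exp(-k * distance)."""
    return math.exp(-k * distance)
```

Under this form, a distance of 0 gives a similarity of 1, and larger k makes similarity fall away more steeply as distance grows.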

Within-set baseline: Mean = {{ "%.3f"|format(comp.stats.within_set_stats.mean) }}, SD = {{ "%.3f"|format(comp.stats.within_set_stats.std) }}

Shepard similarity heatmap
Percentile-Normalized

Cross-set similarity relative to within-set distribution. 0.80 = more similar than 80% of within-set pairs.

Percentile-normalized heatmap
Z-Score Normalized

Standard deviations above/below typical within-set similarity. Useful for identifying outliers.

Z-score normalized heatmap
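
The two normalisations above can be sketched as follows. This assumes the percentile is the share of within-set similarities strictly below the cross-set value, and that the z-score uses the population standard deviation; both choices are assumptions for illustration, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def percentile_normalise(cross_sim, within_sims):
    """Share of within-set similarities that the cross-set value exceeds."""
    return sum(1 for w in within_sims if w < cross_sim) / len(within_sims)


def z_normalise(cross_sim, within_sims):
    """Standard deviations above (+) or below (-) the within-set mean."""
    return (cross_sim - mean(within_sims)) / pstdev(within_sims)
```

For within-set similarities [0.1, 0.2, 0.3, 0.4, 0.5], a cross-set value of 0.45 exceeds four of the five, giving a percentile of 0.80.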
{% endfor %}

Comparison Configuration

{{ comparison.config | tojson(indent=2) }}

Additional Data

Download raw data files for further analysis.

{% for key, comp in comparison.by_comparisons().items() %} {% if comparison.comparison_plots.embeddings_csv and comparison.comparison_plots.embeddings_csv[key] %} {{ comp.a.name }} vs {{ comp.b.name }} -- Embeddings CSV {% endif %} {% endfor %}