Tennis Grand Slam Champions
This example builds an UpSet plot of tennis Grand Slam champions, showing how many players have won at each combination of the four tournaments.
import altair as alt
import pandas as pd
import altair_upset as au
# Load intersection data
intersections = pd.read_csv("https://huggingface.co/datasets/edmundmiller/Upset/resolve/main/upset2_intersection_data_1737394769694.csv")
# Create a zero-filled DataFrame: one row per player, one column per tournament
columns = ['French Open', 'Australian Open', 'US Open', 'Wimbledon']
total_players = int(intersections['size'].sum())
data = pd.DataFrame(0, index=range(total_players), columns=columns)
# Fill the DataFrame based on intersection data
current_idx = 0
for _, row in intersections.iterrows():
    sets = row['elementName'].split(' & ')
    size = int(row['size'])
    end_idx = current_idx + size
    for set_name in sets:
        data.loc[current_idx:end_idx-1, set_name] = 1
    current_idx = end_idx
au.UpSetAltair(
    data=data,
    sets=data.columns.tolist(),
    sort_by="degree",
    sort_order="descending",
    title="Tennis Grand Slam Championships by Player",
    subtitle=[
        "This plot shows the overlap of tennis Grand Slam tournament winners.",
        "Notably, the majority of champions have won only at one tournament venue.",
        "Out of 117 champions, only 9 have won at least once at every Grand Slam tournament venue."
    ],
    width=800,
    height=500
).chart
The resulting visualization shows several interesting patterns:
Most tennis players have won at only one Grand Slam tournament
The French Open and Australian Open have the highest number of unique winners
Only 9 players have achieved the remarkable feat of winning all four Grand Slam tournaments
There’s a significant overlap between Australian Open, US Open, and Wimbledon winners
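Observations like these can be checked against a 0/1 membership table such as `data` by counting how many sets each row belongs to. The sketch below is only an illustration: `toy` is a small made-up table that reuses the column names from the example, and its rows are hypothetical, not the real champion data.

```python
import pandas as pd

# Hypothetical toy membership table: one row per player, 1 = won that tournament.
toy = pd.DataFrame(
    {
        "French Open":     [1, 0, 1, 0, 1],
        "Australian Open": [0, 1, 1, 0, 1],
        "US Open":         [0, 0, 1, 1, 1],
        "Wimbledon":       [0, 0, 1, 1, 0],
    }
)

# Degree = number of tournaments each player has won at least once.
degree = toy.sum(axis=1)

# Players who won at exactly one venue, and players who won at every venue.
single_venue = int((degree == 1).sum())
all_venues = int((degree == toy.shape[1]).sum())

# Number of players at each degree, ordered from 1 upward.
degree_counts = degree.value_counts().sort_index()
print(degree_counts)
```

Running the same three lines on `data` instead of `toy` would give the corresponding counts for the real table, assuming it was built as shown earlier.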
This example demonstrates how UpSet plots can effectively visualize complex set intersections in sports data, revealing patterns that would be difficult to see in traditional Venn diagrams.
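As a side note, the loop that fills `data` can also be written without manual index bookkeeping. The following is a minimal sketch of that idea on a made-up intersection table. It assumes the same layout the example relies on (an `elementName` column with set names joined by `" & "` and a `size` column); the names `toy_intersections`, `membership`, and `expanded` are hypothetical.

```python
import pandas as pd

# Hypothetical intersection table with the assumed column layout.
toy_intersections = pd.DataFrame(
    {
        "elementName": [
            "Wimbledon",
            "US Open & Wimbledon",
            "French Open & US Open & Wimbledon",
        ],
        "size": [2, 1, 3],
    }
)
columns = ["French Open", "US Open", "Wimbledon"]

# One row per intersection, one 0/1 indicator column per set.
membership = pd.DataFrame(
    {
        col: toy_intersections["elementName"].map(
            lambda name, col=col: int(col in name.split(" & "))
        )
        for col in columns
    }
)

# Repeat each intersection row `size` times to get one row per player.
repeats = toy_intersections["size"].to_numpy()
expanded = membership.loc[membership.index.repeat(repeats)].reset_index(drop=True)
print(expanded)
```

The result has the same shape as the table built by the loop: one row per player and one indicator column per tournament, so it could be passed to `au.UpSetAltair` in the same way.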