How to Load the Vienna 4x22 Corpus

MatchfileLoader, 22 performances, AlignmentBundle

This guide demonstrates how to load score-to-performance alignment data from the Vienna 4x22 Corpus using the MatchfileLoader.

About the Dataset

The full Vienna 4x22 Corpus (Grachten & Widmer, 2012) contains four classical piano pieces, each performed by 22 pianists, yielding 88 score-to-performance alignments in the .match file format. Each .match file encodes three things at once:

  • A score representation (note identities, pitch, metrical position, duration in quarter beats).
  • A performance representation (MIDI pitch, onset/offset in ticks, velocity).
  • An alignment linking each score note to its performed counterpart (or marking it as a deletion when the pianist omitted it).
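
The three layers can be pictured as plain records. The following is only an illustrative sketch: the class and field names (ScoreNote, PerformedNote, Alignment) are hypothetical, the sample values are made up, and none of this is the TimeToAlign! data model or the .match syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreNote:
    note_id: str              # hypothetical identifier
    pitch: str                # note name with octave
    onset_quarters: float     # metrical position in quarter beats
    duration_quarters: float


@dataclass(frozen=True)
class PerformedNote:
    midi_pitch: int
    onset_ticks: int
    offset_ticks: int
    velocity: int


@dataclass(frozen=True)
class Alignment:
    score_note: ScoreNote
    performed: Optional[PerformedNote]  # None marks a deletion

    @property
    def is_deletion(self) -> bool:
        return self.performed is None


# Illustrative values only, not taken from the corpus.
played = Alignment(ScoreNote("n1", "B3", 0.0, 0.5), PerformedNote(59, 0, 261, 48))
omitted = Alignment(ScoreNote("n2", "E4", 0.5, 0.5), None)
print(played.is_deletion, omitted.is_deletion)
```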

This guide uses a 1x22 sample: one piece (Chopin Etude Op. 10 No. 3 in E major) performed by all 22 pianists. It is one quarter of the full dataset and ships with the TimeToAlign! test suite. The same workflow applies to the remaining three pieces.

What you will learn:

  1. Load all 22 .match files through a single MatchfileLoader
  2. Inspect the resulting timelines and their TimeStamps
  3. Assemble the AlignmentBundle and query its MatchClaims
  4. Obtain MatchStamps — the cross-timeline coordinate cross-section

Setup

from pathlib import Path

from timetoalign import MatchfileLoader
from timetoalign.alignment import MatchGraph

_notebook_dir = Path(".").resolve()
DATA_DIR = _notebook_dir.parent.parent / "tests" / "data" / "vienna_1x22"
assert DATA_DIR.is_dir(), f"Data directory not found: {DATA_DIR}"

match_files = sorted(DATA_DIR.glob("*.match"))
len(match_files)
22

22 .match files — one per pianist — all sharing the same score (Chopin Op. 10 No. 3).


Step 1: Load All Match Files

The MatchfileLoader processes all .match files for a given piece through a single instance. It builds a shared score timeline from the first file and verifies each subsequent file against it. Incompatible files are rejected with a warning; compatible files contribute their performance timeline and match claims.
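
The verify-against-the-first-file behaviour can be sketched in a few lines. This is not the loader's implementation: `load_group`, the tuple-based score representation, and the use of `warnings.warn` are assumptions made for illustration.

```python
import warnings


def load_group(files):
    """Accept files whose score matches the first file's score.

    `files` maps a file name to a (score, performance) pair, where the
    score is a tuple of (onset_quarters, midi_pitch) entries.
    This input format is hypothetical.
    """
    reference = None
    accepted, rejected = [], []
    for name, (score, _performance) in files.items():
        if reference is None:
            reference = score  # the first file defines the shared score
        if score == reference:
            accepted.append(name)
        else:
            warnings.warn(f"{name}: score differs from reference, rejected")
            rejected.append(name)
    return accepted, rejected


files = {
    "p01.match": (((0.0, 59), (0.5, 64)), "perf-1"),
    "p02.match": (((0.0, 59), (0.5, 64)), "perf-2"),
    "other.match": (((0.0, 60),), "perf-3"),
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    accepted, rejected = load_group(files)

print(accepted, rejected, len(caught))
```

In this toy run the third file carries a different score, so it is rejected with one warning while the first two are accepted.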

loader = MatchfileLoader()
loader.load(*match_files)
loader
MatchfileLoader(performances=22, claims=9988, rejected=0)

All 22 files loaded successfully — zero rejected.


Step 2: Discover and Inspect Timelines

Before assembling the bundle, we can access the timelines directly from the loader. This is useful for inspecting what was parsed.

create_timelines() returns all timelines as a list (score first):

all_timelines = loader.create_timelines()
[(tl.id, tl.class_name, tl.n_events) for tl in all_timelines]
[('score:Chopin_op10_no3', 'ContinuousLogicalTimeline', 454),
 ('perf:Chopin_op10_no3_p01', 'DiscreteLogicalTimeline', 451),
 ('perf:Chopin_op10_no3_p02', 'DiscreteLogicalTimeline', 448),
 ('perf:Chopin_op10_no3_p03', 'DiscreteLogicalTimeline', 452),
 ('perf:Chopin_op10_no3_p04', 'DiscreteLogicalTimeline', 450),
 ('perf:Chopin_op10_no3_p05', 'DiscreteLogicalTimeline', 450),
 ('perf:Chopin_op10_no3_p06', 'DiscreteLogicalTimeline', 451),
 ('perf:Chopin_op10_no3_p07', 'DiscreteLogicalTimeline', 451),
 ('perf:Chopin_op10_no3_p08', 'DiscreteLogicalTimeline', 434),
 ('perf:Chopin_op10_no3_p09', 'DiscreteLogicalTimeline', 436),
 ('perf:Chopin_op10_no3_p10', 'DiscreteLogicalTimeline', 447),
 ('perf:Chopin_op10_no3_p11', 'DiscreteLogicalTimeline', 450),
 ('perf:Chopin_op10_no3_p12', 'DiscreteLogicalTimeline', 453),
 ('perf:Chopin_op10_no3_p13', 'DiscreteLogicalTimeline', 452),
 ('perf:Chopin_op10_no3_p14', 'DiscreteLogicalTimeline', 450),
 ('perf:Chopin_op10_no3_p15', 'DiscreteLogicalTimeline', 449),
 ('perf:Chopin_op10_no3_p16', 'DiscreteLogicalTimeline', 448),
 ('perf:Chopin_op10_no3_p17', 'DiscreteLogicalTimeline', 451),
 ('perf:Chopin_op10_no3_p18', 'DiscreteLogicalTimeline', 450),
 ('perf:Chopin_op10_no3_p19', 'DiscreteLogicalTimeline', 452),
 ('perf:Chopin_op10_no3_p20', 'DiscreteLogicalTimeline', 447),
 ('perf:Chopin_op10_no3_p21', 'DiscreteLogicalTimeline', 451),
 ('perf:Chopin_op10_no3_p22', 'DiscreteLogicalTimeline', 452)]

23 timelines: 1 score + 22 performances. Event counts vary between performances (434 to 453, against 454 score events) because some pianists omit notes (deletions).
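
If every performance event corresponds to a matched score note (an assumption; the listing alone does not prove it), the number of omitted notes per pianist is the score's event count minus the performance's. A quick sketch using the counts printed above:

```python
# Event counts copied from the listing above (p01 .. p22).
SCORE_EVENTS = 454
perf_events = [451, 448, 452, 450, 450, 451, 451, 434, 436, 447, 450,
               453, 452, 450, 449, 448, 451, 450, 452, 447, 451, 452]

# Implied omissions per performance, assuming no extra (inserted) notes.
implied_deletions = {
    f"p{i:02d}": SCORE_EVENTS - n for i, n in enumerate(perf_events, start=1)
}

print(implied_deletions["p01"])                               # 3
print(max(implied_deletions.items(), key=lambda kv: kv[1]))   # ('p08', 20)
print(sum(perf_events), sum(implied_deletions.values()))      # 9875 113
```

These totals agree with the claim counts reported later in this guide: 9988 claims overall, of which 9875 are synchronous.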

Individual timelines are accessed by role shorthand:

score = loader.create_timeline("score")
score
ContinuousLogicalTimeline[score:Chopin_op10_no3] (454 events, 2 cmaps)
                      0 ________________________________ 41.5 quarters

The score is a ContinuousLogicalTimeline in quarter-beat coordinates, carrying two conversion maps: raw_to_normalised (a ShiftMap for anacrusis offset) and quarters_to_divs (a ScalarMap to MIDI divisions).

perf_01 = loader.create_timeline("perf:p01")
perf_01
DiscreteLogicalTimeline[perf:Chopin_op10_no3_p01] (451 events, 1 cmaps)
                      0 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 79900 ticks

Each performance is a DiscreteLogicalTimeline in MIDI tick coordinates, with a ticks_to_seconds ScalarMap attached.


Step 3: TimeStamps — the Cross-Section View

A TimeStamp is the primary interface for querying what happens at a given coordinate on a timeline. It returns the coordinate itself plus all conversion map results in a single cross-section.

Score TimeStamp

At quarter-beat 10.0, the score timestamp shows the coordinate in quarters, plus the raw (un-normalised) partitura value and the MIDI divisions equivalent:

score.get_timestamp(10.0)
TimeStamp interpolated
ID Coordinate Type
score:Chopin_op10_no3 10 quarters axis
quarters 9.5 quarters cmap
ticks 4800 ticks cmap
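
The two cmap rows can be reproduced by hand. The sketch below is not the library's ShiftMap or ScalarMap; it uses plain functions, and the constants (a 0.5-quarter shift and 480 divisions per quarter) are inferred from the output above rather than read from the library.

```python
# Constants inferred from the TimeStamp above; treat them as assumptions.
ANACRUSIS_SHIFT = 0.5     # normalised quarters minus raw quarters
DIVS_PER_QUARTER = 480    # 4800 ticks / 10 quarters


def normalised_to_raw(quarters: float) -> float:
    """Undo the anacrusis shift (a shift map adds or subtracts a constant)."""
    return quarters - ANACRUSIS_SHIFT


def quarters_to_divs(quarters: float) -> float:
    """Scale quarters to MIDI divisions (a scalar map multiplies by a constant)."""
    return quarters * DIVS_PER_QUARTER


print(normalised_to_raw(10.0), quarters_to_divs(10.0))  # 9.5 4800.0
```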

Performance TimeStamp

At tick 10000, the performance timestamp shows the tick coordinate plus the converted seconds value:

perf_01.get_timestamp(10000.0)
TimeStamp interpolated
ID Coordinate Type
perf:Chopin_op10_no3_p01 10000 ticks axis
seconds 10.416667 seconds cmap
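
The seconds value is a plain scaling of the tick coordinate. The factor of 960 ticks per second used below is inferred from this output (10000 / 10.416667 is approximately 960), not taken from the library or the file header.

```python
TICKS_PER_SECOND = 960  # inferred from the output above; an assumption


def ticks_to_seconds(ticks: float) -> float:
    """Convert a tick coordinate to seconds with a constant scale factor."""
    return ticks / TICKS_PER_SECOND


print(round(ticks_to_seconds(10000), 6))  # 10.416667
print(round(ticks_to_seconds(79900), 2))  # 83.23, the end of the p01 timeline
```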

Step 4: Create the AlignmentBundle

create_alignment_bundle() assembles an AlignmentBundle from the loaded data. The score goes into its own group; each performance is a standalone timeline; MatchClaims connect them.

bundle = loader.create_alignment_bundle()
bundle
AlignmentBundle[bundle:AlignmentBundle_1]

  TimelineGroup[score] (1 timelines, 2 timestamps)
  ┌────────────────────────────────────────────────────────────────────────────┐
  │ ContinuousLogicalTimeline[score:Chopin_op10_no3] (454 events, 2 cmaps)     │
  │                       0 ____________________________________ 41.5 quarters │
  └────────────────────────────────────────────────────────────────────────────┘
  Timestamps: 2

  Standalone timelines (22):
    perf:Chop...     0 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 79900 ticks (451 ev)
    perf:Chop...     0 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 72365 ticks (448 ev)
    perf:Chop...     0 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 76999 ticks (452 ev)
    ... (16 more)
    perf:Chop...     0 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 86143 ticks (447 ev)
    perf:Chop...     0 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 83149 ticks (451 ev)
    perf:Chop...     0 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 77497 ticks (452 ev)

  MatchClaims: 9988

The diagram shows the score group (with its conversion maps reflected in the timeline), 22 standalone performance timelines (with proportional bar widths reflecting their different lengths in ticks), and the total number of cross-group MatchClaims.


Step 5: Viewing and Querying MatchClaims

MatchClaims are stored on the bundle as cross_group_claims. Each claim connects a score event to a performance event (synchronous match) or records a deletion (non-synchronous NOMATCH).

claims = bundle.cross_group_claims
len(claims)
9988

9,988 claims across 22 performers: 22 files x 454 snote records per file.

Filtering claims for a specific performer:

p01_claims = [c for c in claims if c.connects(perf_01.id)]
synch_p01 = [c for c in p01_claims if c.is_synchronous]
nomatch_p01 = [c for c in p01_claims if not c.is_synchronous]

{
    "performer": perf_01.id,
    "total_claims": len(p01_claims),
    "synchronous (matched notes)": len(synch_p01),
    "nomatch (deletions)": len(nomatch_p01),
}
{'performer': 'perf:Chopin_op10_no3_p01',
 'total_claims': 454,
 'synchronous (matched notes)': 451,
 'nomatch (deletions)': 3}

A single claim looks like this:

synch_p01[0]
MatchClaim synchronous, interval
Timeline A score:Chopin_op10_no3 [0 – 0.5]
Timeline B perf:Chopin_op10_no3_p01 [0 – 261]
Metadata agent=vienna_match_v1.0.0

The claim shows: interval match between score coordinate [0.0, 0.5] quarters and performance coordinate [0, 261] ticks. Both anchors (start and end) are present because this is a synchronous interval match.
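
One way to read an interval claim is as two anchor pairs with a straight line between them. The sketch below interpolates linearly inside the claim shown above; whether the library interpolates this way is not stated here, so treat `interpolate_claim` as a hypothetical helper.

```python
def interpolate_claim(score_pos, score_span, perf_span):
    """Map a score position inside score_span linearly onto perf_span."""
    s0, s1 = score_span
    p0, p1 = perf_span
    if not s0 <= score_pos <= s1:
        raise ValueError("position lies outside the claim's score interval")
    fraction = (score_pos - s0) / (s1 - s0)
    return p0 + fraction * (p1 - p0)


# Anchors taken from the claim above: [0, 0.5] quarters <-> [0, 261] ticks.
print(interpolate_claim(0.25, (0.0, 0.5), (0, 261)))  # 130.5
```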


Step 6: MatchStamps from Individual Claims

A MatchStamp is the cross-timeline analogue of a TimeStamp. Where a TimeStamp shows coordinates within one timeline (plus its conversion-map results), a MatchStamp shows the synchronised coordinate across multiple timelines linked by MatchClaims.

To obtain a MatchStamp, wrap one or more claims in a MatchGraph and call get_stamps():

mg_single = MatchGraph(claims=[synch_p01[0]])
stamps = mg_single.get_stamps()
stamps[0]
MatchStamp 2 timelines, 1 edges
ID Coordinate Type
score:Chopin_op10_no3 0 anchor
perf:Chopin_op10_no3_p01 0 anchor

The MatchStamp shows the score coordinate (0.0 quarters) and the corresponding performance coordinate (0.0 ticks) — the union of what both TimeStamps would show individually.

For an interval claim, there are two stamps (start and end):

len(stamps)
2
stamps[1]
MatchStamp 2 timelines, 1 edges
ID Coordinate Type
perf:Chopin_op10_no3_p01 261 anchor
score:Chopin_op10_no3 0.5 anchor

Step 7: MatchGraph Across All Performers

The real power emerges when building a MatchGraph from all synchronous claims across all 22 performers. Each connected component in the graph produces a single MatchStamp spanning every timeline that shares that score coordinate.
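
Connected components over claims can be sketched with a small union-find. This is not MatchGraph's implementation: the node representation (timeline id, coordinate) and the helper name are assumptions, and each claim is reduced to a single anchor pair for brevity.

```python
def connected_components(edges):
    """Group nodes linked by edges; returns a list of sets of nodes."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Each edge links a score anchor to a performance anchor (toy values).
edges = [
    (("score", 0.0), ("p01", 0)),
    (("score", 0.0), ("p02", 0)),
    (("score", 0.5), ("p01", 261)),
]
components = connected_components(edges)
print(sorted(len(c) for c in components))  # [2, 3]
```

In the toy data, the larger component groups one score anchor with anchors on two performances, which is the shape of a stamp spanning three timelines.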

all_synch = [c for c in claims if c.is_synchronous]
mg_all = MatchGraph(claims=all_synch)

{
    "synchronous_claims": mg_all.n_claims,
    "graph_nodes": mg_all.n_nodes,
    "graph_edges": mg_all.n_edges,
    "timelines_in_graph": len(mg_all.timeline_ids),
}
{'synchronous_claims': 9875,
 'graph_nodes': 19574,
 'graph_edges': 19433,
 'timelines_in_graph': 23}
stamps_all = mg_all.get_stamps()
len(stamps_all)
142

Each stamp is a full cross-section. The first stamp (at score coordinate 0.0) spans all 23 timelines — the score plus all 22 performances that have a matched note at that position:

s0 = stamps_all[0]
s0.n_timelines
23
s0
MatchStamp 23 timelines, 22 edges
ID Coordinate Type
perf:Chopin_op10_no3_p06 0 anchor
perf:Chopin_op10_no3_p20 6 anchor
perf:Chopin_op10_no3_p01 0 anchor
perf:Chopin_op10_no3_p05 0 anchor
perf:Chopin_op10_no3_p15 3 anchor
perf:Chopin_op10_no3_p16 0 anchor
perf:Chopin_op10_no3_p04 0 anchor
perf:Chopin_op10_no3_p03 0 anchor
perf:Chopin_op10_no3_p13 3 anchor
perf:Chopin_op10_no3_p21 0 anchor
perf:Chopin_op10_no3_p02 0 anchor
perf:Chopin_op10_no3_p17 4 anchor
perf:Chopin_op10_no3_p19 0 anchor
perf:Chopin_op10_no3_p09 0 anchor
perf:Chopin_op10_no3_p12 0 anchor
perf:Chopin_op10_no3_p10 0 anchor
score:Chopin_op10_no3 0 anchor
perf:Chopin_op10_no3_p18 0 anchor
perf:Chopin_op10_no3_p08 0 anchor
perf:Chopin_op10_no3_p14 0 anchor
perf:Chopin_op10_no3_p11 0 anchor
perf:Chopin_op10_no3_p07 0 anchor
perf:Chopin_op10_no3_p22 0 anchor

This is the complete synchronised view: one score coordinate mapped to 22 MIDI tick coordinates, one per pianist. At this first note most performances sit at tick 0, while p13, p15, p17 and p20 are offset by 3 to 6 ticks.


Summary

The complete workflow:

from timetoalign import MatchfileLoader
from timetoalign.alignment import MatchGraph

# Load
loader = MatchfileLoader()
loader.load(*sorted(DATA_DIR.glob("*.match")))  # DATA_DIR as defined in Setup

# Inspect timelines and their TimeStamps
score = loader.create_timeline("score")
score.get_timestamp(10.0)  # quarters + raw + divs

# Assemble the bundle
bundle = loader.create_alignment_bundle()

# Query claims and build MatchStamps
claims = bundle.cross_group_claims
mg = MatchGraph(claims=[c for c in claims if c.is_synchronous])
stamps = mg.get_stamps()  # full cross-section across all timelines