Manual#

This page is a detailed guide for using ms3 for different tasks. The code examples assume that you are working in an interactive Python interpreter such as IPython, Jupyter, Google Colab, or simply the Python console. The manual itself is written as a MyST markdown Notebook which can be run in Jupyter if the Jupytext extension is installed.

Good to know#

The principal raison d’être of ms3 is to extract different types of information, “facets”, from MuseScore files and to store them as tabular .tsv (tab-separated values) files for further processing. Over the years, however, it has gained additional functionality related to corpus curation. Therefore, the package operates on a couple of principles that have co-evolved with the DCML corpus initiative.

Corpus structure#

ms3 follows a one-folder-per-feature approach (rather than a one-folder-per-file approach).

The default#

The library’s main command, ms3 extract, creates this folder-based structure by default. Consider this fresh corpus_before containing only the folder MS3 with 7 uncompressed MuseScore files in .mscx format:

corpus_before
└── MS3
    ├── 05_symph_fant.mscx
    ├── 76CASM34A33UM.mscx
    ├── BWV_0815.mscx
    ├── D973deutscher01.mscx
    ├── Did03M-Son_regina-1762-Sarti.mscx
    ├── K281-3.mscx
    └── stabat_03_coloured.mscx

2 directories, 7 files

The default command that is used on all DCML corpora, ms3 extract -M -N -X -D -a, extracts three TSV files (measures, notes and “eXpanded” harmonies) from each score and places each type in a separate folder, plus an additional metadata.tsv file:

corpus_after
├── harmonies
│   ├── 76CASM34A33UM.tsv
│   ├── Did03M-Son_regina-1762-Sarti.tsv
│   ├── K281-3.tsv
│   └── stabat_03_coloured.tsv
├── measures
│   ├── 05_symph_fant.tsv
│   ├── 76CASM34A33UM.tsv
│   ├── BWV_0815.tsv
│   ├── D973deutscher01.tsv
│   ├── Did03M-Son_regina-1762-Sarti.tsv
│   ├── K281-3.tsv
│   └── stabat_03_coloured.tsv
├── metadata.tsv
├── MS3
│   ├── 05_symph_fant.mscx
│   ├── 76CASM34A33UM.mscx
│   ├── BWV_0815.mscx
│   ├── D973deutscher01.mscx
│   ├── Did03M-Son_regina-1762-Sarti.mscx
│   ├── K281-3.mscx
│   └── stabat_03_coloured.mscx
└── notes
    ├── 05_symph_fant.tsv
    ├── 76CASM34A33UM.tsv
    ├── BWV_0815.tsv
    ├── D973deutscher01.tsv
    ├── Did03M-Son_regina-1762-Sarti.tsv
    ├── K281-3.tsv
    └── stabat_03_coloured.tsv

5 directories, 26 files

ms3 operates on the fundamental principle that files that belong together need to have the same name (but not necessarily the same extension). For example, the folders MS3, measures and notes each contain a file called BWV_0815, whereas the folder harmonies contains only four files because the other three scores (BWV_0815 among them) did not contain harmony labels.
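The same-name principle can be illustrated with a minimal sketch that does not use ms3 itself. The helper `group_by_name` is hypothetical; it simply groups a few of the paths from the listing above by their file name without extension.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# A subset of the relative paths shown in the corpus_after listing.
paths = [
    "MS3/BWV_0815.mscx",
    "measures/BWV_0815.tsv",
    "notes/BWV_0815.tsv",
    "MS3/K281-3.mscx",
    "harmonies/K281-3.tsv",
    "measures/K281-3.tsv",
    "notes/K281-3.tsv",
]

def group_by_name(paths):
    """Map each file name (without extension) to the folders containing it."""
    groups = defaultdict(list)
    for path in map(PurePosixPath, paths):
        groups[path.stem].append(path.parent.name)
    return dict(groups)

groups = group_by_name(paths)
print(groups["BWV_0815"])  # ['MS3', 'measures', 'notes']
print(groups["K281-3"])    # ['MS3', 'harmonies', 'measures', 'notes']
```

Files sharing a name end up in the same group, and the absence of a harmonies entry for BWV_0815 shows directly that no harmony labels were extracted for that score.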

The folder structure results from the command’s default arguments, which are equivalent to ms3 extract -M ../measures -N ../notes -X ../harmonies. While there are multiple ways of specifying output folders (see below), there is an additional mechanism for cases in which outputs are to be placed within the same folder (which is problematic when they have the same name):

Using suffixes#

Users who need their outputs in the same directory, say out, can specify the flag -s to add suffixes to the file names. For example, using ms3 extract -M out -N out -X out -s you would get:

out
├── 05_symph_fant_measures.tsv
├── 05_symph_fant_notes.tsv
├── 76CASM34A33UM_measures.tsv
├── 76CASM34A33UM_notes.tsv
├── BWV_0815_measures.tsv
├── BWV_0815_notes.tsv
├── csv-metadata.json
├── D973deutscher01_measures.tsv
├── D973deutscher01_notes.tsv
├── Did03M-Son_regina-1762-Sarti_expanded.tsv
├── Did03M-Son_regina-1762-Sarti_measures.tsv
├── Did03M-Son_regina-1762-Sarti_notes.tsv
├── K281-3_expanded.tsv
├── K281-3_measures.tsv
├── K281-3_notes.tsv
├── stabat_03_coloured_expanded.tsv
├── stabat_03_coloured_measures.tsv
└── stabat_03_coloured_notes.tsv

1 directory, 18 files

Based on these two principles, default folder and suffix names, ms3 is able to recognize which facet of which piece the files represent and to relate them to each other.
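How the two conventions could be used to recognize a facet is sketched below. This is not ms3's implementation: the function `identify` and both lookup tables are assumptions for illustration, filled only with the folder and suffix names that appear in the listings above.

```python
from pathlib import PurePosixPath

# Assumed for this sketch: names as they appear in the two listings above.
FOLDER_TO_FACET = {"measures": "measures", "notes": "notes", "harmonies": "expanded"}
SUFFIXES = ("measures", "notes", "expanded")

def identify(path):
    """Return (fname, facet); facet is None if neither convention applies."""
    p = PurePosixPath(path)
    stem = p.stem
    # Convention 1: <fname>_<facet>.tsv
    for facet in SUFFIXES:
        if stem.endswith("_" + facet):
            return stem[: -len(facet) - 1], facet
    # Convention 2: <facet folder>/<fname>.tsv
    return stem, FOLDER_TO_FACET.get(p.parent.name)

print(identify("out/K281-3_expanded.tsv"))  # ('K281-3', 'expanded')
print(identify("harmonies/K281-3.tsv"))     # ('K281-3', 'expanded')
```

Both calls resolve to the same piece and facet, which is what allows files from either layout to be related to each other.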

Keys and IDs#

Note

ms3 version 1.0.0 and its successors largely replace the mechanisms related to the parameter key. Newer versions, instead, use IDs such as corpus_name and (corpus_name, piece_name). If you come across a method where the first parameter is called key, you are likely dealing with an older version, or with an ms3.Score object.

IDs are tuples that are used to identify corpora, pieces and files:

  • 'corpus_name' identifies a corpus, a collection of pieces.

    • ms3.Parse[corpus_name] -> Corpus

  • 'fname' identifies a piece by its file name without any suffixes.

    • ms3.Corpus[fname] -> Piece

    • ms3.Parse[(corpus_name, fname)] -> Piece

  • Integers identify individual files and are unique within a Corpus.

    • ms3.Piece[i] -> File

    • ms3.Corpus[(fname, i)] -> File

    • ms3.Parse[(corpus_name, fname, i)] -> File
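The shape of these IDs can be pictured with nested dictionaries. The sketch below is a toy stand-in and does not use ms3 objects; the function `resolve`, the corpus name and the integer values are all hypothetical.

```python
# Toy stand-in for the Parse -> Corpus -> Piece -> File hierarchy (not ms3 objects).
parse = {
    "corpus_before": {                          # corpus_name -> corpus
        "BWV_0815": {0: "MS3/BWV_0815.mscx"},   # fname -> piece, integer -> file
        "K281-3": {1: "MS3/K281-3.mscx"},
    }
}

def resolve(container, key):
    """Follow a single ID or a tuple of IDs through the nested mappings."""
    keys = key if isinstance(key, tuple) else (key,)
    for k in keys:
        container = container[k]
    return container

corpus = resolve(parse, "corpus_before")
print(resolve(corpus, ("BWV_0815", 0)))                  # MS3/BWV_0815.mscx
print(resolve(parse, ("corpus_before", "K281-3", 1)))    # MS3/K281-3.mscx
```

The longer the tuple, the deeper the lookup: one element selects a corpus, two a piece, three a file.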

The importance of the fname ID#

The piece IDs (fname) relate to the file names in this way: fname[suffix].ext. In order to match files that belong together without resorting to complicated string matching, ms3 relies on a list of fnames that it expects to find in a column called fname in a file called metadata.tsv (see below). Generally, this should be the first column in these files, used as index.

Hint

If you find yourself stuck with ms3 producing no output, it is likely because no metadata.tsv is present. In this case, use the option -a to parse everything regardless, and -D to create a metadata.tsv.

The important role of metadata#

As mentioned above, ms3 relies on the fname as ID of a piece and uses it to identify the various files belonging to it, although they may come with additional suffixes (e.g. _reviewed) and be scattered all over the corpus. Importantly, it uses and expects a file called metadata.tsv that lists all piece IDs of the corpus in a column called fname. Scores and other files whose names do not begin with any of the strings in that column are excluded. Files that do begin with one of the strings, on the other hand, are recognized as belonging to this piece and, where applicable, as having a suffix.
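The prefix matching described here can be sketched as follows. The function `match_fname` is hypothetical, and preferring the longest matching fname is an assumption of this sketch rather than documented ms3 behaviour.

```python
def match_fname(file_stem, fnames):
    """Return (fname, suffix) for the longest listed fname that file_stem
    begins with, or None if the file does not belong to any listed piece."""
    candidates = [fname for fname in fnames if file_stem.startswith(fname)]
    if not candidates:
        return None
    fname = max(candidates, key=len)
    return fname, file_stem[len(fname):]

# Hypothetical content of the fname column of a metadata.tsv
fnames = ["BWV_0815", "K281-3"]

print(match_fname("K281-3_reviewed", fnames))     # ('K281-3', '_reviewed')
print(match_fname("BWV_0815", fnames))            # ('BWV_0815', '')
print(match_fname("stabat_03_coloured", fnames))  # None
```

The last call shows how a score is excluded simply by not being listed in the table.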

Therefore, creating a metadata.tsv is an important first step before using ms3 to its full potential. This is done by navigating to the corpus directory and calling ms3 extract -a -D, where -D stands for “metadata” and -a for “all”, i.e. the directive to process all detected scores, including those not listed in a metadata.tsv file.

Relying on a particular control file in that manner makes it easy to systematically exclude particular scores from processing (by dropping them from the table) or to mark alternative versions of a score by adding a suffix and not listing them individually. ms3 will recognize them as alternatives and, based on the current View, include them or not. As an additional feature, you may pre-configure multiple views of the corpus by storing their selection of piece IDs in additional metadata[_suffix].tsv files. A file named metadata_suffix.tsv, for example, would lead ms3 to make available an additional view called suffix. See the chapter on views below to learn more.

Views#

This chapter still needs to be written. In short:

You can access the view of Piece, Corpus, and Parse objects, using the accessor .view. The two main methods of a View object are .include(category, *strings_to_include) and .exclude(category, *strings_to_exclude). Every view has a name which you can use as an accessor to change the relevant object’s view. For example, new objects come with the views “default” and “all”, so if you have a Corpus object stored under the variable c, typing c.all will activate the view that shows everything.

Label types#

ms3 recognizes and disambiguates different types of labels, depending on how they are encoded in MuseScore, see harmony_layer.

Independent of the type, ms3 will also try to infer whether a label conforms to the DCML syntax and/or other regular expressions registered via new_type(). For each label, the column regex_match contains the name of the first regEx that matched. This information will appear with a subtype, e.g. 0 (dcml).

See also infer_label_types().

Measure counts (MC) vs. measure numbers (MN)#

Measure counts are strictly increasing numbers for all <measure> nodes in the score, regardless of their length. This information is crucial for correctly addressing positions in a MuseScore file and is shown in the software’s status bar. The first measure is always counted as 1 (following MuseScore’s convention), even if it is an anacrusis.

Measure numbers are the traditional way by which humans refer to positions in a score. They follow a couple of conventions which can be summarised as counting complete bars. Quite often, a complete bar (MN) can be made up of two <measure> nodes (MC). In the context of this library, score addressability needs to be maintained for humans and computers, therefore a mapping MC -> MN is preserved in the score information DataFrames.
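A minimal sketch of such a mapping, assuming that every <measure> node carries a flag saying whether it is excluded from the bar count (the measures table has a dont_count column, which is assumed here to play that role). The function `mc_to_mn` is hypothetical and ignores other numbering adjustments.

```python
def mc_to_mn(dont_count_flags):
    """Map each MC (counted from 1) to an MN.

    dont_count_flags[i] is True if the (i+1)-th <measure> node is excluded
    from the bar count, e.g. an anacrusis or the second part of a split bar.
    """
    mapping = {}
    mn = 0
    for mc, dont_count in enumerate(dont_count_flags, start=1):
        if not dont_count:
            mn += 1
        mapping[mc] = mn
    return mapping

# Hypothetical score: MC 1 is an anacrusis, MC 10 completes the bar begun in MC 9.
flags = [True] + [False] * 8 + [True, False]
mapping = mc_to_mn(flags)
print(mapping[1], mapping[9], mapping[10], mapping[11])  # 0 8 8 9
```

Under these assumptions the anacrusis receives MN 0, and both MC 9 and MC 10 map to MN 8, i.e. one complete bar made up of two <measure> nodes.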

Onset positions#

Onsets express positions of events in a score as their distance from the beginning of the corresponding MC or MN. The distances are expressed as fractions of a whole note. In other words, beat 1 has onset 0, an event on beat 2 of a 4/4 meter has onset 1/4 and so on.

Since there are two ways of referencing measures (MC and MN), there are also two ways of expressing onsets:

  • mc_onset expresses the distance from the corresponding MC

  • mn_onset expresses the distance from the corresponding MN

In most cases, the two values will be identical, but take as an example the case where a 4/4 measure with MN 8 is divided into MC 9 of length 3/4 and MC 10 of length 1/4 because of a repeat sign or a double bar line. Since MC 9 corresponds to the first part of MN 8, the two onset values are identical. But for the anacrusis on beat 4, the values differ: mc_onset is 0 but mn_onset is 3/4 because this is the distance from MN 8.
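The example can be worked through with Python's fractions module. The sketch assumes that for every MC we know its distance from the beginning of its MN (the measures table has an mc_offset column, assumed here to hold that value); the dictionary and the function `mn_onset` are hypothetical.

```python
from fractions import Fraction

# Hypothetical excerpt: MC -> (MN, distance of the MC from the beginning of that MN)
measures = {
    9: (8, Fraction(0)),       # first part of MN 8, length 3/4
    10: (8, Fraction(3, 4)),   # second part of MN 8, length 1/4
}

def mn_onset(mc, mc_onset):
    """Convert an onset relative to an MC into (MN, onset relative to that MN)."""
    mn, mc_offset = measures[mc]
    return mn, mc_onset + mc_offset

# An event on beat 2 lies in MC 9: both onsets are 1/4.
print(mn_onset(9, Fraction(1, 4)))
# The anacrusis on beat 4 lies in MC 10: mc_onset 0 becomes mn_onset 3/4.
print(mn_onset(10, Fraction(0)))
```

Fractions are used instead of floats so that values such as 1/3 of a whole note, as they occur with triplets, stay exact.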

Read-only mode#

Read-only mode allows for faster parsing using less memory. Scores parsed in read-only mode cannot be changed because the original XML structure is not kept in memory.

Stacks-of-fifths intervals#

In order to express note names (tonal pitch classes, tpc), and scale degrees, ms3 uses stacks of fifths (the only way to express these as a single integer). For note names, 0 corresponds to C, for scale degrees to the local tonic.

fifths  note name  interval  scale degree
-6      Gb         d5        b5
-5      Db         m2        b2
-4      Ab         m6        b6 (6 in minor)
-3      Eb         m3        b3 (3 in minor)
-2      Bb         m7        b7 (7 in minor)
-1      F          P4        4
0       C          P1        1
1       G          P5        5
2       D          M2        2
3       A          M6        6 (#6 in minor)
4       E          M3        3 (#3 in minor)
5       B          M7        7 (#7 in minor)
6       F#         A4        #4
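The correspondences in this table follow from simple arithmetic on the line of fifths. The sketch below is an illustration, not ms3's own code: all three function names are hypothetical, and the scale degrees are those relative to a major scale.

```python
NOTE_NAMES = "FCGDAEB"   # natural notes on the line of fifths, F = -1 ... B = 5
DEGREES = "4152637"      # the same positions read as major-scale degrees

def _spell(fifths, symbols):
    """Return (symbol, number of sharps) where negative numbers mean flats."""
    accidentals, position = divmod(fifths + 1, 7)  # shift so that F sits at 0
    return symbols[position], accidentals

def _accidental_string(accidentals):
    return "#" * accidentals if accidentals > 0 else "b" * -accidentals

def name_from_fifths(fifths):
    letter, accidentals = _spell(fifths, NOTE_NAMES)
    return letter + _accidental_string(accidentals)

def degree_from_fifths(fifths):
    degree, accidentals = _spell(fifths, DEGREES)
    return _accidental_string(accidentals) + degree

def pitch_class_from_fifths(fifths):
    """A perfect fifth spans 7 semitones, so the pitch class is 7 * fifths mod 12."""
    return fifths * 7 % 12

print(name_from_fifths(-6), name_from_fifths(0), name_from_fifths(6))  # Gb C F#
print(degree_from_fifths(-4), degree_from_fifths(6))                   # b6 #4
print(pitch_class_from_fifths(10))                                     # 10
```

The last line can be checked against the notes table further down, where a note with tpc 10 is named A#4 and has MIDI pitch 70, and 70 modulo 12 is indeed 10.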

Voltas#

“Prima/Seconda volta” is the Italian designation for “First/Second time”. Therefore, in the context of ms3, we refer to ‘a volta’ as one of several endings. By convention, all endings should have the same measure numbers (MN), which are often differentiated by lowercase letters, e.g. 8a for the first ending and 8b for the second ending. In MuseScore, correct bar numbers can be achieved by excluding 8b from the count or, if the endings have more than one bar, by subtracting the corresponding number from the second ending’s count. For example, in order to achieve the correct MNs [7a 8a][7b 8b], you would add -2 to 7b’s count which otherwise would come out as 9.
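The numbering example can be reproduced with a short sketch. It assumes a score without anacrusis or excluded measures, and that a value added to a measure's count also shifts all following measures; the function `number_measures` is hypothetical.

```python
def number_measures(n_measures, offsets):
    """Return {mc: mn}; offsets maps an MC to the value added to the count
    from that MC onwards."""
    mapping = {}
    shift = 0
    for mc in range(1, n_measures + 1):
        shift += offsets.get(mc, 0)
        mapping[mc] = mc + shift
    return mapping

# MC 7-8 form the first ending, MC 9-10 the second; -2 is added to MC 9 (7b).
mns = number_measures(11, {9: -2})
print([mns[mc] for mc in (7, 8, 9, 10, 11)])  # [7, 8, 7, 8, 9]
```

Without the offset, MC 9 would be numbered 9. With it, both endings share the numbers 7 and 8, and the measure after the voltas continues with 9.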

ms3 checks for incorrect MNs and warns you if the score needs correction. It will also ask you to make all voltas the same length. If this is not possible for editorial reasons (although often the length of the second volta is arbitrary), ignore the warning and check in the measures table if the MN are correct for your purposes.

Facets#

This section gives an overview of the various tables that ms3 exposes after parsing a MuseScore file. Their names, e.g. measures, correspond to the properties of Score and the methods of Parse with which they can be retrieved. They come as pandas.DataFrame objects. All score information, except the metadata, is contained in the following tables:

  • measures

  • notes

  • rests

  • notes_and_rests

  • chords: Not to be confused with labels or chord annotations, a chord is a notational unit in which all included notes are part of the same notational layer and have the same onset and duration. Every chord has a chord_id and every note is part of a chord. These tables are used to convey score information that is not attached to a particular note, such as lyrics, staff text, dynamics and other markup.

  • labels

  • expanded

  • cadences

  • events

For each of the available tables you will see an example and you can click on the columns to learn about their meanings.

Measures#

DataFrame representing the measures in the MuseScore file (which can be incomplete measures, see Measure counts (MC) vs. measure numbers (MN)) together with their respective features. Required for unfolding repeats.

>>> s.mscx.measures()            # from a Score object
>>> P.measures()                 # from a Piece object
>>> c.measures()                 # from a Corpus object
>>> p.get_facet('measures')      # from a Parse object
mc mn quarterbeats duration_qb keysig timesig act_dur mc_offset numbering_offset dont_count barline breaks repeats next quarterbeats_all_endings volta markers jump_bwd jump_fwd play_until
fname measures_i
05_symph_fant 0 1 1 0 4.0 0 4/4 1 0 <NA> <NA> NaN NaN firstMeasure (2,) NaN <NA> NaN NaN NaN NaN
1 2 2 4 4.0 0 4/4 1 0 <NA> <NA> NaN NaN <NA> (3,) NaN <NA> NaN NaN NaN NaN
2 3 3 8 4.0 0 4/4 1 0 <NA> <NA> NaN line <NA> (4,) NaN <NA> NaN NaN NaN NaN
3 4 4 12 4.0 0 4/4 1 0 <NA> <NA> NaN NaN <NA> (5,) NaN <NA> NaN NaN NaN NaN
4 5 5 16 4.0 0 4/4 1 0 <NA> <NA> NaN NaN <NA> (6,) NaN <NA> NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
stabat_03_coloured 21 22 22 84 4.0 -2 4/4 1 0 <NA> <NA> NaN line <NA> (23,) NaN <NA> NaN NaN NaN NaN
22 23 23 88 4.0 -2 4/4 1 0 <NA> <NA> NaN NaN <NA> (24,) NaN <NA> NaN NaN NaN NaN
23 24 24 92 4.0 -2 4/4 1 0 <NA> <NA> NaN NaN <NA> (25,) NaN <NA> NaN NaN NaN NaN
24 25 25 96 4.0 -2 4/4 1 0 <NA> <NA> NaN NaN <NA> (26,) NaN <NA> NaN NaN NaN NaN
25 26 26 100 4.0 -2 4/4 1 0 <NA> <NA> NaN NaN lastMeasure (-1,) NaN <NA> NaN NaN NaN NaN

615 rows × 20 columns

Notes#

DataFrame representing the notes in the MuseScore file.

>>> s.mscx.notes()            # from a Score object
>>> P.notes()                 # from a Piece object
>>> c.notes()                 # from a Corpus object
>>> p.get_facet('notes')      # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset timesig staff voice duration ... tremolo nominal_duration scalar tied tpc midi name octave chord_id volta
fname notes_i
05_symph_fant 0 1 1 0 1.0 0 0 4/4 28 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 10 70 A#4 4 30 <NA>
1 1 1 0 1.0 0 0 4/4 27 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 7 73 C#5 5 26 <NA>
2 1 1 0 1.0 0 0 4/4 26 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 4 76 E5 5 22 <NA>
3 1 1 0 1.0 0 0 4/4 25 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 1 79 G5 5 18 <NA>
4 1 1 0 1.0 0 0 4/4 24 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 10 82 A#5 5 14 <NA>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
stabat_03_coloured 591 26 26 100 1.0 0 0 4/4 4 1 1/4 ... NaN 1/4 1 <NA> 1 55 G3 3 457 <NA>
592 26 26 100 0.5 0 0 4/4 3 1 1/8 ... NaN 1/8 1 <NA> -2 58 Bb3 3 454 <NA>
593 26 26 100 0.5 0 0 4/4 3 1 1/8 ... NaN 1/8 1 <NA> 1 67 G4 4 454 <NA>
594 26 26 201/2 0.5 1/8 1/8 4/4 3 1 1/8 ... NaN 1/8 1 <NA> 2 62 D4 4 455 <NA>
595 26 26 101 1.0 1/4 1/4 4/4 3 1 1/4 ... NaN 1/4 1 <NA> 1 55 G3 3 456 <NA>

12923 rows × 21 columns

Rests#

DataFrame representing the rests in the MuseScore file.

>>> s.mscx.rests()            # from a Score object
>>> P.rests()                 # from a Piece object
>>> c.rests()                 # from a Corpus object
>>> p.get_facet('rests')      # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset timesig staff voice duration nominal_duration scalar volta
fname rests_i
05_symph_fant 0 1 1 0 4.0 0 0 4/4 1 1 1 1 1 <NA>
1 1 1 0 4.0 0 0 4/4 2 1 1 1 1 <NA>
2 1 1 0 4.0 0 0 4/4 3 1 1 1 1 <NA>
3 1 1 0 4.0 0 0 4/4 4 1 1 1 1 <NA>
4 1 1 0 4.0 0 0 4/4 5 1 1 1 1 <NA>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
stabat_03_coloured 66 26 26 100 4.0 0 0 4/4 1 1 1 1 1 <NA>
67 26 26 100 4.0 0 0 4/4 2 1 1 1 1 <NA>
68 26 26 102 2.0 1/2 1/2 4/4 3 1 1/2 1/2 1 <NA>
69 26 26 101 1.0 1/4 1/4 4/4 4 1 1/4 1/4 1 <NA>
70 26 26 102 2.0 1/2 1/2 4/4 4 1 1/2 1/2 1 <NA>

2134 rows × 13 columns

Notes and Rests#

DataFrame combining Notes and Rests.

>>> s.mscx.notes_and_rests()          # from a Score object
>>> P.notes_and_rests()               # from a Piece object
>>> c.notes_and_rests()               # from a Corpus object
>>> p.get_facet('notes_and_rests')    # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset timesig staff voice duration ... tremolo nominal_duration scalar tied tpc midi name octave chord_id volta
fname notes_and_rests_i
05_symph_fant 0 1 1 0 1.0 0 0 4/4 28 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 10 70 A#4 4 30 <NA>
1 1 1 0 1.0 0 0 4/4 27 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 7 73 C#5 5 26 <NA>
2 1 1 0 1.0 0 0 4/4 26 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 4 76 E5 5 22 <NA>
3 1 1 0 1.0 0 0 4/4 25 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 1 79 G5 5 18 <NA>
4 1 1 0 1.0 0 0 4/4 24 1 1/4 ... 1/4_r32_0 1/4 1 <NA> 10 82 A#5 5 14 <NA>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
stabat_03_coloured 662 26 26 201/2 0.5 1/8 1/8 4/4 3 1 1/8 ... NaN 1/8 1 <NA> 2 62 D4 4 455 <NA>
663 26 26 101 1.0 1/4 1/4 4/4 3 1 1/4 ... NaN 1/4 1 <NA> 1 55 G3 3 456 <NA>
664 26 26 101 1.0 1/4 1/4 4/4 4 1 1/4 ... NaN 1/4 1 <NA> <NA> <NA> NaN <NA> <NA> <NA>
665 26 26 102 2.0 1/2 1/2 4/4 3 1 1/2 ... NaN 1/2 1 <NA> <NA> <NA> NaN <NA> <NA> <NA>
666 26 26 102 2.0 1/2 1/2 4/4 4 1 1/2 ... NaN 1/2 1 <NA> <NA> <NA> NaN <NA> <NA> <NA>

15057 rows × 21 columns

Chords#

Note

The word “chords”, here, is used in a very specific way and is misleading. It has been adopted from the MuseScore XML source code but is better understood as “note tuple with unique onset position”. If you are interested in chord labels, please refer to Labels or Expanded.

In a MuseScore file, every note is enclosed by a <Chord> tag. One <Chord> tag can enclose several notes, as long as they occur in the same staff and voice (notational layer). As a consequence, notes belonging to the same <Chord> have the same onset and the same duration.

Why chord lists? Most of the markup (such as articulation, lyrics etc.) in a MuseScore file is attached not to individual notes but instead to <Chord> tags. It might be a matter of interpretation to what notes exactly the symbols pertain, which is why it is left for the interested user to link the chord list with the corresponding note list by joining on the chord_id column of each.
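Linking the two lists amounts to a join on chord_id. The sketch below uses plain dictionaries so that it runs without any dependencies; with the actual DataFrames, a pandas merge on the chord_id column would serve the same purpose. The rows are hypothetical excerpts, and the column articulation is made up for illustration.

```python
# Hypothetical excerpts; only the column name chord_id is taken from this manual.
notes = [
    {"chord_id": 454, "name": "Bb3"},
    {"chord_id": 454, "name": "G4"},
    {"chord_id": 455, "name": "D4"},
]
chords = [
    {"chord_id": 454, "articulation": "staccato"},  # made-up markup column
    {"chord_id": 455, "articulation": None},
]

def join_on_chord_id(notes, chords):
    """Attach the markup of each chord to every note belonging to it."""
    chord_by_id = {chord["chord_id"]: chord for chord in chords}
    return [
        {**chord_by_id[note["chord_id"]], **note}
        for note in notes
        if note["chord_id"] in chord_by_id
    ]

joined = join_on_chord_id(notes, chords)
for row in joined:
    print(row["name"], row["articulation"])
```

Both notes of chord 454 inherit its markup, which reflects the interpretive choice mentioned above: the join assigns a symbol to all notes of a chord, whether or not it was meant for each of them.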

Standard columns#

The output of the analogous commands depends on what markup is available in the score (see below). The columns that are always present in a chord list are exactly the same as (and correspond to) those of a note list except for tied, tpc, and midi.

Such a reduced table, or one with precisely selected features to extract, can be retrieved using Score.mscx.parsed.get_chords(mode='strict') (see bs4_parser._MSCX_bs4.get_chords). However, most of the time users will be interested in automatically retrieving all markup present in the score (as far as ms3 goes), see below.

Dynamic columns#

Leaving the standard columns aside, the normal interface for accessing chord lists calls Score.mscx.parsed.get_chords(mode='auto'), meaning that only columns that have at least one non-empty value are included. The following table shows the first two non-empty values for each column when parsing all scores included in the ms3 repository for demonstration purposes:

>>> s.mscx.chords()          # from a Score object
>>> P.chords()               # from a Piece object
>>> c.chords()               # from a Corpus object
>>> p.get_facet('chords')    # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset event timesig staff voice ... qpm slur decrescendo_hairpin crescendo_hairpin system_text pedal Pedal_<sym>keyboardPedalPed</sym> volta lyrics_1 diminuendo_line
fname chords_i
05_symph_fant 0 1 1 0 0.0 0 0 Tempo 4/4 1 1 ... 63 NaN NaN NaN NaN NaN NaN <NA> NaN NaN
1 1 1 3 0.0 3/4 3/4 Dynamic 4/4 15 1 ... <NA> NaN NaN NaN NaN NaN NaN <NA> NaN NaN
2 1 1 3 1.0 3/4 3/4 Chord 4/4 15 1 ... <NA> NaN NaN NaN NaN NaN NaN <NA> NaN NaN
3 1 1 0 0.0 0 0 StaffText 4/4 16 1 ... <NA> NaN NaN NaN NaN NaN NaN <NA> NaN NaN
4 1 1 3 0.0 3/4 3/4 Dynamic 4/4 16 1 ... <NA> NaN NaN NaN NaN NaN NaN <NA> NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
stabat_03_coloured 578 26 26 101 1.0 1/4 1/4 Chord 4/4 3 1 ... <NA> NaN <NA> <NA> NaN NaN NaN <NA> NaN NaN
579 26 26 100 1.0 0 0 Chord 4/4 4 1 ... <NA> NaN 19 <NA> NaN NaN NaN <NA> NaN NaN
580 26 26 102 0.0 1/2 1/2 Dynamic 4/4 4 1 ... <NA> NaN <NA> <NA> NaN NaN NaN <NA> NaN NaN
581 26 26 100 1.0 0 0 Chord 4/4 4 2 ... <NA> NaN <NA> <NA> NaN NaN NaN <NA> NaN NaN
582 26 26 101 0.0 1/4 1/4 Spanner 4/4 4 2 ... <NA> NaN <NA> <NA> NaN NaN NaN <NA> NaN NaN

12219 rows × 30 columns

Labels#

DataFrame representing the annotation labels contained in the score. The output can be controlled by changing the labels_cfg configuration.

>>> s.mscx.labels()          # from a Score object
>>> P.labels()               # from a Piece object
>>> c.labels()               # from a Corpus object
>>> p.get_facet('labels')    # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset timesig staff voice harmony_layer label regex_match color color_a color_b color_g color_r
fname labels_i
76CASM34A33UM 0 1 1 9/2 1.5 9/8 9/8 12/8 1 1 3 Fm/Ab NaN NaN NaN NaN NaN NaN
1 2 2 6 6.0 0 0 12/8 1 1 3 C/G NaN NaN NaN NaN NaN NaN
2 3 3 12 1.5 0 0 12/8 1 1 3 C9 NaN NaN NaN NaN NaN NaN
3 3 3 27/2 1.5 3/8 3/8 12/8 1 1 1 -/Bb NaN NaN NaN NaN NaN NaN
4 3 3 15 1.5 3/4 3/4 12/8 1 1 3 Am7 NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
stabat_03_coloured 80 25 25 98 0.5 1/2 1/2 4/4 3 1 0 i6 dcml NaN NaN NaN NaN NaN
81 25 25 197/2 0.5 5/8 5/8 4/4 3 1 0 iv6 dcml NaN NaN NaN NaN NaN
82 25 25 99 0.5 3/4 3/4 4/4 3 1 0 iio6 dcml NaN NaN NaN NaN NaN
83 25 25 199/2 0.5 7/8 7/8 4/4 3 1 0 V dcml NaN NaN NaN NaN NaN
84 26 26 100 4.0 0 0 4/4 3 1 0 i} dcml NaN NaN NaN NaN NaN

1031 rows × 17 columns

Expanded#

If the score contains DCML harmony labels, this DataFrame represents them after splitting them into the encoded features and translating them into scale degrees.

>>> s.mscx.expanded()          # from a Score object
>>> P.expanded()               # from a Piece object
>>> c.expanded()               # from a Corpus object
>>> p.get_facet('expanded')    # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset timesig staff voice label ... localkey_is_minor chord_tones added_tones root bass_note color color_a color_b color_g color_r
fname expanded_i
Did03M-Son_regina-1762-Sarti 0 1 0 0 0.5 0 7/8 4/4 5 1 d.V{ ... True (1, 5, 2) () 1 1 NaN NaN NaN NaN NaN
1 2 1 1/2 0.5 0 0 4/4 5 1 i ... True (0, -3, 1) () 0 0 NaN NaN NaN NaN NaN
2 2 1 1 1.5 1/8 1/8 4/4 5 1 iv ... True (-1, -4, 0) () -1 -1 NaN NaN NaN NaN NaN
3 2 1 5/2 1.0 1/2 1/2 4/4 5 1 V2 ... True (-1, 1, 5, 2) () 1 -1 NaN NaN NaN NaN NaN
4 2 1 7/2 0.5 3/4 3/4 4/4 5 1 i6 ... True (-3, 1, 0) () 0 -3 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
stabat_03_coloured 80 25 25 98 0.5 1/2 1/2 4/4 3 1 i6 ... True (-3, 1, 0) () 0 -3 NaN NaN NaN NaN NaN
81 25 25 197/2 0.5 5/8 5/8 4/4 3 1 iv6 ... True (-4, 0, -1) () -1 -4 NaN NaN NaN NaN NaN
82 25 25 99 0.5 3/4 3/4 4/4 3 1 iio6 ... True (-1, -4, 2) () 2 -1 NaN NaN NaN NaN NaN
83 25 25 199/2 0.5 7/8 7/8 4/4 3 1 V ... True (1, 5, 2) () 1 1 NaN NaN NaN NaN NaN
84 26 26 100 4.0 0 0 4/4 3 1 i} ... True (0, -3, 1) () 0 0 NaN NaN NaN NaN NaN

840 rows × 35 columns

Cadences#

If the DCML harmony labels include cadence labels, this table returns only those. It is simply a filter on expanded: it has the same columns and contains only the rows that include a cadence label. Just for convenience…

>>> s.mscx.cadences()          # from a Score object
>>> P.cadences()               # from a Piece object
>>> c.cadences()               # from a Corpus object
>>> p.get_facet('cadences')    # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset timesig staff voice label ... relativeroot cadence phraseend chord_type globalkey_is_minor localkey_is_minor chord_tones added_tones root bass_note
fname cadences_i
K281-3 10 5 4 15 1.0 1/4 1/4 2/2 2 1 V|HC ... NaN HC NaN M False False (1, 5, 2) () 1 1
21 9 8 30 2.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
29 11 10 38 1.0 0 0 2/2 2 1 V|HC} ... NaN HC } M False False (1, 5, 2) () 1 1
37 13 12 46 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
44 15 14 54 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
51 17 16 62 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
75 28 27 107 3.0 1/4 1/4 2/2 2 1 V|HC ... NaN HC NaN M False False (1, 5, 2) () 1 1
97 40 39 154 1.0 0 0 2/2 2 1 I[I|PAC}{ ... NaN PAC }{ M False False (0, 4, 1) () 0 0
116 48 47 187 1.0 1/4 1/4 2/2 2 1 V|HC ... NaN HC NaN M False False (1, 5, 2) () 1 1
127 52 51 202 4.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
141 60 59 236 2.0 1/2 1/2 2/2 2 1 V|HC ... NaN HC NaN M False True (1, 5, 2) () 1 1
149 64 63 252 2.0 1/2 1/2 2/2 2 1 V|HC ... NaN HC NaN M False True (1, 5, 2) () 1 1
157 68 67 268 2.0 1/2 1/2 2/2 2 1 i|PAC ... NaN PAC NaN m False True (0, -3, 1) () 0 0
161 70 69 276 2.0 1/2 1/2 2/2 2 1 V|HC ... NaN HC NaN M False True (1, 5, 2) () 1 1
173 76 75 299 1.0 1/4 1/4 2/2 2 1 V|HC ... NaN HC NaN M False False (1, 5, 2) () 1 1
184 80 79 314 2.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
192 82 81 322 1.0 0 0 2/2 2 1 V|HC} ... NaN HC } M False False (1, 5, 2) () 1 1
200 84 83 330 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
207 86 85 338 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
214 88 87 346 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
241 102 101 402 4.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
257 110 109 435 3.0 1/4 1/4 2/2 2 1 V|HC ... NaN HC NaN M False False (1, 5, 2) () 1 1
275 119 118 470 6.0 0 0 2/2 2 1 V|HC} ... NaN HC } M False False (1, 5, 2) () 1 1
286 124 123 490 4.0 0 0 2/2 2 1 V]|HC} ... NaN HC } M False False (1, 5, 2) () 1 1
309 137 136 542 1.0 0 0 2/2 2 1 I[I|PAC}{ ... NaN PAC }{ M False False (0, 4, 1) () 0 0
332 147 146 583 1.0 1/4 1/4 2/2 2 1 V|HC ... NaN HC NaN M False False (1, 5, 2) () 1 1
343 151 150 598 2.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
351 153 152 606 1.0 0 0 2/2 2 1 V|HC} ... NaN HC } M False False (1, 5, 2) () 1 1
359 155 154 614 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
366 157 156 622 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0
373 159 158 630 1.0 0 0 2/2 2 1 I|PAC} ... NaN PAC } M False False (0, 4, 1) () 0 0

31 rows × 30 columns

Form labels#

>>> s.mscx.form_labels()          # from a Score object
>>> P.form_labels()               # from a Piece object
>>> c.form_labels()               # from a Corpus object
>>> p.get_facet('form_labels')    # from a Parse object
mc mn quarterbeats duration_qb mc_onset mn_onset staff voice timesig form_label 0 1 2 3 4 5 6
fname form_labels_i
Did03M-Son_regina-1762-Sarti 0 1 0 0 8.0 0 7/8 1 1 4/4 0: aria|ternary form.da capo, 1: A, 2: rit|pd,... aria|ternary form.da capo A ritornello|period antecedent|sentence presentation basic idea NaN
1 3 2 8 4.0 7/8 7/8 2 1 4/4 4: cont NaN NaN NaN NaN continuation NaN NaN
2 4 3 12 4.0 7/8 7/8 2 1 4/4 4: cont! NaN NaN NaN NaN continuation! NaN NaN
3 5 4 16 7.0 7/8 7/8 2 1 4/4 5: cad NaN NaN NaN NaN NaN cadential idea NaN
4 7 6 23 2.5 5/8 5/8 2 1 4/4 3: cons|sent, 4: pres, 5: si, 6: mod NaN NaN NaN consequent|sentence presentation secondary idea model
5 8 7 51/2 2.0 1/4 1/4 2 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
6 8 7 55/2 2.0 3/4 3/4 2 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
7 9 8 59/2 2.0 1/4 1/4 2 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
8 9 8 63/2 2.0 3/4 3/4 2 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
9 10 9 67/2 6.5 1/4 1/4 2 1 4/4 4: cont > cad NaN NaN NaN NaN continuation > cadential idea NaN NaN
10 11 10 40 8.0 7/8 7/8 1 1 4/4 2i: solo 1|sent - ii: Vocal section|pd, 3i: pr... NaN NaN i: solo 1|sentence - ii: Vocal section|period i: presentation - ii: antecedent|sentence i: basic idea - ii: presentation i: } - ii: basic idea NaN
11 13 12 48 4.0 7/8 7/8 1 1 4/4 3i: cont - ii: /, 4i: mod - ii: cont, 5i: } - ... NaN NaN NaN i: continuation - ii: / i: model - ii: continuation i: } - ii: model NaN
12 14 13 52 4.0 7/8 7/8 1 1 4/4 4i: mod!- 4ii: }, 5i: } - 5ii: mod! NaN NaN NaN NaN i: model!- - ii: } i: } - ii: model! NaN
13 15 14 56 9.0 7/8 7/8 1 1 4/4 4i: cad - ii: /, 5i: } - ii: cad NaN NaN NaN NaN i: cadential idea - ii: / i: } - ii: cadential idea NaN
14 18 17 65 2.5 1/8 1/8 2 1 4/4 2i: rit|sent - ii: /, 3i: pres - ii: cons|Rito... NaN NaN i: ritornello|sentence - ii: / i: presentation - ii: consequent|Ritornello ph... i: secondary idea - ii: presentation i: model - ii: secondary idea i: } - ii: model
15 18 17 135/2 2.0 3/4 3/4 2 1 4/4 5i: seq - ii: /, 6i: } - ii: seq NaN NaN NaN NaN NaN i: sequence - ii: / i: } - ii: sequence
16 19 18 139/2 2.0 1/4 1/4 2 1 4/4 5i: seq - ii: /, 6i: } - ii: seq NaN NaN NaN NaN NaN i: sequence - ii: / i: } - ii: sequence
17 19 18 143/2 2.0 3/4 3/4 2 1 4/4 5i: seq - ii: /, 6i: } - ii: seq NaN NaN NaN NaN NaN i: sequence - ii: / i: } - ii: sequence
18 20 19 147/2 2.0 1/4 1/4 2 1 4/4 5i: seq - ii: /, 6i: } - ii: seq NaN NaN NaN NaN NaN i: sequence - ii: / i: } - ii: sequence
19 20 19 151/2 6.5 3/4 3/4 2 1 4/4 3i: cont - ii: /, 4i: cad - ii: cont, 5i: } - ... NaN NaN NaN i: continuation - ii: / i: cadential idea - ii: continuation i: } - ii: cadential idea NaN
20 22 21 82 8.0 3/8 3/8 1 1 4/4 2: Vocal phrase 2|pd, 3: ant|sent, 4: pres, 5a... NaN NaN Vocal phrase 2|period antecedent|sentence presentation basic idea NaN
21 24 23 90 2.0 3/8 3/8 1 1 4/4 3: cont|sent, 4: pres, 5: mod NaN NaN NaN continuation|sentence presentation model NaN
22 24 23 92 2.0 7/8 7/8 1 1 4/4 5: mod! NaN NaN NaN NaN NaN model! NaN
23 25 24 94 6.0 3/8 3/8 1 1 4/4 5: mod > cad NaN NaN NaN NaN NaN model > cadential idea NaN
24 26 25 100 7.0 7/8 7/8 2 1 4/4 3: ant|sent, 4: pres, 5: bi NaN NaN NaN antecedent|sentence presentation basic idea NaN
25 28 27 107 2.5 5/8 5/8 1 1 4/4 3: cont|sent, 4: pres, 5: si, 6: mod NaN NaN NaN continuation|sentence presentation secondary idea model
26 29 28 219/2 2.0 1/4 1/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
27 29 28 223/2 2.0 3/4 3/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
28 30 29 227/2 2.0 1/4 1/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
29 30 29 231/2 2.0 3/4 3/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
30 31 30 235/2 2.0 1/4 1/4 1 1 4/4 5: si!, 6: mod NaN NaN NaN NaN NaN secondary idea! model
31 31 30 239/2 2.0 3/4 3/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
32 32 31 243/2 2.0 1/4 1/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
33 32 31 247/2 2.0 3/4 3/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
34 33 32 251/2 2.0 1/4 1/4 1 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
35 33 32 255/2 17.0 3/4 3/4 1 1 4/4 4: cont > cad NaN NaN NaN NaN continuation > cadential idea NaN NaN
36 38 37 289/2 7.5 0 0 1 1 4/4 2: rit|pd, 3: ant|sent, 4: pres, 5: bi NaN NaN ritornello|period antecedent|sentence presentation basic idea NaN
37 39 38 152 4.0 7/8 7/8 2 1 4/4 4: cont, 5: mod NaN NaN NaN NaN continuation model NaN
38 40 39 156 4.0 7/8 7/8 2 1 4/4 5: mod! NaN NaN NaN NaN NaN model! NaN
39 41 40 160 7.0 7/8 7/8 2 1 4/4 5: cad NaN NaN NaN NaN NaN cadential idea NaN
40 43 42 167 2.5 5/8 5/8 2 1 4/4 3: cons|sent, 4: pres, 5: si, 6: mod NaN NaN NaN consequent|sentence presentation secondary idea model
41 44 43 339/2 2.0 1/4 1/4 2 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
42 44 43 343/2 6.0 3/4 3/4 2 1 4/4 6: seq NaN NaN NaN NaN NaN NaN sequence
43 46 45 355/2 6.5 1/4 1/4 2 1 4/4 4: cont > cad NaN NaN NaN NaN continuation > cadential idea NaN NaN
44 47 46 184 8.0 7/8 7/8 1 1 4/4 1: B|pd, 2: ant|sent, 3: pres.¡, 4: bi NaN B|period antecedent|sentence presentation.¡ basic idea NaN NaN
45 49 48 192 6.0 7/8 7/8 1 1 4/4 4: bi! NaN NaN NaN NaN basic idea! NaN NaN
46 51 50 198 6.0 3/8 3/8 1 1 4/4 3: cont, 4: bi NaN NaN NaN continuation basic idea NaN NaN
47 52 51 204 4.0 7/8 7/8 1 1 4/4 4: seq NaN NaN NaN NaN sequence NaN NaN
48 53 52 208 4.0 7/8 7/8 1 1 4/4 4: cad NaN NaN NaN NaN cadential idea NaN NaN
49 54 53 212 7.0 7/8 7/8 2 1 4/4 4: bi NaN NaN NaN NaN basic idea NaN NaN
50 56 55 219 2.5 5/8 5/8 1 1 4/4 2: cons|sent, 3: pres, 4: si, 5: mod NaN NaN consequent|sentence presentation secondary idea model NaN
51 57 56 443/2 2.0 1/4 1/4 1 1 4/4 5: seq NaN NaN NaN NaN NaN sequence NaN
52 57 56 447/2 2.0 3/4 3/4 1 1 4/4 5: seq NaN NaN NaN NaN NaN sequence NaN
53 58 57 451/2 2.0 1/4 1/4 1 1 4/4 5: seq NaN NaN NaN NaN NaN sequence NaN
54 58 57 455/2 2.0 3/4 3/4 1 1 4/4 5: seq NaN NaN NaN NaN NaN sequence NaN
55 59 58 459/2 2.0 1/4 1/4 1 1 4/4 5: seq NaN NaN NaN NaN NaN sequence NaN
56 59 58 463/2 3.5 3/4 3/4 1 1 4/4 3: cont, 4: cad NaN NaN NaN continuation cadential idea NaN NaN
57 60 59 235 11.0 5/8 5/8 1 1 4/4 4: ci NaN NaN NaN NaN contrasting idea NaN NaN
58 63 62 246 4.5 3/8 3/8 2 1 2/4 1: A', 2: rit|pd, 3: ant|sent, 4: pres, 5: bi NaN A' ritornello|period antecedent|sentence presentation basic idea NaN

Events#

This DataFrame is the original tabular representation of the MuseScore file’s source code, from which all other tables except measures are generated. The nested XML tags are transformed into column names.

The value '∅' is used for empty tags. For example, in the column Chord/Spanner/Slur it would correspond to the tag structure (formatting as in an MSCX file):

<Chord>
  <Spanner type="Slur">
    <Slur>
      </Slur>
    </Spanner>
  </Chord>

The value '/' on the other hand represents a shortcut empty tag. For example, in the column Chord/grace16 it would correspond to the tag structure (formatting as in an MSCX file):

<Chord>
  <grace16/>
  </Chord>
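
To picture how nested tags become column names, here is a minimal sketch. It is not ms3's actual implementation: it relies only on Python's standard xml.etree.ElementTree, and the helper name tag_paths is made up for this illustration. Because ElementTree cannot tell an empty tag pair from a shortcut empty tag, the sketch writes '∅' in both cases, whereas ms3 distinguishes '∅' and '/'.

```python
import xml.etree.ElementTree as ET


def tag_paths(element, prefix=""):
    """Yield (column_name, value) pairs for every leaf tag below `element`."""
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        # Leaf tags without text get a placeholder. ElementTree cannot tell
        # <Slur></Slur> from <grace16/>, so this sketch uses '∅' for both.
        yield path, text or "∅"
    for child in children:
        yield from tag_paths(child, path)


# Hypothetical snippet combining the two tag structures shown above
snippet = "<Chord><Spanner type='Slur'><Slur></Slur></Spanner><grace16/></Chord>"
columns = dict(tag_paths(ET.fromstring(snippet)))
print(columns)
# {'Chord/Spanner/Slur': '∅', 'Chord/grace16': '∅'}
```

Note that tag attributes such as type="Slur" do not appear in the resulting column names, which matches the column Chord/Spanner/Slur mentioned above.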

Parsing#

This chapter explains how to

  • parse a single score to access and manipulate the contained information using a Score object

  • parse a group of scores to access and manipulate the contained information using a Parse object.

Parsing a single score#

Import the library.

To parse a single score, we will use the class Score. We could import the whole library:

>>> import ms3
>>> s = ms3.Score()

or simply import the class:

>>> from ms3 import Score
>>> s = Score()

Locate the MuseScore 3 score you want to parse.

Tip

MSCZ files are ZIP files containing the uncompressed MSCX. In order to trace the score’s version history, it is recommended to always work with MSCX files.
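
Since an MSCZ file is a ZIP archive, the uncompressed score can be pulled out with Python's standard zipfile module. The following is a sketch under the assumption that the archive contains one or more members ending in .mscx; the function name extract_mscx and the file names are hypothetical. The demo builds a tiny stand-in archive in a temporary directory so that it runs without a real score.

```python
import os
import tempfile
import zipfile


def extract_mscx(mscz_path, target_dir):
    """Extract all .mscx members of an MSCZ archive and return their paths."""
    with zipfile.ZipFile(mscz_path) as archive:
        members = [name for name in archive.namelist() if name.endswith(".mscx")]
        return [archive.extract(name, path=target_dir) for name in members]


with tempfile.TemporaryDirectory() as tmp:
    # Build a stand-in archive; a real MSCZ file is created by MuseScore
    mscz = os.path.join(tmp, "stabat.mscz")
    with zipfile.ZipFile(mscz, "w") as archive:
        archive.writestr("stabat.mscx", "<museScore/>")
        archive.writestr("Thumbnails/thumbnail.png", "")
    extracted = extract_mscx(mscz, os.path.join(tmp, "uncompressed"))
    extracted_names = [os.path.basename(path) for path in extracted]
    with open(extracted[0], encoding="utf-8") as handle:
        extracted_content = handle.read()

print(extracted_names)
# ['stabat.mscx']
```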

In the examples, we parse the annotated first page of Giovanni Battista Pergolesi’s influential Stabat Mater. The file is called stabat.mscx and can be downloaded from here (open the link and press Ctrl + S to save the file, or right-click on the link and choose Save link as...).

Create a Score object.

In the example, the MuseScore 3 file is located at ~/ms3/docs/stabat.mscx so we can simply create the object and bind it to the variable s like so:

>>> from ms3 import Score
>>> s = Score('~/ms3/docs/stabat.mscx')

Inspect the object.

To have a look at the created object, we can simply evaluate its variable:

>>> s
MuseScore file
--------------

~/ms3/docs/stabat.mscx

Attached annotations
--------------------

48 labels:
staff  voice  label_type  color_name
3      2      0 (dcml)    default       48

Parsing options#

Score.__init__(musescore_file=None, match_regex=['dcml', 'form_labels'], read_only=False, labels_cfg={}, parser='bs4', ms=None, **logger_cfg)[source]
Parameters
  • musescore_file (str, optional) – Path to the MuseScore file to be parsed.

  • match_regex (list or dict, optional) – Determines which label types are detected automatically. Defaults to [‘dcml’, ‘form_labels’], as shown in the signature above. Pass {'type_name': r"^(regular)(Expression)$"} to call ms3.Score.new_type().

  • read_only (bool, optional) – Defaults to False, meaning that the parsing is slower and uses more memory in order to allow for manipulations of the score, such as adding and deleting labels. Set to True if you’re only extracting information.

  • labels_cfg (dict) – Store a configuration dictionary to determine the output format of the Annotations object representing the currently attached annotations. See MSCX.labels_cfg.

  • logger_cfg (dict, optional) – The following options are available: ‘name’: LOGGER_NAME (by default, the logger name is based on the parsed file(s)); ‘level’: one of {‘W’, ‘D’, ‘I’, ‘E’, ‘C’, ‘WARNING’, ‘DEBUG’, ‘INFO’, ‘ERROR’, ‘CRITICAL’}; ‘file’: PATH_TO_LOGFILE, to store all log messages under the given path.

  • parser ('bs4', optional) – The only XML parser currently implemented is BeautifulSoup 4.

  • ms (str, optional) – If you want to parse musicXML files or MuseScore 2 files by temporarily converting them, pass the path or command of your local MuseScore 3 installation. If you’re using the standard path, you may try ‘auto’, or ‘win’ for Windows, ‘mac’ for MacOS, or ‘mscore’ for Linux.

Parsing multiple scores#

Import the library.

To parse multiple scores, we will use the class ms3.Parse. We could import the whole library:

>>> import ms3
>>> p = ms3.Parse()

or simply import the class:

>>> from ms3 import Parse
>>> p = Parse()

Locate the folder containing MuseScore files.

In this example, we are going to parse all files included in the ms3 repository which has been cloned into the home directory and therefore has the path ~/ms3.

Create a Parse object.

The object is created by calling the class with the directory to scan and is bound to the conventional variable p. ms3 scans the subdirectories for corpora (see Corpus structure) and assigns keys automatically based on folder names (here ‘docs’ and ‘tests’):

>>> from ms3 import Parse
>>> p = Parse('~/ms3')
>>> p
WARNING  ms3.Parse.docs -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
WARNING  ms3.Parse.new_tests -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
[default|all]
All corpora
-----------
View: This view is called 'default'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

               has   active   scores measures    notes   labels expanded   events
          metadata     view detected detected detected detected detected detected
corpus                                                                           
docs            no  default        0        0        0        0        0        0
new_tests       no  default        0        0        0        0        0        0
old_tests      yes  default        7       21       21        8        4        7

4/229 files are excluded from this view.
3/11 fnames are excluded from this view.

4 files have been excluded based on their file name.


There are 3 orphans that could not be attributed to any of the respective corpus's fnames.

Without any further parameters, ms3 detects only file types that it can potentially parse, i.e. MSCX, MSCZ, and TSV. In the following example, we infer the location of our local MuseScore 3 installation (if ‘auto’ fails, indicate the path to your executable). As a result, ms3 also shows formats that MuseScore can convert, such as XML, MIDI, or CAP.

>>> from ms3 import Parse
>>> p = Parse('~/ms3', ms='auto')
>>> p
WARNING  ms3.Parse.docs -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
WARNING  ms3.Parse.new_tests -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
[default|all]
All corpora
-----------
View: This view is called 'default'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

               has   active   scores measures    notes   labels expanded   events
          metadata     view detected detected detected detected detected detected
corpus                                                                           
docs            no  default        0        0        0        0        0        0
new_tests       no  default        0        0        0        0        0        0
old_tests      yes  default        7       21       21        8        4        7

4/229 files are excluded from this view.
3/11 fnames are excluded from this view.

4 files have been excluded based on their file name.


There are 3 orphans that could not be attributed to any of the respective corpus's fnames.

By default, existing TSV files are detected and can be parsed as well, allowing one to access already extracted information without parsing the scores anew. In order to select only particular files, a regular expression can be passed to the parameter file_re. In the following example, only files ending in mscx are collected in the object ($ stands for the end of the filename; without it, files including the string ‘mscx’ anywhere in their names would be selected, too):

Caution

The parameter key will be deprecated from version 0.6.0 onwards. See Keys and IDs.

>>> from ms3 import Parse
>>> p = Parse('~/ms3', file_re='mscx$', key='ms3')
>>> p
WARNING  ms3.Parse.docs -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
WARNING  ms3.Parse.new_tests -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
[ferfried|default|all]
All corpora
-----------
View: This view is called 'ferfried'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml),
	- excludes review files and folders, and
	- includes only files containing 'mscx$'.

               has    active   scores
          metadata      view detected
corpus                               
docs            no  ferfried        0
new_tests       no  ferfried        0
old_tests      yes  ferfried        7

187/229 files are excluded from this view.
3/11 fnames are excluded from this view.

187 files have been excluded based on their file name.


There are 3 orphans that could not be attributed to any of the respective corpus's fnames.

In this example, we assigned the key 'ms3'. Note that the same MSCX files that were distributed over several keys in the previous example are now grouped together. Keys allow operations to be performed on a particular group of selected files. For example, we could add MSCX files from another folder using the method add_dir():

>>> p.add_dir('~/other_folder', file_re='mscx$')
>>> p
WARNING  ms3.Parse.docs -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
WARNING  ms3.Parse.new_tests -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
[edwin|default|all|fons]
All corpora
-----------
This is a mixed view. Call _.info(view_name) to see a homogeneous one.

                          has active   scores
                     metadata   view detected
corpus                                       
docs                       no  edwin        0
new_tests                  no  edwin        0
old_tests                 yes  edwin        7
mozart_piano_sonatas      yes   fons      112

187/229 files are excluded from this view.
3/11 fnames are excluded from this view.

187 files have been excluded based on their file name.


There are 3 orphans that could not be attributed to any of the respective corpus's fnames.

Parse the scores.

In order to simply parse all registered MuseScore files, call the method parse_mscx(). Alternatively, you can pass the argument keys to parse only one (or several) selected group(s), which saves time. The argument level controls how many log messages you see; here, it is set to ‘critical’ or ‘c’ to suppress all warnings:

>>> p.parse_mscx(keys='ms3', level='c')
>>> p
WARNING  ms3.Parse.docs -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
WARNING  ms3.Parse.new_tests -- /home/hentsche/miniconda3/envs/ms3/lib/python3.10/site-packages/ms3/corpus.py (line 865) fnames_in_metadata():
	No metadata.tsv file has been detected for this Corpus object.
[gudula|default|all|throals]
All corpora
-----------
This is a mixed view. Call _.info(view_name) to see a homogeneous one.

                          has   active scores         
                     metadata     view parsed detected
corpus                                                
docs                       no   gudula      0        0
new_tests                  no   gudula      0        0
old_tests                 yes   gudula      7        7
mozart_piano_sonatas      yes  throals    112      112

187/229 files are excluded from this view.
3/11 fnames are excluded from this view.

187 files have been excluded based on their file name.


There are 3 orphans that could not be attributed to any of the respective corpus's fnames.

As we can see, only the files with the key ‘ms3’ were parsed and the table shows an overview of the counts of the included label types in the different notational layers (i.e. staff & voice), grouped by their colours.

Parsing options#

Parse.__init__(directory: Optional[Union[str, Collection[str]]] = None, recursive: bool = True, only_metadata_fnames: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Optional[Union[str, Pattern]] = None, folder_re: Optional[Union[str, Pattern]] = None, exclude_re: Optional[Union[str, Pattern]] = None, file_paths: Optional[Collection[str]] = None, labels_cfg: dict = {}, ms=None, **logger_cfg)[source]

Initialize a Parse object and try to create corpora if directories and/or file paths are specified.

Parameters
  • directory – Path to scan for corpora.

  • recursive – Pass False if you don’t want to scan directory for subcorpora, but force making it a corpus instead.

  • only_metadata_fnames – The default view excludes piece names that are not listed in the corpus’ metadata.tsv file (e.g. when none was found). Pass False to include all pieces regardless. This might be needed when setting recursive to False.

  • include_convertible – The default view excludes scores that would need conversion to MuseScore format prior to parsing. Pass True to include convertible scores in .musicxml, .midi, .cap or any other format that MuseScore 3 can open. For on-the-fly conversion, however, the parameter ms needs to be set.

  • include_tsv – The default view includes TSV files. Pass False to disregard them and parse only scores.

  • exclude_review – The default view excludes files and folders whose name contains ‘review’. Pass False to include these as well.

  • file_re – Pass a regular expression if you want to create a view filtering out all files that do not contain it.

  • folder_re – Pass a regular expression if you want to create a view filtering out all folders that do not contain it.

  • exclude_re – Pass a regular expression if you want to create a view filtering out all files or folders that contain it.

  • file_paths – If directory is specified, the file names of these paths are used to create a filtering view excluding all other files. Otherwise, all paths are expected to be part of the same parent corpus, which will be inferred from the first path by looking for the first parent directory that either contains a ‘metadata.tsv’ file or is a git repository. This parameter is deprecated and file_re should be used instead.

  • labels_cfg – Pass a configuration dict to detect only certain labels or change their output format.

  • ms – If you pass the path to your local MuseScore 3 installation, ms3 will attempt to parse musicXML, MuseScore 2, and other formats by temporarily converting them. If you’re using the standard path, you may try ‘auto’, or ‘win’ for Windows, ‘mac’ for MacOS, or ‘mscore’ for Linux. In case you do not pass the ‘file_re’ and the MuseScore executable is detected, all convertible files are automatically selected, otherwise only those that can be parsed without conversion.

  • **logger_cfg – Keyword arguments for changing the logger configuration. E.g. level='d' to see all debug messages.
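
The filtering parameters file_re and exclude_re can be pictured with plain regular-expression searches. The following sketch only mimics the "contains" semantics described in the parameter list and is not how ms3 builds its views; the function filter_names and the file names are hypothetical.

```python
import re


def filter_names(names, file_re=None, exclude_re=None):
    """Keep names that contain `file_re` and do not contain `exclude_re`."""
    selected = []
    for name in names:
        if file_re is not None and re.search(file_re, name) is None:
            continue  # the name does not contain the required pattern
        if exclude_re is not None and re.search(exclude_re, name) is not None:
            continue  # the name contains the excluded pattern
        selected.append(name)
    return selected


names = ["stabat.mscx", "stabat.mscz", "stabat.tsv",
         "mscx_notes.tsv", "stabat_reviewed.mscx"]

print(filter_names(names, file_re="mscx$"))
# ['stabat.mscx', 'stabat_reviewed.mscx']
print(filter_names(names, file_re="mscx"))
# ['stabat.mscx', 'mscx_notes.tsv', 'stabat_reviewed.mscx']
print(filter_names(names, file_re="mscx$", exclude_re="review"))
# ['stabat.mscx']
```

The second call shows why the trailing $ matters: without it, any name containing the string ‘mscx’ is selected.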

Extracting score information#

One of ms3’s main functionalities is storing the information contained in parsed scores as tabular files (TSV format). More information on the generated files is summarized here.

Using the commandline#

The most convenient way to achieve this is the command ms3 extract. Its capital-letter parameters summarize the available tables:

-M [folder], --measures [folder]
                    Folder where to store TSV files with measure information needed for tasks such as unfolding repetitions.
-N [folder], --notes [folder]
                    Folder where to store TSV files with information on all notes.
-R [folder], --rests [folder]
                    Folder where to store TSV files with information on all rests.
-L [folder], --labels [folder]
                    Folder where to store TSV files with information on all annotation labels.
-X [folder], --expanded [folder]
                    Folder where to store TSV files with expanded DCML labels.
-F [folder], --form_labels [folder]
                    Folder where to store TSV files with all form labels.
-E [folder], --events [folder]
                    Folder where to store TSV files with all events (chords, rests, articulation, etc.) without further processing.
-C [folder], --chords [folder]
                    Folder where to store TSV files with <chord> tags, i.e. groups of notes in the same voice with identical onset and duration. The tables
                    include lyrics, dynamics, articulation, staff- and system texts, tempo marking, spanners, and thoroughbass figures.
-D [suffix], --metadata [suffix]
                    Set -D to update the 'metadata.tsv' files of the respective corpora with the parsed scores. Add a suffix if you want to update
                    'metadata{suffix}.tsv' instead.

The typical way to use this command for a corpus of scores is to keep the MuseScore files in a subfolder (called, for example, MS3) and to use the parameters’ default values, effectively creating additional subfolders for each extracted aspect next to each folder containing MuseScore files. For example, if we take the folder structure of the ms3 repository:

ms3
├── docs
│   ├── cujus.mscx
│   ├── o_quam.mscx
│   ├── quae.mscx
│   └── stabat.mscx
└── tests
    ├── MS3
    │   ├── 05_symph_fant.mscx
    │   ├── 76CASM34A33UM.mscx
    │   ├── BWV_0815.mscx
    │   ├── D973deutscher01.mscx
    │   ├── Did03M-Son_regina-1762-Sarti.mscx
    │   ├── K281-3.mscx
    │   └── stabat_03_coloured.mscx
    └── repeat_dummies
        ├── repeats0.mscx
        ├── repeats1.mscx
        └── repeats2.mscx

Upon calling ms3 extract -N, two new notes folders containing note lists are created:

ms3
├── docs
│   ├── cujus.mscx
│   ├── o_quam.mscx
│   ├── quae.mscx
│   └── stabat.mscx
├── notes
│   ├── cujus.tsv
│   ├── o_quam.tsv
│   ├── quae.tsv
│   └── stabat.tsv
└── tests
    ├── MS3
    │   ├── 05_symph_fant.mscx
    │   ├── 76CASM34A33UM.mscx
    │   ├── BWV_0815.mscx
    │   ├── D973deutscher01.mscx
    │   ├── Did03M-Son_regina-1762-Sarti.mscx
    │   ├── K281-3.mscx
    │   └── stabat_03_coloured.mscx
    ├── notes
    │   ├── 05_symph_fant.tsv
    │   ├── 76CASM34A33UM.tsv
    │   ├── BWV_0815.tsv
    │   ├── D973deutscher01.tsv
    │   ├── Did03M-Son_regina-1762-Sarti.tsv
    │   ├── K281-3.tsv
    │   ├── repeats0.tsv
    │   ├── repeats1.tsv
    │   ├── repeats2.tsv
    │   └── stabat_03_coloured.tsv
    └── repeat_dummies
        ├── repeats0.mscx
        ├── repeats1.mscx
        └── repeats2.mscx

We see this behaviour because the default value is ../notes, which is interpreted as a path relative to each MuseScore file. Alternatively, a relative path can be specified without an initial ./ or ../, e.g. ms3 extract -N notes, to store the note lists in a recreated subdirectory structure:

ms3
├── docs
├── notes
│   ├── docs
│   └── tests
│       ├── MS3
│       └── repeat_dummies
└── tests
    ├── MS3
    └── repeat_dummies

A third option consists in specifying an absolute path which causes all note lists to be stored in the specified folder, e.g. ms3 extract -N ~/notes:

~/notes
├── 05_symph_fant.tsv
├── 76CASM34A33UM.tsv
├── BWV_0815.tsv
├── cujus.tsv
├── D973deutscher01.tsv
├── Did03M-Son_regina-1762-Sarti.tsv
├── K281-3.tsv
├── o_quam.tsv
├── quae.tsv
├── repeats0.tsv
├── repeats1.tsv
├── repeats2.tsv
├── stabat_03_coloured.tsv
└── stabat.tsv

Note that this leads to problems if MuseScore files from different subdirectories have identical filenames. In any case, it is good practice not to use nested folders, which allows for easier file access. For example, a typical DCML corpus will store all MuseScore files in the MS3 folder and include at least the folders created by ms3 extract -N -M -X:

.
├── harmonies
├── measures
├── MS3
└── notes
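
The filename clash mentioned above for absolute target folders can be checked before extracting. This is a small sketch using only the standard library; the function find_name_clashes and the example paths are hypothetical and not part of ms3.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_clashes(score_paths):
    """Return {file stem: [paths]} for stems shared by more than one score."""
    by_stem = defaultdict(list)
    for path in score_paths:
        by_stem[PurePosixPath(path).stem].append(path)
    return {stem: paths for stem, paths in by_stem.items() if len(paths) > 1}


# Hypothetical corpus in which two subdirectories contain a 'stabat.mscx'
scores = ["docs/stabat.mscx", "docs/cujus.mscx", "tests/MS3/stabat.mscx"]
print(find_name_clashes(scores))
# {'stabat': ['docs/stabat.mscx', 'tests/MS3/stabat.mscx']}
```

An empty result means that all TSV files can be stored in a single folder without overwriting each other.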

Extracting score information manually#

What ms3 extract effectively does is create a Parse object and call its methods parse_mscx() and then store_lists(). Beyond what the command offers, the method can store two additional aspects, namely notes_and_rests and cadences (if the score contains cadence labels). For each of the available aspects, {notes, measures, rests, notes_and_rests, events, labels, chords, cadences, expanded}, the method provides two parameters, namely <aspect>_folder (where to store the TSVs) and <aspect>_suffix, i.e. a slug appended to the respective filenames (for example, notes_folder and notes_suffix). If the parameter simulate=True is passed, no files are written, but the file paths that would be created are returned. Since corpora might have quite diverse directory structures, ms3 gives you various ways of specifying folders, which are explained in detail in the following section.

Briefly, the rules for specifying the folders are as follows:

  • absolute folder (e.g. ~/labels): Store all files in this particular folder without creating subfolders.

  • relative folder starting with ./ or ../: relative folders are created “at the end” of the original subdirectory structure, i.e. relative to the MuseScore files.

  • relative folder not starting with ./ or ../ (e.g. rests): relative folders are created at the top level (of the original directory or the specified root_dir) and the original subdirectory structure is replicated in each of them.

To see examples for the three possibilities, see the following section.
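
The three rules can also be sketched as a small path-resolution function. This is an illustration of the rules as stated above and not ms3's own code; the name simulate_tsv_path and its parameters are hypothetical, and the sketch treats any folder starting with / or ~ as absolute without expanding it.

```python
import posixpath


def simulate_tsv_path(score_path, parsed_dir, folder, suffix="", root_dir=None):
    """Return the TSV path that the three folder rules would produce."""
    root = parsed_dir if root_dir is None else root_dir
    # Subdirectory of the score relative to the parsed directory, e.g. 'tests/MS3'
    subdir = posixpath.relpath(posixpath.dirname(score_path), parsed_dir)
    stem = posixpath.splitext(posixpath.basename(score_path))[0]
    if folder.startswith(("/", "~")):
        # Rule 1: absolute folder, no subfolders are created
        target = folder
    elif folder.startswith("."):
        # Rule 2: './' or '../' is resolved relative to the score's location
        target = posixpath.join(root, subdir, folder)
    else:
        # Rule 3: top-level folder that replicates the subdirectory structure
        target = posixpath.join(root, folder, subdir)
    return posixpath.normpath(posixpath.join(target, stem + suffix + ".tsv"))


score = "~/ms3/docs/cujus.mscx"
print(simulate_tsv_path(score, "~/ms3", "./notes"))
# ~/ms3/docs/notes/cujus.tsv
print(simulate_tsv_path(score, "~/ms3", "../measures"))
# ~/ms3/measures/cujus.tsv
print(simulate_tsv_path(score, "~/ms3", "rests"))
# ~/ms3/rests/docs/cujus.tsv
print(simulate_tsv_path(score, "~/ms3", "~/labels", suffix="_exp"))
# ~/labels/cujus_exp.tsv
```

To check real output paths, the simulate=True argument of store_lists() remains the authoritative source.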

Specifying folders#

Consider a two-level folder structure contained in the root directory . which is the one passed to Parse:

.
├── docs
│   ├── cujus.mscx
│   ├── o_quam.mscx
│   ├── quae.mscx
│   └── stabat.mscx
└── tests
    └── MS3
        ├── 05_symph_fant.mscx
        ├── 76CASM34A33UM.mscx
        ├── BWV_0815.mscx
        ├── D973deutscher01.mscx
        ├── Did03M-Son_regina-1762-Sarti.mscx
        └── K281-3.mscx

The first level contains the subdirectories docs (4 files) and tests (6 files in the subdirectory MS3). Now we look at the three different ways to specify folders for storing notes and measures.

Absolute Folders#

When we specify absolute paths, all files are stored in the specified directories. In this example, the measures and notes are stored in the two specified subfolders of the home directory ~, regardless of the original subdirectory structure.

>>> p.store_lists(notes_folder='~/notes', measures_folder='~/measures')
~
├── measures
│   ├── 05_symph_fant.tsv
│   ├── 76CASM34A33UM.tsv
│   ├── BWV_0815.tsv
│   ├── cujus.tsv
│   ├── D973deutscher01.tsv
│   ├── Did03M-Son_regina-1762-Sarti.tsv
│   ├── K281-3.tsv
│   ├── o_quam.tsv
│   ├── quae.tsv
│   └── stabat.tsv
└── notes
    ├── 05_symph_fant.tsv
    ├── 76CASM34A33UM.tsv
    ├── BWV_0815.tsv
    ├── cujus.tsv
    ├── D973deutscher01.tsv
    ├── Did03M-Son_regina-1762-Sarti.tsv
    ├── K281-3.tsv
    ├── o_quam.tsv
    ├── quae.tsv
    └── stabat.tsv

Relative Folders#

In contrast, specifying relative folders recreates the original subdirectory structure. There are two different possibilities for that. The first possibility is to pass plain folder names, meaning that the subdirectory structure (docs and tests) is recreated in each of the folders:

>>> p.store_lists(root_dir='~/tsv', notes_folder='notes', measures_folder='measures')
~/tsv
├── measures
│   ├── docs
│   │   ├── cujus.tsv
│   │   ├── o_quam.tsv
│   │   ├── quae.tsv
│   │   └── stabat.tsv
│   └── tests
│       └── MS3
│           ├── 05_symph_fant.tsv
│           ├── 76CASM34A33UM.tsv
│           ├── BWV_0815.tsv
│           ├── D973deutscher01.tsv
│           ├── Did03M-Son_regina-1762-Sarti.tsv
│           └── K281-3.tsv
└── notes
    ├── docs
    │   ├── cujus.tsv
    │   ├── o_quam.tsv
    │   ├── quae.tsv
    │   └── stabat.tsv
    └── tests
        └── MS3
            ├── 05_symph_fant.tsv
            ├── 76CASM34A33UM.tsv
            ├── BWV_0815.tsv
            ├── D973deutscher01.tsv
            ├── Did03M-Son_regina-1762-Sarti.tsv
            └── K281-3.tsv

Note that in this example, we have specified a root_dir. Leaving this argument out will create the same structure in the directory from which the Parse object was created, i.e. the folder structure would be:

.
├── docs
├── measures
│   ├── docs
│   └── tests
│       └── MS3
├── notes
│   ├── docs
│   └── tests
│       └── MS3
└── tests
    └── MS3

If, instead, you want to create the specified relative folders relative to each MuseScore file’s location, specify them with an initial dot. ./ means “relative to the original path” and ../ one level up from the original path. To exemplify both:

>>> p.store_lists(root_dir='~/tsv', notes_folder='./notes', measures_folder='../measures')
~/tsv
├── docs
│   └── notes
│       ├── cujus.tsv
│       ├── o_quam.tsv
│       ├── quae.tsv
│       └── stabat.tsv
├── measures
│   ├── cujus.tsv
│   ├── o_quam.tsv
│   ├── quae.tsv
│   └── stabat.tsv
└── tests
    ├── measures
    │   ├── 05_symph_fant.tsv
    │   ├── 76CASM34A33UM.tsv
    │   ├── BWV_0815.tsv
    │   ├── D973deutscher01.tsv
    │   ├── Did03M-Son_regina-1762-Sarti.tsv
    │   └── K281-3.tsv
    └── MS3
        └── notes
            ├── 05_symph_fant.tsv
            ├── 76CASM34A33UM.tsv
            ├── BWV_0815.tsv
            ├── D973deutscher01.tsv
            ├── Did03M-Son_regina-1762-Sarti.tsv
            └── K281-3.tsv

The notes folders are created in the directories where the MuseScore files are located, and the measures folders one directory above. Leaving out the root_dir argument would lead to the same folder structure but in the directory from which the Parse object has been created. In a similar manner, the call p.store_lists(notes_folder='.', measures_folder='.') would create the TSV files right next to the MuseScore files. However, this would lead to warnings such as

Warning

The notes at ~/ms3/docs/cujus.tsv have been overwritten with measures.

In such a case, we need to specify a suffix for at least one of the two aspects:

p.store_lists(notes_folder='.', notes_suffix='_notes',
              measures_folder='.', measures_suffix='_measures')

Examples#

Until you are sure that you have picked the right parameters for your desired output, you can use the simulate=True argument, which lets you view the paths without actually creating any files. In this variant, each aspect is stored in its own folder but with identical filenames:

Caution

The parameter key will be deprecated from version 0.6.0 onwards. See Keys and IDs.

>>> p = Parse('~/ms3/docs', key='pergo')
>>> p.parse_mscx()
>>> p.store_lists(  notes_folder='./notes',
                    rests_folder='./rests',
                    notes_and_rests_folder='./notes_and_rests',
                    simulate=True
                    )
['~/ms3/docs/notes/cujus.tsv',
 '~/ms3/docs/rests/cujus.tsv',
 '~/ms3/docs/notes_and_rests/cujus.tsv',
 '~/ms3/docs/notes/o_quam.tsv',
 '~/ms3/docs/rests/o_quam.tsv',
 '~/ms3/docs/notes_and_rests/o_quam.tsv',
 '~/ms3/docs/notes/quae.tsv',
 '~/ms3/docs/rests/quae.tsv',
 '~/ms3/docs/notes_and_rests/quae.tsv',
 '~/ms3/docs/notes/stabat.tsv',
 '~/ms3/docs/rests/stabat.tsv',
 '~/ms3/docs/notes_and_rests/stabat.tsv']

In this variant, the different ways of specifying folders are exemplified. To demonstrate all subtleties we parse the same four files but this time from the perspective of ~/ms3:

>>> p = Parse('~/ms3', folder_re='docs', key='pergo')
>>> p.parse_mscx()
>>> p.store_lists(  notes_folder='./notes',            # relative to ms3/docs
                    measures_folder='../measures',     # one level up from ms3/docs
                    rests_folder='rests',              # relative to the parsed directory
                    labels_folder='~/labels',          # absolute folder
                    expanded_folder='~/labels', expanded_suffix='_exp',
                    simulate = True
                    )
['~/ms3/docs/notes/cujus.tsv',
 '~/ms3/rests/docs/cujus.tsv',
 '~/ms3/measures/cujus.tsv',
 '~/labels/cujus.tsv',
 '~/labels/cujus_exp.tsv',
 '~/ms3/docs/notes/o_quam.tsv',
 '~/ms3/rests/docs/o_quam.tsv',
 '~/ms3/measures/o_quam.tsv',
 '~/labels/o_quam.tsv',
 '~/labels/o_quam_exp.tsv',
 '~/ms3/docs/notes/quae.tsv',
 '~/ms3/rests/docs/quae.tsv',
 '~/ms3/measures/quae.tsv',
 '~/labels/quae.tsv',
 '~/labels/quae_exp.tsv',
 '~/ms3/docs/notes/stabat.tsv',
 '~/ms3/rests/docs/stabat.tsv',
 '~/ms3/measures/stabat.tsv',
 '~/labels/stabat.tsv',
 '~/labels/stabat_exp.tsv']

Column Names#

Glossary of the meaning and data types of the columns. In order to correctly restore the types when loading TSV files, either use an Annotations object or the function load_tsv().
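
To see why the types need restoring, consider that a TSV file stores every value as text. The following sketch converts a few of the columns from this glossary back to their documented types using only the standard library. It is not a replacement for load_tsv(); the function read_typed_tsv, the converter table, and the example row are made up for this illustration.

```python
import csv
import io
from fractions import Fraction

# Column types as listed in this glossary; unlisted columns stay strings
CONVERTERS = {
    "mc": int,
    "mn": int,
    "keysig": int,
    "duration": Fraction,
    "mc_onset": Fraction,
    "mn_onset": Fraction,
    "quarterbeats": Fraction,
    "duration_qb": float,
}


def read_typed_tsv(handle):
    """Read a TSV file object and convert known columns to their types."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            column: (CONVERTERS.get(column, str)(value) if value != "" else None)
            for column, value in record.items()
        })
    return rows


# Hypothetical one-row table standing in for an extracted TSV file
tsv_text = "mc\tmn\tmc_onset\tduration\tduration_qb\n1\t0\t3/4\t1/4\t1.0\n"
rows = read_typed_tsv(io.StringIO(tsv_text))
print(rows[0])
# {'mc': 1, 'mn': 0, 'mc_onset': Fraction(3, 4), 'duration': Fraction(1, 4), 'duration_qb': 1.0}
```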

General Columns#

duration#

fractions.Fraction

Duration of an event expressed in fractions of a whole note. Note that in note lists, the duration does not take into account if notes are tied together; in other words, the column expresses no durations that surpass the final bar line.

duration_qb#

float

Duration expressed in quarter notes. If the column duration is present, it corresponds to that column times four. Otherwise (e.g. for labels), it is computed from an IntervalIndex created from the quarterbeats column.
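
The relation between the two duration columns is plain arithmetic, as this minimal sketch shows. The helper name to_duration_qb is hypothetical.

```python
from fractions import Fraction


def to_duration_qb(duration):
    """Convert a duration in whole notes to a float in quarter notes."""
    return float(Fraction(duration) * 4)


print(to_duration_qb("3/8"))  # a dotted quarter note
# 1.5
print(to_duration_qb(Fraction(1, 2)))  # a half note
# 2.0
```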

keysig Key Signatures#

int

The feature keysig represents the key signature of a particular measure. It is an integer which, if positive, represents the number of sharps, and if negative, the number of flats. E.g.: 3: three sharps, -2: two flats, 0: no accidentals.
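
As a small illustration of this encoding, the following sketch turns a keysig value into a readable description. The function describe_keysig is hypothetical and not part of ms3.

```python
def describe_keysig(keysig):
    """Describe a key signature given as a signed number of accidentals."""
    if keysig > 0:
        return f"{keysig} sharp{'s' if keysig != 1 else ''}"
    if keysig < 0:
        return f"{-keysig} flat{'s' if keysig != -1 else ''}"
    return "no accidentals"


for value in (3, -2, 0):
    print(value, "->", describe_keysig(value))
# 3 -> 3 sharps
# -2 -> 2 flats
# 0 -> no accidentals
```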

mc Measure Counts#

int

Measure count, identifier for the measure units in the XML encoding. Always starts with 1 for correspondence to MuseScore’s status bar. For more detailed information, please refer to Measure counts (MC) vs. measure numbers (MN).

mn Measure Numbers#

int

Measure number, continuous count of complete measures as used in printed editions. Starts with 1 except for pieces beginning with a pickup measure, numbered as 0. MNs are identical for first and second endings! For more detailed information, please refer to Measure counts (MC) vs. measure numbers (MN).

mc_onset#

fractions.Fraction

The value for mc_onset represents, expressed as fraction of a whole note, a position in a measure where 0 corresponds to the earliest possible position (in most cases beat 1). For more detailed information, please refer to Onset positions.

mn_onset#

fractions.Fraction

The value for mn_onset represents, expressed as fraction of a whole note, a position in a measure where 0 corresponds to the earliest possible position of the corresponding measure number (MN). For more detailed information, please refer to Onset positions.

quarterbeats#

fractions.Fraction

This column expresses positions, otherwise accessible only as a tuple (mc, mc_onset), as a running count of quarter notes from the piece’s beginning (quarterbeat = 0). If second endings are present in the score, only the second ending is counted in order to give authentic values to such a score, as if played without repetitions (third endings and more are also ignored). If repetitions are unfolded, i.e. the table corresponds to a full play-through of the score, all endings are taken into account correctly.

For the specific case where you need continuous quarterbeats including all endings, please refer to {{ quarterbeats_all_endings }}.

Computation of quarterbeats requires an offset_dict that is computed from the column {{ act_dur }} contained in every Measures table. Quarterbeats are based on the cumulative sum of that column, meaning that they take the length of irregular measures into account.
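The cumulative-sum idea can be sketched as follows. This is a simplified illustration that ignores voltas and is not ms3's actual implementation; the measure data and the helper names `make_offset_dict` and `quarterbeats` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical measures (mc -> act_dur); MC 1 is a pickup of one quarter in 3/4.
act_durs = {1: Fraction(1, 4), 2: Fraction(3, 4), 3: Fraction(3, 4)}

def make_offset_dict(act_durs):
    """Map each MC to the quarterbeat position at which it begins."""
    mcs = sorted(act_durs)
    # Running sum of the lengths of all preceding measures, in quarter notes.
    starts = accumulate((act_durs[mc] * 4 for mc in mcs[:-1]), initial=Fraction(0))
    return dict(zip(mcs, starts))

offset_dict = make_offset_dict(act_durs)

def quarterbeats(mc, mc_onset):
    """Position of the event (mc, mc_onset) in quarter notes from the beginning."""
    return offset_dict[mc] + Fraction(mc_onset) * 4
```

Because the pickup measure contributes only one quarter note, MC 2 starts at quarterbeat 1 rather than 3.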

staff#

int

In which staff an event occurs. 1 = upper staff.

timesig Time Signatures#

str

The time signature timesig of a particular measure is expressed as a string, e.g. '2/2'. The actual duration of a measure can deviate from the time signature for notational reasons: For example, a pickup bar could have an actual duration of 1/4 but still be part of a '3/8' meter, which usually has an actual duration of 3/8.

volta#

int

In the case of first and second (third etc.) endings, this column holds the number of every “bracket”, “house”, or volta, which should increase from 1. This is required for MS3’s unfold repeats function to work. For more information, see here.

voice#

int

In which notational layer an event occurs. Each staff can have up to four layers:

  • 1 = upper, default layer (blue)

  • 2 = second layer, downward stems (green)

  • 3 = third layer, upward stems (orange)

  • 4 = fourth layer, downward stems (purple)

Measures#

act_dur Actual duration of a measure#

fractions.Fraction

The value of act_dur in most cases equals the time signature, expressed as a fraction; meaning for example that a “normal” measure in 6/8 has act_dur = 3/4. A measure with an irregular length, for example a pickup measure of length 1/8, would have act_dur = 1/8.

The value of act_dur plays an important part in inferring MNs from MCs. See also the columns dont_count and numbering_offset.
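A quick way to spot irregular measures is to compare the time signature, read as a fraction, with act_dur. The helper `is_irregular` is a hypothetical name used only for this sketch.

```python
from fractions import Fraction

def is_irregular(timesig, act_dur):
    """True if a measure's actual duration deviates from its time signature."""
    # Fraction('6/8') is reduced to 3/4, which is how act_dur expresses it, too.
    return Fraction(timesig) != Fraction(act_dur)

regular = is_irregular("6/8", Fraction(3, 4))
pickup = is_irregular("6/8", Fraction(1, 8))
```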

barline#

str

The column barline encodes information about the measure’s final bar line.

breaks#

str

The column breaks may include three different values: {'line', 'page', 'section'} which represent the different break types. In the case of section breaks, MuseScore

dont_count Measures excluded from bar count#

int

This is a binary value that corresponds to MuseScore’s setting Exclude from bar count from the Bar Properties menu. The value is 1 for pickup bars, second MCs of divided MNs and some volta measures, and NaN otherwise.

mc_offset Offset of a MC#

fractions.Fraction

The column mc_offset, in most cases, has the value 0 because it expresses the deviation of this MC’s mc_onset 0 (beginning of the MC) from beat 1 of the corresponding MN. If the value is a fraction > 0, it means that this MC is part of a MN which is composed of at least two MCs, and it expresses the current MC’s offset in terms of the duration of all (usually 1) preceding MCs which are also part of the corresponding MN. In the standard case that one MN is split into two MCs, the first MC has mc_offset = 0, and the second one mc_offset = the previous MC's act_dur.
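The standard case can be sketched with a 4/4 measure that is split into two MCs. The example data are invented, and the helper `mn_onset` reflects one reading of the paragraph above (an onset relative to the MN is the onset within the MC plus that MC's offset), not a documented ms3 function.

```python
from fractions import Fraction

# Hypothetical MN 8 in 4/4, split into MC 8 (three quarters) and MC 9 (one quarter).
measures = {
    8: {"mn": 8, "act_dur": Fraction(3, 4), "mc_offset": Fraction(0)},
    9: {"mn": 8, "act_dur": Fraction(1, 4), "mc_offset": Fraction(3, 4)},
}

def mn_onset(mc, mc_onset):
    """Onset relative to the beginning of the MN that the given MC belongs to."""
    return Fraction(mc_onset) + measures[mc]["mc_offset"]

start_of_second_mc = mn_onset(9, 0)
```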

next#

tuple

Every cell in this column has at least one integer, namely the MC of the subsequent bar, or -1 in the case of the last. In the case of repetitions, measures can have more than one subsequent MC, in which case the integers are separated by ', '.

The column is used for checking whether irregular measure lengths even themselves out because otherwise the inferred MNs might be wrong. Also, it is needed for MS3’s unfold repetitions functionality.
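When a measures table has been read without type conversion, the cells of this column are plain strings. A small sketch for turning them into tuples could look like this; `parse_next` is a hypothetical helper, and tolerating surrounding parentheses is an assumption about how a tuple might have been written to the file.

```python
def parse_next(cell):
    """Turn a cell such as '2, 5' or '-1' into a tuple of MCs."""
    parts = str(cell).strip("()").split(",")
    return tuple(int(part) for part in parts if part.strip())

def is_last_measure(cell):
    """The final measure is marked by the successor -1."""
    return -1 in parse_next(cell)
```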

numbering_offset Offsetting MNs#

int

MuseScore’s measure number counter can be reset at a given MC by using the Add to bar number setting from the Bar Properties menu. If numbering_offset ≠ 0, the counting offset is added to the current MN and all subsequent MNs are inferred accordingly.

Scores that contain several pieces (e.g. variations or a suite) sometimes use numbering_offset instead of section breaks to simulate a restart of the MN count at every new section. This leads to ambiguous MNs.

quarterbeats_all_endings#

fractions.Fraction

Since the computation of {{ quarterbeats }} for pieces including alternative endings (voltas) excludes all but the second endings, the measures of such pieces get this additional column. It allows users who need continuous quarterbeats including all endings to create an offset_dict. In that case one would call

from ms3 import add_quarterbeats_col

# `measures` is the Measures table of the piece, `df` the table that is to receive the column
offset_dict = measures.quarterbeats_all_endings.to_dict()
df_with_gapless_quarterbeats = add_quarterbeats_col(df, offset_dict)

repeats#

str

The column repeats indicates the presence of repeat signs and can have the values {'start', 'end', 'startend', 'firstMeasure', 'lastMeasure'}. MS3 performs a test on the repeat signs’ plausibility and throws warnings when some inference is required for this.

The repeats column needs to have the correct repeat sign structure in order to have a correct next column which, in turn, is required for MS3’s unfolding repetitions functionality.

Notes and Rests#

chord_id#

int

Every note keeps the ID of the <Chord> tag to which it belongs in the score. This is necessary because in MuseScore XML, most markup (e.g. articulation, lyrics etc.) is attached to chords rather than to individual notes. This column allows for relating markup to notes at a later point.

gracenote#

str

For grace notes, type of the grace note as encoded in the MuseScore source code. They are assigned a duration of 0.

midi Piano key#

int

MIDI pitch with 60 = C4, 61 = C#4/Db4/B##3 etc. For the actual note name, refer to the tpc column.

nominal_duration#

fractions.Fraction

Note’s or rest’s duration without taking into account dots or tuplets. Multiplying by scalar results in the actual duration.

scalar#

fractions.Fraction

Value reflecting dots and tuplets by which to multiply a note’s or rest’s nominal_duration.
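The relation between nominal_duration, scalar, and the actual duration is a plain multiplication, as this small sketch shows. The function name `actual_duration` is hypothetical.

```python
from fractions import Fraction

def actual_duration(nominal_duration, scalar):
    """Multiply the undotted, untupleted note value by its scalar."""
    return Fraction(nominal_duration) * Fraction(scalar)

dotted_quarter = actual_duration("1/4", "3/2")  # one dot adds half of the value
triplet_eighth = actual_duration("1/8", "2/3")  # three notes in the time of two
```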

tied#

int

Encodes ties on the note’s left (-1), on its right (1) or both (0). A tie merges a note with an adjacent one having the same pitch.

| value | explanation |
|-------|-------------|
| <NA> | No ties. This note represents an onset and ends after the given duration. |
| 1 | This note is tied to the next one. It represents an onset but not a note ending. |
| 0 | This note is being tied to and tied to the next one. It represents neither an onset nor a note ending. |
| -1 | This note is being tied to. That is, it does not represent an onset, instead it adds to the duration of a previous note on the same pitch and ends it. |
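Based on these four values, tied note heads can be merged into sounding notes. The following is a simplified sketch, assuming that the rows are in temporal order and that at most one tie is open per pitch; it ignores staff and voice, and `merge_ties` is not an ms3 function.

```python
from fractions import Fraction

# Hypothetical note rows in temporal order; None stands for <NA>.
notes = [
    {"midi": 60, "duration": Fraction(1, 2), "tied": 1},
    {"midi": 64, "duration": Fraction(1, 4), "tied": None},
    {"midi": 60, "duration": Fraction(1, 1), "tied": 0},
    {"midi": 60, "duration": Fraction(1, 4), "tied": -1},
]

def merge_ties(notes):
    """Merge tied note heads into sounding notes with summed durations."""
    merged = []
    open_ties = {}  # midi pitch -> sounding note that is still being prolonged
    for note in notes:
        pitch, tied = note["midi"], note["tied"]
        if tied in (0, -1) and pitch in open_ties:
            # Tied-to note head: no new onset, only more duration.
            open_ties[pitch]["duration"] += note["duration"]
        else:
            merged.append({"midi": pitch, "duration": note["duration"]})
            if tied in (1, 0):
                open_ties[pitch] = merged[-1]
        if tied == -1:
            open_ties.pop(pitch, None)  # the tie ends here
    return merged

sounding = merge_ties(notes)
```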

tpc Tonal pitch class#

int

Encodes note names by their position on the line of fifths with 0 = C, 1 = G, 2 = D, -1 = F, -2 = Bb etc. The octave is defined by midi DIV 12 - 1.
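As a minimal sketch, a position on the line of fifths can be translated into a note name, and the octave can be computed from the midi value with the formula given above. The names `tpc_to_name` and `octave_from_midi` are hypothetical and chosen for this example only.

```python
def tpc_to_name(tpc):
    """Translate a line-of-fifths position into a note name, e.g. -2 -> 'Bb'."""
    steps = "FCGDAEB"  # F sits at position -1, hence the shift by one below
    accidentals, index = divmod(tpc + 1, 7)
    symbol = "#" * accidentals if accidentals > 0 else "b" * -accidentals
    return steps[index] + symbol

def octave_from_midi(midi):
    """Octave according to midi DIV 12 - 1."""
    # Spellings that cross the octave boundary (e.g. B# or Cb) are not adjusted here.
    return midi // 12 - 1

name = tpc_to_name(-2) + str(octave_from_midi(58))
```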

tremolo#

str

The syntax for this column is <dur>_<type>_<component> where <dur> is half the duration of the tremolo, <type> is the tremolo type, e.g. c32 for 3 beams or c64 for 4 (values taken from the source code), and <component> is 1 for notes in the first and 2 for notes in the second <Chord>.

Explanation: MuseScore 3 encodes the two components of a tremolo as two separate <Chord> tags with half the duration of the tremolo. This column serves to keep the information of the two components although onsets and durations in the Notes are corrected to represent the fact that all notes are sounding through the duration of the tremolo.

For example, an octave tremolo with duration of a dotted half note and tremolo frequency of 32nd notes will appear in the score as a dotted half on beat 1 and another dotted half 3 eighths later. In the note list, however, both notes have mc_onset 0 and duration 3/4. The column tremolo has the value 3/8_c32_1 for the first note and 3/8_c32_2 for the second.
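Following the syntax <dur>_<type>_<component>, a value of this column can be taken apart as sketched below. `parse_tremolo` is a hypothetical helper, not an ms3 function.

```python
from fractions import Fraction

def parse_tremolo(value):
    """Split a value such as '3/8_c32_2' into (dur, type, component)."""
    dur, tremolo_type, component = value.split("_")
    return Fraction(dur), tremolo_type, int(component)

dur, tremolo_type, component = parse_tremolo("3/8_c32_2")
full_duration = 2 * dur  # <dur> is half the duration of the tremolo
```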

Chords#

The various <Chord> tags are identified by increasing integer counts in the column chord_id. Within a note list, a column of the same name specifies which note belongs to which <Chord> tag. A chord and all the notes belonging to it have identical values in the columns mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, nominal_duration, scalar, volta, and of course chord_id.

articulation#

str

Articulation signs named as in the MuseScore file, e.g. articStaccatoBelow.

dynamics#

str

Dynamic signs such as p, ff etc. Other dynamic markings such as dolce are currently displayed as other-dynamics. Velocity values are currently not extracted. These features can easily be implemented upon request.

lyrics:1#

str

When a voice includes only a single verse, all syllables are contained in the column lyrics:1. If it has more than one verse, for each <Chord> the last verse’s syllable is contained in the respective column, e.g. lyrics:3 if the 3rd verse is the last one with a syllable for this chord. Each syllable has a trailing - if it’s the first syllable of a word, a leading - if it’s the last syllable of a word, and both if it’s in the middle of a word.

qpm Quarter notes per minute#

int

Defined for every {{ tempo }} mark. Normalizes the metronome value to quarter notes. For example, 𝅘𝅥. = 112 gets the value qpm = 112 * 1.5 = 168.
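The normalization amounts to scaling the metronome value by the length of the beat unit relative to a quarter note. A sketch with a hypothetical function `normalize_to_qpm`, where the beat unit is given as a fraction of a whole note:

```python
from fractions import Fraction

def normalize_to_qpm(beats_per_minute, beat_unit):
    """Convert a metronome mark to quarter notes per minute."""
    return beats_per_minute * Fraction(beat_unit) / Fraction(1, 4)

dotted_quarter_112 = normalize_to_qpm(112, Fraction(3, 8))
half_note_60 = normalize_to_qpm(60, Fraction(1, 2))
```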

staff_text#

str

Free-form text such as dolce or div.. Depending on the encoding standard, this layer may include dynamics such as cresc., articulation such as stacc., movement titles, and many more. Staff texts are added in MuseScore via [C] + T.

system_text#

Free-form text not attached to a particular staff but to the entire system. This frequently includes movement names or playing styles such as Swing. System texts are added in MuseScore via [C] + [S] + T.

tempo#

Metronome markings and tempo texts. Unfortunately, for tempo texts that include a metronome mark, e.g. Larghetto. (𝅘𝅥 = 63), the text before the 𝅘𝅥 symbol is lost. This can be fixed upon request.

Spanners#

str (-> tuple)

Spanners designate markup that spans several <Chord> tags, such as slurs, hairpins, pedal, trill and ottava lines. The values in a spanner column are IDs such that all chords with the same ID belong to the same spanner. Each cell can have more than one ID, separated by commas. For evaluating spanner columns, the values should be turned into tuples.

Spanners span all chords belonging to the same {{ staff }}, except for slurs and trills which span only chords in the same {{ voice }}. In other words, the ending of a slur that goes from one {{ voice }} to another will not be found.
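Evaluating a spanner column could be sketched as follows: every cell is turned into a tuple of IDs, and the chords are then grouped by spanner ID. The sample data and the helpers `parse_ids` and `chords_per_spanner` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping chord_id -> content of the 'slur' column (None = empty cell).
slur_column = {0: "0", 1: "0, 1", 2: "1", 3: None}

def parse_ids(cell):
    """Turn a cell such as '0, 1' into the tuple (0, 1)."""
    if cell is None:
        return ()
    return tuple(int(spanner_id) for spanner_id in str(cell).split(","))

def chords_per_spanner(column):
    """Group chord IDs by the spanner IDs they carry."""
    spans = defaultdict(list)
    for chord_id, cell in column.items():
        for spanner_id in parse_ids(cell):
            spans[spanner_id].append(chord_id)
    return dict(spans)

slurs = chords_per_spanner(slur_column)
```

Here, chord 1 ends the first slur and begins the second one at the same time.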

slur#

str (-> tuple)

Slurs expressing legato and/or phrasing. These spanners always pertain to a particular {{ voice }}.

(de)crescendo_hairpin#

str (-> tuple)

crescendo_hairpin is a < spanner, decrescendo_hairpin a > spanner. These always pertain to an entire {{ staff }}.

crescendo_line, diminuendo_line#

str (-> tuple)

These are spanners starting with a word, by default cresc. or dim., followed by a dotted line. These always pertain to an entire {{ staff }}.

Ottava#

str (-> tuple)

These spanners are always specified with a subtype such as Ottava:8va or Ottava:15mb. They always pertain to an entire {{ staff }}.

pedal#

str (-> tuple)

Pedal line spanners always pertain to an entire {{ staff }}.

TextLine#

str (-> tuple)

Custom staff text with a line that can be prolonged at will.

Trill#

str

Trill spanners can have different subtypes specified after a colon, e.g. 'Trill:trill'. They always pertain to a particular {{ voice }}.

Labels#

harmony_layer#

int

This column indicates the harmony layer, or label type, in/as which a label has been stored. It is an integer within [0, 3] that indicates how it is encoded in MuseScore.

| harmony_layer | explanation |
|---------------|-------------|
| 0 | Label encoded in MuseScore’s chord layer (Add->Text->Chord Symbol, or [C]+K) that does not start with a note name, i.e. MuseScore did not recognize it as an absolute chord and encoded it as plain text (compare type 3). |
| 1 | Roman Numeral (Add->Text->Roman Numeral Analysis). |
| 2 | Nashville number (Add->Text->Nashville Number). |
| 3 | Label encoded in MuseScore’s chord layer (Add->Text->Chord Symbol, or [C]+K) that does start with a note name, i.e. MuseScore did recognize it as an absolute chord and encoded its root (and bass note) as numerical values. |

label#

str

Annotation labels from MuseScore’s <Harmony> tags. Depending on the {{ label_type }} the column can include complete strings (decoded) or partial strings (encoded).

regex_match#

str

Name of the first regular expression that matched a label, e.g. ‘dcml’.

label_type#

Warning

Deprecated since 0.6.0 where this column has been split and replaced by {{ harmony_layer }} and {{ regex_match }}.

str

See label types above.

offset_x and offset_y#

float

Offset positions for labels whose position has been manually altered. Of importance mainly for re-inserting labels into a score at the exact same position.

Expanded#

str

Alternative reading to the {{ label }}. Generally considered “second choice” compared to the “main label” that has been expanded.

bass_note#

int

The bass note designated by the label, expressed as scale degree.

cadence#

str

Currently allows for the values

| value | cadence |
|-------|---------|
| PAC | perfect authentic |
| IAC | imperfect authentic |
| HC | half |
| DC | deceptive |
| EC | evaded |
| PC | plagal |

chord#

str

This column stands in no relation to the <Chord> tags discussed above. Instead, it holds the substring of the original labels that includes only the actual chord label, i.e. excluding information about modulations, pedal tones, phrases, and cadences. In other words, it comprises the features {{ numeral }}, {{ form }}, {{ figbass }}, {{ changes }}, and {{ relativeroot }}.

chord_tones, added_tones#

str (-> tuple)

Chord tones designated by the label, expressed as scale degrees. Includes 3 scale degrees for triads, 4 for tetrads, ordered according to the inversion (i.e. the first value is the {{ bass_note }}). Accounts for chord tone replacement expressed through intervals <= 8 within parentheses, without leading +. added_tones reflects only those non-chord tones that were added using, again within parentheses, intervals preceded by + and/or greater than 8.

chord_type#

str

A summary of information that otherwise depends on the three columns {{ numeral }}, {{ form }}, {{ figbass }}. It can be one of the wide-spread abbreviations for triads: M, m, o, + or for seventh chords: o7, %7, +7, +M7 (for diminished, half-diminished and augmented chords with minor/major seventh), or Mm7, mm7, MM7, mM7 for all combinations of a major/minor triad with a minor/major seventh.

figbass Inversion#

Figured bass notation of the chord inversion. For triads, this feature can be <NA>, '6', '64', for seventh chords '7', '65', '43', '2'. This column plays into computing the {{ chord_type }}. This feature is decisive for which chord tone is in the bass.

form#

str

This column conveys part of the information about which {{ chord_type }} a label expresses.

| value | chord type |
|-------|------------|
| <NA> | If {{ figbass }} is one of <NA>, '6', '64', the chord is either a major or minor triad. Otherwise, it is either a major or a minor chord with a minor seventh. |
| o, + | Diminished or augmented chord. Again, it depends on {{ figbass }} whether it is a triad or a seventh chord. |
| %, M, +M | Half diminished or major seventh chord. For the latter, the chord form (MM7 or mM7) depends on the {{ numeral }}. |
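The interplay of numeral, form, and figbass described here can be sketched as a small function. It is an illustration derived from the descriptions on this page, not ms3's own code: the function name `infer_chord_type` is hypothetical, None stands for <NA>, leading accidentals of the numeral are assumed to be written as # or b, and special labels outside this scheme are not handled.

```python
def infer_chord_type(numeral, form=None, figbass=None):
    """Derive the chord type from the three features numeral, form, and figbass."""
    is_triad = figbass in (None, "6", "64")
    # Uppercase numerals stand for a major third, lowercase for a minor third.
    is_major = numeral.lstrip("#b").isupper()
    if form in ("o", "+"):
        return form if is_triad else form + "7"
    if form == "%":
        return "%7"
    if form == "+M":
        return "+M7"
    if form == "M":
        return "MM7" if is_major else "mM7"
    # No form given: plain triad or chord with a minor seventh.
    if is_triad:
        return "M" if is_major else "m"
    return "Mm7" if is_major else "mm7"

dominant_seventh = infer_chord_type("V", figbass="7")
```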

globalkey#

str

Tonality of the piece, expressed as absolute note name, e.g. Ab for A flat major, or g# for G sharp minor.

globalkey_is_minor#

bool

Auxiliary column which is True if the {{ globalkey }} is a minor key, False otherwise.

localkey#

str

Local key expressed as Roman numeral relative to the {{ globalkey }}, e.g. IV for the major key on the 4th scale degree or #iv for the minor scale on the raised 4th scale degree.

localkey_is_minor#

bool

Auxiliary column which is True if the {{ localkey }} is a minor key, False otherwise.

numeral#

str

Roman numeral defining the chordal root relative to the local key. An uppercase numeral stands for a major chordal third, lowercase for a minor third. The column {{ root }} expresses the same information as scale degree.

phraseend Phrase annotations#

In versions < 2.2.0, only phrase endings were annotated, designated by \\. From version 2.2.0 onwards, { means beginning and } ending of a phrase. Everything between } and the subsequent { is to be considered as part of the previous phrase, a ‘codetta’ after the strong end point.

relativeroot Tonicized key#

str

This feature designates a lower-level key to which the current chord relates. It is expressed relative to the local key. For example, if the current {{ numeral }} is a V and it is a secondary dominant, relativeroot is the Roman numeral of the key that is being tonicized.

root#

int

The {{ numeral }} expressed as scale degree.

Metadata#

If not otherwise specified, metadata fields are of type str.

fname#

str

File name without extension. Serves as ID for linking files that belong to the same piece although they might have different suffixes and file extensions. It follows that only files will be detected as belonging to this score whose file names are at least as long. In other words, the main score file that is to be considered as the most up-to-date version of the data should ideally not come with a suffix.

rel_path#

str

Relative file path of the score, including extension.

Metadata extracted with older versions of ms3 (<1.0.0) would come instead with the column rel_paths which would include the relative folder path without the file itself. This value can now be found in the column subdirectory.

subdirectory#

str

Folder where the score is located, relative to the corpus_path. Equivalent to rel_path but without the file.

composer#

str

Composer name as it would figure in the English Wikipedia (although middle names may be dropped).

workTitle#

str

Title of the whole composition (cycle), even if the score holds only a part of it. It should not contain opus or other catalogue numbers, which go into the workNumber column/field.

The title of the part included in this score, be it a movement or, for instance, a song within a song cycle, goes into the movementTitle column/field.

workNumber#

str

Catalogue number(s), e.g. op. 30a.

movementNumber#

str

If applicable, the sequential number of the movement or part of a cycle contained in this score. In other words, the string should probably be interpretable as a number; a second movement should have the value 2, not II.

movementTitle#

str

If applicable, the name of the movement or part of a cycle contained in this score.

source#

str

If applicable, the URL to the online score that this file has been derived from.

typesetter#

str

Name or user profile URL of the person who first engraved this score.

annotators#

str

Creator(s) of the chord, phrase, cadence, and/or form labels pertaining to the DCML harmony annotation standard.

reviewers#

str

Reviewer(s) of the chord, phrase, cadence, and/or form labels pertaining to the DCML harmony annotation standard.

wikidata#

str

URL of the WikiData item describing the piece that this score represents.

viaf#

str

URL of the Virtual International Authority File (VIAF) entry identifying the piece that this score represents.

musicbrainz#

str

MusicBrainz URI identifying the piece that this score represents.

imslp#

str

URL to the wiki page within the International Music Score Library Project (IMSLP) that identifies this score.

composed_start#

str of length 4 or ..

Year in which the composing began. If there is evidence that composing the piece took more than one year but only the composed_end of the time span is known, this value should be '..'. In all other cases the string should consist of four digits so that it can be converted to a number.

Collecting (composed_start, composed_end) year values was a conscious decision against more elaborate indications such as the Extended Date/Time Format (EDTF), based on a trade-off.

composed_end#

str of length 4 or ..

Year in which the composition was finished, or in which it was published for the first time. If there is evidence that composing the piece took more than one year but only the composed_start of the time span is known, this value should be '..'. In all other cases the string should consist of four digits so that it can be converted to a number.

Collecting (composed_start, composed_end) year values was a conscious decision against more elaborate indications such as the Extended Date/Time Format (EDTF), based on a trade-off.

last_mn#

int

Last measure number (i.e., the length of the score as number of complete measures).

last_mn_unfolded#

int

Number of measures when playing all repeats.

length_qb#

float

Length of the piece, measured in quarter notes.

length_qb_unfolded#

float

Length of the piece when playing all repeats, measured in quarter notes.

volta_mcs#

tuple

Measure counts of first and second (and further) endings. For example, (((16,), (17,)), ((75, 76), (77, 78))) would stand for two sets of two brackets, the first one with two endings of length 1 (probably measure numbers 16a and 16b) and the second one for two endings of length 2, starting in MC 75.

The name comes from Italian “prima/seconda volta” for “first/second time”.
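The nested structure can be unpacked into a mapping from MC to the number of the ending it belongs to, which corresponds to the values of the volta column. This is a sketch with a hypothetical helper `volta_numbers`.

```python
def volta_numbers(volta_mcs):
    """Map every MC inside a bracket to the number of its ending (1 = first ending)."""
    mapping = {}
    for group in volta_mcs:
        for number, ending in enumerate(group, start=1):
            for mc in ending:
                mapping[mc] = number
    return mapping

volta_mcs = (((16,), (17,)), ((75, 76), (77, 78)))
mc2volta = volta_numbers(volta_mcs)
```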

all_notes_qb#

float

Summed up duration of all notes, measured in quarter notes.

n_onsets#

int

Number of all note onsets. This number is at most the number of rows in the corresponding notes table which, in turn, is the number of all note heads. n_onsets does not count tied-to note heads (which do not represent onsets).
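In terms of the tied column of a notes table, an onset is a note head whose value is either <NA> or 1. A minimal sketch, with None standing for <NA> and `count_onsets` being a hypothetical helper:

```python
def count_onsets(tied_values):
    """Count the note heads that represent onsets (tied is <NA> or 1)."""
    return sum(1 for tied in tied_values if tied is None or tied == 1)

# Four note heads, two of which are tied-to and therefore no onsets.
n_onsets = count_onsets([1, None, 0, -1])
```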

n_onset_positions#

int

Number of unique note onsets (“slices”).

guitar_chord_count#

int

Number of all <Harmony> labels that do not match the DCML harmony annotation standard. In most cases, they will be so-called guitar or Jazz chords (“changes”) as used in lead sheets, pop and folk songs, etc.

label_count#

int

Number of chord labels that match the DCML harmony annotation standard.

For metadata extracted with older versions of ms3 (<1.0.0) this value would represent the number of all <Harmony> labels including guitar/Jazz chords.

KeySig Key signatures#

str

Key signature(s) (negative = flats, positive = sharps) and their position(s) in the score. A score in C major would have the value 1: 0, i.e. zero accidentals in MC 1, the first <Measure> tag. A score with the key signatures of C minor (3 flats), G minor (1 flat) and G major (1 sharp) could have, for example, 1: -3, 39: -1, 67: 1. In other words, the values are like dictionaries without curly braces.

The column name is in CamelCase, unlike the keysig Key Signatures column found in Measures tables.
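Since the values resemble dictionaries without curly braces, they can be converted into actual dictionaries with a few lines. `parse_positions` is a hypothetical helper written for this sketch.

```python
def parse_positions(value, convert=str):
    """Turn a string such as '1: -3, 39: -1' into a dict {MC: converted value}."""
    result = {}
    for item in value.split(","):
        mc, _, content = item.partition(":")
        result[int(mc)] = convert(content.strip())
    return result

key_signatures = parse_positions("1: -3, 39: -1, 67: 1", convert=int)
```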

TimeSig Time Signatures#

str

Time signature(s) and their position(s) in the score. A score entirely in 4/4 would have the value 1: 4/4, where 1 is the MC of the first <Measure> tag. A score with time signature changes could have, for example, 1: 4/4, 39: 6/8, 67: 4/4. In other words, the values are like dictionaries without curly braces.

The column name is in CamelCase, unlike the timesig Time Signatures column found in Measures tables.
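One possible use of such a value is to look up which time signature is in force at a given MC. The following sketch uses a binary search over the positions of the changes; `timesig_at` is a hypothetical helper, not an ms3 function.

```python
from bisect import bisect_right

def timesig_at(timesig_value, mc):
    """Return the time signature that applies to the given MC."""
    changes = {}
    for item in timesig_value.split(","):
        position, _, timesig = item.partition(":")
        changes[int(position)] = timesig.strip()
    positions = sorted(changes)
    # Index of the last change at or before the requested MC.
    index = bisect_right(positions, mc) - 1
    if index < 0:
        raise ValueError(f"MC {mc} precedes the first time signature")
    return changes[positions[index]]

current = timesig_at("1: 4/4, 39: 6/8, 67: 4/4", 40)
```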

musescore#

str

MuseScore version that has been used to save this score, e.g. 3.6.2.