EventData

EventData(table, unit, number_type=NumberType.float)

PyArrow-based storage for timeline events.

EventData wraps a PyArrow table containing events. Events are rows in the table, not Python wrapper objects. The primary API is bulk operations:

  • from_dicts(): Create from list of row dictionaries
  • from_arrays(): Create from column-oriented arrays
  • from_dataframe(): Create from pandas DataFrame

The schema is fixed at class definition time but can be extended by subclasses to add domain-specific columns (e.g., pitch, velocity for notes).

NOTE: This class was renamed from EventStore to EventData in the 2026-01 API refactoring. EventStore now refers to the container class (formerly EventBundle) that holds one or more EventData tables.

Attributes:
    table: The underlying PyArrow table.
    unit: The time unit for all coordinates.
    number_type: The number type used for coordinates.

Examples:
    >>> data = EventData.from_dicts([
    ...     {"id": "e1", "temporal_type": "instant", "event_type": "Beat",
    ...      "instant": 0.0},
    ...     {"id": "e2", "temporal_type": "interval", "event_type": "Note",
    ...      "start": 0.0, "end": 1.0},
    ... ], unit=TimeUnit.seconds)
    >>> len(data)
    2

Attributes

Name Description
count The number of events in this EventData.
number_type The number type for coordinate interpretation.
schema The PyArrow schema of the underlying table.
table The underlying PyArrow table.
unit The time unit for coordinates.

Methods

Name Description
column_names Get the list of column names for this EventData class.
concat Concatenate with other EventData, returning a new EventData.
coordinate_range Get the min and max coordinates across all events.
count_by Count events grouped by a column’s values.
create_timeline Create a Timeline from this EventData.
empty Create an empty EventData.
event_types Get the list of unique event types.
extend Extend this data with events from another EventData (in-place).
filter Filter events by criteria, returning a new EventData.
from_arrays Create EventData from column-oriented arrays (VECTORIZED).
from_arrays_legacy Legacy from_arrays using row-based coordinate_to_struct.
from_dataframe Create EventData from a pandas DataFrame.
from_dicts Create EventData from a list of row dictionaries.
from_parquet Load EventData from a Parquet file.
get_schema Get the canonical PyArrow schema for this EventData class.
prefix_ids Return a new EventData with all event IDs prefixed.
select Select specific columns from the table.
summary Get a comprehensive summary of this EventData.
to_dataframe Convert to a DataFrame in the specified format.
to_pandas Convert to a pandas DataFrame.
to_parquet Save the EventData to a Parquet file.
where Filter with a custom PyArrow compute expression.

column_names

EventData.column_names()

Get the list of column names for this EventData class.

Returns: List of all column names (base + extra).

concat

EventData.concat(*others)

Concatenate with other EventData, returning a new EventData.

Args: *others: Other EventData to concatenate (extra columns are allowed and will be merged using schema promotion).

Returns: A new EventData containing all events.

Raises: ValueError: If any units don’t match.
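
The two rules above (units must match, extra columns are merged) can be sketched in plain Python. This is only an illustration under stated assumptions: the `concat` helper and the `{"unit": ..., "rows": [...]}` dataset shape are hypothetical stand-ins, not the library's implementation, and the `None` fill imitates null-filled schema promotion rather than calling PyArrow.

```python
def concat(first, *others):
    """Concatenate hypothetical {"unit": ..., "rows": [...]} datasets."""
    for other in others:
        if other["unit"] != first["unit"]:
            raise ValueError(
                f"unit mismatch: {first['unit']!r} vs {other['unit']!r}"
            )
    datasets = [first, *others]
    # Collect the union of column names in first-seen order.
    columns = []
    for dataset in datasets:
        for row in dataset["rows"]:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Rows that lack a column get None, imitating null-filled promotion.
    rows = [
        {name: row.get(name) for name in columns}
        for dataset in datasets
        for row in dataset["rows"]
    ]
    return {"unit": first["unit"], "rows": rows}


beats = {"unit": "seconds", "rows": [{"id": "e1", "event_type": "Beat"}]}
notes = {
    "unit": "seconds",
    "rows": [{"id": "e2", "event_type": "Note", "pitch": 60}],
}
merged = concat(beats, notes)
```

In this sketch the Beat row gains a `pitch` column holding `None`, while mixing `"seconds"` with another unit raises `ValueError`.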

coordinate_range

EventData.coordinate_range()

Get the min and max coordinates across all events.

Returns: Tuple of (min, max) coordinates, or None if the EventData is empty.

count_by

EventData.count_by(column)

Count events grouped by a column’s values.

Args: column: The column to group by.

Returns: Dict mapping column values to counts.

create_timeline

EventData.create_timeline(uid=None, filters=None)

Create a Timeline from this EventData.

This is a convenience method that creates a timeline with the data’s events directly. The timeline class and number_type are inferred from the data’s unit (e.g., ticks -> DiscreteLogicalTimeline with int).

Args:
    uid: Unique ID for the timeline. Auto-generated if None.
    filters: Filter kwargs to apply before timeline creation.
        Example: {"event_type": "Note"} to exclude rests.

Returns: A Timeline containing the (filtered) events.

Examples:
    >>> timeline = data.create_timeline(uid="notes")
    >>> filtered = data.create_timeline(filters={"event_type": "Note"})

empty

EventData.empty(unit, number_type=NumberType.float)

Create an empty EventData.

Args:
    unit: The time unit for coordinates.
    number_type: The number type for coordinates.

Returns: An empty EventData with the appropriate schema.

event_types

EventData.event_types()

Get the list of unique event types.

Returns: List of event type names.

extend

EventData.extend(other)

Extend this data with events from another EventData (in-place).

Args: other: Another EventData with compatible schema (extra columns are allowed and will be merged using schema promotion).

Raises: ValueError: If units don’t match.

filter

EventData.filter(
    temporal_type=None,
    event_type=None,
    min_coord=None,
    max_coord=None,
    **kwargs,
)

Filter events by criteria, returning a new EventData.

All criteria are AND-ed together.

Args:
    temporal_type: Filter by "instant" or "interval".
    event_type: Filter by event type name.
    min_coord: Minimum coordinate (inclusive).
    max_coord: Maximum coordinate (exclusive).
    **kwargs: Exact match filters for other columns (e.g. event_category="note").

Returns: A new EventData with filtered events.
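
The AND semantics and the half-open coordinate range [min_coord, max_coord) can be illustrated with a minimal pure-Python sketch over row dictionaries. The `filter_rows` helper is hypothetical, and testing the bounds against each event's `start` value is an assumption; the documentation above does not say which coordinate column the real method compares.

```python
def filter_rows(rows, temporal_type=None, event_type=None,
                min_coord=None, max_coord=None, **exact):
    """Return the rows that satisfy every supplied criterion."""
    kept = []
    for row in rows:
        checks = [
            temporal_type is None or row["temporal_type"] == temporal_type,
            event_type is None or row["event_type"] == event_type,
            # Assumption: bounds are tested against the event's start.
            min_coord is None or row["start"] >= min_coord,  # inclusive
            max_coord is None or row["start"] < max_coord,   # exclusive
            all(row.get(name) == value for name, value in exact.items()),
        ]
        if all(checks):
            kept.append(row)
    return kept


events = [
    {"temporal_type": "instant", "event_type": "Beat", "start": 0.0},
    {"temporal_type": "interval", "event_type": "Note", "start": 0.0},
    {"temporal_type": "interval", "event_type": "Note", "start": 1.0},
    {"temporal_type": "interval", "event_type": "Rest", "start": 2.0},
]
notes = filter_rows(events, event_type="Note", min_coord=0.0, max_coord=1.0)
```

Here only the Note starting at 0.0 survives: the Note at 1.0 is dropped because the upper bound is exclusive.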

from_arrays

EventData.from_arrays(
    columns,
    unit,
    number_type=NumberType.float,
    *,
    validate=True,
    extra_fields=None,
)

Create EventData from column-oriented arrays (VECTORIZED).

This is the PRIMARY construction method for loaders. All operations are vectorized - NO row iteration occurs.

Args:
    columns: Dict mapping column names to arrays. Supports:
        - np.ndarray: NumPy arrays
        - pa.Array: PyArrow arrays (including StructArray for coords)
        - list: Python lists (converted to numpy)

        For coordinate columns (start, end, duration):
        - If pa.StructArray: used directly
        - If numeric/string array: parsed via CoordinateParser
    unit: The time unit for coordinates.
    number_type: The number type for coordinates.
    validate: Whether to validate arrays before table construction.
    extra_fields: Optional list of PyArrow fields for extra columns.
        These fields include metadata (e.g., unit for CoordinateFields).
        If not provided, fields are inferred from the data arrays.

Returns: A new EventData containing the events.

Raises: ValueError: If validation fails (missing columns, length mismatch, etc.)

Examples:
    >>> # Vectorized construction from arrays
    >>> data = EventData.from_arrays({
    ...     "id": np.array(["e1", "e2"]),
    ...     "temporal_type": np.array(["instant", "instant"]),
    ...     "event_type": np.array(["Beat", "Beat"]),
    ...     "start": CoordinateParser.parse([0, 480], NumberType.int, TimeUnit.ticks),
    ... }, unit=TimeUnit.ticks)

    >>> # Direct from loader output (StructArrays already parsed)
    >>> data = EventData.from_arrays(loader_columns, unit=TimeUnit.quarters)
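
The kind of checks that `validate=True` implies (missing columns, length mismatch) might look roughly like the sketch below. Everything here is an assumption made for illustration: the `validate_columns` helper is hypothetical, and the `REQUIRED` tuple is a guess at which columns are mandatory, not the library's actual list.

```python
REQUIRED = ("id", "temporal_type", "event_type")  # hypothetical set


def validate_columns(columns):
    """Raise ValueError on missing columns or unequal column lengths."""
    missing = [name for name in REQUIRED if name not in columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column length mismatch: {lengths}")


good = {
    "id": ["e1", "e2"],
    "temporal_type": ["instant", "instant"],
    "event_type": ["Beat", "Beat"],
    "start": [0, 480],
}
validate_columns(good)  # passes silently
```

Both checks work on whole columns, so no per-row loop is needed, in keeping with the vectorized design described above.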

from_arrays_legacy

EventData.from_arrays_legacy(columns, unit, number_type=NumberType.float)

Legacy from_arrays using row-based coordinate_to_struct.

DEPRECATED: Use from_arrays() instead for vectorized operations.

Args:
    columns: Dict mapping column names to lists of values.
    unit: The time unit for coordinates.
    number_type: The number type for coordinates.

Returns: A new EventData containing the events.

from_dataframe

EventData.from_dataframe(df, unit, number_type=NumberType.float)

Create EventData from a pandas DataFrame.

Args:
    df: DataFrame with event data. Column names should match the schema.
    unit: The time unit for coordinates.
    number_type: The number type for coordinates.

Returns: A new EventData containing the events.

from_dicts

EventData.from_dicts(rows, unit, number_type=NumberType.float)

Create EventData from a list of row dictionaries.

Coordinate values (instant, start, end, duration) are automatically converted to the internal struct format. Convenience defaults are applied so that callers can omit boilerplate fields:

  • id: Auto-generated as {event_type}:{counter} if missing, e.g. note:000001, rest:000001, beat:000001. When events are placed on a timeline, the timeline’s ID is prepended, yielding e.g. clt:1:note:000001.
  • temporal_type: Inferred from the keys present in the dict – "interval" when both start and end (or duration) are given, "instant" otherwise.

Args:
    rows: List of event dictionaries. At minimum each dict needs a
        coordinate (instant or start/end) and an event_type. All other
        fields have sensible defaults.
    unit: The time unit for coordinates.
    number_type: The number type for coordinates.

Returns: A new EventData containing the events.

Examples:
    >>> data = EventData.from_dicts([
    ...     {"event_type": "Beat", "instant": 0},
    ...     {"event_type": "Note", "start": 0, "end": 0.5},
    ... ], unit=TimeUnit.seconds)
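
The convenience defaults described for `id` and `temporal_type` can be sketched in plain Python. The `apply_defaults` helper is hypothetical. Lowercasing the event type and keeping one counter per event type are assumptions inferred from the sample IDs (note:000001, rest:000001, beat:000001), not confirmed behavior.

```python
from collections import defaultdict


def apply_defaults(rows):
    """Fill in missing `id` and `temporal_type` fields (illustrative only)."""
    counters = defaultdict(int)  # one counter per event type (assumed)
    completed = []
    for row in rows:
        row = dict(row)  # copy so the caller's dicts are not mutated
        if "temporal_type" not in row:
            has_span = "start" in row and ("end" in row or "duration" in row)
            row["temporal_type"] = "interval" if has_span else "instant"
        if "id" not in row:
            key = row["event_type"].lower()  # lowercasing is an assumption
            counters[key] += 1
            row["id"] = f"{key}:{counters[key]:06d}"
        completed.append(row)
    return completed


rows = apply_defaults([
    {"event_type": "Beat", "instant": 0},
    {"event_type": "Note", "start": 0, "end": 0.5},
    {"event_type": "Note", "start": 0.5, "duration": 0.5},
])
```

Under these assumptions the three rows receive the IDs beat:000001, note:000001, and note:000002, and the two Note rows are inferred to be intervals.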

from_parquet

EventData.from_parquet(path)

Load EventData from a Parquet file.

Args: path: Path to the Parquet file.

Returns: An EventData loaded from the file.

Raises: ValueError: If the file lacks required TimeToAlign! metadata.

get_schema

EventData.get_schema(unit)

Get the canonical PyArrow schema for this EventData class.

This is a class-level method that returns the schema for a given unit, independent of any specific instance. Useful for constructing empty tables or validating incoming data.

Args: unit: The time unit for coordinate columns.

Returns: The complete schema including base and extra fields.

prefix_ids

EventData.prefix_ids(prefix)

Return a new EventData with all event IDs prefixed.

Prepends the prefix followed by a colon to every event ID. Used when events are placed onto a timeline so that IDs become globally unique and informative, e.g. clt1:note:000001.

If the IDs already start with the prefix, they are left unchanged.

Args: prefix: The prefix to prepend (without trailing colon).

Returns: A new EventData with prefixed IDs.
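
The idempotent prefixing rule can be shown on a single ID string. This is a minimal sketch with a hypothetical `prefix_id` helper; whether the real check matches the prefix with or without its trailing colon is an assumption.

```python
def prefix_id(event_id, prefix):
    """Prepend `prefix:` unless the ID already carries it."""
    marker = f"{prefix}:"
    if event_id.startswith(marker):
        return event_id  # already prefixed, leave unchanged
    return marker + event_id


once = prefix_id("note:000001", "clt1")
twice = prefix_id(once, "clt1")
```

Applying the helper a second time leaves the ID untouched, so repeated placement on the same timeline does not stack prefixes.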

select

EventData.select(columns)

Select specific columns from the table.

Args: columns: List of column names to select.

Returns: A PyArrow table with only the selected columns.

summary

EventData.summary()

Get a comprehensive summary of this EventData.

Returns: Dict with count, temporal type counts, event type counts, coordinate range, unit, and number type.

to_dataframe

EventData.to_dataframe(format='pandas', *, raw=False, coordinates=False)

Convert to a DataFrame in the specified format.

Higher-level method that dispatches to format-specific implementations. Currently supports pandas; polars support can be added later.

Args:
    format: DataFrame format ("pandas"). Default "pandas".
    raw: If True, return raw conversion with struct dicts for coordinates.
    coordinates: If True, wrap values in Coordinate objects with unit info.

Returns: A DataFrame in the requested format.

Raises: ValueError: If format is not supported.

Examples:
    >>> df = events.to_dataframe()  # pandas DataFrame
    >>> df = events.to_dataframe("pandas", raw=True)

to_pandas

EventData.to_pandas(raw=False, coordinates=False)

Convert to a pandas DataFrame.

By default, coordinate columns (start, end, duration) are converted from the internal struct representation to the appropriate Python number type:

  • Fraction if numerator/denominator are present
  • float otherwise

Args:
    raw: If True, return the raw PyArrow-to-pandas conversion with struct
        dicts for coordinate columns. Default False shows cleaned numbers.
    coordinates: If True, wrap coordinate values in Coordinate objects that
        include unit information. Only effective when raw=False.

Returns: A pandas DataFrame with the event data.

Examples:
    >>> # Default: clean number format
    >>> df = events.to_pandas()
    >>> df.iloc[0]['start']  # Fraction(1, 4) or 0.25

    >>> # Raw struct dicts (for debugging)
    >>> df = events.to_pandas(raw=True)
    >>> df.iloc[0]['start']  # {'value': 0.25, 'numerator': 1, 'denominator': 4}

    >>> # Coordinate objects with unit
    >>> df = events.to_pandas(coordinates=True)
    >>> df.iloc[0]['start']  # Coordinate(value=Fraction(1, 4), unit=quarters)
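
The default struct-to-number rule (Fraction when a numerator and denominator are present, float otherwise) can be sketched as below. The `struct_to_number` helper is hypothetical. The dict keys follow the raw struct shown in the example above, and treating a `None` numerator or denominator as "not present" is an assumption.

```python
from fractions import Fraction


def struct_to_number(struct):
    """Convert a raw coordinate struct dict to Fraction or float."""
    numerator = struct.get("numerator")
    denominator = struct.get("denominator")
    if numerator is not None and denominator is not None:
        return Fraction(numerator, denominator)
    return float(struct["value"])


exact = struct_to_number({"value": 0.25, "numerator": 1, "denominator": 4})
approx = struct_to_number(
    {"value": 0.25, "numerator": None, "denominator": None}
)
```

The first call yields an exact `Fraction(1, 4)`; the second falls back to the float stored in `value`.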

to_parquet

EventData.to_parquet(path)

Save the EventData to a Parquet file.

Args: path: Path to write the Parquet file.

where

EventData.where(expression)

Filter with a custom PyArrow compute expression.

Args: expression: A PyArrow compute expression.

Returns: A new EventData with filtered events.