Schema System

The Schema System provides typed data structure definitions that serve as contracts for data flowing through the Superlinked framework. Schemas define the structure, field types, and relationships of entities processed by spaces, indices, and queries.

For information about how schemas integrate with vector embeddings, see Space System. For query definition and execution, see Index and Query System.

Schema Definition

Schemas are defined as Python classes that inherit from sl.Schema or use the @sl.schema decorator. Each schema represents a data entity with typed fields that can be used throughout the framework pipeline.

Class-based Definition

from superlinked import framework as sl

class ParagraphSchema(sl.Schema):
    body: sl.String
    created_at: sl.Timestamp
    usefulness: sl.Float
    id: sl.IdField

Decorator-based Definition

@sl.schema
class ParagraphSchema:
    body: sl.String
    id: sl.IdField
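To make the idea of "typed fields that can be used throughout the pipeline" more concrete, here is a minimal standard-library sketch of how an annotated class could expose its fields as referenceable objects once instantiated. It is a toy model, not Superlinked's implementation: `ToySchema`, `FieldRef`, and the stand-in `String` and `IdField` classes are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import get_type_hints


# Hypothetical marker types standing in for sl.String and sl.IdField.
class String:
    pass


class IdField:
    pass


@dataclass(frozen=True)
class FieldRef:
    """Reference to one field of one schema (hashable, so usable as a dict key)."""
    schema_name: str
    name: str
    kind: type


class ToySchema:
    """Toy base class: instantiation turns each annotation into a FieldRef attribute."""

    def __init__(self) -> None:
        for name, kind in get_type_hints(type(self)).items():
            setattr(self, name, FieldRef(type(self).__name__, name, kind))


class ParagraphSchema(ToySchema):
    body: String
    id: IdField


paragraph = ParagraphSchema()

# Field references carry their name and declared type, and can key a mapping.
column_mapping = {paragraph.id: "index"}
print(paragraph.body.name, paragraph.body.kind.__name__)
print(column_mapping[paragraph.id])
```

Because each reference is an ordinary hashable object, it can be passed to other components or used as a dictionary key, which is the pattern the parser mappings on this page rely on.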

[Diagram: Schema Definition Architecture]

Sources: notebook/feature/basic_building_blocks.ipynb:50-62, notebook/rag_hr_knowledgebase.ipynb:324-328

Field Types

The schema system supports several built-in field types for different data categories:

| Field Type | Purpose | Usage Example |
|---|---|---|
| sl.String | Text data and categorical values | description: sl.String |
| sl.Timestamp | Temporal data and dates | created_at: sl.Timestamp |
| sl.Float | Floating-point numerical values | price: sl.Float |
| sl.IdField | Unique entity identifiers | id: sl.IdField |
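Raw data often needs light conversion before it matches these field types. The sketch below uses only the standard library and assumes that timestamp fields hold Unix timestamps in whole seconds and that float fields hold Python floats. Both are assumptions to check against the framework documentation. The helper `to_unix_seconds` and the sample record are hypothetical.

```python
from datetime import datetime, timezone


def to_unix_seconds(value: str) -> int:
    """Parse an ISO 8601 date string and return whole Unix seconds (naive values are treated as UTC)."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# Hypothetical raw row, as it might arrive from a CSV file.
raw = {"body": "Vacation policy", "usefulness": "0.8", "creation_date": "2023-06-01"}

record = {
    "body": raw["body"],                                  # text stays a string
    "usefulness": float(raw["usefulness"]),               # numeric text becomes a float
    "created_at": to_unix_seconds(raw["creation_date"]),  # date becomes Unix seconds
}
print(record)
```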

[Diagram: Field Type Implementation]

Sources: notebook/rag_hr_knowledgebase.ipynb:324-328, notebook/feature/basic_building_blocks.ipynb:50-52

Schema Instantiation and Usage

Instantiating a schema produces an object whose fields can be referenced by attribute (for example, paragraph.body) when configuring spaces, parsers, and sources.

Pipeline Integration

# Schema instantiation
paragraph = ParagraphSchema()

# Field access in spaces
relevance_space = sl.TextSimilaritySpace(
    text=paragraph.body,
    model="sentence-transformers/all-mpnet-base-v2"
)

# Field access in data parsing
parser = sl.DataFrameParser(
    paragraph,
    mapping={
        paragraph.id: "index",
        paragraph.created_at: "creation_date"
    }
)

[Diagram: Schema Pipeline Flow]

Sources: notebook/feature/basic_building_blocks.ipynb:77-78, notebook/rag_hr_knowledgebase.ipynb:338-376

Data Parsing and Mapping

Schemas integrate with data sources through parsing mechanisms that map external data formats to schema fields.

DataFrame Integration

# Map DataFrame columns to schema fields
paragraph_parser = sl.DataFrameParser(
    paragraph,
    mapping={
        paragraph.id: "index",
        paragraph.created_at: "creation_date"
    }
)

# Source configuration
source = sl.InMemorySource(paragraph, parser=paragraph_parser)
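Conceptually, the mapping above renames source columns to schema field names. The following standard-library sketch illustrates that idea on plain dictionaries. It is not the library's parser: `apply_mapping` is a hypothetical helper, and the rule that an unmapped field reads the column with the same name is an assumption made for this illustration.

```python
def apply_mapping(rows, field_names, mapping):
    """Build one record per row, reading each field from its mapped column.

    Fields missing from `mapping` are read from a column with the field's own name.
    """
    return [
        {field: row[mapping.get(field, field)] for field in field_names}
        for row in rows
    ]


# Hypothetical source rows using the column names from the example above.
rows = [{"index": "p1", "body": "Remote work policy", "creation_date": 1685577600}]
fields = ["id", "body", "created_at"]
mapping = {"id": "index", "created_at": "creation_date"}

parsed = apply_mapping(rows, fields, mapping)
print(parsed)
```

Only the columns whose names differ from the schema's field names need an entry in the mapping, which keeps the configuration short when source data is already well named.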

[Diagram: Field Mapping Patterns]

Sources: notebook/rag_hr_knowledgebase.ipynb:372-373

Event Schemas

The schema system supports event-based data modeling for behavioral analytics and dynamic updates.

Event Schema Definition

Event schemas capture behavioral data and interactions that can modify entity embeddings over time through the Event Effects System.

For detailed information about event schemas and behavioral modeling, see Event Schemas. For event effects implementation, see Event Effects System.
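As a rough mental model of how an event can modify an entity's embedding, the sketch below defines a toy event record that refers to two entities by id and nudges one vector toward the other. Everything here is hypothetical: `ToyEvent`, `apply_event`, the field names, and the linear update rule are illustrative assumptions, not the framework's event schema API or its Event Effects implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEvent:
    """Toy event record: refers to the affected and affecting entities by id."""
    id: str
    user_id: str
    product_id: str
    event_type: str
    created_at: int  # Unix seconds


def apply_event(user_vec, product_vec, weight):
    """Move the user vector a fraction `weight` of the way toward the product vector."""
    return [u + weight * (p - u) for u, p in zip(user_vec, product_vec)]


# Hypothetical two-dimensional embeddings keyed by entity id.
users = {"u1": [0.0, 0.0]}
products = {"p1": [1.0, 0.5]}

event = ToyEvent("e1", "u1", "p1", "add_to_cart", 1685577600)
users[event.user_id] = apply_event(
    users[event.user_id], products[event.product_id], weight=0.5
)
print(users["u1"])
```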

Sources: notebook/recommendations_e_commerce.ipynb:16-40

Schema Validation and Type Safety

The schema system supports type safety at two levels: Python type annotations let static type checkers verify field access before the code runs, and the framework performs its own validation at runtime.

Type Checking Integration

# Type-safe field access
class ProductSchema(sl.Schema):
    name: sl.String
    price: sl.Float
    id: sl.IdField

product = ProductSchema()
# product.name is recognized as sl.String by type checkers
# product.price is recognized as sl.Float by type checkers
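Static checking covers the code that references fields. The data itself can only be checked at runtime. As an illustration of that runtime half, here is a minimal standard-library sketch that validates a record against a class's annotations. It does not show the framework's own validation: `ProductRecord` and `validate` are hypothetical, and plain Python types stand in for the `sl` field types.

```python
from typing import get_type_hints


class ProductRecord:
    """Plain-Python stand-in for a schema: annotations describe the expected types."""
    name: str
    price: float
    id: str


def validate(record: dict, schema: type) -> list[str]:
    """Return a list of problems found when checking `record` against `schema`."""
    errors = []
    for field, expected in get_type_hints(schema).items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


good = validate({"name": "Lamp", "price": 19.99, "id": "p1"}, ProductRecord)
bad = validate({"name": "Lamp", "price": "cheap"}, ProductRecord)
print(good)
print(bad)
```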

[Diagram: Schema Type System]

Sources: notebook/feature/basic_building_blocks.ipynb:50-62