Index and Query System

The Index and Query System provides the core querying architecture for building complex vector searches with filtering, dynamic parameters, and multi-space weighting. It centers on the QueryDescriptor class, which builds queries through a fluent interface and supports both similarity search and hard filtering.

This system transforms vector embeddings from Spaces into executable search queries with configurable parameters and returns structured results with metadata.

Core Architecture Overview

The Index and Query System consists of three primary layers: the Index, which combines multiple Space objects; the query-construction layer, which builds queries through a fluent interface; and the QueryResult structure, which returns search results with metadata.

[Diagram: Core System Flow]

Sources: notebook/feature/basic_building_blocks.ipynb:93-143, notebook/recommendations_e_commerce.ipynb:388-426, notebook/semantic_search_netflix_titles.ipynb:238-250

Index Creation and Space Combination

An Index combines multiple Space objects into a unified searchable structure that can be queried through the QueryDescriptor interface. The index maintains references to constituent spaces and validates schema compatibility during query construction.

Index Components

| Component | Purpose | Implementation |
|---|---|---|
| Index._spaces | Stores constituent spaces | Internal space collection |
| Index._schemas | Manages schema objects | Schema validation and access |
| Index.has_schema() | Validates schema compatibility | Used by QueryDescriptorValidator |
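The bookkeeping described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySpace`, `ToyIndex`, and `validate_query_schema` are hypothetical stand-ins written for this page, not the framework's classes, and the real Index may track spaces and schemas differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToySpace:
    """Hypothetical stand-in for a Space: a name plus the schema it embeds."""
    name: str
    schema_name: str


@dataclass
class ToyIndex:
    """Hypothetical stand-in for Index: keeps spaces and the schemas they cover."""
    spaces: list[ToySpace]
    schemas: set[str] = field(init=False)

    def __post_init__(self) -> None:
        # Derive the set of known schemas from the constituent spaces.
        self.schemas = {space.schema_name for space in self.spaces}

    def has_schema(self, schema_name: str) -> bool:
        # A query may only target a schema that some space in the index covers.
        return schema_name in self.schemas


def validate_query_schema(index: ToyIndex, schema_name: str) -> None:
    """Reject a query whose target schema is unknown to the index."""
    if not index.has_schema(schema_name):
        raise ValueError(f"Schema '{schema_name}' is not part of this index.")


index = ToyIndex([ToySpace("description_space", "movie"), ToySpace("genre_space", "movie")])
validate_query_schema(index, "movie")  # passes silently
```

Checking the schema at query-construction time, rather than at execution time, lets a mismatched query fail before any search is run.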

Sources: framework/src/framework/dsl/query/query_descriptor.py:125-131, framework/src/framework/dsl/query/query_descriptor.py:574-576

Query Building and Parameter System

The QueryDescriptor class provides a fluent API for building complex search operations with dynamic parameters. Queries are constructed through method chaining, where each method adds a specific QueryClause to the descriptor.
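To make the method-chaining idea concrete, here is a minimal sketch of a fluent builder in which every call appends one clause. `ToyClause` and `ToyQueryDescriptor` are hypothetical names for illustration, and the sketch assumes an immutable descriptor that returns a new instance on each call; the text above does not state whether the framework copies or mutates.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class ToyClause:
    """Hypothetical clause record: a kind plus its arguments."""
    kind: str
    args: tuple[Any, ...]


@dataclass(frozen=True)
class ToyQueryDescriptor:
    """Hypothetical immutable descriptor; each call returns a new instance."""
    clauses: tuple[ToyClause, ...] = ()

    def _with(self, kind: str, *args: Any) -> "ToyQueryDescriptor":
        # Copy the descriptor with one more clause appended.
        return replace(self, clauses=self.clauses + (ToyClause(kind, args),))

    def similar(self, space: str, param: Any, weight: float = 1.0) -> "ToyQueryDescriptor":
        return self._with("similar", space, param, weight)

    def filter(self, expression: str) -> "ToyQueryDescriptor":
        return self._with("filter", expression)

    def limit(self, count: int) -> "ToyQueryDescriptor":
        return self._with("limit", count)


base = ToyQueryDescriptor()
query = base.similar("description_space", "space adventure").filter("year > 2000").limit(5)
print([clause.kind for clause in query.clauses])
```

Because each step returns a fresh descriptor, `base` stays empty and can be reused as the starting point for other queries.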

Core Query Methods

The system provides two primary search methods that can be combined and weighted:

| Method | Purpose | Parameters | Usage |
|---|---|---|---|
| similar() | Search using user-provided input | space, param, weight | Text queries, user inputs |
| with_vector() | Search using stored item vectors | schema, id_param, weight | Item-to-item recommendations |

Similar Clause Usage

The .similar() method transforms user input into query vectors through the specified space:

# From basic_building_blocks.ipynb
query = sl.Query(paragraph_index).find(paragraph).similar(relevance_space, sl.Param("query_text")).select_all()

# From netflix semantic search - weighted similarity
movies_query = (
    sl.Query(movie_index, weights={
        description_space: sl.Param("description_weight"),
        genre_space: sl.Param("genre_weight"), 
        recency_space: sl.Param("recency_weight")
    })
    .find(movie_schema)
    .similar(description_space, sl.Param("description_query"))
    .similar(genre_space, sl.Param("genre_query"))
    .select_all()
    .limit(sl.Param("limit"))
)

With Vector Clause Usage

The .with_vector() method uses existing item vectors for search:

# From e-commerce recommendations
user_query = (
    sl.Query(product_index)
    .find(product_schema)
    .with_vector(product_schema, sl.Param("product_id"))
    .select_all()
    .limit(sl.Param("limit"))
)

# Per-space weighting for with_vector
weight_dict = {text_space: 0.8, category_space: 0.2}
query = (
    sl.Query(index)
    .find(schema)
    .with_vector(schema, "item_id", weight_dict)
    .select_all()
)

Parameter System

The Param class enables dynamic query construction with runtime value binding:

| Component | Purpose | Implementation |
|---|---|---|
| sl.Param("name") | Parameter placeholder | Runtime value substitution |
| param.default | Fallback value | Used when no runtime value provided |
| param.description | NLQ context | Enables natural language processing |
| param.options | Allowed values | Constrains parameter inputs |
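The interplay of runtime values, defaults, and allowed options can be sketched as follows. `ToyParam` and its `resolve` method are hypothetical and exist only to illustrate the attributes listed above; they are not the framework's Param implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToyParam:
    """Hypothetical placeholder mirroring the attributes listed above."""
    name: str
    description: Optional[str] = None
    default: Any = None
    options: Optional[frozenset] = None

    def resolve(self, supplied: dict[str, Any]) -> Any:
        # Prefer the runtime value; fall back to the default otherwise.
        value = supplied.get(self.name, self.default)
        # Enforce the allowed-values constraint when one is declared.
        if self.options is not None and value is not None and value not in self.options:
            raise ValueError(f"{value!r} is not an allowed value for '{self.name}'.")
        return value


genre = ToyParam("genre", default="drama", options=frozenset({"drama", "comedy"}))
print(genre.resolve({}))                    # no runtime value: the default is used
print(genre.resolve({"genre": "comedy"}))   # runtime value overrides the default
```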

Query Clause Implementation

Each query method generates specific QueryClause objects:

| Clause Type | Generated By | Key Methods | Purpose |
|---|---|---|---|
| SimilarFilterClause | similar() | from_param(), evaluate() | Vector similarity search |
| LooksLikeFilterClause | with_vector() | from_param(), evaluate() | Item-based search |
| HardFilterClause | filter() | from_param(), evaluate() | Exact field filtering |
| SelectClause | select() | from_param(), evaluate() | Field selection |
| LimitClause | limit() | from_param(), get_value() | Result count limit |
| RadiusClause | radius() | from_param(), get_value() | Distance constraint |

Sources: notebook/feature/querying_options.ipynb:125-131, docs/reference/dsl/query/query_descriptor.md:181-198, docs/reference/dsl/query/query_clause.md:316-366

Query Execution and Parameter Processing

Query execution involves parameter resolution through the QueryParamValueSetter class, which handles parameter binding, NLQ processing, and clause evaluation.

Parameter Resolution Process

The QueryParamValueSetter coordinates the complete parameter resolution pipeline:

| Stage | Purpose | Implementation |
|---|---|---|
| append_missing_mandatory_clauses() | Adds required clauses | LimitClause, RadiusClause, SelectClause |
| validate_params() | Validates parameter names | Checks against clause parameters |
| __alter_query_descriptor() | Applies parameter values | Updates clause parameters |
| __calculate_nlq_params() | Processes NLQ parameters | Calls NLQHandler.fill_params() |
| __calculate_default_params() | Sets default values | Uses Param.default values |
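The stages in the table can be approximated with plain dictionaries. This is a simplified sketch, not the QueryParamValueSetter code: the function names, the dict-based clause model, and the treatment of `limit` as the only mandatory entry are all assumptions, and the NLQ stage is left out because it needs an external language model.

```python
from typing import Any


def append_missing_mandatory(clauses: dict[str, Any]) -> dict[str, Any]:
    """Stage 1: ensure a 'limit' entry exists (stand-in for mandatory clauses)."""
    return {"limit": None, **clauses}


def validate_params(clauses: dict[str, Any], supplied: dict[str, Any]) -> None:
    """Stage 2: every supplied name must match a parameter used by some clause."""
    unknown = sorted(set(supplied) - set(clauses))
    if unknown:
        raise ValueError(f"Unknown query parameters: {unknown}")


def resolve_params(
    clauses: dict[str, Any], defaults: dict[str, Any], supplied: dict[str, Any]
) -> dict[str, Any]:
    clauses = append_missing_mandatory(clauses)
    validate_params(clauses, supplied)
    # Stage 3: explicitly supplied values take precedence.
    resolved = {**clauses, **supplied}
    # Stage 4 (NLQ extraction) is omitted in this sketch.
    # Stage 5: anything still unset falls back to its default.
    return {
        name: defaults.get(name) if value is None else value
        for name, value in resolved.items()
    }


resolved = resolve_params(
    clauses={"query_text": None, "description_weight": None},
    defaults={"description_weight": 1.0, "limit": 10},  # hypothetical defaults
    supplied={"query_text": "heist movie"},
)
print(resolved)
```

Running defaults last means an explicit value always wins and a default is only used when nothing else filled the parameter.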

Clause Evaluation

Each QueryClause implements parameter evaluation through key methods:

| Method | Purpose | Return Type |
|---|---|---|
| evaluate() | Converts clause to executable form | Clause-specific type |
| alter_param_values() | Updates parameter values | Modified clause instance |
| get_param_value_by_param_name() | Retrieves parameter values | dict[str, PythonTypes] |

Sources: framework/src/framework/dsl/query/query_param_value_setter.py:32-91, framework/src/framework/dsl/query/query_descriptor.py:479-520, framework/src/framework/dsl/query/query_clause.py:102-149

Advanced Query Features

Natural Language Query Processing

The system supports natural language query processing through the NLQClause and NLQHandler classes, which integrate with OpenAI for automated parameter extraction from natural language queries.

Hard Filtering and Comparison Operations

The query system implements hard filtering through HardFilterClause, which supports the following comparison operations:

| Operation Type | Implementation | Usage Example |
|---|---|---|
| Equality | ComparisonOperation.EQUAL | schema.field == "value" |
| Inequality | ComparisonOperation.NOT_EQUAL | schema.field != "value" |
| Numeric comparison | ComparisonOperation.GREATER_THAN | schema.price > 100 |
| List operations | ComparisonOperation.CONTAINS | schema.categories.contains(["tag"]) |
| Combined filters | _Or[SchemaField] | (field == "a") |
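As a rough model of how hard filters narrow the candidate set, the sketch below maps each operation to a predicate and keeps only records that pass every filter. `ToyComparison`, `matches`, and `apply_filters` are hypothetical names. Two assumptions to note: multiple filters are treated as a conjunction, and "contains" is interpreted as "the field holds at least one of the given values"; the framework's exact semantics may differ.

```python
import operator
from enum import Enum
from typing import Any, Callable


class ToyComparison(Enum):
    """Hypothetical mirror of a few of the operations listed above."""
    EQUAL = "equal"
    NOT_EQUAL = "not_equal"
    GREATER_THAN = "greater_than"
    CONTAINS = "contains"


_OPERATIONS: dict[ToyComparison, Callable[[Any, Any], bool]] = {
    ToyComparison.EQUAL: operator.eq,
    ToyComparison.NOT_EQUAL: operator.ne,
    ToyComparison.GREATER_THAN: operator.gt,
    # Assumed semantics: true if the field holds at least one wanted value.
    ToyComparison.CONTAINS: lambda field_value, wanted: any(w in field_value for w in wanted),
}

Filter = tuple[str, ToyComparison, Any]  # (field name, operation, operand)


def matches(record: dict[str, Any], field: str, op: ToyComparison, operand: Any) -> bool:
    return _OPERATIONS[op](record[field], operand)


def apply_filters(records: list[dict[str, Any]], filters: list[Filter]) -> list[dict[str, Any]]:
    # A record survives only if it satisfies every filter (logical AND).
    return [r for r in records if all(matches(r, *f) for f in filters)]


products = [
    {"id": "p1", "price": 120, "categories": ["shoes", "sale"]},
    {"id": "p2", "price": 80, "categories": ["shoes"]},
    {"id": "p3", "price": 150, "categories": ["hats"]},
]
kept = apply_filters(
    products,
    [("price", ToyComparison.GREATER_THAN, 100), ("categories", ToyComparison.CONTAINS, ["shoes"])],
)
print([p["id"] for p in kept])
```

Unlike similarity clauses, these predicates are binary: a record that fails a hard filter is excluded regardless of how similar its vector is.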

Query Weighting System

The system implements two distinct weighting mechanisms that control different aspects of similarity calculation:

[Diagram: Space Weights vs Clause Weights]

Space Weights Implementation

Space weights determine the relative contribution of each vector space to the overall similarity score:

| Component | Purpose | Implementation |
|---|---|---|
| space_weights(weight_dict) | Sets inter-space importance | Reweights normalized per-space vectors |
| SpaceWeightClause | Individual space weight | evaluate() returns tuple[Space, float] |
| get_param_value_for_unset_space_weights() | Default weight calculation | Uses DEFAULT_WEIGHT for unspecified spaces |

Clause Weights Implementation

Clause weights control how individual clauses contribute to query vector formation within each space:

| Component | Purpose | Implementation |
|---|---|---|
| similar(weight=0.7) | Controls single-space clause contribution | Affects query vector construction |
| with_vector(weight=0.3) | Controls multi-space clause contribution | Applies across all spaces in index |
| Per-space with_vector weights | Fine-grained space control | {space: weight} dictionary format |

Weight Processing Order

  1. Clause weights influence query vector construction per space
  2. Per-space query vectors are normalized
  3. Space weights reweight normalized vectors for final aggregation
  4. Final similarity scores computed against aggregated query vector
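The four steps above can be traced numerically with a small pure-Python sketch. It rests on several assumptions that the text does not confirm: clause vectors within a space are combined by a weighted sum, normalization is L2, per-space parts are concatenated, and the final score is a dot product. All function names are hypothetical.

```python
import math

Vector = list[float]


def weighted_sum(clauses: list[tuple[Vector, float]]) -> Vector:
    """Step 1: combine the clause vectors of one space using clause weights."""
    dim = len(clauses[0][0])
    return [sum(weight * vec[i] for vec, weight in clauses) for i in range(dim)]


def normalize(vector: Vector) -> Vector:
    """Step 2: scale to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def build_query_vector(
    clauses_per_space: dict[str, list[tuple[Vector, float]]],
    space_weights: dict[str, float],
) -> Vector:
    parts: Vector = []
    for space, clauses in clauses_per_space.items():
        unit = normalize(weighted_sum(clauses))
        # Step 3: space weights rescale the already-normalized part.
        parts.extend(space_weights.get(space, 1.0) * x for x in unit)
    return parts


def score(query_vector: Vector, item_vector: Vector) -> float:
    """Step 4: dot product against the aggregated query vector (assumed metric)."""
    return sum(q * i for q, i in zip(query_vector, item_vector))


query_vector = build_query_vector(
    clauses_per_space={
        "text": [([1.0, 0.0], 0.7), ([0.0, 1.0], 0.3)],  # two clauses in one space
        "recency": [([1.0], 1.0)],
    },
    space_weights={"text": 2.0, "recency": 0.5},
)
print(query_vector)
```

Because normalization happens between the two weighting steps, clause weights only change the direction of a space's query vector, while space weights alone decide how much that space counts in the final score.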

Sources: notebook/feature/querying_options.ipynb:1005-1020, notebook/feature/querying_options.ipynb:494-501, docs/reference/dsl/query/query_descriptor.md:199-227

Query Validation and Error Handling

The system validates query structure, filter operations, and operand types through two validator classes:

| Validator | Purpose | Key Methods |
|---|---|---|
| QueryDescriptorValidator | Query structure validation | validate(), __validate_schema() |
| QueryFilterValidator | Filter operation validation | validate_operation_is_supported() |
| QueryFilterValidator | Parameter type validation | validate_operation_operand_type() |
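The two filter checks named in the table might look roughly like the following. The function names are borrowed from the table, but the bodies, the signatures, and the `SUPPORTED_OPERATIONS` matrix are entirely hypothetical; they only show the general shape of "is this operation allowed for this field type" and "does the operand have the right type".

```python
from typing import Any

# Hypothetical support matrix: which operations each field type accepts.
SUPPORTED_OPERATIONS: dict[type, set[str]] = {
    str: {"EQUAL", "NOT_EQUAL"},
    float: {"EQUAL", "NOT_EQUAL", "GREATER_THAN"},
    list: {"CONTAINS"},
}


def validate_operation_is_supported(field_type: type, operation: str) -> None:
    """Fail early when a field type does not accept the requested operation."""
    if operation not in SUPPORTED_OPERATIONS.get(field_type, set()):
        raise ValueError(f"{operation} is not supported for {field_type.__name__} fields.")


def validate_operation_operand_type(field_type: type, operand: Any) -> None:
    """Fail early when the operand's type does not match the field's type."""
    if not isinstance(operand, field_type):
        raise TypeError(
            f"Expected operand of type {field_type.__name__}, got {type(operand).__name__}."
        )


validate_operation_is_supported(float, "GREATER_THAN")  # accepted
validate_operation_operand_type(float, 100.0)           # accepted
```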

Sources: framework/src/framework/dsl/query/query_descriptor.py:278-477, framework/src/framework/dsl/query/query_clause.py:504-560, framework/src/framework/dsl/query/query_clause.py:173-219

Result Structure and Metadata

Query results are structured through the QueryResult class and related result classes that provide comprehensive search result information and metadata.

Result Classes Implementation

The result system implements structured data through ImmutableBaseModel classes:

| Class | Purpose | Key Fields |
|---|---|---|
| QueryResult | Top-level result container | entries, metadata, __str__() |
| ResultEntry | Individual search result | id, fields, metadata |
| ResultEntryMetadata | Per-result metadata | score, partial_scores, vector_parts |
| ResultMetadata | Query-level metadata | schema_name, search_vector, search_params |
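The nesting of these classes can be pictured with frozen dataclasses. This is a sketch only: the `Toy*` classes are hypothetical, they carry a subset of the fields in the table, and frozen dataclasses stand in for ImmutableBaseModel. The string format follows the "#<rank> id:<entry_id>, object:<entry_fields>" pattern documented for QueryResult, with ranks assumed to start at 1.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any


@dataclass(frozen=True)
class ToyEntryMetadata:
    """Hypothetical per-result metadata."""
    score: float
    partial_scores: tuple[float, ...] = ()


@dataclass(frozen=True)
class ToyResultEntry:
    """Hypothetical single search hit."""
    id: str
    fields: dict[str, Any]
    metadata: ToyEntryMetadata


@dataclass(frozen=True)
class ToyQueryResult:
    """Hypothetical top-level container, ordered from best to worst match."""
    entries: tuple[ToyResultEntry, ...]

    def __str__(self) -> str:
        # One line per entry; rank numbering from 1 is an assumption.
        return "\n".join(
            f"#{rank} id:{entry.id}, object:{entry.fields}"
            for rank, entry in enumerate(self.entries, start=1)
        )


result = ToyQueryResult(
    entries=(
        ToyResultEntry("m1", {"title": "Heat"}, ToyEntryMetadata(0.91, (0.6, 0.31))),
        ToyResultEntry("m2", {"title": "Ronin"}, ToyEntryMetadata(0.84, (0.5, 0.34))),
    )
)
print(result)
```

Freezing the containers means attributes cannot be reassigned after the query returns, so downstream code can pass results around without defensive copies.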

Metadata Access and Processing

The system provides metadata access through the include_metadata() method:

| Component | Purpose | Implementation |
|---|---|---|
| include_metadata() | Enables metadata collection | Sets QueryUserConfig.with_metadata = True |
| QueryUserConfig | Query execution configuration | Controls metadata inclusion |
| with_metadata property | Metadata inclusion flag | Used by query execution pipeline |

Result Processing and Conversion

The system provides multiple ways to process and analyze query results:

[Diagram: Result Access Patterns]

Practical Result Usage Patterns

# From basic building blocks - pandas conversion
result = app.query(query, query_text="This is a happy person")
df = sl.PandasConverter.to_pandas(result)
print(df[['body', 'id', 'similarity_score']])

# From Netflix search - direct result iteration
result = app.query(movie_query, **query_params)
for entry in result.entries:
    print(f"Title: {entry.fields['title']}")
    print(f"Score: {entry.metadata.score}")
    print(f"Description: {entry.fields['description']}")

# From e-commerce - top results processing
result = app.query(cold_start_query, **params)
top_products = result.entries[:5]
for i, product in enumerate(top_products, 1):
    print(f"{i}. {product.fields['name']} - ${product.fields['price']}")

Metadata Analysis for Query Tuning

# Enable metadata collection for analysis
query = query.include_metadata()

# Analyze partial scores per space
result = app.query(query, **params)
for entry in result.entries[:3]:
    print(f"\nItem {entry.id}: total_score={entry.metadata.score:.3f}")
    if hasattr(entry.metadata, 'partial_scores'):
        for i, partial in enumerate(entry.metadata.partial_scores):
            print(f"  Space {i} contribution: {partial:.3f}")

Result String Representation

The QueryResult.__str__() method provides formatted result display for debugging:

# Format: "#<rank> id:<entry_id>, object:<entry_fields>"
print(str(result))  # Shows ranked results with IDs and field values

Sources: notebook/feature/querying_options.ipynb:1046-1053, docs/reference/dsl/query/result.md:164-176, docs/reference/dsl/query/query_descriptor.md:68-72

Integration Patterns

Common Query Patterns

The framework supports several common search patterns demonstrated across applications:

  1. Semantic Search: Text similarity with optional filtering
  2. Recommendation Systems: Item-to-item similarity with behavioral weighting
  3. Faceted Search: Multiple filter combinations with relevance scoring
  4. Temporal Search: Recency-weighted results with content similarity

Parameter Binding Examples

# Static parameter binding
app.query(query, query_text="romantic comedy", min_rating=4.0)

# Dynamic weighting
app.query(query, text_weight=0.7, recency_weight=0.3, limit=10)

# Natural language processing
query.with_natural_query("Find recent action movies with high ratings")

Sources: notebook/semantic_search_netflix_titles.ipynb:238, notebook/recommendations_e_commerce.ipynb:426, notebook/rag_hr_knowledgebase.ipynb:388

In summary, the Index and Query System turns vector embeddings into search results. It supports simple similarity searches as well as multi-faceted queries with dynamic parameter binding and per-result metadata.