The Index and Query System provides the core architecture for building vector searches with filtering, dynamic parameters, and multi-space weighting. It centers on the QueryDescriptor class, which builds queries through a fluent interface and supports both similarity search and hard filtering.
This system transforms vector embeddings from Spaces into executable search queries with configurable parameters and returns structured results with metadata.
The system has three layers: the Index, which combines multiple Space objects; query construction through QueryDescriptor; and the QueryResult structure, which returns search results with metadata.
Sources: notebook/feature/basic_building_blocks.ipynb:93-143, notebook/recommendations_e_commerce.ipynb:388-426, notebook/semantic_search_netflix_titles.ipynb:238-250
An Index combines multiple Space objects into a unified searchable structure that can be queried through the QueryDescriptor interface. The index maintains references to constituent spaces and validates schema compatibility during query construction.
| Component | Purpose | Implementation |
|---|---|---|
| `Index._spaces` | Stores constituent spaces | Internal space collection |
| `Index._schemas` | Manages schema objects | Schema validation and access |
| `Index.has_schema()` | Validates schema compatibility | Used by `QueryDescriptorValidator` |
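
The relationship between an index, its spaces, and schema checks can be illustrated with a small stand-alone model. This is a sketch only: `ToySchema`, `ToySpace`, and `ToyIndex` are hypothetical names invented for the example, and the real `Index` class may store and validate its members differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical stand-in for a schema object; only the name matters here."""
    name: str


@dataclass(frozen=True)
class ToySpace:
    """Hypothetical stand-in for a Space that embeds data from one schema."""
    name: str
    schema: ToySchema


class ToyIndex:
    """Illustrative index: keeps its spaces and the schemas they cover."""

    def __init__(self, spaces: list[ToySpace]) -> None:
        if not spaces:
            raise ValueError("an index needs at least one space")
        self._spaces = list(spaces)
        # Collect each distinct schema referenced by the constituent spaces.
        self._schemas = {space.schema for space in self._spaces}

    def has_schema(self, schema: ToySchema) -> bool:
        """Return True if a query against `schema` can use this index."""
        return schema in self._schemas


movie = ToySchema("movie")
review = ToySchema("review")
index = ToyIndex([ToySpace("description", movie), ToySpace("genre", movie)])

print(index.has_schema(movie))   # the index covers the movie schema
print(index.has_schema(review))  # no space in the index uses the review schema
```

A validator can call a check like `has_schema()` before building a query, so that a mismatch between schema and index surfaces at construction time rather than at execution time.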
Sources: framework/src/framework/dsl/query/query_descriptor.py:125-131, framework/src/framework/dsl/query/query_descriptor.py:574-576
The QueryDescriptor class provides a fluent API for building complex search operations with dynamic parameters. Queries are constructed through method chaining, where each method adds a specific QueryClause to the descriptor.
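The method-chaining pattern described here can be sketched independently of the library. The example below assumes an immutable descriptor in which every call returns a new object with one more clause appended; `ToyQuery` and `ToyClause` are hypothetical names, and the immutability is an assumption of this sketch rather than a statement about `QueryDescriptor` internals.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyClause:
    """Hypothetical clause record: a kind tag plus the arguments it was given."""
    kind: str
    args: tuple


@dataclass(frozen=True)
class ToyQuery:
    """Illustrative descriptor; each fluent method returns a new copy."""
    clauses: tuple[ToyClause, ...] = ()

    def _with(self, kind: str, *args) -> "ToyQuery":
        # Append one clause without mutating the current descriptor.
        return replace(self, clauses=self.clauses + (ToyClause(kind, args),))

    def similar(self, space: str, param: str, weight: float = 1.0) -> "ToyQuery":
        return self._with("similar", space, param, weight)

    def filter(self, expression: str) -> "ToyQuery":
        return self._with("hard_filter", expression)

    def limit(self, count: int) -> "ToyQuery":
        return self._with("limit", count)


base = ToyQuery().similar("description_space", "query_text")
narrowed = base.filter("price < 100").limit(5)

print([clause.kind for clause in narrowed.clauses])
print(len(base.clauses))  # the base query is unchanged by later chaining
```

Returning a fresh object from each call lets a base query be reused as the starting point for several variants without the variants interfering with each other.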
The system provides two primary search methods that can be combined and weighted:
| Method | Purpose | Parameters | Usage |
|---|---|---|---|
| `similar()` | Search using user-provided input | space, param, weight | Text queries, user inputs |
| `with_vector()` | Search using stored item vectors | schema, id_param, weight | Item-to-item recommendations |
The .similar() method transforms user input into query vectors through the specified space:
```python
# From basic_building_blocks.ipynb
query = (
    sl.Query(paragraph_index)
    .find(paragraph)
    .similar(relevance_space, sl.Param("query_text"))
    .select_all()
)

# From netflix semantic search - weighted similarity
movies_query = (
    sl.Query(
        movie_index,
        weights={
            description_space: sl.Param("description_weight"),
            genre_space: sl.Param("genre_weight"),
            recency_space: sl.Param("recency_weight"),
        },
    )
    .find(movie_schema)
    .similar(description_space, sl.Param("description_query"))
    .similar(genre_space, sl.Param("genre_query"))
    .select_all()
    .limit(sl.Param("limit"))
)
```
The .with_vector() method uses existing item vectors for search:
```python
# From e-commerce recommendations
user_query = (
    sl.Query(product_index)
    .find(product_schema)
    .with_vector(product_schema, sl.Param("product_id"))
    .select_all()
    .limit(sl.Param("limit"))
)

# Per-space weighting for with_vector
weight_dict = {text_space: 0.8, category_space: 0.2}
query = (
    sl.Query(index)
    .find(schema)
    .with_vector(schema, "item_id", weight_dict)
    .select_all()
)
```
The Param class enables dynamic query construction with runtime value binding:
| Component | Purpose | Implementation |
|---|---|---|
| `sl.Param("name")` | Parameter placeholder | Runtime value substitution |
| `param.default` | Fallback value | Used when no runtime value provided |
| `param.description` | NLQ context | Enables natural language processing |
| `param.options` | Allowed values | Constrains parameter inputs |
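
How a placeholder with a default and a set of allowed options might resolve at query time can be shown with a small model. `ToyParam` and its `resolve()` method are hypothetical and exist only in this sketch; the attribute names mirror the table above, but the resolution logic is an assumption, not the library's code.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToyParam:
    """Hypothetical placeholder modelled on the attributes in the table above."""
    name: str
    description: Optional[str] = None
    default: Any = None
    options: Optional[frozenset] = None

    def resolve(self, supplied: dict[str, Any]) -> Any:
        """Pick the runtime value if given, else the default, then check options."""
        value = supplied.get(self.name, self.default)
        if self.options is not None and value is not None and value not in self.options:
            raise ValueError(f"{self.name}={value!r} is not one of {sorted(self.options)}")
        return value


genre = ToyParam(
    "genre",
    description="Genre the user is interested in",
    default="drama",
    options=frozenset({"drama", "comedy"}),
)

print(genre.resolve({}))                   # falls back to the default
print(genre.resolve({"genre": "comedy"}))  # runtime value takes priority
```

Passing a value outside `options`, such as `{"genre": "horror"}`, raises `ValueError` in this sketch.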
Each query method generates specific QueryClause objects:
| Clause Type | Generated By | Key Methods | Purpose |
|---|---|---|---|
| `SimilarFilterClause` | `similar()` | `from_param()`, `evaluate()` | Vector similarity search |
| `LooksLikeFilterClause` | `with_vector()` | `from_param()`, `evaluate()` | Item-based search |
| `HardFilterClause` | `filter()` | `from_param()`, `evaluate()` | Exact field filtering |
| `SelectClause` | `select()` | `from_param()`, `evaluate()` | Field selection |
| `LimitClause` | `limit()` | `from_param()`, `get_value()` | Result count limit |
| `RadiusClause` | `radius()` | `from_param()`, `get_value()` | Distance constraint |
Sources: notebook/feature/querying_options.ipynb:125-131, docs/reference/dsl/query/query_descriptor.md:181-198, docs/reference/dsl/query/query_clause.md:316-366
Query execution involves parameter resolution through the QueryParamValueSetter class, which handles parameter binding, NLQ processing, and clause evaluation.
The QueryParamValueSetter coordinates the complete parameter resolution pipeline:
| Stage | Purpose | Implementation |
|---|---|---|
| `append_missing_mandatory_clauses()` | Adds required clauses | `LimitClause`, `RadiusClause`, `SelectClause` |
| `validate_params()` | Validates parameter names | Checks against clause parameters |
| `__alter_query_descriptor()` | Applies parameter values | Updates clause parameters |
| `__calculate_nlq_params()` | Processes NLQ parameters | Calls `NLQHandler.fill_params()` |
| `__calculate_default_params()` | Sets default values | Uses `Param.default` values |
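
The stages in the table can be approximated by a single function over plain dictionaries. This is a simplified sketch under stated assumptions: `resolve_params` and `DEFAULT_LIMIT` are hypothetical, the NLQ stage is omitted because it needs an external language model, and the precedence shown (explicit values first, defaults second) is assumed for illustration rather than taken from the source.

```python
from typing import Any

DEFAULT_LIMIT = 10  # hypothetical fallback used only in this sketch


def resolve_params(declared: dict[str, Any], supplied: dict[str, Any]) -> dict[str, Any]:
    """Resolve query parameters.

    `declared` maps each parameter name used by a clause to its default
    (None when no default exists); `supplied` holds the runtime values.
    """
    # Stage 1: make sure the mandatory limit has a parameter to read from.
    declared = {"limit": DEFAULT_LIMIT, **declared}

    # Stage 2: reject names that no clause declares.
    unknown = sorted(set(supplied) - set(declared))
    if unknown:
        raise ValueError(f"unknown query parameters: {unknown}")

    # Stages 3 and 4: explicit values win, defaults fill whatever is left.
    return {name: supplied.get(name, default) for name, default in declared.items()}


resolved = resolve_params(
    declared={"query_text": None, "text_weight": 1.0},
    supplied={"query_text": "romantic comedy"},
)
print(resolved)
```

Validating names before applying values means a misspelled keyword argument fails loudly instead of being silently ignored.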
Each QueryClause implements parameter evaluation through key methods:
| Method | Purpose | Return Type |
|---|---|---|
| `evaluate()` | Converts clause to executable form | Clause-specific type |
| `alter_param_values()` | Updates parameter values | Modified clause instance |
| `get_param_value_by_param_name()` | Retrieves parameter values | `dict[str, PythonTypes]` |
Sources: framework/src/framework/dsl/query/query_param_value_setter.py:32-91, framework/src/framework/dsl/query/query_descriptor.py:479-520, framework/src/framework/dsl/query/query_clause.py:102-149
The system supports natural language query processing through the NLQClause and NLQHandler classes, which integrate with OpenAI for automated parameter extraction from natural language queries.
The query system implements hard filtering through HardFilterClause with comprehensive comparison operation support:
| Operation Type | Implementation | Usage Example |
|---|---|---|
| Equality | ComparisonOperation.EQUAL | schema.field == "value" |
| Inequality | ComparisonOperation.NOT_EQUAL | schema.field != "value" |
| Numeric comparison | ComparisonOperation.GREATER_THAN | schema.price > 100 |
| List operations | ComparisonOperation.CONTAINS | schema.categories.contains(["tag"]) |
| Combined filters | _Or[SchemaField] | `(field == "a") |
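
The operator-based filter syntax in the table relies on Python operator overloading. The sketch below shows one way such expressions can be captured and evaluated against plain dictionaries. All class names (`Op`, `Comparison`, `AnyOf`, `Field`) are hypothetical, and the "any element matches" semantics chosen for `contains` is an assumption of the example.

```python
import operator
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Op(Enum):
    EQUAL = "eq"
    NOT_EQUAL = "ne"
    GREATER_THAN = "gt"
    CONTAINS = "contains"


_EVALUATORS = {
    Op.EQUAL: operator.eq,
    Op.NOT_EQUAL: operator.ne,
    Op.GREATER_THAN: operator.gt,
    # Assumed semantics: true if any wanted value appears in the stored list.
    Op.CONTAINS: lambda stored, wanted: any(item in stored for item in wanted),
}


@dataclass(frozen=True)
class AnyOf:
    """OR-combination of comparisons."""
    parts: tuple

    def matches(self, row: dict) -> bool:
        return any(part.matches(row) for part in self.parts)

    def __or__(self, other: "Comparison") -> "AnyOf":
        return AnyOf(self.parts + (other,))


@dataclass(frozen=True)
class Comparison:
    """One captured comparison: field name, operation, and operand."""
    field: str
    op: Op
    operand: Any

    def matches(self, row: dict) -> bool:
        return _EVALUATORS[self.op](row[self.field], self.operand)

    def __or__(self, other: "Comparison") -> AnyOf:
        return AnyOf((self, other))


class Field:
    """Hypothetical schema field whose operators build Comparison objects."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> Comparison:  # type: ignore[override]
        return Comparison(self.name, Op.EQUAL, other)

    def __ne__(self, other: Any) -> Comparison:  # type: ignore[override]
        return Comparison(self.name, Op.NOT_EQUAL, other)

    def __gt__(self, other: Any) -> Comparison:
        return Comparison(self.name, Op.GREATER_THAN, other)

    def contains(self, values: list) -> Comparison:
        return Comparison(self.name, Op.CONTAINS, values)


price, category = Field("price"), Field("category")
expensive_or_toy = (price > 100) | (category == "toys")

rows = [
    {"price": 150, "category": "books"},
    {"price": 20, "category": "toys"},
    {"price": 30, "category": "books"},
]
kept = [row for row in rows if expensive_or_toy.matches(row)]
print(kept)
```

The parentheses around each comparison are required because `|` binds more tightly than `>` and `==` in Python.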
The system implements two distinct weighting mechanisms that control different aspects of similarity calculation:
Space weights determine the relative contribution of each vector space to the overall similarity score:
| Component | Purpose | Implementation |
|---|---|---|
| `space_weights(weight_dict)` | Sets inter-space importance | Reweights normalized per-space vectors |
| `SpaceWeightClause` | Individual space weight | `evaluate()` returns `tuple[Space, float]` |
| `get_param_value_for_unset_space_weights()` | Default weight calculation | Uses `DEFAULT_WEIGHT` for unspecified spaces |
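
The effect of space weights on ranking can be seen in a small numeric model. The sketch assumes that each space contributes the cosine similarity of its normalized vectors, scaled by that space's weight, and that unspecified spaces fall back to a default weight of 1.0. Both the scoring formula and the default value are assumptions made for illustration; `weighted_score` is a hypothetical helper.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def weighted_score(
    query_parts: dict[str, list[float]],
    item_parts: dict[str, list[float]],
    weights: dict[str, float],
    default_weight: float = 1.0,  # assumed default for spaces without a weight
) -> float:
    """Sum per-space cosine similarities, each scaled by its space weight."""
    score = 0.0
    for space, query_vector in query_parts.items():
        weight = weights.get(space, default_weight)
        q_unit = normalize(query_vector)
        i_unit = normalize(item_parts[space])
        score += weight * sum(a * b for a, b in zip(q_unit, i_unit))
    return score


query = {"text": [1.0, 0.0], "recency": [0.0, 1.0]}
item_a = {"text": [2.0, 0.0], "recency": [1.0, 0.0]}  # matches on text only
item_b = {"text": [0.0, 3.0], "recency": [0.0, 1.0]}  # matches on recency only

text_heavy = {"text": 1.0, "recency": 0.2}
print(weighted_score(query, item_a, text_heavy), weighted_score(query, item_b, text_heavy))

recency_heavy = {"text": 0.2, "recency": 1.0}
print(weighted_score(query, item_a, recency_heavy), weighted_score(query, item_b, recency_heavy))
```

With the text-heavy weights item A ranks first, and with the recency-heavy weights the order flips, even though the underlying vectors never change.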
Clause weights control how individual clauses contribute to query vector formation within each space:
| Component | Purpose | Implementation |
|---|---|---|
| `similar(weight=0.7)` | Controls single-space clause contribution | Affects query vector construction |
| `with_vector(weight=0.3)` | Controls multi-space clause contribution | Applies across all spaces in index |
| Per-space `with_vector` weights | Fine-grained space control | `{space: weight}` dictionary format |
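
Clause weights act inside a single space, before space weights are applied. One plausible way to model this is a weighted sum of the clause vectors followed by normalization. The sketch below assumes exactly that; `combine_clauses` is a hypothetical helper, and the library may combine clause vectors differently.

```python
import math


def combine_clauses(clauses: list[tuple[list[float], float]]) -> list[float]:
    """Blend (vector, weight) pairs from one space into a unit query vector."""
    dimension = len(clauses[0][0])
    total = [0.0] * dimension
    for vector, weight in clauses:
        for i, value in enumerate(vector):
            total[i] += weight * value
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


# A text clause weighted 0.7 and a stored-item clause weighted 0.3,
# both living in the same two-dimensional space.
text_clause = ([1.0, 0.0], 0.7)
item_clause = ([0.0, 1.0], 0.3)

query_vector = combine_clauses([text_clause, item_clause])
print(query_vector)
```

The resulting vector leans toward the direction of the more heavily weighted clause while staying unit length, so it remains comparable with other normalized vectors in the same space.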
Sources: notebook/feature/querying_options.ipynb:1005-1020, notebook/feature/querying_options.ipynb:494-501, docs/reference/dsl/query/query_descriptor.md:199-227
The system implements comprehensive validation through multiple validator classes:
| Validator | Purpose | Key Methods |
|---|---|---|
| `QueryDescriptorValidator` | Query structure validation | `validate()`, `__validate_schema()` |
| `QueryFilterValidator` | Filter operation validation | `validate_operation_is_supported()` |
| `QueryFilterValidator` | Parameter type validation | `validate_operation_operand_type()` |
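
The two filter checks in the table can be pictured as a lookup from operation to accepted operand types. The functions below borrow the method names from the table but are free-standing simplifications written for this example; the operation names and the type table are hypothetical.

```python
from typing import Any

# Hypothetical mapping from operation name to the operand types it accepts.
SUPPORTED_OPERAND_TYPES: dict[str, tuple[type, ...]] = {
    "equal": (str, int, float, bool),
    "greater_than": (int, float),
    "contains": (list,),
}


def validate_operation_is_supported(operation: str) -> None:
    """Raise if the filter uses an operation the query layer does not know."""
    if operation not in SUPPORTED_OPERAND_TYPES:
        raise ValueError(f"unsupported filter operation: {operation!r}")


def validate_operation_operand_type(operation: str, operand: Any) -> None:
    """Raise if the operand's type does not fit the operation."""
    validate_operation_is_supported(operation)
    allowed = SUPPORTED_OPERAND_TYPES[operation]
    if not isinstance(operand, allowed):
        names = ", ".join(t.__name__ for t in allowed)
        raise TypeError(
            f"{operation!r} expects one of ({names}), got {type(operand).__name__}"
        )


validate_operation_operand_type("greater_than", 100)  # passes silently
try:
    validate_operation_operand_type("greater_than", "cheap")
except TypeError as error:
    print(error)
```

Running both checks while the query is being built lets a badly typed filter fail before any vector search is executed.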
Sources: framework/src/framework/dsl/query/query_descriptor.py:278-477, framework/src/framework/dsl/query/query_clause.py:504-560, framework/src/framework/dsl/query/query_clause.py:173-219
Query results are structured through the QueryResult class and related result classes that provide comprehensive search result information and metadata.
The result system implements structured data through ImmutableBaseModel classes:
| Class | Purpose | Key Fields |
|---|---|---|
| `QueryResult` | Top-level result container | `entries`, `metadata`, `__str__()` |
| `ResultEntry` | Individual search result | `id`, `fields`, `metadata` |
| `ResultEntryMetadata` | Per-result metadata | `score`, `partial_scores`, `vector_parts` |
| `ResultMetadata` | Query-level metadata | `schema_name`, `search_vector`, `search_params` |
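
The nesting of these result classes can be mirrored with frozen dataclasses. This is a sketch only: the real classes derive from `ImmutableBaseModel`, whereas the hypothetical `Toy*` classes below use the standard library and carry just a subset of the fields listed in the table.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any


@dataclass(frozen=True)
class ToyEntryMetadata:
    """Per-result metadata: overall score plus optional per-space parts."""
    score: float
    partial_scores: tuple[float, ...] = ()


@dataclass(frozen=True)
class ToyEntry:
    """One search hit: identifier, selected fields, and its metadata."""
    id: str
    fields: dict[str, Any]
    metadata: ToyEntryMetadata


@dataclass(frozen=True)
class ToyResult:
    """Top-level container holding ranked entries and the queried schema name."""
    entries: tuple[ToyEntry, ...]
    schema_name: str


result = ToyResult(
    entries=(
        ToyEntry("m1", {"title": "First"}, ToyEntryMetadata(0.92, (0.60, 0.32))),
        ToyEntry("m2", {"title": "Second"}, ToyEntryMetadata(0.75, (0.50, 0.25))),
    ),
    schema_name="movie",
)

best = max(result.entries, key=lambda entry: entry.metadata.score)
print(best.id, best.fields["title"], best.metadata.score)

try:
    result.schema_name = "other"  # frozen instances reject reassignment
except FrozenInstanceError:
    print("results are read-only")
```

Keeping results immutable means downstream code can pass them around or cache them without worrying that another consumer has modified the scores.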
The system provides metadata access through the include_metadata() method:
| Component | Purpose | Implementation |
|---|---|---|
| `include_metadata()` | Enables metadata collection | Sets `QueryUserConfig.with_metadata = True` |
| `QueryUserConfig` | Query execution configuration | Controls metadata inclusion |
| `with_metadata` property | Metadata inclusion flag | Used by query execution pipeline |
The system provides multiple ways to process and analyze query results:
```python
# From basic building blocks - pandas conversion
result = app.query(query, query_text="This is a happy person")
df = sl.PandasConverter.to_pandas(result)
print(df[['body', 'id', 'similarity_score']])

# From Netflix search - direct result iteration
result = app.query(movie_query, **query_params)
for entry in result.entries:
    print(f"Title: {entry.fields['title']}")
    print(f"Score: {entry.metadata.score}")
    print(f"Description: {entry.fields['description']}")

# From e-commerce - top results processing
result = app.query(cold_start_query, **params)
top_products = result.entries[:5]
for i, product in enumerate(top_products, 1):
    print(f"{i}. {product.fields['name']} - ${product.fields['price']}")
```
```python
# Enable metadata collection for analysis
query = query.include_metadata()

# Analyze partial scores per space
result = app.query(query, **params)
for entry in result.entries[:3]:
    print(f"\nItem {entry.id}: total_score={entry.metadata.score:.3f}")
    if hasattr(entry.metadata, 'partial_scores'):
        for i, partial in enumerate(entry.metadata.partial_scores):
            print(f"  Space {i} contribution: {partial:.3f}")
```
The `QueryResult.__str__()` method provides formatted result display for debugging:
```python
# Format: "#<rank> id:<entry_id>, object:<entry_fields>"
print(str(result))  # Shows ranked results with IDs and field values
```
Sources: notebook/feature/querying_options.ipynb:1046-1053, docs/reference/dsl/query/result.md:164-176, docs/reference/dsl/query/query_descriptor.md:68-72
The framework supports several common search patterns demonstrated across applications:
```python
# Static parameter binding
app.query(query, query_text="romantic comedy", min_rating=4.0)

# Dynamic weighting
app.query(query, text_weight=0.7, recency_weight=0.3, limit=10)

# Natural language processing
query.with_natural_query("Find recent action movies with high ratings")
```
Sources: notebook/semantic_search_netflix_titles.ipynb:238, notebook/recommendations_e_commerce.ipynb:426, notebook/rag_hr_knowledgebase.ipynb:388
The Index and Query System provides the core search functionality that transforms vector embeddings into actionable search results, supporting both simple similarity searches and complex multi-faceted queries with dynamic parameter binding and comprehensive result metadata.