API Reference¶
This page provides the detailed API documentation for EmbedRAG, automatically generated from the docstrings in the source code.
CLI¶
The command-line interface for managing and running EmbedRAG nodes.
CLI entry point for embedrag writer/query node.
This module provides the primary command-line interface (CLI) for managing EmbedRAG nodes. It uses a sub-command pattern to provide different functionalities such as starting servers, downloading remote snapshots, and performing data migrations between versions.
The embedrag command is the single entry point for both operators (running
production nodes) and developers (migrating data or testing snapshots).
main()
¶
The main entry point for the embedrag command-line tool.
This function parses command-line arguments using argparse and dispatches
to the appropriate sub-command handler.
Available Sub-commands
writer: Starts the Writer Node server for ingestion and indexing. query: Starts the Query Node server for serving search traffic. migrate: Upgrades local data/manifests to the current version. pull: Downloads and extracts snapshots from a remote URL.
Source code in src/embedrag/cli.py
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 | |
Configuration¶
EmbedRAG's configuration system, including writer and query node settings.
Configuration for writer and query nodes, loaded from YAML with env var support.
EmbedRAG uses Pydantic models for configuration, providing automatic validation, type safety, and the ability to override settings via environment variables. The configuration is hierarchically structured into logical groups such as server settings, object store credentials, and search parameters.
Any field ending in _env in the YAML configuration is treated as a pointer
to an environment variable, allowing secrets (like AWS keys) to be managed
outside of the version-controlled configuration files.
DBConfig
¶
Bases: BaseModel
Configuration for the writer's SQLite metadata database.
Attributes:
| Name | Type | Description |
|---|---|---|
path |
str
|
File path to the database. If empty, it's auto-resolved relative to NodeConfig.data_dir. |
wal_autocheckpoint |
int
|
SQLite WAL checkpoint interval in pages. |
cache_size_mb |
int
|
SQLite page cache size in megabytes. |
Source code in src/embedrag/config.py
250 251 252 253 254 255 256 257 258 259 260 261 262 | |
EmbeddingConfig
¶
Bases: BaseModel
Root configuration for all embedding services.
EmbedRAG supports multiple "spaces" (e.g., one for text, one for images).
If the spaces dictionary is empty, the top-level fields are used
to define a single default "text" space.
Attributes:
| Name | Type | Description |
|---|---|---|
service_url |
str
|
Default service URL. |
api_format |
Literal['embedrag', 'openai']
|
Default API format. |
api_key |
str
|
Default API key. |
model |
str
|
Default model name. |
batch_size |
int
|
Default batch size. |
timeout_seconds |
int
|
Default timeout. |
retry_count |
int
|
Default retry count. |
spaces |
dict[str, EmbeddingSpaceConfig]
|
Dictionary mapping space names to their specific configurations. |
Source code in src/embedrag/config.py
288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 | |
get_all_spaces()
¶
Get a list of all configured space names.
Returns:
| Type | Description |
|---|---|
list[str]
|
list[str]: A list of space identifiers. |
Source code in src/embedrag/config.py
341 342 343 344 345 346 347 348 349 | |
get_space_config(space='text')
¶
Get the configuration for a specific embedding space.
If the space is not found in the spaces dictionary, returns a
configuration built from the default top-level fields.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space
|
str
|
The name of the space to retrieve. Defaults to "text". |
'text'
|
Returns:
| Name | Type | Description |
|---|---|---|
EmbeddingSpaceConfig |
EmbeddingSpaceConfig
|
The configuration for the requested space. |
Source code in src/embedrag/config.py
316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 | |
EmbeddingSpaceConfig
¶
Bases: BaseModel
Configuration for a specific embedding space/model.
Attributes:
| Name | Type | Description |
|---|---|---|
service_url |
str
|
The endpoint of the external embedding service. |
api_format |
Literal['embedrag', 'openai']
|
The API protocol used by the service. |
api_key |
str
|
Optional API key for the service. |
model |
str
|
The model identifier to send in the request. |
batch_size |
int
|
Max number of texts to send in a single batch. |
timeout_seconds |
int
|
Request timeout in seconds. |
retry_count |
int
|
Number of retry attempts on network failure. |
Source code in src/embedrag/config.py
265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 | |
HotfixConfig
¶
Bases: BaseModel
Configuration for real-time incremental updates (hotfixes).
Hotfixes allow inserting or deleting documents in the query node's memory between snapshot updates.
Attributes:
| Name | Type | Description |
|---|---|---|
enabled |
bool
|
Whether to enable the hotfix buffer. |
max_vectors |
int
|
The maximum number of vectors to keep in the in-memory hotfix index. Defaults to 10,000. |
Source code in src/embedrag/config.py
161 162 163 164 165 166 167 168 169 170 171 172 173 174 | |
IndexBuildConfig
¶
Bases: BaseModel
Configuration for building FAISS indexes on the writer node.
Attributes:
| Name | Type | Description |
|---|---|---|
num_shards |
int
|
Number of shards to split the index into. |
ivf_nlist |
int
|
Number of IVF centroids to train. |
pq_m |
int
|
Number of PQ sub-vectors. |
train_sample_size |
int
|
Maximum number of vectors to use for training. |
compression |
Literal['zstd', 'none']
|
Compression algorithm for shards. |
compression_level |
int
|
Zstd compression level (1-22). |
Source code in src/embedrag/config.py
352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 | |
IndexConfig
¶
Bases: BaseModel
Configuration for FAISS index loading on the query node.
Attributes:
| Name | Type | Description |
|---|---|---|
num_shards |
int
|
The expected number of shards in the index. |
nprobe |
int
|
Number of IVF cells to visit during search. Higher values increase recall but also increase latency. Defaults to 32. |
mmap |
bool
|
Whether to use memory-mapped loading for FAISS indexes. Required for handling indexes larger than available RAM. Defaults to True. |
Source code in src/embedrag/config.py
122 123 124 125 126 127 128 129 130 131 132 133 134 135 | |
LoggingConfig
¶
Bases: BaseModel
Structured logging configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
level |
str
|
Log level (DEBUG, INFO, WARNING, ERROR). |
format |
Literal['json', 'console']
|
The log output format. Use "json" for production/ELK stacks and "console" for local development. |
access_log |
bool
|
Whether to enable HTTP access logs. |
Source code in src/embedrag/config.py
198 199 200 201 202 203 204 205 206 207 208 209 210 | |
MetricsConfig
¶
Bases: BaseModel
Prometheus metrics configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
enabled |
bool
|
Whether to start the Prometheus metrics exporter. |
port |
int
|
The port to expose the /metrics endpoint on. Defaults to 9090. |
Source code in src/embedrag/config.py
213 214 215 216 217 218 219 220 221 222 | |
NodeConfig
¶
Bases: BaseModel
Basic node identification and role configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
role |
Literal['query', 'writer']
|
The operating mode of the node. |
node_id |
str
|
A unique identifier for this node instance. Use "auto" to use the machine's hostname. |
data_dir |
str
|
The local directory used for storing databases, shards, and temporary build files. |
port |
int
|
Port override. If 0, uses ServerConfig.port. |
Source code in src/embedrag/config.py
225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 | |
ObjectStoreConfig
¶
Bases: BaseModel
Configuration for S3-compatible object storage providers.
This class manages connection details for services like AWS S3, Google Cloud Storage (in S3-compatibility mode), MinIO, and ByteDance TOS. It is used by the writer to upload snapshots and by the query node to download them.
Attributes:
| Name | Type | Description |
|---|---|---|
provider |
Literal['tos', 's3', 'minio']
|
The storage provider type. Defaults to "s3". |
endpoint |
str
|
Custom endpoint URL (e.g., "http://localhost:9000" for MinIO). |
bucket |
str
|
The name of the bucket where snapshots are stored. |
prefix |
str
|
A key prefix (folder) within the bucket for all EmbedRAG data. |
access_key_env |
str
|
Name of the environment variable holding the access key. |
secret_key_env |
str
|
Name of the environment variable holding the secret key. |
region |
str
|
The AWS region identifier (e.g., "us-east-1"). |
Source code in src/embedrag/config.py
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | |
QueryNodeConfig
¶
Bases: BaseModel
The root configuration for an EmbedRAG Query Node.
Attributes:
| Name | Type | Description |
|---|---|---|
node |
NodeConfig
|
Basic node settings. |
object_store |
ObjectStoreConfig
|
Snapshot source settings. |
snapshot |
SnapshotConfig
|
Local snapshot management. |
sync |
SyncConfig
|
Background update settings. |
index |
IndexConfig
|
FAISS loading parameters. |
search |
SearchConfig
|
Retrieval and fusion settings. |
hotfix |
HotfixConfig
|
Real-time update buffer. |
embedding |
EmbeddingConfig
|
Embedding service settings. |
server |
ServerConfig
|
FastAPI server settings. |
logging |
LoggingConfig
|
Logging settings. |
metrics |
MetricsConfig
|
Monitoring settings. |
Source code in src/embedrag/config.py
375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 | |
SearchConfig
¶
Bases: BaseModel
High-level retrieval and ranking configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
default_top_k |
int
|
Default number of results to return if not specified. |
max_top_k |
int
|
Absolute maximum results allowed per request. |
enable_sparse |
bool
|
Whether to include the keyword (BM25) search path. |
enable_hierarchy_expand |
bool
|
Whether to automatically fetch parent context for retrieved chunks. |
context_depth |
int
|
How many levels of parent context to traverse. |
dense_weight |
float
|
Multiplier for dense scores in RRF fusion. |
sparse_weight |
float
|
Multiplier for sparse scores in RRF fusion. |
Source code in src/embedrag/config.py
138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 | |
ServerConfig
¶
Bases: BaseModel
Web server (FastAPI/Uvicorn) configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
host |
str
|
The interface to bind the server to. |
port |
int
|
The port to listen on. |
workers |
int
|
Number of worker processes. Note: For FAISS mmap stability, 1 worker is recommended. |
readiness_delay_seconds |
float
|
Time to wait before reporting the node as ready during startup. |
shutdown_drain_seconds |
int
|
Maximum time to wait for active requests to complete during shutdown. |
Source code in src/embedrag/config.py
177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 | |
SnapshotConfig
¶
Bases: BaseModel
Configuration for local snapshot management on the query node.
Attributes:
| Name | Type | Description |
|---|---|---|
bootstrap_version |
str
|
The version ID to load on startup. Use "latest" to automatically pull the most recent version from the source. |
poll_interval_seconds |
int
|
How often to check for new snapshots when using basic polling. Defaults to 300. |
download_concurrency |
int
|
Max number of concurrent downloads for shards and data files. Defaults to 4. |
download_timeout_seconds |
int
|
Timeout for individual file downloads. Defaults to 600. |
disk_reserve_bytes |
int
|
Minimum free disk space (in bytes) to maintain on the data partition. Defaults to 5GB. |
Source code in src/embedrag/config.py
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 | |
SyncConfig
¶
Bases: BaseModel
Background snapshot synchronization configuration.
When enabled, the query node will periodically poll the source (either object storage or an HTTP server) for newer snapshot versions and automatically perform zero-downtime hot-swaps.
Attributes:
| Name | Type | Description |
|---|---|---|
enabled |
bool
|
Whether to activate background synchronization. |
source |
Literal['object_store', 'http']
|
The metadata source type. |
http_url |
str
|
The base URL for fetching |
cron |
str
|
An optional 5-field cron expression for scheduling checks. |
poll_interval_seconds |
int
|
Interval between checks if cron is not set. |
download_concurrency |
int
|
Max concurrent downloads for new snapshots. |
download_timeout_seconds |
int
|
Timeout for sync downloads. |
Source code in src/embedrag/config.py
96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 | |
WriterNodeConfig
¶
Bases: BaseModel
The root configuration for an EmbedRAG Writer Node.
Attributes:
| Name | Type | Description |
|---|---|---|
node |
NodeConfig
|
Basic node settings. |
object_store |
ObjectStoreConfig
|
Snapshot upload settings. |
db |
DBConfig
|
Metadata database settings. |
embedding |
EmbeddingConfig
|
Embedding service settings. |
index_build |
IndexBuildConfig
|
FAISS build parameters. |
server |
ServerConfig
|
FastAPI server settings (defaults to port 8001). |
logging |
LoggingConfig
|
Logging settings. |
metrics |
MetricsConfig
|
Monitoring settings. |
Source code in src/embedrag/config.py
405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 | |
load_config(path)
¶
Load a configuration file and return the appropriate node config.
The function reads the node.role field to decide whether to return
a QueryNodeConfig or a WriterNodeConfig.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | Path
|
Path to the YAML configuration file. |
required |
Returns:
| Type | Description |
|---|---|
QueryNodeConfig | WriterNodeConfig
|
QueryNodeConfig | WriterNodeConfig: The loaded and validated config. |
Source code in src/embedrag/config.py
436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 | |
load_query_config(path=None)
¶
Load configuration for a query node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | Path
|
Path to the YAML file. If None, returns the default configuration. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
QueryNodeConfig |
QueryNodeConfig
|
The validated query node configuration. |
Source code in src/embedrag/config.py
457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 | |
load_writer_config(path=None)
¶
Load configuration for a writer node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | Path
|
Path to the YAML file. If None, returns the default configuration. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
WriterNodeConfig |
WriterNodeConfig
|
The validated writer node configuration. |
Source code in src/embedrag/config.py
474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 | |
Models¶
Data structures used for communication and internal storage.
API Models¶
Request and response models for the FastAPI endpoints.
Request/Response Pydantic models for both writer and query APIs.
This module defines the data transfer objects (DTOs) that form the API contract for EmbedRAG. These models ensure type safety, provide automatic validation, and generate high-quality OpenAPI/Swagger documentation. They are used for everything from document ingestion and index building to complex hybrid searches and administrative synchronization tasks.
ArchiveRequest
¶
Bases: BaseModel
Parameters for creating a portable snapshot archive.
Attributes:
| Name | Type | Description |
|---|---|---|
format |
str
|
The archive format (e.g., "tar.zst"). |
compression_level |
int
|
Zstd compression level (1-22). Defaults to 3. |
Source code in src/embedrag/models/api.py
129 130 131 132 133 134 135 136 137 138 139 | |
ArchiveResponse
¶
Bases: BaseModel
Location and metadata for a created snapshot archive.
Attributes:
| Name | Type | Description |
|---|---|---|
version |
str
|
The snapshot version that was archived. |
format |
str
|
The archive format used. |
path |
str
|
The relative path to the archive file on the local disk. |
size_bytes |
int
|
The size of the resulting archive file. |
build_time_seconds |
float
|
Time taken to compress and archive. |
Source code in src/embedrag/models/api.py
142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 | |
BuildRequest
¶
Bases: BaseModel
Parameters for triggering a new FAISS index build.
Attributes:
| Name | Type | Description |
|---|---|---|
force_full_rebuild |
bool
|
If True, ignores incremental state and rebuilds the entire index from all documents in the database. Defaults to False. |
Source code in src/embedrag/models/api.py
83 84 85 86 87 88 89 90 91 92 | |
BuildResponse
¶
Bases: BaseModel
Detailed statistics for a successfully completed index build.
Attributes:
| Name | Type | Description |
|---|---|---|
version |
str
|
The new unique version ID for the built generation. |
doc_count |
int
|
The total number of documents included in the index. |
chunk_count |
int
|
The total number of chunks across all documents. |
vector_count |
int
|
The total number of vectors embedded in FAISS. |
num_shards |
int
|
The number of physical index shards created. |
build_time_seconds |
float
|
The wall-clock time taken for the build. |
Source code in src/embedrag/models/api.py
95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 | |
BulkDeleteRequest
¶
Bases: BaseModel
Parameters for deleting documents in bulk.
Attributes:
| Name | Type | Description |
|---|---|---|
doc_ids |
list[str]
|
A specific list of document IDs to delete. |
doc_type |
str
|
If provided, deletes all documents of this type. |
Source code in src/embedrag/models/api.py
246 247 248 249 250 251 252 253 254 255 | |
BulkDeleteResponse
¶
Bases: BaseModel
Summary of a bulk deletion operation.
Attributes:
| Name | Type | Description |
|---|---|---|
deleted_docs |
int
|
The number of documents successfully removed. |
deleted_chunks |
int
|
The total number of chunks removed. |
Source code in src/embedrag/models/api.py
258 259 260 261 262 263 264 265 266 267 | |
ChunkResult
¶
Bases: BaseModel
A single retrieved search result (a chunk of a document).
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
Unique identifier for the chunk. |
doc_id |
str
|
Identifier of the parent document. |
text |
str
|
The text content of this chunk. |
score |
float
|
The final relevance score (e.g., RRF or Cosine). |
level |
int
|
Hierarchical level (0 for base chunks). |
level_type |
str
|
Name of the level (e.g., "chunk", "section"). |
metadata |
dict
|
Metadata associated with the document/chunk. |
parent_text |
str
|
Content of the parent node if context expansion was performed. |
Source code in src/embedrag/models/api.py
370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 | |
ClusterRequest
¶
Bases: BaseModel
Run clustering over the loaded corpus (or a filtered subset).
Source code in src/embedrag/models/api.py
498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 | |
DebugDenseHit
¶
Bases: BaseModel
An intermediate hit from the dense retrieval path.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
Chunk identifier. |
score |
float
|
Original dense similarity score. |
Source code in src/embedrag/models/api.py
696 697 698 699 700 701 702 703 704 705 | |
DebugFusedHit
¶
Bases: BaseModel
Detailed ranking state for a hit after RRF fusion.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
Chunk identifier. |
rrf_score |
float
|
The final calculated RRF score. |
dense_score |
float
|
Score contribution from dense path. |
sparse_score |
float
|
Score contribution from sparse path. |
dense_rank |
int
|
Rank of this chunk in the dense list (-1 if not found). |
sparse_rank |
int
|
Rank of this chunk in the sparse list (-1 if not found). |
Source code in src/embedrag/models/api.py
720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 | |
DebugSearchRequest
¶
Bases: BaseModel
A detailed search request that returns internal ranking state.
Attributes:
| Name | Type | Description |
|---|---|---|
query_text |
str
|
The search query in plain text. |
top_k |
int
|
Number of results. |
filters |
dict
|
Metadata filters. |
expand_context |
bool
|
Whether to fetch adjacent chunks. |
context_depth |
int
|
Context window size. |
mode |
str
|
Search algorithm. |
space |
str
|
Target embedding space. |
Source code in src/embedrag/models/api.py
674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 | |
DebugSearchResponse
¶
Bases: BaseModel
Detailed diagnostic information for a search query.
Attributes:
| Name | Type | Description |
|---|---|---|
query_text |
str
|
The original search query. |
mode |
str
|
The search mode used. |
fts_query |
str
|
The actual FTS5 query string generated. |
embedding_time_ms |
float
|
Embedding latency. |
score_type |
str
|
The fusion algorithm used. |
dense_results |
list[DebugDenseHit]
|
The raw top hits from dense path. |
sparse_results |
list[DebugSparseHit]
|
The raw top hits from sparse path. |
fused_results |
list[DebugFusedHit]
|
Rank details after fusion. |
final_chunks |
list[ChunkResult]
|
The final hydrated results returned. |
timing |
DebugTiming
|
Detailed latency breakdown. |
config_snapshot |
dict
|
A snapshot of search-relevant config settings. |
Source code in src/embedrag/models/api.py
762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 | |
config_snapshot = Field(default_factory=dict)
class-attribute
instance-attribute
¶
Snapshot of relevant configuration settings at query time.
DebugSparseHit
¶
Bases: BaseModel
An intermediate hit from the sparse retrieval path.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
Chunk identifier. |
score |
float
|
Original sparse relevance score. |
Source code in src/embedrag/models/api.py
708 709 710 711 712 713 714 715 716 717 | |
DebugTiming
¶
Bases: BaseModel
Fine-grained breakdown of search latency components.
Attributes:
| Name | Type | Description |
|---|---|---|
embedding_ms |
float
|
Latency of external embedding call. |
dense_ms |
float
|
Latency of FAISS search. |
sparse_ms |
float
|
Latency of SQLite FTS5 search. |
fusion_ms |
float
|
Latency of the fusion algorithm. |
fetch_ms |
float
|
Latency of fetching text from the database. |
expand_ms |
float
|
Latency of hierarchical context expansion. |
total_ms |
float
|
Total end-to-end request time. |
Source code in src/embedrag/models/api.py
740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 | |
DeleteDocumentResponse
¶
Bases: BaseModel
Response confirming document deletion.
Attributes:
| Name | Type | Description |
|---|---|---|
doc_id |
str
|
The identifier of the deleted document. |
chunks_deleted |
int
|
The number of related chunks removed from storage. |
Source code in src/embedrag/models/api.py
160 161 162 163 164 165 166 167 168 169 | |
DocumentDetailResponse
¶
Bases: BaseModel
Full detail view of a single document and its structure.
Attributes:
| Name | Type | Description |
|---|---|---|
doc_id |
str
|
Unique document identifier. |
title |
str
|
Document title. |
source |
str
|
Origin identifier. |
doc_type |
str
|
Category/type. |
metadata |
dict
|
Arbitrary key-value metadata. |
chunk_ids |
list[str]
|
List of all chunk IDs belonging to this document. |
Source code in src/embedrag/models/api.py
226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 | |
DocumentInput
¶
Bases: BaseModel
Input structure for a single document during the ingestion process.
This model represents the raw data that will be chunked, embedded, and stored in the writer's database.
Attributes:
| Name | Type | Description |
|---|---|---|
doc_id |
str
|
A unique global identifier for the document. |
title |
str
|
The title of the document. Defaults to "". |
text |
str
|
The raw text content of the document to be indexed. |
source |
str
|
An identifier for the document's origin (e.g., a URL or file path). Defaults to "". |
doc_type |
str
|
A category for the document (e.g., "wiki", "manual"). Useful for filtering. Defaults to "". |
chunking |
str
|
The chunking strategy to use (e.g., "auto", "character", "none"). Defaults to "auto". |
chunk_size |
int
|
Override for the default number of characters per chunk. |
chunk_overlap |
int
|
Override for the default overlap between adjacent chunks. |
metadata |
dict
|
Arbitrary key-value pairs to store with the document for filtering and display. |
modality |
str
|
The data modality (e.g., "text", "image"). Defaults to "text". |
content_ref |
str
|
A reference to external raw content if the index only stores metadata/vectors. Defaults to "". |
Source code in src/embedrag/models/api.py
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | |
DocumentListResponse
¶
Bases: BaseModel
Paginated list of documents stored on the writer node.
Attributes:
| Name | Type | Description |
|---|---|---|
documents |
list[DocumentSummary]
|
The list of document summaries. |
total |
int
|
The total number of documents in the system. |
limit |
int
|
The pagination limit used. |
offset |
int
|
The pagination offset used. |
Source code in src/embedrag/models/api.py
210 211 212 213 214 215 216 217 218 219 220 221 222 223 | |
DocumentSummary
¶
Bases: BaseModel
A minimal representation of a document for list views.
Attributes:
| Name | Type | Description |
|---|---|---|
doc_id |
str
|
Unique document identifier. |
title |
str
|
Document title. |
source |
str
|
Origin identifier. |
doc_type |
str
|
Category/type. |
created_at |
str
|
ISO timestamp of ingestion. |
Source code in src/embedrag/models/api.py
192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 | |
HealthResponse
¶
Bases: BaseModel
Basic health check status.
Attributes:
| Name | Type | Description |
|---|---|---|
status |
str
|
The node status ("ok", "starting", etc.). |
node_type |
str
|
Either "query" or "writer". |
version |
str
|
The code version of the running node. |
Source code in src/embedrag/models/api.py
465 466 467 468 469 470 471 472 473 474 475 476 | |
HotfixAddRequest
¶
Bases: BaseModel
Request to add a new chunk to the query node's real-time buffer.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
Unique identifier for the new chunk. |
doc_id |
str
|
Identifier for the parent document. |
text |
str
|
The chunk's text content. |
embedding |
list[float]
|
The pre-calculated embedding vector. |
metadata |
dict
|
Key-value metadata for the chunk. |
space |
str
|
The target embedding space. Defaults to "text". |
Source code in src/embedrag/models/api.py
419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 | |
HotfixDeleteRequest
¶
Bases: BaseModel
Request to logically delete chunks from the query node's search results.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_ids |
list[str]
|
List of identifiers to exclude from search results. |
space |
str
|
The target embedding space. Defaults to "text". |
Source code in src/embedrag/models/api.py
439 440 441 442 443 444 445 446 447 448 | |
HotfixResponse
¶
Bases: BaseModel
Confirmation of a real-time hotfix operation.
Attributes:
| Name | Type | Description |
|---|---|---|
operation |
str
|
The operation type ("add" or "delete"). |
affected |
int
|
Number of chunks modified in the buffer. |
buffer_size |
int
|
The new total size of the hotfix buffer for the space. |
Source code in src/embedrag/models/api.py
451 452 453 454 455 456 457 458 459 460 461 462 | |
IngestRequest
¶
Bases: BaseModel
Request payload for the bulk ingestion endpoint.
Attributes:
| Name | Type | Description |
|---|---|---|
documents |
list[DocumentInput]
|
A list of one or more documents to be processed and stored. |
Source code in src/embedrag/models/api.py
58 59 60 61 62 63 64 65 66 | |
IngestResponse
¶
Bases: BaseModel
Success response from the ingestion endpoint.
Attributes:
| Name | Type | Description |
|---|---|---|
ingested |
int
|
The number of documents successfully processed. |
chunk_count |
int
|
The total number of chunks generated and stored. |
doc_ids |
list[str]
|
The list of document IDs that were processed. |
Source code in src/embedrag/models/api.py
69 70 71 72 73 74 75 76 77 78 79 80 | |
MultiSpaceSearchRequest
¶
Bases: BaseModel
Advanced search across multiple model generations or modalities.
Attributes:
| Name | Type | Description |
|---|---|---|
queries |
list[SpaceQuery]
|
List of individual space queries to execute. |
top_k |
int
|
Number of final results. Defaults to 10. |
fusion |
str
|
The fusion algorithm (e.g., "rrf"). |
filters |
dict
|
Metadata filters applied across all spaces. |
expand_context |
bool
|
Whether to fetch adjacent chunks. |
context_depth |
int
|
Adjacent context window size. |
Source code in src/embedrag/models/api.py
588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 | |
MultiSpaceSearchResponse
¶
Bases: BaseModel
The unified results from a multi-space search operation.
Attributes:
| Name | Type | Description |
|---|---|---|
chunks |
list[ChunkResult]
|
The final ranked and fused hits. |
total |
int
|
The total number of hits found. |
per_space |
dict[str, int]
|
Count of matching documents found per space. |
total_time_ms |
float
|
End-to-end request latency. |
Source code in src/embedrag/models/api.py
608 609 610 611 612 613 614 615 616 617 618 619 620 621 | |
PublishResponse
¶
Bases: BaseModel
Metadata for a newly published snapshot generation.
Attributes:
| Name | Type | Description |
|---|---|---|
version |
str
|
The generation version that was published. |
upload_time_seconds |
float
|
Time taken to transfer files to storage. |
snapshot_size_bytes |
int
|
Total size of the published files. |
Source code in src/embedrag/models/api.py
115 116 117 118 119 120 121 122 123 124 125 126 | |
ReadinessResponse
¶
Bases: BaseModel
Detailed probe to determine if the node can serve traffic.
Attributes:
| Name | Type | Description |
|---|---|---|
ready |
bool
|
True if the node is fully initialized (e.g., index loaded). |
active_version |
str
|
The snapshot version currently being served. |
vector_count |
int
|
Total vectors available for search. |
doc_count |
int
|
Total documents available for search. |
Source code in src/embedrag/models/api.py
479 480 481 482 483 484 485 486 487 488 489 490 491 492 | |
RerankRequest
¶
Bases: BaseModel
Input for an external reranking service.
Attributes:
| Name | Type | Description |
|---|---|---|
query |
str
|
The search query text. |
texts |
list[str]
|
The candidate texts to be reranked. |
url |
str
|
Override for the reranker service URL. |
model |
str
|
The model name for reranking. |
Source code in src/embedrag/models/api.py
273 274 275 276 277 278 279 280 281 282 283 284 285 286 | |
RerankResponse
¶
Bases: BaseModel
Result of a cross-encoder reranking operation.
Attributes:
| Name | Type | Description |
|---|---|---|
results |
list[RerankResult]
|
Results sorted by descending score. |
elapsed_ms |
float
|
Time taken for the reranking operation. |
Source code in src/embedrag/models/api.py
301 302 303 304 305 306 307 308 309 310 | |
RerankResult
¶
Bases: BaseModel
A single item from a reranking operation.
Attributes:
| Name | Type | Description |
|---|---|---|
index |
int
|
The index of the text in the original input list. |
score |
float
|
The new relevance score assigned by the reranker. |
Source code in src/embedrag/models/api.py
289 290 291 292 293 294 295 296 297 298 | |
SearchRequest
¶
Bases: BaseModel
Standard vector search request parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
query_embedding |
list[float]
|
The query vector, pre-embedded in the appropriate space. |
query_text |
str
|
The raw query text. Required for hybrid and sparse search modes. |
top_k |
int
|
The number of results to return. Defaults to 10. |
filters |
dict
|
Metadata filters to apply (e.g., |
expand_context |
bool
|
If True, retrieves adjacent chunks to provide broader context for each hit. Defaults to True. |
context_depth |
int
|
Number of surrounding chunks to fetch per result. Defaults to 1. |
space |
str
|
The name of the embedding space to search. Defaults to "text". |
Source code in src/embedrag/models/api.py
316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 | |
SearchResponse
¶
Bases: BaseModel
The standard response containing search hits and performance timing.
Attributes:
| Name | Type | Description |
|---|---|---|
chunks |
list[ChunkResult]
|
The ranked list of matching chunks. |
total |
int
|
The total number of hits matching the query. |
score_type |
str
|
The type of scoring used (e.g., "rrf"). |
embedding_time_ms |
float
|
Time taken to embed the query. |
dense_time_ms |
float
|
Time taken for the dense FAISS search. |
sparse_time_ms |
float
|
Time taken for the FTS5 sparse search. |
fusion_time_ms |
float
|
Time taken to fuse ranked lists. |
total_time_ms |
float
|
Total end-to-end request latency. |
Source code in src/embedrag/models/api.py
395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 | |
SpaceQuery
¶
Bases: BaseModel
A sub-query targeting a specific embedding space.
Attributes:
| Name | Type | Description |
|---|---|---|
space |
str
|
The identifier of the space (e.g., "v1", "v2", "images"). |
query_embedding |
list[float]
|
The pre-calculated vector for this space. |
query_text |
str
|
The raw text for sparse path in this space. |
weight |
float
|
Contribution weight during fusion. Defaults to 1.0. |
Source code in src/embedrag/models/api.py
572 573 574 575 576 577 578 579 580 581 582 583 584 585 | |
StatsResponse
¶
Bases: BaseModel
Comprehensive health and size statistics for the writer node.
Attributes:
| Name | Type | Description |
|---|---|---|
doc_count |
int
|
Total documents in the SQLite database. |
chunk_count |
int
|
Total chunks in the SQLite database. |
embedding_spaces |
list[str]
|
Names of configured embedding spaces. |
vectors_per_space |
dict[str, int]
|
Count of vectors stored for each space. |
current_version |
str
|
The version ID of the last successful build. |
db_size_bytes |
int
|
Size of the SQLite database file on disk. |
Source code in src/embedrag/models/api.py
172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 | |
SyncStatusResponse
¶
Bases: BaseModel
Real-time monitoring information for the snapshot sync process.
Attributes:
| Name | Type | Description |
|---|---|---|
enabled |
bool
|
Whether background sync is active. |
source |
str
|
The source type ("object_store" or "http"). |
cron |
str
|
The cron expression being used for scheduling. |
poll_interval_seconds |
int
|
Polling interval in use. |
last_check_at |
float
|
Unix timestamp of the last check for updates. |
last_sync_at |
float
|
Unix timestamp of the last successful index swap. |
last_result |
str
|
Outcome of the last sync check. |
last_version |
str
|
The snapshot version found during the last check. |
next_check_at |
float
|
Unix timestamp of the next scheduled sync check. |
consecutive_errors |
int
|
Number of failed sync attempts in a row. |
current_version |
str
|
The snapshot version currently being served. |
Source code in src/embedrag/models/api.py
627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 | |
SyncTriggerRequest
¶
Bases: BaseModel
Manual request to trigger a snapshot pull or swap.
Attributes:
| Name | Type | Description |
|---|---|---|
source_url |
str
|
A specific URL to pull a snapshot from, bypassing the configured global source. |
snapshot_dir |
str
|
A local directory path to swap to directly from the filesystem. |
Source code in src/embedrag/models/api.py
657 658 659 660 661 662 663 664 665 666 667 668 | |
TextSearchRequest
¶
Bases: BaseModel
Natural language search request where the node handles embedding.
Attributes:
| Name | Type | Description |
|---|---|---|
query_text |
str
|
The search query in plain text. |
top_k |
int
|
Number of results. Defaults to 10. |
filters |
dict
|
Metadata filters. |
expand_context |
bool
|
Whether to fetch adjacent chunks. |
context_depth |
int
|
Surrounding context window size. |
mode |
str
|
Search algorithm to use ("dense", "sparse", "hybrid"). Defaults to "hybrid". |
space |
str
|
The embedding space to target. Defaults to "text". |
Source code in src/embedrag/models/api.py
345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 | |
Chunking Models¶
Internal models for representing document chunks.
Core document and chunk data models.
This module defines the internal data structures used to represent documents and their hierarchical decompositions within the EmbedRAG system. These models are foundational for the chunking, embedding, and retrieval processes, allowing for a rich, tree-like representation of content.
ChunkNode
dataclass
¶
A single node in the document's hierarchical chunk tree.
EmbedRAG represents documents as trees of chunks. A ChunkNode can represent
anything from the entire document at the root level down to a single
sentence or fixed-size window at the leaf level. This hierarchical structure
enables features like "hierarchical expansion," where small chunks are
retrieved but their parent context (e.g., the containing paragraph or section)
is also returned to the user.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
A globally unique identifier for this specific chunk. |
doc_id |
str
|
The identifier of the parent |
text |
str
|
The text content of this chunk. |
parent_chunk_id |
str
|
The ID of the parent node in the hierarchy. If None, this is a root node. |
level |
int
|
The depth in the tree. Conventionally: 0=document, 1=section, 2=paragraph, 3=leaf chunk. Defaults to 0. |
level_type |
str
|
A descriptive name for the level (e.g., 'chunk', 'section', 'document'). Defaults to 'chunk'. |
seq_in_parent |
int
|
The 0-indexed position of this chunk among its siblings under the same parent. Defaults to 0. |
metadata |
dict
|
Key-value pairs specific to this chunk. |
embedding |
list[float]
|
The pre-calculated vector embedding for this chunk's text. |
children |
list[ChunkNode]
|
A list of child |
Source code in src/embedrag/models/chunk.py
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 | |
Document
dataclass
¶
Represents a complete, logical document ingested into the system.
A Document is the top-level unit of information. It contains the raw
content and global metadata that applies to all of its child chunks.
Attributes:
| Name | Type | Description |
|---|---|---|
doc_id |
str
|
A globally unique identifier for the document. |
title |
str
|
The human-readable title of the document. |
source |
str
|
The origin of the document (e.g., a URL, filepath, or database key). |
doc_type |
str
|
A category or classification for the document (e.g., 'technical_manual', 'news_article'). |
metadata |
dict
|
A dictionary of arbitrary key-value pairs stored with the document. |
Source code in src/embedrag/models/chunk.py
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | |
Manifest¶
Models for the snapshot manifest that coordinates index loading.
Manifest v3: self-describing snapshot metadata with per-space indexes and checksums.
This module defines the schema for the EmbedRAG snapshot manifest. The manifest is the central piece of metadata that coordinates how the query node loads an index, verifying checksums and mapping shards to the correct embedding spaces.
DeltaInfo
dataclass
¶
Information about the difference between this and a previous version.
Attributes:
| Name | Type | Description |
|---|---|---|
from_version |
str
|
The base version this delta is calculated from. |
unchanged_files |
list[str]
|
List of files that haven't changed. |
changed_files |
list[str]
|
List of files that were added or modified. |
delta_compressed_size |
int
|
Total size of the changed files when compressed. |
Source code in src/embedrag/models/manifest.py
87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 | |
FileEntry
dataclass
¶
Metadata for a single non-index file in the snapshot (e.g., SQLite DB).
Attributes:
| Name | Type | Description |
|---|---|---|
file |
str
|
The relative path to the uncompressed file. |
compressed_file |
str
|
The relative path to the compressed version. |
sha256_raw |
str
|
The SHA256 checksum of the uncompressed file. |
sha256_compressed |
str
|
The SHA256 checksum of the compressed file. |
raw_size |
int
|
The size of the uncompressed file in bytes. |
compressed_size |
int
|
The size of the compressed file in bytes. |
doc_count |
int
|
For databases, the total number of documents. |
chunk_count |
int
|
For databases, the total number of chunks. |
Source code in src/embedrag/models/manifest.py
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 | |
IndexInfo
dataclass
¶
Metadata describing the FAISS index for a specific embedding space.
Attributes:
| Name | Type | Description |
|---|---|---|
type |
str
|
The FAISS index factory string (e.g., 'IVF4096,PQ64'). |
dim |
int
|
The dimensionality of the vectors in this index. |
metric |
str
|
The distance metric used (e.g., 'IP' for Inner Product). |
nprobe_default |
int
|
The default nprobe value for search. |
num_shards |
int
|
The number of shards the index is split into. |
total_vectors |
int
|
The total number of vectors across all shards. |
shards |
list[ShardEntry]
|
A list of individual shard metadata. |
Source code in src/embedrag/models/manifest.py
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | |
Manifest
dataclass
¶
The root metadata object for an EmbedRAG snapshot.
Attributes:
| Name | Type | Description |
|---|---|---|
manifest_version |
int
|
The version of the manifest schema itself (currently 3). |
snapshot_version |
str
|
The unique version identifier for this snapshot. |
created_at |
str
|
ISO timestamp of when the snapshot was created. |
previous_version |
str
|
The version ID of the snapshot this one was built upon. |
schema_version |
int
|
The version of the underlying database schema. |
indexes |
dict[str, IndexInfo]
|
Mapping of space names to index metadata. |
db |
FileEntry
|
Metadata for the primary SQLite database. |
id_maps |
dict[str, FileEntry]
|
Mapping of space names to ID mapping files. |
total_raw_size |
int
|
Sum of all uncompressed file sizes. |
total_compressed_size |
int
|
Sum of all compressed file sizes. |
delta |
DeltaInfo
|
Differential information relative to a previous version. |
Source code in src/embedrag/models/manifest.py
104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 | |
spaces
property
¶
list[str]: A list of all embedding space names defined in this manifest.
all_compressed_files()
¶
Return a list of all compressed file paths required for this snapshot.
This method is used by the syncer to determine which files need to be downloaded from the object store.
Returns:
| Type | Description |
|---|---|
list[str]
|
list[str]: A list of relative file paths. |
Source code in src/embedrag/models/manifest.py
142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 | |
get_file_entry_by_raw(raw_path)
¶
Look up a file or shard entry using its uncompressed (raw) path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
raw_path
|
str
|
The relative path to the uncompressed file. |
required |
Returns:
| Type | Description |
|---|---|
FileEntry | ShardEntry | None
|
FileEntry | ShardEntry | None: The matching metadata entry, or None if not found. |
Source code in src/embedrag/models/manifest.py
160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 | |
ShardEntry
dataclass
¶
Represents a single FAISS index shard file within a snapshot.
Attributes:
| Name | Type | Description |
|---|---|---|
file |
str
|
The relative path to the uncompressed shard file. |
compressed_file |
str
|
The relative path to the compressed version of the shard. |
sha256_raw |
str
|
The SHA256 checksum of the uncompressed file. |
sha256_compressed |
str
|
The SHA256 checksum of the compressed file. |
raw_size |
int
|
The size of the uncompressed file in bytes. |
compressed_size |
int
|
The size of the compressed file in bytes. |
num_vectors |
int
|
The number of vectors contained in this shard. |
Source code in src/embedrag/models/manifest.py
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | |
Shared Utilities¶
Core utilities shared across the project.
Object Store¶
Abstraction layer for S3, TOS, and MinIO storage.
Abstraction over S3-compatible object storage (S3, TOS, MinIO).
This module provides a unified interface for interacting with various object storage providers. It is primarily used for uploading snapshots from the writer node and downloading them on the query node for synchronization.
ObjectStoreClient
¶
Thin wrapper around boto3 S3 client for snapshot upload/download.
This client abstracts away provider-specific configurations (like custom endpoints for MinIO or ByteDance TOS) while providing a simplified API for common file and JSON operations.
Source code in src/embedrag/shared/object_store.py
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 | |
__init__(config)
¶
Initialize the ObjectStoreClient.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
ObjectStoreConfig
|
The configuration object containing credentials, bucket name, and provider settings. |
required |
Source code in src/embedrag/shared/object_store.py
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | |
download_file(remote_path, local_path)
¶
Download a file from the object store to the local filesystem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
remote_path
|
str
|
The source path (key) in the bucket, relative to the configured prefix. |
required |
local_path
|
str | Path
|
The local destination path. Parent directories will be created if they don't exist. |
required |
Source code in src/embedrag/shared/object_store.py
70 71 72 73 74 75 76 77 78 79 80 81 82 | |
get_json(remote_path)
¶
Download a JSON file and deserialize it into a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
remote_path
|
str
|
The source path (key) in the bucket. |
required |
Returns:
| Type | Description |
|---|---|
dict | None
|
dict | None: The deserialized dictionary, or None if the key does not exist. |
Source code in src/embedrag/shared/object_store.py
95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 | |
head_object(remote_path)
¶
Retrieve metadata for an object without downloading its content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
remote_path
|
str
|
The path (key) in the bucket. |
required |
Returns:
| Type | Description |
|---|---|
dict | None
|
dict | None: The object metadata, or None if an error occurs (e.g., object not found). |
Source code in src/embedrag/shared/object_store.py
112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 | |
list_prefix(prefix)
¶
List all object keys under a specific prefix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
prefix
|
str
|
The prefix to list objects from. |
required |
Returns:
| Type | Description |
|---|---|
list[str]
|
list[str]: A list of full object keys (including the bucket prefix). |
Source code in src/embedrag/shared/object_store.py
128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 | |
put_json(remote_path, data)
¶
Serialize a dictionary to JSON and upload it to the object store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
remote_path
|
str
|
The destination path (key) in the bucket. |
required |
data
|
dict
|
The dictionary to serialize and upload. |
required |
Source code in src/embedrag/shared/object_store.py
84 85 86 87 88 89 90 91 92 93 | |
upload_file(local_path, remote_path)
¶
Upload a local file to the object store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
local_path
|
str | Path
|
The path to the file on the local filesystem. |
required |
remote_path
|
str
|
The destination path (key) in the bucket, relative to the configured prefix. |
required |
Source code in src/embedrag/shared/object_store.py
58 59 60 61 62 63 64 65 66 67 68 | |
Metrics¶
Prometheus metrics collection and reporting.
Prometheus metrics for both writer and query nodes.
This module defines the Prometheus metrics used to monitor the performance, health, and internal state of EmbedRAG nodes. These metrics can be scraped by a Prometheus server and used for building Grafana dashboards or setting up operational alerts.
Writer Node¶
Components specific to the writer node, which handles ingestion and indexing.
Ingestion & Build¶
The writer's FastAPI application and lifespan management.
Writer node FastAPI application.
This module defines the web application and runtime state management for the EmbedRAG Writer Node. The writer node is responsible for the "write" side of the system: ingesting documents, managing the persistent SQLite database, communicating with external embedding services, and building the final FAISS indexes that are published as snapshots.
WriterState
¶
Holds all runtime state for the writer node.
This class serves as a central registry for shared resources such as
database connection pools and embedding clients. It is initialized once
during the application startup and made available to all API routes
via the app.state object.
Attributes:
| Name | Type | Description |
|---|---|---|
config |
WriterNodeConfig
|
The validated configuration for this node. |
db |
WriterSQLitePool
|
The connection pool to the primary SQLite database. |
embedding_clients |
dict[str, EmbeddingClient]
|
A mapping of embedding space names to their respective API clients. |
build_dir |
Path
|
The local directory where new index versions are built before being published. |
current_version |
str
|
The version ID of the most recently built index. |
Source code in src/embedrag/writer/app.py
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 | |
last_manifest
property
writable
¶
The manifest from the most recent successful build.
__init__(config)
¶
Initialize the writer state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
WriterNodeConfig
|
The writer node configuration. |
required |
Source code in src/embedrag/writer/app.py
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | |
close()
async
¶
Gracefully shut down all database connections and network clients.
This method should be called during the application's shutdown sequence.
Source code in src/embedrag/writer/app.py
93 94 95 96 97 98 99 100 | |
get_embedding_client(space='text')
¶
Retrieve the embedding client for a specific space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space
|
str
|
The identifier of the embedding space. Defaults to "text". |
'text'
|
Returns:
| Name | Type | Description |
|---|---|---|
EmbeddingClient |
EmbeddingClient
|
The client configured for the requested space. |
Raises:
| Type | Description |
|---|---|
KeyError
|
If no client is configured for the given space name. |
Source code in src/embedrag/writer/app.py
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 | |
create_writer_app(config_path=None)
¶
Factory function to create and configure the Writer FastAPI application.
This function sets up the basic FastAPI app, attaches the lifespan manager, and registers all functional routes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str
|
An optional file path to a YAML configuration file. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
FastAPI |
FastAPI
|
The fully configured web application instance. |
Source code in src/embedrag/writer/app.py
129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 | |
writer_lifespan(app)
async
¶
Manages the lifecycle of the Writer FastAPI application.
This context manager handles the startup and shutdown phases. On startup,
it loads the configuration, initializes the WriterState, and sets up
structured logging. On shutdown, it ensures that all resources (DB
connections, clients) are released properly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
app
|
FastAPI
|
The FastAPI application instance. |
required |
Yields:
| Name | Type | Description |
|---|---|---|
None |
AsyncIterator[None]
|
Control is returned to the FastAPI framework to start serving. |
Source code in src/embedrag/writer/app.py
103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 | |
Index Builder¶
The logic for constructing FAISS indexes from document vectors.
FAISS IVF_PQ index builder: training, sharding, and serialization.
This module provides the logic for constructing sharded FAISS indexes from document embeddings. To achieve high performance and support datasets that exceed single-machine memory, EmbedRAG splits the vector space into multiple independent shards. This builder handles the automatic selection of the most appropriate index type (e.g., Flat, IVF, or PQ) based on the dataset size and performs deterministic sharding to ensure consistency.
IndexBuilder
¶
Builds sharded FAISS IVF_PQ indexes from chunk embeddings.
The builder orchestrates the entire index creation pipeline: 1. Distributing chunk IDs and embeddings into shards using consistent hashing. 2. Analyzing the dataset size to select an optimal FAISS index factory string. 3. Training IVF centroids and PQ sub-quantizers if necessary. 4. Building and serializing individual shards to disk. 5. Generating the corresponding ID mapping files.
Source code in src/embedrag/writer/index_builder.py
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 | |
__init__(config, dim=1024)
¶
Initialize the IndexBuilder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
IndexBuildConfig
|
Configuration parameters for the build process, including shard count and quantization settings. |
required |
dim
|
int
|
The dimensionality of the input vectors. Defaults to 1024. |
1024
|
Source code in src/embedrag/writer/index_builder.py
39 40 41 42 43 44 45 46 47 48 49 | |
build(chunk_ids, embeddings, output_dir, space='text')
¶
Build a complete sharded FAISS index for a specific embedding space.
This is the primary entry point for index construction. It takes a flat list of embeddings and their IDs, partitions them, builds the physical shard files, and returns the metadata required for the manifest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chunk_ids
|
list[str]
|
A list of globally unique chunk identifiers. |
required |
embeddings
|
ndarray
|
A 2D float32 numpy array of shape (N, dim) containing the vectors to be indexed. |
required |
output_dir
|
str
|
The root directory where the built files will be stored. |
required |
space
|
str
|
The name of the embedding space (e.g., 'text'). Files will be stored in a sub-folder named after the space. Defaults to "text". |
'text'
|
Returns:
| Type | Description |
|---|---|
tuple[IndexInfo, str]
|
tuple[IndexInfo, str]: A tuple containing: - IndexInfo: A dataclass describing the built index (type, shards, etc.). - str: The filesystem path to the generated ID mapping file. |
Source code in src/embedrag/writer/index_builder.py
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 | |
Storage¶
SQLite-based storage for document text and metadata.
SQLite WAL-mode read/write connection pool for the writer node.
This module provides a robust connection pool for the EmbedRAG writer node, supporting concurrent readers and a single serialized writer using SQLite's Write-Ahead Logging (WAL) mode. It handles schema initialization and efficient storage of documents and chunks.
WriterSQLitePool
¶
Read/write split connection pool with WAL mode for the writer node.
EmbedRAG uses a single-writer, multiple-reader pattern. This pool manages
a single persistent writer connection protected by an asyncio.Lock,
and a queue of read-only connections. WAL mode is used to allow readers
to proceed while a write is in progress.
Source code in src/embedrag/writer/storage.py
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 | |
__init__(db_path, max_readers=4, wal_autocheckpoint=1000, cache_size_mb=64)
¶
Initialize the connection pool.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
db_path
|
str
|
The file path to the SQLite database. |
required |
max_readers
|
int
|
The number of read-only connections to maintain in the pool. Defaults to 4. |
4
|
wal_autocheckpoint
|
int
|
The WAL autocheckpoint interval in pages. Defaults to 1000. |
1000
|
cache_size_mb
|
int
|
The SQLite page cache size in megabytes. Defaults to 64. |
64
|
Source code in src/embedrag/writer/storage.py
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 | |
checkpoint()
¶
Manually trigger a SQLite WAL checkpoint (TRUNCATE).
Source code in src/embedrag/writer/storage.py
138 139 140 | |
cleanup_before_upsert(doc_ids)
async
¶
Remove stale FTS and closure rows for docs about to be re-ingested.
Called before insert_chunks_batch on re-ingest so that FTS5 (which doesn't participate in CASCADE) stays consistent. Returns the number of stale chunk rows cleaned.
Source code in src/embedrag/writer/storage.py
223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 | |
close()
¶
Close all connections in the pool and trigger a final checkpoint.
Source code in src/embedrag/writer/storage.py
142 143 144 145 146 147 148 149 150 151 | |
delete_document(doc_id)
async
¶
Delete a single document and all its associated chunks.
Cascades through closure, FTS, and embedding tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
doc_id
|
str
|
The document identifier to delete. |
required |
Returns:
| Type | Description |
|---|---|
int
|
The number of chunk rows deleted. |
Source code in src/embedrag/writer/storage.py
447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 | |
delete_documents_bulk(doc_ids)
async
¶
Delete multiple documents and their associated chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
doc_ids
|
list[str]
|
The document identifiers to delete. |
required |
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
A tuple of |
Source code in src/embedrag/writer/storage.py
414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 | |
export_query_db(output_path)
¶
Export a lean read-only SQLite database for query nodes.
The exported database excludes the embedding column and includes only the tables required for serving queries (documents, chunks, closure, FTS, schema version).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
output_path
|
str
|
Filesystem path for the exported database. |
required |
Returns:
| Type | Description |
|---|---|
tuple[int, int]
|
A tuple of |
Source code in src/embedrag/writer/storage.py
481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 | |
get_all_chunks_with_embeddings(space='text')
async
¶
Read all (chunk_id, embedding) pairs for a given embedding space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space
|
str
|
The embedding space name (default |
'text'
|
Returns:
| Type | Description |
|---|---|
list[tuple[str, ndarray]]
|
A list of |
Source code in src/embedrag/writer/storage.py
299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 | |
get_chunk_count()
async
¶
Return the total number of chunk rows in the database.
Source code in src/embedrag/writer/storage.py
325 326 327 328 329 | |
get_chunk_ids_for_doc(doc_id)
async
¶
Return all chunk IDs belonging to a document, ordered by seq_in_parent.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
doc_id
|
str
|
The document identifier. |
required |
Returns:
| Type | Description |
|---|---|
list[str]
|
An ordered list of chunk IDs. |
Source code in src/embedrag/writer/storage.py
398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 | |
get_db_size_bytes()
¶
Return the on-disk size of the database file in bytes.
Source code in src/embedrag/writer/storage.py
443 444 445 | |
get_doc_count()
async
¶
Return the total number of document rows in the database.
Source code in src/embedrag/writer/storage.py
331 332 333 334 335 | |
get_doc_ids_by_type(doc_type)
async
¶
Return all document IDs matching a given document type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
doc_type
|
str
|
The document type to filter by. |
required |
Returns:
| Type | Description |
|---|---|
list[str]
|
A list of matching document IDs. |
Source code in src/embedrag/writer/storage.py
430 431 432 433 434 435 436 437 438 439 440 441 | |
get_embedding_spaces()
async
¶
Return all distinct embedding space names in the database.
Returns:
| Type | Description |
|---|---|
list[str]
|
An alphabetically sorted list of space names. |
Source code in src/embedrag/writer/storage.py
315 316 317 318 319 320 321 322 323 | |
get_per_space_vector_counts()
async
¶
Return a mapping of embedding space to vector count.
Returns:
| Type | Description |
|---|---|
dict[str, int]
|
A dict like |
Source code in src/embedrag/writer/storage.py
337 338 339 340 341 342 343 344 345 346 347 | |
insert_closure_batch(relations)
async
¶
Insert closure table entries: (ancestor_id, descendant_id, depth).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
relations
|
list[tuple[str, str, int]]
|
A list of |
required |
Source code in src/embedrag/writer/storage.py
287 288 289 290 291 292 293 294 295 296 297 | |
list_documents(limit=50, offset=0, doc_type=None, source=None)
async
¶
Return a paginated document list and total count matching optional filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
limit
|
int
|
Maximum number of documents per page. |
50
|
offset
|
int
|
Number of documents to skip. |
0
|
doc_type
|
str | None
|
Optional document type filter. |
None
|
source
|
str | None
|
Optional document source filter. |
None
|
Returns:
| Type | Description |
|---|---|
tuple[list[dict], int]
|
A tuple of |
Source code in src/embedrag/writer/storage.py
349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 | |
read_conn()
async
¶
Acquire a read-only connection from the pool.
Yields:
| Type | Description |
|---|---|
AsyncIterator[Connection]
|
sqlite3.Connection: A read-only SQLite connection. |
Source code in src/embedrag/writer/storage.py
106 107 108 109 110 111 112 113 114 115 116 117 | |
write_conn()
async
¶
Acquire the exclusive writer connection.
Yields:
| Type | Description |
|---|---|
AsyncIterator[Connection]
|
sqlite3.Connection: The read-write SQLite connection. |
Source code in src/embedrag/writer/storage.py
119 120 121 122 123 124 125 126 127 | |
write_conn_sync()
¶
Acquire the exclusive writer connection in a synchronous context.
Yields:
| Type | Description |
|---|---|
Connection
|
sqlite3.Connection: The read-write SQLite connection. |
Source code in src/embedrag/writer/storage.py
129 130 131 132 133 134 135 136 | |
Query Node¶
Components specific to the query node, which handles search and retrieval.
Search & Retrieval¶
The query node's FastAPI application and retrieval logic.
Query node FastAPI application with lifespan for bootstrap and shutdown.
This module defines the web application and runtime state management for the EmbedRAG Query Node. The query node is responsible for the "read" side of the system: serving high-concurrency search requests, performing hybrid retrieval (dense + sparse), and automatically synchronizing with new index snapshots in the background.
QueryState
¶
Holds all runtime state for the query node.
This class serves as a central registry for shared resources such as the
GenerationManager (which handles hot-swapping indexes), embedding clients,
and the background synchronization task.
Attributes:
| Name | Type | Description |
|---|---|---|
config |
QueryNodeConfig
|
The validated configuration for this node. |
gen_manager |
GenerationManager
|
The component that manages the lifecycle of loaded FAISS shards and SQLite search connections. |
embedding_clients |
dict[str, EmbeddingClient]
|
A mapping of embedding space names to their respective API clients. |
syncer |
Any
|
The background task (if enabled) that polls for new snapshots. |
Source code in src/embedrag/query/app.py
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 | |
__init__(config)
¶
Initialize the query state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
QueryNodeConfig
|
The query node configuration. |
required |
Source code in src/embedrag/query/app.py
44 45 46 47 48 49 50 51 52 53 54 55 56 | |
close()
async
¶
Gracefully shut down the generation manager and network clients.
This method ensures all index resources (mmap files, DB connections) are released and all pending network requests are cancelled.
Source code in src/embedrag/query/app.py
76 77 78 79 80 81 82 83 84 | |
get_embedding_client(space='text')
¶
Retrieve the embedding client for a specific space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
space
|
str
|
The identifier of the embedding space. Defaults to "text". |
'text'
|
Returns:
| Name | Type | Description |
|---|---|---|
EmbeddingClient |
EmbeddingClient
|
The client configured for the requested space. |
Raises:
| Type | Description |
|---|---|
KeyError
|
If no client is configured for the given space name. |
Source code in src/embedrag/query/app.py
58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | |
create_query_app(config_path=None)
¶
Factory function to create and configure the Query FastAPI application.
This function sets up the basic FastAPI app, attaches the lifespan manager, configures CORS, and registers all functional search and admin routes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str
|
An optional file path to a YAML configuration file. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
FastAPI |
FastAPI
|
The fully configured web application instance. |
Source code in src/embedrag/query/app.py
202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 | |
query_lifespan(app)
async
¶
Manages the complex lifecycle of the Query FastAPI application.
This context manager handles the critical startup sequence:
1. Loading the node configuration.
2. Initializing the QueryState.
3. Bootstrapping the node (downloading/loading the initial snapshot).
4. Starting the background synchronization process if configured.
On shutdown, it ensures that all search resources are released and the syncer task is stopped cleanly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
app
|
FastAPI
|
The FastAPI application instance. |
required |
Yields:
| Name | Type | Description |
|---|---|---|
None |
AsyncIterator[None]
|
Control is returned to the FastAPI framework to start serving. |
Raises:
| Type | Description |
|---|---|
BootstrapError
|
If the node fails to load its initial index snapshot. |
RuntimeError
|
If bootstrap completes without a valid index loaded. |
Source code in src/embedrag/query/app.py
87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 | |
Dense Retrieval¶
FAISS-based vector search implementation.
Dense retriever: parallel shard search with result merging.
This module provides the core vector search functionality for the query node. It manages a pool of FAISS shard workers, dispatches queries to them in parallel, and merges the partial results into a final ranked list.
DenseResult
dataclass
¶
A single hit from the dense vector search.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
The unique identifier of the retrieved chunk. |
score |
float
|
The similarity score (usually inner product/dot product) between the query vector and the chunk's vector. Higher is more similar. |
Source code in src/embedrag/query/retrieval/dense.py
23 24 25 26 27 28 29 30 31 32 33 34 | |
DenseRetriever
¶
High-level dense retrieval interface.
Wraps the ShardManager to provide a clean search API, handling timing
and the filtering of deleted chunks (hotfixes) before returning the final results.
Source code in src/embedrag/query/retrieval/dense.py
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 | |
__init__(shard_manager)
¶
Initialize the DenseRetriever.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
shard_manager
|
ShardManager
|
The active |
required |
Source code in src/embedrag/query/retrieval/dense.py
143 144 145 146 147 148 149 | |
search(query_vector, top_k, deleted_ids=None)
¶
Execute a dense search and filter out logically deleted chunks.
To accommodate filtering without returning fewer results than requested,
this method queries the underlying shards for top_k * 2 results, filters
out any chunk IDs present in deleted_ids, and then truncates to top_k.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query_vector
|
ndarray
|
The query embedding vector. |
required |
top_k
|
int
|
The final number of desired results. |
required |
deleted_ids
|
set[str]
|
An optional set of |
None
|
Returns:
| Type | Description |
|---|---|
tuple[list[DenseResult], float]
|
tuple[list[DenseResult], float]: A tuple containing:
- The list of filtered |
Source code in src/embedrag/query/retrieval/dense.py
151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 | |
ShardManager
¶
Manages multiple FAISS shard workers and dispatches parallel searches.
The index is split into multiple shards during the build phase. This manager
holds references to the loaded ShardWorker instances and uses a thread pool
to execute searches across all shards concurrently, minimizing latency.
Source code in src/embedrag/query/retrieval/dense.py
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 | |
num_shards
property
¶
int: The number of active shards being managed.
total_vectors
property
¶
int: The total number of vectors across all managed shards.
__init__(workers, id_mapper)
¶
Initialize the ShardManager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
workers
|
list[ShardWorker]
|
A list of loaded |
required |
id_mapper
|
IDMapper
|
An |
required |
Source code in src/embedrag/query/retrieval/dense.py
45 46 47 48 49 50 51 52 53 54 55 | |
reconstruct_all()
¶
Reconstruct every stored vector with its chunk id.
Returns (chunk_ids, vectors). Exact for Flat/IVF-Flat shards,
approximate for IVF,PQ. Vectors whose ids cannot be resolved are
skipped.
Source code in src/embedrag/query/retrieval/dense.py
109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 | |
search(query_vector, top_k)
¶
Search all shards in parallel and merge the results.
This method dispatches the query to all workers via a thread pool. Once all
workers return their local top-k results, the lists are concatenated,
sorted globally by score, and truncated to the final top_k.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query_vector
|
ndarray
|
A 1D or 2D float32 numpy array representing the query embedding. If 1D, it will be reshaped to (1, dim). |
required |
top_k
|
int
|
The maximum number of total results to return. |
required |
Returns:
| Type | Description |
|---|---|
list[DenseResult]
|
list[DenseResult]: A list of |
Source code in src/embedrag/query/retrieval/dense.py
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 | |
shutdown()
¶
Shut down the thread pool and release all worker resources.
Source code in src/embedrag/query/retrieval/dense.py
129 130 131 132 133 | |
Sparse Retrieval¶
SQLite FTS5-based keyword search implementation.
Sparse retrieval via SQLite FTS5 trigram index.
This module implements a hybrid keyword search strategy designed for both space-delimited languages (e.g., English) and scriptio-continua languages (e.g., Chinese, Japanese). It uses a tiered approach combining SQLite's FTS5 trigram index for fast BM25-ranked matches and a LIKE-based fallback for short terms and bigrams.
Tiered retrieval strategy
- FTS5 MATCH (primary, fast, BM25-ranked): Uses trigram-based indexing. For scriptio-continua segments, the query is decomposed into sliding windows to handle punctuation breaks.
- LIKE fallback (secondary, slower, length-ranked): Activated for short segments (< 3 characters) and bigrams extracted from long segments to bridge punctuation boundaries.
Requires schema v3 which includes the text_norm column in the FTS table.
SparseResult
dataclass
¶
A single hit from the sparse keyword search.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
The unique identifier of the retrieved chunk. |
score |
float
|
The relevance score. For FTS matches, this is the negative BM25 rank. For LIKE matches, it's a length-based heuristic. |
Source code in src/embedrag/query/retrieval/sparse.py
58 59 60 61 62 63 64 65 66 67 68 69 | |
SparseRetriever
¶
Keyword search via SQLite FTS5 trigram index with optional metadata filters.
This retriever handles the complexities of multilingual keyword search by splitting queries into FTS-eligible segments and short fallback segments.
Source code in src/embedrag/query/retrieval/sparse.py
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 | |
__init__(pool)
¶
Initialize the SparseRetriever.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pool
|
ReadOnlySQLitePool
|
The connection pool to the query node's read-only SQLite database. |
required |
Source code in src/embedrag/query/retrieval/sparse.py
79 80 81 82 83 84 85 86 | |
search(query_text, top_k, filters=None)
¶
Search for chunks using a combination of FTS5 and LIKE fallback.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query_text
|
str
|
The raw keyword query string. |
required |
top_k
|
int
|
The maximum number of results to return. |
required |
filters
|
dict
|
Metadata filters to apply (e.g., |
None
|
Returns:
| Type | Description |
|---|---|
tuple[list[SparseResult], float]
|
tuple[list[SparseResult], float]: A tuple containing: - list[SparseResult]: The merged and ranked search results. - float: The elapsed time in milliseconds. |
Source code in src/embedrag/query/retrieval/sparse.py
88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 | |
Fusion¶
Reciprocal Rank Fusion (RRF) for combining dense and sparse results.
Reciprocal Rank Fusion (RRF) for merging dense and sparse results.
This module provides an implementation of the Reciprocal Rank Fusion algorithm, which is used to combine multiple ranked result lists into a single, unified ranking without requiring score normalization.
FusedResult
dataclass
¶
A single hit from the fused search results.
Attributes:
| Name | Type | Description |
|---|---|---|
chunk_id |
str
|
The unique identifier of the retrieved chunk. |
rrf_score |
float
|
The calculated RRF score for this chunk. |
dense_score |
float
|
The original score from the dense retriever. |
sparse_score |
float
|
The original score from the sparse retriever. |
Source code in src/embedrag/query/retrieval/fusion.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | |
rrf_fuse(dense_results, sparse_results, top_k, k=60, dense_weight=1.0, sparse_weight=1.0)
¶
Merge dense and sparse results using Reciprocal Rank Fusion.
The RRF score for a document is calculated as
RRFscore(d) = sum( weight / (k + rank_i(d)) )
where rank_i(d) is the rank of document d in the i-th ranking list.
RRF is highly effective because it does not require the underlying scores (e.g., dot product for dense and BM25 for sparse) to be on the same scale.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dense_results
|
list[DenseResult]
|
Ranked results from the dense retriever. |
required |
sparse_results
|
list[SparseResult]
|
Ranked results from the sparse retriever. |
required |
top_k
|
int
|
The number of final fused results to return. |
required |
k
|
int
|
The smoothing constant used in the RRF formula. Defaults to 60, which is the value recommended in the original RRF paper. |
60
|
dense_weight
|
float
|
A multiplier for the dense ranking's contribution to the final score. Defaults to 1.0. |
1.0
|
sparse_weight
|
float
|
A multiplier for the sparse ranking's contribution to the final score. Defaults to 1.0. |
1.0
|
Returns:
| Type | Description |
|---|---|
list[FusedResult]
|
list[FusedResult]: A list of |
Source code in src/embedrag/query/retrieval/fusion.py
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 | |
Clustering¶
The standalone, reusable clustering module that powers the embedrag cluster
CLI and the integrated query-node cluster API.
Pipeline & Library API¶
Orchestration and the public cluster_vectors / cluster_items entry points.
End-to-end clustering pipeline.
Wires the stages together: normalize -> (optional) reduce -> select params /
cluster -> evaluate -> explain -> label -> visualize, returning a
ClusterResult. The sync core (cluster_vectors) needs no network; text
vectorization and LLM labeling are layered on top by callers (CLI / HTTP).
apply_llm_labels(result, chat_url, model='', api_key='', language='auto')
async
¶
Replace keyword labels with LLM-generated topic names (in place).
Source code in src/embedrag/cluster/pipeline.py
102 103 104 105 106 107 108 109 110 111 112 | |
cluster_vectors(vectors, items, *, algorithm='auto', reduce='auto', n_components=0, auto=True, params=None, top_keywords=10, top_reps=5, ground_truth=None, run_id=None, space='text', source='')
¶
Cluster a matrix of vectors and return a fully explained result.
This is the pure, synchronous core (no embedding service, no LLM calls).
Source code in src/embedrag/cluster/pipeline.py
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 | |
Sources¶
Loading vectors and items from files, .npy, a writer DB, or TF-IDF.
Vector + item acquisition for clustering.
Supports several co-equal sources so the tool works standalone or against the embedRAG vector store:
- Files:
.jsonl/.csvof{id, text, [embedding]}, or a.npymatrix of precomputed vectors. - Passed-in python objects: a list of texts, or a numpy/array of vectors.
- A writer SQLite DB: exact vectors from
chunk_embeddings+ text fromchunks(with optional filters). - A loaded query-node generation: vectors reconstructed from the FAISS index (exact for Flat/IVF-Flat, approximate for IVF,PQ).
When no vectors are available, callers fall back to embedding the text via an embedding service, or to a local TF-IDF representation (no service needed).
items_from_texts(texts, ids=None)
¶
Build items from a list of texts.
Source code in src/embedrag/cluster/source.py
120 121 122 123 124 | |
load_items_from_file(path, text_field='text', id_field='id', embedding_field='embedding')
¶
Load items (and optional inline embeddings) from a .jsonl or .csv file.
Returns (items, vectors_or_None). vectors is returned only when
every row carries an embedding under embedding_field.
Source code in src/embedrag/cluster/source.py
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 | |
load_vectors_npy(path, items=None)
¶
Load a .npy matrix of vectors; synthesize ids if no items given.
Source code in src/embedrag/cluster/source.py
108 109 110 111 112 113 114 115 116 117 | |
read_writer_db(db_path, space='text', filters=None, limit=None)
¶
Read exact vectors + text from a writer DB's chunk_embeddings table.
filters may contain doc_type and/or doc_id to restrict the set.
Raises if the DB has no populated chunk_embeddings table.
Source code in src/embedrag/cluster/source.py
127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 | |
tfidf_vectors(texts, max_features=4096)
¶
Build a dense TF-IDF representation for the no-embedding-service path.
Uses char n-grams so it works for CJK text without word segmentation.
Source code in src/embedrag/cluster/source.py
186 187 188 189 190 191 192 193 194 195 196 197 198 199 | |
Algorithms¶
Pluggable clustering backends behind a single interface.
Pluggable clustering backends behind a single interface.
Each backend takes preprocessed (normalized, optionally reduced) vectors and
produces integer labels (-1 == noise) plus optional per-point membership
probabilities. Backends also declare which visualization panels make sense for
them, so the UI/exports can adapt per algorithm.
AgglomerativeBackend
¶
Bases: ClusterBackend
Hierarchical (Ward); supports a dendrogram and threshold/K cut.
Source code in src/embedrag/cluster/algorithms.py
122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 | |
ClusterAssignment
dataclass
¶
Output of a clustering backend.
Source code in src/embedrag/cluster/algorithms.py
29 30 31 32 33 34 35 | |
ClusterBackend
¶
Base interface for a clustering algorithm.
Source code in src/embedrag/cluster/algorithms.py
38 39 40 41 42 43 44 45 46 47 48 49 50 51 | |
DBSCANBackend
¶
Bases: ClusterBackend
Density-based with explicit eps; good for the no-embedding/TF-IDF path.
Source code in src/embedrag/cluster/algorithms.py
78 79 80 81 82 83 84 85 86 87 88 89 90 | |
HDBSCANBackend
¶
Bases: ClusterBackend
Density-based, auto cluster count, native noise + membership probability.
Source code in src/embedrag/cluster/algorithms.py
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 | |
KMeansBackend
¶
Bases: ClusterBackend
Centroid-based (spherical via normalized inputs). Scales via MiniBatch.
Source code in src/embedrag/cluster/algorithms.py
93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 | |
LeidenBackend
¶
Bases: ClusterBackend
Community detection on a FAISS kNN graph (optional deps).
Source code in src/embedrag/cluster/algorithms.py
157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 | |
available_algorithms()
¶
Names of all registered clustering backends.
Source code in src/embedrag/cluster/algorithms.py
188 189 190 | |
make_backend(name, **params)
¶
Instantiate a clustering backend by name.
Source code in src/embedrag/cluster/algorithms.py
193 194 195 196 197 | |
Evaluation¶
Internal/external metrics and the automatic parameter sweep.
Evaluation harness and automatic parameter selection.
This is the "honesty layer": every run reports internal quality metrics so the
result can be judged rather than blindly trusted. --auto sweeps the key
parameter for an algorithm and returns the full score curve, and the auto
algorithm compares backends on a composite score.
composite_score(metrics)
¶
Single objective for model selection: silhouette penalized by noise.
Source code in src/embedrag/cluster/evaluate.py
83 84 85 86 87 88 | |
external_metrics(labels, ground_truth)
¶
Compare predicted labels to ground-truth labels (when available).
Source code in src/embedrag/cluster/evaluate.py
67 68 69 70 71 72 73 74 75 76 77 78 79 80 | |
internal_metrics(vectors, labels, sample=5000)
¶
Compute clustering quality metrics that need no ground truth.
Silhouette / Davies-Bouldin / Calinski-Harabasz are computed on non-noise
points only. Returns a dict including n_clusters and noise_ratio.
Source code in src/embedrag/cluster/evaluate.py
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | |
select_params(vectors, algorithm, overrides=None, auto=True)
¶
Pick the best parameters for an algorithm (or compare backends for 'auto').
Returns (chosen_algorithm, assignment, chosen_params, sweep) where
sweep is the list of evaluated candidates (the score curve).
Source code in src/embedrag/cluster/evaluate.py
91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 | |
Explainability¶
c-TF-IDF keywords, medoids, cohesion/separation, and attribution.
Cluster explainability: keywords, representatives, stats, attribution.
Produces the human-facing description of each cluster: - distinctive keywords via class-based TF-IDF (c-TF-IDF), - medoid example texts (points nearest the centroid), - cohesion (mean cosine to centroid) and separation (nearest other centroid), - an inter-cluster similarity matrix, - and per-point "why this cluster" attribution.
compute_centroids(vectors, labels)
¶
Mean (then renormalized) vector per non-noise cluster.
Source code in src/embedrag/cluster/explain.py
26 27 28 29 30 31 32 33 34 35 36 37 38 | |
ctfidf_keywords(texts, labels, top_n=10)
¶
Distinctive keywords per cluster via class-based TF-IDF (c-TF-IDF).
Each cluster is treated as a single document. Char bigrams are used for CJK-heavy corpora (no word segmentation needed), words otherwise.
Source code in src/embedrag/cluster/explain.py
145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 | |
explain(vectors, labels, probabilities, items, top_keywords=10, top_reps=5)
¶
Build per-cluster info, per-member attribution, and similarity matrix.
Source code in src/embedrag/cluster/explain.py
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 | |
Persistence¶
Side-file storage for cluster runs.
Side-file persistence for cluster runs.
A cluster run is stored as a single JSON file under
<data_dir>/cluster_runs/<run_id>.json. This deliberately avoids any
snapshot DB schema change: runs can be created, listed, read, and deleted
independently of the index lifecycle.
delete_run(data_dir, run_id)
¶
Delete a run file; returns True if it existed.
Source code in src/embedrag/cluster/store.py
53 54 55 56 57 58 59 | |
list_runs(data_dir)
¶
List run summaries (without the full member/projection payload).
Source code in src/embedrag/cluster/store.py
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 | |
load_run(data_dir, run_id)
¶
Load a single run by id, or None if missing.
Source code in src/embedrag/cluster/store.py
44 45 46 47 48 49 50 | |
make_run_id(prefix='run')
¶
Generate a sortable, unique run id.
Source code in src/embedrag/cluster/store.py
30 31 32 | |
runs_dir(data_dir)
¶
Return (and create) the cluster-runs directory under data_dir.
Source code in src/embedrag/cluster/store.py
23 24 25 26 27 | |
save_run(data_dir, result)
¶
Persist a cluster run; returns the file path.
Source code in src/embedrag/cluster/store.py
35 36 37 38 39 40 41 | |