Skip to content

Configuration Reference

EmbedRAG uses YAML configuration files with Pydantic validation. All fields have sensible defaults; a minimal config only needs node.role set.

Loading

embedrag writer --config config/writer_node.yaml
embedrag query  --config config/query_node.yaml

Without --config, the node starts with all defaults. The config determines node type via node.role.

Shared Sections

These sections appear in both writer and query configs.

node

node:
  role: query          # "query" or "writer" (required distinction)
  node_id: auto        # hostname if "auto", otherwise a custom string
  data_dir: /data/embedrag  # root for all local data (snapshots, db, builds)

object_store

object_store:
  provider: s3         # "s3", "tos", or "minio"
  endpoint: ""         # custom endpoint (required for minio/tos)
  bucket: embedrag-data
  prefix: snapshots/   # key prefix inside the bucket
  access_key_env: AWS_ACCESS_KEY_ID    # env var name for access key
  secret_key_env: AWS_SECRET_ACCESS_KEY # env var name for secret key
  region: us-east-1

Credentials are never stored in the YAML. The *_env fields name environment variables that are read at runtime.

server

server:
  host: 0.0.0.0
  port: 8000           # 8001 for writer
  workers: 1           # uvicorn workers (keep 1 for FAISS mmap)
  readiness_delay_seconds: 0    # delay before reporting ready
  shutdown_drain_seconds: 30    # max wait for in-flight queries

logging

logging:
  level: INFO          # DEBUG, INFO, WARNING, ERROR
  format: json         # "json" (structured) or "console" (human-readable)
  access_log: true     # log each HTTP request

metrics

metrics:
  enabled: true
  port: 9090           # Prometheus scrape port

Writer-Only Sections

db

db:
  path: ""             # auto-resolved to {data_dir}/db/writer.db if empty
  wal_autocheckpoint: 1000  # WAL checkpoint interval (pages)
  cache_size_mb: 64         # SQLite page cache

index_build

index_build:
  num_shards: 4              # FAISS index shards for parallelism
  ivf_nlist: 4096            # IVF cluster count (for datasets > 1K vectors)
  pq_m: 64                  # PQ subquantizer count (for datasets > 50K)
  train_sample_size: 500000  # max vectors for IVF training
  compression: zstd          # "zstd" or "none"
  compression_level: 3       # zstd level (1-19, higher = smaller + slower)

Index type is auto-selected based on dataset size: - < 1,000 vectors: IndexFlatIP - 1,000 - 50,000: IVF{nlist},Flat - > 50,000: IVF{nlist},PQ{pq_m}

Shared Between Writer and Query

embedding

embedding:
  service_url: http://embedding-service:8080/embed
  api_format: embedrag    # "embedrag" or "openai"
  api_key: ""            # Bearer token for OpenAI format
  model: ""              # model name for OpenAI format
  batch_size: 64         # vectors per API call
  timeout_seconds: 30    # per-batch timeout
  retry_count: 3         # retries on failure

On the writer, this is used during /ingest to embed chunk texts. On the query node, it powers the /search/text endpoint and the debug panel.

Two API formats are supported:

Format Request Body Response Body
embedrag (default) {"texts": ["..."]} {"embeddings": [[0.1, ...]]}
openai {"input": ["..."], "model": "..."} {"data": [{"embedding": [...], "index": 0}]}

OpenAI format example:

embedding:
  service_url: https://api.openai.com/v1/embeddings
  api_format: openai
  api_key: sk-your-api-key
  model: text-embedding-3-large
  batch_size: 64
  timeout_seconds: 30
  retry_count: 3

This works with the official OpenAI API, Azure OpenAI, and any OpenAI-compatible embedding provider (e.g., vLLM, Ollama, LiteLLM).

Multi-Space Embedding (multi-modal)

To use different embedding models for different modalities (text, image, audio, etc.), use the spaces dict:

embedding:
  spaces:
    text:
      service_url: http://text-embed:8080/v1/embeddings
      api_format: openai
      model: text-embedding-large
      batch_size: 64
    image:
      service_url: http://clip-embed:8081/v1/embeddings
      api_format: openai
      model: clip-vit-large
      batch_size: 32
      timeout_seconds: 60

Each space has its own EmbeddingSpaceConfig with the same fields as the top-level embedding block. When spaces is set, the top-level fields are ignored.

When spaces is omitted (the default), the top-level fields are treated as a single "text" space — fully backward compatible.

See docs/multi-modal.md for the full multi-modal architecture guide.

Query-Only Sections

snapshot

snapshot:
  bootstrap_version: latest     # "latest" or a specific version string
  poll_interval_seconds: 300    # how often to check for new snapshots
  download_concurrency: 4       # parallel file downloads
  download_timeout_seconds: 600 # per-download timeout
  disk_reserve_bytes: 5368709120  # 5GB reserve after download

sync

Controls background snapshot synchronization. When enabled, the query node periodically checks a remote source for new snapshot versions and hot-swaps automatically.

sync:
  enabled: false                  # set to true to activate background sync
  source: object_store            # "object_store" or "http"
  http_url: ""                    # base URL when source=http
  cron: ""                        # 5-field cron expression (e.g. "0 */2 * * *")
  poll_interval_seconds: 300      # interval between checks when cron is empty
  download_concurrency: 4         # parallel file downloads
  download_timeout_seconds: 600   # per-download timeout

Source types:

Source Description
object_store Uses the object_store config section (S3/MinIO/TOS)
http Downloads from sync.http_url (any HTTP/HTTPS static file server or CDN)

Scheduling behavior:

  • When cron is empty (default): Uses fixed interval polling every poll_interval_seconds
  • When cron is set: Uses cron-based scheduling (takes priority over poll_interval_seconds)
  • The croniter library is used for cron expression parsing

Manual trigger: POST /admin/sync triggers a one-off sync. Pass {"source_url": "https://..."} to sync from an arbitrary URL regardless of the configured source.

Status: GET /admin/sync/status returns last check time, last result, next scheduled check, and error counts.

HTTP source example:

sync:
  enabled: true
  source: http
  http_url: "https://my-cdn.example.com/embedrag/snapshots/"
  cron: "0 */2 * * *"

S3 source example:

sync:
  enabled: true
  source: object_store
  cron: "*/10 * * * *"
object_store:
  provider: s3
  bucket: my-rag-bucket
  prefix: snapshots/

index

index:
  num_shards: 4    # must match what the writer built
  nprobe: 32       # FAISS search probes (higher = slower + more accurate)
  mmap: true       # memory-map FAISS indexes (recommended for 4C16G)
search:
  default_top_k: 10           # default results if not specified in request
  max_top_k: 100              # hard cap on results per query
  enable_sparse: true          # enable FTS5 BM25 keyword search
  enable_hierarchy_expand: true # attach parent chunk text to results
  context_depth: 1             # how many levels up to expand

hotfix

hotfix:
  enabled: true
  max_vectors: 10000   # max chunks in the emergency write buffer

The hotfix buffer is an in-memory FAISS IndexFlatIP on the query node. Writes are merged into search results and cleared when the next snapshot loads.

Full Examples

See config/writer_node.yaml.example and config/query_node.yaml.example for complete annotated examples.

Environment Variable Resolution

Any config field ending in _env is treated as an environment variable name:

object_store:
  access_key_env: MY_CUSTOM_KEY  # reads os.environ["MY_CUSTOM_KEY"]

Next Steps