Embeddings

Embeddings are optional but important for semantic retrieval. Zaxy supports a deterministic local hash provider and an OpenAI-compatible hosted provider. Both produce fixed-size vectors for extracted entities and for query text. Neo4j vector search over those vectors then participates in result fusion.
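
This page does not say which fusion method Zaxy applies. As an illustration only, reciprocal rank fusion (RRF) is one common way to merge a vector ranking with another ranking. The function name, the constant k=60, and the example IDs below are assumptions, not Zaxy internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one (illustrative, not Zaxy's code)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; items near the top of any list contribute more.
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists from a vector search and a lexical search.
vector_hits = ["a", "b", "c"]
lexical_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, lexical_hits])
```

An item that appears in both lists ("a" and "b" here) outranks items that appear in only one, which is the practical effect of letting vector search participate in fusion.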

The hash provider is designed for tests, offline development, and deterministic behavior. It does not provide high-quality semantic meaning, but it allows vector code paths to run without network access or secrets. This is valuable for CI and for local contributors who only need to verify mechanics.
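
To make "deterministic but not semantic" concrete, here is a minimal sketch of a hash-based embedding. It is not Zaxy's implementation: the function name, the whitespace tokenization, and the default dimension of 64 are assumptions chosen for illustration.

```python
import hashlib
import math


def hash_embedding(text: str, dimension: int = 64) -> list[float]:
    """Map text to a fixed-size vector using only SHA-256 (illustrative)."""
    vector = [0.0] * dimension
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first 4 bytes pick the bucket, the next byte picks the sign.
        bucket = int.from_bytes(digest[:4], "big") % dimension
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    # Normalize to unit length; input with no tokens stays all zeros.
    return [v / norm for v in vector] if norm else vector


first = hash_embedding("graph memory store")
second = hash_embedding("graph memory store")
```

The same text always yields the same vector with no network call or API key, which is what CI needs. Texts that share tokens land in shared buckets, but synonyms do not, so similarity reflects word overlap rather than meaning.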

For the local-first profile:

zaxy local-profile
zaxy local-profile --output .env.local
zaxy local-profile --check

The generated profile uses EMBEDDING_PROVIDER=hash, RERANKER_PROVIDER=lexical, and NEO4J_AUTO_START=true. It intentionally omits hosted API key variables so the default local path stays offline and deterministic.

The hosted provider is selected with:

EMBEDDING_ENABLED=true
EMBEDDING_PROVIDER=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY_FILE=secrets/openai_api_key.txt

OPENAI_BASE_URL can point at any OpenAI-compatible embeddings endpoint. Keep EMBEDDING_DIMENSION aligned with both the model's output size and the Neo4j vector index dimension. If the dimension changes, rebuild the vector index and replay affected events so entity vectors are regenerated consistently.
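
A dimension mismatch is easiest to catch at the point where the provider response is parsed. The sketch below assumes the standard OpenAI embeddings response shape (a `data` list whose items carry `index` and `embedding`); the function name and error message are hypothetical, not part of Zaxy. For reference, OpenAI documents text-embedding-3-small as returning 1536 dimensions by default.

```python
import json


def extract_vectors(response_body: str, expected_dimension: int) -> list[list[float]]:
    """Parse an OpenAI-style embeddings response and verify vector sizes."""
    payload = json.loads(response_body)
    # Sort by index so vectors line up with the order of the inputs sent.
    items = sorted(payload["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dimension:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"expected {expected_dimension}"
            )
    return vectors


# A tiny hand-written response with 2-dimensional vectors, for illustration.
sample_body = json.dumps(
    {
        "data": [
            {"index": 1, "embedding": [0.0, 1.0]},
            {"index": 0, "embedding": [1.0, 0.0]},
        ]
    }
)
vectors = extract_vectors(sample_body, expected_dimension=2)
```

Failing fast here keeps wrongly sized vectors from reaching the vector index, where the mismatch is harder to diagnose.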

Secrets should be supplied through OPENAI_API_KEY_FILE in production. Setting OPENAI_API_KEY directly is convenient for local testing, but the value can leak through process inspection or shell history. See security.md and configuration.md.
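
The file-based pattern can be sketched as follows. The function name and the precedence (file first, then the direct variable) are assumptions for illustration; check configuration.md for how Zaxy actually resolves the key.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the API key, preferring a key file over a direct variable."""
    env = os.environ if env is None else env
    key_file = env.get("OPENAI_API_KEY_FILE")
    if key_file:
        # Strip the trailing newline most editors and `echo` append.
        return Path(key_file).read_text(encoding="utf-8").strip()
    # Assumed fallback for local testing only; empty values count as unset.
    return env.get("OPENAI_API_KEY") or None
```

Reading from a file keeps the secret out of the process environment and out of shell history; only the file path is visible to process inspection.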

Embedding generation happens after extraction. The provider receives structured entity text, not arbitrary raw payload dumps. This reduces the chance of embedding secrets and improves result quality by keeping the vector input close to the graph fact being stored.

If hosted embedding calls fail, treat the event log as the recovery source. Fix the configuration, replay the Eventloom log, and rebuild graph projections. Do not manually patch vectors in Neo4j unless you are doing a controlled maintenance operation documented in operations.md.

Related pages: retrieval.md, graph-schema.md, deployment.md, and README.md. The public site summary is site/index.html.