ZMS LLM API utility module
This module provides an abstract interface for Large Language Model providers. All providers follow the OpenAI /v1/chat/completions API schema for consistency.
Supported providers:
- OpenAI (gpt-4, gpt-3.5-turbo, etc.)
- Ollama (local deployment)
- RAG with Qdrant vector database
Configuration properties:
- llm.provider: 'openai', 'ollama', or 'rag' (default: 'openai')
- llm.api.key: API key for OpenAI (if provider is 'openai')
- llm.api.model: Model name (default: 'gpt-4o-mini' for OpenAI, 'llama2' for Ollama)
- llm.api.endpoint: Custom endpoint URL
- llm.ollama.host: Ollama host (default: 'http://localhost:11434')
- llm.qdrant.host: Qdrant host (default: 'http://localhost:6333')
- llm.qdrant.collection: Qdrant collection name (default: 'zms_docs')
- llm.embedding.model: SentenceTransformer model (default: 'all-MiniLM-L6-v2')
- llm.rag.top_k: Number of documents to retrieve (default: '3')
- llm.rag.score_threshold: Minimum similarity score (0.0-1.0, default: '0.0')
- llm.temperature: LLM temperature 0.0-2.0 (default: '0.7', RAG: 0.1 recommended)
- llm.top_p: Nucleus sampling 0.0-1.0 (default: '0.9')
- llm.max_tokens: Maximum tokens to generate (optional)
- llm.num_ctx: Context window size (default: '4096')
- llm.store: Enable storage for 'responses' API (default: False)
- llm.timeout: Timeout for LLM responses in seconds (default: '120')
- llm.rag.timeout: Timeout for RAG retrieval in seconds (default: '10')
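As an illustration of how the properties and defaults listed above might be combined, here is a minimal sketch. The names `DEFAULTS`, `MODEL_DEFAULTS`, and `resolve_settings` are hypothetical and not part of this module, and a plain dict stands in for the ZMS configuration store.

```python
# Hypothetical defaults table, copied from the property list above.
DEFAULTS = {
    "llm.provider": "openai",
    "llm.rag.top_k": "3",
    "llm.rag.score_threshold": "0.0",
    "llm.temperature": "0.7",
    "llm.top_p": "0.9",
    "llm.num_ctx": "4096",
    "llm.timeout": "120",
    "llm.rag.timeout": "10",
}

# Per-provider default for llm.api.model as documented above.
# No model default is documented for 'rag', so none is assumed here.
MODEL_DEFAULTS = {"openai": "gpt-4o-mini", "ollama": "llama2"}


def resolve_settings(conf):
    """Merge stored properties (a plain dict in this sketch) with the defaults."""
    def get(key):
        return conf.get(key, DEFAULTS.get(key))

    provider = get("llm.provider")
    settings = {
        "provider": provider,
        "model": conf.get("llm.api.model", MODEL_DEFAULTS.get(provider)),
        # Properties are stored as strings, so convert them explicitly.
        "temperature": float(get("llm.temperature")),
        "top_p": float(get("llm.top_p")),
        "num_ctx": int(get("llm.num_ctx")),
        "timeout": int(get("llm.timeout")),
        "top_k": int(get("llm.rag.top_k")),
        "score_threshold": float(get("llm.rag.score_threshold")),
    }
    # llm.max_tokens is optional: include it only when configured.
    if "llm.max_tokens" in conf:
        settings["max_tokens"] = int(conf["llm.max_tokens"])
    return settings
```

Calling `resolve_settings({"llm.provider": "ollama", "llm.temperature": "0.1"})` would pick 'llama2' as the model and keep the remaining values at their documented defaults.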
Response format (OpenAI /v1/chat/completions compatible):
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
For backwards compatibility, a convenience property 'message' is also provided at the top level containing the first choice's message.
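The convenience property could be derived from the first choice roughly as follows. This is a sketch only; `add_message_shortcut` is a hypothetical helper name, and the module may attach the property differently.

```python
def add_message_shortcut(response):
    """Return a shallow copy of the response with a top-level 'message' key.

    The key mirrors choices[0]["message"]; it is left out when the
    response has no choices.
    """
    result = dict(response)
    choices = result.get("choices") or []
    if choices:
        result["message"] = choices[0].get("message")
    return result
```

Older callers can then keep reading `result["message"]["content"]`, while newer code uses `result["choices"][0]["message"]["content"]`.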
Requirements for RAG:
- pip install sentence-transformers
- pip install qdrant-client
License: GNU General Public License v2 or later, Organization: ZMS Publishing
| Class | | Abstract base class for LLM providers |
| Class | | Ollama local LLM provider (normalized to OpenAI format) |
| Class | | OpenAI API provider (v1/chat/completions compatible) |
| Class | | RAG (Retrieval-Augmented Generation) provider using Qdrant and Ollama |
| Function | chat | Send messages to the configured LLM provider and get a response. |
| Function | get | Fetch the list of locally available models from the configured Ollama server. |
| Function | get | Get information about the currently configured LLM provider. |
| Variable | security | Undocumented |
| Function | _generate | Generate a unique request ID for tracking |
| Function | _get | Factory function to get the appropriate LLM provider based on configuration. |
| Function | _normalize | Normalize provider-specific responses to OpenAI /v1/chat/completions format. |
| Constant | _EMBEDDING | Undocumented |
Send messages to the configured LLM provider and get a response.
This is the main entry point for LLM interactions in ZMS. All responses follow the OpenAI /v1/chat/completions format.
| Parameters | |
| context:object | ZMS context object |
| messages:list | str | List of message dicts [{"role": "user", "content": "..."}] or a string for backwards compatibility |
| temperature | Sampling temperature 0.0-2.0 (optional) |
| top | Nucleus sampling 0.0-1.0 (optional) |
| max | Maximum tokens to generate (optional) |
| store | Enable storage for responses API (optional) |
| metadata | Metadata for responses API (optional) |
| Returns | |
dict Success format:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "message": {...} # Backwards compatibility: first choice's message
}
Error format:
{
  "error": {
    "code": "ERROR_CODE",
    "message": "error description"
  }
}
| Response in OpenAI /v1/chat/completions format |
| Notes | |
| Configuration: set llm.provider to one of 'openai', 'ollama', or 'rag'. |
The settings that apply to OpenAI, Ollama, and RAG are the configuration properties listed at the top of this module.
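A caller has to cope with both accepted input shapes and both return formats. The sketch below shows one way to do that; `as_messages`, `extract_text`, and `LLMError` are hypothetical names, and wrapping a bare string as a single 'user' message is an assumption about how the backwards-compatible string form is handled.

```python
class LLMError(Exception):
    """Raised when a result uses the documented error format."""


def as_messages(messages):
    """Coerce the two accepted input shapes into a list of message dicts."""
    if isinstance(messages, str):
        # Assumption: a plain string is treated as one user message.
        return [{"role": "user", "content": messages}]
    return list(messages)


def extract_text(result):
    """Return the assistant text from a success result, or raise LLMError."""
    if "error" in result:
        err = result["error"]
        raise LLMError(f"{err.get('code')}: {err.get('message')}")
    return result["choices"][0]["message"]["content"]
```

Checking for the 'error' key first keeps the success path free of defensive lookups, since an error result carries no 'choices'.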
Fetch the list of locally available models from the configured Ollama server.
Returns a dict with 'models' (list of name strings) on success, or 'error' on failure. This is used by the Config tab to populate the model dropdown for Ollama/RAG providers.
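Ollama lists local models at GET /api/tags, returning JSON with a 'models' list whose entries carry a 'name' field. A minimal sketch of reducing such a payload to the documented return shape follows; `model_names` is a hypothetical helper, the HTTP request itself is omitted, and the wording of the error value is an assumption.

```python
def model_names(tags_payload):
    """Reduce an Ollama /api/tags payload to {'models': [...]} or {'error': ...}."""
    try:
        return {"models": [entry["name"] for entry in tags_payload["models"]]}
    except (KeyError, TypeError) as exc:
        # Missing keys or a non-dict payload are reported, not raised,
        # so a caller such as a config dropdown can show a message.
        return {"error": f"unexpected payload: {exc!r}"}
```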
Get information about the currently configured LLM provider.
Args:
- context: ZMS context object
Returns: dict: Provider information including type, model, and endpoint
Factory function to get the appropriate LLM provider based on configuration.
Args:
- context: ZMS context object
Returns: LLMProvider: An instance of the configured provider
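The factory pattern described here can be sketched with a small registry keyed by the llm.provider value. All class and function names below are hypothetical stand-ins, not the module's own classes, and the network calls are deliberately left out.

```python
from abc import ABC, abstractmethod


class SketchProvider(ABC):
    """Stand-in for the abstract provider base class."""
    name = None

    def __init__(self, conf):
        self.conf = conf

    @abstractmethod
    def chat(self, messages):
        """Return a response dict in chat.completion format."""


class SketchOpenAI(SketchProvider):
    name = "openai"

    def chat(self, messages):
        raise NotImplementedError("network call omitted in this sketch")


class SketchOllama(SketchProvider):
    name = "ollama"

    def chat(self, messages):
        raise NotImplementedError("network call omitted in this sketch")


class SketchRAG(SketchProvider):
    name = "rag"

    def chat(self, messages):
        raise NotImplementedError("retrieval and generation omitted in this sketch")


# Map each llm.provider value to its class.
REGISTRY = {cls.name: cls for cls in (SketchOpenAI, SketchOllama, SketchRAG)}


def get_provider(conf):
    """Instantiate the provider named by llm.provider (default: 'openai')."""
    key = conf.get("llm.provider", "openai")
    cls = REGISTRY.get(key)
    if cls is None:
        raise ValueError(f"unknown llm.provider: {key!r}")
    return cls(conf)
```

With a registry, supporting another provider means adding one class rather than extending a chain of conditionals.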
Normalize provider-specific responses to OpenAI /v1/chat/completions format.
This ensures all providers return a consistent schema compatible with the OpenAI API and the upcoming 'responses' schema.
Args:
- response_data: Raw response from provider
- provider: Provider name ('openai', 'ollama', 'rag')
- model: Model name used
- original_message: Original user message
Returns: dict: Normalized response in OpenAI format
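To make the normalization concrete, here is a sketch that maps an Ollama /api/chat response onto the OpenAI schema. `normalize_ollama` is a hypothetical name and takes fewer arguments than the function documented above. The sketch assumes the Ollama fields 'message', 'prompt_eval_count', 'eval_count', and 'done_reason', and the 'chatcmpl-' ID prefix is likewise an assumption.

```python
import time
import uuid


def normalize_ollama(raw, model):
    """Map an Ollama /api/chat response dict onto the chat.completion schema."""
    prompt_tokens = raw.get("prompt_eval_count", 0)
    completion_tokens = raw.get("eval_count", 0)
    message = raw.get("message") or {"role": "assistant", "content": ""}
    return {
        # Assumed ID format: fixed prefix plus a random hex string.
        "id": "chatcmpl-" + uuid.uuid4().hex,
        "object": "chat.completion",
        "created": int(time.time()),
        "model": raw.get("model", model),
        "choices": [{
            "index": 0,
            "message": message,
            "finish_reason": raw.get("done_reason") or "stop",
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
        # Backwards compatibility: first choice's message at the top level.
        "message": message,
    }
```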