Metadata-Version: 2.1
Name: next_gen_ui_llama_stack_embedded
Version: 0.2.1
Summary: Embedded Llama-Stack server inference for Next Gen UI Agent
Home-page: https://github.com/RedHat-UX/next-gen-ui-agent
License: Apache-2.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: <3.14,>=3.12
Description-Content-Type: text/markdown
Requires-Dist: aiosqlite
Requires-Dist: blobfile
Requires-Dist: fastapi
Requires-Dist: fire
Requires-Dist: litellm
Requires-Dist: llama-stack-client>=0.2.15
Requires-Dist: llama-stack==0.2.20
Requires-Dist: next-gen-ui-agent==0.2.1
Requires-Dist: next-gen-ui-llama-stack==0.2.1
Requires-Dist: ollama
Requires-Dist: opentelemetry-exporter-otlp
Requires-Dist: opentelemetry-instrumentation
Requires-Dist: opentelemetry-sdk
Requires-Dist: overrides
Requires-Dist: requests
Requires-Dist: uvicorn

# Next Gen UI Embedded Llama Stack Server Inference

This module is part of the [Next Gen UI Agent project](https://github.com/RedHat-UX/next-gen-ui-agent).

[![Module Category](https://img.shields.io/badge/Module%20Category-AI%20Framework-darkred)](https://github.com/RedHat-UX/next-gen-ui-agent)
[![Module Status](https://img.shields.io/badge/Module%20Status-Supported-green)](https://github.com/RedHat-UX/next-gen-ui-agent)

This module provides LLM inference through an embedded [Llama Stack server](https://github.com/meta-llama/llama-stack).

## Provides

* `LlamaStackEmbeddedAsyncAgentInference` runs inference against an LLM hosted in an embedded Llama Stack server, which is started from the provided [Llama Stack config file](https://llama-stack.readthedocs.io/en/latest/distributions/configuration.html).
* `init_inference_from_env` method initializes Llama Stack inference (remote or embedded) based on environment variables:
    * `INFERENCE_MODEL` - LLM model to use. Inference is not created if it is undefined (a default value can be provided as a method parameter).
    * `LLAMA_STACK_HOST` - remote Llama Stack host. If defined, it is used together with `LLAMA_STACK_PORT` to create remote Llama Stack inference.
    * `LLAMA_STACK_PORT` - remote Llama Stack port. Optional, defaults to `5001`.
    * `LLAMA_STACK_URL` - remote Llama Stack URL. Used to create remote Llama Stack inference if `LLAMA_STACK_HOST` is not defined.
    * `LLAMA_STACK_CONFIG_FILE` - path to the embedded Llama Stack server config file. Used only if no remote Llama Stack is configured
      (a default value can be provided as a method parameter).
* `examples/llamastack-ollama.yaml` is an example Llama Stack config file that uses an LLM from [Ollama](https://ollama.com/) running on
  localhost (the model is also taken from the `INFERENCE_MODEL` env variable).
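
The precedence among the environment variables listed above can be easier to follow as code. The following is only a minimal sketch of that documented order, not the actual implementation of `init_inference_from_env`. The helper name `resolve_inference_target`, the returned dictionary shape, the `http://host:port` URL form, and the behavior when neither a remote server nor a config file is configured are all assumptions made for illustration.

```python
from typing import Mapping, Optional


def resolve_inference_target(
    env: Mapping[str, str],
    default_model: Optional[str] = None,
    default_config_file: Optional[str] = None,
) -> Optional[dict]:
    """Hypothetical helper mirroring the documented precedence (not library code)."""
    # Without a model (from env or default) no inference is created.
    model = env.get("INFERENCE_MODEL") or default_model
    if not model:
        return None

    # 1. LLAMA_STACK_HOST wins; the port is optional and defaults to 5001.
    host = env.get("LLAMA_STACK_HOST")
    if host:
        port = env.get("LLAMA_STACK_PORT") or "5001"
        # Assumption: the remote address is a plain http://host:port URL.
        return {"kind": "remote", "model": model, "url": f"http://{host}:{port}"}

    # 2. LLAMA_STACK_URL is used only when no host is defined.
    url = env.get("LLAMA_STACK_URL")
    if url:
        return {"kind": "remote", "model": model, "url": url}

    # 3. The embedded server is used only when no remote server is configured.
    config_file = env.get("LLAMA_STACK_CONFIG_FILE") or default_config_file
    if config_file:
        return {"kind": "embedded", "model": model, "config_file": config_file}

    # Assumption: nothing is created when no server is configured at all.
    return None
```

Passing `os.environ` as `env` would show which kind of inference the current environment selects under these assumptions.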

## Installation

```sh
pip install -U next_gen_ui_llama_stack_embedded
```

## Example

### Instantiation of `LlamaStackEmbeddedAsyncAgentInference`

```py

from next_gen_ui_llama_stack_embedded import LlamaStackEmbeddedAsyncAgentInference

config_file = "examples/llamastack-ollama.yaml"
model = "llama3.2:latest"

inference = LlamaStackEmbeddedAsyncAgentInference(config_file, model)

# init UI Agent using inference

```

### Inference initialization from environment variables

```py
from next_gen_ui_llama_stack_embedded import init_inference_from_env

# default model used if env variable is not defined
INFERENCE_MODEL_DEFAULT = "granite3.3:2b"

inference = init_inference_from_env(default_model=INFERENCE_MODEL_DEFAULT)

if inference:
    # init UI Agent using inference
    ...
else:
    print("Inference not initialized because it is not configured in env variables")

```

## Links

* [Documentation](https://redhat-ux.github.io/next-gen-ui-agent/guide/ai_apps_binding/llamastack_embedded/)
* [Source Code](https://github.com/RedHat-UX/next-gen-ui-agent/tree/main/libs/next_gen_ui_llama_stack_embedded)
* [Contributing](https://redhat-ux.github.io/next-gen-ui-agent/development/contributing/)
