Metadata-Version: 2.4
Name: splunk-otel-genai-evals-deepeval
Version: 0.1.6
Summary: OpenTelemetry GenAI Utils
Project-URL: Homepage, https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/util/opentelemetry-util-genai
Project-URL: Repository, https://github.com/open-telemetry/opentelemetry-python-contrib
Author-email: OpenTelemetry Authors <cncf-opentelemetry-contributors@lists.cncf.io>
License-Expression: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Requires-Dist: deepeval<3.8.0,>=3.3.9
Requires-Dist: openai>=1.0.0
Requires-Dist: splunk-otel-util-genai-evals>=0.1.4
Requires-Dist: splunk-otel-util-genai>=0.1.4
Provides-Extra: fsspec
Requires-Dist: fsspec>=2025.9.0; extra == 'fsspec'
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == 'test'
Description-Content-Type: text/x-rst

OpenTelemetry GenAI Utilities Evals for Deepeval (opentelemetry-util-genai-evals-deepeval)
==========================================================================================

This package plugs the `deepeval <https://github.com/confident-ai/deepeval>`_ metrics
suite into the OpenTelemetry GenAI evaluation pipeline. When it is installed, a
``Deepeval`` evaluator is registered automatically and, unless explicitly disabled,
runs for every LLM/agent invocation alongside the built-in metrics.

Installation
------------

Install the evaluator (and its runtime dependencies) from PyPI:

.. code-block:: bash

    pip install splunk-otel-genai-evals-deepeval

The command pulls in ``opentelemetry-util-genai``, ``deepeval`` and ``openai``
automatically so the evaluator is ready to use right after installation.
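To confirm that the runtime dependencies were actually installed, a small check along the following lines can help. This is a minimal sketch, not part of the package: the helper ``missing_distributions`` is hypothetical, and the distribution names are taken from the ``Requires-Dist`` entries in this package's metadata.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the Requires-Dist metadata of this package.
REQUIRED_DISTRIBUTIONS = (
    "deepeval",
    "openai",
    "splunk-otel-util-genai",
    "splunk-otel-util-genai-evals",
)


def missing_distributions(names=REQUIRED_DISTRIBUTIONS):
    """Return the distribution names that are not installed (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError when the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_distributions()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the distributions are present; it does not check that the installed versions satisfy the declared version ranges.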

Requirements
------------

* ``opentelemetry-util-genai`` together with ``deepeval`` and ``openai`` –
  these are installed automatically when you install this package.
* An LLM provider supported by Deepeval. By default the evaluator uses OpenAI's
  ``gpt-4o-mini`` model, chosen for its balance of latency and cost on judge
  workloads, so make sure ``OPENAI_API_KEY`` is set. To use a different model,
  set ``DEEPEVAL_EVALUATION_MODEL`` (or ``DEEPEVAL_MODEL`` / ``OPENAI_MODEL``)
  to the model or deployment name and supply the corresponding provider
  credentials.
* (Optional) ``DEEPEVAL_API_KEY`` if your Deepeval account requires it.
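The model selection described above can be pictured with the sketch below. It is illustrative only: ``resolve_judge_model`` is a hypothetical helper, and the precedence among the three variables (checked in the order they are listed here) is an assumption, since this README does not state which one wins when several are set.

```python
import os

DEFAULT_MODEL = "gpt-4o-mini"

# Assumption: variables are consulted in the order the README lists them.
MODEL_ENV_VARS = ("DEEPEVAL_EVALUATION_MODEL", "DEEPEVAL_MODEL", "OPENAI_MODEL")


def resolve_judge_model(env=None):
    """Return the first non-empty model override, else the documented default."""
    env = os.environ if env is None else env
    for name in MODEL_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(resolve_judge_model())
```

Passing a plain dictionary instead of ``os.environ`` makes the lookup easy to exercise in tests without touching the real environment.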

Configuration
-------------

Use ``OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS`` to select the evaluators and
metrics that should run. Leaving the variable unset enables every registered
evaluator with its default metric set. Examples:

* ``OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS=Deepeval`` – run the default
  Deepeval bundle (Bias, Toxicity, Answer Relevancy, Faithfulness).
* ``OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS=Deepeval(LLMInvocation(bias(threshold=0.75)))`` –
  override the Bias threshold for LLM invocations and skip the remaining metrics.
* ``OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS=none`` – disable the evaluator entirely.
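The variable can also be set from Python before the application starts. The sketch below builds the threshold-override form shown above; ``llm_metric_threshold`` is a hypothetical helper that covers only that one documented form, and the sketch assumes the variable is read when the instrumentation initialises, so it has to be set beforehand.

```python
import os

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS"


def llm_metric_threshold(metric, threshold, evaluator="Deepeval"):
    """Render e.g. Deepeval(LLMInvocation(bias(threshold=0.75))) (hypothetical helper)."""
    return f"{evaluator}(LLMInvocation({metric}(threshold={threshold})))"


# Assumption: this runs before the GenAI instrumentation reads its configuration.
os.environ[ENV_VAR] = llm_metric_threshold("bias", 0.75)
print(os.environ[ENV_VAR])
```

Setting the variable in the process environment (shell, container spec, service unit) is equivalent and avoids ordering concerns inside the application.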

Results are emitted through the standard GenAI evaluation emitters (events,
metrics, spans). Each result includes helper attributes such as
``deepeval.success``, ``deepeval.threshold`` and any evaluation model metadata
returned by Deepeval. Metrics that cannot run because required inputs are missing
(for example Faithfulness without a ``retrieval_context``) are marked as
``label="skipped"`` and carry a ``deepeval.error`` attribute, so you can either
supply the missing data or disable that metric explicitly.
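When post-processing exported results, the skipped metrics can be separated from the rest by their label. The following is a minimal sketch under stated assumptions: the dictionary layout (``name``, ``label``, ``attributes``), the ``"pass"`` label and the error text are hypothetical stand-ins for whatever your exporter produces; only ``label="skipped"`` and the ``deepeval.*`` attribute names come from this README.

```python
def skipped_metrics(results):
    """Map each skipped metric name to its deepeval.error message."""
    skipped = {}
    for result in results:
        if result.get("label") == "skipped":
            attributes = result.get("attributes", {})
            skipped[result["name"]] = attributes.get("deepeval.error", "unknown")
    return skipped


# Hypothetical records; the real shape depends on the emitter and exporter in use.
sample = [
    {
        "name": "bias",
        "label": "pass",
        "attributes": {"deepeval.success": True, "deepeval.threshold": 0.5},
    },
    {
        "name": "faithfulness",
        "label": "skipped",
        "attributes": {"deepeval.error": "retrieval_context missing"},
    },
]

if __name__ == "__main__":
    for name, error in skipped_metrics(sample).items():
        print(f"{name}: {error}")
```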
