jeevesagent.architecture.reflexion
==================================

.. py:module:: jeevesagent.architecture.reflexion

.. autoapi-nested-parse::

   Reflexion: verbal reinforcement learning via memory.

   Shinn et al. 2023 — `Reflexion: Language Agents with Verbal
   Reinforcement Learning <https://arxiv.org/abs/2303.11366>`_. After
   each attempt, an evaluator scores the output. Below threshold, a
   reflector produces a single-sentence "lesson" — written advice the
   agent can read on its next attempt.

   Lesson storage modes
   --------------------

   Two storage modes for the persisted lessons:

   * **Monotonic block (legacy default).** Every lesson is appended
     to ``memory.<lessons_block_name>`` and shown to the agent on
     every subsequent attempt. Simple but bloats context as lessons
     accumulate.
   * **Selective recall (recommended).** Pass ``lesson_store=`` a
     :class:`VectorStore`. Lessons are stored as embedded chunks;
     before each attempt, only the **top-k most relevant lessons**
     for the current task are retrieved and surfaced. Avoids
     context bloat and keeps past advice scoped to the tasks where it
     applies. Pair with :class:`InMemoryVectorStore` for in-process,
     or :class:`PostgresVectorStore` for cross-session learning.
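
   Selective recall can be pictured with a toy in-process store. This
   is a sketch only: the fake bag-of-words ``embed()`` stands in for a
   real embedding model, and the class and method names here are
   illustrative, not the actual :class:`VectorStore` API.

   .. code-block:: python

      import math

      def embed(text: str) -> dict[str, float]:
          # Fake embedding: bag-of-words counts (illustration only).
          words = text.lower().split()
          return {w: words.count(w) for w in set(words)}

      def cosine(a: dict[str, float], b: dict[str, float]) -> float:
          dot = sum(a[w] * b.get(w, 0.0) for w in a)
          na = math.sqrt(sum(v * v for v in a.values()))
          nb = math.sqrt(sum(v * v for v in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      class ToyLessonStore:
          def __init__(self) -> None:
              self._lessons: list[tuple[dict[str, float], str]] = []

          def add(self, prompt: str, lesson: str) -> None:
              # Lessons are keyed by the failing prompt so recall can find them.
              self._lessons.append((embed(prompt), lesson))

          def top_k(self, prompt: str, k: int = 5) -> list[str]:
              q = embed(prompt)
              ranked = sorted(self._lessons,
                              key=lambda pair: cosine(q, pair[0]),
                              reverse=True)
              return [lesson for _, lesson in ranked[:k]]

      store = ToyLessonStore()
      store.add("extract dates from text",
                "Normalize dates to ISO 8601 before returning.")
      store.add("summarize a legal contract",
                "Quote clause numbers when citing obligations.")

      # Only the most relevant lesson surfaces for a date-extraction task.
      print(store.top_k("extract dates from the report", k=1))

   Only lessons similar to the current prompt reach the agent's
   context, which is the point of selective recall.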

   Pattern
   -------

   For each attempt up to ``max_attempts``:

   1. **Recall** (selective-recall mode only): query ``lesson_store``
      with the current prompt; write the top-k results into the
      working memory block for this attempt.
   2. **Run base architecture** (default
      :class:`~jeevesagent.architecture.ReAct`).
   3. **Evaluate.** A text-only model call scores the output (0.0-1.0).
   4. **Threshold check.** If ``score >= threshold``, terminate.
   5. **Max-attempts check.** If we've hit the cap, terminate.
   6. **Reflect.** A text-only model call produces a single sentence
      identifying what went wrong.
   7. **Persist.** Append (legacy) or add to ``lesson_store``
      (selective-recall) — keyed by the failing prompt so future
      recall can find it.
   8. **Reset.** Clear ``session.messages`` so the base re-seeds its
      context. Cumulative usage and turn count carry across attempts.
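
   The eight steps above can be sketched as a plain loop. This toy is
   synchronous with stub model calls — the real ``run`` is async and
   streams events — and the function names are illustrative, not the
   module's API:

   .. code-block:: python

      def reflexion_loop(task, run_base, evaluate, reflect, *,
                         max_attempts=3, threshold=0.8):
          """Sketch of the Reflexion control flow."""
          lessons: list[str] = []  # stands in for the lessons block/store
          output = None
          for attempt in range(1, max_attempts + 1):
              output = run_base(task, lessons)       # 2. run base architecture
              score = evaluate(task, output)         # 3. evaluate (0.0-1.0)
              if score >= threshold:                 # 4. threshold check
                  break
              if attempt == max_attempts:            # 5. max-attempts check
                  break
              lessons.append(reflect(task, output))  # 6-7. reflect + persist
              # 8. reset: the next run_base call starts from clean context
          return output, lessons

      # Toy base that only succeeds once it has seen a lesson.
      def run_base(task, lessons):
          return "good answer" if lessons else "bad answer"

      def evaluate(task, output):
          return 1.0 if output == "good answer" else 0.2

      def reflect(task, output):
          return "State the answer explicitly instead of hedging."

      out, lessons = reflexion_loop("demo task", run_base, evaluate, reflect)
      print(out, len(lessons))  # good answer 1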

   Strengths
   ---------
   * **Cross-session learning** when paired with a persistent
     memory backend (legacy) or a persistent vector store
     (selective recall).
   * **Wraps any base** that reads ``memory.working()``.
   * **Cheap**: 1 evaluator + 1 reflector call per failed attempt.

   Weaknesses
   ----------
   * **Same-model evaluation.** Self-grading is biased; the score
     may not match human judgment.
   * **Score parsing is best-effort.** Falls back to 0.0 on parse
     failure (treated as a failed attempt).
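
   The best-effort parsing might look like this minimal sketch
   (``parse_score`` is a hypothetical name, not the module's actual
   helper), matching the ``score: <number>`` first-line format the
   evaluator prompt requests:

   .. code-block:: python

      import re

      def parse_score(reply: str) -> float:
          """Pull `score: <number>` off the first line of the
          evaluator's reply; fall back to 0.0 (a failed attempt)
          when parsing fails."""
          stripped = reply.strip()
          first_line = stripped.splitlines()[0] if stripped else ""
          match = re.match(r"score:\s*([01](?:\.\d+)?)", first_line,
                           re.IGNORECASE)
          if not match:
              return 0.0
          return min(max(float(match.group(1)), 0.0), 1.0)

      print(parse_score("score: 0.85\nMostly correct, minor gaps."))  # 0.85
      print(parse_score("I think this deserves a 7/10."))             # 0.0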



Attributes
----------

.. autoapisummary::

   jeevesagent.architecture.reflexion.DEFAULT_EVALUATOR_PROMPT
   jeevesagent.architecture.reflexion.DEFAULT_REFLECTOR_PROMPT


Classes
-------

.. autoapisummary::

   jeevesagent.architecture.reflexion.Reflexion


Module Contents
---------------

.. py:class:: Reflexion(*, base: jeevesagent.architecture.base.Architecture | None = None, max_attempts: int = 3, threshold: float = 0.8, evaluator_prompt: str | None = None, reflector_prompt: str | None = None, lessons_block_name: str = 'reflexion_lessons', lesson_store: jeevesagent.vectorstore.base.VectorStore | None = None, top_k_lessons: int = 5)

   Wrap a base architecture with evaluator + reflector + lesson
   memory.

   See module docstring for the full mechanism. Constructor
   parameters:

   * ``base`` — architecture to retry. Default :class:`ReAct`.
   * ``max_attempts`` — cap on retries within a single run.
     Default 3.
   * ``threshold`` — minimum evaluator score to terminate as
     success. Default 0.8.
   * ``evaluator_prompt`` / ``reflector_prompt`` — override the
     default system prompts.
   * ``lessons_block_name`` — memory working-block name for
     persisted lessons. Default ``"reflexion_lessons"``. Multiple
     Reflexion-wrapped agents in the same memory should pick
     distinct names.
   * ``lesson_store`` — optional :class:`VectorStore` enabling
     selective recall. When set, lessons are stored as embedded
     chunks and only the top-``top_k_lessons`` most relevant
     lessons are surfaced on each attempt (instead of all past
     lessons). Avoids context bloat as lessons accumulate.
   * ``top_k_lessons`` — how many lessons to recall per attempt
     (selective-recall mode only). Default 5.


   .. py:method:: declared_workers() -> dict[str, jeevesagent.agent.api.Agent]


   .. py:method:: run(session: jeevesagent.architecture.base.AgentSession, deps: jeevesagent.architecture.base.Dependencies, prompt: str) -> collections.abc.AsyncIterator[jeevesagent.core.types.Event]
      :async:



   .. py:attribute:: name
      :value: 'reflexion'



.. py:data:: DEFAULT_EVALUATOR_PROMPT
   :value: Multiline-String

   .. raw:: html

      <details><summary>Show Value</summary>

   .. code-block:: python

      """You are an evaluator scoring an agent's output against a task.
      
      Score the output from 0.0 (completely failed) to 1.0 (fully
      successful). Be calibrated:
      - 1.0 = task is fully solved with no issues
      - 0.7-0.9 = mostly correct, minor gaps
      - 0.4-0.6 = partially correct, significant gaps
      - 0.0-0.3 = wrong or missing key components
      
      Output exactly one line in this format:
      score: <number between 0 and 1>
      
      Then on subsequent lines, briefly justify the score. The first line
      must match the score format exactly so it can be parsed."""

   .. raw:: html

      </details>



.. py:data:: DEFAULT_REFLECTOR_PROMPT
   :value: Multiline-String

   .. raw:: html

      <details><summary>Show Value</summary>

   .. code-block:: python

      """You are a reflector that produces lessons for an agent that just
      fell short on a task.
      
      Read the original task and the agent's failed attempt. Produce ONE
      sentence describing the most important thing the agent should do
      differently next time. Be specific and concrete:
      - Bad: "Be more careful."
      - Good: "When asked to extract dates, always normalize to ISO 8601
        format before returning."
      
      Output ONLY the single sentence — no preamble, no list."""

   .. raw:: html

      </details>



