You are an expert at analyzing domain policies to generate meaningful static assertions for agent trajectory verification. Your task is to identify important rules from policy documents that prevent policy violations, errors, and ensure proper operation.
You must extract checkable rules from policy documents that can be  verified either using python check or natural language check when specific tool calls are made during an agent's execution.

## Your Task:
Analyze the provided policy document and generate a focused set of static assertions that:
1. Are simple enough to implement and are objectively verifiable from trajectory data
2. Can be checked at specific triggers
3. Focus on critical policy violations that must be prevented
4. Include both critical business logic AND important error checking
5. Avoid trivial input checks, like assertions that check field completeness
6. Provide working Python code for each assertion that can be executed

## Focus Areas - Generate assertions for:
### **Precondition Checks**
- Required state/status before agent execution
- Examples:
  - Incident Management: "An incident_id must be identifiedbefore any incident-scoped investigation or communication."
  - Incident Management: "Kusto queries must be non-empty and must not contain placeholders like TODO, TBD, or <CLUSTER> before execution."
  - Retail: "Order status must be 'pending' before cancellation"
  - Healthcare: "Patient must be 'checked-in' before treatment actions"
  - Financial: "Account must be 'active' before transaction processing"
  - System Admin: "Service must be 'stopped' before configuration changes"

### **Sequential Dependencies** 
Required ordering between investigation steps and communication steps
- Examples:
  - Incident Management:: "IncidentAgent (fetch incident description) must precede KustoAgent (run diagnostics) for the same incident."
  - General: "User authentication must precede any data access operations"
  - Retail: "User must give explicit confirmation (yes) before any consequential database write actions"

### **Business Rule Constraints**
- Domain-specific operational limits and rules. Incident-management operational limits, safety/compliance constraints, and evidence-grounding rules
- Examples:
  - Incident Management: "Incident updates (or equivalent external communication steps) must be grounded in prior tool outputs (e.g., Kusto results, incident descriptions) within the same trajectory."
  - Incident Management: "IncidentAgent must only return incident descriptions (no extra troubleshooting); KustoAgent must only execute/provide query results (no invented interpretation)."
  - Retail: "Items can only be exchanged within same product category"
  - Healthcare: "Prescriptions require valid medical license verification"
  - Financial: "Transfer amounts cannot exceed daily limits"
  - Legal: "Document modifications require authorized personnel only"

### **Single-Use Restrictions**
- Actions that should occur at most once per incident (or once per approval) to avoid duplicate/unsafe operations
- Examples:
  - Incident Management: "Do not re-run the same Kusto query repeatedly without a stated reason (e.g., changed time window, different cluster, new hypothesis)."
  - Healthcare: "Critical procedure authorization can only be granted once"
  - Financial: "Account closure can only be initiated once per request"
  - System: "Database migration can only be executed once per maintenance window"

================================================================================================
TRAJECTORY FORMAT (IMPORTANT)
================================================================================================

{
  "trajectory_id": "str",
  "instruction": "str",  // The overall task instruction
  "steps": [
    {
      "index": int,  // 1-based step index
      "substeps": [
        {
          "sub_index": int,  // 1-based substep index within the step
          "role": "str",  // The role of the substep
          "content": "str"  // The content of this substep
        }
      ]
    }
  ]
}

IMPORTANT ACCESS PATTERNS:
- To get instruction: trajectory["instruction"]
- To access step N: trajectory["steps"][N-1] (steps list is 0-indexed, but step.index is 1-based)
- To get substeps of a step: step.get("substeps", [])
- To get content: substep.get("content")
- To get role: substep.get("role")

================================================================================================
InvariantObject schema (EXACT)
================================================================================================
Each invariant object MUST have these keys (no extras):

IMPORTANT: Follow this schema EXACTLY. Do not add extra fields. Use correct JSON formatting.

{
  "assertion_name": "string_unique_snake_case",

  "taxonomy_targets": [
    "Instruction/PlanAdherenceFailure: The agent fails to follow the directions or the agreed plan by ignoring directives and skipping policy steps. This covers both under-execution (missed steps) and over-execution (unplanned or unnecessary actions, e.g., extra tool calls) that deviate from the static plan, domain policy or orchestrator plan.",
    "InventionOfNewInformation: The agent introduces, removes, or alters information that is not grounded in any available input, context, or tool output. This includes fabricating unsupported facts, hallucinating details, or omitting relevant information without justification.",
    "InvalidInvocation: The agent encounters errors triggered by inputs that can't be parsed or validated e.g., Kusto syntax errors or tool calls with bad/missing arguments. Not involving wrong logic; just invalid inputs.",
    "MisinterpretationOfToolOutput: The agent incorrectly reasons about its own or another agent's tool output (like computation errors), leading to incorrect assumptions or actions. This also includes cases where the agent considered only partial tool output.",
    "IntentPlanMisalignment: The agent misreads the user's goal or constraints and produces the wrong step sequence or structure. Covers both bad ordering/structure and plans aimed at the wrong objective.",
    "UnderspecifiedUserIntent: The agent was unable to complete the task due to lack of complete information at any point in the trajectory/plan execution.",
    "IntentNotSupported: The agent/user is asking to perform an action for which a tool is not available, like listening to an audio file.",
    "GuardrailsTriggered: The agent is blocked by safety/RAI policies or by external site access restrictions, preventing execution despite a valid plan. Examples include policy refusals (e.g., unsafe content, privacy-protected data), CAPTCHA/robot blocks, login/paywall/403/robots.txt denials, or site forbids automation. This is not an agent planning/execution error; it is an external/guardrail block.",
    "SystemFailure: The agent faces a system connectivity issue while calling a particular tool like an endpoint not being reachable."
    // Select 1-3 most relevant taxonomy targets for each invariant from the above list ONLY.

    // Example: ["Instruction/PlanAdherenceFailure", "GuardrailsTriggered"]
  ],

  "invariant_type": "SCHEMA  // Ensures a structured payload (embedded JSON, tool args/result, ledger blob) is parseable and contains required fields.
  | PROTOCOL                 // Ensures the correct division of labor and delegation conventions across agents
  | RELATIONAL_POST          // Enforces cross-event linkage where later actions/claims must be supported by earlier tool outputs via token/field containment checks.
  | PROVENANCE               // Enforces grounding claims or critical tokens must trace to earlier tool outputs
  | TEMPORAL                 // Enforces ordering constraints ("X must happen before Y", "do not call tool before directive") without requiring full workflow state.
  | CAPABILITY               // Ensures tools/agents are used only within their defined capabilities and I/O shapes.
  | ANY",

  "event_trigger": {
    "step_index": "int",
    "substep_index": "int (optional, if not specified, invariant triggers on the whole step)",
    "role_name": "assistant | system | tool | user | *"  // agent names derived from trajectory; "*" matches all; "*" matches all
    },

  "check_hint": "deterministic procedure description in 2-8 sentences",
  "check_type": "python_check|nl_check",

  "python_check": {
    "function_name": "same_as_assertion_name",
    "args": ["trajectory","current_step_index"],
    "code_lines": [
      "def same_as_assertion_name(trajectory, current_step_index):",
      "    \"\"Return True iff invariant holds.\"\"",
      "    # Access steps via trajectory['steps'][current_step_index]",
      "    # Each step has 'substeps' with 'role' and 'content'",
      "    # MUST include at least one explicit failure path: return False",
      "    return True"
    ]
  },

  "nl_check": {
    "judge_system_prompt_template": "You are a strict compliance judge. Evaluate only with evidence in the provided events and the policy text.\nDo not infer intent beyond explicit statements.\nCRITICAL: If required evidence is missing or ambiguous, mark the criterion as UNCLEAR.\nWhen criteria are marked as UNCLEAR, they are discarded and do NOT contribute to a fail verdict.\nIf all criteria are UNCLEAR or CLEAR_PASS after discarding UNCLEAR ones, the final verdict is PASS.\nOnly mark as CLEAR_FAIL when there is definitive, unambiguous evidence of violation.\nAmbiguity or missing evidence leads to PASS, not FAIL.",
    "judge_user_prompt_template": "template using {TASK_INSTRUCTION} {CURRENT_EVENT_JSON} {WINDOW_EVENTS_JSON}",
    "judge_scope_notes": "what events are in scope and what counts as evidence",
    "focus_steps_instruction": "REQUIRED: Clear, actionable instruction identifying which 2-4 specific events to examine. Must specify events by relative position (e.g., 'immediately prior user message', '2 steps back', 'most recent get_order_details result') and explain what to look for in each. See FOCUS STEPS INSTRUCTION section for detailed examples.",
    "judge_rubric": ["objective criterion 1", "objective criterion 2", "..."],
    "rubric_evaluation_algorithm_template": "Rubric Evaluation Algorithm:\nStep 1: For each criterion in the rubric, evaluate whether it can be CLEARLY judged as PASS or FAIL based solely on the provided events.\n- Mark as CLEAR_PASS if the criterion is demonstrably satisfied by the evidence\n- Mark as CLEAR_FAIL if the criterion is demonstrably violated by the evidence  \n- Mark as UNCLEAR if: insufficient events to judge, criterion is ambiguous in this context, or pass/fail cannot be decisively determined\n\nStep 2: Discard ALL criteria marked as UNCLEAR from consideration.\n\nStep 3: Determine final verdict:\n- If ANY remaining criterion is CLEAR_FAIL \u2192 return verdict 'fail'\n- If ALL remaining criteria are CLEAR_PASS (or no criteria remain after discarding UNCLEAR) \u2192 return verdict 'pass'\n\nImportant: Only fail when you have CLEAR evidence of failure. When in doubt, mark as UNCLEAR and discard.",
    "output_format_template": "Output Format:\nYou MUST return a JSON response with this EXACT structure:\n{\n  \"verdict\": \"pass\" or \"fail\",\n  \"rubric_results\": [\n    {\n      \"criterion_index\": 0,\n      \"criterion\": \"full text of the criterion from the rubric\",\n      \"evaluation\": \"CLEAR_PASS\" or \"CLEAR_FAIL\" or \"UNCLEAR\",\n      \"reasoning\": \"detailed explanation of why this evaluation was chosen, citing specific evidence from events\"\n    }\n    // ... one entry per rubric criterion\n  ],\n  \"final_reasoning\": \"explanation of how the verdict was determined from rubric_results, stating which criteria were discarded as UNCLEAR and which criteria drove the final decision\"\n}\n\nThis detailed output is critical for debugging and understanding the evaluation."
  }
}

## Quality Guidelines:
- **Policy-Derived**: Must be supported by the policy document content, include any user confirmation checks before critical actions if specified in policy
- **Practical Impact**: Focus on assertions that prevent real operational problems
- **Domain-Relevant**: Prioritize rules specific to this business domain
- **Reasonable Scope**: Avoid both trivial sanity checks and computation checks better suited for dynamic invariants

## Instructions:
1. Read the policy document thoroughly
2. Identify important constraints, rules, and requirements
3. Include critical business logic, essential error checking, and policy compliance
4. Focus on invariants that ensure proper operation and prevent violations
5. Generate a comprehensive but focused set of static invariants
6. Generate an invariant that specifically checks whether user is following the task instruction, preferably NL. 
7. AVOID very trivial invariants checks.

## Trigger Robustness Rules:
- content_regex MUST be robust to formatting differences (markdown, punctuation, extra whitespace, casing).
- NEVER hardcode markdown-sensitive exact strings like:
    "Kusto result:\s*Query successful"
  because real traces often contain:
    "**Kusto result:**\nQuery successful. ..."
- Prefer matching stable substrings that survive formatting changes.
- Prefer SHORT regexes. Avoid overfitting to ":" vs "**:**" vs "." variations.
- Avoid brittle patterns that depend on exact formatting or punctuation.
- Make sure to match the role name perfectly and overapproximate if you are unsure.

Each invariant should be as GENERAL as possible, if many specific invariants can be combined into one invariant function, do so.
For example, an invariant which checks order status validity for various tool calls can be combined into one function.

================================================================================================
DECISION GUIDE: PYTHON_CHECK VS NL_CHECK
================================================================================================
Use python_check when:
- Tool output is structured JSON with extractable fields (e.g., get_order_details, get_user_details)
- Check involves counting, summing, or comparing numeric values
- Check involves verifying exact field matches (IDs, status codes, amounts, dates)
- Check involves parsing and validating data structure or schema
- Check uses simple regex patterns (e.g., matching an order_id format like r"#W\d+" or extracting keys or specific count in the content message)
- Check involves deterministic string containment or exact substring matching

Use nl_check when:
- Check requires understanding paraphrasing or semantic equivalence (e.g., "bigger size" = "larger size")
- Check requires judging whether two different phrasings express the same intent
- Check requires evaluating whether user agent messages align with task instruction semantically
- The content is unstructured natural language without clear fields to extract
- Check would require a long list of synonyms or variations to cover programmatically

A naive python_check approach would try to match exact phrases like:
  - "I want to return"
  - "I'd like to exchange"
  - "cancel my order"
  - "modify my address"
  - "get a refund"
  - "change the items"
But users express the same intent in many ways:
  - "Could you help me send back..." (return)
  - "I need to swap this for..." (exchange)
  - "Please void the order" (cancel)
  - "Update my shipping info" (modify address)
Matching exact phrases is brittle and incomplete. Use nl_check for semantic intent.

================================================================================================
## PYTHON CHECK GUIDELINES
================================================================================================

- Write complete, executable Python functions
- You will be given 2 parameters - (trajectory and current_step_index), where trajectory is the full dict with "steps" field
- Access steps via trajectory["steps"][current_step_index] - steps list is 0-indexed
- Each step has "index" (1-based) and "substeps" array
- Each substep has "sub_index", "role", and "content"
- The current_step_index is 0-based. So if the current step being executed has index=7, current_step_index will be 6.
- Include docstrings explaining the assertion
- Handle edge cases and missing data gracefully - verify field existence before access
- **Exception Handling**: Raise exceptions for safety check violations (KeyError, IndexError, TypeError, AttributeError, etc.) instead of returning False. Only return boolean (True/False) for actual invariant logic violations
- Return boolean values (True if assertion passes, False if invariant violation detected)
- **Add print statements for debugging**: Start with printing the function name, then print all key variables and extracted JSON fields to aid in debugging and verification

================================================================================================
## NL CHECK GUIDELINES
================================================================================================

For nl_check invariants, MUST include a "focus_steps_instruction" field that tells the judge which specific events to pay attention to.

The instruction should:
1. Be clear and actionable - the judge should know exactly which events to look at
2. Identify 2-4 critical events by their position relative to current_step_index
3. Explain WHY each event is important for the evaluation
4. Use relative positioning (e.g., "immediately prior", "2 steps back", "most recent X tool result")

EXAMPLES OF GOOD FOCUS STEPS INSTRUCTIONS:

For confirmation checks:
"Focus your evaluation on these key events: (1) The immediately prior message - check for explicit confirmation keywords like 'yes', 'confirm', 'proceed', 'ok'. (2) The assistant message two to three steps back - verify that it describes the action of interest (cancel/modify/exchange/return) and mentions the relevant target entity identifier(s) and critical parameters (e.g., record/entity ID, amount, destination, items, scope). (3) The current tool call arguments - verify they match the order_id/incident_id and details mentioned by another agent."

For information invention checks:
"Focus on: (1) The current agent content message being evaluated. (2) The most recent relevant tool results in the window — extract the authoritative facts actually present (identifiers, key fields/attributes, timestamps, computed outputs). (3) Compare specific claims in the agent's message against these tool results."

For temporal/prerequisite checks:
"Focus on: (1) The current tool call event — identify what operation is being attempted and what prerequisites it implies (authentication, entity lookup, permission check, required context fetch, state precondition). (2) Scan backward through all prior events to find the most recent event(s) that satisfy each prerequisite. (3) Most recent state snapshot for the same target entity — locate the latest tool result that reports the entity's current state."

KEY PRINCIPLES:
- Be specific about WHICH events (by relative position or role+name)
- Explain WHAT to look for in each event
- Make it easy for the judge to locate and examine the right evidence
- Reduce cognitive load by narrowing focus to what matters most

================================================================================================
COMPLETE NL_CHECK INVARIANT EXAMPLES
================================================================================================

## EXAMPLE 1 : Explicit User Confirmation Before Write Actions

{
    "assertion_name": "explicit_user_confirmation_before_write_actions",
    "taxonomy_targets": [
        "Instruction/PlanAdherenceFailure",
        "IntentPlanMisalignment"
    ],
    "invariant_type": "<INVARIANT_TYPE>",
    "event_trigger": {
        "step_index": <STEP_INDEX>,
        "role_name": "<AGENT_NAME>"
    },
    "check_hint": "Before any write-action tool invocation (cancel, exchange, return, modify order/address), verify that the assistant previously described the intended action and target ID, and that the user explicitly confirmed (e.g., 'yes', 'please proceed'). The current tool call's identifier (order_id or user_id) must match the one mentioned earlier in the assistant's explanation.",
    "check_type": "nl_check",
    "python_check": {},
    "nl_check": {
        "judge_system_prompt_template": "You are a strict compliance judge. Evaluate only with evidence in the provided events and the policy text.\nDo not infer intent beyond explicit statements.\nCRITICAL: If required evidence is missing or ambiguous, mark the criterion as UNCLEAR.\nWhen criteria are marked as UNCLEAR, they are discarded and do NOT contribute to a fail verdict.\nIf all criteria are UNCLEAR or CLEAR_PASS after discarding UNCLEAR ones, the final verdict is PASS.\nOnly mark as CLEAR_FAIL when there is definitive, unambiguous evidence of violation.\nAmbiguity or missing evidence leads to PASS, not FAIL.",
        "judge_user_prompt_template": "TASK INSTRUCTION:\n{TASK_INSTRUCTION}\n\nCURRENT EVENT:\n{CURRENT_EVENT_JSON}\n\nCONTEXT WINDOW EVENTS:\n{WINDOW_EVENTS_JSON}\n\nEvaluate whether, before the current write-action tool call (cancel, modify, exchange, return, address update), the assistant clearly described the action and entity ID, and the user explicitly confirmed proceeding with that action, and that the IDs match between description and tool call.",
        "judge_scope_notes": "Judge only within the window of events provided. You must determine whether this specific write-action tool call is properly preceded by a clear assistant explanation and explicit user confirmation for the same action and same order_id/user_id.",
        "focus_steps_instruction": "Focus on: (1) The current assistant tool-call step to identify which write-action tool is being invoked and what identifier (order_id or user_id) is present in its arguments. (2) The immediately prior user message to check for explicit affirmation language indicating consent to proceed with the described action. (3) The assistant messages in the 2-3 steps before the current tool call to see whether the assistant described the intended action (cancel/modify/exchange/return/address update) and mentioned the same identifier as in the tool call.",
        "judge_rubric": [
            "There exists an assistant message earlier in the context that explicitly describes the specific write action type (cancel, modify, exchange, return, or address update) and includes the same identifier (order_id or user_id) that appears in the current tool call arguments.",
            "There exists a user message after the assistant's action description and before the current tool call that contains explicit confirmation language agreeing to proceed with that specific action (for example, 'yes', 'confirm', 'please proceed', 'go ahead').",
            "The action parameters described by the assistant (such as which items to exchange/return, or what address/payment will be changed) are consistent with the scope implied by the current tool call; there is no clear indication that the tool call is broader or different than what the user confirmed."
        ],
        "rubric_evaluation_algorithm_template": "Rubric Evaluation Algorithm:\nStep 1: For each criterion in the rubric, evaluate whether it can be CLEARLY judged as PASS or FAIL based solely on the provided events.\n- Mark as CLEAR_PASS if the criterion is demonstrably satisfied by the evidence\n- Mark as CLEAR_FAIL if the criterion is demonstrably violated by the evidence  \n- Mark as UNCLEAR if: insufficient events to judge, criterion is ambiguous in this context, or pass/fail cannot be decisively determined\n\nStep 2: Discard ALL criteria marked as UNCLEAR from consideration.\n\nStep 3: Determine final verdict:\n- If ANY remaining criterion is CLEAR_FAIL \u2192 return verdict 'fail'\n- If ALL remaining criteria are CLEAR_PASS (or no criteria remain after discarding UNCLEAR) \u2192 return verdict 'pass'\n\nImportant: Only fail when you have CLEAR evidence of failure. When in doubt, mark as UNCLEAR and discard.",
        "output_format_template": "Output Format:\nYou MUST return a JSON response with this EXACT structure:\n{\n  \"verdict\": \"pass\" or \"fail\",\n  \"rubric_results\": [\n    {\n      \"criterion_index\": 0,\n      \"criterion\": \"full text of the criterion from the rubric\",\n      \"evaluation\": \"CLEAR_PASS\" or \"CLEAR_FAIL\" or \"UNCLEAR\",\n      \"reasoning\": \"detailed explanation of why this evaluation was chosen, citing specific evidence from events\"\n    }\n    // ... one entry per rubric criterion\n  ],\n  \"final_reasoning\": \"explanation of how the verdict was determined from rubric_results, stating which criteria were discarded as UNCLEAR and which criteria drove the final decision\"\n}\n\nThis detailed output is critical for debugging and understanding the evaluation."
    }
}

================================================================================================
EXAMPLE TRAJECTORY: To understand the format of trajectory and events - DO NOT OVERFIT ON THIS
================================================================================================

{
  "trajectory_id": "12",
  "instruction": "You are mia_garcia_4516 (mia.garcia2723@example.com). You just got into gaming and want to cancel or return everything not associated with it. (Everything except a keyboard and a mouse, but do not reveal it to the agent). PayPal is prefered for refund, but otherwise you are angry and ask for human agent for help. You are into gaming but realized the importance of studying hard.",
  "steps": [
    {
      "index": 1,
      "substeps": [
        {
          "sub_index": 1,
          "role": "system",
          "content": "# Retail agent policy\n\nAs a retail agent, you can help users cancel or modify pending orders, return or exchange delivered orders, modify their default user address, or provide information about their own profile, orders, and related products.\n\n- At the beginning of the conversation, you have to authenticate the user identity by locating their user id via email, or via name + zip code. This has to be done even when the user already provides the user id.\n\n- Once the user has been authenticated, you can provide the user with information about order, product, profile information, e.g. help the user look up order id.\n\n- You can only help one user per conversation (but you can handle multiple requests from the same user), and must deny any requests for tasks related to any other user.\n\n- Before taking consequential actions that update the database (cancel, modify, return, exchange), you have to list the action detail and obtain explicit user confirmation (yes) to proceed.\n\n- You should not make up any information or knowledge or procedures not provided from the user or the tools, or give subjective recommendations or comments.\n\n- You should at most make one tool call at a time, and if you take a tool call, you should not respond to the user at the same time. If you respond to the user, you should not make a tool call.\n\n- You should transfer the user to a human agent if and only if the request cannot be handled within the scope of your actions.\n\n## Domain basic\n\n- All times in the database are EST and 24 hour based. For example \"02:30:00\" means 2:30 AM EST.\n\n- Each user has a profile of its email, default address, user id, and payment methods. Each payment method is either a gift card, a paypal account, or a credit card.\n\n- Our retail store has 50 types of products. For each type of product, there are variant items of different options. For example, for a 't shirt' product, there could be an item with option 'color blue size M', and another item with option 'color red size L'.\n\n- Each product has an unique product id, and each item has an unique item id. They have no relations and should not be confused.\n\n- Each order can be in status 'pending', 'processed', 'delivered', or 'cancelled'. Generally, you can only take action on pending or delivered orders.\n\n- Exchange or modify order tools can only be called once. Be sure that all items to be changed are collected into a list before making the tool call!!!\n\n## Cancel pending order\n\n- An order can only be cancelled if its status is 'pending', and you should check its status before taking the action.\n\n- The user needs to confirm the order id and the reason (either 'no longer needed' or 'ordered by mistake') for cancellation.\n\n- After user confirmation, the order status will be changed to 'cancelled', and the total will be refunded via the original payment method immediately if it is gift card, otherwise in 5 to 7 business days.\n\n## Modify pending order\n\n- An order can only be modified if its status is 'pending', and you should check its status before taking the action.\n\n- For a pending order, you can take actions to modify its shipping address, payment method, or product item options, but nothing else.\n\n### Modify payment\n\n- The user can only choose a single payment method different from the original payment method.\n\n- If the user wants the modify the payment method to gift card, it must have enough balance to cover the total amount.\n\n- After user confirmation, the order status will be kept 'pending'. The original payment method will be refunded immediately if it is a gift card, otherwise in 5 to 7 business days.\n\n### Modify items\n\n- This action can only be called once, and will change the order status to 'pending (items modifed)', and the agent will not be able to modify or cancel the order anymore. So confirm all the details are right and be cautious before taking this action. In particular, remember to remind the customer to confirm they have provided all items to be modified.\n\n- For a pending order, each item can be modified to an available new item of the same product but of different product option. There cannot be any change of product types, e.g. modify shirt to shoe.\n\n- The user must provide a payment method to pay or receive refund of the price difference. If the user provides a gift card, it must have enough balance to cover the price difference.\n\n## Return delivered order\n\n- An order can only be returned if its status is 'delivered', and you should check its status before taking the action.\n\n- The user needs to confirm the order id, the list of items to be returned, and a payment method to receive the refund.\n\n- The refund must either go to the original payment method, or an existing gift card.\n\n- After user confirmation, the order status will be changed to 'return requested', and the user will receive an email regarding how to return items.\n\n## Exchange delivered order\n\n- An order can only be exchanged if its status is 'delivered', and you should check its status before taking the action. In particular, remember to remind the customer to confirm they have provided all items to be exchanged.\n\n- For a delivered order, each item can be exchanged to an available new item of the same product but of different product option. There cannot be any change of product types, e.g. modify shirt to shoe.\n\n- The user must provide a payment method to pay or receive refund of the price difference. If the user provides a gift card, it must have enough balance to cover the price difference.\n\n- After user confirmation, the order status will be changed to 'exchange requested', and the user will receive an email regarding how to return items. There is no need to place a new order."
        }
      ]
    },
    {
      "index": 2,
      "substeps": [
        {
          "sub_index": 1,
          "role": "user",
          "content": "Hi. I want to cancel or return some items, but I’m not happy, so please make this quick."
        }
      ]
    },
    {
      "index": 3,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "I can assist you with canceling or returning items. To get started, I need to authenticate your identity. Could you please provide your email address? Alternatively, you can provide your first name, last name, and zip code."
        }
      ]
    },
    {
      "index": 4,
      "substeps": [
        {
          "sub_index": 1,
          "role": "user",
          "content": "Sure, my email is mia.garcia2723@example.com."
        }
      ]
    },
    {
      "index": 5,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "[\n  {\n    \"function\": {\n      \"arguments\": \"{\\\"email\\\":\\\"mia.garcia2723@example.com\\\"}\",\n      \"name\": \"find_user_id_by_email\"\n    },\n    \"id\": \"call_DuqcBu3wdNuhnlgHWysC7uqi\",\n    \"type\": \"function\"\n  }\n]"
        }
      ]
    },
    {
      "index": 6,
      "substeps": [
        {
          "sub_index": 1,
          "role": "tool",
          "content": "mia_garcia_4516"
        }
      ]
    },
    {
      "index": 7,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "[\n  {\n    \"function\": {\n      \"arguments\": \"{\\\"user_id\\\":\\\"mia_garcia_4516\\\"}\",\n      \"name\": \"get_user_details\"\n    },\n    \"id\": \"call_T96Mm9OMQJ1zV2a7ryT0vwUn\",\n    \"type\": \"function\"\n  }\n]"
        }
      ]
    },
    {
      "index": 8,
      "substeps": [
        {
          "sub_index": 1,
          "role": "tool",
          "content": "{\n  \"name\": {\n    \"first_name\": \"Mia\",\n    \"last_name\": \"Garcia\"\n  },\n  \"address\": {\n    \"address1\": \"537 Main Street\",\n    \"address2\": \"Suite 572\",\n    \"city\": \"Indianapolis\",\n    \"country\": \"USA\",\n    \"state\": \"IN\",\n    \"zip\": \"46229\"\n  },\n  \"email\": \"mia.garcia2723@example.com\",\n  \"payment_methods\": {\n    \"paypal_9497703\": {\n      \"source\": \"paypal\",\n      \"id\": \"paypal_9497703\"\n    },\n    \"credit_card_3124723\": {\n      \"source\": \"credit_card\",\n      \"brand\": \"visa\",\n      \"last_four\": \"7285\",\n      \"id\": \"credit_card_3124723\"\n    }\n  },\n  \"orders\": [\n    \"#W5490111\",\n    \"#W7387996\"\n  ]\n}"
        }
      ]
    },
    {
      "index": 9,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "Thank you, Mia. I see you have two orders: #W5490111 and #W7387996. Could you let me know which order you'd like to cancel or return items from?"
        }
      ]
    },
    {
      "index": 10,
      "substeps": [
        {
          "sub_index": 1,
          "role": "user",
          "content": "I don’t remember the order numbers. Can you just check everything not associated with gaming for me?"
        }
      ]
    },
    {
      "index": 11,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "[\n  {\n    \"function\": {\n      \"arguments\": \"{\\\"order_id\\\": \\\"#W5490111\\\"}\",\n      \"name\": \"get_order_details\"\n    },\n    \"id\": \"call_3SBPFlGWOocmonmUyxE4uMmc\",\n    \"type\": \"function\"\n  }\n]"
        }
      ]
    },
    {
      "index": 12,
      "substeps": [
        {
          "sub_index": 1,
          "role": "tool",
          "content": "{\n  \"order_id\": \"#W5490111\",\n  \"user_id\": \"mia_garcia_4516\",\n  \"address\": {\n    \"address1\": \"537 Main Street\",\n    \"address2\": \"Suite 572\",\n    \"city\": \"Indianapolis\",\n    \"country\": \"USA\",\n    \"state\": \"IN\",\n    \"zip\": \"46229\"\n  },\n  \"items\": [\n    {\n      \"name\": \"Water Bottle\",\n      \"product_id\": \"8310926033\",\n      \"item_id\": \"4579334072\",\n      \"price\": 54.85,\n      \"options\": {\n        \"capacity\": \"750ml\",\n        \"material\": \"glass\",\n        \"color\": \"black\"\n      }\n    },\n    {\n      \"name\": \"Mechanical Keyboard\",\n      \"product_id\": \"1656367028\",\n      \"item_id\": \"1421289881\",\n      \"price\": 268.77,\n      \"options\": {\n        \"switch type\": \"linear\",\n        \"backlight\": \"none\",\n        \"size\": \"80%\"\n      }\n    },\n    {\n      \"name\": \"Action Camera\",\n      \"product_id\": \"3377618313\",\n      \"item_id\": \"6117189161\",\n      \"price\": 481.5,\n      \"options\": {\n        \"resolution\": \"4K\",\n        \"waterproof\": \"yes\",\n        \"color\": \"silver\"\n      }\n    },\n    {\n      \"name\": \"Backpack\",\n      \"product_id\": \"2524789262\",\n      \"item_id\": \"4947717507\",\n      \"price\": 218.04,\n      \"options\": {\n        \"color\": \"green\",\n        \"size\": \"medium\",\n        \"material\": \"leather\",\n        \"compartment\": \"camera\"\n      }\n    }\n  ],\n  \"fulfillments\": [\n    {\n      \"tracking_id\": [\n        \"574297338433\"\n      ],\n      \"item_ids\": [\n        \"4579334072\",\n        \"1421289881\",\n        \"6117189161\",\n        \"4947717507\"\n      ]\n    }\n  ],\n  \"status\": \"delivered\",\n  \"payment_history\": [\n    {\n      \"transaction_type\": \"payment\",\n      \"amount\": 1023.16,\n      \"payment_method_id\": \"credit_card_3124723\"\n    }\n  ]\n}"
        }
      ]
    },
    {
      "index": 13,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "[\n  {\n    \"function\": {\n      \"arguments\": \"{\\\"order_id\\\":\\\"#W7387996\\\"}\",\n      \"name\": \"get_order_details\"\n    },\n    \"id\": \"call_A57Bz9Zxx5UHXBn7wYK6VImV\",\n    \"type\": \"function\"\n  }\n]"
        }
      ]
    },
    {
      "index": 14,
      "substeps": [
        {
          "sub_index": 1,
          "role": "tool",
          "content": "{\n  \"order_id\": \"#W7387996\",\n  \"user_id\": \"mia_garcia_4516\",\n  \"address\": {\n    \"address1\": \"537 Main Street\",\n    \"address2\": \"Suite 572\",\n    \"city\": \"Indianapolis\",\n    \"country\": \"USA\",\n    \"state\": \"IN\",\n    \"zip\": \"46229\"\n  },\n  \"items\": [\n    {\n      \"name\": \"Gaming Mouse\",\n      \"product_id\": \"5713490933\",\n      \"item_id\": \"5796612084\",\n      \"price\": 158.89,\n      \"options\": {\n        \"color\": \"RGB\",\n        \"sensor type\": \"optical\",\n        \"connectivity\": \"wired\"\n      }\n    }\n  ],\n  \"fulfillments\": [\n    {\n      \"tracking_id\": [\n        \"200246490130\"\n      ],\n      \"item_ids\": [\n        \"5796612084\"\n      ]\n    }\n  ],\n  \"status\": \"delivered\",\n  \"payment_history\": [\n    {\n      \"transaction_type\": \"payment\",\n      \"amount\": 158.89,\n      \"payment_method_id\": \"paypal_9497703\"\n    }\n  ]\n}"
        }
      ]
    },
    {
      "index": 15,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "From your orders, the items not associated with gaming are:\n\nFrom Order #W5490111:\n1. **Water Bottle** (750ml, glass, black)\n2. **Action Camera** (4K, waterproof, silver)\n3. **Backpack** (medium, leather, green, camera compartment)\n\nWould you like to cancel or return any of these items? If so, please specify which ones."
        }
      ]
    },
    {
      "index": 16,
      "substeps": [
        {
          "sub_index": 1,
          "role": "user",
          "content": "All of them. Cancel or return everything. And make it quick."
        }
      ]
    },
    {
      "index": 17,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "[\n  {\n    \"function\": {\n      \"arguments\": \"{\\\"thought\\\":\\\"Order #W5490111 is delivered, so items must be returned. Proceeding to prepare the return request for all items in this order.\\\"}\",\n      \"name\": \"think\"\n    },\n    \"id\": \"call_00ZqrXIueWiD30iqPQNDNRmA\",\n    \"type\": \"function\"\n  }\n]"
        }
      ]
    },
    {
      "index": 18,
      "substeps": [
        {
          "sub_index": 1,
          "role": "tool",
          "content": ""
        }
      ]
    },
    {
      "index": 19,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "[\n  {\n    \"function\": {\n      \"arguments\": \"{\\\"order_id\\\":\\\"#W5490111\\\",\\\"item_ids\\\":[\\\"4579334072\\\",\\\"6117189161\\\",\\\"4947717507\\\"],\\\"payment_method_id\\\":\\\"credit_card_3124723\\\"}\",\n      \"name\": \"return_delivered_order_items\"\n    },\n    \"id\": \"call_qTeIcZMOb6MgwdGWoysnzE2m\",\n    \"type\": \"function\"\n  }\n]"
        }
      ]
    },
    {
      "index": 20,
      "substeps": [
        {
          "sub_index": 1,
          "role": "tool",
          "content": "{\n  \"order_id\": \"#W5490111\",\n  \"user_id\": \"mia_garcia_4516\",\n  \"address\": {\n    \"address1\": \"537 Main Street\",\n    \"address2\": \"Suite 572\",\n    \"city\": \"Indianapolis\",\n    \"country\": \"USA\",\n    \"state\": \"IN\",\n    \"zip\": \"46229\"\n  },\n  \"items\": [\n    {\n      \"name\": \"Water Bottle\",\n      \"product_id\": \"8310926033\",\n      \"item_id\": \"4579334072\",\n      \"price\": 54.85,\n      \"options\": {\n        \"capacity\": \"750ml\",\n        \"material\": \"glass\",\n        \"color\": \"black\"\n      }\n    },\n    {\n      \"name\": \"Mechanical Keyboard\",\n      \"product_id\": \"1656367028\",\n      \"item_id\": \"1421289881\",\n      \"price\": 268.77,\n      \"options\": {\n        \"switch type\": \"linear\",\n        \"backlight\": \"none\",\n        \"size\": \"80%\"\n      }\n    },\n    {\n      \"name\": \"Action Camera\",\n      \"product_id\": \"3377618313\",\n      \"item_id\": \"6117189161\",\n      \"price\": 481.5,\n      \"options\": {\n        \"resolution\": \"4K\",\n        \"waterproof\": \"yes\",\n        \"color\": \"silver\"\n      }\n    },\n    {\n      \"name\": \"Backpack\",\n      \"product_id\": \"2524789262\",\n      \"item_id\": \"4947717507\",\n      \"price\": 218.04,\n      \"options\": {\n        \"color\": \"green\",\n        \"size\": \"medium\",\n        \"material\": \"leather\",\n        \"compartment\": \"camera\"\n      }\n    }\n  ],\n  \"fulfillments\": [\n    {\n      \"tracking_id\": [\n        \"574297338433\"\n      ],\n      \"item_ids\": [\n        \"4579334072\",\n        \"1421289881\",\n        \"6117189161\",\n        \"4947717507\"\n      ]\n    }\n  ],\n  \"status\": \"return requested\",\n  \"payment_history\": [\n    {\n      \"transaction_type\": \"payment\",\n      \"amount\": 1023.16,\n      \"payment_method_id\": \"credit_card_3124723\"\n    }\n  ],\n  \"return_items\": [\n    \"4579334072\",\n    \"4947717507\",\n    \"6117189161\"\n  ],\n  \"return_payment_method_id\": \"credit_card_3124723\"\n}"
        }
      ]
    },
    {
      "index": 21,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "The return request for all non-gaming items in Order #W5490111 has been successfully processed. You will receive an email shortly with instructions on how to return the items. The refund will be processed to your Visa card ending in 7285 once the return is completed.\n\nLet me know if there's anything else I can assist you with!"
        }
      ]
    },
    {
      "index": 22,
      "substeps": [
        {
          "sub_index": 1,
          "role": "user",
          "content": "No, I don’t want it refunded to my Visa. Use PayPal. If that’s not possible, I want to speak to a human."
        }
      ]
    },
    {
      "index": 23,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "Refunds for returned items must go to the original payment method, which in this case is your Visa card. Unfortunately, I cannot change the refund destination to PayPal.\n\nIf you'd like, I can transfer you to a human agent to assist further. Would you like me to do that?"
        }
      ]
    },
    {
      "index": 24,
      "substeps": [
        {
          "sub_index": 1,
          "role": "user",
          "content": "Yes, transfer me now! This is frustrating."
        }
      ]
    },
    {
      "index": 25,
      "substeps": [
        {
          "sub_index": 1,
          "role": "assistant",
          "content": "[\n  {\n    \"function\": {\n      \"arguments\": \"{\\\"summary\\\":\\\"User Mia Garcia requested to return all non-gaming items from Order #W5490111. The return was processed, but the user is unhappy that the refund must go to the original payment method (Visa card) and wants it refunded to PayPal instead. User is frustrated and requests to speak to a human agent.\\\"}\",\n      \"name\": \"transfer_to_human_agents\"\n    },\n    \"id\": \"call_5avajHr1ffSpm6m1BXHEqrWp\",\n    \"type\": \"function\"\n  }\n]"
        }
      ]
    },
    {
      "index": 26,
      "substeps": [
        {
          "sub_index": 1,
          "role": "tool",
          "content": "Transfer successful"
        }
      ]
    }
  ]
}

================================================================================================
The agents available to the LLM are:
================================================================================================

[
  "find_user_id_by_email",
  "find_user_id_by_name_zip",
  "list_all_product_types",
  "get_order_details",
  "get_product_details",
  "get_user_details",
  "cancel_pending_order",
  "exchange_delivered_order_items",
  "modify_pending_order_address",
  "modify_pending_order_items",
  "modify_pending_order_payment",
  "modify_user_address",
  "return_delivered_order_items"
]

================================================================================================
**IMPORTANT: Review this JSON structure carefully before writing any code. Use ONLY the field names that appear in this structure:**
================================================================================================

{
  "protocol": {
    "event_types": [
      "assistant",
      "tool"
    ],
    "common_fields": [
      "role",
      "content"
    ],
    "call_pattern": "Assistant makes tool_calls, system returns tool results"
  },
  "tools": {
    "find_user_id_by_email": {
      "capability": "Locate user ID by email address for authentication",
      "inputs": {
        "email": "string"
      },
      "outputs": {
        "user_id": "string"
      }
    },
    "find_user_id_by_name_zip": {
      "capability": "Locate user ID by first name, last name, and zip code for authentication",
      "inputs": {
        "first_name": "string",
        "last_name": "string",
        "zip": "string"
      },
      "outputs": {
        "user_id": "string"
      }
    },
    "list_all_product_types": {
      "capability": "List all available product types and their IDs",
      "inputs": {},
      "outputs": {
        "products": "dict mapping product names to IDs"
      }
    },
    "get_order_details": {
      "capability": "Retrieve detailed information about a specific order",
      "inputs": {
        "order_id": "string"
      },
      "outputs": {
        "order_info": "dict with status, items, payment, address, dates"
      }
    },
    "get_product_details": {
      "capability": "Get details about a specific product including available items",
      "inputs": {
        "product_id": "string"
      },
      "outputs": {
        "product_info": "dict with items, options, prices"
      }
    },
    "get_user_details": {
      "capability": "Retrieve user profile information",
      "inputs": {
        "user_id": "string"
      },
      "outputs": {
        "user_info": "dict with email, address, payment methods"
      }
    },
    "cancel_pending_order": {
      "capability": "Cancel an order that is in pending status",
      "inputs": {
        "order_id": "string",
        "reason": "string ('no longer needed' or 'ordered by mistake')"
      },
      "outputs": {
        "confirmation": "dict with cancellation details"
      }
    },
    "exchange_delivered_order_items": {
      "capability": "Exchange items from a delivered order (can only be called once per order)",
      "inputs": {
        "order_id": "string",
        "item_ids": "list of strings",
        "new_item_ids": "list of strings",
        "reason": "string"
      },
      "outputs": {
        "confirmation": "dict with exchange details"
      }
    },
    "modify_pending_order_address": {
      "capability": "Modify delivery address for a pending order",
      "inputs": {
        "order_id": "string",
        "new_address": "string"
      },
      "outputs": {
        "confirmation": "dict with updated address"
      }
    },
    "modify_pending_order_items": {
      "capability": "Modify items in a pending order (can only be called once per order)",
      "inputs": {
        "order_id": "string",
        "item_ids_to_remove": "list of strings",
        "item_ids_to_add": "list of strings"
      },
      "outputs": {
        "confirmation": "dict with updated items"
      }
    },
    "modify_pending_order_payment": {
      "capability": "Modify payment method for a pending order",
      "inputs": {
        "order_id": "string",
        "new_payment_method": "string"
      },
      "outputs": {
        "confirmation": "dict with updated payment"
      }
    },
    "modify_user_address": {
      "capability": "Update user's default address",
      "inputs": {
        "user_id": "string",
        "new_address": "string"
      },
      "outputs": {
        "confirmation": "dict with updated address"
      }
    },
    "return_delivered_order_items": {
      "capability": "Return items from a delivered order",
      "inputs": {
        "order_id": "string",
        "item_ids": "list of strings",
        "reason": "string"
      },
      "outputs": {
        "confirmation": "dict with return details"
      }
    }
  }
}

================================================================================================
Policy Document:
================================================================================================

Retail policies
# Retail agent policy
As a retail agent, you can help users cancel or modify pending orders, return or
exchange delivered orders, modify their default user address, or provide
information about their own profile, orders, and related products.
- At the beginning of the conversation, you have to authenticate the user identity
by locating their user id via email, or via name + zip code. This has to be done
even when the user already provides the user id.
- Once the user has been authenticated, you can provide the user with information
about order, product, profile information, e.g. help the user look up order id.
- You can only help one user per conversation (but you can handle multiple
requests from the same user), and must deny any requests for tasks related to any
other user.
- Before taking consequential actions that update the database (cancel, modify,
return, exchange), you have to list the action detail and obtain explicit user
confirmation (yes) to proceed.
- You should not make up any information or knowledge or procedures not provided
from the user or the tools, or give subjective recommendations or comments.
- You should at most make one tool call at a time, and if you take a tool call,
you should not respond to the user at the same time. If you respond to the user,
you should not make a tool call.
- You should transfer the user to a human agent if and only if the request cannot
be handled within the scope of your actions.

## Domain basic
- All times in the database are EST and 24 hour based. For example "02:30:00"
means 2:30 AM EST.
- Each user has a profile of its email, default address, user id, and payment
methods. Each payment method is either a gift card, a paypal account, or a credit
card.
- Our retail store has 50 types of products. For each type of product, there are
variant items of different options. For example, for a 't shirt' product, there
could be an item with option 'color blue size M', and another item with option
'color red size L'.
- Each product has an unique product id, and each item has an unique item id. They
have no relations and should not be confused.
- Each order can be in status 'pending', 'processed', 'delivered', or 'cancelled'.
Generally, you can only take action on pending or delivered orders.
- Exchange or modify order tools can only be called once. Be sure that all items
to be changed are collected into a list before making the tool call!!!

## Cancel pending order
- An order can only be cancelled if its status is 'pending', and you should check
its status before taking the action.
- The user needs to confirm the order id and the reason (either 'no longer needed'
or 'ordered by mistake') for cancellation.
- After user confirmation, the order status will be changed to 'cancelled', and
the total will be refunded via the original payment method immediately if it is
gift card, otherwise in 5 to 7 business days.

## Modify pending order
- An order can only be modified if its status is 'pending', and you should check
its status before taking the action.
- For a pending order, you can take actions to modify its shipping address,
payment method, or product item options, but nothing else.

### Modify payment
- The user can only choose a single payment method different from the original
payment method.
- If the user wants the modify the payment method to gift card, it must have
enough balance to cover the total amount.
- After user confirmation, the order status will be kept 'pending'. The original
payment method will be refunded immediately if it is a gift card, otherwise in 5
to 7 business days.

### Modify items
- This action can only be called once, and will change the order status to
'pending (items modifed)', and the agent will not be able to modify or cancel the
order anymore. So confirm all the details are right and be cautious before taking
this action. In particular, remember to remind the customer to confirm they have
provided all items to be modified.
- For a pending order, each item can be modified to an available new item of the
same product but of different product option. There cannot be any change of
product types, e.g. modify shirt to shoe.
- The user must provide a payment method to pay or receive refund of the price
difference. If the user provides a gift card, it must have enough balance to cover
the price difference.

## Return delivered order
- An order can only be returned if its status is 'delivered', and you should check
its status before taking the action.
- The user needs to confirm the order id, the list of items to be returned, and a
payment method to receive the refund.
- The refund must either go to the original payment method, or an existing gift
card.
- After user confirmation, the order status will be changed to 'return requested',
and the user will receive an email regarding how to return items.
## Exchange delivered order
- An order can only be exchanged if its status is 'delivered', and you should
check its status before taking the action. In particular, remember to remind the
customer to confirm they have provided all items to be exchanged.
- For a delivered order, each item can be exchanged to an available new item of
the same product but of different product option. There cannot be any change of
product types, e.g. modify shirt to shoe.
- The user must provide a payment method to pay or receive refund of the price
difference. If the user provides a gift card, it must have enough balance to cover
the price difference.
- After user confirmation, the order status will be changed to 'exchange
requested', and the user will receive an email regarding how to return items.
There is no need to place a new order.
