SKYVERN WORKFLOW YAML KNOWLEDGE BASE

This document provides comprehensive information about Skyvern Workflow YAML structure and blocks. Use this to understand how to construct, modify, and validate workflow definitions.

** WORKFLOW STRUCTURE OVERVIEW **

A Skyvern workflow is defined in YAML format with the following top-level structure
for a workflow definition (embedded under workflow_definition in full specs):

title: "<workflow title>" # Required
description: "<optional description>"
workflow_definition:
  version: 2 # IMPORTANT: Always use version 2
  blocks: []
  parameters: []
webhook_callback_url: "<optional_https_url>" # Optional: Webhook URL to receive workflow run updates

Key Concepts:
- Workflows consist of sequential or conditional blocks that represent specific tasks
- Each block has a unique label for identification and navigation
- Blocks can reference workflow parameters using Jinja2 templating
- Block execution is defined by next_block_label on every non-terminal block

** WORKFLOW PARAMETERS **

Parameters provide input values and credentials to workflows. They are defined in the "parameters" list.

Common Parameter Types:

* WORKFLOW PARAMETERS (user inputs)
parameter_type: workflow
key: <unique_key>
workflow_parameter_type: <string|integer|float|boolean|json|file_url|credential_id>
description: <optional description>
default_value: <optional default>

Define exactly one parameters list under workflow_definition.

Example:
workflow_definition:
  version: 2
  blocks: []
  parameters:
    - parameter_type: workflow
      key: search_query
      workflow_parameter_type: string
      description: "Search term to use"
      default_value: "example"

* OUTPUT PARAMETERS (block outputs)
parameter_type: output
key: <unique_key>
description: <optional description>

* CREDENTIAL PARAMETERS
parameter_type: workflow
workflow_parameter_type: credential_id
key: <unique_key>
default_value: <credential_id>

Using Parameters in Blocks:
- Reference using Jinja2: {{ param_key }}
- List parameter_keys in blocks that use them
- Parameters are resolved before block execution. ALL PARAMETER KEYS REFERENCED IN BLOCKS MUST FIRST BE DEFINED IN THE WORKFLOW PARAMETERS LIST

Example:
workflow_definition:
  version: 2
  blocks:
    - label: block_1
      block_type: navigation
      url: https://news.ycombinator.com/
      navigation_goal: "Give me top {{topics_count}} news items"
      next_block_label: null
  parameters:
    - key: topics_count
      description: null
      parameter_type: workflow
      workflow_parameter_type: integer
      default_value: "3"

** COMMON BLOCK FIELDS **

All blocks inherit these base fields:

block_type: <type>              # Required: Defines the block type
label: <unique_label>           # Required: Unique identifier for this block
next_block_label: <label|null>  # Required: Label of next block; use null only for terminal blocks
continue_on_failure: false      # Optional: Continue workflow if block fails
next_loop_on_failure: false     # Optional: Continue to next loop iteration on failure (for loop blocks only)
model: {}                       # Optional: Override model settings for this block

Important Rules:
- Labels must be unique within a workflow
- Labels cannot be empty or contain only whitespace
- next_block_label is required for all non-terminal blocks
- Use next_block_label for explicit flow control
- Set next_block_label to null to mark the end of a flow
- continue_on_failure allows graceful error handling

** CHOOSING A BLOCK (use the most specific block that fits the step) **

The interaction blocks differ by how specifically you describe the step — from a general goal down to a single named action:

- navigation — the general, goal-level block. State the outcome you want as a navigation_goal (e.g. "search for {{ query }} and open the first result") and the agent works out and performs the actions to get there. Use it when the step is best expressed as where you want to end up, stated as intent — never as a click-by-click script.
- action — one specific UI action you name directly: a single click, type, or select on the current page (e.g. "click the Add to Cart button"). If you can name the exact action, prefer action over navigation.
- login — authentication. Handles stored credentials and 2FA/TOTP. Whenever the step is logging in, this is the right block; do NOT hand-build a navigation login or avoid login to save steps. When the active block authoring policy makes the login block unavailable, bind the saved credential as a credential_id workflow parameter on the replacement block instead — never inline secret values.
- file_download — the step's purpose is to download a file (detects when the download completes).
- extraction — capturing structured data, validated against data_schema.

For non-interaction steps, use the purpose-built block directly: goto_url (load a known URL), wait, conditional, for_loop/while_loop, http_request, file_url_parser, file_upload, send_email, text_prompt.

Rule of thumb: if you can name the exact action or purpose, use the specific block; fall back to navigation only when the step is genuinely a goal the agent must work out itself. The specific block also carries the right semantics (login handles 2FA, file_download detects completion, extraction enforces a schema) and runs tighter — a loose navigation_goal re-runs the full browser agent (scrape -> plan -> act) for every step it takes, so it is the block most likely to exhaust the per-tool-call run budget on a heavy page.

goto_url vs reaching a page by action: a goto_url replaces a navigation step whose only job was to reach a page you can already name — but only for a stable, reusable URL. Never freeze a session-bound search-results, filter, or cart URL into a goto_url entrypoint; it can encode session-only state and will not replay. Reach those pages via the action that produces them.

** NAVIGATION BLOCK (navigation) **

Purpose: Take actions on a page to achieve a focused goal — fill a form, click through a multi-step flow, prepare the page before an extraction.

Structure:
block_type: navigation
label: <unique_label>
url: <starting_url>                            # Optional: URL to navigate to; omit to continue on current page
title: str                                     # Required: The title of the block
navigation_goal: <action_description>          # Required: What actions to perform
error_code_mapping: {}                         # Optional: Map errors to custom codes
max_retries: 0                                 # Optional: Number of retry attempts
max_steps_per_run: null                        # Optional: Limit steps per execution
parameter_keys: []                             # Optional: Parameters used in this block
complete_on_download: false                    # Optional: Complete when file downloads
download_suffix: null                          # Optional: Downloaded file name
totp_verification_url: null                    # Optional: TOTP verification URL
disable_cache: false                           # Optional: Disable caching
complete_criterion: null                       # Optional: Condition to mark complete
terminate_criterion: null                      # Optional: Condition to terminate
complete_verification: true                    # Optional: Verify completion
include_action_history_in_verification: false  # Optional: Include history in verification

Use Cases:
- Fill out forms on websites
- Navigate complex multi-step processes
- Prepare the page for a subsequent extraction block
- Execute focused browser tasks with clear completion criteria

Example:
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_and_open
      next_block_label: null
      url: "https://example.com/search"
      navigation_goal: "Search for {{ query }} and click the first result"
      parameter_keys:
        - query
      max_retries: 2
  parameters:
    - parameter_type: workflow
      key: query
      workflow_parameter_type: string

** URL BLOCK (goto_url) **

Purpose: Navigate directly to a URL without any additional instructions.

Structure:
block_type: goto_url
label: <unique_label>
url: <target_url>                       # Required: URL to navigate to
error_code_mapping: {}                  # Optional: Custom error codes
max_retries: 0                          # Optional: Retry attempts
parameter_keys: []                      # Optional: Parameters used

Use Cases:
- Jump to a known page before other blocks
- Reset the browser state to a specific URL
- Split URL navigation from subsequent actions

Example:
blocks:
  - block_type: goto_url
    label: open_cart
    next_block_label: null
    url: "https://example.com/cart"

** ACTION BLOCK (action) **

Purpose: Perform a single focused action on the current page without data extraction.

Structure:
block_type: action
label: <unique_label>
navigation_goal: <action_description>  # Required: Single action to perform
url: <starting_url>                    # Optional: URL to start from
error_code_mapping: {}                 # Optional: Custom error codes
max_retries: 0                         # Optional: Retry attempts
parameter_keys: []                     # Optional: Parameters used
complete_on_download: false            # Optional: Complete on download
download_suffix: null                  # Optional: Download file name
totp_verification_url: null            # Optional: TOTP verification URL
totp_identifier: null                  # Optional: TOTP identifier
disable_cache: false                   # Optional: Disable cache

Use Cases:
- Click a specific button or link
- Fill a single field or selection
- Trigger a download with one action

Example:
blocks:
  - block_type: action
    label: accept_terms
    next_block_label: null
    url: "https://example.com/checkout"
    navigation_goal: "Check the terms checkbox"
    max_retries: 1

** TASK BLOCK (task) — NOT AVAILABLE IN WORKFLOW COPILOT **

DO NOT EMIT "task" blocks. They are not available in the workflow copilot and will be rejected at persistence. Use:
- "navigation" for page actions (filling forms, clicking, multi-step flows)
- "extraction" for data extraction (with data_extraction_goal + data_schema)
- "validation" for completion checks
- "login" for authentication
- "goto_url" for pure URL navigation
Legacy workflows that already contain "task" blocks continue to run outside the copilot; this ban applies only to copilot emission.

** TASK V2 BLOCK (task_v2) — DEPRECATED **

DO NOT USE task_v2. Use "navigation" blocks instead (with navigation_goal).
Use "extraction" blocks for data extraction (with data_extraction_goal + data_schema).
The task_v2 block type exists only for backward compatibility with existing workflows.

** FOR LOOP BLOCK (for_loop) **

Purpose: Iterate over a list of values and run a sequence of blocks for each item.

Structure:
block_type: for_loop
label: <unique_label>
loop_blocks: []                          # Required: Blocks to run for each iteration
loop_over_parameter_key: <param_key>     # Optional: Use ONLY for workflow parameters defined in parameters section
loop_variable_reference: <block_label>   # Optional: Use to reference output from a previous block
complete_if_empty: false                 # Optional: Complete successfully when list is empty

Important Notes:
- Provide either loop_over_parameter_key or loop_variable_reference
- Loop blocks must use next_block_label to chain within the loop
- Each iteration exposes {{ current_value }}, {{ current_item }}, and {{ current_index }}

CRITICAL DISTINCTION:
* loop_over_parameter_key: ONLY for workflow parameters defined in the parameters section
  - Use when looping over a static list or user-provided input
  - Must reference a key from the workflow parameters list
  - Example: loop_over_parameter_key: urls (where "urls" is in parameters)

* loop_variable_reference: For referencing output from a PREVIOUS BLOCK
  - Use when looping over data extracted or generated by a previous block
  - Reference the block's label directly (e.g., "extract_rows")
  - System automatically tries: block_label.extracted_information, block_label.results, etc.
  - Example: loop_variable_reference: extract_rows (where "extract_rows" is a previous extraction block)

Example 1 - Loop over workflow parameter:
parameters:
  - parameter_type: workflow
    key: urls
    workflow_parameter_type: json
    default_value:
      - "https://example.com/a"
      - "https://example.com/b"
blocks:
  - block_type: for_loop
    label: visit_urls
    next_block_label: null
    loop_over_parameter_key: urls  # References the workflow parameter "urls"
    loop_blocks:
      - block_type: goto_url
        label: open_url
        next_block_label: null
        url: "{{ current_value }}"

Example 2 - Loop over extraction block output:
blocks:
  - block_type: extraction
    label: extract_products
    next_block_label: process_each_product
    data_extraction_goal: "Extract all products from the table"
    data_schema:
      type: array
      items:
        type: object
        properties:
          name: {type: string}
          price: {type: number}
  - block_type: for_loop
    label: process_each_product
    next_block_label: null
    loop_variable_reference: extract_products  # References the extraction block's output
    loop_blocks:
      - block_type: navigation
        label: add_to_cart
        next_block_label: null
        navigation_goal: "Add {{ current_value.name }} to cart"

** WHILE LOOP BLOCK (while_loop) **

Purpose: Repeat a sequence of blocks while a runtime-evaluated condition stays true. Use for pagination, polling, or retry-until-success — any case where the iteration count depends on runtime state, not a known list.

Structure:
block_type: while_loop
label: <unique_label>
loop_blocks: []                          # Required: Blocks to run each iteration
condition:                               # Required: BranchCriteriaYAML
  criteria_type: jinja2_template         # Optional: inferred when omitted
  expression: <jinja_expression>         # Required: Must evaluate to truthy to continue
  description: <optional description>

Important Notes:
- The condition is evaluated at the TOP of each iteration (including the first). If it is false up front, loop_blocks never run and the block succeeds with empty output.
- Termination: condition becomes falsy → success; reached the 100-iteration safety cap → failure with a clear failure_reason.
- Outputs from blocks within an iteration are accessible to later blocks in the same iteration via templating; outputs from prior iterations are accessible the same way for_loop accumulates them.
- The condition has access to {{ current_index }} — the 0-indexed iteration counter (same name and meaning as for_loop's). Use it to bootstrap iteration 0: e.g. "{{ current_index == 0 or <body_output_ref> }}". The OR short-circuits on the first check before the body output is consulted.
- Strict templating caveat: that bootstrap pattern relies on default lax templating. With WORKFLOW_TEMPLATING_STRICTNESS=strict, missing-variable validation runs before Jinja short-circuiting, so seed any referenced prior-output variable before the loop or use a condition based only on variables already present.
- No automatic per-iteration value variables. Unlike for_loop, there is NO {{ current_value }} / {{ current_item }} for while_loop — only {{ current_index }}.
- criteria_type "prompt" is not yet supported in while_loop; use "jinja2_template" only. Supplying "prompt" fails fast with NotImplementedError.

CRITICAL DISTINCTION (vs for_loop):
* for_loop: known list to iterate over. Use when you have a parameter or extraction output containing all items up front.
* while_loop: iteration count is unknown until runtime. Use for pagination ("keep going while a Next button exists"), polling ("keep checking until status is ready"), or retry-until-success patterns.

Example 1 - Paginate until no Next button:
blocks:
  - block_type: while_loop
    label: paginate_results
    next_block_label: null
    condition:
      criteria_type: jinja2_template
      # With default lax templating, current_index == 0 carries the first iteration
      # before extract_page exists. From iteration 1 onward, the previous iteration's
      # extracted has_next_page flag drives the loop.
      expression: "{{ current_index == 0 or extract_page.has_next_page }}"
    loop_blocks:
      - block_type: extraction
        label: extract_page
        next_block_label: click_next
        data_extraction_goal: "Extract all rows on the current page and whether a 'Next' button is enabled"
        data_schema:
          type: object
          properties:
            rows: {type: array, items: {type: object}}
            has_next_page: {type: boolean}
      - block_type: action
        label: click_next
        next_block_label: null
        navigation_goal: "Click the 'Next' button to advance to the next page"

Example 2 - Retry-until-success bounded by an attempt counter:
parameters:
  - parameter_type: workflow
    key: max_attempts
    workflow_parameter_type: integer
    default_value: 5
blocks:
  - block_type: while_loop
    label: retry_until_done
    next_block_label: null
    condition:
      criteria_type: jinja2_template
      # With default lax templating, current_index == 0 carries the first attempt
      # before check_status exists. From attempt 1 onward, the previous attempt's
      # is_ready flag controls termination.
      expression: "{{ current_index < max_attempts and (current_index == 0 or not check_status.is_ready) }}"
    loop_blocks:
      - block_type: extraction
        label: check_status
        next_block_label: null
        data_extraction_goal: "Determine whether the result is ready"
        data_schema:
          type: object
          properties:
            is_ready: {type: boolean}

** CONDITIONAL BLOCK (conditional) **

Purpose: Branch to different blocks based on ordered conditions.

Structure:
block_type: conditional
label: <unique_label>
branch_conditions: []                    # Required: Ordered list of branch conditions

Branch Condition Structure:
criteria:                               # Optional for default branch
  criteria_type: <jinja2_template|prompt>  # Optional: inferred when omitted
  expression: <expression>                # Required when criteria present
  description: <optional description>
next_block_label: <label|null>           # Optional: Label to execute when matched
description: <optional description>
is_default: false                        # Required when criteria is omitted

Important Notes:
- At least one branch is required
- Branches are evaluated in order; first match wins
- Only one default branch is allowed
- Branches without criteria must set is_default: true

Example:
blocks:
  - block_type: conditional
    label: route_by_status
    next_block_label: null
    branch_conditions:
      - criteria:
          criteria_type: jinja2_template
          expression: "{{ account_status == 'active' }}"
        next_block_label: handle_active
        description: "Active accounts"
        is_default: false
      - is_default: true
        next_block_label: handle_inactive
        description: "Fallback for all other states"

** LOGIN BLOCK (login) **

Purpose: Handle authentication flows including username/password and TOTP/2FA.

Structure:
block_type: login
label: <unique_label>
url: <login_page_url>           # Optional: Login page URL
title: str                      # Required: The title of the block
navigation_goal: null           # Optional: Additional navigation after login
error_code_mapping: {}          # Optional: Custom error codes
max_retries: 0                  # Optional: Retry attempts
max_steps_per_run: null         # Optional: Step limit
parameter_keys: []              # Required: Should include credential parameters
complete_criterion: null        # Optional: Completion condition
terminate_criterion: null       # Optional: Termination condition
complete_verification: true     # Optional: Verify successful login

Use Cases:
- Login to websites with username/password
- Handle 2FA/TOTP authentication
- Manage credential-protected workflows
- Session initialization

Important Notes:
- Credentials should be stored as parameters (credential, bitwarden_login_credential, etc.)
- TOTP is automatically handled if the credential parameter has TOTP configured

Example:
workflow_definition:
  version: 2
  blocks:
    - block_type: login
      label: login_to_portal
      next_block_label: null
      url: "https://portal.example.com/login"
      parameter_keys:
        - my_credentials # This must match a 'key' from the parameters list above
      complete_criterion: "Current URL is 'https://portal.example.com/dashboard'"
      max_retries: 2
  parameters:
    - parameter_type: workflow
      workflow_parameter_type: credential_id
      key: my_credentials
      default_value: "cred_uuid_here"

** VALIDATION BLOCK (validation) **

Purpose: Validate workflow state and decide whether to continue or terminate.

Structure:
block_type: validation
label: <unique_label>
complete_criterion: <success_condition>        # Optional: Condition for success
terminate_criterion: <termination_condition>   # Optional: Condition to stop workflow
error_code_mapping: {}                         # Optional: Map errors to custom codes
parameter_keys: []                             # Optional: Parameters used in this block
disable_cache: false                           # Optional: Disable caching

Use Cases:
- Confirm a successful navigation or submission
- Stop the workflow when an error state appears
- Validate content on the page before extracting data

Example:
blocks:
  - block_type: validation
    label: verify_submission
    next_block_label: null
    complete_criterion: "Page contains 'Thank you for your submission'"
    terminate_criterion: "Page contains 'Error' or 'Try again'"

** WAIT BLOCK (wait) **

Purpose: Pause workflow execution for a specified duration.

Structure:
block_type: wait
label: <unique_label>
wait_sec: <seconds>                     # Required: Number of seconds to wait

Example:
blocks:
  - block_type: wait
    label: wait_for_processing
    next_block_label: check_results
    wait_sec: 30

** EXTRACTION BLOCK (extraction) **

Purpose: Extract structured data from the current page without navigation.

Structure:
block_type: extraction
label: <unique_label>
title: str                               # Required: The title of the block
data_extraction_goal: <what_to_extract>  # Required: Description of data to extract
data_schema: <json_schema>               # Optional: Structure of extracted data
url: <page_url>                          # Optional: URL to navigate to first
max_retries: 0                           # Optional: Retry attempts
max_steps_per_run: null                  # Optional: Step limit
parameter_keys: []                       # Optional: Parameters used
disable_cache: false                     # Optional: Disable cache
Use Cases:
- Extract structured data after other blocks
- Parse tables, lists, or forms
- Collect multiple data points from a page
- Data mining from web pages

Data Schema Formats:
* JSON Schema object:
   data_schema:
     type: object
     properties:
       field1: {type: string}
       field2: {type: number}

* JSON Schema array:
   data_schema:
     type: array
     items:
       type: object
       properties:
         name: {type: string}

* String format (for simple extractions):
   data_schema: "csv_string"

Example:
blocks:
  - block_type: extraction
    label: extract_product_list
    next_block_label: null
    data_extraction_goal: "Extract all products with their names, prices, and stock status"
    data_schema:
      type: array
      items:
        type: object
        properties:
          product_name: {type: string}
          price: {type: number}
          in_stock: {type: boolean}
          rating: {type: number}
    max_retries: 1

** FILE DOWNLOAD BLOCK (file_download) **

Purpose: Download files from a website. This is the "File Download Block" in the UI.

Structure:
block_type: file_download
label: <unique_label>
navigation_goal: <download_instruction>        # Required: How to trigger the download
url: <starting_url>                            # Optional: URL to navigate to first
title: str                                     # Optional: Title for the block
engine: skyvern-1.0                            # Optional: Run engine
error_code_mapping: {}                         # Optional: Map errors to custom codes
max_retries: 0                                 # Optional: Number of retry attempts
max_steps_per_run: null                        # Optional: Limit steps per execution
parameter_keys: []                             # Optional: Parameters used in this block
download_suffix: null                          # Optional: Full filename for the download
totp_verification_url: null                    # Optional: TOTP verification URL
totp_identifier: null                          # Optional: TOTP identifier
disable_cache: false                           # Optional: Disable caching
download_timeout: null                         # Optional: Download timeout in seconds

Use Cases:
- Download invoices, receipts, or reports from a portal
- Export data as CSV or PDF from a dashboard
- Trigger file downloads that require navigation steps

Example:
blocks:
  - block_type: file_download
    label: download_report
    next_block_label: null
    url: "https://portal.example.com/reports"
    navigation_goal: "Open the latest report and click Download as PDF"
    download_suffix: "latest_report.pdf"

** CLOUD STORAGE BLOCK (file_upload) **

Purpose: Upload files to storage. This is the "Cloud Storage Block" in the UI.

Structure:
block_type: file_upload
label: <unique_label>
storage_type: <s3|azure>                       # Optional: Storage backend (default: s3)
s3_bucket: <bucket_name>                       # Optional: S3 bucket name
aws_access_key_id: <access_key_id>             # Optional: AWS access key id
aws_secret_access_key: <secret_access_key>     # Optional: AWS secret access key
region_name: <aws_region>                      # Optional: AWS region
azure_storage_account_name: <account_name>     # Optional: Azure storage account
azure_storage_account_key: <account_key>       # Optional: Azure storage account key
azure_blob_container_name: <container_name>    # Optional: Azure blob container
azure_folder_path: <folder_path>               # Optional: Azure folder path
path: <local_or_workspace_path>                # Optional: File path to upload

Use Cases:
- Upload downloaded artifacts to S3
- Publish files to Azure Blob Storage
- Persist workflow outputs to storage

Example:
blocks:
  - block_type: file_upload
    label: upload_report
    next_block_label: null
    storage_type: s3
    s3_bucket: "my-reports"
    region_name: "us-west-2"
    path: "/tmp/latest_report.pdf"

** FILE PARSER BLOCK (file_url_parser) **

Purpose: Parse PDFs, CSVs, Excel files, images, and DOCX files. This is the "File Parser Block" in the UI.

Structure:
block_type: file_url_parser
label: <unique_label>
file_url: <https_url_or_file_url>             # Required: URL to the file
file_type: <auto_detect|csv|excel|pdf|image|docx>  # Optional: defaults to auto_detect (infer from URL/content)
json_schema: <json_schema>                     # Optional: Structure of parsed output

Use Cases:
- Parse a PDF invoice into structured fields
- Extract rows from a CSV file
- Read Excel sheets for downstream processing

Example:
blocks:
  - block_type: file_url_parser
    label: parse_invoice
    next_block_label: null
    file_url: "https://example.com/invoice.pdf"
    file_type: pdf
    json_schema:
      type: object
      properties:
        invoice_id: {type: string}
        total: {type: number}

** SEND EMAIL BLOCK (send_email) **

Purpose: Send email notifications. This is the "Send Email Block" in the UI.

Structure:
block_type: send_email
label: <unique_label>
smtp_host_secret_parameter_key: <param_key>     # Required: Secret parameter key for SMTP host
smtp_port_secret_parameter_key: <param_key>     # Required: Secret parameter key for SMTP port
smtp_username_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP username
smtp_password_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP password
sender: <email_address>                         # Required: Sender email address
recipients: [<email_address>]                   # Required: Recipient list
subject: <subject_line>                         # Required: Email subject
body: <email_body>                              # Required: Email body
file_attachments: [<file_path>]                 # Optional: Local file paths to attach

Use Cases:
- Notify a team when a workflow completes
- Send extracted data summaries to stakeholders
- Email reports or attachments

Example:
blocks:
  - block_type: send_email
    label: notify_ops
    next_block_label: null
    smtp_host_secret_parameter_key: smtp_host
    smtp_port_secret_parameter_key: smtp_port
    smtp_username_secret_parameter_key: smtp_user
    smtp_password_secret_parameter_key: smtp_pass
    sender: "automation@example.com"
    recipients: ["ops@example.com"]
    subject: "Daily report ready"
    body: "The latest report is ready for review."
    file_attachments: ["/tmp/latest_report.pdf"]

** TEXT PROMPT BLOCK (text_prompt) **

Purpose: Process text with LLM. This is the "Text Prompt Block" in the UI.

Structure:
block_type: text_prompt
label: <unique_label>
llm_key: <llm_key>                             # Optional: Model key override
prompt: <text_prompt>                          # Required: Prompt to run
parameter_keys: []                             # Optional: Parameters used in this block
json_schema: <json_schema>                     # Optional: Structured output schema

Referencing output from downstream blocks: fields live on the block directly, NOT under `.output`. Write `{{ summarize_notes.summary }}`, not `{{ summarize_notes.output.summary }}`. See PARAMETER TEMPLATING for the full per-block-type rule and the null-guard requirement.

Use Cases:
- Summarize extracted text or documents
- Normalize free-form content into a schema
- Generate classifications or tags

Example:
workflow_definition:
  version: 2
  blocks:
    - block_type: text_prompt
      label: summarize_notes
      next_block_label: null
      prompt: "Summarize these notes: {{ notes }}"
      json_schema:
        type: object
        properties:
          summary: {type: string}
          action_items: {type: array, items: {type: string}}
      parameter_keys: [notes]
  parameters:
    - parameter_type: workflow
      key: notes
      workflow_parameter_type: string

** HTTP REQUEST BLOCK (http_request) **

Purpose: Make HTTP API calls. This is the "HTTP Request Block" in the UI.

Structure:
block_type: http_request
label: <unique_label>
method: <GET|POST|PUT|PATCH|DELETE>            # Optional: HTTP method (default: GET)
url: <https_url>                               # Optional: Target URL
headers: {}                                    # Optional: HTTP headers
body: {}                                       # Optional: JSON body (MUST be a dict, not a string)
files: {}                                      # Optional: Multipart files mapping
timeout: 30                                    # Optional: Timeout in seconds
follow_redirects: true                         # Optional: Follow redirects
parameter_keys: []                             # Optional: Workflow parameters used (block outputs don't need to be listed)

Use Cases:
- Call third-party APIs for enrichment
- Post data to internal services
- Upload files via multipart requests
- Send results from previous blocks to webhooks

Example 1 - Using workflow parameters:
workflow_definition:
  version: 2
  blocks:
    - block_type: http_request
      label: lookup_customer
      next_block_label: null
      method: "POST"
      url: "https://api.example.com/customers/search"
      headers:
        Authorization: "Bearer {{ api_token }}"
      body:
        email: "{{ customer_email }}"
      parameter_keys: [api_token, customer_email]
  parameters:
    - parameter_type: workflow
      key: api_token
      workflow_parameter_type: string
    - parameter_type: workflow
      key: customer_email
      workflow_parameter_type: string

Example 2 - Sending block output to webhook (no parameter_keys needed):
workflow_definition:
  version: 2
  parameters: []
  blocks:
    - block_type: navigation
      label: get_data
      next_block_label: send_webhook
      navigation_goal: "Get top 3 hacker news items"
      url: "https://news.ycombinator.com"
    - block_type: http_request
      label: send_webhook
      next_block_label: null
      method: POST
      url: "http://example.com/webhook"
      body:
        data: "{{ get_data.output }}"
      parameter_keys: []

** PARAMETER TEMPLATING **

All string fields in blocks support Jinja2 templating to reference parameters and block outputs.

There are TWO types of references:

1. WORKFLOW PARAMETERS - Reference input parameters defined in the parameters section
   Syntax: {{ param_key }}
   Requires: parameter_keys list must include the param_key

2. BLOCK OUTPUTS - Reference output from a previous block by its label
   Syntax depends on the upstream block type; see below.
   Requires: NOTHING - block outputs are automatically available, no parameter_keys needed

IMPORTANT — block output shapes are NOT uniform across block types. Using the wrong shape either raises "dict object has no attribute 'output'" at render time or (worse) silently renders a None value into a URL field, producing `https://None` which fails with `net::ERR_NAME_NOT_RESOLVED`.

* Browser-task blocks (navigation, extraction, goto_url, login, action, file_download):
  - `{{ label.output }}` — the full TaskOutput envelope.
  - `{{ label.output.extracted_information }}` — the extracted payload.
  - Example: `{{ extract_items.output }}` inside a downstream prompt.

* Non-task blocks (text_prompt, http_request, file_url_parser):
  - `{{ label.<field> }}` — fields from the returned JSON are on the block directly; there is NO `.output` wrapper.
  - Example: `{{ summarize_notes.summary }}` (NOT `{{ summarize_notes.output.summary }}`).
  - Writing `{{ summarize_notes.output.summary }}` raises "dict object has no attribute 'output'" because `summarize_notes` IS the dict.

NULL-GUARD RULE — when a downstream field may legitimately be null/empty (e.g., a text_prompt returned `{"store_url": null}` because no such link exists on the page), wrap the reference in a `conditional` block BEFORE using it as a URL. Unguarded `goto_url` with `url: "{{ x.field }}"` where `x.field` is None renders as `https://None` and is not recoverable by retrying the same workflow — it is a nullity bug, not a navigation bug.

Examples - Workflow Parameters:

* In URL:
url: "https://example.com/search?q={{ search_term }}"
parameter_keys: [search_term]

* In goals:
navigation_goal: "Search for {{ product_name }} and filter by {{ category }}"
parameter_keys: [product_name, category]

Examples - Block Outputs (no parameter_keys needed):

* Send previous task-block output to webhook (task block → use .output):
blocks:
  - block_type: navigation
    label: get_data
    navigation_goal: "Get top 3 hacker news items"
    ...
  - block_type: http_request
    label: send_webhook
    method: POST
    url: "http://example.com/webhook"
    body:
      data: "{{ get_data.output }}"
    parameter_keys: []  # Empty - block outputs don't need to be declared

* Use extraction output in next block (task block → use .output):
blocks:
  - block_type: extraction
    label: extract_items
    ...
  - block_type: navigation
    label: process_items
    navigation_goal: "Process these items: {{ extract_items.output }}"

* Use text_prompt output in next block (non-task block → reference fields directly, NO .output):
blocks:
  - block_type: text_prompt
    label: summarize_notes
    prompt: "Summarize ..."
    json_schema: {type: object, properties: {summary: {type: string}}}
  - block_type: navigation
    label: use_summary
    navigation_goal: "Act on: {{ summarize_notes.summary }}"    # NOT {{ summarize_notes.output.summary }}

* In schemas (as descriptions):
data_schema:
  type: object
  properties:
    query_result:
      type: string
      description: "Result for query: {{ query }}"

** ERROR HANDLING AND RETRIES **

Error Code Mapping:
Map internal errors to custom error codes for easier handling:

error_code_mapping:
  "ElementNotFound": "ELEMENT_MISSING"
  "TimeoutError": "PAGE_TIMEOUT"
  "NavigationFailed": "NAV_ERROR"

Retry Configuration:
max_retries: 3  # Block will retry up to 3 times on failure

Conditional Continuation:
continue_on_failure: true  # Workflow continues even if block fails

Loop Continuation:
next_loop_on_failure: true  # Skip to next iteration in loops

Completion Criteria:
complete_criterion: "URL contains '/success'"  # Condition for success
terminate_criterion: "Element with text 'Error' exists"  # Condition to stop

** WORKFLOW EXECUTION FLOW **

Sequential Execution:
blocks:
  - block_type: goto_url
    label: step1
    next_block_label: step2
    url: "https://example.com/start"
  - block_type: extraction
    label: step2
    next_block_label: step3
  - block_type: navigation
    label: step3
    next_block_label: null
    navigation_goal: "Complete the final step on the page"
# Executes: step1 → step2 → step3

Explicit Flow Control (Skip blocks):
blocks:
  - block_type: goto_url
    label: login
    next_block_label: extract_data
    url: "https://app.example.com/login"
  - block_type: navigation
    label: handle_error
    next_block_label: null
    navigation_goal: "Handle the error state if it appears"
  - block_type: extraction
    label: extract_data
    next_block_label: null
# Executes: login → extract_data (skips handle_error)

Error Recovery Flow:
blocks:
  - block_type: navigation
    label: primary_task
    next_block_label: verify_result
    continue_on_failure: true
    navigation_goal: "Attempt the primary task on the page"
  - block_type: validation
    label: verify_result
    next_block_label: null

** BEST PRACTICES **

* Naming Conventions:
   - Use descriptive labels: "login_to_portal" not "step1"
   - Use snake_case for labels and parameter keys
   - Use snake_case for data schema field names (e.g., product_name, total_price)
   - Make labels unique and meaningful

* Goal Writing:
   - Be specific: "Click the blue 'Submit' button" vs "Submit the form"
   - Include context: "After clicking Search, wait for results to load"
   - Natural language: Write as you would instruct a human

* Parameter Usage:
   - Always list parameter_keys when using parameters in a block
   - Validate parameter types match usage
   - Provide default values for optional parameters

* Error Handling:
   - Set appropriate max_retries for flaky operations
   - Use complete_criterion for validation
   - Map errors to meaningful codes for debugging

* Data Extraction:
   - Always provide data_schema for structured extraction
   - Use specific extraction goals
   - Handle arrays vs objects appropriately

* Performance:
   - Use disable_cache: true for dynamic content
   - Set max_steps_per_run to prevent infinite loops
   - Use navigation blocks for actions and extraction blocks for data extraction

* Security:
   - Never hardcode credentials in workflows
   - Use credential parameters for sensitive data
   - Use AWS secrets or vault integrations

** COMMON PATTERNS **

Pattern 1: Login → Navigate → Extract
workflow_definition:
  version: 2
  blocks:
    - block_type: login
      label: authenticate
      next_block_label: go_to_reports
      url: "https://app.example.com/login"
      parameter_keys: [my_credentials]
    - block_type: navigation
      label: go_to_reports
      next_block_label: get_report_data
      navigation_goal: "Navigate to Reports section"
    - block_type: extraction
      label: get_report_data
      next_block_label: null
      data_extraction_goal: "Extract all report entries"
      data_schema:
        type: array
        items: {type: object}
  parameters:
    - parameter_type: workflow
      workflow_parameter_type: credential_id
      key: my_credentials
      default_value: "uuid"
    - parameter_type: output
      key: extracted_data

Pattern 2: Search with Dynamic Input
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_and_extract
      next_block_label: null
      url: "https://example.com"
      navigation_goal: "Search for '{{ search_query }}' and extract the first 10 results with titles and URLs"
  parameters:
    - parameter_type: workflow
      key: search_query
      workflow_parameter_type: string

Pattern 3: Multi-Step Form Filling
workflow_definition:
  version: 2
  blocks:
    - block_type: goto_url
      label: open_form
      next_block_label: fill_personal_info
      url: "https://forms.example.com/application"
    - block_type: navigation
      label: fill_personal_info
      next_block_label: fill_address
      navigation_goal: "Fill in name as {{ name }}, email as {{ email }}"
      parameter_keys: [name, email]
    - block_type: navigation
      label: fill_address
      next_block_label: submit
      navigation_goal: "Fill in address fields and click Continue"
      parameter_keys: [address, city, zip]
    - block_type: navigation
      label: submit
      next_block_label: null
      navigation_goal: "Review information and click Submit"
  parameters:
    - parameter_type: workflow
      key: name
      workflow_parameter_type: string
    - parameter_type: workflow
      key: email
      workflow_parameter_type: string
    - parameter_type: workflow
      key: address
      workflow_parameter_type: string
    - parameter_type: workflow
      key: city
      workflow_parameter_type: string
    - parameter_type: workflow
      key: zip
      workflow_parameter_type: string

Pattern 4: Conditional Extraction
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_product
      next_block_label: check_availability
      navigation_goal: "Search for {{ product }}"
    - block_type: extraction
      label: check_availability
      next_block_label: add_to_cart
      data_extraction_goal: "Check if product is in stock"
      data_schema:
        type: object
        properties:
          in_stock: {type: boolean}
    - block_type: navigation
      label: add_to_cart
      next_block_label: null
      navigation_goal: "If product is in stock, add to cart"
  parameters: []

** VALIDATION RULES **

Workflow-Level:
- All block labels must be unique
- Parameters referenced in blocks must be defined
- next_block_label must point to existing block labels or be null
- The last block in execution flow should have next_block_label: null

Block-Level:
- label is required and cannot be empty
- block_type must be a valid type
- For navigation blocks: navigation_goal is required
- For extraction blocks: data_extraction_goal is required
- For action blocks: navigation_goal is required
- For login blocks: parameter_keys should include credentials

Parameter-Level:
- key must be unique across parameters
- key cannot contain whitespace
- parameter_type must be valid
- Referenced keys (like source_parameter_key) must exist

** COMPLETE WORKFLOW EXAMPLE **

title: E-commerce Product Search and Purchase
description: Search for a product, extract details, and add to cart
workflow_definition:
  version: 2
  blocks:
    - block_type: login
      label: login_to_store
      next_block_label: search_and_filter
      url: "https://shop.example.com/login"
      parameter_keys:
        - account_creds
      complete_criterion: "URL contains '/dashboard'"

    - block_type: navigation
      label: search_and_filter
      next_block_label: get_product_info
      url: "https://shop.example.com/search"
      navigation_goal: "Search for {{ product_name }} and filter results by price under ${{ max_price }}"
      parameter_keys:
        - product_name
        - max_price
      max_retries: 2

    - block_type: extraction
      label: get_product_info
      next_block_label: add_to_cart
      data_extraction_goal: "Extract product name, price, rating, and availability"
      data_schema:
        type: object
        properties:
          name: {type: string}
          price: {type: number}
          rating: {type: number}
          available: {type: boolean}

    - block_type: navigation
      label: add_to_cart
      next_block_label: null
      navigation_goal: "Click on the first available product and add it to cart"
      max_retries: 3
  parameters:
    - parameter_type: workflow
      key: product_name
      workflow_parameter_type: string
      description: "Product to search for"
    - parameter_type: workflow
      key: max_price
      workflow_parameter_type: float
      description: "Maximum price willing to pay"
    - parameter_type: workflow
      workflow_parameter_type: credential_id
      key: account_creds
      default_value: "cred_12345"
    - parameter_type: output
      key: product_details
      description: "Extracted product information"

END OF KNOWLEDGE BASE
