SKYVERN WORKFLOW YAML KNOWLEDGE BASE

This document provides comprehensive information about Skyvern Workflow YAML structure and blocks. Use this to understand how to construct, modify, and validate workflow definitions.

** WORKFLOW STRUCTURE OVERVIEW **

A Skyvern workflow is defined in YAML format with the following top-level structure
for a workflow definition (embedded under workflow_definition in full specs):

title: "<workflow title>" # Required
description: "<optional description>"
workflow_definition:
  version: 2 # IMPORTANT: Always use version 2
  blocks: []
  parameters: []
webhook_callback_url: "<optional_https_url>" # Optional: Webhook URL to receive workflow run updates

Key Concepts:
- Workflows consist of sequential or conditional blocks that represent specific tasks
- Each block has a unique label for identification and navigation
- Blocks can reference workflow parameters using Jinja2 templating
- Block execution is defined by next_block_label on every non-terminal block

** WORKFLOW PARAMETERS **

Parameters provide input values and credentials to workflows. They are defined in the "parameters" list.

Common Parameter Types:

* WORKFLOW PARAMETERS (user inputs)
parameter_type: workflow
key: <unique_key>
workflow_parameter_type: <string|integer|float|boolean|json|file_url|credential_id>
description: <optional description>
default_value: <optional default>

Define exactly one parameters list under workflow_definition.

Example:
workflow_definition:
  version: 2
  blocks: []
  parameters:
    - parameter_type: workflow
      key: search_query
      workflow_parameter_type: string
      description: "Search term to use"
      default_value: "example"

* OUTPUT PARAMETERS (block outputs)
parameter_type: output
key: <unique_key>
description: <optional description>

* CREDENTIAL PARAMETERS
parameter_type: workflow
workflow_parameter_type: credential_id
key: <unique_key>
default_value: <credential_id>

Using Parameters in Blocks:
- Reference using Jinja2: {{ param_key }}
- List parameter_keys in blocks that use them
- Parameters are resolved before block execution. ALL PARAMETER KEYS REFERENCED IN BLOCKS MUST FIRST BE DEFINED IN THE WORKFLOW PARAMETERS LIST

Example:
workflow_definition:
  version: 2
  blocks:
    - label: block_1
      block_type: navigation
      url: https://news.ycombinator.com/
      navigation_goal: "Give me top {{topics_count}} news items"
      next_block_label: null
  parameters:
    - key: topics_count
      description: null
      parameter_type: workflow
      workflow_parameter_type: integer
      default_value: "3"

** COMMON BLOCK FIELDS **

All blocks inherit these base fields:

block_type: <type>              # Required: Defines the block type
label: <unique_label>           # Required: Unique identifier for this block
next_block_label: <label|null>  # Required: Label of next block; use null only for terminal blocks
continue_on_failure: false      # Optional: Continue workflow if block fails
next_loop_on_failure: false     # Optional: Continue to next loop iteration on failure (for loop blocks only)
model: {}                       # Optional: Override model settings for this block

Important Rules:
- Labels must be unique within a workflow
- Labels cannot be empty or contain only whitespace
- next_block_label is required for all non-terminal blocks
- Use next_block_label for explicit flow control
- Set next_block_label to null to mark the end of a flow
- continue_on_failure allows graceful error handling

** NAVIGATION BLOCK (navigation) **

Purpose: Take actions on a page to achieve a focused goal — fill a form, click through a multi-step flow, prepare the page before an extraction.

Structure:
block_type: navigation
label: <unique_label>
url: <starting_url>                            # Optional: URL to navigate to; omit to continue on current page
title: str                                     # Required: The title of the block
navigation_goal: <action_description>          # Required: What actions to perform
error_code_mapping: {}                         # Optional: Map errors to custom codes
max_retries: 0                                 # Optional: Number of retry attempts
max_steps_per_run: null                        # Optional: Limit steps per execution
parameter_keys: []                             # Optional: Parameters used in this block
complete_on_download: false                    # Optional: Complete when file downloads
download_suffix: null                          # Optional: Downloaded file name
totp_verification_url: null                    # Optional: TOTP verification URL
disable_cache: false                           # Optional: Disable caching
complete_criterion: null                       # Optional: Condition to mark complete
terminate_criterion: null                      # Optional: Condition to terminate
complete_verification: true                    # Optional: Verify completion
include_action_history_in_verification: false  # Optional: Include history in verification

Use Cases:
- Fill out forms on websites
- Navigate complex multi-step processes
- Prepare the page for a subsequent extraction block
- Execute focused browser tasks with clear completion criteria

Example:
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_and_open
      next_block_label: null
      url: "https://example.com/search"
      navigation_goal: "Search for {{ query }} and click the first result"
      parameter_keys:
        - query
      max_retries: 2
  parameters:
    - parameter_type: workflow
      key: query
      workflow_parameter_type: string

** URL BLOCK (goto_url) **

Purpose: Navigate directly to a URL without any additional instructions.

Structure:
block_type: goto_url
label: <unique_label>
url: <target_url>                       # Required: URL to navigate to
error_code_mapping: {}                  # Optional: Custom error codes
max_retries: 0                          # Optional: Retry attempts
parameter_keys: []                      # Optional: Parameters used

Use Cases:
- Jump to a known page before other blocks
- Reset the browser state to a specific URL
- Split URL navigation from subsequent actions

Example:
blocks:
  - block_type: goto_url
    label: open_cart
    next_block_label: null
    url: "https://example.com/cart"

** ACTION BLOCK (action) **

Purpose: Perform a single focused action on the current page without data extraction.

Structure:
block_type: action
label: <unique_label>
navigation_goal: <action_description>  # Required: Single action to perform
url: <starting_url>                    # Optional: URL to start from
error_code_mapping: {}                 # Optional: Custom error codes
max_retries: 0                         # Optional: Retry attempts
parameter_keys: []                     # Optional: Parameters used
complete_on_download: false            # Optional: Complete on download
download_suffix: null                  # Optional: Download file name
totp_verification_url: null            # Optional: TOTP verification URL
totp_identifier: null                  # Optional: TOTP identifier
disable_cache: false                   # Optional: Disable cache

Use Cases:
- Click a specific button or link
- Fill a single field or selection
- Trigger a download with one action

Example:
blocks:
  - block_type: action
    label: accept_terms
    next_block_label: null
    url: "https://example.com/checkout"
    navigation_goal: "Check the terms checkbox"
    max_retries: 1

** TASK BLOCK (task) — NOT AVAILABLE IN WORKFLOW COPILOT **

DO NOT EMIT "task" blocks. They are not available in the workflow copilot and will be rejected at persistence. Use:
- "navigation" for page actions (filling forms, clicking, multi-step flows)
- "extraction" for data extraction (with data_extraction_goal + data_schema)
- "validation" for completion checks
- "login" for authentication
- "goto_url" for pure URL navigation
Legacy workflows that already contain "task" blocks continue to run outside the copilot; this ban applies only to copilot emission.

** TASK V2 BLOCK (task_v2) — DEPRECATED **

DO NOT USE task_v2. Use "navigation" blocks instead (with navigation_goal).
Use "extraction" blocks for data extraction (with data_extraction_goal + data_schema).
The task_v2 block type exists only for backward compatibility with existing workflows.

** FOR LOOP BLOCK (for_loop) **

Purpose: Iterate over a list of values and run a sequence of blocks for each item.

Structure:
block_type: for_loop
label: <unique_label>
loop_blocks: []                          # Required: Blocks to run for each iteration
loop_over_parameter_key: <param_key>     # Optional: Use ONLY for workflow parameters defined in parameters section
loop_variable_reference: <block_label>   # Optional: Use to reference output from a previous block
complete_if_empty: false                 # Optional: Complete successfully when list is empty

Important Notes:
- Provide either loop_over_parameter_key or loop_variable_reference
- Loop blocks must use next_block_label to chain within the loop
- Each iteration exposes {{ current_value }}, {{ current_item }}, and {{ current_index }}

CRITICAL DISTINCTION:
* loop_over_parameter_key: ONLY for workflow parameters defined in the parameters section
  - Use when looping over a static list or user-provided input
  - Must reference a key from the workflow parameters list
  - Example: loop_over_parameter_key: urls (where "urls" is in parameters)

* loop_variable_reference: For referencing output from a PREVIOUS BLOCK
  - Use when looping over data extracted or generated by a previous block
  - Reference the block's label directly (e.g., "extract_rows")
  - System automatically tries: block_label.extracted_information, block_label.results, etc.
  - Example: loop_variable_reference: extract_rows (where "extract_rows" is a previous extraction block)

Example 1 - Loop over workflow parameter:
parameters:
  - parameter_type: workflow
    key: urls
    workflow_parameter_type: json
    default_value:
      - "https://example.com/a"
      - "https://example.com/b"
blocks:
  - block_type: for_loop
    label: visit_urls
    next_block_label: null
    loop_over_parameter_key: urls  # References the workflow parameter "urls"
    loop_blocks:
      - block_type: goto_url
        label: open_url
        next_block_label: null
        url: "{{ current_value }}"

Example 2 - Loop over extraction block output:
blocks:
  - block_type: extraction
    label: extract_products
    next_block_label: process_each_product
    data_extraction_goal: "Extract all products from the table"
    data_schema:
      type: array
      items:
        type: object
        properties:
          name: {type: string}
          price: {type: number}
  - block_type: for_loop
    label: process_each_product
    next_block_label: null
    loop_variable_reference: extract_products  # References the extraction block's output
    loop_blocks:
      - block_type: navigation
        label: add_to_cart
        next_block_label: null
        navigation_goal: "Add {{ current_value.name }} to cart"

** WHILE LOOP BLOCK (while_loop) **

Purpose: Repeat a sequence of blocks while a runtime-evaluated condition stays true. Use for pagination, polling, or retry-until-success — any case where the iteration count depends on runtime state, not a known list.

Structure:
block_type: while_loop
label: <unique_label>
loop_blocks: []                          # Required: Blocks to run each iteration
condition:                               # Required: BranchCriteriaYAML
  criteria_type: jinja2_template         # Optional: inferred when omitted
  expression: <jinja_expression>         # Required: Must evaluate to truthy to continue
  description: <optional description>

Important Notes:
- The condition is evaluated at the TOP of each iteration (including the first). If it is false up front, loop_blocks never run and the block succeeds with empty output.
- Termination: condition becomes falsy → success; reached the 100-iteration safety cap → failure with a clear failure_reason.
- Outputs from blocks within an iteration are accessible to later blocks in the same iteration via templating; outputs from prior iterations are accessible the same way for_loop accumulates them.
- The condition has access to {{ current_index }} — the 0-indexed iteration counter (same name and meaning as for_loop's). Use it to bootstrap iteration 0: e.g. "{{ current_index == 0 or <body_output_ref> }}". The OR short-circuits on the first check before the body output is consulted.
- Strict templating caveat: that bootstrap pattern relies on default lax templating. With WORKFLOW_TEMPLATING_STRICTNESS=strict, missing-variable validation runs before Jinja short-circuiting, so seed any referenced prior-output variable before the loop or use a condition based only on variables already present.
- No automatic per-iteration value variables. Unlike for_loop, there is NO {{ current_value }} / {{ current_item }} for while_loop — only {{ current_index }}.
- criteria_type "prompt" is not yet supported in while_loop; use "jinja2_template" only. Supplying "prompt" fails fast with NotImplementedError.

CRITICAL DISTINCTION (vs for_loop):
* for_loop: known list to iterate over. Use when you have a parameter or extraction output containing all items up front.
* while_loop: iteration count is unknown until runtime. Use for pagination ("keep going while a Next button exists"), polling ("keep checking until status is ready"), or retry-until-success patterns.

Example 1 - Paginate until no Next button:
blocks:
  - block_type: while_loop
    label: paginate_results
    next_block_label: null
    condition:
      criteria_type: jinja2_template
      # With default lax templating, current_index == 0 carries the first iteration
      # before extract_page exists. From iteration 1 onward, the previous iteration's
      # extracted has_next_page flag drives the loop.
      expression: "{{ current_index == 0 or extract_page.has_next_page }}"
    loop_blocks:
      - block_type: extraction
        label: extract_page
        next_block_label: click_next
        data_extraction_goal: "Extract all rows on the current page and whether a 'Next' button is enabled"
        data_schema:
          type: object
          properties:
            rows: {type: array, items: {type: object}}
            has_next_page: {type: boolean}
      - block_type: action
        label: click_next
        next_block_label: null
        navigation_goal: "Click the 'Next' button to advance to the next page"

Example 2 - Retry-until-success bounded by an attempt counter:
parameters:
  - parameter_type: workflow
    key: max_attempts
    workflow_parameter_type: integer
    default_value: 5
blocks:
  - block_type: while_loop
    label: retry_until_done
    next_block_label: null
    condition:
      criteria_type: jinja2_template
      # With default lax templating, current_index == 0 carries the first attempt
      # before check_status exists. From attempt 1 onward, the previous attempt's
      # is_ready flag controls termination.
      expression: "{{ current_index < max_attempts and (current_index == 0 or not check_status.is_ready) }}"
    loop_blocks:
      - block_type: extraction
        label: check_status
        next_block_label: null
        data_extraction_goal: "Determine whether the result is ready"
        data_schema:
          type: object
          properties:
            is_ready: {type: boolean}

** CONDITIONAL BLOCK (conditional) **

Purpose: Branch to different blocks based on ordered conditions.

Structure:
block_type: conditional
label: <unique_label>
branch_conditions: []                    # Required: Ordered list of branch conditions

Branch Condition Structure:
criteria:                               # Optional for default branch
  criteria_type: <jinja2_template|prompt>  # Optional: inferred when omitted
  expression: <expression>                # Required when criteria present
  description: <optional description>
next_block_label: <label|null>           # Optional: Label to execute when matched
description: <optional description>
is_default: false                        # Required when criteria is omitted

Important Notes:
- At least one branch is required
- Branches are evaluated in order; first match wins
- Only one default branch is allowed
- Branches without criteria must set is_default: true

Example:
blocks:
  - block_type: conditional
    label: route_by_status
    next_block_label: null
    branch_conditions:
      - criteria:
          criteria_type: jinja2_template
          expression: "{{ account_status == 'active' }}"
        next_block_label: handle_active
        description: "Active accounts"
        is_default: false
      - is_default: true
        next_block_label: handle_inactive
        description: "Fallback for all other states"

** LOGIN BLOCK (login) **

Purpose: Handle authentication flows including username/password and TOTP/2FA.

Structure:
block_type: login
label: <unique_label>
url: <login_page_url>           # Optional: Login page URL
title: str                      # Required: The title of the block
navigation_goal: null           # Optional: Additional navigation after login
error_code_mapping: {}          # Optional: Custom error codes
max_retries: 0                  # Optional: Retry attempts
max_steps_per_run: null         # Optional: Step limit
parameter_keys: []              # Required: Should include credential parameters
complete_criterion: null        # Optional: Completion condition
terminate_criterion: null       # Optional: Termination condition
complete_verification: true     # Optional: Verify successful login

Use Cases:
- Login to websites with username/password
- Handle 2FA/TOTP authentication
- Manage credential-protected workflows
- Session initialization

Important Notes:
- Credentials should be stored as parameters (credential, bitwarden_login_credential, etc.)
- TOTP is automatically handled if the credential parameter has TOTP configured

Example:
workflow_definition:
  version: 2
  blocks:
    - block_type: login
      label: login_to_portal
      next_block_label: null
      url: "https://portal.example.com/login"
      parameter_keys:
        - my_credentials # This must match a 'key' from the parameters list above
      complete_criterion: "Current URL is 'https://portal.example.com/dashboard'"
      max_retries: 2
  parameters:
    - parameter_type: workflow
      workflow_parameter_type: credential_id
      key: my_credentials
      default_value: "cred_uuid_here"

** VALIDATION BLOCK (validation) **

Purpose: Validate workflow state and decide whether to continue or terminate.

Structure:
block_type: validation
label: <unique_label>
complete_criterion: <success_condition>        # Optional: Condition for success
terminate_criterion: <termination_condition>   # Optional: Condition to stop workflow
error_code_mapping: {}                         # Optional: Map errors to custom codes
parameter_keys: []                             # Optional: Parameters used in this block
disable_cache: false                           # Optional: Disable caching

Use Cases:
- Confirm a successful navigation or submission
- Stop the workflow when an error state appears
- Validate content on the page before extracting data

Example:
blocks:
  - block_type: validation
    label: verify_submission
    next_block_label: null
    complete_criterion: "Page contains 'Thank you for your submission'"
    terminate_criterion: "Page contains 'Error' or 'Try again'"

** WAIT BLOCK (wait) **

Purpose: Pause workflow execution for a specified duration.

Structure:
block_type: wait
label: <unique_label>
wait_sec: <seconds>                     # Required: Number of seconds to wait

Example:
blocks:
  - block_type: wait
    label: wait_for_processing
    next_block_label: check_results
    wait_sec: 30

** EXTRACTION BLOCK (extraction) **

Purpose: Extract structured data from the current page without navigation.

Structure:
block_type: extraction
label: <unique_label>
title: str                               # Required: The title of the block
data_extraction_goal: <what_to_extract>  # Required: Description of data to extract
data_schema: <json_schema>               # Optional: Structure of extracted data
url: <page_url>                          # Optional: URL to navigate to first
max_retries: 0                           # Optional: Retry attempts
max_steps_per_run: null                  # Optional: Step limit
parameter_keys: []                       # Optional: Parameters used
disable_cache: false                     # Optional: Disable cache
Use Cases:
- Extract structured data after other blocks
- Parse tables, lists, or forms
- Collect multiple data points from a page
- Data mining from web pages

Data Schema Formats:
* JSON Schema object:
   data_schema:
     type: object
     properties:
       field1: {type: string}
       field2: {type: number}

* JSON Schema array:
   data_schema:
     type: array
     items:
       type: object
       properties:
         name: {type: string}

* String format (for simple extractions):
   data_schema: "csv_string"

Example:
blocks:
  - block_type: extraction
    label: extract_product_list
    next_block_label: null
    data_extraction_goal: "Extract all products with their names, prices, and stock status"
    data_schema:
      type: array
      items:
        type: object
        properties:
          product_name: {type: string}
          price: {type: number}
          in_stock: {type: boolean}
          rating: {type: number}
    max_retries: 1

** FILE DOWNLOAD BLOCK (file_download) **

Purpose: Download files from a website. This is the "File Download Block" in the UI.

Structure:
block_type: file_download
label: <unique_label>
navigation_goal: <download_instruction>        # Required: How to trigger the download
url: <starting_url>                            # Optional: URL to navigate to first
title: str                                     # Optional: Title for the block
engine: skyvern-1.0                            # Optional: Run engine
error_code_mapping: {}                         # Optional: Map errors to custom codes
max_retries: 0                                 # Optional: Number of retry attempts
max_steps_per_run: null                        # Optional: Limit steps per execution
parameter_keys: []                             # Optional: Parameters used in this block
download_suffix: null                          # Optional: Full filename for the download
totp_verification_url: null                    # Optional: TOTP verification URL
totp_identifier: null                          # Optional: TOTP identifier
disable_cache: false                           # Optional: Disable caching
download_timeout: null                         # Optional: Download timeout in seconds

Use Cases:
- Download invoices, receipts, or reports from a portal
- Export data as CSV or PDF from a dashboard
- Trigger file downloads that require navigation steps

Example:
blocks:
  - block_type: file_download
    label: download_report
    next_block_label: null
    url: "https://portal.example.com/reports"
    navigation_goal: "Open the latest report and click Download as PDF"
    download_suffix: "latest_report.pdf"

** CLOUD STORAGE BLOCK (file_upload) **

Purpose: Upload files to storage. This is the "Cloud Storage Block" in the UI.

Structure:
block_type: file_upload
label: <unique_label>
storage_type: <s3|azure>                       # Optional: Storage backend (default: s3)
s3_bucket: <bucket_name>                       # Optional: S3 bucket name
aws_access_key_id: <access_key_id>             # Optional: AWS access key id
aws_secret_access_key: <secret_access_key>     # Optional: AWS secret access key
region_name: <aws_region>                      # Optional: AWS region
azure_storage_account_name: <account_name>     # Optional: Azure storage account
azure_storage_account_key: <account_key>       # Optional: Azure storage account key
azure_blob_container_name: <container_name>    # Optional: Azure blob container
azure_folder_path: <folder_path>               # Optional: Azure folder path
path: <local_or_workspace_path>                # Optional: File path to upload

Use Cases:
- Upload downloaded artifacts to S3
- Publish files to Azure Blob Storage
- Persist workflow outputs to storage

Example:
blocks:
  - block_type: file_upload
    label: upload_report
    next_block_label: null
    storage_type: s3
    s3_bucket: "my-reports"
    region_name: "us-west-2"
    path: "/tmp/latest_report.pdf"

** FILE PARSER BLOCK (file_url_parser) **

Purpose: Parse PDFs, CSVs, Excel files, images, and DOCX files. This is the "File Parser Block" in the UI.

Structure:
block_type: file_url_parser
label: <unique_label>
file_url: <https_url_or_file_url>             # Required: URL to the file
file_type: <auto_detect|csv|excel|pdf|image|docx>  # Optional: defaults to auto_detect (infer from URL/content)
json_schema: <json_schema>                     # Optional: Structure of parsed output

Use Cases:
- Parse a PDF invoice into structured fields
- Extract rows from a CSV file
- Read Excel sheets for downstream processing

Example:
blocks:
  - block_type: file_url_parser
    label: parse_invoice
    next_block_label: null
    file_url: "https://example.com/invoice.pdf"
    file_type: pdf
    json_schema:
      type: object
      properties:
        invoice_id: {type: string}
        total: {type: number}

** SEND EMAIL BLOCK (send_email) **

Purpose: Send email notifications. This is the "Send Email Block" in the UI.

Structure:
block_type: send_email
label: <unique_label>
smtp_host_secret_parameter_key: <param_key>     # Required: Secret parameter key for SMTP host
smtp_port_secret_parameter_key: <param_key>     # Required: Secret parameter key for SMTP port
smtp_username_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP username
smtp_password_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP password
sender: <email_address>                         # Required: Sender email address
recipients: [<email_address>]                   # Required: Recipient list
subject: <subject_line>                         # Required: Email subject
body: <email_body>                              # Required: Email body
file_attachments: [<file_path>]                 # Optional: Local file paths to attach

Use Cases:
- Notify a team when a workflow completes
- Send extracted data summaries to stakeholders
- Email reports or attachments

Example:
blocks:
  - block_type: send_email
    label: notify_ops
    next_block_label: null
    smtp_host_secret_parameter_key: smtp_host
    smtp_port_secret_parameter_key: smtp_port
    smtp_username_secret_parameter_key: smtp_user
    smtp_password_secret_parameter_key: smtp_pass
    sender: "automation@example.com"
    recipients: ["ops@example.com"]
    subject: "Daily report ready"
    body: "The latest report is ready for review."
    file_attachments: ["/tmp/latest_report.pdf"]

** TEXT PROMPT BLOCK (text_prompt) **

Purpose: Process text with LLM. This is the "Text Prompt Block" in the UI.

Structure:
block_type: text_prompt
label: <unique_label>
llm_key: <llm_key>                             # Optional: Model key override
prompt: <text_prompt>                          # Required: Prompt to run
parameter_keys: []                             # Optional: Parameters used in this block
json_schema: <json_schema>                     # Optional: Structured output schema

Referencing output from downstream blocks: fields live on the block directly, NOT under `.output`. Write `{{ summarize_notes.summary }}`, not `{{ summarize_notes.output.summary }}`. See PARAMETER TEMPLATING for the full per-block-type rule and the null-guard requirement.

Use Cases:
- Summarize extracted text or documents
- Normalize free-form content into a schema
- Generate classifications or tags

Example:
workflow_definition:
  version: 2
  blocks:
    - block_type: text_prompt
      label: summarize_notes
      next_block_label: null
      prompt: "Summarize these notes: {{ notes }}"
      json_schema:
        type: object
        properties:
          summary: {type: string}
          action_items: {type: array, items: {type: string}}
      parameter_keys: [notes]
  parameters:
    - parameter_type: workflow
      key: notes
      workflow_parameter_type: string

** HTTP REQUEST BLOCK (http_request) **

Purpose: Make HTTP API calls. This is the "HTTP Request Block" in the UI.

Structure:
block_type: http_request
label: <unique_label>
method: <GET|POST|PUT|PATCH|DELETE>            # Optional: HTTP method (default: GET)
url: <https_url>                               # Optional: Target URL
headers: {}                                    # Optional: HTTP headers
body: {}                                       # Optional: JSON body (MUST be a dict, not a string)
files: {}                                      # Optional: Multipart files mapping
timeout: 30                                    # Optional: Timeout in seconds
follow_redirects: true                         # Optional: Follow redirects
parameter_keys: []                             # Optional: Workflow parameters used (block outputs don't need to be listed)

Use Cases:
- Call third-party APIs for enrichment
- Post data to internal services
- Upload files via multipart requests
- Send results from previous blocks to webhooks

Example 1 - Using workflow parameters:
workflow_definition:
  version: 2
  blocks:
    - block_type: http_request
      label: lookup_customer
      next_block_label: null
      method: "POST"
      url: "https://api.example.com/customers/search"
      headers:
        Authorization: "Bearer {{ api_token }}"
      body:
        email: "{{ customer_email }}"
      parameter_keys: [api_token, customer_email]
  parameters:
    - parameter_type: workflow
      key: api_token
      workflow_parameter_type: string
    - parameter_type: workflow
      key: customer_email
      workflow_parameter_type: string

Example 2 - Sending block output to webhook (no parameter_keys needed):
workflow_definition:
  version: 2
  parameters: []
  blocks:
    - block_type: navigation
      label: get_data
      next_block_label: send_webhook
      navigation_goal: "Get top 3 hacker news items"
      url: "https://news.ycombinator.com"
    - block_type: http_request
      label: send_webhook
      next_block_label: null
      method: POST
      url: "http://example.com/webhook"
      body:
        data: "{{ get_data.output }}"
      parameter_keys: []

** PARAMETER TEMPLATING **

All string fields in blocks support Jinja2 templating to reference parameters and block outputs.

There are TWO types of references:

1. WORKFLOW PARAMETERS - Reference input parameters defined in the parameters section
   Syntax: {{ param_key }}
   Requires: parameter_keys list must include the param_key

2. BLOCK OUTPUTS - Reference output from a previous block by its label
   Syntax depends on the upstream block type; see below.
   Requires: NOTHING - block outputs are automatically available, no parameter_keys needed

IMPORTANT — block output shapes are NOT uniform across block types. Using the wrong shape either raises "dict object has no attribute 'output'" at render time or (worse) silently renders a None value into a URL field, producing `https://None` which fails with `net::ERR_NAME_NOT_RESOLVED`.

* Browser-task blocks (navigation, extraction, goto_url, login, action, file_download):
  - `{{ label.output }}` — the full TaskOutput envelope.
  - `{{ label.output.extracted_information }}` — the extracted payload.
  - Example: `{{ extract_items.output }}` inside a downstream prompt.

* Non-task blocks (text_prompt, http_request, file_url_parser):
  - `{{ label.<field> }}` — fields from the returned JSON are on the block directly; there is NO `.output` wrapper.
  - Example: `{{ summarize_notes.summary }}` (NOT `{{ summarize_notes.output.summary }}`).
  - Writing `{{ summarize_notes.output.summary }}` raises "dict object has no attribute 'output'" because `summarize_notes` IS the dict.

NULL-GUARD RULE — when a downstream field may legitimately be null/empty (e.g., a text_prompt returned `{"store_url": null}` because no such link exists on the page), wrap the reference in a `conditional` block BEFORE using it as a URL. Unguarded `goto_url` with `url: "{{ x.field }}"` where `x.field` is None renders as `https://None` and is not recoverable by retrying the same workflow — it is a nullity bug, not a navigation bug.

Examples - Workflow Parameters:

* In URL:
url: "https://example.com/search?q={{ search_term }}"
parameter_keys: [search_term]

* In goals:
navigation_goal: "Search for {{ product_name }} and filter by {{ category }}"
parameter_keys: [product_name, category]

Examples - Block Outputs (no parameter_keys needed):

* Send previous task-block output to webhook (task block → use .output):
blocks:
  - block_type: navigation
    label: get_data
    navigation_goal: "Get top 3 hacker news items"
    ...
  - block_type: http_request
    label: send_webhook
    method: POST
    url: "http://example.com/webhook"
    body:
      data: "{{ get_data.output }}"
    parameter_keys: []  # Empty - block outputs don't need to be declared

* Use extraction output in next block (task block → use .output):
blocks:
  - block_type: extraction
    label: extract_items
    ...
  - block_type: navigation
    label: process_items
    navigation_goal: "Process these items: {{ extract_items.output }}"

* Use text_prompt output in next block (non-task block → reference fields directly, NO .output):
blocks:
  - block_type: text_prompt
    label: summarize_notes
    prompt: "Summarize ..."
    json_schema: {type: object, properties: {summary: {type: string}}}
  - block_type: navigation
    label: use_summary
    navigation_goal: "Act on: {{ summarize_notes.summary }}"    # NOT {{ summarize_notes.output.summary }}

* In schemas (as descriptions):
data_schema:
  type: object
  properties:
    query_result:
      type: string
      description: "Result for query: {{ query }}"

** ERROR HANDLING AND RETRIES **

Error Code Mapping:
Map internal errors to custom error codes for easier handling:

error_code_mapping:
  "ElementNotFound": "ELEMENT_MISSING"
  "TimeoutError": "PAGE_TIMEOUT"
  "NavigationFailed": "NAV_ERROR"

Retry Configuration:
max_retries: 3  # Block will retry up to 3 times on failure

Conditional Continuation:
continue_on_failure: true  # Workflow continues even if block fails

Loop Continuation:
next_loop_on_failure: true  # Skip to next iteration in loops

Completion Criteria:
complete_criterion: "URL contains '/success'"  # Condition for success
terminate_criterion: "Element with text 'Error' exists"  # Condition to stop

** WORKFLOW EXECUTION FLOW **

Sequential Execution:
blocks:
  - block_type: goto_url
    label: step1
    next_block_label: step2
    url: "https://example.com/start"
  - block_type: extraction
    label: step2
    next_block_label: step3
  - block_type: navigation
    label: step3
    next_block_label: null
    navigation_goal: "Complete the final step on the page"
# Executes: step1 → step2 → step3

Explicit Flow Control (Skip blocks):
blocks:
  - block_type: goto_url
    label: login
    next_block_label: extract_data
    url: "https://app.example.com/login"
  - block_type: navigation
    label: handle_error
    next_block_label: null
    navigation_goal: "Handle the error state if it appears"
  - block_type: extraction
    label: extract_data
    next_block_label: null
# Executes: login → extract_data (skips handle_error)

Error Recovery Flow:
blocks:
  - block_type: navigation
    label: primary_task
    next_block_label: verify_result
    continue_on_failure: true
    navigation_goal: "Attempt the primary task on the page"
  - block_type: validation
    label: verify_result
    next_block_label: null

** BEST PRACTICES **

* Naming Conventions:
   - Use descriptive labels: "login_to_portal" not "step1"
   - Use snake_case for labels and parameter keys
   - Use snake_case for data schema field names (e.g., product_name, total_price)
   - Make labels unique and meaningful

* Goal Writing:
   - Be specific: "Click the blue 'Submit' button" vs "Submit the form"
   - Include context: "After clicking Search, wait for results to load"
   - Natural language: Write as you would instruct a human

* Parameter Usage:
   - Always list parameter_keys when using parameters in a block
   - Validate parameter types match usage
   - Provide default values for optional parameters

* Error Handling:
   - Set appropriate max_retries for flaky operations
   - Use complete_criterion for validation
   - Map errors to meaningful codes for debugging

* Data Extraction:
   - Always provide data_schema for structured extraction
   - Use specific extraction goals
   - Handle arrays vs objects appropriately

* Performance:
   - Use disable_cache: true for dynamic content
   - Set max_steps_per_run to prevent infinite loops
   - Use navigation blocks for actions and extraction blocks for data extraction

* Security:
   - Never hardcode credentials in workflows
   - Use credential parameters for sensitive data
   - Use AWS secrets or vault integrations

** COMMON PATTERNS **

Pattern 1: Login → Navigate → Extract
workflow_definition:
  version: 2
  blocks:
    - block_type: login
      label: authenticate
      next_block_label: go_to_reports
      url: "https://app.example.com/login"
      parameter_keys: [my_credentials]
    - block_type: navigation
      label: go_to_reports
      next_block_label: get_report_data
      navigation_goal: "Navigate to Reports section"
    - block_type: extraction
      label: get_report_data
      next_block_label: null
      data_extraction_goal: "Extract all report entries"
      data_schema:
        type: array
        items: {type: object}
  parameters:
    - parameter_type: workflow
      workflow_parameter_type: credential_id
      key: my_credentials
      default_value: "uuid"
    - parameter_type: output
      key: extracted_data

Pattern 2: Search with Dynamic Input
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_and_extract
      next_block_label: null
      url: "https://example.com"
      navigation_goal: "Search for '{{ search_query }}' and extract the first 10 results with titles and URLs"
  parameters:
    - parameter_type: workflow
      key: search_query
      workflow_parameter_type: string

Pattern 3: Multi-Step Form Filling
workflow_definition:
  version: 2
  blocks:
    - block_type: goto_url
      label: open_form
      next_block_label: fill_personal_info
      url: "https://forms.example.com/application"
    - block_type: navigation
      label: fill_personal_info
      next_block_label: fill_address
      navigation_goal: "Fill in name as {{ name }}, email as {{ email }}"
      parameter_keys: [name, email]
    - block_type: navigation
      label: fill_address
      next_block_label: submit
      navigation_goal: "Fill in address fields and click Continue"
      parameter_keys: [address, city, zip]
    - block_type: navigation
      label: submit
      next_block_label: null
      navigation_goal: "Review information and click Submit"
  parameters:
    - parameter_type: workflow
      key: name
      workflow_parameter_type: string
    - parameter_type: workflow
      key: email
      workflow_parameter_type: string
    - parameter_type: workflow
      key: address
      workflow_parameter_type: string
    - parameter_type: workflow
      key: city
      workflow_parameter_type: string
    - parameter_type: workflow
      key: zip
      workflow_parameter_type: string

Pattern 4: Conditional Extraction
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_product
      next_block_label: check_availability
      navigation_goal: "Search for {{ product }}"
    - block_type: extraction
      label: check_availability
      next_block_label: add_to_cart
      data_extraction_goal: "Check if product is in stock"
      data_schema:
        type: object
        properties:
          in_stock: {type: boolean}
    - block_type: navigation
      label: add_to_cart
      next_block_label: null
      navigation_goal: "If product is in stock, add to cart"
  parameters: []

** VALIDATION RULES **

Workflow-Level:
- All block labels must be unique
- Parameters referenced in blocks must be defined
- next_block_label must point to existing block labels or be null
- The last block in execution flow should have next_block_label: null

Block-Level:
- label is required and cannot be empty
- block_type must be a valid type
- For navigation blocks: navigation_goal is required
- For extraction blocks: data_extraction_goal is required
- For action blocks: navigation_goal is required
- For login blocks: parameter_keys should include credentials

Parameter-Level:
- key must be unique across parameters
- key cannot contain whitespace
- parameter_type must be valid
- Referenced keys (like source_parameter_key) must exist

** COMPLETE WORKFLOW EXAMPLE **

title: E-commerce Product Search and Purchase
description: Search for a product, extract details, and add to cart
workflow_definition:
  version: 2
  blocks:
    - block_type: login
      label: login_to_store
      next_block_label: search_and_filter
      url: "https://shop.example.com/login"
      parameter_keys:
        - account_creds
      complete_criterion: "URL contains '/dashboard'"

    - block_type: navigation
      label: search_and_filter
      next_block_label: get_product_info
      url: "https://shop.example.com/search"
      navigation_goal: "Search for {{ product_name }} and filter results by price under ${{ max_price }}"
      parameter_keys:
        - product_name
        - max_price
      max_retries: 2

    - block_type: extraction
      label: get_product_info
      next_block_label: add_to_cart
      data_extraction_goal: "Extract product name, price, rating, and availability"
      data_schema:
        type: object
        properties:
          name: {type: string}
          price: {type: number}
          rating: {type: number}
          available: {type: boolean}

    - block_type: navigation
      label: add_to_cart
      next_block_label: null
      navigation_goal: "Click on the first available product and add it to cart"
      max_retries: 3
  parameters:
    - parameter_type: workflow
      key: product_name
      workflow_parameter_type: string
      description: "Product to search for"
    - parameter_type: workflow
      key: max_price
      workflow_parameter_type: float
      description: "Maximum price willing to pay"
    - parameter_type: workflow
      workflow_parameter_type: credential_id
      key: account_creds
      default_value: "cred_12345"
    - parameter_type: output
      key: product_details
      description: "Extracted product information"

END OF KNOWLEDGE BASE
