Introduction

A Graphex plugin that enables web automation.

Installation

Install the plugin with the command make all. This also sets up the required Playwright tools for automation.

Execution

This repository is not intended for standalone use. It bridges the gap between Python's Playwright web automation tool and the Graphex module.

Using Playwright Nodes

The graphex-webautomation-plugin builds on the Playwright Python package by Microsoft. This open-source tool automates browser interactions with Chromium, Firefox, and WebKit in Python, among other languages. A standout feature is Playwright's code generation tool, which simplifies scripting browser interactions.

Interacting with the Browser

The Create Playwright Browser Context node initiates a browser context, akin to manually opening a new browser window. Multiple pages can be opened within this context.

image

For clarity: manually launching a browser and logging into a site means you won't need to log in again when opening a new tab in that browser. This is due to session retention (usually via cookies). Likewise, Playwright's browser context maintains session data across its pages, so actions like logging in on one page are recognized on others within the same context.
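The session sharing described above can be sketched in plain Playwright. This is a minimal illustration, not the plugin's implementation: the function name and both URL parameters are hypothetical, and browser is assumed to be an already launched Playwright Browser object.

```python
def open_pages_in_shared_context(browser, login_url, dashboard_url):
    """Open two pages in one browser context so they share session data."""
    # One context is comparable to one freshly opened browser window.
    context = browser.new_context()

    # First page (tab): a login performed here stores cookies in the context.
    login_page = context.new_page()
    login_page.goto(login_url)

    # Second page (tab): it reuses the same cookies, so a session
    # established on the first page is visible here as well.
    dashboard_page = context.new_page()
    dashboard_page.goto(dashboard_url)

    return context, login_page, dashboard_page
```

Pages created from a second, separate context would not see these cookies, which is why the context is the unit to reuse when several pages need the same login.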

Crafting and Executing Page Commands

The Execute Playwright Page Script node facilitates the synchronous execution of a series of Playwright commands in Python. It accepts a page commands script, which can access the following local variables:

For example, to download the Ubuntu 22.04.3 desktop iso, use:

page.goto("https://ubuntu.com/download/desktop")
page.get_by_role("button", name="Accept all and visit site").click()
page.get_by_role("link", name="Search Search").click()
page.get_by_placeholder("Search our sites").fill("22.04.3")
page.get_by_placeholder("Search our sites").press("Enter")
page.get_by_role("link", name="Ubuntu 22.04.3 LTS (Jammy Jellyfish)", exact=True).click()
with page.expect_download() as download_info:
    page.get_by_role("link", name="64-bit PC (AMD64) desktop image").click()
download = download_info.value

By default, the plugin waits up to 30 seconds for an element to appear before raising an error, so explicit timeouts are not needed. Adjust this duration using the Element Timeout (ms) option in the Open a Playwright Page node. The plugin handles file downloads automatically and stores their paths in the Download Filepaths output.
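For readers testing page commands outside Graphex, the waiting behaviour can be approximated in plain Playwright. The sketch below assumes the Element Timeout (ms) option behaves like Playwright's page-wide default timeout; how the plugin applies it internally is not confirmed here. The function name and its parameters are hypothetical.

```python
def click_with_timeouts(page, link_name, element_timeout_ms=30_000, click_timeout_ms=None):
    """Set a page-wide default timeout, then click a link by its accessible name."""
    # Applies to subsequent actions on this page that wait for elements.
    page.set_default_timeout(element_timeout_ms)

    link = page.get_by_role("link", name=link_name)
    if click_timeout_ms is None:
        # Falls back to the page-wide default set above.
        link.click()
    else:
        # A per-call timeout overrides the default for this action only.
        link.click(timeout=click_timeout_ms)
```

A per-call timeout is usually only worth adding for a single step that is known to be slow; the page-wide default covers everything else.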

Creating Page Commands

To use Playwright's code generation tool for creating page commands, follow these steps:

  1. Install Playwright: pip install playwright.
  2. Install the browsers it needs: python3 -m playwright install.

Then launch the codegen tool, ignoring HTTPS errors, with:

python3 -m playwright codegen --ignore-https-errors --viewport-size "1920, 1080"

This command launches a Chromium browser alongside Playwright's code generator. Interact with the browser, and it'll log commands for you.

image

For the Execute Playwright Page Script node, copy only the commands that follow the goto call (assuming the URL is already predefined):

image

Then paste the code block into the node's page_commands input.
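The copy step can also be scripted. The helper below is a hypothetical sketch and not part of the plugin: it assumes the codegen output follows the layout of Playwright's synchronous Python target (a run() function, a page.goto(...) call, then a dashed comment before the cleanup lines), which may differ between Playwright versions.

```python
import textwrap


def extract_page_commands(codegen_output: str) -> str:
    """Return the commands between the first page.goto(...) and the cleanup separator."""
    commands = []
    seen_goto = False
    for line in codegen_output.splitlines():
        stripped = line.strip()
        if not seen_goto:
            # Skip imports, browser setup, and the goto call itself.
            seen_goto = stripped.startswith("page.goto(")
            continue
        if stripped.startswith("# ----"):
            # Codegen separates recorded commands from cleanup with a dashed comment.
            break
        commands.append(line)
    # Remove the indentation of the generated run() function, keeping nested blocks.
    return textwrap.dedent("\n".join(commands)).strip()


SAMPLE_CODEGEN_OUTPUT = '''\
from playwright.sync_api import Playwright, sync_playwright


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://ubuntu.com/download/desktop")
    page.get_by_role("button", name="Accept all and visit site").click()
    with page.expect_download() as download_info:
        page.get_by_role("link", name="64-bit PC (AMD64) desktop image").click()
    download = download_info.value

    # ---------------------
    context.close()
    browser.close()
'''

print(extract_page_commands(SAMPLE_CODEGEN_OUTPUT))
```

The printed text is what would go into page_commands. The nested with block keeps its relative indentation because only the common leading whitespace is removed.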

Refining Codegen Commands

While Playwright's Python codegen tool is an excellent starting point for web automation, always review the auto-generated code. Here are points to consider:

  1. Is the selector overly specific?

    Avoid relying on hardcoded values, like specific version numbers. For instance, consider automating the task of navigating to the Express NPM package page and selecting the latest version:

    image

    As of October 2023, Playwright's codegen produces this code:

    page.goto("https://www.npmjs.com/package/express?activeTab=versions")
    page.locator("li").filter(has_text="4.18.217,767,703latest").get_by_label("4.18.2").click()
    

    This code becomes obsolete when the version updates. Here's the HTML structure from the browser's inspect tool for clarity:

    <li>
        <a href="/package/express/v/4.18.2" aria-label="4.18.2">4.18.2</a>
        <div>17,767,703 downloads</div>
        <div class="latest-tag">latest</div>
    </li>
    

    Given this structure, you'd want to target the li with the text "latest" and its child a element:

    page.locator('li', has_text='latest').locator('a').click()
    

    Now the commands keep working when the version is updated.

  2. Does it handle delayed actions?

    Codegen might not always capture delayed actions like downloads. To handle such scenarios, wrap the triggering action in a construct like page.expect_download(). For example, if codegen omitted the 'with' context for a download:

    page.get_by_role("link", name="64-bit PC (AMD64) desktop image").click()
    

    Modify it to:

    with page.expect_download() as download_info:
        page.get_by_role("link", name="64-bit PC (AMD64) desktop image").click()
    download = download_info.value
    

By carefully reviewing and refining the codegen's output, your automation becomes more efficient and adaptable to web content changes.
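When trying out refined commands outside Graphex, the download pattern from point 2 can be wrapped in a small helper. This is a sketch under assumptions: the function name and the target_dir parameter are hypothetical, and inside Graphex the plugin already records download paths itself, so saving manually is only relevant for standalone testing.

```python
from pathlib import Path


def click_and_save_download(page, link_name, target_dir):
    """Click a link that triggers a download and save the file under target_dir."""
    # Start listening before the click so the download event is not missed.
    with page.expect_download() as download_info:
        page.get_by_role("link", name=link_name).click()
    download = download_info.value

    # Keep the filename the server suggested instead of hardcoding one.
    destination = Path(target_dir) / download.suggested_filename
    download.save_as(destination)
    return destination
```

Using the suggested filename follows the same reasoning as point 1: nothing version-specific is written into the script.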

Advanced Playwright Nodes

Playwright offers many features that often go unnoticed. To get the most out of the graphex-webautomation-plugin, it helps to understand the advanced nodes the plugin provides. A deeper explanation of the Playwright framework can be found here.

Locators

The essence of any web automation task lies in the ability to pinpoint and interact with specific elements on a web page. Playwright streamlines this task using 'locators'.

What is a Locator?

A locator in Playwright can be thought of as a guiding beacon that directs your automation script to the desired element(s) on a page. Instead of manually sifting through the webpage's code to find unique identifiers or attributes, you set the criteria and let Playwright handle the element selection.

Using Get By Role

The get_by_role function in Playwright fetches elements based on their ARIA roles, making it particularly useful for targeting specific accessibility roles on web pages.

Examples:
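As a minimal sketch in plain Playwright, the function below uses get_by_role with the name, exact, and level options. The function name and the accessible names ("Search") are hypothetical and stand in for whatever the target page exposes; page is assumed to be an open Playwright page.

```python
def search_site(page, query):
    """Use ARIA roles and accessible names to drive a search form."""
    # A text input exposed with the role "textbox" and the name "Search".
    page.get_by_role("textbox", name="Search").fill(query)

    # exact=True requires the accessible name to match the whole string,
    # case-sensitively, so a button named "Search help" is not selected.
    page.get_by_role("button", name="Search", exact=True).click()

    # Headings can be narrowed by level (1 corresponds to <h1>).
    return page.get_by_role("heading", level=1)
```

Role-based locators tend to survive markup changes better than CSS paths, because they depend on what the element is rather than where it sits in the HTML.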

Filters

In many cases, merely locating an element isn't enough. You might encounter scenarios where you need to fine-tune your selection. This is where filters come into play.

Working with Filters

Building on the locator's capabilities, filters refine the results to better match the desired criteria. Filters are invaluable when:

For instance, if you've used a locator to select all links on a page but only wish to interact with those containing a specific pattern, a regex filter can achieve this.

Example:

Let's say we want to click the link with the text "a text link":

image

We can use the following graph to find all links and filter to this exact name via regex:

image

In the graph above, the locator fetches all anchor (a) tags, and the filter narrows the selection to only those whose text exactly matches "a text link".
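The same locator and regex filter can be written as page commands. This is a sketch, not the code the graph generates: both function names are hypothetical, and the anchored pattern assumes the link text has no surrounding whitespace.

```python
import re


def exact_text_pattern(text):
    """Build a regex that matches the given text exactly, with nothing before or after."""
    # re.escape keeps characters such as "." or "(" from acting as regex syntax.
    return re.compile(rf"^{re.escape(text)}$")


def click_link_with_exact_text(page, text):
    """Locate all anchor tags, keep those whose text matches exactly, and click."""
    page.locator("a").filter(has_text=exact_text_pattern(text)).click()
```

Passing a plain string to has_text would match any link containing that substring, so "a text link too" would also qualify; anchoring the pattern with ^ and $ is what makes the match exact.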

Actions

Once you've homed in on the target element(s) using locators and filters, actions tell your automation script what to do with them.

Common Actions

By chaining locators, filters, and actions, you can construct powerful automation sequences that navigate, interact with, and even adapt to the dynamic content of modern webpages. This trifecta forms the bedrock of advanced web automation with Playwright in the graphex-webautomation-plugin.
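A chain of locator, filter, and actions might look like the following in plain Playwright. Everything page-specific here is hypothetical (the function name, the list of plans, and the "Select", "Email", "Accept terms", and "Subscribe" names); only the Playwright calls themselves are real.

```python
def subscribe_to_plan(page, plan_name, email):
    """Pick one plan from a list, fill in a form, and submit it."""
    # Locator: every list item on the page.
    # Filter: keep only the item whose text contains the plan name.
    plan_item = page.get_by_role("listitem").filter(has_text=plan_name)

    # Actions: click a button found relative to the filtered item,
    # then fill a field, tick a checkbox, and submit.
    plan_item.get_by_role("button", name="Select").click()
    page.get_by_label("Email").fill(email)
    page.get_by_role("checkbox", name="Accept terms").check()
    page.get_by_role("button", name="Subscribe").click()
```

Searching for the button relative to plan_item, instead of on the whole page, keeps the script correct when several list items contain a button with the same name.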

Changelog

The changelog for this plugin can be found on this page.