What should the developer API look like?

How a developer writes code to use the tool. This shapes everything about the library design.

A

Primitive actions — low-level control

Developers call individual actions directly. The LLM decides what to do; the library executes it precisely.

browser = AgentBrowser()
page = await browser.open("https://example.com")

# LLM gets a compact page summary
state = await page.snapshot()

# LLM decides what to do, calls these
await page.click("Submit button")
await page.type("Email field", "user@email.com")
await page.extract({ "price": "str", "title": "str" })
Pros
Maximum control
Works with any agent loop
Cons
Developer writes more glue code
B

High-level task API — the library handles the loop

Developer gives a goal in plain English. The library runs its own reasoning loop and returns the result.

browser = AgentBrowser(llm=my_llm)

result = await browser.run(
  "Go to amazon.com and find "
  "the cheapest blue sneakers under $80"
)

print(result.data) # structured result
print(result.trace) # full action trace
Pros
Dead simple to use
5-line integration
Cons
Less control
Competes with browser-use directly
C

Both: primitives + high-level on top (recommended)

Low-level primitives for developers who want control. A high-level run() built on top of those same primitives for developers who want simplicity. Same library, two entry points.

# Simple: let it handle everything
result = await browser.run("Find cheapest sneakers")

# Or: full control for your own agent loop
state = await page.snapshot() # compact
action = my_llm.decide(state)
result = await page.execute(action)
Pros
Serves both audiences
Primitives = differentiation
High-level = easy adoption
Cons
Slightly more API surface
(manageable)
My recommendation: C. The primitives are what make this tool different from everything else — they're token-efficient, structured, and safe. The high-level run() is just glue on top. Build both, lead with the primitives in the docs.