Browser Automation

Every Egg tab speaks the Chrome DevTools Protocol natively. There are three ways to drive a tab: raw CDP for clients that already speak it, ~120 stable micro-commands callable over HTTP, and the agent tool catalog the LLM uses. The first two are for external clients; the agent tools are the model's own surface. Each is a complete entry point on its own, and all three share the same underlying engine (the agent tools dispatch to micro-commands).

Chrome DevTools Protocol

Method coverage matches the Chromium version WebView2 ships. Send any method, subscribe to any event.

The methods and events are the standard Chrome DevTools Protocol; Egg does not duplicate the protocol reference. See chromedevtools.github.io/devtools-protocol for the full method and event catalog.

# Send a CDP method to a tab
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/tabs/$TAB/cdp" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"method":"Page.navigate","params":{"url":"https://eggbrowser.com"}}'

# Subscribe to CDP events as Server-Sent Events
curl -N "http://127.0.0.1:$PORT/api/browser/cdp/events" \
  -H "Authorization: Bearer $TOKEN"
# → data: {"tab_id":"...","event":"Page.frameNavigated","params":{...}}
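Here are the same two calls in Python, using only the standard library. This sketch makes two assumptions. First, it uses the endpoint paths and JSON body shown in the curl examples. Second, it assumes each event arrives as a single `data:` line carrying one JSON object, as in the sample output. `cdp_request` and `parse_sse` are illustrative names, not part of any Egg SDK.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional, Union


def cdp_request(port: int, token: str, tab: str, method: str,
                params: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) a POST to the per-tab CDP endpoint."""
    body = json.dumps({"method": method, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/api/browser/tabs/{tab}/cdp",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_sse(lines: Iterable[Union[bytes, str]]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE stream.

    Assumes one JSON object per `data:` line, as in the sample output;
    blank lines, `:` comments, and other SSE fields are skipped.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


# Usage against a running Egg (PORT, TOKEN, TAB as in the curl examples):
#   req = cdp_request(PORT, TOKEN, TAB, "Page.navigate", {"url": "https://eggbrowser.com"})
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

For the event stream, open `/api/browser/cdp/events` with the same bearer header and pass the response object to `parse_sse`. An HTTP response iterates line by line, so events are yielded as they arrive.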

Alternatively, launch Egg with --remote-debugging-port=9222 and attach Playwright, Puppeteer, or any other CDP client to http://127.0.0.1:9222.
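CDP clients attached this way usually start from Chromium's standard discovery document at `/json/version`. That document lists a browser-level `webSocketDebuggerUrl`. The minimal sketch below assumes Egg exposes that standard endpoint on the debugging port, which this page does not state explicitly. `browser_ws_url` and `fetch_version` are hypothetical helpers.

```python
import json
import urllib.request


def browser_ws_url(version_json: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version body."""
    return json.loads(version_json)["webSocketDebuggerUrl"]


def fetch_version(port: int = 9222, timeout: float = 5.0) -> str:
    """GET the standard Chromium discovery document from the debugging port."""
    url = f"http://127.0.0.1:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage against a running instance:
#   print(browser_ws_url(fetch_version(9222)))
```

Higher-level clients do this discovery themselves. Playwright's `chromium.connect_over_cdp("http://127.0.0.1:9222")` and Puppeteer's `puppeteer.connect({ browserURL: "http://127.0.0.1:9222" })` both accept the HTTP URL directly.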

Micro-commands

Micro-commands are a stable, named-operation surface for driving a tab. Each is a small verb (click, fill, wait, snapshot, set viewport) that wraps the underlying engine work behind a name that does not change between Chromium versions. The shape will be familiar to anyone who has used Playwright or Puppeteer, but the surface is HTTP-native: every command is one POST, returns JSON, and is callable from any language. There is no SDK to install.

The intended audience is anyone who wants browser automation without writing CDP by hand. Script writers in shell, Python, Node, Go, or Rust call the endpoints with their HTTP library of choice. CI pipelines run them as one-shots against a headless Egg. The agent harness dispatches its tool calls to the same surface, so what you script by hand is what the agent runs internally. Compared to raw CDP, micro-commands give you stability across engine versions, accessibility-first locators (@e refs), wait conditions that match how pages actually load, and a batch endpoint that runs a sequence and stops on first error.

The catalog has ~120 commands across 16 categories. Each is callable as a single POST or composed in a batch.

# Single command
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/cmd" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tab_id":"'"$TAB"'","command":"click","args":{"selector":"@e4"}}'

# Batch (sequential, stop on error)
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/batch" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    {"command":"open","args":{"url":"https://example.com"}},
    {"command":"wait","args":{"for":"load","state":"networkidle"}},
    {"command":"snapshot","args":{}}
  ]'
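Here are the two calls above, wrapped in a small Python class that uses only the standard library. This is a sketch that assumes the endpoint paths and body shapes from the curl examples. `EggClient` and its method names are hypothetical. The response format is not specified on this page, so `send` simply returns the decoded JSON.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


class EggClient:
    """Illustrative micro-command client; not an official Egg SDK."""

    def __init__(self, port: int, token: str) -> None:
        self.base = f"http://127.0.0.1:{port}/api/browser"
        self.token = token

    def _post(self, path: str, payload: Any) -> urllib.request.Request:
        # Every micro-command call is a JSON POST carrying the bearer token.
        return urllib.request.Request(
            self.base + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Authorization": f"Bearer {self.token}",
                     "Content-Type": "application/json"},
            method="POST",
        )

    def cmd(self, tab_id: str, command: str,
            args: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
        """One named command, same body shape as the single-command curl."""
        return self._post("/cmd", {"tab_id": tab_id, "command": command, "args": args or {}})

    def batch(self, steps: List[Dict[str, Any]]) -> urllib.request.Request:
        """A list of {"command", "args"} steps, passed through unchanged."""
        return self._post("/batch", steps)

    @staticmethod
    def send(req: urllib.request.Request) -> Any:
        """Send a prepared request and decode the JSON response."""
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
```

Building a request separately from sending it lets a script log or dry-run exactly what it is about to POST. For example, `EggClient.send(client.cmd(tab, "click", {"selector": "@e4"}))` mirrors the single-command curl.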

Navigation

Command         Purpose
open <url>      Navigate the tab to a URL.
back / forward  Step through history.
reload          Reload the current page.
close           Close the tab.

Interaction

Command                 Purpose
click / dblclick <sel>  Resolve the selector and dispatch real mouse events at the element’s coordinates.
fill <sel> <text>       Focus, clear, and type.
type <sel> <text>       Type without clearing.
press <key>             Press a key combo, e.g. Control+Enter.
hover / focus <sel>     Mouse-hover or programmatically focus an element.
select <sel> <value>    Select a dropdown option and fire change.
check / uncheck <sel>   Set a checkbox to a known checked or unchecked state.
scroll <dir> [px]       Wheel-scroll the page or an element.
scrollintoview <sel>    Scroll an element into view.
drag <src> <dst>        Mouse down on the source, move in steps, mouse up on the destination.
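The click row says the command dispatches real mouse events at element coordinates. The minimal sketch below shows that general technique in raw CDP, whose payloads could be sent through the per-tab /cdp endpoint. It takes the element's content quad from `DOM.getBoxModel`, aims at its center, and dispatches press/release pairs with `Input.dispatchMouseEvent`. This is not necessarily how Egg implements click. It assumes the element is already scrolled into view and not covered by another element. `quad_center` and `click_events` are illustrative names.

```python
from typing import Dict, List, Sequence, Tuple


def quad_center(quad: Sequence[float]) -> Tuple[float, float]:
    """Center of a CDP quad: [x1, y1, x2, y2, x3, y3, x4, y4]."""
    if len(quad) != 8:
        raise ValueError("a quad has exactly 8 numbers")
    xs, ys = quad[0::2], quad[1::2]
    return sum(xs) / 4, sum(ys) / 4


def click_events(x: float, y: float, clicks: int = 1) -> List[Dict]:
    """Input.dispatchMouseEvent payloads for a left click (clicks=2 for dblclick)."""
    events = [{"method": "Input.dispatchMouseEvent",
               "params": {"type": "mouseMoved", "x": x, "y": y}}]
    # Each press/release pair carries an increasing clickCount (1, then 2, ...).
    for count in range(1, clicks + 1):
        for kind in ("mousePressed", "mouseReleased"):
            events.append({"method": "Input.dispatchMouseEvent",
                           "params": {"type": kind, "x": x, "y": y,
                                      "button": "left", "clickCount": count}})
    return events
```

Given a `DOM.getBoxModel` result `box`, `click_events(*quad_center(box["model"]["content"]))` yields the sequence to send in order.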

Find & query

Command                                 Purpose
snapshot                                Full accessibility tree with sequential @e refs. See below.
find role / text / label / placeholder / alt / title / testid
                                        Find elements by accessibility property. Returns refs.
find first / last / nth <n> <sel>       Position-based selection.
get text / html / value / attr / box / styles
                                        Read element state.
get title / url / count                 Read page-level state.
is visible / enabled / checked <sel>    Boolean state checks.

Wait conditions

Command                    Purpose
wait <selector>            Until element exists.
wait <ms>                  Fixed duration.
wait --text "..."          Until text appears anywhere on the page.
wait --url "..."           Until the URL matches a pattern.
wait --load networkidle    Until the network goes quiet.
wait --fn "expr"           Until a JS expression returns truthy.
wait <sel> --state hidden  Until an element disappears.
wait --download            Until a download begins or completes.
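`wait --load networkidle` is easiest to reason about as counting in-flight requests. The sketch below derives an idle signal from standard CDP Network events; the Network domain must be enabled to receive them. The 500 ms quiet window is an assumption borrowed from Playwright's definition of networkidle, since Egg's exact threshold is not specified here. `NetworkIdleTracker` is a hypothetical name.

```python
from typing import Optional, Set


class NetworkIdleTracker:
    """Derive a 'network idle' signal from CDP Network.* events.

    Assumption: idle means no request in flight for `quiet_ms` milliseconds
    (500 ms mirrors Playwright's networkidle; Egg's threshold may differ).
    """

    def __init__(self, quiet_ms: float = 500.0) -> None:
        self.quiet_ms = quiet_ms
        self.inflight: Set[str] = set()
        self.quiet_since: Optional[float] = 0.0  # nothing in flight yet

    def on_event(self, method: str, params: dict, now_ms: float) -> None:
        request_id = params.get("requestId")
        if method == "Network.requestWillBeSent":
            # Redirects reuse the same requestId, so a set counts them once.
            self.inflight.add(request_id)
            self.quiet_since = None
        elif method in ("Network.loadingFinished", "Network.loadingFailed"):
            self.inflight.discard(request_id)
            if not self.inflight:
                self.quiet_since = now_ms

    def is_idle(self, now_ms: float) -> bool:
        return self.quiet_since is not None and now_ms - self.quiet_since >= self.quiet_ms
```

Any new request resets the window, which is why a single long-polling or streaming connection can keep a page from ever reaching networkidle.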

Network

Command                            Purpose
network route <url>                Intercept matching requests.
network route <url> --abort        Block matching requests.
network route <url> --body <json>  Mock a response body.
network unroute [url]              Remove an interception rule.
network requests [--filter <pat>]  List captured requests.
network har start / stop [file]    Record HAR 1.2 logs.
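A file written by `network har stop` follows the HAR 1.2 format, so any JSON reader can post-process it. The small sketch below lists each entry's method, status, and URL from the standard `log.entries[].request` and `.response` fields, then picks out HTTP errors. `har_summary` and `failed_requests` are illustrative helpers. Any Egg-specific extra fields are ignored.

```python
import json
from typing import List, Tuple


def har_summary(har_text: str) -> List[Tuple[str, int, str]]:
    """Return (method, status, url) for each entry in a HAR 1.2 log."""
    log = json.loads(har_text)["log"]
    return [
        (e["request"]["method"], e["response"]["status"], e["request"]["url"])
        for e in log["entries"]
    ]


def failed_requests(har_text: str) -> List[str]:
    """URLs whose response carries an HTTP error status (400 or above)."""
    return [url for _, status, url in har_summary(har_text) if status >= 400]
```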

Storage & cookies

Command                                        Purpose
cookies / cookies set / cookies clear          Read, write, and clear browser cookies.
storage local / storage local set / clear      Read, write, and clear localStorage.
storage session / storage session set / clear  Read, write, and clear sessionStorage.
clipboard read / write / copy / paste          Access the clipboard via the Web Clipboard API or synthesized key combos.

Emulation & settings

Command                        Purpose
set viewport <w> <h> [scale]   Override device metrics.
set device <name>              Apply a preset (iPhone, Pixel, etc.).
set geo <lat> <lng>            Override geolocation.
set offline [on/off]           Toggle offline mode.
set headers <json>             Set extra HTTP headers.
set credentials <user> <pass>  HTTP basic auth via fetch interception.
set media [dark/light]         Override prefers-color-scheme.
set useragent <ua>             Override the user agent.
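For comparison with raw CDP, below are standard DevTools Protocol methods that apply the same kinds of overrides as several `set` commands. The mapping is an assumption for illustration, because the table does not say which CDP calls Egg issues. Each function returns a `{"method", "params"}` body in the shape the per-tab /cdp endpoint accepts. The helper names are hypothetical.

```python
from typing import Any, Dict

Payload = Dict[str, Any]


def viewport(width: int, height: int, scale: float = 1.0, mobile: bool = False) -> Payload:
    # All four params are required by Emulation.setDeviceMetricsOverride.
    return {"method": "Emulation.setDeviceMetricsOverride",
            "params": {"width": width, "height": height,
                       "deviceScaleFactor": scale, "mobile": mobile}}


def geo(lat: float, lng: float, accuracy: float = 100.0) -> Payload:
    # The page may also need the geolocation permission to read the override.
    return {"method": "Emulation.setGeolocationOverride",
            "params": {"latitude": lat, "longitude": lng, "accuracy": accuracy}}


def color_scheme(scheme: str) -> Payload:
    if scheme not in ("dark", "light"):
        raise ValueError("scheme must be 'dark' or 'light'")
    return {"method": "Emulation.setEmulatedMedia",
            "params": {"features": [{"name": "prefers-color-scheme", "value": scheme}]}}


def offline(on: bool) -> Payload:
    # Throughput of -1 disables throttling, so only the offline flag changes.
    return {"method": "Network.emulateNetworkConditions",
            "params": {"offline": on, "latency": 0,
                       "downloadThroughput": -1, "uploadThroughput": -1}}


def user_agent(ua: str) -> Payload:
    return {"method": "Emulation.setUserAgentOverride", "params": {"userAgent": ua}}
```

The micro-commands are the stable option. The raw methods are useful when you need a parameter a `set` command does not expose.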

Capture & debug

Command                     Purpose
screenshot [--full] [path]  Capture the viewport or the whole page.
pdf <path>                  Save the page as a PDF.
eval <js>                   Run JavaScript in the page context, return the result.
console / errors            Read accumulated console messages and exceptions.
highlight <sel>             Visual highlight overlay.
inspect                     Open DevTools.
trace start / stop, profiler start / stop
                            CDP tracing and profiling.
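Over raw CDP, the counterpart of `screenshot` is `Page.captureScreenshot`, which returns the image as base64 in its `data` field. The sketch below builds that request body and decodes the result to PNG bytes. This page does not document how Egg's /cdp HTTP response wraps the CDP result. `decode_screenshot` therefore accepts either a bare result or one nested under a `result` key, and both shapes are assumptions. The function names are hypothetical.

```python
import base64
import json
from typing import Any, Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def screenshot_payload() -> Dict[str, Any]:
    """Request body for a PNG capture of the current viewport."""
    return {"method": "Page.captureScreenshot", "params": {"format": "png"}}


def decode_screenshot(response_body: str) -> bytes:
    """Decode a Page.captureScreenshot result into raw PNG bytes."""
    obj = json.loads(response_body)
    result = obj.get("result", obj)  # assumption: bare or wrapped under "result"
    png = base64.b64decode(result["data"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("response is not a PNG image")
    return png
```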

State & auth

Command                                   Purpose
state save / load / list / show / rename / clear / clean
                                          Snapshot and restore cookies + localStorage + sessionStorage + URL.
auth save / login / list / show / delete  Encrypted auth-state vault for switching between logged-in personas.

Tabs & frames

Command                                 Purpose
tab / tab new [url] / tab <n> / tab close [n]
                                        List, create, switch, close.
window new                              Open a new window.
frame <sel> / frame @e<n> / frame main  Enter an iframe, return to the top frame.

@e refs

Run snapshot to get the page’s accessibility tree with sequential refs (@e1, @e2, …) attached to interactive nodes. Subsequent commands accept refs anywhere a selector is allowed.

[document] Egg: the browser that works for you
  [navigation] Main Menu
    [link @e1] Product
    [link @e2] Developers
    [link @e3] Pricing
  [main]
    [heading @e4] Build with Egg.
    [textbox @e5] Search...
    [button @e6] Submit

Agent tools

Agent tools are the LLM-facing automation surface. When the agent harness opens a conversation it advertises this small, grouped catalog to the language model rather than the full ~120-verb micro-command list. The grouping mirrors an observation/action loop: separate tools for observing a page, acting on it, navigating, waiting, finding, extracting, screenshotting, and emulating. Tool names and argument shapes are tuned for clarity in tool-call output, so a session is readable when an operator audits what the model did and why.

Agent tools are not a public surface for external automation. Scripts and HTTP clients should call micro-commands directly. The catalog exists so the model sees a curated set: enough to do what it needs, few enough to reason about cleanly, and named in terms of intent rather than mechanism. Each tool dispatches to one or more micro-commands under the hood, so anything the LLM does through this catalog can be reproduced verbatim by hand against the same HTTP endpoints.

Tool                Dispatches to
browser_observe     snapshot, get text/html/value/attr/box/styles, get title, get url, is visible/enabled/checked
browser_act         click, dblclick, fill, type, press, hover, focus, select, check, uncheck, scroll, scrollintoview, drag
browser_navigate    open, back, forward, reload
browser_wait        wait variants
browser_find        find role/text/label/placeholder/alt/title/testid/first/last/nth
browser_tabs        tab, tab new/close/switch, window new
browser_extract     get text, get html, reader-mode extraction
browser_screenshot  screenshot, pdf
browser_network     network route/unroute/requests/har
browser_storage     cookies, storage local/session
browser_eval        eval
browser_keyboard    keyboard type/inserttext, keydown, keyup, press
browser_mouse       mouse move/down/up/wheel, drag
browser_emulate     set viewport/device/geo/offline/media/headers/useragent
browser_state       state save/load/list, auth save/login/list
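When auditing a session, the reverse question is often the useful one: which tool could have issued a given micro-command? The sketch below transcribes the table into a lookup. Entries are micro-command names only. "reader-mode extraction" has no micro-command name on this page and is left out, and "tab <n>" stands in for switch. `DISPATCH` and `tools_for` are illustrative names.

```python
from typing import Dict, List

# Transcribed from the "Dispatches to" column of the table above.
DISPATCH: Dict[str, List[str]] = {
    "browser_observe": ["snapshot", "get text", "get html", "get value", "get attr",
                        "get box", "get styles", "get title", "get url",
                        "is visible", "is enabled", "is checked"],
    "browser_act": ["click", "dblclick", "fill", "type", "press", "hover", "focus",
                    "select", "check", "uncheck", "scroll", "scrollintoview", "drag"],
    "browser_navigate": ["open", "back", "forward", "reload"],
    "browser_wait": ["wait"],
    "browser_find": ["find role", "find text", "find label", "find placeholder",
                     "find alt", "find title", "find testid",
                     "find first", "find last", "find nth"],
    "browser_tabs": ["tab", "tab new", "tab close", "tab <n>", "window new"],
    "browser_extract": ["get text", "get html"],
    "browser_screenshot": ["screenshot", "pdf"],
    "browser_network": ["network route", "network unroute", "network requests", "network har"],
    "browser_storage": ["cookies", "storage local", "storage session"],
    "browser_eval": ["eval"],
    "browser_keyboard": ["keyboard type", "keyboard inserttext", "keydown", "keyup", "press"],
    "browser_mouse": ["mouse move", "mouse down", "mouse up", "mouse wheel", "drag"],
    "browser_emulate": ["set viewport", "set device", "set geo", "set offline",
                        "set media", "set headers", "set useragent"],
    "browser_state": ["state save", "state load", "state list",
                      "auth save", "auth login", "auth list"],
}


def tools_for(command: str) -> List[str]:
    """Agent tools whose dispatch list includes `command` (empty if none)."""
    return [tool for tool, commands in DISPATCH.items() if command in commands]
```

Some commands, such as `drag` and `press`, are reachable through two tools. Going by the table alone, commands like `set credentials` map to no tool, so they would be available to scripts but not to the model.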