Browser Automation

Every Egg tab speaks the Chrome DevTools Protocol natively. There are three ways to drive a tab from outside Egg: raw CDP for clients that already speak it, ~120 stable micro-commands callable over HTTP, or the agent tool catalog the LLM uses. Each is a complete entry point on its own; they share the underlying engine and overlap in implementation.

Chrome DevTools Protocol

Method coverage matches the Chromium version WebView2 ships. Send any method, subscribe to any event.

The methods and events are the standard Chrome DevTools Protocol; Egg does not duplicate the protocol reference. See chromedevtools.github.io/devtools-protocol for the full method and event catalog.

# Send a CDP method to a tab
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/tabs/$TAB/cdp" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"method":"Page.navigate","params":{"url":"https://eggbrowser.com"}}'

# Subscribe to CDP events as Server-Sent Events
curl -N "http://127.0.0.1:$PORT/api/browser/cdp/events" \
  -H "Authorization: Bearer $TOKEN"
# → data: {"tab_id":"...","event":"Page.frameNavigated","params":{...}}
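
The same two calls can be made from any HTTP library. The following is a minimal Python sketch using only the standard library. It mirrors the endpoints and JSON shapes in the curl examples above; the helper names `build_cdp_request` and `parse_sse_events` are hypothetical, and the parser assumes each event arrives on a single `data:` line, as in the sample output.

```python
import json
import urllib.request


def build_cdp_request(port, tab_id, token, method, params=None):
    """Build (but do not send) the POST for one CDP method call."""
    body = json.dumps({"method": method, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/api/browser/tabs/{tab_id}/cdp",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def parse_sse_events(lines):
    """Yield the decoded JSON payload of every `data:` line in an SSE stream.

    Assumption: one JSON object per `data:` line. Blank separator lines and
    SSE comment lines (starting with ':') are skipped.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())
```

To send the request, pass it to `urllib.request.urlopen`. For events, open the `/api/browser/cdp/events` URL with the same `Authorization` header and hand the response object to `parse_sse_events`, which iterates it line by line.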

Alternatively, launch Egg with --remote-debugging-port=9222 and attach Playwright, Puppeteer, or any CDP client to http://127.0.0.1:9222.
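
Clients that need a WebSocket URL rather than an HTTP one usually discover it first. Chromium-based browsers serve discovery metadata at `/json/version` on the debugging port, including a `webSocketDebuggerUrl` field. That Egg exposes this endpoint unchanged is an assumption, not something this page states. The helper names below are hypothetical.

```python
import json


def version_endpoint(port=9222, host="127.0.0.1"):
    """URL of the Chromium discovery endpoint on the debugging port.

    Assumption: Egg serves the standard Chromium /json/version endpoint.
    """
    return f"http://{host}:{port}/json/version"


def debugger_ws_url(version_json):
    """Extract the browser-level WebSocket endpoint from a /json/version body."""
    info = json.loads(version_json)
    url = info.get("webSocketDebuggerUrl")
    if not url:
        raise ValueError("no webSocketDebuggerUrl in /json/version response")
    return url
```

Fetch the body with `urllib.request.urlopen(version_endpoint()).read()` and pass the resulting URL to a CDP client. Some clients, such as Playwright's `connect_over_cdp`, accept the HTTP address directly and perform this lookup themselves.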

Micro-commands

Micro-commands are a stable, named-operation surface for driving a tab. Each is a small verb (click, fill, wait, snapshot, set viewport) that wraps the underlying engine work behind a name that does not change between Chromium versions. The shape will be familiar to anyone who has used Playwright or Puppeteer, but the surface is HTTP-native: every command is one POST, returns JSON, and is callable from any language. There is no SDK to install.

The intended audience is anyone who wants browser automation without writing CDP by hand. Script writers in shell, Python, Node, Go, or Rust call the endpoints with their HTTP library of choice. CI pipelines run them as one-shots against a headless Egg. The agent harness dispatches its tool calls to the same surface, so what you script by hand is what the agent runs internally. Compared to raw CDP, micro-commands give you stability across engine versions, accessibility-first locators (@e refs), wait conditions that match how pages actually load, and a batch endpoint that runs a sequence and stops on first error.

The catalog has ~120 commands across 16 categories. Each is callable as a single POST or composed in a batch.

# Single command
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/cmd" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tab_id":"'"$TAB"'","command":"click","args":{"selector":"@e4"}}'

# Batch (sequential, stop on error)
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/batch" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    {"command":"open","args":{"url":"https://example.com"}},
    {"command":"wait","args":{"for":"load","state":"networkidle"}},
    {"command":"snapshot","args":{}}
  ]'
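
For scripts that issue many commands, it helps to build the payloads in one place. The sketch below reproduces the single-command and batch bodies from the curl examples using the Python standard library. The function names are hypothetical, and the batch body mirrors the example as written (a bare list of steps); how a batch selects its tab is not shown on this page, so the sketch does not guess.

```python
import json
import urllib.request


def command(name, args=None):
    """One micro-command in the JSON shape used by the examples above."""
    return {"command": name, "args": args or {}}


def _post(url, token, payload):
    # Assumption: both endpoints accept JSON with a bearer token,
    # as the single-command curl example does.
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def build_cmd_request(port, token, tab_id, name, args=None):
    """Request for POST /api/browser/cmd (one command against one tab)."""
    payload = {"tab_id": tab_id, **command(name, args)}
    return _post(f"http://127.0.0.1:{port}/api/browser/cmd", token, payload)


def build_batch_request(port, token, steps):
    """Request for POST /api/browser/batch (sequential, stops on first error)."""
    return _post(f"http://127.0.0.1:{port}/api/browser/batch", token, list(steps))


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A batch is then `build_batch_request(port, token, [command("open", {"url": "https://example.com"}), command("wait", {"for": "load", "state": "networkidle"}), command("snapshot")])`, passed to `send`. Arguments are plain dicts rather than keyword arguments because `for` is a reserved word in Python.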

Navigation

Command	Purpose
open <url>	Navigate the tab to a URL.
back / forward	Step through history.
reload	Reload the current page.
close	Close the tab.

Interaction

Command	Purpose
click / dblclick <sel>	Resolve selector, dispatch real mouse events at element coordinates.
fill <sel> <text>	Focus, clear, type.
type <sel> <text>	Type without clearing.
press <key>	Press a key combo, e.g. `Control+Enter`.
hover / focus <sel>	Mouse-hover or programmatic focus.
select <sel> <value>	Select a dropdown option, fire `change`.
check / uncheck <sel>	Toggle a checkbox to a known state.
scroll <dir> [px]	Wheel-scroll the page or an element.
scrollintoview <sel>	Scroll an element into view.
drag <src> <dst>	Mouse down, move sequence, mouse up.

Find & query

Command	Purpose
snapshot	Full accessibility tree with sequential `@e` refs. See below.
find role / text / label / placeholder / alt / title / testid	Find elements by accessibility property. Returns refs.
find first / last / nth <n> <sel>	Position-based selection.
get text / html / value / attr / box / styles	Read element state.
get title / url / count	Read page-level state.
is visible / enabled / checked <sel>	Boolean state checks.

Wait conditions

Command	Purpose
wait <selector>	Until element exists.
wait <ms>	Fixed duration.
wait --text "..."	Until text appears anywhere on the page.
wait --url "..."	Until the URL matches a pattern.
wait --load networkidle	Until the network goes quiet.
wait --fn "expr"	Until a JS expression returns truthy.
wait <sel> --state hidden	Until an element disappears.
wait --download	Until a download begins or completes.
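
To illustrate how the condition-based forms above differ from a fixed `wait <ms>`, here is a client-side sketch of condition polling with a deadline. It is not how Egg implements waits; `wait_until` and its parameters are hypothetical names used only for this illustration.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses.

    Returns the truthy value, or raises TimeoutError. A fixed sleep always
    costs the full duration; a condition wait returns as soon as it holds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

The server-side waits play the same role without the round trips: the condition is evaluated next to the page, and the HTTP call returns once it holds.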

Network

Command	Purpose
network route <url>	Intercept matching requests.
network route <url> --abort	Block matching requests.
network route <url> --body <json>	Mock a response body.
network unroute [url]	Remove an interception rule.
network requests [--filter <pat>]	List captured requests.
network har start / stop [file]	Record HAR 1.2 logs.

Storage & cookies

Command	Purpose
cookies / cookies set / cookies clear	Read, write, and clear browser cookies.
storage local / storage local set / clear	Read, write, and clear `localStorage`.
storage session / storage session set / clear	Read, write, and clear `sessionStorage`.
clipboard read / write / copy / paste	Clipboard via the Web Clipboard API or synthesized key combos.

Emulation & settings

Command	Purpose
set viewport <w> <h> [scale]	Override device metrics.
set device <name>	Apply a preset (iPhone, Pixel, etc.).
set geo <lat> <lng>	Override geolocation.
set offline [on/off]	Toggle offline mode.
set headers <json>	Set extra HTTP headers.
set credentials <user> <pass>	HTTP basic auth via fetch interception.
set media [dark/light]	Override `prefers-color-scheme`.
set useragent <ua>	Override the user agent.

Capture & debug

Command	Purpose
screenshot [--full] [path]	Capture the viewport or the whole page.
pdf <path>	Save the page as a PDF.
eval <js>	Run JavaScript in the page context, return the result.
console / errors	Read accumulated console messages and exceptions.
highlight <sel>	Visual highlight overlay.
inspect	Open DevTools.
trace start / stop, profiler start / stop	CDP tracing and profiling.

State & auth

Command	Purpose
state save / load / list / show / rename / clear / clean	Snapshot and restore cookies + localStorage + sessionStorage + URL.
auth save / login / list / show / delete	Encrypted auth-state vault for switching between logged-in personas.

Tabs & frames

Command	Purpose
tab / tab new [url] / tab <n> / tab close [n]	List, create, switch, close.
window new	Open a new window.
frame <sel> / frame @e<n> / frame main	Enter an iframe by selector or ref, or return to the top frame.

@e refs

Run snapshot to get the page’s accessibility tree with sequential refs (@e1, @e2, …) attached to interactive nodes. Subsequent commands accept refs anywhere a selector is allowed.

[document] Egg: the browser that works for you
  [navigation] Main Menu
    [link @e1] Product
    [link @e2] Developers
    [link @e3] Pricing
  [main]
    [heading @e4] Build with Egg.
    [textbox @e5] Search...
    [button @e6] Submit

Refs are session-scoped: valid until the page navigates or you call snapshot again.
Refs map to backend node IDs server-side, not CSS strings. They survive non-structural CSS changes.
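
A script often needs to pick a ref out of a snapshot before acting on it. The sketch below parses the text layout shown above into a lookup table. It assumes the snapshot arrives as indented text in exactly that `[role @eN] name` form; the actual response encoding is not specified here, and `parse_snapshot` and `find_ref` are hypothetical helpers.

```python
import re

# Matches lines such as "    [link @e1] Product":
# a role, an optional @e ref, then the accessible name.
NODE = re.compile(r"^\s*\[(?P<role>[^\]\s]+)(?:\s+(?P<ref>@e\d+))?\]\s*(?P<name>.*)$")


def parse_snapshot(text):
    """Return {ref: (role, name)} for every node that carries an @e ref."""
    refs = {}
    for line in text.splitlines():
        match = NODE.match(line)
        if match and match.group("ref"):
            refs[match.group("ref")] = (match.group("role"), match.group("name").strip())
    return refs


def find_ref(refs, role, name):
    """Return the first ref whose role and name match exactly, else None."""
    for ref, (node_role, node_name) in refs.items():
        if node_role == role and node_name == name:
            return ref
    return None
```

Applied to the sample tree, `find_ref(refs, "button", "Submit")` yields the ref to pass as the `selector` of a `click`. Because refs are invalidated by navigation or a new snapshot, the table should be rebuilt after either.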

Agent tools

Agent tools are the LLM-facing automation surface. When the agent harness opens a conversation it advertises this small, grouped catalog to the language model rather than the full ~120-verb micro-command list. The grouping mirrors an observation/action loop: separate tools for observing a page, acting on it, navigating, waiting, finding, extracting, screenshotting, and emulating. Tool names and argument shapes are tuned for clarity in tool-call output, so a session is readable when an operator audits what the model did and why.

Agent tools are not a public surface for external automation. Scripts and HTTP clients should call micro-commands directly. The catalog exists so the model sees a curated set: enough to do what it needs, few enough to reason about cleanly, and named in terms of intent rather than mechanism. Each tool dispatches to one or more micro-commands under the hood, so anything the LLM does through this catalog can be reproduced verbatim by hand against the same HTTP endpoints.

Tool	Dispatches to
browser_observe	snapshot, get text/html/value/attr/box/styles, get title, get url, is visible/enabled/checked
browser_act	click, dblclick, fill, type, press, hover, focus, select, check, uncheck, scroll, scrollintoview, drag
browser_navigate	open, back, forward, reload
browser_wait	wait variants
browser_find	find role/text/label/placeholder/alt/title/testid/first/last/nth
browser_tabs	tab, tab new, tab <n>, tab close, window new
browser_extract	get text, get html, reader-mode extraction
browser_screenshot	screenshot, pdf
browser_network	network route/unroute/requests/har
browser_storage	cookies, storage local/session
browser_eval	eval
browser_keyboard	keyboard type/inserttext, keydown, keyup, press
browser_mouse	mouse move/down/up/wheel, drag
browser_emulate	set viewport/device/geo/offline/media/headers/useragent
browser_state	state save/load/list, auth save/login/list
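
When auditing a session, it can be useful to go the other way: from a micro-command back to the tools that may have issued it. The sketch below encodes a few rows of the table above as a plain dictionary and does the reverse lookup. The dictionary is an excerpt transcribed from this page, not an API that Egg exposes, and `tools_for` is a hypothetical helper.

```python
# Excerpt of the dispatch table above (tool -> micro-commands).
AGENT_TOOLS = {
    "browser_act": [
        "click", "dblclick", "fill", "type", "press", "hover", "focus",
        "select", "check", "uncheck", "scroll", "scrollintoview", "drag",
    ],
    "browser_navigate": ["open", "back", "forward", "reload"],
    "browser_screenshot": ["screenshot", "pdf"],
    "browser_eval": ["eval"],
    "browser_keyboard": ["keyboard type", "keyboard inserttext", "keydown", "keyup", "press"],
    "browser_mouse": ["mouse move", "mouse down", "mouse up", "mouse wheel", "drag"],
}


def tools_for(command):
    """Return the sorted names of all tools that dispatch to `command`."""
    return sorted(tool for tool, commands in AGENT_TOOLS.items() if command in commands)
```

Some commands are reachable from more than one tool: per the table, `press` appears under both `browser_act` and `browser_keyboard`, and `drag` under both `browser_act` and `browser_mouse`, so the lookup returns a list rather than a single name.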