Browser Automation
Every Egg tab speaks the Chrome DevTools Protocol natively. There are three ways to drive a tab: raw CDP for clients that already speak it, ~120 stable micro-commands callable over HTTP, and the agent tool catalog the LLM uses. Raw CDP and micro-commands are the entry points for external clients; agent tools are reserved for the model. All three run on the same underlying engine.
Chrome DevTools Protocol
Method coverage matches the Chromium version that WebView2 ships, so you can send any method and subscribe to any event that version supports.
The methods and events are standard Chrome DevTools Protocol, and Egg does not duplicate the protocol reference. See chromedevtools.github.io/devtools-protocol for the full method and event catalog.
```shell
# Send a CDP method to a tab
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/tabs/$TAB/cdp" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"method":"Page.navigate","params":{"url":"https://eggbrowser.com"}}'

# Subscribe to CDP events as Server-Sent Events
curl -N "http://127.0.0.1:$PORT/api/browser/cdp/events" \
  -H "Authorization: Bearer $TOKEN"
# → data: {"tab_id":"...","event":"Page.frameNavigated","params":{...}}
```
Alternatively, launch Egg with --remote-debugging-port=9222 and attach Playwright, Puppeteer, or any other CDP client to http://127.0.0.1:9222.
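The two HTTP endpoints above are easy to wrap for repeated use. The following is a minimal bash sketch, not part of Egg: cdp_body, egg_cdp, and cdp_events_for are hypothetical helper names, PORT and TOKEN are assumed to be set as in the curl examples, and the event filter assumes each event arrives on a single data: line shaped like the sample output.

```shell
# Hypothetical helpers (not shipped with Egg) around the two CDP endpoints.
# Assumes PORT and TOKEN are set as in the curl examples above.

# cdp_body METHOD [PARAMS_JSON]: print the request body for one CDP call.
# PARAMS_JSON is inserted verbatim, so it must already be valid JSON.
cdp_body() {
  local method=$1 params=${2:-}
  [ -n "$params" ] || params='{}'
  printf '{"method":"%s","params":%s}' "$method" "$params"
}

# egg_cdp TAB METHOD [PARAMS_JSON]: send one CDP method to a tab.
egg_cdp() {
  curl -s -X POST "http://127.0.0.1:$PORT/api/browser/tabs/$1/cdp" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(cdp_body "$2" "${3:-}")"
}

# cdp_events_for EVENT: read the SSE stream on stdin and print the JSON
# payload of every "data:" line whose "event" field is exactly EVENT.
# Assumes one event per "data:" line, as in the sample output above.
cdp_events_for() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}  # tolerate CRLF line endings
    case $line in
      "data: "*"\"event\":\"$1\""*) printf '%s\n' "${line#data: }" ;;
    esac
  done
}

# Usage:
#   egg_cdp "$TAB" Page.navigate '{"url":"https://eggbrowser.com"}'
#   curl -sN "http://127.0.0.1:$PORT/api/browser/cdp/events" \
#     -H "Authorization: Bearer $TOKEN" | cdp_events_for Page.frameNavigated
```

A read loop is used for the filter instead of piping through sed or grep so that matching events are printed as soon as they arrive, without output buffering delaying them on a live stream.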
Micro-commands
Micro-commands are a stable, named-operation surface for driving a tab. Each is a small verb (click, fill, wait, snapshot, set viewport) that wraps the underlying engine work behind a name that does not change between Chromium versions. The shape will be familiar to anyone who has used Playwright or Puppeteer, but the surface is HTTP-native: every command is one POST, returns JSON, and is callable from any language. There is no SDK to install.
The intended audience is anyone who wants browser automation without writing CDP by hand. Script writers in shell, Python, Node, Go, or Rust call the endpoints with their HTTP library of choice. CI pipelines run them as one-shot calls against a headless Egg instance. The agent harness dispatches its tool calls to the same surface, so what you script by hand is what the agent runs internally.
Compared to raw CDP, micro-commands give you stability across engine versions, accessibility-first locators (@e refs), wait conditions that match how pages actually load, and a batch endpoint that runs a sequence of commands and stops at the first error.
The catalog has ~120 commands across 16 categories. Each is callable as a single POST or composed in a batch.
```shell
# Single command
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/cmd" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tab_id":"'"$TAB"'","command":"click","args":{"selector":"@e4"}}'

# Batch (sequential, stops at the first error)
curl -s -X POST "http://127.0.0.1:$PORT/api/browser/batch" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    {"command":"open","args":{"url":"https://example.com"}},
    {"command":"wait","args":{"for":"load","state":"networkidle"}},
    {"command":"snapshot","args":{}}
  ]'
```
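For scripts that issue many commands, the request bodies can be built by small functions instead of hand-quoted JSON. The following is a minimal bash sketch: cmd_json, cmd_body, batch_json, egg_cmd, and egg_batch are hypothetical helpers, not part of Egg, and PORT and TOKEN are assumed to be set. Argument objects are passed through unescaped, so they must already be valid JSON. The batch call sends Content-Type: application/json, matching the single-command example.

```shell
# Hypothetical helpers (not shipped with Egg) for the micro-command endpoints.
# Assumes PORT and TOKEN are set. ARGS_JSON is inserted verbatim and must
# already be valid JSON; nothing is escaped.

# cmd_json COMMAND [ARGS_JSON]: one {"command":...,"args":...} object.
cmd_json() {
  local args=${2:-}
  [ -n "$args" ] || args='{}'
  printf '{"command":"%s","args":%s}' "$1" "$args"
}

# cmd_body TAB COMMAND [ARGS_JSON]: request body for /api/browser/cmd.
cmd_body() {
  local args=${3:-}
  [ -n "$args" ] || args='{}'
  printf '{"tab_id":"%s","command":"%s","args":%s}' "$1" "$2" "$args"
}

# batch_json OBJECT...: join command objects into a JSON array.
batch_json() {
  local IFS=,
  printf '[%s]' "$*"
}

# egg_cmd TAB COMMAND [ARGS_JSON]: run a single command.
egg_cmd() {
  curl -s -X POST "http://127.0.0.1:$PORT/api/browser/cmd" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(cmd_body "$1" "$2" "${3:-}")"
}

# egg_batch OBJECT...: run commands in order; the server stops at the first error.
egg_batch() {
  curl -s -X POST "http://127.0.0.1:$PORT/api/browser/batch" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(batch_json "$@")"
}

# Usage:
#   egg_cmd "$TAB" click '{"selector":"@e4"}'
#   egg_batch "$(cmd_json open '{"url":"https://example.com"}')" \
#             "$(cmd_json wait '{"for":"load","state":"networkidle"}')" \
#             "$(cmd_json snapshot)"
```

Keeping body construction in one place also avoids splicing shell variables such as the tab ID into single-quoted JSON by hand.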
Navigation
| Command | Purpose |
| --- | --- |
| open <url> | Navigate the tab to a URL. |
| back / forward | Step through history. |
| reload | Reload the current page. |
| close | Close the tab. |
Interaction
| Command | Purpose |
| --- | --- |
| click / dblclick <sel> | Resolve selector, dispatch real mouse events at element coordinates. |
| fill <sel> <text> | Focus, clear, type. |
| type <sel> <text> | Type without clearing. |
| press <key> | Press a key combo, e.g. Control+Enter. |
| hover / focus <sel> | Mouse-hover or programmatic focus. |
| select <sel> <value> | Select a dropdown option, fire change. |
| check / uncheck <sel> | Toggle a checkbox to a known state. |
| scroll <dir> [px] | Wheel-scroll the page or an element. |
| scrollintoview <sel> | Scroll an element into view. |
| drag <src> <dst> | Mouse down, move sequence, mouse up. |
Find & query
| Command | Purpose |
| --- | --- |
| snapshot | Full accessibility tree with sequential @e refs. See below. |
| find role / text / label / placeholder / alt / title / testid | Find elements by accessibility property. Returns refs. |
| find first / last / nth <n> <sel> | Position-based selection. |
| get text / html / value / attr / box / styles | Read element state. |
| get title / url / count | Read page-level state. |
| is visible / enabled / checked <sel> | Boolean state checks. |
Wait conditions
| Command | Purpose |
| --- | --- |
| wait <selector> | Until element exists. |
| wait <ms> | Fixed duration. |
| wait --text "..." | Until text appears anywhere on the page. |
| wait --url "..." | Until the URL matches a pattern. |
| wait --load networkidle | Until the network goes quiet. |
| wait --fn "expr" | Until a JS expression returns truthy. |
| wait <sel> --state hidden | Until an element disappears. |
| wait --download | Until a download begins or completes. |
Network
| Command | Purpose |
| --- | --- |
| network route <url> | Intercept matching requests. |
| network route <url> --abort | Block matching requests. |
| network route <url> --body <json> | Mock a response body. |
| network unroute [url] | Remove an interception rule. |
| network requests [--filter <pat>] | List captured requests. |
| network har start / stop [file] | Record HAR 1.2 logs. |
Storage & cookies
| Command | Purpose |
| --- | --- |
| cookies / cookies set / cookies clear | Read, write, and clear browser cookies. |
| storage local / storage local set / clear | Read, write, and clear localStorage. |
| storage session / storage session set / clear | Read, write, and clear sessionStorage. |
| clipboard read / write / copy / paste | Clipboard via the Web Clipboard API or synthesized key combos. |
Emulation & settings
| Command | Purpose |
| --- | --- |
| set viewport <w> <h> [scale] | Override device metrics. |
| set device <name> | Apply a preset (iPhone, Pixel, etc.). |
| set geo <lat> <lng> | Override geolocation. |
| set offline [on/off] | Toggle offline mode. |
| set headers <json> | Set extra HTTP headers. |
| set credentials <user> <pass> | HTTP basic auth via fetch interception. |
| set media [dark/light] | Override prefers-color-scheme. |
| set useragent <ua> | Override the user agent. |
Capture & debug
| Command | Purpose |
| --- | --- |
| screenshot [--full] [path] | Capture the viewport or the whole page. |
| pdf <path> | Save the page as a PDF. |
| eval <js> | Run JavaScript in the page context, return the result. |
| console / errors | Read accumulated console messages and exceptions. |
| highlight <sel> | Visual highlight overlay. |
| inspect | Open DevTools. |
| trace start / stop, profiler start / stop | CDP tracing and profiling. |
State & auth
| Command | Purpose |
| --- | --- |
| state save / load / list / show / rename / clear / clean | Snapshot and restore cookies + localStorage + sessionStorage + URL. |
| auth save / login / list / show / delete | Encrypted auth-state vault for switching between logged-in personas. |
Tabs & frames
| Command | Purpose |
| --- | --- |
| tab / tab new [url] / tab <n> / tab close [n] | List, create, switch, close. |
| window new | Open a new window. |
| frame <sel> / frame @e<n> / frame main | Enter an iframe, return to the top frame. |
@e refs
Run snapshot to get the page’s accessibility tree with sequential refs (@e1, @e2, …) attached to interactive nodes. Subsequent commands accept refs anywhere a selector is allowed.
```text
[document] Egg: the browser that works for you
[navigation] Main Menu
[link @e1] Product
[link @e2] Developers
[link @e3] Pricing
[main]
[heading @e4] Build with Egg.
[textbox @e5] Search...
[button @e6] Submit
```
- Refs are scoped to the latest snapshot: they stay valid until the page navigates or you call snapshot again.
- Refs map server-side to backend node IDs, not CSS selector strings, so they survive non-structural CSS changes.
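Because refs appear as plain text in the snapshot, a script can pick one out with standard shell tools before passing it as a selector. The following is a minimal bash sketch, assuming the snapshot arrives as the one-node-per-line "[role @eN] Name" text shown above (if the HTTP response wraps it in JSON, extract the text first); ref_for is a hypothetical helper, not part of Egg.

```shell
# ref_for ROLE NAME: read snapshot text on stdin and print the @e ref of the
# first node whose role and accessible name match exactly.
# Hypothetical helper; assumes the "[role @eN] Name" line layout shown above.
ref_for() {
  local role=$1 name=$2 line rest
  while IFS= read -r line || [ -n "$line" ]; do
    # Drop any leading indentation a nested tree might use.
    line=${line#"${line%%[![:space:]]*}"}
    case $line in
      "[$role @e"*"] $name")
        rest=${line#"[$role "}       # -> "@eN] NAME"
        printf '%s\n' "${rest%%]*}"  # -> "@eN"
        return 0 ;;
    esac
  done
  return 1  # no matching node
}

# Usage, with snapshot.txt holding the text above:
#   ref_for button Submit < snapshot.txt   # prints @e6
```

Matching on role and accessible name, rather than on position in the tree, keeps the lookup stable when unrelated nodes are added above the target. The printed ref can then be used wherever a selector is accepted, for example {"selector":"@e6"} in a click command.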
Agent tools
Agent tools are the LLM-facing automation surface. When the agent harness opens a conversation, it advertises a small, grouped catalog to the language model instead of the full ~120-command micro-command list. The grouping mirrors an observation/action loop, with separate tools for observing a page, acting on it, navigating, waiting, finding, extracting, screenshotting, and emulating. Tool names and argument shapes are tuned for readable tool-call output, so an operator auditing a session can follow what the model did and why.
Agent tools are not a public surface for external automation. Scripts and HTTP clients should call micro-commands directly. The catalog exists so the model sees a curated set: enough to do what it needs, few enough to reason about cleanly, and named in terms of intent rather than mechanism. Each tool dispatches to one or more micro-commands under the hood, so anything the LLM does through this catalog can be reproduced verbatim by hand against the same HTTP endpoints.
| Tool | Dispatches to |
| --- | --- |
| browser_observe | snapshot, get text/html/value/attr/box/styles, get title, get url, is visible/enabled/checked |
| browser_act | click, dblclick, fill, type, press, hover, focus, select, check, uncheck, scroll, scrollintoview, drag |
| browser_navigate | open, back, forward, reload |
| browser_wait | wait variants |
| browser_find | find role/text/label/placeholder/alt/title/testid/first/last/nth |
| browser_tabs | tab, tab new/close/switch, window new |
| browser_extract | get text, get html, reader-mode extraction |
| browser_screenshot | screenshot, pdf |
| browser_network | network route/unroute/requests/har |
| browser_storage | cookies, storage local/session |
| browser_eval | eval |
| browser_keyboard | keyboard type/inserttext, keydown, keyup, press |
| browser_mouse | mouse move/down/up/wheel, drag |
| browser_emulate | set viewport/device/geo/offline/media/headers/useragent |
| browser_state | state save/load/list, auth save/login/list |