Agents

This page covers how an Egg agent is built: the conversation harness, the autonomy gate and plan-approval flow, skills, tools, memory, the firewalls that gate what an agent can reach, and how local (Ollama) and remote (cloud) LLM providers fit into a single routing surface. Both single-agent operation and the multi-agent setup for multiple always-on local-model agents are covered here.

The Gateway page describes where the agent runs (a process layout). This page describes what the agent is (a turn loop, a tool catalog, a set of permissions, and a memory model). The two pages are complementary; if you only read one, this is the conceptual one.

Tool results return to the harness and are folded into the next LLM turn. The loop continues until the LLM stops emitting tool calls.

Conversation harness

The harness is where every turn happens. User input arrives over HTTP: POST /api/agent/chat for plain SSE streaming, POST /api/ag-ui for the AG-UI streaming protocol. The harness assembles the turn context: the agent’s identity, personality, any skills loaded for this kind of work, recent reflection state, and a tool catalog filtered by the agent’s permissions. That bundle goes to whichever LLM the agent is configured to use.

The LLM responds with text, tool calls, or both. Text streams to the client as it arrives. Each tool call is checked against the agent’s autonomy level and the agent-to-browser firewall before it runs. Approved calls are dispatched over the gateway’s RPC bridge to the running browser, or to a headless WebView2 instance if no browser is open. Results return to the harness and are folded into the next LLM turn.

The loop continues until the LLM stops emitting tool calls. Every prompt, response, tool dispatch, and result is appended to a per-agent audit log so the full reasoning trail is recoverable later.

The browser process never runs the harness. All inference, dispatch, and audit-logging lives in egg-daemon.

Plan approval & autonomy

Two distinct mechanisms, often confused, deserve to be kept apart. The autonomy gate decides whether an agent is allowed to start work on its own right now. Plan approval is a separate flow the harness can use to ask the user for explicit confirmation before continuing. They cooperate, but they aren’t the same thing, and neither is a configurable "level" on the agent.

Autonomy gate: the private window

Each agent has an optional private window: a daily start and end time during which autonomous starters are allowed to fire. The window may cross midnight (for example, 22:00 to 06:00). Inside the window, the gate is open: curiosity drain, reflection, and block-runner-driven work can begin on their own. Outside the window, the gate is closed: the agent is purely reactive and will respond when prompted but will not kick off anything by itself. If no private window is set at all, the gate is closed all day; the agent is reactive 24/7.

The gate is boolean. The daemon helper check_autonomy_allowed returns true iff the agent is currently inside its window. GET /api/agents/:agent_id/autonomy exposes that state.

Standing reservations & time blocks

Inside the private window, work is organized as time blocks. Standing reservations are recurring rules ("30 minutes per day for curiosity," "20 minutes per day for reflection"). At the start of each day, the materializer turns standing reservations into concrete blocks inside the window. Ad-hoc time blocks are one-off intervals booked by the agent or the user. Each block has a start, end, activity kind, and status (planned, running, completed, cancelled) along with a source of standing, adhoc, or user. The block runner executes blocks as their start time arrives, but only when the gate is open.

HTTP surface: /api/agents/:agent_id/private-time, /api/agents/:agent_id/reservations, /api/agents/:agent_id/time-blocks, /api/agents/:agent_id/materialize-blocks, /api/agents/:agent_id/plan-now, /api/agents/:agent_id/soft-plan.

Plan approval

The harness can ask the user to approve a step before it runs. When it does, it creates a pending approval keyed by the agent, pauses the turn, and surfaces a UI prompt. The user approves or rejects through /api/agent/approve. Approved turns continue; rejected turns surface the rejection back to the LLM so the model can adjust rather than retry blindly.

Plan approval is a mechanism, not a configuration. The harness decides per turn whether to use it, based on what the LLM is about to do, the target URL’s agent-to-browser firewall permission, and any per-task policy. There is no agent-wide setting that says "always require plan approval."

What is and isn’t gated

The autonomy gate covers self-started work: curiosity drain, reflection, the block runner. User-scheduled work (task templates, page monitors) is not gated. Those run on their own schedules, configured by the user, whenever the user said. The point of the gate is to keep the agent from doing things on its own outside its window, not to silence the agent’s response to user requests or to override schedules the user explicitly set.

Background thinking

Background thinking is a persistent self-directed session: the agent runs on its own, without a user prompt, against a configured local model. It works through items in a curiosity queue (questions and topics the agent has flagged for follow-up) and writes reflections that feed into future active turns.

Background thinking defaults to a local Ollama model. The point is to keep the lights on: the agent should be able to think about things the user has asked about, or that it has noticed, even when the user is asleep, the laptop is closed but not suspended, or the cloud is unreachable.

Endpoints: /api/background (list sessions), /api/background/start, /api/background/:id (get session), /api/background/:id/continue, /api/curiosity (the queue itself).

Skills

Skills are reusable instruction packs the harness loads into the turn context. A skill might encode "how to fill out a tax form," "how to triage GitHub notifications," or "how to draft a calendar event in the user’s voice." Each skill is a small bundle of guidance, not code.

The catalog is registered in the gateway. The harness picks which skills to load each turn based on what the agent is being asked to do, the active site (if any), and any autonomy hints. Skills are additive: loading one doesn’t take another out.

Skills are distinct from tools. Skills shape how the agent thinks; tools are what the agent can do.

Tools

Tools are the LLM-callable verbs the agent has at its disposal. Each tool maps to one or more micro-commands (the catalog of low-level browser operations) and is executed against the running browser through the gateway’s RPC bridge. From the LLM’s perspective, a tool is a JSON function call; from the browser’s perspective, it’s a sequence of CDP commands and DOM actions.

The tool catalog is assembled per turn from the agent’s enabled tools, filtered by autonomy level and any active firewall rules. A tool that’s blocked for the current target URL won’t even appear in the catalog the LLM sees, so the model can’t propose calls it isn’t allowed to make.

The full micro-command catalog and naming conventions live on the Browser Automation page. This page covers how tools fit into the turn loop; that page covers which tools exist.

Memory & identity

An agent persists more than a chat transcript.

Identity. The agent’s name, role, and core traits. Configurable through /api/identity.
Personality. A set of trait files the agent reads to shape voice and disposition. Read and write through /api/personality/:file.
Reflection. State the agent writes between sessions: what it observed, what it concluded, what to revisit. Surfaced through /api/reflection and /api/reflections.
Working folder. A directory on disk the agent owns. Notes, intermediate documents, drafts, scratch work. The agent reads and writes here directly.

Together these make the agent recognizable across sessions. A blank-slate LLM doesn’t know it; an Egg agent does.

Free agents

Egg supports multiple agents running in parallel. Each has its own identity, personality, autonomy level, and configured model. Most users will run one. Power users might run several, each specialized: a research agent, a calendar agent, a monitoring agent.

Free agents communicate by leaving messages in each other’s inboxes. Reads are async. There is no implicit loop where one agent calls another and waits; coordination happens through the inbox protocol so a slow or stalled agent never blocks the others.

Heartbeats let the gateway track which agents are healthy. If an agent stops responding within an expected window, the gateway flags it and surfaces the state. A halted agent does not silently disappear.

The free-agents architecture is what makes background thinking practical at scale. Different specialized agents can each carry their own ongoing context without one giant shared transcript.

Firewalls

Two distinct rule tables gate what an agent can reach. They sit at different layers and exist for different reasons.

Inter-agent

The inter-agent firewall is a set of rules that gate one agent’s ability to send another a message. It exists because an agent is a software entity that can write to inboxes; without rules, a buggy or runaway agent could flood another agent’s inbox or fan out work uncontrollably.

Rules are simple: who can talk to whom, optionally constrained by message type or topic. The default policy is restrictive, so enabling cross-agent communication is always a deliberate choice rather than implicit behavior.

Agent-to-browser

The agent-to-browser firewall is a URL-pattern permission table that gates what an agent can do when it reaches into the browser. Every browser-touching tool call is checked against this table before it runs. Permission categories:

Permission	Effect
`block`	The agent cannot interact with the URL at all. The tool call is refused before it runs.
`ask`	The user is prompted for one-time approval before the call runs.
`read`	The agent may read page contents. It cannot submit forms or click destructive controls.
`allow`	The agent may interact freely with the URL.

Default seeds are conservative. Financial sites are ask. Healthcare sites are block. Email is read. Most other categories are allow until the user narrows them. The seed list is small and biased toward erring on the side of asking.

The firewall is editable in the agent settings UI. Rules are per-pattern and order-sensitive: the first matching rule wins. A specific subdomain rule can override a category default.

LLM providers & routing

Each agent is configured with one or more model providers. The gateway holds the provider catalog and the routing rules. The router picks which provider serves each turn based on agent configuration, task type, and any per-task overrides the user has set.

Local: Ollama

Ollama is Egg’s default for always-on workloads. Background thinking, free agents, and any task tagged "local-only" route here. Models run on the user’s hardware, with no cloud round-trip and no per-token cost.

The model marketplace inside the app surfaces the catalog. Pulling a model triggers a download into Ollama’s local store; the gateway picks it up automatically. Recommended baseline models are surfaced first; advanced users can install anything Ollama supports.

Ollama is a strategic part of the agent architecture, not a fallback. The expectation is that most ambient agent work happens locally; cloud is reserved for capability bursts.

Remote: cloud providers

Anthropic, OpenAI, and other compatible providers are supported for high-capability turns. API keys are stored under /credentials and never sent to the LLM as part of the prompt; the gateway holds them and attaches them to outbound requests at the network layer.

Routing rules let the user pin specific tasks to specific models. A common pattern: long-context summarization on a frontier cloud model, day-to-day chat on a local model. The router carries out those choices without the agent or the harness needing to know which provider answered.

Tool execution model

The full path a single tool call takes, from LLM JSON to the browser and back:

The LLM responds with a tool-call JSON object.
The harness checks the agent-to-browser firewall against the target URL. block refuses the call; ask opens a plan-approval prompt and pauses the turn until the user resolves it; read permits read-only operations and refuses anything mutating; allow permits everything.
If approved, the call is dispatched over the gateway’s RPC bridge to the running browser, or to a headless WebView2 instance if the browser isn’t open.
The micro-command runs: navigate, click, evaluate, snapshot, etc.
The result is returned to the harness.
The result is included in the prompt for the next LLM turn.
Every step is appended to the audit log.

If any check fails, the tool call doesn’t run. The failure reason is included in the next LLM turn so the model can adjust rather than retry blindly.