AI & ML

Egg ships local model support across multiple capabilities, not only chat. Web pages get a Chrome-compatible built-in AI surface; Egglets get a typed SDK; the agent draws from the same shared registry. Every AI call (local or cloud) flows through one place: the Gateway daemon, which owns capability resolution, tier gating, credentials, and outbound provider HTTP.

This page covers what is exposed, where each capability runs, and how Egglets, web pages, and the agent each consume the surface.

One path for every AI call

The browser process exposes JavaScript APIs to web pages and Egglets but does not make routing decisions, hold credentials, or call providers. The Gateway daemon is the single place where all of that lives. Every window.Writer, ctx.ai.*, agent skill, and background-thinking call lands at one of the daemon’s /api/ai/* endpoints over loopback HTTP.

This means one place to apply per-tier daily caps, one credential vault, and one implementation per provider (Anthropic, OpenAI, Google, xAI, OpenRouter, Ollama). When the user’s tier or BYOK (bring-your-own-key) config changes, every consumer sees the change at once.

Daemon endpoint	Purpose
`POST /api/ai/complete`	Non-streaming text completion. Capability-resolved (canonical) or explicit provider+model override (used by in-page built-in AI shims that pin per-API models).
`POST /api/ai/stream`	Streaming text completion as Server-Sent Events. Same resolution semantics as `complete`.
`POST /api/ai/embed`	Text embedding. Returns a vector; caller can hint a dimension family (e.g. 768d or 1536d) and the gateway picks a configured asset accordingly.
`POST /api/ai/transcribe`	Audio transcription (long-form). The gateway picks a local or cloud asset based on the registry; FFmpeg conversion happens server-side when needed.
`POST /api/ai/speech-recognize`	Short-form speech recognition. Optimized for utterances under ~60s.
`POST /api/ai/generate-image`	Image generation. Saves to local disk, returns a path.
`POST /api/ai/tts`	Text-to-speech. Saves to local disk, returns a path.
`POST /api/ai/vision`	Vision-capable text completion. Accepts an image (path, URL, or base64) plus prompt; routes through the configured vision capability.
`POST /api/ai/video-generate`	Video generation (async). Submits to the configured asset, polls for completion, returns the result URL.

Egglet authors do not hit these endpoints directly. They are the daemon’s internal surface; the renderer-side ctx.ai.* SDK and the window.* built-in AI globals proxy through them.
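To illustrate what a proxy layer has to do with the Server-Sent Events coming back from the streaming endpoint, here is a minimal sketch of an SSE parser. The `data:` framing follows the standard SSE format; the JSON payload shape with a `delta` field is an assumption for illustration, not the daemon's documented wire format, and the sketch works on an already-buffered string rather than an incremental stream.

```javascript
// Split buffered SSE text into events and return each event's data payload.
// Events are separated by a blank line; multiple data: lines in one event
// are joined with "\n", as the SSE format specifies.
function parseSseEvents(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("data:")) {
        // One optional space after the colon is not part of the value.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return events;
}

// ASSUMPTION: each event carries JSON such as {"delta": "..."}.
// The real payload shape is not documented on this page.
function collectStreamText(sseText) {
  return parseSseEvents(sseText)
    .map((payload) => JSON.parse(payload))
    .map((event) => event.delta ?? "")
    .join("");
}
```

Comment lines (those starting with `:`) and other SSE fields are ignored by this sketch, which is enough to reassemble text deltas.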

Built-in AI APIs

Web pages running in Egg can call the same window.* AI globals that Chrome ships under its built-in AI program. Egg routes each call through the Gateway daemon, which dispatches to the user’s local Ollama install. Settings > Local Models lets the user assign a different Ollama model per API, so a small fast model can handle Summarizer while a larger model handles Writer.

API	Global	What it does
Writer	`window.Writer`	Generate content from a prompt with tone, format, and length controls.
Rewriter	`window.Rewriter`	Rewrite existing text under tone, format, and length constraints.
Summarizer	`window.Summarizer`	Summarize text with configurable summary type and length.
Translator	`window.Translator`	Translate between source and target languages.
Language Detector	`window.LanguageDetector`	Identify the language of a text block; returns scored candidates.
Proofreader	`window.Proofreader`	Check and correct grammar, return a corrected version with notes.

Each global mirrors Chrome’s shape: const w = await Writer.create(opts), then await w.write(prompt, opts), with a writeStreaming async-iterator variant. A page that already targets Chrome’s built-in AI works in Egg without code changes.
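The create-then-stream pattern described above might look like the following sketch. The option values follow Chrome's Writer API; the helper name `draftWithWriter` is hypothetical, the Writer global is passed in as a parameter so the helper can run outside a browser, and the code assumes each streamed chunk is an incremental delta rather than the cumulative text.

```javascript
// Hypothetical helper: streams a draft from a Writer-shaped API.
// `WriterApi` is whatever the page finds at window.Writer.
async function draftWithWriter(WriterApi, prompt, onChunk) {
  if (!WriterApi) {
    throw new Error("Writer API is not exposed in this browser");
  }
  const writer = await WriterApi.create({
    tone: "neutral",
    format: "plain-text",
    length: "short",
  });
  let text = "";
  // ASSUMPTION: chunks are deltas, so they are appended as they arrive.
  for await (const chunk of writer.writeStreaming(prompt)) {
    text += chunk;
    if (onChunk) onChunk(chunk);
  }
  return text;
}
```

In a page this would be called as `await draftWithWriter(window.Writer, "Draft a short release note", (c) => output.append(c))`, where `output` is some element the page owns.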

The privacy implication is structural. A page calling Summarizer.create() in Chrome on a desktop without the bundled Gemini Nano gets “not available.” The same page in Egg gets a working summarizer because Egg routes to the user’s Ollama install. No cloud API key, no per-call cost, nothing leaving the device.

// Page-side example. Identical to the Chrome built-in AI shape.
const summarizer = await Summarizer.create({
  type: "key-points",
  format: "markdown",
  length: "short",
});
const summary = await summarizer.summarize(longText);
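A page that has to run in both Chrome and Egg may want to degrade gracefully when the summarizer is missing. The sketch below checks for the global and, where present, the `availability()` method from Chrome's built-in AI surface before creating a summarizer. Whether Egg implements `availability()` is an assumption, so the helper treats it as optional; `summarizeIfPossible` is a hypothetical name.

```javascript
// Hypothetical helper: returns a result object instead of throwing when
// the Summarizer API is absent or reports itself unavailable.
async function summarizeIfPossible(SummarizerApi, text) {
  if (!SummarizerApi) return { ok: false, reason: "api-missing" };
  // availability() is optional here; skip the check if the host lacks it.
  if (typeof SummarizerApi.availability === "function") {
    const status = await SummarizerApi.availability();
    if (status === "unavailable") return { ok: false, reason: "unavailable" };
  }
  const summarizer = await SummarizerApi.create({
    type: "key-points",
    format: "markdown",
    length: "short",
  });
  return { ok: true, summary: await summarizer.summarize(text) };
}
```

A page would pass `self.Summarizer` (which is `undefined` where the API is not exposed) and show its own fallback UI when `ok` is false.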

Capability registry

Every AI-related capability Egg offers has explicit provenance: local means it runs on the user’s machine; cloud means it talks to an Egg Cloud service. The Settings panel surfaces every capability with its provenance and current install state so the user can see, at a glance, what runs where. The registry lives in the Gateway daemon; the browser reads it over HTTP so there is one source of truth for both processes.

Capability	Provenance	Backed by	Used by
Chat completions	Local	Ollama	Built-in AI APIs, `ctx.ai.text.chat` in Egglets, the agent
Speech-to-text	Local	Whisper	Agent transcription tools, voice features
Media encoding	Local	FFmpeg	Agent media tools, headless workflows
Image generation	Cloud	Egg Cloud	Agent visual tools, media generation flows
Text-to-speech	Cloud	Egg Cloud	Agent voice output

Provenance is part of the user-facing trust story. An Egglet, a web page, or an agent skill can read the registry to find out whether a capability is currently available, whether it would run on-device or in the cloud, and what to tell the user if a required asset is missing.
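As a rough sketch of how a consumer might turn a registry entry into a user-facing message, consider the helper below. The entry shape (`id`, `provenance`, `backedBy`, `installed`) and the capability ids are assumptions made for illustration; this page does not document the registry's actual schema.

```javascript
// ASSUMPTION: a simplified registry snapshot. Field names are hypothetical.
const exampleRegistry = [
  { id: "chat", provenance: "local", backedBy: "Ollama", installed: true },
  { id: "speech-to-text", provenance: "local", backedBy: "Whisper", installed: false },
  { id: "image-generation", provenance: "cloud", backedBy: "Egg Cloud" },
];

// Decide whether a capability can be used and what to tell the user.
function describeCapability(registry, id) {
  const entry = registry.find((c) => c.id === id);
  if (!entry) {
    return { available: false, message: `Unknown capability: ${id}` };
  }
  if (entry.provenance === "local" && !entry.installed) {
    return {
      available: false,
      message: `${entry.backedBy} is not installed. Install it in Settings > Local Models.`,
    };
  }
  if (entry.provenance === "cloud") {
    return { available: true, message: `This request will be sent to ${entry.backedBy}.` };
  }
  return { available: true, message: `Runs on this device via ${entry.backedBy}.` };
}
```

Keeping the decision in one function means the same wording appears whether the caller is an Egglet, a page, or an agent skill.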

Local assets

Three managed assets sit behind the local capabilities. Egg manages installation through Settings > Local Models, tracks the binary path on disk, and watches for changes so a missing asset surfaces a clear prompt rather than an opaque error.

Ollama

The user’s chat models. Egg expects an Ollama server running on the local loopback. The model marketplace inside Egg lists pullable models; pulling triggers a download into Ollama’s store, and Egg picks the new model up automatically. Ollama drives the built-in AI APIs by default, the chat router’s local slot, and any Egglet using ctx.ai.text.chat with a local model assignment.
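For readers who want to see which models a local Ollama server currently has, Ollama's own HTTP API exposes `GET /api/tags` on its default port 11434. The sketch below lists model names from that endpoint. How Egg itself detects newly pulled models is not described here, so this is only an illustration of the Ollama side; `listOllamaModels` is a hypothetical helper name.

```javascript
// List the model names known to a local Ollama server.
// `fetchImpl` is injectable so the function can be tested without a network.
async function listOllamaModels(fetchImpl = fetch, baseUrl = "http://localhost:11434") {
  const res = await fetchImpl(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  const body = await res.json();
  // The response contains a `models` array; each entry has a `name`.
  return (body.models ?? []).map((m) => m.name);
}
```

Calling `await listOllamaModels()` with no arguments uses the global `fetch` and the default loopback address.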

Whisper

Speech-to-text. Two pieces: the Whisper command-line binary and a chosen speech model (the base.en model is the recommended default; smaller and larger options are available). Both install through Settings > Local Models. Whisper is what the agent uses for transcription tools, dictation, voice input, and any feature that converts audio to text.

FFmpeg

Media encoding and decoding. Used wherever Egg has to transcode audio or video on the user’s machine: preparing audio for Whisper, converting attachments, generating clip thumbnails, ad-hoc media work in agent skills. FFmpeg installs through Settings > Local Models. The path is resolved per platform with a managed fallback to a bundled binary where one exists.
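As an illustration of the "preparing audio for Whisper" step, the sketch below builds an FFmpeg argument list that converts any input to 16 kHz mono 16-bit PCM WAV, a format commonly expected by Whisper command-line builds. Whether Egg uses these exact settings is an assumption; the flags themselves are standard FFmpeg options, and `buildWhisperPrepArgs` is a hypothetical helper name.

```javascript
// Build FFmpeg arguments that convert an input file to 16 kHz mono PCM WAV.
// ASSUMPTION: the target format is a common Whisper input, not Egg's documented one.
function buildWhisperPrepArgs(inputPath, outputPath) {
  return [
    "-y",                // overwrite the output file if it already exists
    "-i", inputPath,     // source audio or video file
    "-vn",               // drop any video stream
    "-ac", "1",          // downmix to a single (mono) channel
    "-ar", "16000",      // resample to 16 kHz
    "-c:a", "pcm_s16le", // encode as 16-bit little-endian PCM
    outputPath,
  ];
}
```

The array can be handed to Node's `child_process.execFile` together with the resolved FFmpeg binary path, which avoids shell quoting problems with unusual file names.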

Cloud-served capabilities

Image generation and text-to-speech currently route through Egg Cloud rather than running locally. These require an active Egg Cloud account and a network connection. Anything that goes to the cloud is labeled as such in the capability registry, so Egglets and web pages can detect provenance and warn the user before a call leaves the device.

The reason these are not local: image generation at user-acceptable quality and speed needs more compute than most laptops have available, and high-quality on-device TTS still requires installs measured in gigabytes. Both are candidates to move on-device as the underlying technology improves. When that happens, the capability registry entry flips its provenance from cloud to local; the API a web page or Egglet calls does not change.

Agent integration

The agent harness draws from the same registry every other consumer uses, and goes through the same daemon endpoints. Tools that need transcription consult Whisper; tools that need media work consult FFmpeg; the LLM router picks between Ollama and cloud models for chat. The agent does not have privileged access to cloud-served capabilities either; image generation and TTS go through the same provenance-aware path, with the cloud usage surfaced to the user.

Background thinking (the deep-reasoning loop that runs while the agent is idle) applies the tier gate before every cloud call. Local-model background thinking bypasses the gate, since the user’s own machine is doing the work and there is nothing to meter.
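The gating rule above can be sketched as a small counter that meters cloud calls and lets local calls through. This is a simplified model under stated assumptions: the real gate's accounting, reset schedule, and tier definitions are not documented here, and `createTierGate` is a hypothetical name. The sketch omits the daily reset for brevity.

```javascript
// Hypothetical tier gate: cloud calls count against a daily cap,
// local calls always pass because there is nothing to meter.
function createTierGate(dailyCap) {
  let usedToday = 0;
  return {
    // Returns true when the call may proceed.
    allow(provenance) {
      if (provenance === "local") return true;
      if (usedToday >= dailyCap) return false;
      usedToday += 1;
      return true;
    },
    // Number of cloud calls admitted so far.
    used: () => usedToday,
  };
}
```

A background loop would call `gate.allow("cloud")` before each cloud request and stop, or fall back to a local model, once it returns false.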

Skill bundles can declare capability requirements. The harness checks the registry before offering a skill that needs a missing asset, so a user with no Whisper install does not see voice transcription as an active option until they install it. The Settings > Agent > Capabilities surface lists every capability, what it is currently mapped to, and what the user can change.
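One way to picture the registry check is a filter over skill declarations. In the sketch below, the `requires` field on a skill and the `installed` flag on a registry entry are hypothetical names chosen for illustration; the actual manifest format for skill bundles is not specified on this page.

```javascript
// Return only the skills whose declared capability requirements are all
// satisfied by the registry. Skills with no requirements always pass.
function offerableSkills(skills, registry) {
  const ready = new Set(
    registry.filter((c) => c.installed).map((c) => c.id)
  );
  return skills.filter((skill) =>
    (skill.requires ?? []).every((id) => ready.has(id))
  );
}
```

With a registry where speech-to-text is not installed, a skill declaring `requires: ["speech-to-text"]` is filtered out until the user installs Whisper.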

For Egglets

The renderer-side AI surface is unified under EggletContext.ai, organized by modality:

ctx.ai.text — chat completions (complete, stream), embed, rerank.
ctx.ai.image — generate (Imagen), analyze (Gemini-vision).
ctx.ai.audio — tts, transcribe (local Whisper, file/URL), recognize (cloud STT, inline base64), vad, listVoices.
ctx.ai.video — generate (Grok-Imagine), analyze (Gemini-vision on video).

Every method routes through the Gateway daemon end-to-end, including tier gating, credential lookup, and the actual provider call. The Egglet picks a capability slug (chat defaults to text via manifest.uses.llm); the user picks which provider/model serves that slug in Settings. Egglets call by capability and never see a model identifier directly. The companion ctx.fetch and ctx.headless.fetch surfaces cover browser-based work outside the AI shape.
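A minimal sketch of an Egglet combining two modalities might look like this. The method names `ctx.ai.audio.transcribe` and `ctx.ai.text.complete` come from the list above, but the argument objects and the `text` field on the results are assumptions, since this page does not give signatures; `summarizeRecording` is a hypothetical function name.

```javascript
// Hypothetical Egglet helper: transcribe a recording, then summarize it.
// ASSUMPTION: both calls take an options object and resolve to { text }.
async function summarizeRecording(ctx, audioPath) {
  const transcript = await ctx.ai.audio.transcribe({ file: audioPath });
  const summary = await ctx.ai.text.complete({
    prompt: `Summarize this transcript in three bullet points:\n\n${transcript.text}`,
  });
  return { transcript: transcript.text, summary: summary.text };
}
```

Note that the code never names a model or provider: it asks for capabilities, and the user's Settings decide what serves them.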