Browser Automation

Opta includes a Playwright-based browser automation system that allows the AI to navigate web pages, interact with UI elements, capture screenshots, and execute JavaScript -- all under policy control and approval gating.

What is Browser Automation?

Browser automation gives the AI agent the ability to operate a real web browser programmatically. When the model determines that a task requires web interaction -- browsing documentation, filling out forms, testing a web application, or scraping data -- it can invoke browser tools to accomplish these actions.

This is not a simulated browser or a headless HTTP client. The automation runs a full Chromium browser instance via Playwright, with complete JavaScript execution, CSS rendering, and DOM interaction. The AI sees the page through accessibility tree snapshots and screenshots, and uses them to decide what to click, type, or navigate to next.

Playwright Foundation

The browser automation system is built on @playwright/mcp, which exposes 30+ browser control tools through the Model Context Protocol. Playwright provides:

Cross-browser support (Chromium, Firefox, WebKit)
Reliable element selection via accessibility tree and CSS selectors
Network interception and request monitoring
Screenshot and video recording capabilities
File upload and download handling
Multi-tab and multi-window management

Chromium by default

Opta uses Chromium as the default browser engine. This provides the best compatibility with modern web applications and the most reliable automation behavior.

Unlike scripted automation where every step is predetermined, Opta's browser automation is AI-driven. The model receives a high-level task (e.g., "find the pricing page and extract the enterprise plan cost") and decides which browser tools to invoke at each step.

The typical flow is:

The model calls navigate to load a URL
It calls snapshot to read the accessibility tree and understand the page structure
Based on the accessibility tree, it calls click, type, or other interaction tools
It may call screenshot to visually verify the result
It repeats until the task is complete
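
The loop above can be sketched in a few lines of Python. This is an illustrative sketch only, not Opta's implementation: FakeBrowser, scripted_model, and run_task are hypothetical names, the browser is a stub that records calls instead of driving Chromium, and a fixed plan stands in for the model's decisions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ToolCall:
    """One browser tool request: a tool name plus its arguments."""
    name: str
    args: dict = field(default_factory=dict)


class FakeBrowser:
    """Hypothetical stand-in for the browser tools; it only records calls."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def call(self, tool: ToolCall) -> str:
        self.log.append(tool.name)
        if tool.name == "snapshot":
            # A real snapshot returns the page's accessibility tree.
            return 'link "Pricing"'
        return "ok"


History = List[Tuple[ToolCall, str]]


def scripted_model(history: History) -> Optional[ToolCall]:
    """Hypothetical stand-in for the model: a fixed plan, one step per turn."""
    plan = [
        ToolCall("navigate", {"url": "https://example.com"}),
        ToolCall("snapshot"),
        ToolCall("click", {"element": "Pricing"}),
        ToolCall("screenshot"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_task(
    browser: FakeBrowser,
    choose_next: Callable[[History], Optional[ToolCall]],
    max_steps: int = 10,
) -> History:
    """Ask for the next tool call, run it, feed the result back, repeat."""
    history: History = []
    for _ in range(max_steps):
        call = choose_next(history)
        if call is None:  # the model considers the task complete
            break
        history.append((call, browser.call(call)))
    return history
```

The max_steps cap is an assumption added for the sketch; it keeps a confused model from looping forever.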

MCP Tool Routing

Browser tools are routed through the BrowserMcpInterceptor, which sits between the daemon's tool router and the Playwright MCP server. The interceptor:

Validates tool parameters before forwarding to Playwright
Applies policy rules to determine if the action requires approval
Logs all tool invocations for session recording
Handles error recovery and retry logic

Tool Routing Flow

Model requests tool → Daemon tool router
    → BrowserMcpInterceptor
        → Policy evaluation (approve / deny / ask)
        → @playwright/mcp server
            → Chromium browser instance
    ← Result returned to model
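
To make the routing flow concrete, here is a minimal Python sketch of an interceptor that validates, evaluates policy, logs, and forwards with retries. It is a sketch under stated assumptions: the class and parameter names (Interceptor, forward, decide, ask_user, max_retries) are hypothetical and do not describe the real BrowserMcpInterceptor API, and the retry count and error type are chosen only for illustration.

```python
from typing import Callable, List, Tuple


class Interceptor:
    """Hypothetical interceptor between a tool router and a browser backend."""

    def __init__(
        self,
        forward: Callable[[str, dict], str],   # sends the call to the backend
        decide: Callable[[str, dict], str],    # returns approve / deny / ask
        ask_user: Callable[[str, dict], bool], # True if the user confirms
        max_retries: int = 2,                  # assumed value
    ) -> None:
        self.forward = forward
        self.decide = decide
        self.ask_user = ask_user
        self.max_retries = max_retries
        self.audit_log: List[Tuple[str, str]] = []

    def handle(self, name: str, args: object) -> dict:
        # 1. Validate parameters before anything reaches the browser.
        if not isinstance(args, dict):
            self.audit_log.append((name, "invalid"))
            return {"status": "invalid", "error": "args must be a dict"}

        # 2. Evaluate policy; "ask" is resolved by prompting the user.
        decision = self.decide(name, args)
        if decision == "ask":
            decision = "approve" if self.ask_user(name, args) else "deny"

        # 3. Record the invocation and its outcome.
        self.audit_log.append((name, decision))
        if decision != "approve":
            return {"status": "denied"}

        # 4. Forward to the backend, retrying on failure.
        last_error = ""
        for _ in range(self.max_retries + 1):
            try:
                return {"status": "ok", "result": self.forward(name, args)}
            except RuntimeError as exc:
                last_error = str(exc)
        return {"status": "error", "error": last_error}
```

Keeping policy evaluation ahead of the forward step means a denied call never reaches the browser at all.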

Policy and Approval

Not all browser actions are automatically approved. The policy system evaluates each tool call against the current permission rules:

Navigation to allowed domains -- auto-approved
Screenshots and snapshots -- auto-approved (read-only)
Clicks and form input -- may require approval depending on the target domain
JavaScript evaluation -- requires approval (can execute arbitrary code)
File uploads -- require approval (they send local files to external servers)

In do mode, safe browser actions (navigation, screenshots, snapshots) are auto-approved to enable fluent autonomous browsing. Actions that are destructive or could exfiltrate data still require explicit confirmation.
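
The rules above could be expressed as a small decision function. The Python sketch below is hypothetical: the tool names evaluate and file_upload, the allowed_domains set, and the fallback of asking for anything unrecognized are assumptions, not Opta's actual policy engine. It also assumes navigation outside the allowed domains prompts for approval, which the rules above do not state.

```python
from urllib.parse import urlparse

READ_ONLY = {"screenshot", "snapshot"}   # never change page state
ALWAYS_ASK = {"evaluate", "file_upload"}  # hypothetical tool names
INTERACTION = {"click", "type"}


def evaluate_policy(tool: str, args: dict, allowed_domains: set,
                    current_url: str = "") -> str:
    """Return "approve" or "ask" for one browser tool call."""
    if tool in READ_ONLY:
        return "approve"
    if tool in ALWAYS_ASK:
        return "ask"
    if tool == "navigate":
        # Navigation is judged by the destination host.
        host = urlparse(args.get("url", "")).hostname or ""
        return "approve" if host in allowed_domains else "ask"
    if tool in INTERACTION:
        # Clicks and typing are judged by the page they act on.
        host = urlparse(current_url).hostname or ""
        return "approve" if host in allowed_domains else "ask"
    # Assumption: anything unrecognized is treated conservatively.
    return "ask"
```

For example, with allowed_domains set to {"docs.example.com"}, a snapshot is approved anywhere, while a click on a page hosted elsewhere returns "ask".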

Parallel Tab Support

The browser automation system supports multiple tabs running concurrently. The model can open new tabs, switch between them, and perform actions in parallel. This is useful for:

Comparing content across multiple pages simultaneously
Performing searches in one tab while reading results in another
Testing multi-tab workflows in web applications

Tab management tools include tabs (list all open tabs) and navigate_back (go back one step in the current tab's history). Each tab maintains its own independent state and navigation history.
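
A toy model helps show what independent per-tab history means. The Python sketch below is illustrative only: Tab and TabSet are hypothetical classes, not part of Opta or Playwright, and the history is simplified to a back stack with no forward navigation.

```python
from typing import List


class Tab:
    """One tab with its own navigation history."""

    def __init__(self, url: str) -> None:
        self.history: List[str] = [url]

    @property
    def url(self) -> str:
        return self.history[-1]

    def navigate(self, url: str) -> None:
        self.history.append(url)

    def navigate_back(self) -> str:
        # Going back never removes the first page a tab was opened on.
        if len(self.history) > 1:
            self.history.pop()
        return self.url


class TabSet:
    """A collection of tabs with one active tab at a time."""

    def __init__(self) -> None:
        self.tabs: List[Tab] = []
        self.active = -1

    def open(self, url: str) -> int:
        self.tabs.append(Tab(url))
        self.active = len(self.tabs) - 1
        return self.active

    def select(self, index: int) -> Tab:
        self.active = index
        return self.tabs[index]

    def list_urls(self) -> List[str]:
        return [tab.url for tab in self.tabs]
```

Going back in one tab leaves every other tab's URL and history untouched, which is the property the paragraph above describes.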
