Browser Automation
Opta includes a Playwright-based browser automation system that allows the AI to navigate web pages, interact with UI elements, capture screenshots, and execute JavaScript -- all under policy control and approval gating.
What is Browser Automation?
Browser automation gives the AI agent the ability to operate a real web browser programmatically. When the model determines that a task requires web interaction -- browsing documentation, filling out forms, testing a web application, or scraping data -- it can invoke browser tools to accomplish these actions.
This is not a simulated browser or a headless HTTP client. The automation runs a full Chromium browser instance via Playwright, with complete JavaScript execution, CSS rendering, and DOM interaction. The AI sees the page through accessibility tree snapshots and screenshots, and uses them to decide what to click, what to type, or where to navigate next.
Playwright Foundation
The browser automation system is built on @playwright/mcp, which exposes 30+ browser control tools through the Model Context Protocol. Playwright provides:
- Cross-browser support (Chromium, Firefox, WebKit)
- Reliable element selection via accessibility tree and CSS selectors
- Network interception and request monitoring
- Screenshot and video recording capabilities
- File upload and download handling
- Multi-tab and multi-window management
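Each browser action reaches the Playwright server as an MCP tool call. The sketch below assumes the JSON-RPC 2.0 "tools/call" request shape defined by the Model Context Protocol. The tool name browser_navigate and its url argument are illustrative assumptions, so check the tool list the @playwright/mcp server actually reports before relying on them.

```typescript
// A minimal sketch of an MCP "tools/call" request (JSON-RPC 2.0).
// Assumption: "browser_navigate" and its "url" argument are illustrative;
// the real tool names come from the server's tool listing.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string; // which browser tool to invoke
    arguments: Record<string, unknown>; // tool-specific parameters
  };
}

// Build a tools/call request for a given tool name and arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "browser_navigate", { url: "https://example.com" });
console.log(JSON.stringify(request));
```

The id lets the client match the server's response to this request when several tool calls are in flight.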
AI-Driven Navigation
Unlike scripted automation where every step is predetermined, Opta's browser automation is AI-driven. The model receives a high-level task (e.g., "find the pricing page and extract the enterprise plan cost") and decides which browser tools to invoke at each step.
The typical flow is:
- The model calls the navigate tool to load a URL.
- It calls snapshot to read the accessibility tree and understand the page structure.
- Based on the accessibility tree, it calls click, type, or other interaction tools.
- It may call screenshot to visually verify the result.
- It repeats these steps until the task is complete.
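The navigate, snapshot, act loop described above can be sketched as follows. Everything here is hypothetical: the BrowserTools interface, the Action type, the toy in-memory pages (including the price string), and the rule-based toyModel stand in for the real MCP tools and the language model. Screenshot verification is omitted to keep the sketch short.

```typescript
// A minimal sketch of the navigate -> snapshot -> act loop.
// Hypothetical: BrowserTools, Action, the toy pages, and toyModel are
// stand-ins for the real browser tools and the model's decisions.

type Action =
  | { kind: "click"; ref: string }
  | { kind: "type"; ref: string; text: string }
  | { kind: "done"; answer: string };

interface BrowserTools {
  navigate(url: string): Promise<void>;
  snapshot(): Promise<string>; // accessibility tree rendered as text
  click(ref: string): Promise<void>;
  type(ref: string, text: string): Promise<void>;
}

// Drive the browser until the model reports completion or the step budget runs out.
async function runTask(
  tools: BrowserTools,
  startUrl: string,
  decideNextAction: (snapshot: string) => Action,
  maxSteps = 10,
): Promise<string | undefined> {
  await tools.navigate(startUrl);
  for (let step = 0; step < maxSteps; step++) {
    const tree = await tools.snapshot(); // read the current page structure
    const action = decideNextAction(tree);
    if (action.kind === "done") return action.answer;
    if (action.kind === "click") await tools.click(action.ref);
    else await tools.type(action.ref, action.text);
  }
  return undefined; // budget exhausted without an answer
}

// Toy in-memory "browser": a home page linking to a pricing page (made-up data).
const pages: Record<string, string> = {
  home: 'link "Pricing" [ref=e1]',
  pricing: 'heading "Enterprise" text "$99/user/month"',
};
let current = "home";
const fakeTools: BrowserTools = {
  async navigate() { current = "home"; },
  async snapshot() { return pages[current]; },
  async click(ref) { if (ref === "e1") current = "pricing"; },
  async type() {},
};

// Toy "model": follow the Pricing link, then read the price off the page.
function toyModel(tree: string): Action {
  if (tree.includes('"Pricing"')) return { kind: "click", ref: "e1" };
  const price = tree.match(/\$[\d.]+\/user\/month/);
  return { kind: "done", answer: price ? price[0] : "not found" };
}

runTask(fakeTools, "https://example.com", toyModel).then((answer) => console.log(answer));
```

The maxSteps budget is one simple way to keep an AI-driven loop from running indefinitely when the model never reaches a conclusion.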
MCP Tool Routing
Browser tools are routed through the BrowserMcpInterceptor, which sits between the daemon's tool router and the Playwright MCP server. The interceptor:
- Validates tool parameters before forwarding to Playwright
- Applies policy rules to determine if the action requires approval
- Logs all tool invocations for session recording
- Handles error recovery and retry logic
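The four interceptor responsibilities listed above can be combined into a single function. This is a hypothetical sketch, not the actual BrowserMcpInterceptor: the InterceptorDeps shape, the forward callback, the askUser prompt, and the retry count are all assumptions made for illustration.

```typescript
// A minimal sketch of an interceptor that validates, gates, logs, and retries
// tool calls. Hypothetical: ToolCall, InterceptorDeps, and the retry policy
// are illustrative, not the real BrowserMcpInterceptor API.

type Decision = "approve" | "deny" | "ask";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface InterceptorDeps {
  evaluatePolicy: (call: ToolCall) => Decision;
  askUser: (call: ToolCall) => Promise<boolean>; // approval prompt
  forward: (call: ToolCall) => Promise<string>; // e.g. send to the Playwright MCP server
  log: string[]; // session recording
  maxRetries: number;
}

async function intercept(call: ToolCall, deps: InterceptorDeps): Promise<string> {
  // 1. Validate parameters before forwarding.
  if (call.name === "navigate" && typeof call.args.url !== "string") {
    throw new Error("navigate requires a string url");
  }
  // 2. Apply policy; "ask" defers the decision to the user.
  const decision = deps.evaluatePolicy(call);
  deps.log.push(`${call.name}: ${decision}`);
  if (decision === "deny") throw new Error(`${call.name} denied by policy`);
  if (decision === "ask" && !(await deps.askUser(call))) {
    throw new Error(`${call.name} rejected by user`);
  }
  // 3. Forward, retrying failures up to maxRetries extra attempts.
  let lastError: unknown;
  for (let attempt = 0; attempt <= deps.maxRetries; attempt++) {
    try {
      const result = await deps.forward(call);
      deps.log.push(`${call.name}: ok`);
      return result;
    } catch (err) {
      lastError = err;
      deps.log.push(`${call.name}: attempt ${attempt + 1} failed`);
    }
  }
  throw lastError;
}

// Usage: a forwarder that fails once, then succeeds on retry.
let calls = 0;
const deps: InterceptorDeps = {
  evaluatePolicy: (c) => (c.name === "evaluate" ? "ask" : "approve"),
  askUser: async () => false,
  forward: async (c) => {
    calls++;
    if (calls === 1) throw new Error("transient failure");
    return `${c.name} done`;
  },
  log: [],
  maxRetries: 2,
};

intercept({ name: "navigate", args: { url: "https://example.com" } }, deps)
  .then((result) => console.log(result, deps.log));
```

Logging the policy decision before forwarding means a denied or rejected call still appears in the session record.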
Model requests tool → Daemon tool router
→ BrowserMcpInterceptor
→ Policy evaluation (approve / deny / ask)
→ @playwright/mcp server
→ Chromium browser instance
← Result returned to model
Policy and Approval
Not all browser actions are automatically approved. The policy system evaluates each tool call against the current permission rules:
- Navigation to allowed domains -- auto-approved
- Screenshots and snapshots -- auto-approved (read-only)
- Clicks and form input -- may require approval depending on the target domain
- JavaScript evaluation -- requires approval (can execute arbitrary code)
- File uploads -- requires approval (sends local files to external servers)
In do mode, safe browser actions (navigation, screenshots, snapshots) are auto-approved so that autonomous browsing can proceed without interruption. Actions that are destructive or carry a data-exfiltration risk, such as JavaScript evaluation and file uploads, still require explicit confirmation.
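The approval rules above can be expressed as a small decision function. This sketch rests on several assumptions: the tool names evaluate and file_upload, the "default" mode name, per-domain allowlists and blocklists, treating clicks and typing on non-allowlisted domains as "ask", and falling back to "ask" for any tool no rule covers. The blocklist exists only so the sketch exercises the deny outcome.

```typescript
// A minimal sketch of the browser approval rules.
// Hypothetical: tool names "evaluate"/"file_upload", the "default" mode,
// the allow/block domain sets, and the "ask" fallback for unknown tools.

type Decision = "approve" | "deny" | "ask";
type Mode = "default" | "do";

interface PolicyInput {
  tool: string;
  domain?: string; // target domain for navigate/click/type, if any
  mode: Mode;
}

// Read-only tools: they observe the page but change nothing.
const READ_ONLY = new Set(["snapshot", "screenshot"]);
// Arbitrary code execution and local-file uploads always need confirmation.
const ALWAYS_ASK = new Set(["evaluate", "file_upload"]);

function evaluatePolicy(
  input: PolicyInput,
  allowedDomains: Set<string>,
  blockedDomains: Set<string>,
): Decision {
  const { tool, domain, mode } = input;
  if (domain !== undefined && blockedDomains.has(domain)) return "deny";
  if (READ_ONLY.has(tool)) return "approve";
  if (ALWAYS_ASK.has(tool)) return "ask"; // applies even in do mode
  const allowed = domain !== undefined && allowedDomains.has(domain);
  if (tool === "navigate") return (allowed || mode === "do") ? "approve" : "ask";
  if (tool === "click" || tool === "type") return allowed ? "approve" : "ask";
  return "ask"; // unknown tools fall back to asking
}

const allow = new Set<string>(["docs.example.com"]);
const block = new Set<string>();
console.log(evaluatePolicy({ tool: "navigate", domain: "other.example", mode: "do" }, allow, block));
```

Checking the blocklist first means no other rule, including do mode, can approve an action against a blocked domain.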
Parallel Tab Support
The browser automation system supports multiple tabs running concurrently. The model can open new tabs, switch between them, and perform actions in parallel. This is useful for:
- Comparing content across multiple pages simultaneously
- Performing searches in one tab while reading results in another
- Testing multi-tab workflows in web applications
Related tools include tabs (lists all open tabs) and navigate_back (steps back through the current tab's history). Each tab maintains its own independent state and navigation history.
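Per-tab independence can be illustrated with a small data structure in which each tab owns its own history stack. The TabManager class and its method names are hypothetical and only mirror the behavior described for the tabs and navigate_back tools; unlike a real browser, going back here discards the forward entry.

```typescript
// A minimal sketch of per-tab state: each tab keeps its own history, so
// going back in one tab never affects another.
// Hypothetical: TabManager and its methods are illustrative only.

interface Tab {
  id: number;
  history: string[]; // visited URLs, oldest first
}

class TabManager {
  private tabs = new Map<number, Tab>();
  private nextId = 0;
  private activeId: number | undefined;

  // Open a new tab and make it active.
  open(url: string): number {
    const id = this.nextId++;
    this.tabs.set(id, { id, history: [url] });
    this.activeId = id;
    return id;
  }

  switchTo(id: number): void {
    if (!this.tabs.has(id)) throw new Error(`no tab ${id}`);
    this.activeId = id;
  }

  // Navigate the active tab only.
  navigate(url: string): void {
    this.active().history.push(url);
  }

  // Step back in the active tab's history; other tabs are untouched.
  back(): string {
    const tab = this.active();
    if (tab.history.length > 1) tab.history.pop();
    return tab.history[tab.history.length - 1];
  }

  // List open tabs with their current URL, similar in spirit to the tabs tool.
  list(): Array<{ id: number; url: string; active: boolean }> {
    return Array.from(this.tabs.values()).map((t) => ({
      id: t.id,
      url: t.history[t.history.length - 1],
      active: t.id === this.activeId,
    }));
  }

  private active(): Tab {
    if (this.activeId === undefined) throw new Error("no open tab");
    return this.tabs.get(this.activeId)!;
  }
}

// Usage: search in one tab while reading docs in another.
const tabs = new TabManager();
const search = tabs.open("https://search.example/?q=playwright");
tabs.open("https://docs.example/intro");
tabs.navigate("https://docs.example/api");
tabs.switchTo(search);
tabs.navigate("https://search.example/?q=playwright&page=2");
tabs.back(); // only the search tab moves back
console.log(tabs.list());
```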