Recoleta Item Note
Auto-Browser – An MCP-native browser agent with human takeover
Summary
Auto-Browser is an open-source browser agent system for authorized scenarios that wraps a real browser as an MCP-native service and supports human takeover when automation fails. It emphasizes "log in once, reuse later," auditability, safety guardrails, and local self-hosting, rather than stealth scraping or bypassing anti-bot systems.
Problem
- Existing browser automation and LLM tool-calling often fail on real websites because of logins, pop-ups, pre-CAPTCHA human-verification walls, and complex or brittle UI flows, so agents cannot complete tasks reliably.
- Many systems merely bolt a browser onto an agent framework, lacking a unified MCP interface, session persistence, human takeover, auditing, and approvals, which makes them hard to use for everyday authorized workflows.
- In real enterprise and personal scenarios, three things directly determine whether a browser agent can enter production-assist workflows: safely reusing login state, keeping sessions alive when failures occur, and maintaining traceability.
Approach
- The core mechanism is to implement the browser agent as an MCP server: the control layer uses FastAPI + Playwright to drive Chromium, exposing a unified tool interface to models, and it can also be invoked via REST or MCP JSON-RPC.
- The system operates through an "observe + act" loop: it returns screenshots, interactable element IDs, DOM/accessibility summaries, and optional OCR text, allowing the model to perform clicks, typing, hovering, selecting, waiting, pagination, and other actions based on the screen and page structure.
- When web flows become brittle or require human handling, it uses noVNC for human takeover, letting a person directly take over the same browser session and then resume automation afterward without losing context.
- To support "log in once, reuse later," the system can save encrypted auth state and named auth profiles and restore them in new sessions; it also provides host allowlists, upload approvals, operator identity headers, audit logs, metrics, and durable job records.
- For stronger isolation requirements, it supports per-session browser isolation with docker_ephemeral, dedicated noVNC ports, optional reverse-SSH remote access, and two model-provider integration modes: CLI and API.
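The "observe + act" loop described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `tools/call` envelope follows the MCP JSON-RPC convention, but the tool names (`observe`, `click`), the `session_id` and `element_id` arguments, and the `send` and `choose_action` callables are all hypothetical stand-ins for the real transport and model.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional

_ids = itertools.count(1)  # monotonically increasing JSON-RPC request ids


def tools_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def run_loop(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    choose_action: Callable[[Dict[str, Any]], Optional[Dict[str, Any]]],
    session_id: str,
    max_steps: int = 5,
) -> List[Dict[str, Any]]:
    """Alternate observe and act until the chooser returns None or steps run out.

    `send` posts one request and returns the decoded result (hypothetical).
    `choose_action` stands in for the model: it maps an observation to
    {"tool": ..., "args": {...}}, or None when the task is done.
    """
    history: List[Dict[str, Any]] = []
    for _ in range(max_steps):
        # "observe" is an assumed tool name, not confirmed by the source.
        observation = send(tools_call("observe", {"session_id": session_id}))
        action = choose_action(observation)
        if action is None:
            break
        send(tools_call(action["tool"], {"session_id": session_id, **action["args"]}))
        history.append(action)
    return history
```

Injecting `send` keeps the loop independent of whether the request travels over REST or MCP JSON-RPC, and a human takeover step could be modeled as `choose_action` returning `None` so that automation pauses while the session stays open.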
Results
- The text does not provide quantitative metrics on standard benchmark datasets, nor does it report success rate, latency, cost, or public comparison figures against other browser agents.
- The strongest empirical claim made by the paper/project is runnable end-to-end capability: support for Claude Desktop, Cursor, any MCP JSON-RPC client, and direct REST calls, along with the real MCP transport endpoint `/mcp` and the convenience tool endpoints `/mcp/tools` and `/mcp/tools/call`.
- It provides several verifiable smoke-test flows, such as `make doctor` and `make release-audit`, as well as scripted smoke tests for reverse-SSH, isolated sessions, and isolated session tunnels; the text states that these tests validate specific flows such as `/readyz`, session creation, observe, agent-step, remote noVNC connectivity, and isolated container cleanup.
- In terms of interface capabilities, the system explicitly supports 1-step and multi-step agent orchestration, background durable jobs, session-level download capture, tab control, social-page assistance, upload approval gates, and Prometheus-style `/metrics`, which together make up a more complete agent infrastructure than "pure Playwright scripts."
- The most prominent application example is Outlook: log in once and save it as the `outlook-default` profile, then restore future new sessions directly from that auth profile, presented as the core demonstration of being "more useful than ordinary browser automation," but without giving success-rate or time-saved numbers.
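A readiness and metrics check of the kind the smoke tests are said to cover might look like the sketch below. It is not the project's `make doctor` script. It assumes that `/readyz` answers with HTTP 200 when the service is ready and that `/metrics` returns Prometheus text exposition format without trailing timestamps; the `fetch` callable and the metric names used with it are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes a URL, returns (status_code, body_text).
Fetch = Callable[[str], Tuple[int, str]]


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumes each sample is `name{labels} value` with no trailing timestamp.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def smoke_check(fetch: Fetch, base_url: str) -> Dict[str, bool]:
    """Report whether the readiness and metrics endpoints look healthy."""
    ready_status, _ = fetch(base_url + "/readyz")
    metrics_status, metrics_body = fetch(base_url + "/metrics")
    return {
        "ready": ready_status == 200,  # assumed readiness convention
        "metrics": metrics_status == 200 and len(parse_metrics(metrics_body)) > 0,
    }
```

In practice `fetch` could wrap `urllib.request.urlopen`; keeping it injectable lets the check run against a stub without a live server.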