Recoleta Item Note

Auto-Browser – An MCP-native browser agent with human takeover


Software Intelligence

mcp, browser-agent, human-in-the-loop, playwright, auth-session-reuse, agent-orchestration


Summary

Auto-Browser is an open-source browser agent system for authorized scenarios that wraps a real browser as an MCP-native service and supports human takeover when automation fails. It emphasizes "log in once, reuse later," auditability, safety guardrails, and local self-hosting, rather than stealth scraping or bypassing anti-bot systems.

Problem

Existing browser automation and LLM tool-calling often break on real websites because of logins, pop-ups, complex multi-step flows, human-verification walls (such as pre-CAPTCHA checks), or brittle UIs, so agents cannot complete tasks reliably.
Many systems simply bolt a browser onto an agent framework and lack a unified MCP interface, session persistence, human takeover, auditing, and approvals, which makes them hard to use in everyday authorized workflows.
In real enterprise and personal scenarios, the ability to reuse login state safely, keep a session intact when a step fails, and maintain traceability matters, because these properties determine whether a browser agent can be used in production-assist workflows.

Approach

The core mechanism is to implement the browser agent as an MCP server: the control layer uses FastAPI + Playwright to drive Chromium, exposing a unified tool interface to models, and it can also be invoked via REST or MCP JSON-RPC.
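The idea of one tool interface reachable over both REST and MCP JSON-RPC can be sketched as a single registry that both surfaces dispatch into. This is a minimal illustration, not Auto-Browser's code: the tool name `browser_click`, the registry, and the stub handler are hypothetical. Only the JSON-RPC method names `tools/list` and `tools/call` and the result shapes follow the MCP specification.

```python
from typing import Any, Callable, Dict

# Hypothetical shared registry: REST routes and the MCP endpoint both read it,
# so the two surfaces always expose the same set of tools.
ToolHandler = Callable[[Dict[str, Any]], str]
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str, schema: Dict[str, Any]):
    """Register a handler under an MCP-style tool name with a JSON Schema."""
    def decorator(fn: ToolHandler) -> ToolHandler:
        TOOLS[name] = {"fn": fn, "description": description, "inputSchema": schema}
        return fn
    return decorator


@tool("browser_click", "Click an element by its observed ID",
      {"type": "object", "properties": {"element_id": {"type": "string"}},
       "required": ["element_id"]})
def browser_click(args: Dict[str, Any]) -> str:
    # A real server would drive Playwright here; this stub only echoes.
    return f"clicked {args['element_id']}"


def call_tool(name: str, args: Dict[str, Any]) -> str:
    """Direct entry point that a REST route could call."""
    return TOOLS[name]["fn"](args)


def handle_jsonrpc(request: Dict[str, Any]) -> Dict[str, Any]:
    """Minimal dispatcher for the MCP methods tools/list and tools/call."""
    rid, method = request.get("id"), request.get("method")
    if method == "tools/list":
        tools = [{"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
                 for n, t in TOOLS.items()]
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": tools}}
    if method == "tools/call":
        params = request.get("params", {})
        name = params.get("name")
        if name not in TOOLS:
            return {"jsonrpc": "2.0", "id": rid,
                    "error": {"code": -32602, "message": f"unknown tool: {name}"}}
        text = call_tool(name, params.get("arguments", {}))
        return {"jsonrpc": "2.0", "id": rid,
                "result": {"content": [{"type": "text", "text": text}], "isError": False}}
    return {"jsonrpc": "2.0", "id": rid,
            "error": {"code": -32601, "message": f"method not found: {method}"}}
```

Keeping one registry means a tool added for MCP clients is automatically available to REST callers too, which matches the note's description of a unified tool interface.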
The system operates through an "observe + act" loop: it returns screenshots, interactable element IDs, DOM/accessibility summaries, and optional OCR text, allowing the model to perform clicks, typing, hovering, selecting, waiting, pagination, and other actions based on the screen and page structure.
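The observe + act loop can be sketched with the browser and the model passed in as plain callables. The `Observation` and `Action` shapes below are assumptions for illustration; the real observation also carries screenshots and DOM/accessibility summaries. The check against observed element IDs reflects the note's point that actions target interactable element IDs returned by the observation step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Observation:
    url: str
    elements: Dict[str, str]          # interactable element ID -> short label
    ocr_text: Optional[str] = None    # optional OCR text, as in the note


@dataclass
class Action:
    kind: str                          # e.g. "click", "type", "done" (assumed names)
    element_id: Optional[str] = None
    text: Optional[str] = None


def run_loop(observe: Callable[[], Observation],
             act: Callable[[Action], None],
             decide: Callable[[Observation, List[Action]], Action],
             max_steps: int = 10) -> List[Action]:
    """Observe the page, let the model choose one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = observe()
        action = decide(obs, history)
        # Reject actions that target an element the last observation did not report.
        if action.kind != "done" and action.element_id is not None \
                and action.element_id not in obs.elements:
            raise ValueError(f"model chose unknown element {action.element_id}")
        history.append(action)
        if action.kind == "done":
            break
        act(action)
    return history
```

The step budget (`max_steps`) bounds a model that never declares itself done; in practice that is also the point where a human-takeover handoff becomes useful.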
When web flows become brittle or require human handling, it uses noVNC for human takeover, letting a person directly take over the same browser session and then resume automation afterward without losing context.
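Human takeover amounts to a change of controller on the same session object, so context is never rebuilt. The sketch below is hypothetical (class and method names are not from the project): while a person drives the browser over noVNC, agent steps are refused, and resuming hands control back with the shared history intact.

```python
from enum import Enum
from typing import List


class Controller(Enum):
    AGENT = "agent"
    HUMAN = "human"


class Session:
    """Hypothetical session wrapper: one browser, two possible controllers."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.controller = Controller.AGENT
        self.log: List[str] = []       # shared history survives every handoff

    def agent_step(self, description: str) -> bool:
        # The agent must not act while a person is driving the browser.
        if self.controller is Controller.HUMAN:
            return False
        self.log.append(f"agent: {description}")
        return True

    def request_takeover(self, reason: str) -> None:
        self.controller = Controller.HUMAN
        self.log.append(f"takeover: {reason}")

    def resume(self, note: str = "") -> None:
        # Same session object and browser context: nothing is reset.
        self.controller = Controller.AGENT
        self.log.append(f"resume: {note}" if note else "resume")
```

For example, an agent that hits an MFA prompt calls `request_takeover("MFA prompt")`, an operator completes it in the noVNC view, and `resume()` lets the agent continue from the logged-in page.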
To support "log in once, reuse later," the system can save encrypted auth state and named auth profiles and restore them in new sessions; it also provides host allowlists, upload approvals, operator identity headers, audit logs, metrics, and durable job records.
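Two of the guardrails listed above, host allowlists and audit logs keyed by operator identity, can be sketched with the standard library. The wildcard syntax (`*.example.com`), the header name `X-Operator-Id`, and the audit-record fields are assumptions for illustration, not Auto-Browser's actual configuration.

```python
import json
import time
from typing import Dict, Iterable, Optional
from urllib.parse import urlsplit


def host_allowed(url: str, allowlist: Iterable[str]) -> bool:
    """Allow exact hosts, or any subdomain of an entry written as '*.example.com'."""
    host = (urlsplit(url).hostname or "").lower()
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # ".example.com" matches a.example.com but not evilexample.com
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False


def audit_record(action: str, url: str, headers: Dict[str, str],
                 allowed: bool, now: Optional[float] = None) -> str:
    """One JSON line per decision; 'X-Operator-Id' is an assumed header name."""
    return json.dumps({
        "ts": time.time() if now is None else now,
        "operator": headers.get("X-Operator-Id", "unknown"),
        "action": action,
        "url": url,
        "allowed": allowed,
    }, sort_keys=True)
```

Writing one JSON line per decision, including denials, keeps the audit trail append-only and easy to grep.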
For stronger isolation requirements, it supports per-session browser isolation with docker_ephemeral, dedicated noVNC ports, optional reverse-SSH remote access, and two model-provider integration modes: CLI and API.
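Giving each isolated session a dedicated noVNC port implies some bookkeeping that hands out ports and reclaims them when a container is cleaned up. The allocator below is a hypothetical sketch; the starting port 6080 (a port commonly used by noVNC) and the pool size are assumptions, and launching the `docker_ephemeral` container itself is out of scope.

```python
from typing import Dict, Optional


class PortPool:
    """Hands each isolated session its own noVNC port and reclaims it on cleanup."""

    def __init__(self, start: int = 6080, count: int = 10) -> None:
        self.free = list(range(start, start + count))
        self.by_session: Dict[str, int] = {}

    def acquire(self, session_id: str) -> Optional[int]:
        if session_id in self.by_session:       # idempotent per session
            return self.by_session[session_id]
        if not self.free:
            return None                          # caller should refuse the new session
        port = self.free.pop(0)
        self.by_session[session_id] = port
        return port

    def release(self, session_id: str) -> None:
        # Called during container cleanup so the port can be reused.
        port = self.by_session.pop(session_id, None)
        if port is not None:
            self.free.append(port)
```

Returning `None` when the pool is exhausted, rather than sharing a port, preserves the one-session-one-viewer isolation property.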

Results

The source reports no quantitative results on standard benchmarks: no success rate, latency, cost, or public comparison against other browser agents.
The strongest empirical claim made by the project is end-to-end runnability: it supports Claude Desktop, Cursor, any MCP JSON-RPC client, and direct REST calls, and exposes a real MCP transport endpoint (/mcp) plus convenience tool endpoints (/mcp/tools and /mcp/tools/call).
It provides several verifiable smoke-test flows, such as make doctor and make release-audit, plus scripted smoke tests for reverse-SSH, isolated sessions, and isolated-session tunnels; according to the source, these tests validate /readyz, session creation, observe, agent-step, remote noVNC connectivity, and isolated-container cleanup.
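A smoke test typically starts by polling the readiness endpoint before exercising anything else. The helper below is a generic sketch using only the standard library; the base URL in the usage line and the retry parameters are assumptions, and the status function is injectable so the logic can be checked without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code          # e.g. 503 while the service is still starting
    except OSError:
        return 0                 # connection refused, DNS failure, timeout


def wait_until_ready(url: str, attempts: int = 10, delay: float = 1.0,
                     status: Callable[[str], int] = http_status) -> bool:
    """Poll a readiness endpoint such as /readyz until it answers 200."""
    for i in range(attempts):
        if status(url) == 200:
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False
```

Usage (the host and port are assumptions): `wait_until_ready("http://localhost:8000/readyz")`, followed by the session-creation and observe steps only once it returns `True`.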
In terms of interface capabilities, the system explicitly supports 1-step and multi-step agent orchestration, background durable jobs, session-level download capture, tab control, social-page assistance, upload approval gates, and Prometheus-style /metrics, which together make up a more complete agent infrastructure than "pure Playwright scripts."
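"Prometheus-style /metrics" refers to the Prometheus text exposition format. The sketch below renders labeled counters in that format; the metric name `autobrowser_tool_calls_total` and the in-memory counter store are hypothetical, and a real service would more likely use the official `prometheus_client` package.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical counters keyed by (metric name, sorted label pairs).
_counters: Dict[Tuple[str, Tuple[Tuple[str, str], ...]], float] = defaultdict(float)
_help: Dict[str, str] = {}


def inc(name: str, help_text: str, amount: float = 1.0, **labels: str) -> None:
    _help[name] = help_text
    _counters[(name, tuple(sorted(labels.items())))] += amount


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics() -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted(_help):
        lines.append(f"# HELP {name} {_help[name]}")
        lines.append(f"# TYPE {name} counter")
        for (metric, labels), value in sorted(_counters.items()):
            if metric != name:
                continue
            label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value:g}" if label_str else f"{name} {value:g}")
    return "\n".join(lines) + "\n"
```

Calling `inc("autobrowser_tool_calls_total", "Tool calls by tool name.", tool="click")` on each click would surface as a line such as `autobrowser_tool_calls_total{tool="click"} 1` in the `/metrics` response body.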
The most prominent application example is Outlook: log in once, save the session as the outlook-default profile, and restore future sessions directly from that auth profile. The project presents this as its core demonstration of being "more useful than ordinary browser automation," but gives no success-rate or time-saved figures.
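The "log in once, reuse later" pattern can be sketched as a store of named profiles, each holding a Playwright-style storage state (cookies and origins). This is a minimal sketch under stated assumptions: the class, file layout, and name rule are hypothetical, and encryption is injected as a pair of bytes-to-bytes functions. The identity defaults exist only so the sketch runs without extra packages; they provide no protection.

```python
import json
import re
from pathlib import Path
from typing import Any, Callable, Dict

Codec = Callable[[bytes], bytes]
_NAME = re.compile(r"^[a-z0-9][a-z0-9_-]*$")


class ProfileStore:
    """Named auth profiles holding Playwright-style storage state dicts."""

    def __init__(self, root: Path, encrypt: Codec = lambda b: b,
                 decrypt: Codec = lambda b: b) -> None:
        # Identity codecs are placeholders: pass a real cipher in practice.
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.encrypt, self.decrypt = encrypt, decrypt

    def _path(self, name: str) -> Path:
        if not _NAME.match(name):          # keep names from escaping the directory
            raise ValueError(f"bad profile name: {name!r}")
        return self.root / f"{name}.state"

    def save(self, name: str, state: Dict[str, Any]) -> None:
        self._path(name).write_bytes(self.encrypt(json.dumps(state).encode()))

    def load(self, name: str) -> Dict[str, Any]:
        return json.loads(self.decrypt(self._path(name).read_bytes()))
```

With Playwright for Python, the flow would be: after the one-time manual login, `store.save("outlook-default", context.storage_state())`; later, `browser.new_context(storage_state=store.load("outlook-default"))`, since `new_context` accepts a storage-state dict. For actual encryption, one option is passing the `encrypt` and `decrypt` methods of a `cryptography` `Fernet` instance.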

Link

https://github.com/LvcidPsyche/auto-browser
