Idea brief · 2026-03-12

MCP agent infrastructure and production governance are heating up in parallel

The most worthwhile opportunities to follow today are not in 'building yet another more general agent,' but in filling in the runtime and governance layers required to bring agents into real workflows. The three…

The most worthwhile opportunities to follow today are not in 'building yet another more general agent,' but in filling in the runtime and governance layers required to bring agents into real workflows. The three strongest lines of evidence are:

  1. The MCP interface layer is starting to become usable: browsers, memory, and documents are all turning into system components that agents can directly connect to, rather than scattered plugins.
  2. Production governance is becoming a primary product layer rather than a side requirement: trace, replay, circuit breakers, sandboxing, contract-first, approvals, and auditing are all appearing at once, suggesting that enterprises are beginning to build both 'pre-deployment validation' and 'runtime constraint' capabilities for agents.
  3. High-constraint scenarios are providing more concrete deployment shapes: requirements engineering already has relatively strong quantitative results, while hospital scenarios are offering clear architectural directions around constrained execution and structured memory.

Therefore, the higher-value why-now opportunity in this cycle is to build concrete product or research wedges around 'how to make agents testable, auditable, and connectable to real systems,' rather than telling an abstract platform story around generalized capability.

3 opportunities

MCP sandbox and acceptance environment for agents in internal business workflows

Kind·tooling_wedgeTime horizon·near
Role
For platform engineering teams and test leads piloting agents for procurement, customer service operations, financial data entry, and internal back-office tasks; their job is to get agents working in a controlled environment first, then decide whether to grant real permissions.
Thesis

You could build a pre-deployment validation environment for enterprise internal operations teams that puts the MCP tool catalog, browser sessions, mock APIs, approval points, and trace/replay into a single workbench. The goal is not to replace agent frameworks, but to let teams validate the observable behavioral boundaries of agents on web pages and APIs before connecting them to real systems.

Why now

This is now possible because, for the first time, the key pieces of infrastructure required for agents to connect to real systems can be assembled into a closed loop: web interaction, tool contracts, observability and replay, and approval and auditing all now have clear implementation paths. The market gap is not in making yet another agent, but in integrating these production-governance capabilities into a pre-deployment validation layer.

What changed

The change is not in the model itself, but in the runtime components becoming complete enough: browsers can now be exposed natively via MCP, with support for human takeover and login-state reuse; mock/sandbox setups are being explicitly introduced into agent deployment workflows; and production tracing and replay are also becoming easy to integrate. Previously, these capabilities were usually scattered across different teams or homegrown scripts.

Validation next step

Find 5 teams that already have internal agent PoCs, collect their 10 most common high-risk actions (login, download, upload, modify records, send messages, call internal APIs), and connect browser MCP, mock APIs, approval gates, and trace/replay with a minimal product to verify whether a regression check can be turned from a manual script into a repeatable acceptance workflow.

Evidence

Multi-agent requirements negotiation and verifiable spec generation for high-constraint software projects

Kind·new_buildTime horizon·near
Role
For software architects, requirements engineers, and compliance leads in finance, autonomous driving, medical devices, and industrial systems; their job is to produce requirements specifications that are traceable, verifiable, and deliverable under conflicting constraints from multiple stakeholders.
Thesis

You could build a requirements-engineering workbench for software architects and product/compliance teams: split quality-attribute conflicts across multiple specialist agents for negotiation, then automatically turn the results into traceable documents, KAOS/specification drafts, and API contract drafts that feed downstream testing and sandbox workflows.

Why now

Historically, tools for this kind of requirements analysis were blocked by two issues: model outputs were hard to hold accountable for, and results were difficult to connect to downstream engineering workflows. Now multi-agent negotiation has quantitative results, and document provenance and bridges are starting to standardize, so a product that goes from requirements conflicts to verifiable specification drafts is much more feasible than before.

What changed

What changed is that requirements engineering today shows stronger evidence than generic agent orchestration: it is not just saying multi-agent systems have potential, but demonstrating explicit negotiation protocols, automated verification, and quantified improvements. At the same time, the interface layer for document collaboration and agent bridges is more mature, making it easier to turn analysis results into team artifacts.

Validation next step

Focus first on one high-constraint industry and run a controlled study on 20 real requirements documents: compare the manual process with 'multi-agent negotiation + spec draft generation' on conflict-exposure rate, review rework cycles, and speed of conversion into test artifacts, prioritizing validation of whether it truly reduces back-and-forth during requirements review.

Evidence

Constrained agent runtime layer and structured long-term memory for dynamic hospital workflows

Kind·research_gapTime horizon·frontier
Role
For hospital IT departments, clinical informatics teams, and healthcare software vendors; their job is to gradually introduce agent capabilities into chart summarization, cross-department collaboration, and long-horizon case analysis without breaking privacy, permission, and audit boundaries.
Thesis

It is worth researching and productizing a constrained agent runtime layer for hospital IT teams: agents can only call pre-audited skills, all cross-role collaboration lands in a document event stream, and long-term patient-record context uses page-indexed memory or a similar structured memory approach rather than a pure vector database.

Why now

This is a good time to enter because high-constraint industries are finally showing system-design signals that are closer to real deployment, rather than only general agent demos. Although clinical quantitative results are still limited, the infrastructure direction is already clear: constrained execution, auditable event streams, and traceable long-term memory. That gives healthcare software vendors a clear wedge from research into productization.

What changed

The change is that demand for agents in healthcare is no longer stuck at 'can they help with Q&A,' but is becoming specific about runtime constraints: forbid broad-permission execution, use documents as the center of collaboration, and replace fragmented vector retrieval with incrementally maintained structured memory. At the same time, local-first memory and observability tools also provide reusable engineering foundations.

Validation next step

Do not start with a hospital-wide system. First pick a document-heavy but relatively controllable scenario, such as MDT case organization or post-discharge follow-up summaries, and build a prototype with a constrained skill whitelist, document-event auditing, and structured memory. Compare it with current RAG approaches on update cost, explanatory traceability for retrospection, and permission isolation.

Evidence
Built with Recoleta

Run your own research radar

Turn arXiv, Hacker News, OpenReview, Hugging Face Daily Papers, and RSS into local Markdown, Obsidian notes, Telegram digests, and a public site.