Trend brief · 2026-03-05

Software agents are moving from code writing toward task enhancement, execution loops, and domain reliability


6 tracked topics

Today’s software-agent research is clearly moving from merely writing code to preparing tasks, setting up environments, and operating over long durations. The highlights are no longer just model capability, but also preprocessing, execution loops, and engineering constraints.

Main observations

- Task input is becoming a core lever. CodeScout shows that small-scope pre-exploration of the repository, followed by filling in reproduction steps, expected behavior, and repair hints, can significantly improve real bug-fixing performance. Compared with sending the agent in directly, this kind of upfront enhancement is more stable.
- Automation of executable environments is filling in a key gap.

Enhance the problem first, then execute the fix

Code agents are beginning to shift their focus from “stronger models” to “better task inputs.” CodeScout first performs lightweight pre-exploration of a repository, then rewrites vague requests into executable problem statements, directly reducing blind search and repeated repair attempts. This direction emphasizes clarifying the task before letting the agent act.
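The enhance-then-execute idea can be sketched as a small preprocessing step. This is a minimal illustration, not CodeScout’s actual pipeline; `enhance_issue`, `ProblemStatement`, and the exploration keys are hypothetical names for the kind of structured task input the trend describes.

```python
from dataclasses import dataclass, field

@dataclass
class ProblemStatement:
    """Structured task input assembled before the agent starts fixing."""
    summary: str
    repro_steps: list = field(default_factory=list)
    expected_behavior: str = ""
    repair_hints: list = field(default_factory=list)

def enhance_issue(raw_issue: str, exploration: dict) -> ProblemStatement:
    # Fold lightweight repo exploration (failing tests, suspect files)
    # into an executable problem statement for the agent.
    return ProblemStatement(
        summary=raw_issue.strip(),
        repro_steps=[f"run {t}" for t in exploration.get("failing_tests", [])],
        expected_behavior=exploration.get("expected", ""),
        repair_hints=[f"inspect {p}" for p in exploration.get("suspect_files", [])],
    )

stmt = enhance_issue(
    "Parser crashes on empty input",
    {"suspect_files": ["parser.py"],
     "failing_tests": ["test_empty_input"],
     "expected": "an empty AST is returned instead of an exception"},
)
print(stmt.repro_steps)   # ['run test_empty_input']
print(stmt.repair_hints)  # ['inspect parser.py']
```

The point of the structure is that every field is checkable before any patching begins, which is what makes the enhanced input more stable than a raw issue text.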


Code agents are moving down into real repository execution environments

Another main thread is automating the very act of “getting the repository running.” RepoLaunch handles dependencies, compilation, and testing across multiple languages and platforms, and distills successful experience into reproducible scripts. This shows that the focus of software agents is expanding from one-off patches to complete engineering environments.
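The “distill successful experience into reproducible scripts” step can be sketched as follows. This is a toy sketch under stated assumptions, not RepoLaunch’s implementation: `distill_setup_script`, the candidate command list, and the simulated runner are all hypothetical.

```python
def distill_setup_script(candidate_cmds, run_cmd):
    """Try candidate setup commands in order and distill the ones that
    succeed into a reproducible shell script.

    run_cmd is a callable returning True on success; in a real agent it
    would execute the command inside the repository's sandbox.
    """
    kept = [cmd for cmd in candidate_cmds if run_cmd(cmd)]
    return "#!/bin/sh\nset -e\n" + "\n".join(kept) + "\n"

# Simulated runner: pretend only the pip commands succeed in this repo.
ok = {"pip install -e .", "pip install -r requirements-dev.txt"}
script = distill_setup_script(
    ["apt-get install libfoo",
     "pip install -e .",
     "pip install -r requirements-dev.txt"],
    lambda cmd: cmd in ok,
)
print(script)
```

Emitting a plain `set -e` shell script is what turns a one-off exploratory session into an artifact that can rebuild the environment deterministically later.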


Terminal-native agents are entering an engineering phase

Terminal-native agents continue to gain momentum, but the discussion is focused more on system design than on a single leaderboard. OpenDev summarizes separation of planning and execution, lazy tool discovery, adaptive context compression, and multi-layer safety guardrails, reflecting how the community is beginning to build CLI agents as long-running software systems.
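Two of the design patterns named above, lazy tool discovery and adaptive context compression, can be shown in miniature. This is an illustrative sketch, not OpenDev’s code; `LazyToolbox` and `compress_context` are hypothetical names.

```python
class LazyToolbox:
    """Lazy tool discovery: tools are registered as factories and only
    constructed (schema loaded, binary probed, etc.) on first use."""
    def __init__(self, factories):
        self._factories = factories
        self.loaded = {}

    def get(self, name):
        if name not in self.loaded:
            self.loaded[name] = self._factories[name]()
        return self.loaded[name]

def compress_context(history, keep_last=3):
    """Crude adaptive compression: elide old turns behind a one-line
    summary and keep only the most recent turns verbatim."""
    if len(history) <= keep_last:
        return list(history)
    return [f"[{len(history) - keep_last} earlier steps elided]"] + history[-keep_last:]

tools = LazyToolbox({"grep": lambda: "grep-tool", "fmt": lambda: "fmt-tool"})
tools.get("grep")
print(sorted(tools.loaded))  # ['grep'] — 'fmt' was never constructed
print(compress_context(["s1", "s2", "s3", "s4", "s5"], keep_last=2))
```

Both patterns exist for the same reason: a long-running CLI agent cannot afford to pay for every tool and every past turn on every step.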


Benchmarks are starting to test agents’ tool-building ability

Evaluation is also being upgraded. Tool-Genesis no longer assumes tool interfaces are already known, but instead tests whether agents can design and implement tools themselves from abstract requirements. The results show that one-shot generation is fragile, while closed-loop repair is significantly more effective. This pushes the research focus from being able to call tools to being able to build tools and repair tools.
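The one-shot versus closed-loop contrast reduces to a small control loop. This is a generic sketch of closed-loop repair, not the Tool-Genesis harness; `build_tool_closed_loop` and the toy generate/test/repair callables are hypothetical.

```python
def build_tool_closed_loop(generate, run_tests, repair, max_rounds=3):
    """Closed-loop tool construction: draft the tool, run its tests, and
    feed the failure feedback into a repair step until the tests pass or
    the budget runs out. One-shot generation is the max_rounds=0 case."""
    code = generate()
    for _ in range(max_rounds):
        ok, feedback = run_tests(code)
        if ok:
            return code
        code = repair(code, feedback)
    ok, _ = run_tests(code)
    return code if ok else None

# Toy demo: the first draft is off by 2; repair applies the feedback.
fixed = build_tool_closed_loop(
    generate=lambda: {"offset": 2},
    run_tests=lambda c: (c["offset"] == 0, -c["offset"]),
    repair=lambda c, fb: {"offset": c["offset"] + fb},
)
print(fixed)  # {'offset': 0}
```

With `max_rounds=0` the same harness models the fragile one-shot setting: the first draft either passes or the run fails outright, which is the gap the benchmark reportedly measures.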


Domain agents achieve high reliability through retrieval and validation

Domain-specific agents remain a high-certainty value area. MOOSEnger combines retrieval-augmented generation with deterministic syntax prechecks and runtime validation, substantially raising the executability of generated multiphysics configurations over a very low general-purpose baseline. The trend is that high-risk, high-rule-density tasks are better served by a general agent foundation plus domain validators.
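The precheck-then-validate pattern can be expressed as a guarded generation loop. This is a minimal sketch of the pattern, not MOOSEnger’s pipeline; `generate_validated_config`, the bracket-balancing check, and the toy configs are hypothetical stand-ins for a real domain grammar and a real dry run.

```python
def generate_validated_config(generate, syntax_ok, runtime_ok, max_attempts=3):
    """Guarded generation: a deterministic syntax precheck rejects cheap
    failures before the costly runtime validation; a failed attempt
    triggers regeneration rather than being returned to the user."""
    for _ in range(max_attempts):
        cfg = generate()
        if not syntax_ok(cfg):   # deterministic precheck (cheap, offline)
            continue
        if runtime_ok(cfg):      # runtime validation (e.g. a dry run)
            return cfg
    return None

# Toy demo: the generator produces a usable config on the third attempt.
attempts = iter(["[Mesh", "[Mesh][]", "[Mesh]\n  dim = 2\n[]"])
cfg = generate_validated_config(
    generate=lambda: next(attempts),
    syntax_ok=lambda c: c.count("[") == c.count("]"),
    runtime_ok=lambda c: "dim" in c,
)
print(cfg is not None)  # True
```

The ordering matters: the deterministic check costs nothing and filters most failures, so the expensive runtime validator only sees syntactically plausible candidates. That division of labor is what lifts executability without retraining the model.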


Built with Recoleta

