VLAs shift toward on-demand reasoning and failure recovery
The strongest theme this week is moving vision-language-action models (VLAs) from demo systems toward deployable ones. Instead of re-running expensive high-level reasoning at every step, recent methods emphasize on-demand activation, asynchronous scheduling, and failure recovery. The representative work is Tri-System, which uses a visual Critic to monitor execution and wakes the high-level VLM only when a subtask is completed, an incident occurs, or progress stalls. On real-world long-horizon tasks it reports clear gains over single-system and dual-system approaches.
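The source does not say how Tri-System implements its wake-up logic. The sketch below is a minimal, hypothetical illustration of event-triggered planner invocation. It assumes the critic emits a per-step progress estimate plus "subtask done" and "incident" flags. `CriticReport`, `WakeupGate`, `stall_patience`, and `min_progress_gain` are invented names. The stall rule (no meaningful progress gain for N consecutive steps) and the trigger priority are assumed heuristics, not the paper's method.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Tuple


class Trigger(Enum):
    """Reasons to wake the high-level planner (the three conditions named in the text)."""
    SUBTASK_DONE = auto()
    INCIDENT = auto()
    STALLED = auto()


@dataclass
class CriticReport:
    """Hypothetical per-step critic output; the real critic's interface is not given in the source."""
    progress: float      # assumed progress estimate for the current subtask, in [0, 1]
    subtask_done: bool   # critic judges the current subtask complete
    incident: bool       # critic flags an anomaly (e.g. a dropped object)


@dataclass
class WakeupGate:
    """Hypothetical gate deciding when the expensive VLM planner should run."""
    stall_patience: int = 3          # assumed: steps without progress before declaring a stall
    min_progress_gain: float = 0.01  # assumed: smallest gain that counts as progress
    _best_progress: float = field(default=0.0, init=False, repr=False)
    _steps_since_gain: int = field(default=0, init=False, repr=False)

    def reset(self) -> None:
        # Called after the planner issues a new subtask: restart progress tracking.
        self._best_progress = 0.0
        self._steps_since_gain = 0

    def check(self, report: CriticReport) -> Optional[Trigger]:
        # Assumed priority: incidents first, then completion, then stall detection.
        if report.incident:
            return Trigger.INCIDENT
        if report.subtask_done:
            return Trigger.SUBTASK_DONE
        if report.progress >= self._best_progress + self.min_progress_gain:
            self._best_progress = report.progress
            self._steps_since_gain = 0
            return None
        self._steps_since_gain += 1
        if self._steps_since_gain >= self.stall_patience:
            return Trigger.STALLED
        return None


def replay(reports: List[CriticReport], gate: WakeupGate) -> List[Tuple[int, Trigger]]:
    """Replay critic reports. A low-level policy would act at every step,
    but the planner is only woken at the returned (step, trigger) pairs."""
    wakeups: List[Tuple[int, Trigger]] = []
    for step, report in enumerate(reports):
        trigger = gate.check(report)
        if trigger is not None:
            wakeups.append((step, trigger))
            gate.reset()
    return wakeups


# Toy trace: one completed subtask, then a stall, then an incident.
trace = [
    CriticReport(0.3, False, False),
    CriticReport(1.0, True, False),
    CriticReport(0.2, False, False),
    CriticReport(0.2, False, False),
    CriticReport(0.2, False, False),
    CriticReport(0.2, False, False),
    CriticReport(0.0, False, True),
]
wakeups = replay(trace, WakeupGate(stall_patience=3))
print([(step, t.name) for step, t in wakeups])
# prints [(1, 'SUBTASK_DONE'), (5, 'STALLED'), (6, 'INCIDENT')]
```

In this toy trace the planner would run at 3 of 7 steps. The low-level policy is not modeled; the point is only that the costly VLM call becomes event-driven rather than per-step.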
Representative sources
- Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation — Pengfei Yi; Yingjie Ma; Wenjiang Xu; Yanan Hao; Shuai Gan; Wanting Li; …