VLAs start to emphasize on-demand reasoning and failure recovery
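This group of papers shifts the research focus from "bigger models" to "smarter scheduling". Tri-System inserts a visual Critic between a high-level vision-language model (VLM) and a low-level vision-language-action model (VLA), and replans only on completion, accident, or stagnation. Act-Think-Abstain first sorts each execution into one of three categories: act directly, think first, or refuse to act. The shared signal is clear: real-time performance, safety, and out-of-distribution robustness are becoming first-class goals in VLA system design.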
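The two scheduling ideas can be sketched in a few lines of Python. This is an illustration only, not either paper's implementation: the names `CriticSignal`, `Route`, `should_replan`, `route`, and `count_replans`, as well as the scalar complexity score and its two thresholds, are assumptions made up for this example.

```python
from enum import Enum, auto


class CriticSignal(Enum):
    """Hypothetical per-step verdicts a visual critic might emit."""
    IN_PROGRESS = auto()
    COMPLETED = auto()
    ACCIDENT = auto()
    STAGNATION = auto()


class Route(Enum):
    """The three categories: act directly, think first, or refuse."""
    ACT = auto()
    THINK = auto()
    ABSTAIN = auto()


# The high-level planner is invoked only on these events, not on every step.
REPLAN_SIGNALS = {
    CriticSignal.COMPLETED,
    CriticSignal.ACCIDENT,
    CriticSignal.STAGNATION,
}


def should_replan(signal: CriticSignal) -> bool:
    """Return True when the critic's verdict warrants a call to the planner."""
    return signal in REPLAN_SIGNALS


def route(complexity: float,
          think_above: float = 0.3,
          abstain_above: float = 0.8) -> Route:
    """Map an assumed complexity score in [0, 1] to a route.

    The score and both thresholds are placeholders chosen for illustration.
    """
    if complexity >= abstain_above:
        return Route.ABSTAIN
    if complexity >= think_above:
        return Route.THINK
    return Route.ACT


def count_replans(signals) -> int:
    """Count how many planner calls a sequence of critic verdicts triggers."""
    return sum(should_replan(s) for s in signals)
```

In this sketch, a five-step episode with one stagnation and one completion triggers two planner calls instead of five, which is the point of event-triggered replanning: the expensive model runs only when the cheap critic asks for it.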
Representative sources
- Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation — Pengfei Yi; Yingjie Ma; Wenjiang Xu; Yanan Hao; Shuai Gan; Wanting Li; …
- Act, Think or Abstain: Complexity-Aware Adaptive Inference for Vision-Language-Action Models — Riccardo Andrea Izzo; Gianluca Bardaro; Matteo Matteucci