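Robot agents move from "can see and act" to "can deploy and self-repair"
The main threads in robotics are clear. One line of work improves how vision-language-action (VLA) models understand world dynamics over time, a second pushes VLA models onto edge devices, and a third has multimodal large models rewrite controller code directly. CoWVLA replaces full-frame prediction with latent motion, targeting the efficiency of long-horizon dynamics modeling; LiteVLA-Edge emphasizes a local closed control loop after quantization; AOR (Act-Observe-Rewrite) extends self-repair after failure down to low-level control code. Together, the three point toward robot systems that are easier to deploy and easier to iterate on.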
Representative sources
- Chain of World: World Model Thinking in Latent Motion — Fuxiang Yang; Donglin Di; Lulu Tang; Xuancheng Zhang; Lei Fan; Hao Li; …
- LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics — Justin Williams; Kishor Datta Gupta; Roy George; Mrinmoy Sarkar
- Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation — Vaishak Kumar