Action representation: from discrete outputs to continuous dynamics
Several recent works focus their improvements on the action representation itself. Pri4R adds 3D point-trajectory supervision during training, so the model learns how actions change the world. NIAF replaces discrete action chunks with continuous functions, from which velocity, acceleration, and jerk can be read off directly. Mean-Flow compresses multi-step flow matching into one-step generation, targeting low-latency deployment. The shared direction is to make VLA models more geometry-aware, smoother, and closer to the demands of real-world control.
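To illustrate why a continuous action representation exposes higher-order dynamics "for free", here is a minimal sketch (not NIAF's actual method): fit a polynomial to discrete 1-D waypoints and differentiate it analytically to obtain velocity, acceleration, and jerk at any query time. The waypoint data and degree are made up for illustration.

```python
import numpy as np

# Hypothetical waypoints: 8 timestamped position samples along one axis,
# generated here from a known cubic so the fit is exact.
t = np.linspace(0.0, 1.0, 8)
x = 0.5 * t**3 - 0.2 * t**2 + 0.1 * t

# Fit a continuous position function to the discrete waypoints.
pos = np.polynomial.Polynomial.fit(t, x, deg=3)

# Analytic derivatives of the continuous representation.
vel = pos.deriv(1)   # velocity     = dx/dt
acc = pos.deriv(2)   # acceleration = d2x/dt2
jerk = pos.deriv(3)  # jerk         = d3x/dt3

# Unlike discrete action chunks, the continuous function can be queried
# at arbitrary times, not just at the original waypoints.
print(round(vel(0.5), 3), round(acc(0.5), 3), round(jerk(0.5), 3))
```

A spline or neural implicit function plays the same role in practice; the point is that smoothness constraints and higher-order quantities come directly from the representation instead of from finite differences over discrete waypoints.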
Representative sources
- Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation — Jisoo Kim; Jungbin Cho; Sanghyeok Chu; Ananya Bal; Jinhyung Kim; Gunhee Lee; …
- Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models — Haoyun Liu; Jianzhuang Zhao; Xinyuan Chang; Tianle Shi; Chuanzhang Meng; Jiayuan Tan; …
- Mean-Flow based One-Step Vision-Language-Action — Yang Chen; Xiaoguang Ma; Bin Zhao