Action representation is shifting from discrete outputs to continuous dynamics
Several works focus their improvements on the action representation itself. Pri4R adds 3D point trajectory supervision during training so the model learns "how actions change the world." NIAF replaces discrete action chunks with continuous functions, giving direct access to velocity, acceleration, and jerk. Mean-Flow compresses multi-step flow matching into one-step generation, targeting low-latency deployment. The shared direction is to make VLA models understand geometry better, produce smoother motion, and match real control requirements more closely.
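To make the continuous-function idea concrete, here is a minimal sketch of why a continuous action representation exposes velocity, acceleration, and jerk directly. It is not NIAF's architecture: a hand-picked cubic polynomial stands in for the learned function, and the names `poly_eval`, `poly_deriv`, and `sample_chunk` are hypothetical helpers introduced only for illustration.

```python
from typing import List


def poly_eval(coeffs: List[float], t: float) -> float:
    """Evaluate sum(coeffs[k] * t**k) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def poly_deriv(coeffs: List[float]) -> List[float]:
    """Return the coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]


# Assumed toy action: a 1-D position over a normalized horizon t in [0, 1].
# p(t) = 3t^2 - 2t^3 moves from 0 to 1 and starts and ends at rest.
position = [0.0, 0.0, 3.0, -2.0]

# Higher-order quantities come from differentiating the function itself,
# not from finite differences between discrete waypoints.
velocity = poly_deriv(position)      # 6t - 6t^2
acceleration = poly_deriv(velocity)  # 6 - 12t
jerk = poly_deriv(acceleration)      # -12 (constant)


def sample_chunk(coeffs: List[float], num_steps: int) -> List[float]:
    """Discretize the continuous action at any chosen resolution."""
    return [poly_eval(coeffs, i / (num_steps - 1)) for i in range(num_steps)]


# The same function can be queried at a coarse or a fine control rate.
coarse = sample_chunk(position, 5)
fine = sample_chunk(position, 50)
```

A discrete action chunk fixes the sampling rate at training time. With a continuous function the controller can pick the rate at deployment and read off derivatives analytically, which is the property the paragraph above attributes to continuous action representations.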
Representative sources
- Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation — Jisoo Kim; Jungbin Cho; Sanghyeok Chu; Ananya Bal; Jinhyung Kim; Gunhee Lee; …
- Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models — Haoyun Liu; Jianzhuang Zhao; Xinyuan Chang; Tianle Shi; Chuanzhang Meng; Jiayuan Tan; …
- Mean-Flow based One-Step Vision-Language-Action — Yang Chen; Xiaoguang Ma; Bin Zhao