---
kind: trend
trend_doc_id: 63
granularity: day
period_start: '2026-03-02T00:00:00'
period_end: '2026-03-03T00:00:00'
topics:
- robotics
- VLA
- continuous-actions
- inference-optimization
- long-horizon
- online-RL
run_id: materialize-outputs
aliases:
- recoleta-trend-63
tags:
- recoleta/trend
- topic/robotics
- topic/vla
- topic/continuous-actions
- topic/inference-optimization
- topic/long-horizon
- topic/online-rl
language_code: zh-CN
---

# VLA Moves Toward Continuous Dynamics, Fast Inference, and Long-Horizon Memory

## Overview
Today's robotics research is tightly focused, with nearly all attention on vision-language-action models (VLA). The throughline is clear: make actions more continuous, make inference faster, and make long-horizon decision-making more stable.

Key observation: action representations are being upgraded. Many earlier VLAs emit discrete action waypoints or fixed-length action chunks; today's work emphasizes continuity and how the world changes. Pri4R, for example, has the model additionally predict 3D point trajectories during training, learning "how the world changes after an action". This supervision is dropped at test time, so deployment cost is unchanged.

## Clusters

### Action representations shift from discrete outputs to continuous dynamics

Several papers focus their improvements on the action representation itself. Pri4R adds 3D point-trajectory supervision during training so the model learns how actions change the world. NIAF replaces discrete action chunks with continuous functions, from which velocity, acceleration, and jerk can be read off directly. Mean-Flow compresses multi-step flow matching into single-step generation, targeting low-latency deployment. The shared direction: make VLAs more geometry-aware, smoother, and closer to real control requirements.
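The continuous-function idea can be sketched without any learning: fit one smooth curve through discrete waypoints and read velocity, acceleration, and jerk off its analytic derivatives. This is a toy stand-in for what NIAF learns, not its actual architecture; all timestamps and values below are made up.

```python
import numpy as np

# Hypothetical 1-D joint trajectory sampled as discrete waypoints, the way a
# chunk-based VLA might emit it; timestamps and values are illustrative.
t = np.linspace(0.0, 1.0, 8)        # waypoint timestamps [s]
q = 0.3 * np.sin(2 * np.pi * t)     # joint positions [rad]

# Fit one smooth polynomial q(t) through the waypoints as a stand-in for a
# learned continuous action function; degree 5 keeps jerk well-defined.
coeffs = np.polyfit(t, q, deg=5)

# With a continuous representation, velocity, acceleration, and jerk are
# analytic derivatives rather than finite differences over waypoints.
vel = np.polyder(coeffs, 1)
acc = np.polyder(coeffs, 2)
jerk = np.polyder(coeffs, 3)

t_query = 0.5  # any continuous time, not just a waypoint index
print("q(t)    =", np.polyval(coeffs, t_query))
print("dq(t)   =", np.polyval(vel, t_query))
print("ddq(t)  =", np.polyval(acc, t_query))
print("dddq(t) =", np.polyval(jerk, t_query))
```

The point of the sketch is the interface, not the fit: once the action is a differentiable function of time, smoothness quantities that discrete waypoints only approximate come out exactly.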

#### Representative sources
- [Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation](../Inbox/2026-03-02--pri4r-learning-world-dynamics-for-vision-language-action-models-with-privileged-4d-representation.md) — Jisoo Kim; Jungbin Cho; Sanghyeok Chu; Ananya Bal; Jinhyung Kim; Gunhee Lee; …
- [Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models](../Inbox/2026-03-02--neural-implicit-action-fields-from-discrete-waypoints-to-continuous-functions-for-vision-language-action-models.md) — Haoyun Liu; Jianzhuang Zhao; Xinyuan Chang; Tianle Shi; Chuanzhang Meng; Jiayuan Tan; …
- [Mean-Flow based One-Step Vision-Language-Action](../Inbox/2026-03-02--mean-flow-based-one-step-vision-language-action.md) — Yang Chen; Xiaoguang Ma; Bin Zhao


### Inference-side optimization becomes key to deploying VLAs

The other throughline is improving quality and speed at inference time without touching the large model's training cost. ATA uses attention-guided and action-guided inference as a training-free enhancement, raising success rates across multiple VLAs. KERV plugs kinematic prediction into speculative decoding to cut the cost of heavy inference, achieving speedups of roughly 1.5x and above. The common thread: smarter inference mechanisms that compensate for VLA weaknesses in real-time closed-loop control.
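A heavily simplified sketch of speculative decoding with a kinematic acceptance test, in the spirit of (but not reproducing) KERV: a cheap draft model proposes action increments, a joint-velocity limit decides which are accepted for free, and only rejections pay for a big-model call. Every name and number here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_STEP = 0.05  # hypothetical per-step joint-velocity limit [rad]

def draft_policy(q, k=4):
    """Cheap draft model: propose k speculative action increments at once."""
    return rng.normal(0.0, 0.04, size=k)

def big_model_step(q):
    """Stand-in for one expensive VLA forward pass; always feasible."""
    return float(np.clip(rng.normal(0.0, 0.02), -MAX_STEP, MAX_STEP))

def speculative_rollout(q0, horizon=12):
    """Accept draft increments while they pass the kinematic test; fall back
    to the big model at the first violation (the rectification idea, toy form)."""
    q, big_calls, traj = q0, 0, [q0]
    while len(traj) - 1 < horizon:
        for dq in draft_policy(q):
            if len(traj) - 1 >= horizon:
                break
            if abs(dq) > MAX_STEP:      # reject: one big-model correction
                q += big_model_step(q)
                big_calls += 1
                traj.append(q)
                break
            q += dq                     # accept the draft step for free
            traj.append(q)
    return np.array(traj), big_calls

traj, big_calls = speculative_rollout(0.0)
print(f"{len(traj) - 1} control steps, {big_calls} big-model calls")
```

The design choice worth noting: the acceptance test is a physics check, not a second network, so verifying a draft step costs almost nothing compared with a forward pass.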

#### Representative sources
- [ATA: Bridging Implicit Reasoning with Attention-Guided and Action-Guided Inference for Vision-Language Action Models](../Inbox/2026-03-02--ata-bridging-implicit-reasoning-with-attention-guided-and-action-guided-inference-for-vision-language-action-models.md) — Cheng Yang; Jianhao Jiao; Lingyi Huang; Jinqi Xiao; Zhexiang Tang; Yu Gong; …
- [KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models](../Inbox/2026-03-02--kerv-kinematic-rectified-speculative-decoding-for-embodied-vla-models.md) — Zihao Zheng; Zhihao Mao; Maoliang Li; Jiayu Chen; Xinhao Sun; Zhaobo Zhang; …


### Long-horizon memory and online adaptation heat up together

Long-horizon manipulation is starting to drop the assumption that tasks are approximately Markovian. Keyframe-Chaining replaces dense history with a small number of keyframes, markedly improving success on tasks that depend on early events. π-StepNFT widens exploration in online reinforcement learning and uses step-wise ranking signals to stabilize fine-tuning of flow-based VLAs. Both attack the same problem: a robot cannot look only one step ahead; it must keep deciding through drift, memory, and recovery.
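The keyframe idea can be illustrated with a toy memory that stores a frame only when the scene has changed enough since the last keyframe. The distance metric, threshold, and capacity below are illustrative choices, not Keyframe-Chaining's actual mechanism.

```python
import numpy as np

class KeyframeMemory:
    """Keep a short chain of keyframes instead of every past observation,
    giving the policy compact non-Markovian context."""

    def __init__(self, threshold=0.5, capacity=8):
        self.threshold = threshold
        self.capacity = capacity
        self.keyframes = []

    def observe(self, frame):
        frame = np.asarray(frame, dtype=float)
        # Store only frames that differ enough from the last keyframe.
        if (not self.keyframes
                or np.linalg.norm(frame - self.keyframes[-1]) > self.threshold):
            self.keyframes.append(frame)
            if len(self.keyframes) > self.capacity:
                self.keyframes.pop(0)  # drop the oldest keyframe

    def context(self):
        """The keyframe chain: what the policy conditions on, nothing else."""
        return np.stack(self.keyframes)

mem = KeyframeMemory(threshold=0.5)
stream = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.1], [2.0, 2.0]]
for obs in stream:
    mem.observe(obs)
print(mem.context())  # only the frames that mark real scene changes survive
```

Five observations collapse to three keyframes here; an early event (the first frame) stays in context indefinitely, which is exactly what a fixed-length dense history cannot guarantee.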

#### Representative sources
- [Non-Markovian Long-Horizon Robot Manipulation via Keyframe Chaining](../Inbox/2026-03-02--non-markovian-long-horizon-robot-manipulation-via-keyframe-chaining.md) — Yipeng Chen; Wentao Tan; Lei Zhu; Fengling Li; Jingjing Li; Guoli Yang; …
- [$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs](../Inbox/2026-03-02--p-stepnft-wider-space-needs-finer-steps-in-online-rl-for-flow-based-vlas.md) — Siting Wang; Xiaofeng Wang; Zheng Zhu; Minnan Pei; Xinyu Cui; Cheng Deng; …


### Physical-structure priors extend to high-dimensional dexterous manipulation

Beyond general-purpose manipulator VLAs, embodied-AI papers are also extending to more complex physical structures. PhysGraph represents two hands, tools, and objects as a physical graph, emphasizing structural priors and parameter efficiency in high-dimensional contact tasks. The takeaway: the trend is not only toward "bigger VLAs" but also toward writing physical and embodiment structure explicitly into the policy network.
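A toy version of the physical-graph construction, with made-up node positions and a hypothetical contact radius (not PhysGraph's actual formulation): nodes for fingertips, a tool, and an object, and undirected edges wherever two nodes are within contact distance.

```python
import numpy as np

CONTACT_RADIUS = 0.03  # hypothetical contact threshold [m]

# Illustrative node positions: two fingertips, a tool handle, an object.
nodes = {
    "left_fingertip":  np.array([0.00, 0.00, 0.00]),
    "right_fingertip": np.array([0.02, 0.00, 0.00]),
    "tool_handle":     np.array([0.01, 0.01, 0.00]),
    "object":          np.array([0.50, 0.50, 0.00]),
}

def contact_edges(nodes, radius=CONTACT_RADIUS):
    """Undirected contact edges: node pairs closer than the contact radius.
    A graph-structured policy would attend along these edges rather than
    over a flat token sequence."""
    names = list(nodes)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if np.linalg.norm(nodes[a] - nodes[b]) < radius:
                edges.append((a, b))
    return edges

print(contact_edges(nodes))
```

The structural prior is visible even at this scale: the faraway object contributes no edges, so the policy's attention is spent where contact physics actually happens.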

#### Representative sources
- [PhysGraph: Physically-Grounded Graph-Transformer Policies for Bimanual Dexterous Hand-Tool-Object Manipulation](../Inbox/2026-03-02--physgraph-physically-grounded-graph-transformer-policies-for-bimanual-dexterous-hand-tool-object-manipulation.md) — Runfa Blark Li; David Kim; Xinshuang Liu; Keito Suzuki; Dwait Bhatt; Nikola Raicevic; …
