VLA enters a phase of "light modification, strong adaptation"
The day's strongest through-line is pushing pretrained vision-language-action (VLA) models from "works" to "stable and transferable." One line of work directly changes how fine-tuning capacity is allocated: LoRA-SP replaces a fixed LoRA rank with per-sample, dynamically activated ranks, easing capacity shortfalls and hyperparameter sensitivity when transferring across tasks and robot embodiments. Another line adds temporal memory without retraining the backbone: TempoFit reuses intermediate-layer K/V caches so that a single-frame decision model gains long-horizon context. Together they point to one trend: VLA progress no longer hinges only on larger base models, but on lighter, pluggable mechanisms that improve adaptability at deployment time.
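To make the first idea concrete, here is a minimal sketch of per-sample dynamic-rank LoRA. This is not LoRA-SP's actual mechanism (the paper's gating design is not detailed here); the gating vector `gate_w`, the sigmoid rank budget, and the function name `lora_forward` are all illustrative assumptions. The sketch only shows the core idea: a per-sample score decides how many low-rank components of the adapter are active.

```python
import numpy as np

def lora_forward(x, W, A, B, gate_w, max_rank):
    """Hypothetical sketch of per-sample dynamic-rank LoRA.

    x:      (d_in,)  input for one sample
    W:      (d_out, d_in)  frozen base weight
    A:      (max_rank, d_in)  down-projection adapter
    B:      (d_out, max_rank) up-projection adapter
    gate_w: (d_in,)  assumed gating vector (not from the paper)
    """
    # Per-sample score in (0, 1) sets this sample's rank budget.
    score = 1.0 / (1.0 + np.exp(-gate_w @ x))
    active = max(1, int(round(score * max_rank)))
    # Only the first `active` rank components contribute for this sample,
    # so easy samples spend less adapter capacity than hard ones.
    delta = B[:, :active] @ (A[:active, :] @ x)
    return W @ x + delta, active
```

With zeroed adapters the output reduces to the frozen model's `W @ x`, which is the usual LoRA initialization property; the dynamic part only changes how much of `A`/`B` is used per sample.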
Representative sources
- Adaptive Capacity Allocation for Vision Language Action Fine-tuning — Donghoon Kim; Minji Bae; Unghui Nam; Gyeonghun Kim; Suyun Lee; Kyuhong Shim; …
- TempoFit: Plug-and-Play Layer-Wise Temporal KV Memory for Long-Horizon Vision-Language-Action Manipulation — Jun Sun; Boyu Yang; Jiahao Zhang; Ning Ma; Chencheng Wu; Siqing Zhang; …
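The second mechanism, a layer-wise temporal K/V memory, can be sketched as a rolling per-layer cache that a frozen attention layer reads at inference time. The class name `TemporalKVMemory`, the `deque`-based buffer, and the plain softmax attention are assumptions for illustration, not TempoFit's actual implementation.

```python
import numpy as np
from collections import deque

class TemporalKVMemory:
    """Hypothetical sketch of a per-layer rolling K/V cache.

    Stores the keys/values each layer produced for past frames and
    concatenates them at attention time, so a single-frame policy can
    attend over a longer horizon without retraining the backbone.
    """
    def __init__(self, num_layers, max_frames):
        # One bounded buffer per layer; old frames are evicted FIFO.
        self.buffers = [deque(maxlen=max_frames) for _ in range(num_layers)]

    def update(self, layer, k, v):
        # k, v: (tokens, dim) produced by this layer for the current frame.
        self.buffers[layer].append((k, v))

    def attend(self, layer, q):
        # Concatenate cached K/V across frames, then standard
        # scaled-dot-product attention over the extended context.
        ks = np.concatenate([k for k, _ in self.buffers[layer]], axis=0)
        vs = np.concatenate([v for _, v in self.buffers[layer]], axis=0)
        scores = q @ ks.T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ vs
```

The plug-and-play aspect shows up in the interface: the backbone's forward pass is unchanged except that each layer's attention reads from `attend` instead of only the current frame's K/V.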