VLA shifts from “running benchmarks” to “fixing deployment weaknesses”
The strongest theme of the day is that VLA has entered a “deployment patching” phase. Several works are no longer chasing larger models; instead they patch specific vulnerabilities that surface in real use: language instructions being ignored, sensitivity to camera-viewpoint changes, and skill representations too flat for long-horizon tasks. A shared characteristic is minimal change to the model itself, with most of the improvement coming at inference time or from how the data is organized.
Representative sources
- Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration — Ninghao Zhang; Bin Zhu; Shijie Zhou; Jingjing Chen
- AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust Vision-Language-Action Models — Hyeongjun Heo; Seungyeon Woo; Sang Min Kim; Junho Kim; Junho Lee; Yonghyeon Lee; …
- Hierarchical Latent Action Model — Hanjung Kim; Lerrel Pinto; Seon Joo Kim
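To make the “inference-time patching” pattern concrete, here is a minimal sketch of what a train-free attention recalibration could look like: adding a log-scale boost to the attention logits of language tokens at decode time, with no gradient updates or weight changes. The function names, the `boost` parameter, and the mechanism are illustrative assumptions, not the actual method of the paper listed above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def recalibrated_attention(scores, lang_mask, boost=2.0):
    """Upweight language-token attention purely at inference time.

    scores:    (..., num_tokens) raw attention logits
    lang_mask: (num_tokens,) 1.0 for language tokens, 0.0 otherwise
    boost:     multiplicative factor applied in probability space,
               implemented as an additive log-space bias (assumed knob)
    """
    biased = scores + np.log(boost) * lang_mask
    return softmax(biased, axis=-1)

# Example: two language tokens followed by two visual tokens.
scores = np.array([[1.0, 0.5, 0.2, 0.1]])
lang_mask = np.array([1.0, 1.0, 0.0, 0.0])
base = softmax(scores)
recal = recalibrated_attention(scores, lang_mask, boost=2.0)
```

Because the intervention is a fixed bias on logits, the recalibrated weights still sum to one and the base model is untouched, which is what makes this family of fixes cheap to deploy.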