Robotic agents are moving from “can see and act” to “can deploy and self-repair”
Three threads stand out in current robotics work: improving how vision-language-action (VLA) models understand temporal world dynamics, moving VLA models onto edge devices for real deployment, and letting multimodal large models directly rewrite controller code. CoWVLA replaces full-frame prediction with latent motion, with a focus on the efficiency of long-horizon dynamics modeling; LiteVLA-Edge emphasizes quantized, on-device closed-loop control; AOR (Act-Observe-Rewrite) pushes “self-repair after failure” down to low-level control code. Together, the three point toward robotic systems that are easier to deploy and to iterate on.
Representative sources
- Chain of World: World Model Thinking in Latent Motion — Fuxiang Yang; Donglin Di; Lulu Tang; Xuancheng Zhang; Lei Fan; Hao Li; …
- LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics — Justin Williams; Kishor Datta Gupta; Roy George; Mrinmoy Sarkar
- Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation — Vaishak Kumar