Trend brief · 2026-03-04

Robot research shifts toward memory evaluation, structured control, and large-scale benchmarks

7 tracked topics

Robot research was highly concentrated on this day. The key theme was not simply “larger models,” but a clearer decomposition of where capabilities come from: memory, benchmarks, structured control, and continual learning.

Main observations

  • Memory has become the clearest theme, but the research focus has shifted from “adding history to the model” to “which tasks need which kind of memory.”
  • Benchmark construction continues to accelerate. One line of work expands simulation scale, while another begins to fill in standardized real-world evaluation.
  • Structural priors are becoming important again. Both dual-arm systems and dexterous hands are replacing end-to-end entangled control with more compositional representations.

Robot memory is moving from a capability slogan to systematic evaluation and hierarchical design

The strongest theme of the day is that robots need memory, but memory is not a single module. RoboMME first standardizes memory evaluation and shows that different tasks depend on different memory representations and injection methods. MEM then turns this into an operational system: short-term video memory handles details, while long-term language memory tracks task progress. Taken together, these two works shift the discussion from “whether memory is needed” to “how to allocate memory types by task.”
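The split described above, short-term detail versus long-term progress, can be sketched as a simple two-tier store. This is a hypothetical illustration of the idea, not MEM's actual architecture; the class and field names are invented, and a real system would condition a policy on learned embeddings rather than raw strings.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    """Illustrative two-tier memory: recent frames for detail, language notes for progress."""
    window: int = 16                              # short-term capacity, in frames
    frames: deque = field(default_factory=deque)  # short-term "video" memory
    progress: list = field(default_factory=list)  # long-term language memory

    def observe(self, frame, note=None):
        self.frames.append(frame)
        if len(self.frames) > self.window:
            self.frames.popleft()          # fine-grained detail decays quickly
        if note is not None:
            self.progress.append(note)     # task progress persists end to end

    def context(self):
        # A policy would condition on both tiers; here we just expose them.
        return list(self.frames), list(self.progress)


mem = HierarchicalMemory(window=4)
for t in range(6):
    mem.observe(frame=f"img_{t}", note="picked mug" if t == 2 else None)
frames, notes = mem.context()
# frames holds only the last 4 observations; notes keeps every milestone
```

The design point is the asymmetry: the short-term tier is bounded and overwritten, while the long-term tier is small, symbolic, and never evicted within a task.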

Large-scale benchmarks are expanding across both simulation and the real world

The second theme is that generalist robots need larger and more unified training and evaluation environments. RoboCasa365 scales up tasks, scenes, and demonstrations together, using a unified protocol to measure multitask training, pretraining gains, and lifelong learning challenges. ManipulationNet, by contrast, brings the focus back to the real world, emphasizing standardized object kits, submission workflows, and centralized review in an attempt to build a comparable and verifiable real-world manipulation benchmark.
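A unified evaluation protocol of the kind described above ultimately reduces to how per-task results are aggregated. The sketch below is a generic macro-averaging harness with invented task names and made-up episode outcomes; it is not the scoring code of RoboCasa365 or ManipulationNet, only an illustration of why a fixed aggregation rule makes submissions comparable.

```python
# Hypothetical results: {task_name: [episode successes]} from one eval run.
results = {
    "open_drawer":  [1, 1, 0, 1, 1],
    "pour_water":   [0, 1, 0, 0, 1],
    "stack_blocks": [1, 0, 1, 1, 0],
}

def success_rate(episodes):
    """Fraction of successful episodes for a single task."""
    return sum(episodes) / len(episodes)

per_task = {task: success_rate(eps) for task, eps in results.items()}

# Macro-average over tasks so each task counts equally,
# regardless of how many episodes it contributed.
macro = sum(per_task.values()) / len(per_task)
```

Fixing this aggregation centrally, rather than leaving it to each submitter, is part of what makes a real-world benchmark verifiable.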

Structured action representations are beginning to replace monolithic black-box control

The third theme is that generality no longer comes only from larger models, but also from better structural inductive bias. SkillVLA decomposes dual-arm manipulation into reusable single-arm skills with communication enabled only when needed, addressing the near-total failure on unseen skill pairings. SAT rewrites dexterous-hand actions as 3D structured sequences organized by joint, allowing the same model to transfer more naturally across hand morphologies. Both reduce unnecessary coupling in action representations.
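The idea of organizing actions by joint rather than as one flat vector can be illustrated with a toy serialization. This is a hypothetical sketch, not SAT's actual representation: the joint names and hand definitions are invented, but it shows why named per-joint tokens transfer across morphologies while a fixed-length vector does not.

```python
# Two hypothetical hand morphologies sharing a subset of joints.
HAND_A = ["thumb_flex", "index_flex", "middle_flex"]
HAND_B = ["thumb_flex", "index_flex"]  # fewer fingers, overlapping joints

def serialize(joint_names, action_vector):
    """Turn a flat action vector into named (joint, value) tokens."""
    return [(name, round(v, 3)) for name, v in zip(joint_names, action_vector)]

def deserialize(tokens, joint_names):
    """Rebuild an action for a target hand; absent joints default to 0."""
    lookup = dict(tokens)
    return [lookup.get(name, 0.0) for name in joint_names]

tokens = serialize(HAND_A, [0.5, 0.25, 0.75])
action_b = deserialize(tokens, HAND_B)  # only the shared joints carry over
```

Because each token is keyed by joint identity, the same model output remains meaningful on a hand with a different joint count, which is the decoupling the paragraph above describes.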

Pretrained VLAs show stronger resistance to forgetting in continual learning

A somewhat optimistic conclusion has emerged in continual learning: catastrophic forgetting in large pretrained VLAs may be less severe than expected. This work shows that simple experience replay is enough for Pi0 and GR00T to maintain low forgetting across multiple LIBERO suites, clearly outperforming smaller models trained from scratch. This suggests that future skill library expansion may depend more on pretrained backbones plus limited replay than on complex anti-forgetting techniques.
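Simple experience replay of the kind invoked above amounts to mixing a small fraction of old-task samples into each new-task batch. The sketch below is a generic version with invented data; it is not the training code evaluated in the paper, and the 25% replay fraction is an arbitrary choice for illustration.

```python
import random

def make_batch(new_task_data, replay_buffer, batch_size=8, replay_frac=0.25):
    """Mix a fixed fraction of old-task samples into each new-task batch."""
    n_replay = int(batch_size * replay_frac) if replay_buffer else 0
    batch = random.sample(new_task_data, batch_size - n_replay)
    batch += random.sample(replay_buffer, n_replay)
    random.shuffle(batch)  # interleave so the model never sees a pure-new batch
    return batch

old = [("old_task", i) for i in range(100)]  # retained demonstrations
new = [("new_task", i) for i in range(100)]  # current skill being learned
batch = make_batch(new, old, batch_size=8, replay_frac=0.25)
```

The appeal of this recipe is its cost profile: it needs no auxiliary losses or parameter isolation, only storage for a modest buffer of past demonstrations.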
