Recoleta Item Note

Data Analogies Enable Efficient Cross-Embodiment Transfer

cross-embodiment-transfer, robot-data-scaling, vision-language-action, trajectory-pairing, sim2real

This paper studies which ways of organizing data are most effective for transfer across robot morphologies. Its conclusion: compared with simply adding more heterogeneous demonstrations, paired cross-robot demonstrations ("data analogies"), especially trajectory-level pairings, do more to improve few-shot cross-embodiment transfer.

  • The paper addresses the following question: when the target robot has only a small number of examples, how can data from other robots, viewpoints, and scenes be used to improve the target robot's task success rate?
  • This is important because generalist robot policies increasingly rely on large-scale heterogeneous data, but it is still unclear whether what truly helps is "more data" or "more structured data."
  • Especially under morphology differences (different grippers / robot arms), simply piling on more data may not let the policy learn transferable control correspondences.
  • Without changing the model architecture or training algorithm, the paper studies only data composition: under a fixed budget, it compares coverage (targeted vs. diverse) and pairing (unpaired / task-paired / trajectory-paired).
  • It proposes data analogies: demonstrations that are cross-embodiment but aligned in scene, task instance, or execution trajectory, allowing the model to see "how different robots do the same thing."
  • In simulation, it systematically controls three types of distribution shift: viewpoint, morphology, appearance; on real robots, it verifies whether the same trends hold.
  • Trajectory pairing uses dynamic time warping (DTW) to align cross-robot trajectories for the same task instance; during training, these "translation dataset" samples are mixed 50:50 with the target robot's 50-shot data to fine-tune a pretrained VLA (pi_0.5-style).
  • In simulation, compared with the large-scale but unpaired open dataset OXE (Open X-Embodiment), the authors' compositional OXE+Translational data design improves average success rate by 19%.
  • In real-world experiments, changing only the data composition improves average success rate by 22.5% over large-scale unpaired data.
  • For morphology shifts, pairing matters more than diversity alone: the paper reports that targeted-trajectory-paired and diverse-trajectory-paired reach about 62% and 64%, respectively, while the average gap between paired and unpaired settings is about 23%.
  • For viewpoint and appearance, increasing diversity is more effective; as diversity increases, success rate rises by about 17% on average, and trajectory pairing is still on average 6% better than weaker pairing schemes.
  • For morphology scaling, increasing diversity without pairing is almost ineffective: performance moves only from about 42% to 44%. This indicates that simply adding more robot-arm and gripper samples is insufficient to bridge differences in control and kinematics.
  • Experimental setup: in the real world, each transfer direction uses 50 source-robot demonstrations and 50 translational demonstrations per axis / scene / robot; simulation results are based on 100 random seeds, and real-world results on 5 random initializations.
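
The trajectory-pairing and 50:50 mixing steps summarized in the list above can be illustrated with a small sketch. This is not the authors' code: it assumes each trajectory is a list of per-timestep state vectors (for example end-effector positions), uses textbook DTW with a Euclidean step cost, and mixes data per batch. The names `dtw_align` and `mixed_batch` are hypothetical, and the paper may align on different features or mix per sample rather than per batch.

```python
import math
import random
from typing import List, Sequence, Tuple

# Assumption: one state vector (e.g. an end-effector position) per timestep.
Trajectory = Sequence[Sequence[float]]


def dtw_align(src: Trajectory, tgt: Trajectory) -> Tuple[float, List[Tuple[int, int]]]:
    """Textbook O(n*m) dynamic time warping with Euclidean step cost.

    Returns the total alignment cost and the warping path as
    (src_index, tgt_index) pairs. Both trajectories must be non-empty.
    """
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = step + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor cell.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def mixed_batch(target_demos: list, translation_demos: list,
                batch_size: int, rng: random.Random) -> list:
    """Draw a batch that is half target-robot data, half paired translation data."""
    half = batch_size // 2
    batch = [rng.choice(target_demos) for _ in range(half)]
    batch += [rng.choice(translation_demos) for _ in range(batch_size - half)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Two 1-D toy trajectories of the same motion; the target pauses at the start.
    source_robot = [[0.0], [1.0], [2.0]]
    target_robot = [[0.0], [0.0], [1.0], [2.0]]
    total_cost, path = dtw_align(source_robot, target_robot)
    print(total_cost, path)
```

In this toy case the warping path maps the source robot's first timestep onto both of the target robot's initial (paused) timesteps, which is the kind of timestep correspondence a trajectory-paired sample would expose to the policy.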