Recoleta Item Note
Data Analogies Enable Efficient Cross-Embodiment Transfer
Summary
This paper studies which ways of organizing training data are most effective for cross-embodiment transfer between robots. Its conclusion: rather than simply adding more heterogeneous demonstrations, pairing demonstrations across robots as "data analogies," especially at the trajectory level, does more to improve few-shot cross-embodiment transfer.
Problem
- The paper addresses the following question: when the target robot has only a small number of examples, how can data from other robots, viewpoints, and scenes be used to improve the target robot's task success rate?
- This is important because generalist robot policies increasingly rely on large-scale heterogeneous data, but it is still unclear whether what truly helps is "more data" or "more structured data."
- Under morphology differences in particular (different grippers / robot arms), simply piling on more data may not teach the policy transferable control correspondences.
Approach
- Without changing the model architecture or training algorithm, the paper studies only data composition: under a fixed budget, it compares coverage (targeted vs. diverse) and pairing (unpaired / task-paired / trajectory-paired).
- It proposes data analogies: demonstrations that are cross-embodiment but aligned in scene, task instance, or execution trajectory, allowing the model to see "how different robots do the same thing."
- In simulation, it systematically controls three types of distribution shift: viewpoint, morphology, and appearance. On real robots, it checks whether the same trends hold.
- Trajectory pairing uses dynamic time warping (DTW) to align cross-robot trajectories for the same task instance. During training, these "translation dataset" samples are mixed in a 50:50 ratio with the target robot's 50-shot data to fine-tune a pretrained VLA (pi_0.5-style).
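The trajectory pairing and 50:50 mixing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dtw_align` and `mixed_batch`, the use of Euclidean distance between trajectory points, and the choice to apply the 50:50 ratio per batch (rather than over the whole dataset) are all assumptions made for illustration.

```python
import math
import random


def dtw_align(a, b):
    """Classic DTW between two trajectories given as lists of equal-length tuples.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] to b[j]. Euclidean point distance is an assumption here.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def mixed_batch(target_demos, translation_demos, batch_size, rng):
    """Hypothetical sampler: half target-robot samples, half translation samples."""
    half = batch_size // 2
    batch = [("target", rng.choice(target_demos)) for _ in range(half)]
    batch += [
        ("translation", rng.choice(translation_demos))
        for _ in range(batch_size - half)
    ]
    rng.shuffle(batch)
    return batch


# Toy 1-D "trajectories": the second robot lingers at the start for one step.
robot_a = [(0.0,), (1.0,), (2.0,)]
robot_b = [(0.0,), (0.0,), (1.0,), (2.0,)]
total_cost, pairs = dtw_align(robot_a, robot_b)

rng = random.Random(0)
batch = mixed_batch(["t0", "t1"], ["x0", "x1", "x2"], batch_size=8, rng=rng)
```

In this toy case the alignment maps the first step of `robot_a` to both of the first two steps of `robot_b`, which is the kind of timing difference DTW is meant to absorb when two robots execute the same task instance at different speeds.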
Results
- In simulation, compared with the large-scale but unpaired open dataset OXE, the authors' compositional OXE+Translational data design improves average success rate by 19%.
- In real-world experiments, changing only the data composition improves average success rate by 22.5% over large-scale unpaired data.
- For morphology shifts, pairing matters more than diversity alone: the paper reports that targeted-trajectory-paired and diverse-trajectory-paired reach about 62% and 64%, respectively, a difference of only about 2%, while the average gap between paired and unpaired settings is about 23%.
- For viewpoint and appearance, increasing diversity is more effective; as diversity increases, success rate rises by about 17% on average, and trajectory pairing is still on average 6% better than weaker pairing schemes.
- For morphology scaling, increasing diversity without pairing is almost ineffective, with performance rising only from about 42% to about 44%. This indicates that simply adding more robot arm / gripper samples is insufficient to bridge differences in control and kinematics.
- Experimental setup: in the real world, each transfer direction uses 50 source-robot demonstrations and 50 translational demonstrations per axis / scene / robot. Simulation results are based on 100 random seeds, and real-world results are based on 5 random initializations.