Code models learn self-correction into their parameters
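Code generation is shifting from "write the answer" to "write, reflect, then revise." ReflexiCoder uses reinforcement learning to train this trajectory directly into the model's parameters, with the goal of self-debugging even when no external tester or critic is available. It emphasizes two points: reducing external dependencies at inference time, and compressing multi-round repair into a more token-efficient intrinsic capability. This suggests that the competitive focus for code models is moving from first-answer quality toward error-correction ability that can be internalized. The representative sources also indicate that this capability complements work on explaining agent failures and classifying faults: self-correction improves repair, while failure explanation and fault taxonomies improve diagnosis.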
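The "write, reflect, revise" trajectory can be sketched as a plain inference-time loop. This is only an illustration of the trajectory's shape, not ReflexiCoder's training method or API: the `Model` interface, the prompt strings, the `NO_ISSUES` stop marker, and the scripted `toy_model` are all hypothetical. The point it shows is that the stop signal comes from the model's own reflection rather than from an external test runner.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: one model call maps a prompt string to a text reply.
Model = Callable[[str], str]


def reflect_and_correct(
    model: Model, task: str, max_rounds: int = 2
) -> Tuple[str, List[str]]:
    """Generate code, then let the same model critique and revise it.

    No external tester or critic is called. The loop stops when the
    model's own reflection reports NO_ISSUES (an assumed marker) or
    when max_rounds is reached.
    """
    code = model(f"TASK:\n{task}\nWrite a solution.")
    reflections: List[str] = []
    for _ in range(max_rounds):
        reflection = model(
            f"TASK:\n{task}\nCODE:\n{code}\nReflect: list bugs, or reply NO_ISSUES."
        )
        reflections.append(reflection)
        if reflection.strip() == "NO_ISSUES":
            break
        code = model(
            f"TASK:\n{task}\nCODE:\n{code}\nISSUES:\n{reflection}\nRewrite the code."
        )
    return code, reflections


def toy_model(prompt: str) -> str:
    """Scripted stand-in for a real model, used only to exercise the loop."""
    if "Reflect:" in prompt:
        # Flag the bug only while the buggy expression is still in the code.
        if "a - b" in prompt:
            return "Uses subtraction instead of addition."
        return "NO_ISSUES"
    if "Rewrite the code." in prompt:
        return "def add(a, b):\n    return a + b"
    # First draft is deliberately wrong.
    return "def add(a, b):\n    return a - b"


if __name__ == "__main__":
    final_code, notes = reflect_and_correct(toy_model, "Implement add(a, b).")
    print(final_code)
    print(notes)
```

Each extra round in a loop like this costs another reflection and another rewrite in tokens, which is the overhead the summary above describes as being compressed when the behavior is learned into the model itself.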
Representative sources
- ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning — Juyong Jiang; Jiasi Shen; Sunghun Kim; Kang Min Yoo; Jeonghoon Kim; Sungju Kim
- XAI for Coding Agent Failures: Transforming Raw Execution Traces into Actionable Insights — Arun Joshi
- Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes — Mehil B Shah; Mohammad Mehdi Morovati; Mohammad Masudur Rahman; Foutse Khomh