🤖 AI Summary
Current embodied agents typically encode their policies as black-box representations in neural network weights or prompts, which are difficult to inspect, reuse, or compose. This work proposes a white-box policy-learning framework that represents policies as typed, executable knowledge entries. Between rollouts, evidence from rollout trajectories drives localized policy edits, which a tool-constrained agentic editing loop proposes. A verification gate admits an edit only if it type-checks, executes, and improves validation metrics without failing protected-regression checks. At inference, the accepted policy is run by a deterministic symbolic executor with no large language model calls. The result is policy evolution that is interpretable, editable, and verifiable. On long-horizon text-based agent benchmarks and object-centric manipulation tasks, the framework achieves strong performance while preserving policy inspectability, local editability, and verifier-gated deployment.
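To make "typed, executable knowledge entries" and "deterministic symbolic executor" more concrete, here is a minimal Python sketch under stated assumptions. The entry types (`PolicySchema`, `RecoveryRule`), their fields, the representation of world state as a set of ground facts, and the `step` function are all hypothetical illustrations, not the framework's actual schema or executor. The sketch shows only one property: given the same knowledge base and state, action selection is a fixed-order lookup that makes no language-model call.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


# Hypothetical entry types for illustration; not the framework's real schema.
@dataclass(frozen=True)
class PolicySchema:
    name: str
    goal: str                      # goal this schema serves
    preconditions: FrozenSet[str]  # facts that must all hold in the state
    action: str                    # action emitted when the schema applies
    priority: int = 0              # higher is tried first; ties broken by name


@dataclass(frozen=True)
class RecoveryRule:
    name: str
    trigger: FrozenSet[str]        # facts signalling a known failure mode
    action: str                    # corrective action


@dataclass
class KnowledgeBase:
    schemas: List[PolicySchema] = field(default_factory=list)
    recoveries: List[RecoveryRule] = field(default_factory=list)


def step(kb: KnowledgeBase, state: FrozenSet[str], goal: str) -> Optional[str]:
    """Deterministically choose the next action with no language-model call.

    Recovery rules are checked first (in name order), then schemas for the
    current goal in (priority descending, name) order. Returns None if no
    entry applies.
    """
    for rule in sorted(kb.recoveries, key=lambda r: r.name):
        if rule.trigger <= state:  # all trigger facts are present
            return rule.action
    candidates = sorted(
        (s for s in kb.schemas if s.goal == goal),
        key=lambda s: (-s.priority, s.name),
    )
    for schema in candidates:
        if schema.preconditions <= state:
            return schema.action
    return None


# Usage with a toy two-schema KB (illustrative facts and actions).
kb = KnowledgeBase(
    schemas=[
        PolicySchema("open_fridge", "get_milk",
                     frozenset({"at_fridge", "fridge_closed"}), "open fridge"),
        PolicySchema("take_milk", "get_milk",
                     frozenset({"at_fridge", "fridge_open"}), "take milk",
                     priority=1),
    ],
    recoveries=[RecoveryRule("door_stuck", frozenset({"fridge_stuck"}),
                             "pull handle")],
)
print(step(kb, frozenset({"at_fridge", "fridge_closed"}), "get_milk"))
# prints: open fridge
```

Because ties are broken by name rather than by list order or sampling, the same knowledge base and state always yield the same action, and each chosen action can be traced back to the single entry that produced it.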
📝 Abstract
Modern embodied agents achieve impressive performance, but their task knowledge is often stored in neural weights, latent state, or prompt-bound memory, making individual policy knowledge difficult to inspect, validate, recombine, and reuse. We introduce Kintsugi, a white-box policy-learning framework that treats embodied policy improvement as verifier-gated construction of a typed executable Knowledge Base (KB). Kintsugi represents task-level policy knowledge as composable typed entries (predicates, operators, policy schemas, monitors, recovery rules, experience records, and goals) and improves this artifact through localized typed edits induced from rollout evidence, rather than relying on test-time language-model reasoning. Between rollouts, a tool-constrained agentic editing loop diagnoses trajectory failures, localizes them to editable KB layers, and proposes candidate edits. A deterministic verification gate admits an edit only when the candidate type-checks, the resulting KB executes, and focused validation success or trajectory-health metrics improve without violating protected-regression checks. At inference, the accepted KB is executed by a deterministic symbolic executor with zero LLM calls. Across long-horizon text-agent benchmarks and representative object-centric manipulation settings, Kintsugi achieves strong endpoint performance while preserving inspectability, local editability, and verifier-gated deployment. These results suggest that embodied policy improvement can be organized around executable task knowledge.
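The admission rule for candidate edits can be sketched as a small gate function. This is a minimal illustration, not Kintsugi's implementation: it assumes a KB stored as a dictionary of named entries, a hypothetical `Report` record holding validation success, a trajectory-health score, and a protected-check flag, and a toy `type_check` that only verifies each entry declares one of the seven entry types. Real type-checking and evaluation would be far richer. A rejected edit leaves the current KB untouched.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical KB layout: entry name -> entry record with a "type" field.
KB = Dict[str, Dict[str, Any]]

ENTRY_TYPES = {"predicate", "operator", "policy_schema", "monitor",
               "recovery_rule", "experience", "goal"}


@dataclass
class Report:
    success: float      # focused validation success rate
    health: float       # trajectory-health score (assumed: higher is better)
    protected_ok: bool  # True if every protected-regression check passes


def type_check(kb: KB) -> bool:
    # Toy stand-in: each entry must declare one of the seven entry types.
    return all(entry.get("type") in ENTRY_TYPES for entry in kb.values())


def gate(kb: KB, edit: Callable[[KB], KB], evaluate: Callable[[KB], Report],
         baseline: Report) -> Tuple[KB, str]:
    """Return (kb_to_keep, verdict); a rejected edit leaves `kb` unchanged."""
    candidate = edit(copy.deepcopy(kb))  # edit a copy, never the accepted KB
    if not type_check(candidate):
        return kb, "rejected: type check"
    try:
        report = evaluate(candidate)     # execute candidate on validation episodes
    except Exception:
        return kb, "rejected: execution error"
    if not report.protected_ok:
        return kb, "rejected: protected regression"
    if report.success > baseline.success or report.health > baseline.health:
        return candidate, "accepted"
    return kb, "rejected: no improvement"


# Usage with toy edits and a stubbed evaluator (illustrative numbers only).
base_kb: KB = {"door_open": {"type": "predicate"}}
baseline = Report(success=0.5, health=0.7, protected_ok=True)


def add_schema(kb: KB) -> KB:
    kb["open_door"] = {"type": "policy_schema", "pre": ["door_closed"]}
    return kb


def bad_edit(kb: KB) -> KB:
    kb["oops"] = {"type": "neural_blob"}  # not a valid entry type
    return kb


def stub_eval(kb: KB) -> Report:
    return Report(success=0.6, health=0.7, protected_ok=True)


kb1, verdict1 = gate(base_kb, add_schema, stub_eval, baseline)
print(verdict1, sorted(kb1))  # prints: accepted ['door_open', 'open_door']
kb2, verdict2 = gate(base_kb, bad_edit, stub_eval, baseline)
print(verdict2)               # prints: rejected: type check
```

Combining the two metrics with `or` follows the abstract's wording; a stricter gate might also require that neither metric decrease, which would be a separate design choice.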