🤖 AI Summary
Inverse problem optimization lacks interpretability for domain experts, hindering trustworthy deployment in critical fields such as healthcare and climate science. To address this, we propose a novel interpretability framework that instruments differentiable simulators to capture forward- and backward-propagation events along the optimization trajectory, then encodes these events as structured natural language descriptions. A large language model (LLM) subsequently maps these descriptions to domain-specific abstractions, generating human-understandable explanations of optimization behavior. This is the first method to directly translate iterative optimization trajectories into domain-level natural language explanations. Experiments on neural network training and canonical inverse problems demonstrate significant improvements in explanation accuracy and expert acceptance, effectively bridging the interpretability gap between low-level optimizer dynamics and high-level domain semantics.
📝 Abstract
Inverse problems are central to a wide range of fields, including healthcare, climate science, and agriculture. They involve estimating the inputs to a known forward model, typically via iterative optimization, so that the model produces a desired outcome. Despite considerable progress in the explainability and interpretability of forward models, the iterative optimization of inverse problems remains largely cryptic to domain experts. We propose a methodology for producing explanations from the traces emitted by an optimizer that are interpretable by humans at the level of abstraction of the domain. The central idea of our approach is to instrument a differentiable simulator so that it emits natural language events during its forward and backward passes. In a post-processing step, we use a language model to create an explanation from the list of events. We demonstrate the effectiveness of our approach on an illustrative optimization problem and on an example involving the training of a neural network.
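To make the instrumentation idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation): a scalar "simulator" f(x) = (x − 3)² is fit by gradient descent, and each forward pass, backward pass, and parameter update appends a natural-language event to a trace. In the paper's pipeline, this trace would then be handed to a language model for a domain-level explanation; here we simply collect it. All function names (`forward`, `backward`, `optimize`) are illustrative assumptions.

```python
def forward(x, events):
    # Forward pass of the toy simulator; log the event in plain language.
    y = (x - 3.0) ** 2
    events.append(f"forward: input x={x:.3f} produced loss {y:.3f}")
    return y

def backward(x, events):
    # Backward pass: analytic gradient of (x - 3)^2, logged as an event.
    g = 2.0 * (x - 3.0)
    events.append(f"backward: gradient at x={x:.3f} is {g:.3f}")
    return g

def optimize(x0, lr=0.1, steps=5):
    # Gradient-descent loop that accumulates a natural-language trace.
    events = []
    x = x0
    for step in range(steps):
        forward(x, events)
        grad = backward(x, events)
        x -= lr * grad
        events.append(f"update: step {step} moved x to {x:.3f}")
    return x, events

x, trace = optimize(0.0)
# `trace` is the list of events an LLM would summarize into an explanation.
```

The design point is that the events are already phrased in natural language at logging time, so the downstream language model only has to lift them to domain abstractions rather than parse raw tensors.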