Geo-Code: A Code Framework for Reverse Code Generation from Geometric Images Based on Two-Stage Multi-Agent Evolution

2026-02-08
Citations: 0
Influential: 0
AI Summary
This work proposes the first multi-agent system–based inverse programming framework for geometric image reconstruction, addressing the challenge that existing inverse graphics methods struggle to preserve critical structural and semantic constraints in complex geometric imagery. The approach employs a two-stage pipeline: it first leverages visual operators and large language models to accurately extract pixel coordinates and visual attributes, then establishes a synthesis-rendering-verification loop where bidirectional visual feedback drives iterative code self-correction. Innovatively, the method decouples inverse generation into geometry modeling and metric-driven code evolution, orchestrated through a collaborative multi-agent mechanism. Experiments demonstrate significant improvements in geometric accuracy and visual consistency, with reconstructed outputs performing on par with original images in multimodal reasoning tasks. The authors also release the Geo-coder dataset (1,500+ samples) and the GeocodeLM model.

Abstract
Program code bridges vision and logic, and offers a practical form of supervision for improving the multimodal reasoning of large models through geometric operations such as auxiliary line construction and perspective transformation. However, current inverse graphics methods struggle to reconstruct complex geometric details accurately, which often leads to lost geometric constraints or structural distortion. To address this bottleneck, we propose Geo-coder, the first inverse programming framework for geometric images based on a multi-agent system. Our method decouples the process into geometric modeling via pixel-wise anchoring and metric-driven code evolution. Stage 1 combines the complementary strengths of visual operators and large models to capture pixel coordinates and visual attributes precisely. Stage 2 introduces a closed synthesis-rendering-validation loop in which bidirectional visual feedback drives code self-correction. Extensive experiments show that Geo-coder leads by a substantial margin in both geometric reconstruction accuracy and visual consistency. Notably, because it preserves the core geometric semantics, images reconstructed with our method perform on par with the originals in multimodal reasoning tasks, which supports the robustness of the framework. Finally, to reduce research costs, we have open-sourced the Geo-coder dataset, constructed on the GeoCode framework, which contains more than 1,500 samples. Building on it, we have also open-sourced the GeocodeLM model, providing a data and model foundation for subsequent research in this field.
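The Stage 2 idea of a closed loop (propose code, render it, compare against the target, correct) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the agents, the renderer, or the metric, so everything below is an assumption made for illustration. Here the "code" is just a list of line segments, `render` is a hypothetical rasterizer on a small grid, intersection-over-union stands in for the unspecified metric, and greedy one-pixel endpoint nudges stand in for model-driven code correction.

```python
from typing import Iterator, List, Set, Tuple

Point = Tuple[int, int]
Segment = Tuple[Point, Point]


def render(segments: List[Segment], size: int = 16) -> Set[Point]:
    """Hypothetical renderer: rasterize segments onto a size x size grid."""
    pixels: Set[Point] = set()
    for (x0, y0), (x1, y1) in segments:
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < size and 0 <= y < size:
                pixels.add((x, y))
    return pixels


def iou(a: Set[Point], b: Set[Point]) -> float:
    """Stand-in metric: intersection-over-union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbours(segments: List[Segment]) -> Iterator[List[Segment]]:
    """Yield every candidate that moves one endpoint by one pixel."""
    for si, (p, q) in enumerate(segments):
        for ei in (0, 1):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ends = [p, q]
                ends[ei] = (ends[ei][0] + dx, ends[ei][1] + dy)
                yield segments[:si] + [(ends[0], ends[1])] + segments[si + 1:]


def refine(target: Set[Point], initial: List[Segment],
           max_rounds: int = 50) -> Tuple[List[Segment], float]:
    """Synthesize -> render -> verify loop (greedy hill climbing)."""
    current = list(initial)
    best = iou(render(current), target)
    for _ in range(max_rounds):
        # Synthesize candidates, render each, and score it against the target.
        candidate = max(neighbours(current), key=lambda s: iou(render(s), target))
        score = iou(render(candidate), target)
        if score <= best:  # verification failed to improve: stop
            break
        current, best = candidate, score
    return current, best


if __name__ == "__main__":
    target_pixels = render([((2, 2), (12, 2))])
    fitted, fit_score = refine(target_pixels, [((2, 2), (11, 2))])
    print(fitted, fit_score)
```

Greedy search can stall in a local optimum on harder inputs; the sketch is only meant to show the loop structure, where the rendered output is compared with the target and the comparison decides whether a correction is kept.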
Problem

Research questions and friction points this paper is trying to address.

inverse graphics
geometric reconstruction
geometric constraints
structural distortion
multimodal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system
inverse programming
geometric reconstruction
code evolution
visual feedback loop
Zhenyu Wu
School of Artificial Intelligence, Beijing Normal University, Beijing, China
Yanxi Long
School of Artificial Intelligence, Beijing Normal University, Beijing, China
Jian Li
Beijing Normal University
large language models, large-scale machine learning, statistical learning theory
Hua Huang
Beijing Normal University
Visual Computing, Computer Graphics, Computational Photography