Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capability of large multimodal models (LMMs) to faithfully translate complex structured digital images into executable code, a task demanding both precise visual perception and symbolically accurate code generation. To this end, we introduce Omni-I2C, a comprehensive benchmark comprising 1,080 real-world user samples spanning diverse topics, image modalities, and programming languages, and formally frame image-to-code generation as a unified task balancing perceptual fidelity and symbolic precision. Leveraging a novel decoupled evaluation framework and a dual-dimensional assessment protocol that separately measures perceptual and symbolic performance, our analysis uncovers fundamental limitations in current models' structural understanding and logical consistency. Experimental results demonstrate substantial performance gaps among state-of-the-art models on this high-complexity task, particularly in simultaneously ensuring syntactic correctness and visual-semantic alignment in generated code.

📝 Abstract
We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task represents a non-trivial challenge for the current generation of LMMs: it demands an unprecedented synergy between high-fidelity visual perception -- to parse intricate spatial hierarchies and symbolic details -- and precise generative expression -- to synthesize syntactically sound and logically consistent code. Unlike traditional descriptive tasks, Omni-I2C requires a holistic understanding where any minor perceptual hallucination or coding error leads to a complete failure in visual reconstruction. Omni-I2C features 1,080 meticulously curated samples, defined by its breadth across subjects, image modalities, and programming languages. By incorporating authentic user-sourced cases, the benchmark spans a vast spectrum of digital content -- from scientific visualizations to complex symbolic notations -- each paired with executable reference code. To complement this diversity, our evaluation framework provides necessary depth; by decoupling performance into perceptual fidelity and symbolic precision, it transcends surface-level accuracy to expose the granular structural failures and reasoning bottlenecks of current LMMs. Our evaluation reveals a substantial performance gap among leading LMMs; even state-of-the-art models struggle to preserve structural integrity in complex scenarios, underscoring that multimodal code generation remains a formidable challenge. Data and code are available at https://github.com/MiliLab/Omni-I2C.
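The decoupled evaluation idea described above can be illustrated with a minimal sketch: score the generated code's syntactic validity (symbolic precision) separately from how well its rendered output matches the reference image (perceptual fidelity). The function names, the compile-based syntax check, and the toy pixel-agreement metric below are illustrative assumptions, not the benchmark's actual implementation.

```python
def symbolic_precision(generated_code: str) -> float:
    """Toy symbolic score: 1.0 if the code is syntactically valid Python, else 0.0.
    (A real protocol would also check execution and structural correctness.)"""
    try:
        compile(generated_code, "<generated>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


def perceptual_fidelity(rendered: list[list[int]], reference: list[list[int]]) -> float:
    """Toy perceptual score: fraction of pixels where the rendered raster
    agrees with the reference raster (both given as 2-D lists of pixel values)."""
    total = agree = 0
    for row_out, row_ref in zip(rendered, reference):
        for a, b in zip(row_out, row_ref):
            total += 1
            agree += int(a == b)
    return agree / total if total else 0.0


def decoupled_score(code: str,
                    rendered: list[list[int]],
                    reference: list[list[int]]) -> dict[str, float]:
    """Report the two dimensions separately rather than collapsing them,
    so a syntactically perfect program that draws the wrong figure
    (or vice versa) is exposed rather than averaged away."""
    return {
        "symbolic": symbolic_precision(code),
        "perceptual": perceptual_fidelity(rendered, reference),
    }
```

For example, `decoupled_score("x = 1", [[1, 0], [0, 1]], [[1, 1], [0, 1]])` would report full symbolic precision but only partial perceptual fidelity, which is precisely the kind of gap a single aggregate accuracy number would hide.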
Problem

Research questions and friction points this paper is trying to address.

Image-to-Code Generation
Large Multimodal Models
High-Fidelity Visual Perception
Executable Code Synthesis
Multimodal Benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Image-to-Code Generation
Multimodal Benchmark
Perceptual Fidelity
Symbolic Precision
Executable Code Synthesis