🤖 AI Summary
Existing fuzzing tools for AI systems exhibit poor generalizability and high rates of invalid inputs when generating test cases for highly structured 3D data (e.g., meshes, point clouds).
Method: We propose the first unified, graph-based test input generation framework: (1) mapping diverse structured inputs to constraint graphs; (2) designing a neighbor-similarity-guided graph mutation strategy; and (3) introducing a constraint-driven graph refinement mechanism that jointly enforces structural validity and semantic preservation. The framework supports cross-modal structural modeling, joint structure–semantics verification, and prediction consistency analysis.
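The three steps above can be sketched in miniature. This is a hedged illustration only: the function names (`triangles_to_graph`, `neighbor_similarity`, `mutate`, `refine`), the Jaccard reading of "neighbor similarity", and the degree-bound stand-in for structural constraints are all assumptions, not the paper's published API.

```python
def triangles_to_graph(faces):
    """Step (1): map a triangle mesh (list of (v0, v1, v2) vertex-index
    triples) to an adjacency-list constraint graph over its vertices."""
    graph = {}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (a, c)):
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph

def neighbor_similarity(graph, u, v):
    """Jaccard overlap of the two nodes' neighborhoods -- one plausible
    reading of the 'neighbor-similarity' score guiding mutation."""
    nu, nv = graph[u] - {v}, graph[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def mutate(graph):
    """Step (2): add the non-edge whose endpoints have the most similar
    neighborhoods, i.e. mutate where structure is most likely preserved."""
    nodes = sorted(graph)
    candidates = [(u, v) for i, u in enumerate(nodes)
                  for v in nodes[i + 1:] if v not in graph[u]]
    if not candidates:
        return graph  # already complete; nothing to add
    u, v = max(candidates, key=lambda e: neighbor_similarity(graph, *e))
    graph[u].add(v)
    graph[v].add(u)
    return graph

def refine(graph, max_degree=6):
    """Step (3): toy constraint refinement -- repair nodes that violate a
    degree bound (a stand-in for the framework's validity constraints)."""
    for u in graph:
        while len(graph[u]) > max_degree:
            v = max(graph[u])
            graph[u].discard(v)
            graph[v].discard(u)
    return graph
```

For example, the four faces of a tetrahedron map to a complete graph on four vertices (every vertex has degree 3), `mutate` then finds no non-edge to add, and `refine` leaves the already-valid graph unchanged.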
Results: Evaluated on eight real-world AI systems, our approach achieves up to 2.8× higher structural validity, improves semantic retention by 41.3%, reduces invalid input discard rate by 67.5%, and maintains tractable generation overhead—outperforming baselines including AFL and MeshAttack.
📝 Abstract
[Context] Modern AI applications increasingly process highly structured data, such as 3D meshes and point clouds, where test input generation must preserve both structural and semantic validity. However, existing fuzzing tools and input generators are typically handcrafted for specific input types and often generate invalid inputs that are subsequently discarded, leading to inefficiency and poor generalizability. [Objective] This study investigates whether test inputs for structured domains can be unified through a graph-based representation, enabling general, reusable mutation strategies while enforcing structural constraints. We will evaluate the effectiveness of this approach in enhancing input validity and semantic preservation across eight AI systems. [Method] We develop and evaluate GRAphRef, a graph-based test input generation framework that supports constraint-based mutation and refinement. GRAphRef maps structured inputs to graphs, applies neighbor-similarity-guided mutations, and uses a constraint-refinement phase to repair invalid inputs. We will conduct a confirmatory study across eight real-world mesh-processing AI systems, comparing GRAphRef with AFL, MeshAttack, Saffron, and two ablated variants. Evaluation metrics include structural validity, semantic preservation (via prediction consistency), and performance overhead. Experimental data are derived from ShapeNetCore mesh seeds and model outputs from systems such as MeshCNN and HodgeNet. Statistical analysis and component latency breakdowns will be used to assess each hypothesis.
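The "semantic preservation via prediction consistency" metric can be made concrete with a small sketch: the fraction of (seed, mutant) pairs for which the system under test predicts the same label. This is one plausible operationalization, not the paper's definition; `model`, `seeds`, and `mutants_of` are hypothetical stand-ins.

```python
def prediction_consistency(model, seeds, mutants_of):
    """Fraction of (seed, mutant) pairs whose predicted label is
    unchanged by mutation. `model` maps an input to a label,
    `mutants_of` maps a seed to its generated mutants; both are
    illustrative stand-ins for the real system under test."""
    kept = total = 0
    for seed in seeds:
        ref = model(seed)          # reference prediction on the seed
        for mutant in mutants_of(seed):
            total += 1
            kept += (model(mutant) == ref)
    return kept / total if total else 1.0
```

With a toy parity "model" (`lambda x: x % 2`), seeds `[1, 2]`, and mutants `s + 2` and `s + 3` for each seed, exactly half the mutants keep the seed's label, so the metric evaluates to 0.5.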