SurfaceBench: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces?

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Symbolic regression methods for discovering 3D scientific surface equations suffer from weak generalization, heavy reliance on predefined functional forms, and memorization bias. Method: We introduce SurfaceBench, the first comprehensive surface-oriented benchmark, comprising 183 tasks across 15 categories of symbolic complexity. It supports explicit, implicit, and parametric equation representations and integrates physical semantics with geometric structure. It features three innovations: (i) construction grounded in scientific domains, (ii) memorization-resistant symbolic generation, and (iii) geometry-aware evaluation that couples Chamfer and Hausdorff distances with symbolic verification, moving beyond scalar regression. Contribution/Results: Experiments reveal significant performance degradation of current state-of-the-art (SOTA) methods under cross-representation transfer and at high symbolic complexity. SurfaceBench establishes a challenging, diagnostic evaluation platform for large language models in geometry-aware reasoning, compositional generalization, and data-driven scientific discovery.
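The geometry-aware metrics named above compare sampled point clouds rather than equation strings. A minimal sketch of the two distances follows, assuming the ground-truth and predicted surfaces have already been sampled into small sets of 3D points. The helper names are hypothetical, the nearest-neighbour search is brute force, and SurfaceBench may use a different Chamfer variant (for example squared distances, or a sum instead of a mean).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_dists(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """For each point in src, the Euclidean distance to its nearest point in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance a->b plus b->a."""
    d_ab = _nearest_dists(a, b)
    d_ba = _nearest_dists(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance in either direction."""
    return max(max(_nearest_dists(a, b)), max(_nearest_dists(b, a)))


# Toy example: the prediction misplaces one point by 0.5 along z.
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
print(chamfer_distance(truth, pred))    # 0.5
print(hausdorff_distance(truth, pred))  # 0.5
```

Chamfer averages the error over all points, while Hausdorff reports the single worst deviation, so the two together distinguish a uniformly slightly-off surface from one that is mostly right but badly wrong in one region.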

📝 Abstract
Equation discovery from data is a core challenge in machine learning for science, requiring the recovery of concise symbolic expressions that govern complex physical and geometric phenomena. Recent approaches with large language models (LLMs) show promise in symbolic regression (SR), but their success often hinges on memorized formulas or overly simplified functional forms. Existing benchmarks exacerbate this limitation: they focus on scalar functions, ignore domain grounding, and rely on brittle string-matching-based metrics that fail to capture scientific equivalence. We introduce SurfaceBench, the first comprehensive benchmark for symbolic surface discovery. SurfaceBench comprises 183 tasks across 15 categories of symbolic complexity, spanning explicit, implicit, and parametric equation representations. Each task includes ground-truth equations, variable semantics, and synthetically sampled three-dimensional data. Unlike prior SR datasets, our tasks reflect surface-level structure, resist LLM memorization through novel symbolic compositions, and are grounded in scientific domains such as fluid dynamics, robotics, electromagnetics, and geometry. To evaluate equation discovery quality, we pair symbolic checks with geometry-aware metrics such as Chamfer and Hausdorff distances, capturing both algebraic fidelity and spatial reconstruction accuracy. Our experiments reveal that state-of-the-art frameworks, while occasionally successful on specific families, struggle to generalize across representation types and surface complexities. SurfaceBench thus establishes a challenging and diagnostic testbed that bridges symbolic reasoning with geometric reconstruction, enabling principled benchmarking of progress in compositional generalization, data-driven scientific induction, and geometry-aware reasoning with LLMs. We release the code at https://github.com/Sanchit-404/surfacebench
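To make the three representation forms concrete, the sketch below writes one illustrative surface, the paraboloid z = x^2 + y^2 (our choice, not a SurfaceBench task), in explicit, implicit, and parametric form, and samples 3D points from each. The sampler names and ranges are hypothetical. The implicit sampler assumes F changes sign exactly once along each vertical line in the chosen z range. The benchmark's actual data-generation procedure may differ.

```python
import math
import random


def explicit_z(x, y):
    """Explicit form: z = f(x, y)."""
    return x**2 + y**2


def implicit_F(x, y, z):
    """Implicit form: F(x, y, z) = 0 on the surface."""
    return x**2 + y**2 - z


def parametric(u, v):
    """Parametric form: u is the radius, v the polar angle."""
    return (u * math.cos(v), u * math.sin(v), u**2)


def sample_explicit(n, rng):
    """Evaluate f directly on random (x, y) in [-1, 1]^2."""
    pts = []
    for _ in range(n):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        pts.append((x, y, explicit_z(x, y)))
    return pts


def sample_parametric(n, rng):
    """Evaluate the parametrization on random (u, v) in [0, 1] x [0, 2*pi]."""
    return [parametric(rng.uniform(0.0, 1.0), rng.uniform(0.0, 2 * math.pi)) for _ in range(n)]


def sample_implicit(n, rng, z_lo=0.0, z_hi=2.0, iters=60):
    """Find z with F(x, y, z) = 0 by bisection along vertical lines at random (x, y)."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        a, b = z_lo, z_hi
        fa = implicit_F(x, y, a)
        if fa * implicit_F(x, y, b) > 0:
            continue  # no sign change on this line, so no root is bracketed
        for _ in range(iters):
            m = 0.5 * (a + b)
            fm = implicit_F(x, y, m)
            if fa * fm <= 0:
                b = m  # root lies in [a, m]
            else:
                a, fa = m, fm  # root lies in [m, b]
        pts.append((x, y, 0.5 * (a + b)))
    return pts


rng = random.Random(0)
for sampler in (sample_explicit, sample_parametric, sample_implicit):
    pts = sampler(100, rng)
    worst = max(abs(implicit_F(*p)) for p in pts)
    print(sampler.__name__, len(pts), worst < 1e-9)
```

Explicit and parametric forms can be sampled by direct evaluation, whereas an implicit form only defines the surface as a level set and needs a root-finding step. This difference is one reason a method tuned for z = f(x, y) may not transfer to the other two forms.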
Problem

Discovering symbolic equations governing 3D scientific surfaces from data
Addressing limitations of existing benchmarks that focus on scalar functions
Evaluating LLMs' ability to generalize across surface representation types
Innovation

SurfaceBench benchmark for symbolic surface discovery
Geometry-aware metrics replacing string-matching evaluation
Novel symbolic compositions resisting LLM memorization
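String matching rejects equivalent expressions that are merely written differently. The sketch below contrasts a naive string comparison with randomized numerical probing. This is a swapped-in technique, not necessarily the symbolic check SurfaceBench uses. Agreement at many random points is strong evidence of equivalence but not a proof, and an exact symbolic check would need a computer algebra system such as SymPy. The function names, probe range, and tolerance are hypothetical.

```python
import math
import random


def strings_match(a: str, b: str) -> bool:
    """Naive string-matching check: equal after removing whitespace."""
    return a.replace(" ", "") == b.replace(" ", "")


def numerically_equivalent(f, g, n_probes=200, lo=-2.0, hi=2.0, tol=1e-9, seed=0):
    """Probe f and g at random (x, y) points; report equivalence only if they agree at every probe."""
    rng = random.Random(seed)
    for _ in range(n_probes):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if not math.isclose(f(x, y), g(x, y), rel_tol=tol, abs_tol=tol):
            return False
    return True


def truth(x, y):
    return math.sin(x) * math.cos(y) + x**2


def cand(x, y):
    # Same function via the product-to-sum identity sin(x)cos(y) = (sin(x+y) + sin(x-y)) / 2
    return x**2 + 0.5 * (math.sin(x + y) + math.sin(x - y))


def wrong(x, y):
    return math.sin(x) * math.cos(y) + x**3


truth_str = "sin(x)*cos(y) + x**2"
cand_str = "x**2 + 0.5*(sin(x+y) + sin(x-y))"
print(strings_match(truth_str, cand_str))   # False
print(numerically_equivalent(truth, cand))  # True
print(numerically_equivalent(truth, wrong)) # False
```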
Sanchit Kabra
Department of Computer Science, Virginia Tech
Shobhnik Kriplani
Department of Computer Science, Virginia Tech
Parshin Shojaee
Department of Computer Science, Virginia Tech
Chandan K. Reddy
Professor, Computer Science, Virginia Tech
Deep Learning, Data Analytics, Big Data, Healthcare, Text Mining