Runtime Failure Hunting for Physics Engine Based Software Systems: How Far Can We Go?

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Physics engines (PEs), widely deployed in safety-critical systems such as autonomous driving and medical robotics, frequently exhibit semantic-level physical failures, i.e., deviations from real-world physical behavior. However, existing testing approaches predominantly rely on white-box access and focus on crash detection, rendering them ineffective for identifying such subtle, semantics-driven failures. This paper presents the first large-scale empirical study to systematically characterize the manifestations and root causes of physical failures in PEs, and proposes the first fine-grained taxonomy of such failures. We comparatively evaluate diverse detection techniques, integrating deep learning, prompt engineering, and multimodal large language models to enable automated, semantic-level failure identification. We release PhysiXFails, an open-source benchmark dataset, along with accompanying code, tools, and reproducible pipelines. Furthermore, informed by developer surveys, we propose actionable, deployable improvement strategies. Our work establishes a foundational framework, grounded in theory, empirical evidence, and practical implementation, for enhancing PE reliability.

📝 Abstract
Physics Engines (PEs) are fundamental software frameworks that simulate physical interactions in applications ranging from entertainment to safety-critical systems. Despite their importance, PEs suffer from physics failures, deviations from expected physical behaviors that can compromise software reliability, degrade user experience, and potentially cause critical failures in autonomous vehicles or medical robotics. Current testing approaches for PE-based software are inadequate, typically requiring white-box access and focusing on crash detection rather than semantically complex physics failures. This paper presents the first large-scale empirical study characterizing physics failures in PE-based software. We investigate three research questions addressing the manifestations of physics failures, the effectiveness of detection techniques, and developer perceptions of current detection practices. Our contributions include: (1) a taxonomy of physics failure manifestations; (2) a comprehensive evaluation of detection methods including deep learning, prompt-based techniques, and large multimodal models; and (3) actionable insights from developer experiences for improving detection approaches. To support future research, we release PhysiXFails, code, and other materials at https://sites.google.com/view/physics-failure-detection.
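To make the notion of a semantic-level physics failure concrete, here is a minimal illustrative sketch (not taken from the paper): treating a violation of a physical invariant, in this case conservation of mechanical energy for a passive falling body, as a detectable failure signal in a recorded engine trace. The trace format and function names are hypothetical.

```python
G = 9.81  # gravitational acceleration (m/s^2)

def mechanical_energy(mass, height, velocity):
    """Total mechanical energy: potential + kinetic."""
    return mass * G * height + 0.5 * mass * velocity ** 2

def detect_energy_gain(trace, mass=1.0, tol=1e-6):
    """Return indices of steps where total energy increases.

    `trace` is a list of (height, velocity) samples from a hypothetical
    engine. In a passive system with no energy input, total mechanical
    energy should never grow; any growth beyond `tol` is flagged as a
    semantic physics failure.
    """
    energies = [mechanical_energy(mass, h, v) for h, v in trace]
    return [i for i in range(1, len(energies))
            if energies[i] > energies[i - 1] + tol]

# A trace where the engine erroneously injects energy at step 2:
buggy_trace = [(10.0, 0.0), (9.0, 4.4), (8.0, 7.5)]
failures = detect_energy_gain(buggy_trace)  # → [2]
```

Such invariant-based oracles are one simple point in the design space the paper surveys; its evaluation also covers learning-based and multimodal-LLM-based detectors for failures that have no closed-form invariant.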
Problem

Research questions and friction points this paper is trying to address.

Detecting physics failures in Physics Engine-based software systems
Evaluating effectiveness of current physics failure detection techniques
Understanding developer perceptions on physics failure detection practices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale empirical study on physics failures
Evaluation of deep learning detection methods
Developer insights for improving failure detection