PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a method for automatically generating dense, highly realistic, interactive 3D scenes to support high-quality data collection in robotic simulation. The approach explicitly models complex physical relationships, including contact, support, balance, and containment, through a novel closed-loop framework that integrates large language model (LLM) agents with a physics engine. In this framework, the LLM iteratively proposes object configurations as spatial and physical predicates, while the physics solver validates their feasibility and feeds the results back to refine the layout. By combining probabilistic programming with heuristic rules, the method achieves both scene stability and fine-grained controllability. Experiments demonstrate that the generated scenes significantly outperform those of existing approaches in complexity, visual fidelity, and physical plausibility, offering a unified and efficient pipeline for high-fidelity 3D scene generation in robotic manipulation tasks.

📝 Abstract
Automatically generating interactive 3D environments is crucial for scaling up robotic data collection in simulation. While prior work has primarily focused on 3D asset placement, it often overlooks the physical relationships between objects (e.g., contact, support, balance, and containment), which are essential for creating complex and realistic manipulation scenarios such as tabletop arrangements, shelf organization, or box packing. Compared to classical 3D layout generation, producing complex physical scenes introduces additional challenges: (a) higher object density and complexity (e.g., a small shelf may hold dozens of books), (b) richer supporting relationships and compact spatial layouts, and (c) the need to accurately model both spatial placement and physical properties. To address these challenges, we propose PhyScensis, an LLM agent-based framework powered by a physics engine, to produce physically plausible scene configurations with high complexity. Specifically, our framework consists of three main components: an LLM agent iteratively proposes assets with spatial and physical predicates; a solver, equipped with a physics engine, realizes these predicates into a 3D scene; and feedback from the solver informs the agent to refine and enrich the configuration. Moreover, our framework preserves strong controllability over fine-grained textual descriptions and numerical parameters (e.g., relative positions, scene stability), enabled through probabilistic programming for stability and a complementary heuristic that jointly regulates stability and spatial relations. Experimental results show that our method outperforms prior approaches in scene complexity, visual quality, and physical accuracy, offering a unified pipeline for generating complex physical scene layouts for robotic manipulation.
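The propose-solve-refine loop described in the abstract can be sketched in a few lines. The snippet below is a toy illustration of that control flow, not the paper's actual system: the `propose` function stands in for the LLM agent, `solve` stands in for the physics-engine solver (here a trivial support check), and all names, predicates, and the candidate list are invented for illustration.

```python
# Toy sketch of the closed-loop pattern from the abstract: an agent proposes
# spatial/physical predicates, a solver validates them, and rejections are
# fed back so the agent avoids them. All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Predicate:
    obj: str        # asset to place
    relation: str   # e.g. "on_top_of", "inside"
    anchor: str     # supporting or containing object


def propose(scene, feedback):
    """Stand-in for the LLM agent: return the next candidate predicate,
    skipping any the solver has already rejected."""
    candidates = [
        Predicate("book_1", "on_top_of", "shelf"),
        Predicate("mug", "on_top_of", "book_1"),
        Predicate("pen", "inside", "mug"),
    ]
    for p in candidates:
        if p not in scene and p not in feedback:
            return p
    return None  # nothing left to propose


def solve(scene, predicate):
    """Stand-in for the physics solver: accept a predicate only if its
    anchor is already realized in the scene (a trivial support check;
    the real solver runs a physics engine)."""
    placed = {p.obj for p in scene} | {"shelf"}  # "shelf" is the root asset
    return predicate.anchor in placed


def generate_scene(max_steps=10):
    scene, feedback = [], []
    for _ in range(max_steps):
        predicate = propose(scene, feedback)
        if predicate is None:
            break
        if solve(scene, predicate):
            scene.append(predicate)    # predicate realized in the layout
        else:
            feedback.append(predicate)  # solver feedback refines the agent
    return scene


layout = generate_scene()
```

The key design point mirrored here is that the solver's verdicts flow back into the agent's next proposal, so the loop converges on a physically feasible configuration instead of generating placements open-loop.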
Problem

Research questions and friction points this paper is trying to address.

physical scene generation
3D layout
object relationships
robotic simulation
scene complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-Augmented LLM
Physical Scene Generation
Robotic Simulation
Stability-aware Layout
LLM Agent with Physics Feedback