Learn2Fold: Structured Origami Generation with World Model Planning

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work tackles the challenge of generating physically feasible, semantically consistent origami folding sequences directly from natural language descriptions. It proposes a neuro-symbolic framework that formulates origami generation as conditional program induction over crease graphs: a large language model produces candidate folding programs, while a differentiable graph-structured world model verifies physical feasibility and performs lookahead planning. By decoupling semantic generation from physical validation, the approach integrates symbolic reasoning with embodied simulation to produce long-horizon, high-fidelity folding sequences from sparse textual prompts. Experiments show that the system reliably generates folding sequences for both complex and out-of-distribution origami patterns, substantially improving physical plausibility and semantic alignment.
📝 Abstract
The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
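The propose-and-verify loop described in the abstract can be sketched roughly as follows. All names here (`propose_programs`, `WorldModel`, `learn2fold_plan`) and the toy feasibility rule are illustrative stand-ins, since the paper's actual interfaces are not shown on this page; the sketch only captures the decoupling of LLM proposal from world-model verification with lookahead scoring.

```python
import random

# Hypothetical sketch of Learn2Fold's propose-and-verify loop.
# All names and scoring rules are illustrative, not the paper's API.

def propose_programs(prompt, n=4):
    """Stand-in for the LLM proposer: emit candidate folding programs,
    each a sequence of symbolic fold operations on a crease graph."""
    ops = ["valley_fold", "mountain_fold", "reverse_fold"]
    rng = random.Random(sum(map(ord, prompt)))  # deterministic toy seed
    return [[rng.choice(ops) for _ in range(3)] for _ in range(n)]

class WorldModel:
    """Stand-in for the learned graph-structured world model: predicts
    the physical feasibility of each fold before execution."""
    def step_feasibility(self, state, op):
        # A real model would predict collisions / invalid creases from
        # the crease-graph state; here we use a toy penalty for two
        # consecutive reverse folds.
        if state and state[-1] == op == "reverse_fold":
            return 0.2
        return 0.9

    def rollout_score(self, program):
        # Lookahead planning: accumulate per-step feasibility along
        # the simulated rollout of the whole program.
        state, score = [], 1.0
        for op in program:
            score *= self.step_feasibility(state, op)
            state.append(op)
        return score

def learn2fold_plan(prompt, model):
    """Select the candidate program the world model deems most feasible."""
    candidates = propose_programs(prompt)
    return max(candidates, key=model.rollout_score)

plan = learn2fold_plan("fold a paper crane", WorldModel())
print(plan)
```

The key design point mirrored here is that the proposer never needs dense physical inputs, and the verifier never needs to understand language: candidates are ranked entirely by simulated feasibility before any execution.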
Problem

Research questions and friction points this paper is trying to address.

origami generation
physical intelligence
long-horizon reasoning
text-to-folding
kinematic constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic
origami generation
world model
differentiable simulation
program induction