Learn2Fold: Structured Origami Generation with World Model Planning

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work tackles the challenge of generating physically feasible, semantically consistent origami folding sequences directly from natural language descriptions. It proposes a neuro-symbolic framework that formulates origami generation as conditional program induction over crease graphs: a large language model produces candidate folding programs, while a differentiable graph-structured world model verifies physical feasibility and performs lookahead planning. By decoupling semantic generation from physical validation, the approach integrates symbolic reasoning with embodied simulation to produce long-horizon, high-fidelity folding sequences from sparse textual prompts. Experiments show that the system reliably generates folding sequences for both complex and out-of-distribution origami patterns, substantially improving physical plausibility and semantic alignment.
📝 Abstract
The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.
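The propose-and-verify loop described in the abstract can be sketched roughly as follows. All names here (`propose_programs`, `WorldModel`, `learn2fold_plan`) and the toy feasibility rule are illustrative stand-ins, since the paper's actual interfaces are not shown on this page; the sketch only captures the decoupling of LLM proposal from world-model verification with lookahead scoring.

```python
import random

# Hypothetical sketch of Learn2Fold's propose-and-verify loop.
# All names and scoring rules are illustrative, not the paper's API.

def propose_programs(prompt, n=4):
    """Stand-in for the LLM proposer: emit candidate folding programs,
    each a sequence of symbolic fold operations on a crease graph."""
    ops = ["valley_fold", "mountain_fold", "reverse_fold"]
    rng = random.Random(sum(map(ord, prompt)))  # deterministic toy seed
    return [[rng.choice(ops) for _ in range(3)] for _ in range(n)]

class WorldModel:
    """Stand-in for the learned graph-structured world model: predicts
    the physical feasibility of each fold before execution."""
    def step_feasibility(self, state, op):
        # A real model would predict collisions / invalid creases from
        # the crease-graph state; here we use a toy penalty for two
        # consecutive reverse folds.
        if state and state[-1] == op == "reverse_fold":
            return 0.2
        return 0.9

    def rollout_score(self, program):
        # Lookahead planning: accumulate per-step feasibility along
        # the simulated rollout of the whole program.
        state, score = [], 1.0
        for op in program:
            score *= self.step_feasibility(state, op)
            state.append(op)
        return score

def learn2fold_plan(prompt, model):
    """Select the candidate program the world model deems most feasible."""
    candidates = propose_programs(prompt)
    return max(candidates, key=model.rollout_score)

plan = learn2fold_plan("fold a paper crane", WorldModel())
print(plan)
```

The key design point mirrored here is that the proposer never needs dense physical inputs, and the verifier never needs to understand language: candidates are ranked entirely by simulated feasibility before any execution.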
Problem

Research questions and friction points this paper is trying to address.

origami generation
physical intelligence
long-horizon reasoning
text-to-folding
kinematic constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic
origami generation
world model
differentiable simulation
program induction