Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions

📅 2025-09-29
🤖 AI Summary
Problem: Existing text-similarity metrics (e.g., BLEU, BERTScore) fail to capture the logical dependencies and spatiotemporal constraints among sewing steps, so they cannot accurately assess temporal ordering or spatial coherence in automatically generated sewing instructions.

Method: We propose a tree-structured automatic evaluation framework that models instruction sequences as ordered dependency trees, explicitly encoding both temporal precedence and spatial reliance among steps. Using tree-aware representation learning and counterfactual perturbations generated by a large language model (LLM), we construct a domain-adapted robustness validation pipeline.

Contribution/Results: Our metric correlates strongly with human annotations (ρ = 0.89 with manual error counts and ρ = 0.92 with human quality scores), significantly outperforming baseline methods. It offers an interpretable and robust evaluation approach for generation tasks that require embodied reasoning.

📝 Abstract
In this paper, we propose a novel, automatic tree-based evaluation metric for LLM-generated step-by-step assembly instructions that reflects spatiotemporal aspects of construction more accurately than traditional metrics such as BLEU and BERT similarity scores. We apply the metric to the domain of sewing instructions and show that it correlates better with manually annotated error counts and with human quality ratings, demonstrating its advantage for evaluating the spatiotemporal soundness of sewing instructions. Further experiments show that our metric is more robust than traditional approaches against artificially constructed counterfactual examples designed to confound metrics that rely on textual similarity.
Problem

Research questions and friction points this paper is trying to address.

Evaluating spatiotemporal consistency in automatically generated sewing instructions
Developing a tree-based metric for LLM-generated assembly instructions
Improving evaluation robustness against artificially constructed counterfactual examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-based metric for evaluating LLM-generated assembly instructions
Measures spatiotemporal consistency in sewing instructions
Robust against counterfactual examples designed to confound text-similarity metrics
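
To illustrate why order-insensitive text similarity can miss ordering errors that a dependency structure exposes, here is a minimal Python sketch. It is not the paper's metric: the step texts, the `parent` dependency map, and both helper functions (`unigram_overlap`, `precedence_violations`) are hypothetical and exist only for this example.

```python
from collections import Counter

def unigram_overlap(candidate, reference):
    """Order-insensitive token overlap (a simple stand-in, not BLEU or BERTScore)."""
    cand = Counter(" ".join(candidate).lower().split())
    ref = Counter(" ".join(reference).lower().split())
    matched = sum((cand & ref).values())  # multiset intersection of tokens
    return matched / max(sum(ref.values()), 1)

def precedence_violations(order, parent):
    """Count steps that appear before the step they depend on.

    `parent` maps each step to its prerequisite step (None for the root),
    which is a hypothetical, simplified dependency tree.
    """
    position = {step: i for i, step in enumerate(order)}
    return sum(
        1
        for step, dep in parent.items()
        if dep is not None and position[step] < position[dep]
    )

# Hypothetical sewing steps and their dependencies (assumed for illustration).
parent = {
    "cut fabric pieces": None,
    "pin sleeve to bodice": "cut fabric pieces",
    "sew sleeve to bodice": "pin sleeve to bodice",
    "press the seam open": "sew sleeve to bodice",
}
reference = list(parent)               # a valid order: prerequisites come first
shuffled = list(reversed(reference))   # same text, impossible order

print(unigram_overlap(shuffled, reference))      # identical tokens, so full overlap
print(precedence_violations(reference, parent))  # no violations in the valid order
print(precedence_violations(shuffled, parent))   # every dependency is violated
```

In this toy case the reversed instructions share every token with the reference, so the overlap score cannot tell them apart, while the dependency check flags each out-of-order step.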
Luisa Geiger
Congree Language Technologies GmbH
Mareike Hartmann
Department of Language Science and Technology, Saarland University
Michael Sullivan
Department of Language Science and Technology, Saarland University
Alexander Koller
Professor of Computational Linguistics, Saarland University, Saarland Informatics Campus
Computational linguistics, artificial intelligence