🤖 AI Summary
This work addresses personalized and generalizable household object rearrangement without explicit instructions by learning user preferences implicitly from scene context. The authors introduce PARSEC, a benchmark for preference-aware rearrangement built on 110K crowdsourced rearrangement examples from 72 users, spanning 93 object categories and 15 environments. They also propose ContextSortLM, an LLM-based rearrangement model that places objects in partially arranged environments by adapting to user preferences inferred from both prior and current scene context, while accounting for multiple valid placements. Model predictions are assessed both on the benchmark and through a crowdsourced evaluation by 108 online raters. Experiments show that models leveraging multiple scene context sources outperform single-context baselines; ContextSortLM best replicates the target user's arrangements and ranks among the top two in human ratings across all three environment categories. The analysis also highlights challenges in modeling environment semantics across different environment categories and offers recommendations for future work.
📝 Abstract
Object rearrangement is a key task for household robots, requiring personalization without explicit instructions, meaningful object placement in environments already occupied by other objects, and generalization to unseen objects and new environments. To facilitate research addressing these challenges, we introduce PARSEC, an object rearrangement benchmark for learning user organizational preferences from observed scene context in order to place objects in a partially arranged environment. PARSEC is built upon a novel dataset of 110K rearrangement examples crowdsourced from 72 users, featuring 93 object categories and 15 environments. We also propose ContextSortLM, an LLM-based rearrangement model that places objects in partially arranged environments by adapting to user preferences inferred from prior and current scene context, while accounting for multiple valid placements. We evaluate ContextSortLM and existing personalized rearrangement approaches on the PARSEC benchmark and complement these findings with a crowdsourced evaluation in which 108 online raters rank model predictions by alignment with user preferences. Our results indicate that personalized rearrangement models leveraging multiple scene context sources perform better than models relying on a single context source. Moreover, ContextSortLM outperforms other models in placing objects to replicate the target user's arrangement and ranks among the top two in all three environment categories, as rated by online evaluators. Importantly, our evaluation highlights challenges associated with modeling environment semantics across different environment categories and provides recommendations for future work.