🤖 AI Summary
Automated grocery packing for fresh food and perishables in retail logistics remains underexplored, particularly concerning damage mitigation when fragile and heavy items are co-packed.
Method: We propose the first zero-shot, multimodal large language model–based framework for grocery packing strategy generation. Leveraging vision-language models (VLMs) for item recognition and semantic understanding, the framework employs hierarchical prompt engineering to emulate human packing logic—requiring no category-specific annotations or model retraining.
Contribution/Results: Evaluated on a real-world supermarket product dataset, our approach significantly outperforms rule-based baselines in packing rationality (structural soundness) and safety (damage prevention). The modular design enables plug-and-play integration of upgraded foundation models. To foster reproducibility and community advancement, the source code will be publicly released.
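The two-stage pipeline described above (VLM item recognition feeding a hierarchical packing prompt) can be sketched as follows. This is a minimal illustration only: the item attributes, prompt wording, and JSON response format are assumptions for this sketch, not the framework's actual prompts or interfaces.

```python
import json

def build_packing_prompt(items):
    """Stage 2: turn recognized items (stage 1, VLM output) into a packing prompt."""
    item_lines = "\n".join(
        f"- {it['name']} (weight: {it['weight']}, fragility: {it['fragility']})"
        for it in items
    )
    return (
        "You are a grocery packing assistant. Given the items below, "
        "produce a packing order where heavy, sturdy items go first (bottom) "
        "and light, fragile items go last (top).\n"
        f"{item_lines}\n"
        'Answer as JSON: {"order": [<item names, bottom to top>]}'
    )

def parse_packing_order(llm_response):
    """Extract the packing sequence from the model's JSON reply."""
    return json.loads(llm_response)["order"]

items = [
    {"name": "eggs", "weight": "light", "fragility": "high"},
    {"name": "canned beans", "weight": "heavy", "fragility": "low"},
    {"name": "bread", "weight": "light", "fragility": "medium"},
]
prompt = build_packing_prompt(items)

# A hypothetical model reply; in the real system this would come from the VLM.
reply = '{"order": ["canned beans", "bread", "eggs"]}'
order = parse_packing_order(reply)
print(order)  # ['canned beans', 'bread', 'eggs']
```

Because no category-specific training is involved, swapping in a stronger foundation model only requires routing the same prompt to a different backend.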
📝 Abstract
Robotics and automation are increasingly influential in logistics but remain largely confined to traditional warehouses. In grocery retail, advancements such as cashier-less supermarkets exist, yet customers still manually pick and pack groceries. While robotics research has focused substantially on the bin-picking problem, the task of packing objects and groceries has remained largely untouched. Yet packing grocery items in the right order is crucial for preventing product damage, e.g., heavy objects should not be placed on top of fragile ones. However, the exact criteria for the right packing order are hard to define, particularly given the huge variety of objects typically found in stores. In this paper, we introduce LLM-Pack, a novel approach for grocery packing. LLM-Pack leverages language and vision foundation models to identify groceries and generate a packing sequence that mimics human packing strategy. LLM-Pack requires no dedicated training to handle new grocery items, and its modularity allows easy upgrades of the underlying foundation models. We extensively evaluate our approach to demonstrate its performance. We will make the source code of LLM-Pack publicly available upon publication of this manuscript.
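The ordering criterion the abstract gives as an example (heavy objects must not sit on top of fragile ones) can be expressed as a simple validity check on a candidate sequence. The attribute names and boolean encoding below are illustrative assumptions, not the paper's definition; real items would carry richer, harder-to-formalize properties, which is exactly why the paper delegates the ordering to foundation models.

```python
def violates_order(sequence):
    """Return True if any heavy item is packed after (on top of) a fragile one.

    `sequence` lists items bottom to top; each item is a dict with
    hypothetical boolean flags 'heavy' and 'fragile'.
    """
    fragile_seen = False
    for item in sequence:
        if item["fragile"]:
            fragile_seen = True
        elif item["heavy"] and fragile_seen:
            # A heavy item placed above an already-packed fragile one.
            return True
    return False

good = [
    {"name": "cans", "heavy": True, "fragile": False},
    {"name": "chips", "heavy": False, "fragile": True},
]
bad = list(reversed(good))
print(violates_order(good), violates_order(bad))  # False True
```

A check like this captures only one hard constraint; the huge variety of store items is what makes an exhaustive rule set impractical compared with the zero-shot approach.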