🤖 AI Summary
This work proposes Collaposer, an end-to-end tool that automates visual asset preparation for digital collage. It targets three inefficiencies: cumbersome photo retrieval, manual image matting, and disorganized asset management. The system combines image tagging, object detection, and segmentation, and uses a large language model (LLM) to interpret a user-provided story description and select central (primary) and related (auxiliary) labels. These labels decide which cutouts are kept and how they are organized: the cutouts are displayed at varying sizes and clustered by semantic hierarchy, so the asset set follows the intended narrative. According to the authors' evaluation, the tool automates asset preparation and yields diverse cutouts that adhere to the storyline, letting users concentrate on composition and expressive intent.
📝 Abstract
Digital collage is an artistic practice that combines image cutouts to tell stories. However, preparing cutouts from a set of photos remains a tedious and time-consuming task. A formative study identified three main challenges: 1) inefficient search for relevant photos, 2) manual image cutout, and 3) difficulty in organizing large sets of cutouts. To meet these challenges and facilitate asset preparation for collage, we propose Collaposer, a tool that transforms a collection of photos into organized, ready-to-use visual cutouts based on user-provided story descriptions. Collaposer tags, detects, and segments photos, and then uses an LLM to select central and related labels based on the user-provided story description. Collaposer presents the resulting visuals in varying sizes, clustered according to semantic hierarchy. Our evaluation shows that Collaposer effectively automates the preparation process to produce diverse sets of visual cutouts adhering to the storyline, allowing users to focus on collaging these assets for storytelling. Project website: https://jiayzhou.github.io/collaposer-website/
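The abstract describes a pipeline that tags photos, keeps cutouts whose labels the LLM marks as central or related to the story, and clusters them by semantic hierarchy. The Python sketch below illustrates only that final organizing step, under loudly stated assumptions. It assumes tagging, detection, and segmentation have already produced one label per cutout. A plain keyword match stands in for the LLM. `Cutout`, `select_labels`, `organize`, and the `related_map` dictionary are hypothetical names made up for this illustration, not Collaposer's actual code or API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cutout:
    photo: str  # source photo file name (hypothetical)
    label: str  # tag assumed to come from an earlier tagging/detection step


def select_labels(story, labels, related_map):
    """Hypothetical stand-in for the LLM label-selection step.

    Labels named in the story are treated as central; labels that the
    assumed related_map links to a central label are treated as related.
    Returns (set of central labels, dict mapping related label -> parent).
    """
    words = set(re.findall(r"[a-z]+", story.lower()))
    central = {label for label in labels if label in words}
    related = {
        rel: parent
        for parent in sorted(central)
        for rel in related_map.get(parent, [])
        if rel in labels and rel not in central
    }
    return central, related


def organize(cutouts, story, related_map):
    """Group cutouts into a two-level hierarchy: central label -> related labels."""
    labels = {c.label for c in cutouts}
    central, related = select_labels(story, labels, related_map)
    tree = {
        label: {"cutouts": [], "related": defaultdict(list)}
        for label in sorted(central)
    }
    for c in cutouts:
        if c.label in central:
            tree[c.label]["cutouts"].append(c.photo)
        elif c.label in related:
            tree[related[c.label]]["related"][c.label].append(c.photo)
        # cutouts whose label is neither central nor related are dropped
    return tree


cutouts = [
    Cutout("img1.jpg", "dog"),
    Cutout("img2.jpg", "ball"),
    Cutout("img3.jpg", "car"),
    Cutout("img4.jpg", "dog"),
    Cutout("img5.jpg", "leash"),
]
related_map = {"dog": ["ball", "leash"]}  # hypothetical label relations
tree = organize(cutouts, "A dog runs along the beach.", related_map)
print(tree["dog"]["cutouts"])        # ['img1.jpg', 'img4.jpg']
print(dict(tree["dog"]["related"]))  # {'ball': ['img2.jpg'], 'leash': ['img5.jpg']}
```

Here the story mentions only "dog", so the "car" cutout is discarded, and the "ball" and "leash" cutouts are nested under "dog" as related assets. A real system would replace the keyword match with an LLM call and could use the hierarchy level, for example, to choose a display size for each cutout.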