Catalogue Grounded Multimodal Attribution for Museum Video under Resource and Regulatory Constraints

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty of searching museum audiovisual archives, which is hindered by missing metadata and the high cost of manual annotation. The authors propose a locally deployable, automated archival framework built on an open video-language model. The framework uses a multi-pass pipeline that summarises the artworks in a video, generates catalogue-style descriptions and genre labels, and then applies conservative similarity matching against the structured catalogue to infer title and artist attributions. The approach is designed around constraints on computational resources, data sovereignty, and regulatory compliance. Early deployments on a painting catalogue suggest that it can improve the discoverability of audiovisual archival content. The study offers a transferable template for application-driven machine learning in other high-stakes domains.

📝 Abstract

Audiovisual (AV) archives in museums and galleries are growing rapidly, but much of this material remains effectively locked away because it lacks consistent, searchable metadata. Existing archiving methods require extensive manual effort. We address this by automating the most labour-intensive part of the workflow: catalogue-style metadata curation for in-gallery video, grounded in an existing collection database. Concretely, we propose catalogue-grounded multimodal attribution for museum AV content using an open, locally deployable video-language model. We design a multi-pass pipeline that (i) summarises artworks in a video, (ii) generates catalogue-style descriptions and genre labels, and (iii) attempts to attribute title and artist via conservative similarity matching to the structured catalogue. Early deployments on a painting catalogue suggest that this framework can improve AV archive discoverability while respecting resource constraints, data sovereignty, and emerging regulation, offering a transferable template for application-driven machine learning in other high-stakes domains.
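One way to read step (iii) is as matching with abstention: accept a catalogue entry only when it is both a strong match and clearly better than the runner-up, and otherwise leave title and artist unattributed. The sketch below illustrates that idea and is not the authors' implementation. The bag-of-words cosine similarity, the `CatalogueEntry` fields, the `attribute` function, and the `min_score` and `min_margin` thresholds are all assumptions chosen for illustration; the abstract does not specify the similarity measure or any threshold values.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogueEntry:
    # Hypothetical record layout; a real collection database will differ.
    title: str
    artist: str
    description: str


def tokens(text: str) -> Counter:
    """Lowercase bag-of-words counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def attribute(
    generated: str,
    catalogue: list[CatalogueEntry],
    min_score: float = 0.5,   # assumed threshold, not from the paper
    min_margin: float = 0.1,  # assumed threshold, not from the paper
) -> Optional[CatalogueEntry]:
    """Return the best-matching entry, or None when the match is weak or ambiguous."""
    query = tokens(generated)
    scored = sorted(
        ((cosine(query, tokens(e.description)), e) for e in catalogue),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored:
        return None
    best_score, best = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    # Conservative rule: abstain unless the top match is strong AND well separated.
    if best_score < min_score or best_score - runner_up < min_margin:
        return None
    return best
```

Returning `None` rather than the nearest entry keeps a wrong attribution out of the archive record, which in a catalogue setting is usually costlier than a missing one. In practice the generated description from step (ii) would be the `generated` argument, and the thresholds would be tuned against entries with known attributions.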

Problem

Research questions and friction points this paper is trying to address.

AV archives

metadata curation

catalogue grounding

museum video

resource constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

catalogue-grounded attribution

multimodal video understanding

locally deployable video language model