MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems

📅 2026-04-09
🤖 AI Summary
This study addresses manual industry classification in business registration data, which is labor-intensive and hard to keep aligned with evolving taxonomies. The authors propose a training-free, multimodal classification approach that combines textual sources (corporate websites, Wikipedia, and Wikidata) with geospatial data from OpenStreetMap and satellite imagery. They establish the first multimodal benchmark for industry classification and introduce a multi-agent framework that emulates expert validation through multi-turn interaction, context enrichment, and classification explanations, all without model training. On a dataset of 1,000 European businesses labeled with 20 NACE economic activity categories, the training-free baseline reaches 74.10% accuracy with closed-source and 62.10% with open-source multimodal large language models; combining the multi-turn design, context enrichment, and explanations yields gains of up to 22.80%.
📝 Abstract
Industry classification schemes are integral parts of public and corporate databases as they classify businesses based on economic activity. Due to the size of company registers, manual annotation is costly, and fine-tuning models with every update in industry classification schemes requires significant data collection. We replicate the manual expert verification by using existing or easily retrievable multimodal resources for industry classification. We present MONETA, the first multimodal industry classification benchmark with text (Website, Wikipedia, Wikidata) and geospatial sources (OpenStreetMap and satellite imagery). Our dataset lists 1,000 businesses in Europe with 20 economic activity labels according to EU guidelines (NACE). Our training-free baseline reaches 62.10% and 74.10% with open and closed-source Multimodal Large Language Models (MLLM), respectively. We observe an increase of up to 22.80% with the combination of multi-turn design, context enrichment, and classification explanations. We will release our dataset and the enhanced guidelines.
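To make the "training-free, multi-turn" idea in the abstract more concrete, here is a minimal Python sketch of how per-source context could be merged into a prompt and how an invalid answer could trigger a follow-up turn. It is an illustration only, not the authors' implementation: `CompanyRecord`, `build_prompt`, `classify`, the `label: explanation` answer format, and the stub model are all hypothetical, the labels are placeholders rather than the actual NACE categories, and image sources such as satellite tiles are assumed to be represented as text descriptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CompanyRecord:
    """One business plus the context retrieved for it (hypothetical layout)."""
    name: str
    # Source name -> text snippet; images are assumed to be pre-described as text.
    sources: Dict[str, str] = field(default_factory=dict)


def build_prompt(record: CompanyRecord, labels: List[str],
                 feedback: Optional[str] = None) -> str:
    """Merge all available sources, and any reviewer feedback, into one prompt."""
    lines = [f"Company: {record.name}",
             "Candidate labels: " + ", ".join(labels)]
    for source, text in sorted(record.sources.items()):
        lines.append(f"[{source}] {text}")
    if feedback:
        lines.append(f"Reviewer feedback: {feedback}")
    lines.append("Answer as '<label>: <short explanation>'.")
    return "\n".join(lines)


def classify(record: CompanyRecord, labels: List[str],
             model: Callable[[str], str],
             max_turns: int = 3) -> Tuple[Optional[str], str]:
    """Query the model for up to max_turns, re-prompting when the label is invalid."""
    feedback = None
    for _ in range(max_turns):
        answer = model(build_prompt(record, labels, feedback))
        label, _, explanation = answer.partition(":")
        label = label.strip()
        if label in labels:
            return label, explanation.strip()
        feedback = f"'{label}' is not a valid label; choose one from the list."
    return None, ""


def stub_model(prompt: str) -> str:
    """Stand-in for an MLLM call: answers correctly only after feedback."""
    if "Reviewer feedback" in prompt:
        return "C: the website describes a manufacturing plant"
    return "Manufacturing: the website describes a manufacturing plant"


record = CompanyRecord(
    name="Example GmbH",
    sources={"website": "We build industrial pumps.",
             "osm": "Tagged as an industrial site."},
)
result = classify(record, ["A", "C", "G"], stub_model)
print(result)
```

In this sketch no model weights are updated: adapting to a revised classification scheme only means passing a different `labels` list, which is the practical appeal of a training-free setup.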
Problem

Research questions and friction points this paper is trying to address.

industry classification
multimodal data
geographic information
company registers
NACE
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal industry classification
geospatial information
multimodal large language models
training-free baseline
multi-agent systems
Arda Yüksel
Trustworthy Human Language Technologies, Technical University of Darmstadt, Germany; Research Center for Trustworthy Data Science and Security, Ruhr University Bochum, Germany
Gabriel Thiem
Technical University of Darmstadt, Germany; Deutsche Bundesbank
Susanne Walter
Deutsche Bundesbank
Patrick Felka
Deutsche Bundesbank
Gabriela Alves Werb
Deutsche Bundesbank; Frankfurt University of Applied Sciences, Germany
Ivan Habernal
Ruhr University Bochum
natural language processing, privacy-preserving NLP, legal NLP, argumentation mining