Towards deployment-centric multimodal AI beyond vision and language

📅 2025-04-04
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Current multimodal AI research over-emphasises vision-language (V+L) modalities and often defers deployment constraints to the late stages of development, hindering real-world adoption. To address this, the paper advocates a *deployment-centric* multimodal AI paradigm that treats deployability as a first-class design objective throughout the development lifecycle, complementing data-centric and model-centric approaches and extending beyond V+L to domains such as healthcare, engineering, and climate science. It calls for deeper integration across multiple levels of multimodality and for open, multidisciplinary collaboration to broaden the research scope. The authors identify multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases — pandemic response, self-driving car design, and climate change adaptation — highlighting cross-disciplinary deployment bottlenecks and outlining a principled path towards sustainable, application-oriented multimodal AI.

📝 Abstract
Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasise deeper integration across multiple levels of multimodality and multidisciplinary collaboration to significantly broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design, and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability, and finance. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
Problem

Research questions and friction points this paper is trying to address.

Expanding multimodal AI beyond vision and language applications
Addressing deployability challenges in multimodal AI solutions
Integrating multidisciplinary expertise for real-world AI deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deployment-centric workflow integrates constraints early
Deeper multimodality integration beyond vision and language
Multidisciplinary collaboration for real-world AI solutions
Authors

Xianyuan Liu (University of Sheffield)
Jiayang Zhang (AI Research Engineer, The University of Sheffield)
Shuo Zhou (School of Computer Science, University of Sheffield, Sheffield, UK)
Thijs L. van der Plas (The Alan Turing Institute, London, UK)
Avish Vijayaraghavan (Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK)
Anastasiia Grishina (formerly Simula and University of Oslo, Norway; Alan Turing Institute, UK)
Mengdie Zhuang (Information School, University of Sheffield, Sheffield, UK)
Daniel Schofield (Institute of Health Informatics, University College London, London, UK)
Christopher Tomlinson
Yuhan Wang (Department of Engineering, King’s College London, London, UK)
Ruizhe Li (Department of Computing Science, University of Aberdeen, Aberdeen, UK)
Louisa van Zeeland (The Alan Turing Institute, London, UK)
Sina Tabakhi (Doctoral Researcher, School of Computer Science, University of Sheffield)
Cyndie Demeocq (School of Informatics, University of Edinburgh, Edinburgh, UK)
Xiang Li (School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK)
Arunav Das (Department of Informatics, King’s College London, London, UK)
Orlando Timmerman (Department of Earth Sciences, University of Cambridge, Cambridge, UK)
Thomas Baldwin-McDonald (Senior Machine Learning Technologist, Ofcom)
Jinge Wu (University College London)
Peizhen Bai (School of Computer Science, University of Sheffield, Sheffield, UK)
Zahraa Al Sahili (Department of Computer Science, Queen Mary University of London, London, UK)
Omnia Alwazzan (PhD student, Queen Mary University of London)
Thao N. Do (Department of Computer Science, University of Bath, Bath, UK)
Mohammod N.I. Suvon (School of Computer Science, University of Sheffield, Sheffield, UK)
Angeline Wang (Department of Classics, King’s College London, London, UK)
Lucia Cipolina-Kun (School of Electrical, Electronic and Mechanical Engineering, University of Bristol, Bristol, UK)
Luigi A. Moretti (School of Engineering, University of the West of England, Bristol, UK)
Lucas Farndale (Cancer Research UK Scotland Institute, Glasgow, UK)
Nitisha Jain (Department of Informatics, King’s College London, London, UK)
Natalia Efremova (School of Business and Management, Queen Mary University of London, London, UK)
Yan Ge (School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK)
Marta Varela (City St George's University of London)
Hak-Keung Lam (Department of Engineering, King’s College London, London, UK)
Oya Celiktutan (Reader in AI & Robotics, Director of SAIR Lab, Centre for Robotics Research)
Ben R. Evans (British Antarctic Survey, Cambridge, UK)
Alejandro Coca-Castro (The Alan Turing Institute)
Honghan Wu (Professor of Health Informatics and AI, University of Glasgow)
Zahraa S. Abdallah (Senior Lecturer, School of Engineering Mathematics and Technology, University of Bristol, UK)
Chen Chen
Valentin Danchev (School of Business and Management, Queen Mary University of London, London, UK)
Nataliya Tkachenko (Chief Data & AI Office, Lloyds Banking Group, London, UK)
Lei Lu (School of Life Course & Population Sciences, King’s College London, London, UK)
Tingting Zhu (Associate Professor, University of Oxford)
Gregory G. Slabaugh (Digital Environment Research Institute, Queen Mary University of London, London, UK)
Roger K. Moore (Professor of Spoken Language Processing, University of Sheffield)
William K. Cheung (Professor of Computer Science, Hong Kong Baptist University)
Peter H. Charlton (Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK)
Haiping Lu (Professor of Machine Learning, University of Sheffield)