🤖 AI Summary
Multimodal large language models (MLLMs) remain unexplored for fingerprint analysis—a critical domain in biometrics and forensic science—due to the absence of dedicated benchmarks and systematic evaluation protocols. Method: We introduce FPBench, the first comprehensive benchmark for fingerprint analysis, comprising seven real and synthetic datasets and eight fine-grained tasks—including quality assessment, matching reasoning, and forensic interpretation—used to evaluate 20 open- and closed-source MLLMs under zero-shot and chain-of-thought (CoT) inference settings. Contribution/Results: Our study reveals, for the first time, fundamental limitations of current MLLMs in texture perception, causal reasoning, and explanation generation. We publicly release FPBench—including its datasets, evaluation protocols, and baseline results—to establish a standardized foundation for developing fingerprint-aware foundation models.
📝 Abstract
Multimodal LLMs (MLLMs) have gained significant traction in complex data analysis, visual question answering, generation, and reasoning. Recently, they have been used to analyze the biometric utility of iris and face images. However, their capabilities in fingerprint understanding remain unexplored. In this work, we design a comprehensive benchmark, FPBench, that evaluates the performance of 20 MLLMs (open-source and proprietary) across 7 real and synthetic datasets on 8 biometric and forensic tasks using zero-shot and chain-of-thought prompting strategies. We discuss our findings in terms of performance and explainability, and share our insights into the challenges and limitations. We establish FPBench as the first comprehensive benchmark for fingerprint domain understanding with MLLMs, paving the way for foundation models for fingerprints.