🤖 AI Summary
The opaque provenance of open-source large language models (LLMs) and the difficulty of attributing LoRA-adapted models to their base models hinder trust, accountability, and regulatory compliance.
Method: We propose the first formal method for identifying the origin of LoRA fine-tuned models. By modeling the low-rank structure of weight residuals and applying singular value decomposition together with subspace alignment, our approach extracts obfuscation-invariant features that robustly identify the base model and enable estimation of the LoRA rank used during fine-tuning.
Contribution/Results: The method maintains high accuracy under strong obfuscating transformations, including weight permutation and scaling, and yields interpretable verification results. Evaluated on 31 real-world open-source LLMs, it achieves reliable attribution across diverse architectures and training configurations. Our work establishes a new benchmark for LLM provenance tracing and provides a foundational framework for model lineage authentication in open-model ecosystems.
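To make the core intuition concrete, the sketch below is a toy illustration (our own construction, not the paper's implementation; all variable names are hypothetical). A LoRA fine-tune adds a low-rank update BA to the base weights, so the residual against the true base model has (near-)rank r; its singular value spectrum both reveals that rank and is unchanged by permutation obfuscation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 256, 8  # hypothetical hidden size and LoRA rank

# Synthetic base weight and a rank-r LoRA update (illustrative only)
W_base = rng.normal(size=(d, d))
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, d))
W_ft = W_base + B @ A  # fine-tuned weight = base + low-rank residual

# Residual against the candidate base model is (near-)rank-r
residual = W_ft - W_base
s = np.linalg.svd(residual, compute_uv=False)

# Estimate the LoRA rank by counting singular values above a relative tolerance
est_rank = int(np.sum(s > 1e-8 * s[0]))

# Singular values are invariant to a permutation obfuscation of the rows,
# since permutation matrices are orthogonal
P = np.eye(d)[rng.permutation(d)]
s_perm = np.linalg.svd(P @ residual, compute_uv=False)
print(est_rank, np.allclose(s, s_perm))
```

In practice the residual is only approximately low-rank (continued pre-training or merging perturbs it), which is why the paper's method relies on the structure of the spectrum and subspace alignment rather than an exact rank count.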
📝 Abstract
As large language models (LLMs) continue to advance, their deployment often involves fine-tuning to enhance performance on specific downstream tasks. However, this customization is sometimes accompanied by misleading claims about a model's origins, raising significant concerns about transparency and trust within the open-source community. Existing model verification techniques typically assess functional, representational, or weight similarity, but these approaches often fail against obfuscation techniques such as permutation and scaling transformations. To address this limitation, we propose Origin-Tracer, a novel detection method that rigorously determines whether a model has been fine-tuned from a specified base model and can additionally extract the LoRA rank used during fine-tuning, providing a more robust verification framework. To our knowledge, this framework is the first formalized approach aimed specifically at pinpointing the source of model fine-tuning. We empirically validate our method on thirty-one diverse open-source models under conditions that simulate real-world obfuscation, analyze its effectiveness, and discuss its limitations. The results demonstrate the effectiveness of our approach and indicate its potential to establish new benchmarks for model verification.