🤖 AI Summary
This work addresses the challenge of fine-tuning large language models for structured concept retrieval in resource-constrained Web4Good applications in the social sciences, where high computational costs and geometric deficiencies in existing parameter-efficient fine-tuning (PEFT) methods such as LoRA lead to suboptimal performance. The authors propose OrthoGeoLoRA, which adds orthogonal constraints to the LoRA framework by restructuring the update term into an SVD-like factorization and requiring its low-rank factors to lie on the Stiefel manifold. This geometric regularization mitigates gauge freedom, scale ambiguity, and rank collapse. Compatible with standard training pipelines, OrthoGeoLoRA achieves significantly better ranking performance than conventional LoRA and other strong PEFT baselines under identical low-rank budgets on a hierarchical retrieval benchmark built from the multilingual ELSST social science thesaurus, offering an efficient adaptation strategy for resource-limited settings.
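As a concrete illustration of the idea, below is a minimal PyTorch sketch of an SVD-like LoRA update with Stiefel-constrained factors, built on `torch.nn.utils.parametrizations.orthogonal`. The class and parameter names (`OrthoGeoLoRALinear`, `rank`, `sigma`) and the zero initialization of the singular values are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations


class OrthoGeoLoRALinear(nn.Module):
    """Adds an SVD-like low-rank update  dW = B diag(sigma) A^T  to a frozen
    linear layer, with B and A kept on the Stiefel manifold (orthonormal
    columns) via PyTorch's orthogonal parametrization. Hypothetical sketch."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen

        d_out, d_in = base.weight.shape  # nn.Linear stores (out, in)
        # Tall factors: B in R^{d_out x r}, A in R^{d_in x r}.
        self.B = nn.Linear(rank, d_out, bias=False)
        self.A = nn.Linear(rank, d_in, bias=False)
        # Orthogonality via reparameterization: the optimizer (e.g. Adam)
        # updates an unconstrained tensor and the constrained weight is
        # reconstructed on the fly, so no custom optimizer is needed.
        parametrizations.orthogonal(self.B, "weight")
        parametrizations.orthogonal(self.A, "weight")
        # Singular values of the update; zero-init so training starts exactly
        # at the base model, mirroring LoRA's B = 0 initialization (assumed).
        self.sigma = nn.Parameter(torch.zeros(rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x @ dW^T with dW = B diag(sigma) A^T, applied factor by factor.
        delta = ((x @ self.A.weight) * self.sigma) @ self.B.weight.T
        return self.base(x) + delta


# Quick shape check on a toy layer.
layer = OrthoGeoLoRALinear(nn.Linear(384, 384), rank=8)
print(layer(torch.randn(2, 384)).shape)  # torch.Size([2, 384])
```

Because the constraint is implemented as a reparameterization rather than a projection step, the module drops into an existing fine-tuning pipeline unchanged, which matches the compatibility claim in the abstract.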
📝 Abstract
Large language models and text encoders increasingly power web-based information systems in the social sciences, including digital libraries, data catalogues, and search interfaces used by researchers, policymakers, and civil society. Full fine-tuning is often computationally and energy intensive, which can be prohibitive for smaller institutions and non-profit organizations in the Web4Good ecosystem. Parameter-Efficient Fine-Tuning (PEFT), especially Low-Rank Adaptation (LoRA), reduces this cost by updating only a small number of parameters. We show that the standard LoRA update $\Delta W = BA^\top$ has geometric drawbacks: gauge freedom, scale ambiguity, and a tendency toward rank collapse. We introduce OrthoGeoLoRA, which enforces an SVD-like form $\Delta W = B\Sigma A^\top$ by constraining the low-rank factors to be orthogonal (Stiefel manifold). A geometric reparameterization implements this constraint while remaining compatible with standard optimizers such as Adam and existing fine-tuning pipelines. We also propose a benchmark for hierarchical concept retrieval over the European Language Social Science Thesaurus (ELSST), widely used to organize social science resources in digital repositories. Experiments with a multilingual sentence encoder show that OrthoGeoLoRA outperforms standard LoRA and several strong PEFT variants on ranking metrics under the same low-rank budget, offering a more compute- and parameter-efficient path to adapt foundation models in resource-constrained settings.
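To make the gauge-freedom and scale-ambiguity claims concrete, the short sketch below checks numerically that the plain LoRA update $\Delta W = BA^\top$ is unchanged when the factors are transformed by any invertible $r \times r$ matrix. All variable names are illustrative; nothing here is taken from the paper's code.

```python
import torch

torch.manual_seed(0)
d, k, r = 6, 5, 2
B = torch.randn(d, r)   # low-rank factors as in dW = B A^T
A = torch.randn(k, r)

# Gauge freedom: for any invertible R in R^{r x r},
#   (B R)(A R^{-T})^T = B R R^{-1} A^T = B A^T,
# so infinitely many factor pairs yield the same update. Scale ambiguity
# is the special case R = c I (B -> c B, A -> A / c).
R = torch.randn(r, r) + 3.0 * torch.eye(r)  # generically invertible
B2 = B @ R
A2 = A @ torch.linalg.inv(R).T

print(torch.allclose(B @ A.T, B2 @ A2.T, atol=1e-5))  # True
```

Fixing an SVD-like form $\Delta W = B\Sigma A^\top$ with orthonormal $B$ and $A$ removes exactly this redundancy: the scale of the update is carried by $\Sigma$ alone, and the factors are pinned down up to the usual sign/permutation symmetries of an SVD.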