SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds

๐Ÿ“… 2025-08-22
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Evaluating the robustness of large language models (LLMs) against input perturbations remains challenging: existing assessment methods are rarely scalable or general-purpose, and they rely heavily on complex, sample-specific adversarial examples. To address this, we propose a unified local robustness analysis framework that requires no modification of model parameters and no input-specific perturbations. Our core innovation is a graph-structured manifold mapping between the input and output spaces, paired with the Distance Mapping Distortion (DMD) metric, a theoretically grounded, near-linear-complexity measure for quantifying sample-level stability. The framework is both analytically interpretable and computationally efficient. Empirical evaluation across Transformer architectures of varying scales shows substantial improvements in adversarial attack efficacy and robustness-aware training, confirming the method's validity, cross-scale generality, and scalability.

๐Ÿ“ Abstract
Recent strides in pretrained transformer-based language models have propelled state-of-the-art performance in numerous NLP tasks. Yet, as these models grow in size and deployment, their robustness under input perturbations becomes an increasingly urgent question. Existing robustness methods often diverge between small-parameter and large-scale models (LLMs), and they typically rely on labor-intensive, sample-specific adversarial designs. In this paper, we propose a unified, local (sample-level) robustness framework (SALMAN) that evaluates model stability without modifying internal parameters or resorting to complex perturbation heuristics. Central to our approach is a novel Distance Mapping Distortion (DMD) measure, which ranks each sample's susceptibility by comparing input-to-output distance mappings in a near-linear complexity manner. By demonstrating significant gains in attack efficiency and robust training, we position our framework as a practical, model-agnostic tool for advancing the reliability of transformer-based NLP systems.
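To make the idea of ranking samples by input-to-output distance distortion concrete, here is a rough illustrative sketch, not the paper's actual algorithm. The function name, the k-nearest-neighbor choice, and the worst-case-ratio aggregation are assumptions, and this naive version computes full O(n²) pairwise distances rather than the near-linear construction the paper reports:

```python
import numpy as np

def dmd_scores(X_in, X_out, k=10):
    """Hypothetical sketch of a Distance Mapping Distortion (DMD) style score.

    For each sample, compare distances to its k nearest input-space
    neighbors with the corresponding output-space distances; a large
    worst-case expansion ratio marks the sample as less stable.

    X_in:  (n, d_in) array of input embeddings.
    X_out: (n, d_out) array of output embeddings.
    """
    eps = 1e-12  # guard against division by zero
    # Full pairwise distance matrices (O(n^2); for clarity only)
    d_in = np.linalg.norm(X_in[:, None, :] - X_in[None, :, :], axis=-1)
    d_out = np.linalg.norm(X_out[:, None, :] - X_out[None, :, :], axis=-1)
    n = len(X_in)
    scores = np.empty(n)
    for i in range(n):
        # k nearest input-space neighbors (index 0 of argsort is the sample itself)
        nbrs = np.argsort(d_in[i])[1:k + 1]
        ratios = d_out[i, nbrs] / (d_in[i, nbrs] + eps)
        scores[i] = ratios.max()  # worst-case local expansion
    return scores
```

Under this reading, a sample whose output embedding moves far relative to its input-space neighborhood gets a high score and would be ranked as more susceptible, e.g. a promising target for adversarial attack or a candidate for robustness-aware training.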
Problem

Research questions and friction points this paper is trying to address.

Evaluating robustness of language models under input perturbations
Providing model-agnostic stability analysis without parameter modification
Measuring sample susceptibility through efficient distance mapping comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based manifold mapping for stability analysis
Distance Mapping Distortion measure for susceptibility ranking
Model-agnostic robustness framework without parameter modification
๐Ÿ”Ž Similar Papers
No similar papers found.
Wuxinlin Cheng
Stevens Institute of Technology
Yupeng Cao
Stevens Institute of Technology
Natural Language Processing, MultiModal, Trustworthy AI
Jinwen Wu
Stevens Institute of Technology
Koduvayur Subbalakshmi
Stevens Institute of Technology
Tian Han
Stevens Institute of Technology
Machine Learning, Artificial Intelligence, Computer Vision
Zhuo Feng
Stevens Institute of Technology