🤖 AI Summary
This work addresses the insufficient robustness of multimodal large language models (MLLMs) to semantically equivalent textual paraphrases in reasoning-based segmentation tasks. We propose the first sentence-level black-box adversarial paraphrasing framework. Methodologically, we construct a semantic latent space using a text autoencoder and employ reinforcement learning to generate adversarial queries that degrade segmentation performance while preserving grammatical correctness and semantic equivalence. Our key contributions are: (i) transferring adversarial attacks from the visual to the textual modality, thereby eliminating reliance on image perturbations, and (ii) designing an automated evaluation protocol to rigorously ensure paraphrase quality. Experiments on ReasonSeg and LLMSeg-40k demonstrate that our method achieves up to a 2× improvement in attack success rate over prior approaches. Crucially, it provides the first systematic evidence of the intrinsic vulnerability of state-of-the-art reasoning segmentation models to linguistic diversity in natural language queries.
📝 Abstract
Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation, where models generate segmentation masks based on textual queries. While prior work has primarily focused on perturbing image inputs, semantically equivalent textual paraphrases, crucial in real-world applications where users express the same intent in varied ways, remain underexplored. To address this gap, we introduce a novel adversarial paraphrasing task: generating grammatically correct paraphrases that preserve the original query meaning while degrading segmentation performance. To evaluate the quality of adversarial paraphrases, we develop a comprehensive automatic evaluation protocol validated with human studies. Furthermore, we introduce SPARTA, a black-box, sentence-level optimization method that operates in the low-dimensional semantic latent space of a text autoencoder, guided by reinforcement learning. SPARTA achieves significantly higher success rates, outperforming prior methods by up to 2× on both the ReasonSeg and LLMSeg-40k datasets. We use SPARTA and competitive baselines to assess the robustness of advanced reasoning segmentation models. We reveal that they remain vulnerable to adversarial paraphrasing, even under strict semantic and grammatical constraints. All code and data will be released publicly upon acceptance.
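To make the attack setting concrete, below is a minimal toy sketch of the general idea of sentence-level black-box search in a semantic latent space. Everything here is an illustrative assumption, not SPARTA itself: the "latent space" is a hand-built 4-dimensional vector per candidate paraphrase, the "decoder" is nearest-neighbor lookup, the segmenter's quality score (`GIOU`) is a fabricated table, the semantic-equivalence gate is a cosine-similarity threshold, and the paper's reinforcement-learning optimizer is replaced by simple stochastic hill-climbing.

```python
import math
import random

# Toy stand-ins (assumptions for illustration, not the paper's models):
# each candidate paraphrase gets a hand-made "latent" vector, and GIOU
# holds a fabricated segmentation score from a hypothetical black box.
CANDIDATES = {
    "segment the dog on the left":           (1.0, 0.0, 0.0, 0.0),
    "mask the dog positioned to the left":   (0.9, 0.3, 0.0, 0.1),
    "find the left-side dog and segment it": (0.7, 0.6, 0.1, 0.0),
    "highlight the canine on the left side": (0.5, 0.2, 0.8, 0.1),
}
GIOU = {
    "segment the dog on the left": 0.80,
    "mask the dog positioned to the left": 0.72,
    "find the left-side dog and segment it": 0.55,
    "highlight the canine on the left side": 0.30,
}
SEM_SIM_MIN = 0.60  # assumed semantic-equivalence threshold

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-9)

def decode(z):
    """Toy decoder: nearest candidate paraphrase to latent z."""
    return max(CANDIDATES, key=lambda s: _cos(z, CANDIDATES[s]))

def sem_sim(a, b):
    """Toy semantic similarity between two candidate paraphrases."""
    return _cos(CANDIDATES[a], CANDIDATES[b])

def attack(query, steps=300, sigma=0.4, seed=0):
    """Black-box latent-space search: perturb the latent, decode a
    paraphrase, reject it if semantic equivalence is violated, and
    hill-climb toward the largest drop in the surrogate gIoU score."""
    rng = random.Random(seed)
    z = list(CANDIDATES[query])
    best_text, best_drop = query, 0.0
    for _ in range(steps):
        cand = decode([zi + sigma * rng.gauss(0.0, 1.0) for zi in z])
        if sem_sim(query, cand) < SEM_SIM_MIN:
            continue  # reject: decoded paraphrase drifted semantically
        drop = GIOU[query] - GIOU[cand]
        if drop > best_drop:
            best_text, best_drop = cand, drop
            z = list(CANDIDATES[cand])  # restart search from the winner
    return best_text, best_drop
```

Any paraphrase the search returns is guaranteed by construction to pass the semantic gate and to score no better than the original query under the surrogate segmenter; the real method replaces every toy component with a learned autoencoder, an actual MLLM segmenter queried as a black box, and an RL policy over the latent space.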