🤖 AI Summary
Safety alignment for large language models (LLMs) in long-context settings remains underexplored. Method: We introduce LongSafety, the first benchmark dataset specifically designed for long-context safety evaluation and training, comprising 10 safety task categories, 17K high-quality samples, and an average context length of 40.9K tokens. We establish rigorous long-text risk modeling for annotation and propose a multi-task mixed training strategy that jointly optimizes safety performance across long and short contexts without compromising general capabilities. Contribution/Results: Our work empirically demonstrates that long-context safety is not reducible to short-context safety: short-context safety data does not transfer reliably to long contexts, whereas LongSafety generalizes across context lengths and long-context safety scenarios. Evaluated on multiple long-text safety benchmarks, our approach achieves state-of-the-art results, substantially improving long-context safety while also improving short-context safety.
📝 Abstract
Recent advancements in model architectures and length extrapolation techniques have significantly extended the context length of large language models (LLMs), paving the way for their application in increasingly complex tasks. However, despite the growing capabilities of long-context LLMs, safety in long-context scenarios remains underexplored: while safety alignment in short contexts has been widely studied, the safety concerns of long-context LLMs have not been adequately addressed. In this work, we introduce LongSafety, a comprehensive safety alignment dataset for long-context LLMs, containing 10 tasks and 17k samples, with an average length of 40.9k tokens. Our experiments demonstrate that training with LongSafety improves long-context safety performance while also improving short-context safety and preserving general capabilities. Furthermore, we demonstrate that long-context safety is not equivalent to long-context alignment combined with short-context safety data, and that LongSafety generalizes across context lengths and long-context safety scenarios.