AI Summary
This work addresses a limitation of existing large language model alignment methods such as supervised fine-tuning (SFT) and direct preference optimization (DPO): they rely solely on local token likelihoods or scalar preference scores and ignore the global geometric structure of semantic representations in latent space. To bridge this gap, the study introduces persistent homology into alignment for the first time, modeling text generation as semantic trajectories in latent space. Using 0-dimensional persistent homology, it extracts topological bridge structures between prompts and responses, yielding two novel components: Trajectory Topology Loss (TTL) for supervised fine-tuning and Topological Preference Optimization (TPO) for preference learning. Both guide model updates toward directions consistent with the semantic topology. Combined with a dynamic loss weighting strategy, the approach consistently outperforms non-topological baselines on Qwen2.5-7B-Instruct across the UltraChat and Anthropic HH-RLHF datasets, on both automatic preference metrics and LLM-judge evaluations, while maintaining or reducing toxicity.
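To make the bridge idea concrete, here is a minimal sketch of one way such structures could be extracted. It relies on a standard fact: the 0-dimensional persistent homology of a Vietoris-Rips filtration is determined by the minimum spanning tree of the point cloud, with each tree edge marking the scale at which two components merge. Treating a "bridge" as a tree edge that joins a prompt point to a response point is an assumption made here for illustration, as are the function names `zero_dim_persistence` and `prompt_answer_bridges` and the use of Euclidean distance; the source does not specify these details.

```python
import numpy as np


def zero_dim_persistence(points):
    """Return MST edges (i, j, length) of a point cloud via Kruskal's algorithm.

    Each edge length is the death time of one 0-dimensional homology class
    (a connected component) in the Vietoris-Rips filtration.
    """
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    iu, ju = np.triu_indices(n, k=1)          # all pairs with i < j
    order = np.argsort(dists[iu, ju], kind="stable")
    parent = list(range(n))                   # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    edges = []
    for k in order:
        i, j = int(iu[k]), int(ju[k])
        ri, rj = find(i), find(j)
        if ri != rj:                          # edge merges two components
            parent[ri] = rj
            edges.append((i, j, float(dists[i, j])))
            if len(edges) == n - 1:
                break
    return edges


def prompt_answer_bridges(prompt_emb, answer_emb):
    """Assumed definition: MST edges with one prompt and one answer endpoint."""
    points = np.vstack([prompt_emb, answer_emb])
    n_prompt = len(prompt_emb)
    bridges = []
    for i, j, length in zero_dim_persistence(points):
        if i < n_prompt <= j:                 # i is a prompt point, j an answer point
            bridges.append((i, j - n_prompt, length))
    return bridges


# Toy example with 2-D "embeddings" on a line.
prompt = np.array([[0.0, 0.0], [1.0, 0.0]])
answer = np.array([[3.0, 0.0], [4.0, 0.0]])
bridges = prompt_answer_bridges(prompt, answer)
```

In this toy example the only bridge connects the second prompt point to the first answer point. A training loss could then compare the model's update direction with the directions of such edges, though how that comparison is done is not stated in the summary.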
Abstract
Alignment of large language models (LLMs) via SFT and RLHF/DPO typically ignores the global geometry of the representation space, relying instead on local token likelihoods or scalar scores. We view generation as tracing a semantic trajectory in hidden space and propose a topology-enhanced alignment framework that regularizes these trajectories using 0-dimensional persistent homology. First, for SFT, we introduce Trajectory Topology Loss (TTL). Treating prompt and gold-answer embeddings as a mixed point cloud, we use a 0D persistent homology algorithm to extract "prompt-answer bridges." TTL aligns the model's actual update direction with these topological bridges rather than arbitrary directions. Second, for DPO, we propose Topological Preference Optimization (TPO). TPO constructs topic-specific semantic preference vectors and aligns the improvement direction from rejected to chosen responses with these vectors in an intermediate hidden layer. We also introduce a dynamic weighting scheme to balance the DPO and TPO losses. In evaluations on Qwen2.5-7B-Instruct using UltraChat and Anthropic HH-RLHF, our topology-enhanced objectives consistently outperform strong non-topological baselines (e.g., per-example, nearest-neighbor, and random regularizers) on automatic preference metrics and LLM-judge evaluations, while maintaining or improving toxicity scores. These results suggest that persistent homology and trajectory geometry offer a promising direction for controllable alignment.
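The directional alignment described for TPO can be illustrated with a small sketch. The abstract does not give the loss formula, so the form below (one minus the cosine similarity between the rejected-to-chosen hidden-state difference and a preference vector) is an assumption, as are the names `direction_alignment_loss` and `combined_loss`. The dynamic weighting scheme is not specified either, so the sketch takes a plain `weight` argument as a placeholder.

```python
import numpy as np


def direction_alignment_loss(h_chosen, h_rejected, pref_vec, eps=1e-8):
    """Assumed form: mean of 1 - cos(h_chosen - h_rejected, pref_vec).

    h_chosen, h_rejected: (batch, dim) hidden states from an intermediate layer.
    pref_vec: (dim,) topic-specific preference direction.
    """
    delta = h_chosen - h_rejected                       # improvement direction
    num = delta @ pref_vec                              # (batch,)
    denom = np.linalg.norm(delta, axis=-1) * np.linalg.norm(pref_vec) + eps
    return float(np.mean(1.0 - num / denom))


def combined_loss(dpo_loss, tpo_loss, weight):
    """Placeholder combination; the paper's dynamic rule for `weight` is not given here."""
    return dpo_loss + weight * tpo_loss


# Toy example: the first pair improves exactly along the preference vector,
# the second pair moves orthogonally to it.
h_chosen = np.array([[2.0, 0.0], [0.0, 1.0]])
h_rejected = np.array([[1.0, 0.0], [0.0, 0.0]])
pref_vec = np.array([1.0, 0.0])
tpo = direction_alignment_loss(h_chosen, h_rejected, pref_vec)
total = combined_loss(dpo_loss=0.7, tpo_loss=tpo, weight=0.1)
```

Under this assumed form, a pair whose improvement direction matches the preference vector contributes a loss near 0, and an orthogonal pair contributes 1, so the toy batch averages to about 0.5.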