Controlla: Learning Controllability via Graph-Constrained Latent Geometry

📅 2026-05-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses identity drift and cross-modal inconsistency in controllable multimodal generation, which the authors attribute to inference-time conditioning methods that do not explicitly structure how semantic attributes evolve. They propose Controlla, a modular factorized-control framework that treats controllability as a property of structured latent geometry. Controlla learns identity and attribute factors from multimodal inputs and aligns them with graph priors through graph-constrained optimal transport, so that attributes follow graph-consistent trajectories while the reference identity is preserved. The authors also introduce geometry-aware metrics for trajectory consistency and latent disentanglement, and construct AffectHuman-43K, a leakage-aware multimodal benchmark for reference-grounded affective control. Experiments report consistent improvements in controllability, identity preservation, and cross-modal alignment, with additional analyses of graph sensitivity, extensibility, and robustness.
📝 Abstract
Controllable multimodal generation is commonly formulated as an inference-time conditioning problem using prompts, guidance, or auxiliary modules. While effective, such approaches do not explicitly structure how semantic attributes evolve, which can lead to identity drift and inconsistent cross-modal behavior. We propose Controlla, a modular factorized-control framework that treats controllability as a property of structured latent geometry. Controlla learns identity and attribute factors from multimodal inputs and aligns them with graph priors using graph-constrained optimal transport, encouraging attributes to follow graph-consistent trajectories while preserving reference identity. To evaluate this setting, we construct AffectHuman-43K, a leakage-aware multimodal benchmark for reference-grounded affective control, and introduce geometry-aware metrics for trajectory consistency and latent disentanglement. Experiments show consistent improvements in controllability, identity preservation, and cross-modal alignment, with additional analyses on graph sensitivity, extensibility, and robustness.
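The abstract does not spell out the transport formulation, so the following is only a minimal sketch of one possible reading of "graph-constrained optimal transport": entropic optimal transport (Sinkhorn iterations) between two distributions over attribute states, where the ground cost is the shortest-path distance on an attribute graph. The three-node chain graph, the state names, the mass vectors, and the function names `graph_shortest_paths` and `sinkhorn` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def graph_shortest_paths(adj):
    """All-pairs shortest-path lengths (Floyd-Warshall) for a weighted adjacency matrix.

    A zero entry off the diagonal means "no edge".
    """
    n = adj.shape[0]
    dist = np.where(adj > 0, adj, np.inf).astype(float)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Entropic-regularized OT plan between histograms a and b (standard Sinkhorn)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # match column marginals
        u = a / (K @ v)              # match row marginals
    return u[:, None] * K * v[None, :]


# Hypothetical attribute graph: neutral -- mild -- strong (a 3-node chain).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
dist = graph_shortest_paths(adj)     # graph-geodesic ground cost

# Assumed source and target distributions over the three attribute states.
a = np.array([0.80, 0.15, 0.05])
b = np.array([0.05, 0.15, 0.80])

plan = sinkhorn(dist, a, b)
transport_cost = float((plan * dist).sum())
print(np.round(plan, 3))
print("graph transport cost:", round(transport_cost, 3))
```

Because the cost is measured along graph edges, moving mass from "neutral" to "strong" costs twice as much as a single-edge move. This is one simple way a graph prior can shape how attribute mass is allowed to shift; the paper's actual objective may differ.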
Problem

Research questions and friction points this paper is trying to address.

controllable generation
multimodal generation
identity drift
cross-modal consistency
latent geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

graph-constrained optimal transport
structured latent geometry
factorized control
multimodal controllable generation
latent disentanglement