JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching (arXiv 2025): Unified framework for synchronized facial motion and speech generation using flow matching and MM-DiT
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing (ICCV 2025): Systematic analysis of bidirectional attention in MM-DiT and a robust prompt-based editing method
InstantDrag: Improving Interactivity in Drag-based Image Editing (SIGGRAPH Asia 2024): Optimization-free pipeline for fast drag-based image editing
Fill-Up: Balancing Long-Tailed Data with Generative Models (arXiv 2023): Two-stage method that uses textual-inverted tokens to synthesize images for long-tailed recognition
StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis (TPAMI 2023): Comprehensive library reproducing 30+ GAN models with standardized benchmarks