Sep 25, 2025: Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents released on arXiv.
Aug 20, 2025: Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation accepted to EMNLP 2025!
May 20, 2025: STAR and SSR accepted to IWSLT 2025!
Sep 25, 2024: DiffNorm accepted to NeurIPS 2024!
Paper SSR: Alignment-Aware Modality Connector for Speech Language Models proposes a modality connector that segments and compresses speech features, aligning them with text representations for better modality fusion.
Paper DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation introduces a diffusion-based, self-supervised normalization strategy that simplifies target speech distributions, making them easier for non-autoregressive models to learn.
Research Experience
Recently focused on enabling multimodal agents to seamlessly reason, use tools, and interact with users and environments. Also working on LLMs that unify multimodal understanding and generation.
Education
Currently a PhD Candidate in Computer Science at Johns Hopkins University, advised by Prof. Philipp Koehn. Previously completed Bachelor's and Master's degrees in Computer Science at JHU.
Background
Research interests: machine learning and natural language processing, particularly efficient and scalable representation learning methods for cross-modal applications. Also interested in derivatives pricing and hedging strategies.