Published papers include 'AHELM: A Holistic Evaluation of Audio-Language Models' and 'ViLBench: A Suite for Vision-Language Process Reward Modeling', among others, accepted to venues such as NeurIPS 2025 and EMNLP 2025. Also involved in the development of open-source projects such as OpenVision.
Research Experience
Currently working on controllable text generation (CTG), efficient generation, and multimodal generation (MMGen). Also interested in AI-safety problems in LLM-based systems. Joining ByteDance Seed as a Student Research Scientist in June 2025.
Education
Ph.D. student at UCSC CSE, advised by Prof. Cihang Xie and Prof. Yuyin Zhou; M.Eng. from UCAS.
Background
Research interests: natural language processing (NLP), multi-modal learning, and their applications. Particularly interested in efficient and controllable generation (e.g., unsupervised or plug-and-play methods), multi-modal interactions (e.g., visual dialogue, captioning), and the combination of both. The ultimate goal is to empower any off-the-shelf language model to understand real-world experiences and interact with people.
Miscellany
Open to research collaborations and looking for internship positions for summer 2025. Contact: tuisaac163(at)gmail.com, Google Scholar, Github, Twitter