Published multiple papers, including 'LongViTU: Instruction Tuning for Long-Form Video Understanding' and 'Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting'. Also helped organize the NeurIPS 2024 Workshop on Open-World Agents.
Research Experience
Involved in several research projects, including robotic manipulation, enhanced perception for embodied agents, and multimodal open-world agents.
Education
Received a Bachelor's degree in Computer Science from Tsinghua University and a Ph.D. in Computer Science from the University of California, Los Angeles (UCLA).
Background
A machine learning researcher interested in multimodal learning, representation learning, and embodied agents. Specifically interested in building models and agents that can learn from 2D/3D vision and text data and perform a wide range of reasoning and embodied control tasks.
Miscellany
Contact: jeasinema [at] gmail [dot] com / Google Scholar / LinkedIn