Yuyang Dong

Google Scholar ID: EmmeWH0AAAAJ
SB Intuitions, Japan
database, data mining, machine learning
Citations & Impact
Citations (all-time): 411
H-index: 8
i10-index: 8
Publications: 20
Co-authors: 5
Resume
Academic Achievements
  • 2025: Published 'SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation' (arXiv).
  • 2025: Published 'Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs' (arXiv).
  • 2024: Led the development and release of Jellyfish-7B/8B/13B, LLMs specialized for data preprocessing, achieving GPT-4-level performance on tasks like entity matching, data imputation, and error detection while enabling secure, cost-effective local execution; paper accepted at EMNLP 2024.
  • 2024: Presented tutorial 'On the Use of Large Language Models for Table Tasks' at CIKM 2024.
  • 2024: Published 'Large Language Models as Data Preprocessors' at the TaDA workshop (VLDB 2024).
  • 2023: Published 'QA-Matcher: Unsupervised Entity Matching Using A Question Answering Model' at PAKDD 2023.
  • 2023: Published 'DeepJoin: Joinable Table Discovery with Pre-trained Language Models' at VLDB 2023.
  • 2022: Published demo paper 'Table Enrichment System for Machine Learning' at SIGIR 2022.
  • 2021: Published 'Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach' at ICDE 2021.
  • 2021: Published multiple papers on entity matching and quality control in hierarchical classification at PAKDD 2021 and the SFDI workshop (VLDB 2021).
  • 2020: Published 'Learning from Unsure Responses' at AAAI 2020.