Dong Deng
Google Scholar ID: mNUsQysAAAAJ
Rutgers University
Data Management, Data Preparation, Data Integration, Data Science
Citations & Impact
All-time
  • Citations: 1,790
  • H-index: 22
  • i10-index: 40
  • Publications: 20
  • Co-authors: 6
Resume (English only)
Academic Achievements
  • Publications:
      - "Near-Duplicate Text Alignment with One Permutation Hash" accepted by SIGMOD 2025
      - "SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search" accepted by SIGMOD 2024
      - "ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph" accepted by PVLDB 2023
      - "Near-Duplicate Sequence Search at Scale for Large Language Model Memorization" accepted by SIGMOD 2023
      - "The Case for Learned Provenance Graph Storage Systems" accepted by USENIX Security Symposium 2023
      - "TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection" accepted by SIGMOD 2022
      - "SPINE: Scaling up Programming-by-Negative-Example for String Filtering and Transformation" accepted by SIGMOD 2022
      - "Efficient Load-Balanced Butterfly Counting on GPU" accepted by PVLDB 2022
      - "Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts" accepted by SIGMOD 2021
  • Funded Projects:
      - NSF-funded project "III: Small: Large-Scale High Dimensional Dense Vector Management"
      - NSF-funded project "CDSE: Computation-Informed Learning of Melt Pool Dynamics for Real-Time Prognosis"
Research Experience
  • Assistant Professor in the Computer Science Department at Rutgers University since 2019. Regularly publishes at major data management and database systems venues such as SIGMOD, PVLDB, and ICDE.
Education
  • Ph.D. from Tsinghua University; postdoctoral training at MIT CSAIL.
Background
  • Research Interests: Data management, data science, and database systems, with a focus on developing novel algorithms and building practical systems to address data problems. Current research topics include scalable data curation (textual, structured, and feature data); data manipulation and wrangling at scale; data integration, data cleaning, and data discovery; and the management of scientific datasets, data lakes, and data warehouses.
Miscellany
  • Recruiting research interns and Ph.D. students; welcomes students with excellent programming skills and knowledge of commonly used algorithms and data structures. Organized the SIGMOD Student Programming Contest 2023.