One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams (ICML22)
Practical Near Neighbor Search via Group Testing (NeurIPS21, Spotlight Talk - Top 3%)
A One-Pass Distributed and Private Sketch for Kernel Sums with Applications to Machine Learning at Scale (CCS21)
Sub-linear RACE Sketches for Approximate Kernel Density Estimation on Streaming Data (WWW20)
Sub-linear Memory Sketches for Near Neighbor Search on Streaming Data (ICML20)
Revisiting Consistent Hashing with Bounded Loads (AAAI21)
Fast Processing and Querying of 170TB of Genomics Data via a Repeated and Merged Bloom Filter (RAMBO) (SIGMOD21)
Research Experience
Current work focuses on efficient approximate algorithms for low-level building blocks of machine learning, such as kernel sums and near-neighbor search, as well as fast training and inference.
Background
Interested in randomized algorithms for scalable machine learning. Replacing expensive exact algorithms with lightweight approximate methods can substantially reduce the resources needed to run a program. Particularly interested in simple methods with theoretical guarantees that also work well in a web-scale production environment.