Accelerate Aggregation Queries with JOINs over Unstructured Data (In submission)
Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains? (Preprint)
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities (Preprint)
Establishing Best Practices for Building Rigorous Agentic Benchmarks (NeurIPS 2025, first place at Berkeley AgentX Summit (Benchmark & Evaluation Track))
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines (VLDB 2026)
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench (ACL main 2025)
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities (ICML 2025 Spotlight, SafeBench winner, adopted by US AISI, second place at Berkeley AgentX Summit (AI Safety & Alignment Track))
PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees (SIGMOD 2025)
Efficient Approximate Query Processing with Block Sampling (CIDR 2025)
FedTrans: Efficient Federated Learning via Multi-Model Transformation (MLSys 2024)
SlabCity: Whole-Query Optimization using Program Synthesis (VLDB 2023)
An Energy-efficient Computing Offloading Framework for Blockchain-enabled Video Streaming Systems (GlobeCom 2022)
Sharding for Blockchain-based Mobile Edge Computing System: A Deep Reinforcement Learning Approach (GlobeCom 2021)
Background
Research interests: data + AI/ML
Research focus: developing statistically grounded approaches to enable efficient data analytics, rigorous AI evaluations, and AI for safety.