Chunwei Liu

Google Scholar ID: Q0LOhAgAAAAJ

Massachusetts Institute of Technology

Databases, Compound AI Systems, LLM, Data Compression, IoT

Citations & Impact

All-time citations: 642

Contact

Email: chunwei@csail.mit.edu

Publications

13 items

SemJoin: Semantic Join Optimization

2026

TabClean: Reusable LLM-Synthesized Programs for Tabular Data Cleaning

2026

The Table Says Otherwise: Testing LLMs with Counterfactual Relational Data

2026

Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies

2026

SAGE: Selective Attention-Guided Extraction for Token-Efficient

2026

SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization

2026

ResQ: Realistic Performance-Aware Query Generation

2026

iPDB -- Optimizing SQL Queries with ML and LLM Predicates

2026

Resume

Academic Achievements

June 2025: Three papers were accepted by VLDB 2025: a paper on a cloud analytics workload synthesis tool, “PBench: Workload Synthesizer with Real Statistics for Cloud Analytics Benchmarking”; a paper on non-intrusive DBMS scheduling, “Improving DBMS Scheduling Decisions with Accurate Performance Prediction on Concurrent Queries”; and a comprehensive study of lossless floating-point compression, “Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression.”

March 2025: A demo paper on a chat interface for declarative AI frameworks, “PalimpChat: Declarative and Interactive AI Analytics,” was accepted by SIGMOD 2025.

February 2025: Papers on scientific discovery and on the evaluation of open columnar formats were accepted by AISD @ NAACL 2025 and the VLDB Journal special issue “Best of VLDB,” respectively.

January 2025: A paper on LLM scheduling, “Don't Stop Me Now: Embedding Based Scheduling for LLMs,” was accepted to ICLR 2025.

Research Experience

Currently a Postdoctoral Associate at MIT CSAIL, working with Michael Cafarella.

Education

Received his Ph.D. from the Department of Computer Science at the University of Chicago, where he worked in the ChiData group advised by Aaron Elmore.

Background

His research interests span compound AI systems, database systems, cloud/edge computing, and database benchmarking. He focuses on optimizing data systems for both conventional data analytics and emerging AI-powered pipelines. He develops privacy-preserving workload generation techniques for evaluating cloud database systems, in collaboration with major cloud vendors such as Microsoft, Amazon, Intel, Meta, and Google. He also explores novel data compression methods, implementing adaptive compression selection in both conventional and resource-constrained databases and in machine learning systems. In addition, he works on high-dimensional data analysis, with a particular emphasis on time series applications.

Co-authors

14 total