Published several preprints and conference papers, including work presented at NeurIPS, ICML, and AISTATS. Publications include:
- To bootstrap or to rollout? An optimal and adaptive interpolation
- Refined Risk Bounds for Unbounded Losses via Transductive Priors
- The Statistical Complexity of Interactive Decision Making
- Bridging multiple worlds: multi-marginal optimal transport for causal partial-identification problem
- Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability
- Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff
- Online Estimation via Offline Estimation: An Information-Theoretic Framework
- How Does Variance Shape the Regret in Contextual Bandits?
- The Non-linear F-Design and Applications to Interactive Learning
- Model-free reinforcement learning with the decision-estimation coefficient
- Convex and Non-Convex Optimization under Generalized Smoothness
- Byzantine-robust federated linear bandits
- Robust Learning Under Clean-label Attack
- Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
- Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
- Importance Resampling for Off-policy Prediction
- Concentration inequalities for multinoulli random variables
Teaching Experience
Teaching Assistant for Dynamic Programming & Reinforcement Learning (Spring 2022).
Education
Pursuing a Ph.D. at MIT EECS, advised by Sasha Rakhlin.
Background
A final-year Ph.D. student at MIT EECS, focusing on the intersection of machine learning theory and interactive decision making, including online learning, bandits, and reinforcement learning.