2025. 'Personalization of Large Language Models: A Survey.' Transactions on Machine Learning Research.
2025. 'GUI Agents: A Survey.' In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
2025. 'From Selection to Generation: A Survey of LLM-based Active Learning.' In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
2025. 'Adaptive Submodular Policy Optimization.' In Proceedings of the 2nd Reinforcement Learning Conference (RLC).
2025. 'FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain.' In Proceedings of the 42nd International Conference on Machine Learning (ICML).
Research Experience
2024–present: Principal Research Scientist at Adobe Research.
2021–2024: Research Scientist at Amazon.
2018–2021: Research Scientist at Google Research.
2014–2018: Research Scientist at Adobe Research.
2011–2014: Research Scientist at Technicolor’s Research Center.
2006–2011: Research Scientist at Intel Research.
Background
Proposes, analyzes, and applies algorithms that learn incrementally, run in real time, and converge to near-optimal solutions as observations increase.
Recent work focuses on applying these ideas to modern generative models and human feedback.
Studies seamless human-machine interaction, a long-standing goal of AI, traditionally approached via reinforcement learning and bandit frameworks.
Made fundamental contributions to the bandit field, especially in structured problems involving graphs, submodularity, semi-bandit feedback, and low-rank matrices.
Developed online learning-to-rank bandit algorithms that handle exponentially large action spaces under partial feedback; the resulting methods are simple, theoretically sound, robust, and state-of-the-art.
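To make this concrete, below is a minimal sketch of a UCB-style ranking bandit in the cascade click model, in the spirit of this line of work; the simulator, the attraction probabilities, and the exploration constant 1.5 are illustrative assumptions, not a published implementation.

    import numpy as np

    def cascade_ucb1_sketch(attraction_probs, K, T, rng=None):
        # Minimal UCB ranking bandit in the cascade click model: the user
        # scans the list top-down and clicks the first attractive item.
        rng = np.random.default_rng() if rng is None else rng
        attraction_probs = np.asarray(attraction_probs)
        L = len(attraction_probs)
        pulls = np.ones(L)  # initialize each item with one simulated pull
        means = rng.binomial(1, attraction_probs).astype(float)
        for t in range(L + 1, T + 1):
            ucb = means + np.sqrt(1.5 * np.log(t) / pulls)  # optimism
            ranking = np.argsort(-ucb)[:K]  # show K items with highest UCBs
            clicks = rng.binomial(1, attraction_probs[ranking])
            first = int(np.argmax(clicks)) if clicks.any() else K
            # Cascade feedback: only items up to the first click are observed.
            for pos in range(min(first + 1, K)):
                item = ranking[pos]
                reward = 1.0 if pos == first else 0.0
                pulls[item] += 1
                means[item] += (reward - means[item]) / pulls[item]
        return np.argsort(-means)[:K]  # final ranking by estimated attraction

For example, cascade_ucb1_sketch(np.array([0.2, 0.5, 0.1, 0.4]), K=2, T=10000) converges to recommending the two most attractive items, despite observing feedback only down to the first click.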
Recent efforts make bandit algorithms more practical through randomization-based exploration, which is compatible with neural networks, and by reducing statistical complexity through meta-, multi-task, and federated learning.
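As one example of randomization-based exploration, perturbed-history exploration acts greedily with respect to a history augmented with random pseudo-rewards, a mechanism that carries over to fitted models such as neural networks; the sketch below is for Bernoulli bandits, with the simulated environment and the perturbation scale a as illustrative assumptions.

    import numpy as np

    def phe_bernoulli_sketch(true_means, T, a=1.0, rng=None):
        # Perturbed-history exploration for a Bernoulli bandit: exploration
        # comes from re-estimating each arm on its observed history plus
        # roughly a * (number of pulls) random Bernoulli(1/2) pseudo-rewards.
        rng = np.random.default_rng() if rng is None else rng
        K = len(true_means)
        pulls = np.zeros(K, dtype=int)
        rewards = np.zeros(K)
        for t in range(T):
            if t < K:
                arm = t  # pull each arm once to initialize
            else:
                n_pseudo = np.ceil(a * pulls).astype(int)
                pseudo = rng.binomial(n_pseudo, 0.5)
                perturbed = (rewards + pseudo) / (pulls + n_pseudo)
                arm = int(np.argmax(perturbed))  # greedy on perturbed history
            pulls[arm] += 1
            rewards[arm] += rng.binomial(1, true_means[arm])
        return int(np.argmax(rewards / np.maximum(pulls, 1)))

The design choice here is that the randomness needed for exploration lives in the training data rather than in a posterior, so the same recipe applies whenever a model can be refit on perturbed histories.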
Explores the persistent challenges of exploration and statistically efficient adaptivity in the era of pre-trained models, including optimal experimental design for efficient LLM fine-tuning and off-policy evaluation from logged human feedback.
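Two minimal sketches in this direction, under illustrative assumptions: greedy D-optimal selection of training examples from feature representations (the feature matrix X and the budget are placeholders), and a basic inverse-propensity-scoring estimator for off-policy evaluation from logged feedback (the target_prob function name is hypothetical).

    import numpy as np

    def greedy_d_optimal(X, budget, reg=1e-3):
        # Greedily pick rows of X that maximize the log-determinant of the
        # regularized information matrix; by the matrix determinant lemma,
        # the marginal gain of row x is log(1 + x^T A^{-1} x).
        d = X.shape[1]
        A_inv = np.eye(d) / reg
        chosen = []
        for _ in range(budget):
            gains = np.einsum('ij,jk,ik->i', X, A_inv, X)  # x^T A^{-1} x
            gains[chosen] = -np.inf  # never pick the same example twice
            i = int(np.argmax(gains))
            chosen.append(i)
            Ax = A_inv @ X[i]
            A_inv -= np.outer(Ax, Ax) / (1.0 + X[i] @ Ax)  # Sherman-Morrison
        return chosen

    def ips_estimate(logged, target_prob):
        # Inverse propensity scoring: reweight logged rewards by the ratio
        # of target to logging policy probabilities; unbiased whenever the
        # logging policy covers every action the target policy can take.
        # `logged` holds (context, action, reward, logging_prob) tuples.
        ratios = np.array([target_prob(x, a) / p for x, a, r, p in logged])
        rewards = np.array([r for _, _, r, _ in logged])
        return float(np.mean(ratios * rewards))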