Subhojyoti Mukherjee

Google Scholar ID: VFixSK8AAAAJ

Adobe Research

Multi-armed Bandits, Reinforcement Learning, Large Language Models, RLHF

Citations & Impact

Citations (all-time): 274

Publications

18 items

- AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models (2026)
- MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization (2026)
- Sparse Personalized Text Generation with Multi-Trajectory Reasoning (2026)
- A Survey on LLM-based Conversational User Simulation (2026)
- Stepwise Credit Assignment for GRPO on Flow-Matching Models (2026)
- Agentic Planning with Reasoning for Image Styling via Offline RL (2026)
- Partial Policy Gradients for RL in LLMs (2026)
- InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions (2026)

Resume

Academic Achievements

- Paper 'Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning' accepted at NeurIPS 2025 (main conference).
- Paper 'From Selection to Generation: A Survey of LLM-based Active Learning' accepted at ACL 2025 (main conference).
- Paper 'Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning' accepted at RLC 2025 (main conference).

Research Experience

- Adobe Research (San Jose): Research Scientist/Engineer (Mar 2025 - Present). Involved in pre-training and post-training of small LMs for Adobe Document Cloud; contributed to the English Document Overview model in Acrobat Reader; worked on the AI Assistant project in Adobe Express.
- Amazon AWS AI (Santa Clara, USA): Summer 2024 (full-time), hosted by Branislav Kveton et al., Area of Research: Multi-objective alignment for LLMs.
- Amazon AWS AI (Santa Clara, USA): Fall 2023 (part-time), hosted by Branislav Kveton et al., Area of Research: RLHF with LLMs.
- Amazon AWS AI (Santa Clara, USA): Summer 2023 (full-time), hosted by Branislav Kveton et al., Area of Research: Active In-Context Learning with LLMs.
- CMU, ECE Dept. (Pittsburgh, USA): Summer 2019, hosted by Prof. Gauri Joshi, Area of Research: Structured Bandits.
- Adobe Research (San Jose, USA): Spring 2018, hosted by Branislav Kveton, Area of Research: Item recommendation with Ranking and Bandits.
- INRIA, SequeL Lab (Lille, France): Fall 2017, hosted by Odalric Maillard, Area of Research: Non-stationary Bandits.

Education

- Ph.D.: Fall 2019 to Feb 2025, ECE, University of Wisconsin-Madison, advised by Dr. Robert Nowak, Dr. Josiah Hanna, and Dr. Qiaomin Xie. Areas of Research: Reinforcement Learning, Active Learning, and deep active learning strategies for Large Language Models (LLMs).
- M.S. by Research: 2015 to 2018, CSE, Indian Institute of Technology (IIT) Madras, advised by Dr. Balaraman Ravindran and Dr. Nandan Sudarsanam. Areas of Research: Reinforcement learning, Multi-Armed Bandit settings.
- Bachelor of Technology: 2009 to 2013, Dept. of Computer Science and Engineering, Meghnad Saha Institute of Technology, Kolkata, under West Bengal University of Technology, India.

Background

Research interests include training machine learning models, reinforcement learning, and the fine-tuning and alignment of large language models (LLMs). Currently a Research Scientist at Adobe Research, focusing on pre-training and post-training of small language models.