Hugh Zhang
Google Scholar ID: CgZ9uJkAAAAJ
Scale AI
Research interests: large language models, reinforcement learning, game theory, code generation
Citations & Impact (all-time)
  • Citations: 1,274
  • H-index: 9
  • i10-index: 9
  • Publications: 14
  • Co-authors: 0
Publications (1 item)
  • Resume (English only)
Academic Achievements
  • Paper 'Reconstructing O1 Test-Time Compute Scaling Laws': Reconstructed o1 test-time scaling laws using public API access to o1-mini.
  • Paper 'Planning In Natural Language Improves LLM Search For Code Generation': Demonstrated that searching over diverse natural language plans significantly improves code generation.
  • Paper 'LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet' accepted at Red Teaming GenAI Workshop @ NeurIPS 2024, showing >70% success rates for multi-turn human jailbreaks against current defenses.
  • Paper 'A Careful Examination of Large Language Model Performance on Grade School Arithmetic' selected as a NeurIPS 2024 Spotlight (Datasets and Benchmarks Track), cloning GSM8k to measure dataset contamination.
  • Paper 'Learning Goal-Conditioned Representations for Language Reward Models' published at NeurIPS 2024, exploring representation learning for LLM post-training.
  • Paper 'Q-Probe: A Lightweight Approach to Reward Maximization for Language Models' proposes a lightweight alternative to fine-tuning that outperforms LoRA on very small datasets.
  • Paper 'Chain-of-Thought Reasoning is a Policy Improvement Operator' presented at NeurIPS 2023 workshop, showing chain-of-thought training enables self-improvement and generalization.
  • Paper 'Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization' introduces a unified algorithm for RL and game theory, solving MDPs and imperfect-information games with a single hyperparameter set.
Co-authors
0 (list not available)