Paper 'Reconstructing O1 Test-Time Compute Scaling Laws': Reconstructed o1's test-time compute scaling laws using only public API access to o1-mini.
Paper 'Planning In Natural Language Improves LLM Search For Code Generation': Demonstrated that searching over diverse natural language plans significantly improves code generation.
Paper 'LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet' accepted at Red Teaming GenAI Workshop @ NeurIPS 2024, showing >70% success rates for multi-turn human jailbreaks against current defenses.
Paper 'A Careful Examination of Large Language Model Performance on Grade School Arithmetic' selected as NeurIPS 2024 Spotlight (Datasets and Benchmarks Track), cloning GSM8k to measure dataset contamination.
Paper 'Learning Goal-Conditioned Representations for Language Reward Models' published at NeurIPS 2024, exploring representation learning for LLM post-training.
Paper 'Q-Probe: A Lightweight Approach to Reward Maximization for Language Models' proposes a lightweight alternative to fine-tuning that outperforms LoRA on very small datasets.
Paper 'Chain-of-Thought Reasoning is a Policy Improvement Operator' presented at NeurIPS 2023 workshop, showing chain-of-thought training enables self-improvement and generalization.
Paper 'Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization' introduces a unified algorithm for RL and game theory, solving MDPs and imperfect-information games with a single hyperparameter set.
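For context on the Boltzmann Q-learning half of that unification: it refers to the standard softmax (Boltzmann) action-selection rule over Q-values. The sketch below shows only that generic rule, not the paper's unified ABCs algorithm; the function name and example values are illustrative assumptions.

```python
import numpy as np

def boltzmann_policy(q_values: np.ndarray, temperature: float) -> np.ndarray:
    """Softmax (Boltzmann) distribution over actions given their Q-values.

    Higher temperature -> closer to uniform exploration;
    lower temperature -> closer to greedy argmax selection.
    """
    logits = q_values / temperature
    logits -= logits.max()          # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()  # normalize to a probability distribution

# Illustrative Q-values for three actions
q = np.array([1.0, 2.0, 0.5])
print(boltzmann_policy(q, temperature=1.0))   # soft preference for action 1
print(boltzmann_policy(q, temperature=0.01))  # nearly deterministic argmax
```

The single temperature hyperparameter is what interpolates between exploratory and greedy behavior, which is why a unified algorithm built on this rule can plausibly share one hyperparameter set across MDPs and imperfect-information games.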