Agora | Research Hub

Citations & Impact

All-time

Citations: 1,487
H-index: 13
i10-index: 15
Publications: 20
Co-authors: 12


Contact

Email: eyuansu71@gmail.com
Twitter
GitHub

Publications

11 items

LogDx-CI: Benchmarking Log Reduction Tools for LLM Root-Cause Diagnosis
2026 · Cited: 0

Hint Tuning: Less Data Makes Better Reasoners
2026 · Cited: 0

Beyond Multiple Choice: A Hybrid Framework for Unifying Robust Evaluation and Verifiable Reasoning Training
2025 · Cited: 0

LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models
2025 · Cited: 0

BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions
2025 · Cited: 0

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions
2025 · Cited: 0

Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information
2025 · Cited: 0

SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
2025 · Cited: 0

Resume (English only)

Academic Achievements

Developed FlagJudge, a generative reward model (GenRM), in Dec 2023, predating DeepMind's similar work by one year
Team ranked 7th out of 100+ global teams in the AI Safety and Security Challenge (hosted by AI Singapore & NUS) in July 2024; invited to Singapore International Cyber Week (SICW) 2024
FlagEval released its latest leaderboard in Sep 2024, covering nearly 300 models across subjective, objective, arena battle, debate, multimodal, text-to-image, and text-to-video evaluations
Published at top-tier venues including AAAI, NeurIPS, and Findings of ACL (e.g., 'Before generation, align it!', 'Can LLM already serve as a database interface?', 'Graphix-T5')
Authored multiple arXiv preprints, including 'Towards analyzing and understanding the limitations of DPO' and 'Towards understanding the influence of reward margin on preference model performance'

Co-authors

12 total