ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

📅 2026-05-21

🤖 AI Summary
This work addresses the need for verifiable ranking outputs in decision-support systems by introducing the Evidence-Certified Candidate Ranking (ECCR) task, in which a system must produce a ranking together with evidence such that the cited text spans are sufficient to reproduce the final decision. To this end, the authors propose ECPO, a listwise policy-optimization framework that combines a trajectory reward built from skeleton alignment, argument consistency, and optional graph features with an evidence-cycle reward. They also introduce CertNDCG, a new evaluation metric, and a label-free deterministic verifier that enforces coherence between decisions and their supporting evidence. Experiments on the MAVEN-ERE and RAMS datasets demonstrate that the proposed approach significantly outperforms zero-shot, supervised fine-tuning (SFT), and GRPO baselines, achieving state-of-the-art CertNDCG performance across closed-roster, predicted-roster, and hybrid-roster candidate settings.
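This page does not give the definition of CertNDCG, so the following is only a minimal sketch of one plausible reading: ordinary NDCG in which a ranked candidate contributes gain only if its evidence certificate passes validation. The function names (`dcg`, `ndcg`, `cert_ndcg`) and the gating rule are assumptions made for illustration, not the authors' formulation.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with the standard log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(relevances, k):
    """Ordinary NDCG@k; `relevances` lists the graded relevance of each
    candidate in the order the system ranked them."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0


def cert_ndcg(relevances, cert_valid, k):
    """ASSUMED certificate-gated variant: a candidate's gain counts only
    when its certificate is valid. The ideal ranking is left ungated, so
    invalid certificates can only lower the score."""
    gated = [r if ok else 0.0 for r, ok in zip(relevances, cert_valid)]
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(gated[:k]) / ideal if ideal > 0 else 0.0


# A perfectly ordered list whose second item has an invalid certificate.
relevances = [3.0, 2.0, 0.0]
cert_valid = [True, False, True]
plain = ndcg(relevances, k=3)
certified = cert_ndcg(relevances, cert_valid, k=3)
```

Under this reading, a ranking can be perfect by ordinary NDCG and still lose score when a relevant candidate is cited with an unusable certificate, which is the gap between ranking quality and decision-evidence coupling that the summary describes.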
📝 Abstract
Ranking systems used in decision-support settings should not only order candidates but also expose evidence that can be independently checked. We study evidence-certified candidate ranking: given an intent_id, a predefined plan skeleton, a window-local candidate roster, and text-derived candidate trajectories with span provenance, a system must output a Top-K list together with doc_id:span evidence certificates whose cited spans are sufficient to recover the decision. We instantiate this task on MAVEN-ERE and RAMS with fixed upstream extraction, window-local randomized candidate identifiers, skeleton-aligned trajectory supervision, hard negatives, and audit references. We introduce Evidence-Coupled Policy Optimization (ECPO), a listwise policy-optimization objective whose action is the joint object of ranking and evidence certificate. ECPO first learns an interpretable trajectory reward from skeleton alignment, argument consistency, and optional graph features; it then optimizes a constrained policy with three coupled rewards: listwise ranking utility, span-level certificate validity, and an evidence-cycle reward computed by a label-free deterministic verifier that reconstructs candidate support from claim-stripped cited spans. This reframes the goal from maximizing ordinary NDCG alone to maximizing CertNDCG and decision-evidence coupling. The evaluation compares ECPO against zero-shot, SFT, and GRPO policies, RM-only scoring with deterministic evidence attachment, grammar/JSON-constrained decoding, validator retry, best-of-N RM selection, and post-hoc evidence rationalization under closed-roster, predicted-roster, and hybrid-roster settings.
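The output described above pairs each ranked candidate with doc_id:span certificates. As a rough illustration, the sketch below parses such a certificate and checks that the span can be resolved against the source documents. The abstract does not specify the span encoding; the `doc_id:start-end` character-offset format, the `Certificate` class, and the helper names are all hypothetical, and this bounds check is far weaker than the verifier the abstract describes, which reconstructs candidate support from claim-stripped cited spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    """One evidence pointer; the start/end character offsets are an assumption."""
    doc_id: str
    start: int
    end: int


def parse_certificate(text):
    """Parse a hypothetical 'doc_id:start-end' string into a Certificate."""
    doc_id, _, span = text.rpartition(":")
    start_s, _, end_s = span.partition("-")
    if not doc_id or not start_s.isdigit() or not end_s.isdigit():
        raise ValueError(f"malformed certificate: {text!r}")
    return Certificate(doc_id, int(start_s), int(end_s))


def is_valid(cert, docs):
    """Span-level validity: the document exists and the span lies inside it."""
    text = docs.get(cert.doc_id)
    return text is not None and 0 <= cert.start < cert.end <= len(text)


def cited_text(cert, docs):
    """Return the text the certificate points at (assumes is_valid passed)."""
    return docs[cert.doc_id][cert.start:cert.end]


docs = {"d1": "The convoy left the port at dawn."}
cert = parse_certificate("d1:4-10")
span = cited_text(cert, docs) if is_valid(cert, docs) else None
```

A deterministic check of this kind needs no labels, which is one reason a span-validity reward can be computed during policy optimization without extra annotation.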
Problem

Research questions and friction points this paper is trying to address.

evidence-certified ranking
candidate ranking
interpretable evidence
decision support
span provenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evidence-Coupled Policy Optimization
Evidence-Certified Ranking
CertNDCG
Deterministic Verifier
Listwise Policy Optimization
Miaobo Hu
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Shuhao Hu
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
BoKun Wang
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Yina Sa
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Xiaobo Guo
Dartmouth College
machine learning, deep learning, natural language processing, social media, propagation
Xin Wang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Biomedical Engineering
Daren Zha
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Jun Xiao
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China