🤖 AI Summary
Accurately extracting structured triples from complex natural language remains a key challenge in knowledge graph construction. This work proposes leveraging atomic propositions (minimal, semantically self-contained units of information) as an interpretable intermediate representation that enhances triple extraction without replacing the underlying system, boosting the performance of weaker extractors in particular. The authors distill a lightweight multilingual model, MPropositionneur-V2 (a Qwen3-0.6B architecture distilled from Qwen3-32B), and integrate it into two extraction paradigms: entity-centric (GLiREL) and generative (Qwen3). Experiments on SMiLER, FewRel, DocRED, and CaRB show that atomic propositions significantly improve relation recall and multilingual accuracy for weak extractors, while a fallback strategy lets strong models retain high entity recall without giving up the relation-extraction gains.
📝 Abstract
Knowledge Graph construction from natural language requires extracting structured triplets from complex, information-dense sentences. In this paper, we investigate whether decomposing text into atomic propositions (minimal, semantically autonomous units of information) can improve triplet extraction. We introduce MPropositionneur-V2, a small multilingual model covering six European languages, trained by knowledge distillation from Qwen3-32B into a Qwen3-0.6B architecture, and we evaluate its integration into two extraction paradigms: entity-centric (GLiREL) and generative (Qwen3). Experiments on SMiLER, FewRel, DocRED and CaRB show that atomic propositions benefit weaker extractors (GLiREL, CoreNLP, 0.6B models), improving relation recall and, in the multilingual setting, overall accuracy. For stronger LLMs, a fallback combination strategy recovers entity recall losses while preserving the gains in relation extraction. These results show that atomic propositions are an interpretable intermediate data structure that complements extractors without replacing them.
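The fallback combination strategy can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `decompose` and `extract` callables stand in for the real models (MPropositionneur-V2 and an extractor such as GLiREL or Qwen3), and the toy sentence, proposition list, and lookup-table "extractor" below are invented for demonstration. The idea is to extract triples from each atomic proposition, then recover any entity that the proposition-based pass lost by falling back to the triples from direct extraction on the raw sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def entities(triples):
    """All entity mentions appearing in a set of triples."""
    return {t.subject for t in triples} | {t.obj for t in triples}


def extract_with_fallback(sentence, decompose, extract):
    """Extract triples from atomic propositions; fall back to direct
    extraction for any entity the proposition-based pass misses."""
    direct = extract(sentence)            # extractor on the raw sentence
    via_props = set()
    for prop in decompose(sentence):      # atomic propositions
        via_props |= extract(prop)
    missing = entities(direct) - entities(via_props)
    fallback = {t for t in direct
                if t.subject in missing or t.obj in missing}
    return via_props | fallback


# --- toy stand-ins for the real models (illustration only) ---
SENT = "Marie Curie discovered polonium and founded the Radium Institute"
PROPS = ["Marie Curie discovered polonium",
         "Marie Curie founded the Radium Institute"]
# Pretend the extractor resolves only the simpler clauses correctly,
# missing the 'discovered' relation on the complex raw sentence:
KB = {
    SENT: {Triple("Marie Curie", "founded", "the Radium Institute")},
    PROPS[0]: {Triple("Marie Curie", "discovered", "polonium")},
    PROPS[1]: {Triple("Marie Curie", "founded", "the Radium Institute")},
}

result = extract_with_fallback(SENT,
                               decompose=lambda s: PROPS,
                               extract=lambda s: KB.get(s, set()))
print(sorted(t.relation for t in result))  # → ['discovered', 'founded']
```

In this toy run the proposition pass already covers every entity that direct extraction found, so the fallback set is empty and the result simply gains the relation the extractor missed on the complex sentence; when decomposition drops an entity instead, the corresponding direct triples are re-added.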