UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities

📅 2024-03-07
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Ultra-fine-grained entity set expansion (Ultra-ESE) suffers from semantic ambiguity and cannot model "undesired" semantics when relying solely on positive seed entities. Method: this paper introduces negative seed entities for the first time to explicitly disambiguate highly similar semantic classes. The authors construct UltraWiki, the first ultra-fine-grained benchmark dataset, and design two evaluation frameworks: RetExpan (retrieval-based expansion) and GenExpan (chain-of-thought-enhanced LLM generation). Results: experiments across 236 ultra-fine-grained semantic classes demonstrate significant improvements in expansion accuracy, while the analysis reveals substantial limitations of current large language models on Ultra-ESE, establishing both a new methodological paradigm and a foundational benchmark for future research.

📝 Abstract
Entity Set Expansion (ESE) aims to identify new entities belonging to the same semantic class as a given set of seed entities. Traditional methods primarily relied on positive seed entities to represent a target semantic class, which poses a challenge for the representation of ultra-fine-grained semantic classes. Ultra-fine-grained semantic classes are defined based on fine-grained semantic classes with more specific attribute constraints. Describing them with positive seed entities alone causes two issues: (i) ambiguity among ultra-fine-grained semantic classes, and (ii) inability to define "unwanted" semantics. Due to these inherent shortcomings, previous methods struggle to address ultra-fine-grained ESE (Ultra-ESE). To solve this issue, we first introduce negative seed entities in the inputs, which belong to the same fine-grained semantic class as the positive seed entities but differ in certain attributes. Negative seed entities eliminate semantic ambiguity through the contrast between positive and negative attributes, and they provide a straightforward way to express "unwanted" semantics. To assess model performance in Ultra-ESE, we constructed UltraWiki, the first large-scale dataset tailored for Ultra-ESE. UltraWiki encompasses 236 ultra-fine-grained semantic classes, where each query is represented with 3-5 positive and negative seed entities. A retrieval-based framework, RetExpan, and a generation-based framework, GenExpan, are proposed to comprehensively assess the efficacy of large language models in Ultra-ESE from these two paradigms. Moreover, we devised three strategies to enhance models' comprehension of ultra-fine-grained entity semantics: contrastive learning, retrieval augmentation, and chain-of-thought reasoning. Extensive experiments confirm the effectiveness of our proposed strategies and also reveal that there remains large room for improvement in Ultra-ESE.
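The contrast between positive and negative seeds described above can be sketched as a simple scoring rule: rank a candidate higher when it sits close to the positive seeds and far from the negative seeds in an embedding space. This is an illustrative assumption, not the paper's actual RetExpan implementation; `score_candidates` and the toy vectors are hypothetical.

```python
import numpy as np

def score_candidates(cand_vecs, pos_vecs, neg_vecs):
    """Rank candidates by mean cosine similarity to positive seeds
    minus mean cosine similarity to negative seeds (hypothetical rule)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    cand, pos, neg = map(normalize, (cand_vecs, pos_vecs, neg_vecs))
    # Cosine similarity matrices: (n_cand, n_pos) and (n_cand, n_neg)
    return (cand @ pos.T).mean(axis=1) - (cand @ neg.T).mean(axis=1)

# Toy 2-d embeddings: candidate 0 resembles the positive seeds,
# candidate 1 resembles the negative seed.
pos = np.array([[1.0, 0.0], [0.9, 0.1]])
neg = np.array([[0.0, 1.0]])
cands = np.array([[1.0, 0.1], [0.1, 1.0]])
scores = score_candidates(cands, pos, neg)
```

With these toy vectors, the first candidate receives a higher score than the second, mirroring how negative seeds push away entities that share the fine-grained class but carry the unwanted attribute.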
Problem

Research questions and friction points this paper is trying to address.

Address ambiguity in ultra-fine-grained entity classes
Incorporate negative seed entities to define unwanted semantics
Develop frameworks for ultra-fine-grained entity set expansion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces negative seed entities for clarity
Proposes retrieval and generation-based frameworks
Uses contrastive learning and chain-of-thought reasoning
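For the generation-based side, a chain-of-thought prompt might walk the model from the shared fine-grained class to the distinguishing attribute before listing new entities. The template below is a hedged sketch in the spirit of GenExpan; the exact prompt wording and the `build_cot_prompt` helper are assumptions, not the paper's prompts.

```python
def build_cot_prompt(pos_seeds, neg_seeds, n_new=5):
    """Build a hypothetical chain-of-thought prompt for seed-guided expansion."""
    return (
        "Positive seed entities: " + ", ".join(pos_seeds) + "\n"
        "Negative seed entities: " + ", ".join(neg_seeds) + "\n"
        "Step 1: Infer the fine-grained class shared by all seeds.\n"
        "Step 2: Identify the attribute that separates positive from negative seeds.\n"
        f"Step 3: List {n_new} new entities that match the positive attribute only.\n"
    )

prompt = build_cot_prompt(["entity A", "entity B"], ["entity C"], 3)
```

The intermediate steps make the unwanted attribute explicit before generation, which is the role the abstract assigns to negative seeds.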
Authors
Yangning Li, SIGS, Tsinghua University
Qingsong Lv, Tsinghua University (Computer Science, Machine Learning)
Tianyu Yu, Tsinghua University (multi-modal learning)
Yinghui Li, SIGS, Tsinghua University
Shulin Huang, SIGS, Tsinghua University
Tingwei Lu, SIGS, Tsinghua University
Xuming Hu, Assistant Professor, HKUST(GZ) / HKUST (Natural Language Processing, Large Language Model)
Wenhao Jiang, GML, Tencent, PolyU (Computer Vision, Machine Learning, Foundation Models)
Hai-Tao Zheng, SIGS, Tsinghua University, PengCheng Laboratory
Hui Wang, PengCheng Laboratory