🤖 AI Summary
This work addresses the challenge of automatically discovering interpretable, discriminative global features from unstructured text by proposing a dataset-level prompt optimization approach. It extends prompt learning beyond the instance level to the dataset level for the first time, employing a multi-agent collaborative framework that iteratively generates feature definitions, extracts feature values, and jointly optimizes a shared prompt using feedback from both downstream classification performance and interpretability. Experiments across multiple text classification tasks show that the method automatically produces high-quality, human-understandable feature sets that significantly improve model performance, confirming its effectiveness and generalizability.
📝 Abstract
Feature extraction from unstructured text is a critical step in many downstream classification pipelines, yet current approaches largely rely on hand-crafted prompts or fixed feature schemas. We formulate feature discovery as a dataset-level prompt optimization problem: given a labelled text corpus, the goal is to induce a global set of interpretable and discriminative feature definitions whose realizations optimize a downstream supervised learning objective. To this end, we propose a multi-agent prompt optimization framework in which language-model agents jointly propose feature definitions, extract feature values, and evaluate feature quality using dataset-level performance and interpretability feedback. Instruction prompts are iteratively refined based on this structured feedback, enabling optimization over prompts that induce shared feature sets rather than per-example predictions. This formulation departs from prior prompt optimization methods that rely on per-sample supervision and provides a principled mechanism for automatic feature discovery from unstructured text.
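The propose–extract–evaluate loop described above can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the LLM agents are mocked (the "proposer" enumerates candidate keyword-based feature definitions, the "extractor" realizes them as binary feature values, and the "evaluator" scores the shared feature set by downstream classification accuracy on a tiny labelled corpus). All names (`propose`, `extract`, `evaluate`, `optimize`) are hypothetical.

```python
# Hypothetical sketch of dataset-level feature discovery via search over
# shared feature definitions. Real agents would be LLM calls; here they
# are deterministic stand-ins so the loop structure is visible.
from itertools import combinations

# Toy labelled corpus (1 = positive, 0 = negative).
CORPUS = [
    ("great plot and acting", 1), ("wonderful, great fun", 1),
    ("boring and terrible", 0), ("terrible acting, boring plot", 0),
]

def propose(vocab, k=2):
    """Mock proposer agent: candidate feature sets are k-subsets of keywords."""
    return list(combinations(sorted(vocab), k))

def extract(texts, feature_set):
    """Mock extractor agent: binary 'keyword present' feature values."""
    return [[int(w in t.split()) for w in feature_set] for t in texts]

def evaluate(X, y):
    """Evaluator: accuracy of a nearest-centroid classifier on the features."""
    cents = {}
    for lbl in set(y):
        rows = [x for x, l in zip(X, y) if l == lbl]
        cents[lbl] = [sum(c) / len(rows) for c in zip(*rows)]
    def pred(x):
        return min(cents, key=lambda l: sum((a - b) ** 2
                                            for a, b in zip(x, cents[l])))
    return sum(pred(x) == l for x, l in zip(X, y)) / len(y)

def optimize(corpus):
    """One refinement round: keep the feature set with the best dataset-level
    feedback signal (here, classification accuracy alone)."""
    texts, labels = zip(*corpus)
    vocab = {w for t in texts for w in t.split()}
    best, best_score = None, -1.0
    for fs in propose(vocab):
        score = evaluate(extract(texts, fs), list(labels))
        if score > best_score:
            best, best_score = fs, score
    return best, best_score

features, acc = optimize(CORPUS)
print(features, acc)
```

In the actual framework the feedback would combine performance with an interpretability signal and drive iterative rewriting of the shared instruction prompt, rather than exhaustive search; the sketch only shows the dataset-level (as opposed to per-example) structure of the objective.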