Latent Factor Models Meet Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision

πŸ“… 2025-02-21
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of task-directed latent factor discovery in unsupervised settings. We propose an instruction-driven approach that requires no task-specific annotations: a large language model (LLM) parses natural-language goal instructions to extract fine-grained document attributes; attribute co-occurrence is then estimated across documents, and gradient-based optimization groups co-occurring attributes into interpretable latent factors aligned with the stated goal. Our method achieves the first deep integration of LLM instruction understanding with statistical latent variable modeling, overcoming key limitations of pure LLMs: sensitivity to data noise and bounded domain knowledge. Evaluated on movie recommendation, text-world navigation, and legal document classification, it improves downstream task performance by 5–52%. Human evaluation confirms that the discovered latent factors achieve 1.8× higher interpretability than the best baseline.

πŸ“ Abstract
Instruction-following LLMs have recently allowed systems to discover hidden concepts from a collection of unstructured documents based on a natural language description of the purpose of the discovery (i.e., goal). Still, the quality of the discovered concepts remains mixed, as it depends heavily on the LLM's reasoning ability and drops when the data is noisy or beyond the LLM's knowledge. We present Instruct-LF, a goal-oriented latent factor discovery system that integrates an LLM's instruction-following ability with statistical models to handle large, noisy datasets where LLM reasoning alone falls short. Instruct-LF uses LLMs to propose fine-grained, goal-related properties from documents, estimates their presence across the dataset, and applies gradient-based optimization to uncover hidden factors, where each factor is represented by a cluster of co-occurring properties. We evaluate latent factors produced by Instruct-LF on movie recommendation, text-world navigation, and legal document categorization tasks. These interpretable representations improve downstream task performance by 5–52% over the best baselines and were preferred 1.8 times as often as the best alternative, on average, in human evaluation.
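The second half of this pipeline can be sketched in code. This is a minimal illustration only, not the paper's implementation: the LLM property-proposal step is stubbed out with a fixed binary document-by-property matrix, and the paper's (unspecified) gradient-based objective is replaced here by a simple projected-gradient nonnegative factorization, where each learned factor's top-weighted properties form the cluster of co-occurring properties. All variable names and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed input: a documents-x-properties presence matrix, standing in for
# the output of the LLM property-proposal and presence-estimation steps.
n_docs, n_props, n_factors = 40, 12, 3
X = (rng.random((n_docs, n_props)) < 0.3).astype(float)

# Factorize X ~= D @ F by projected gradient descent on the squared error;
# clipping to [0, inf) keeps both factors nonnegative, so each row of F can
# be read as soft membership of properties in one latent factor.
D = rng.random((n_docs, n_factors)) * 0.1
F = rng.random((n_factors, n_props)) * 0.1
lr = 0.05
initial_loss = float(np.mean((D @ F - X) ** 2))
for _ in range(500):
    R = D @ F - X                      # reconstruction residual
    grad_D = R @ F.T                   # gradient of 0.5*||R||_F^2 w.r.t. D
    grad_F = D.T @ R                   # gradient of 0.5*||R||_F^2 w.r.t. F
    D = np.clip(D - lr * grad_D, 0.0, None)
    F = np.clip(F - lr * grad_F, 0.0, None)
loss = float(np.mean((D @ F - X) ** 2))

def factor_properties(F, k=4):
    """Read out each factor as its top-k highest-weighted property indices."""
    return [np.argsort(-row)[:k].tolist() for row in F]

clusters = factor_properties(F)
```

In the actual system, each cluster returned by `factor_properties` would be a set of co-occurring, goal-related natural-language properties rather than column indices.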
Problem

Research questions and friction points this paper is trying to address.

Goal-conditioned latent factor discovery
Handling noisy datasets beyond LLM knowledge
Improving interpretable representations for downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs with statistical models
Uses gradient-based optimization for factor discovery
Produces interpretable latent factor representations
πŸ”Ž Similar Papers
No similar papers found.