Representative Language Generation

📅 2025-05-27
📈 Citations: 1
Influential: 0
🤖 AI Summary
This paper addresses interrelated challenges in generative modeling: disproportionate representation of groups of interest in model outputs and insufficient diversity. To tackle these, we propose the framework of *representative generation*, which requires the distribution of generated outputs to match the group proportions observed in the training data. We give a formal task definition and introduce a novel combinatorial quantity, the *group closure dimension*, to characterize representative uniform and non-uniform generation. Building upon the generation framework of Kleinberg et al. (2024), as formalized by Li et al. (2024), we combine combinatorial, information-theoretic, and computability arguments to design a non-uniform representative generator. We prove that for countably infinite hypothesis classes and collections of groups, representative generation in the limit is information-theoretically feasible under certain conditions, yet not computable via membership queries alone, providing a rigorous theoretical foundation for fairness-aware generative modeling.

📝 Abstract
We introduce "representative generation," extending the theoretical framework for generation proposed by Kleinberg et al. (2024) and formalized by Li et al. (2024), to additionally address diversity and bias concerns in generative models. Our notion requires outputs of a generative model to proportionally represent groups of interest from the training data. We characterize representative uniform and non-uniform generation, introducing the "group closure dimension" as a key combinatorial quantity. For representative generation in the limit, we analyze both information-theoretic and computational aspects, demonstrating feasibility for countably infinite hypothesis classes and collections of groups under certain conditions, but proving a negative result for computability using only membership queries. This contrasts with Kleinberg et al.'s (2024) positive results for standard generation in the limit. Our findings provide a rigorous foundation for developing more diverse and representative generative models.
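A plausible formalization of the proportionality requirement, sketched here for intuition (the paper's exact definition, e.g. an approximate or in-the-limit version, may differ): given a training sample $S$ and a collection of groups $\mathcal{G}$, a generator with output distribution $P$ is representative if

$$
P(G) \;=\; \frac{|S \cap G|}{|S|} \quad \text{for every } G \in \mathcal{G}.
$$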
Problem

Research questions and friction points this paper is trying to address.

Extends generation framework to address diversity and bias
Ensures outputs proportionally represent training data groups
Analyzes feasibility for infinite hypothesis classes and groups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends generation framework for diversity and bias
Introduces group closure dimension as key quantity
Analyzes information-theoretic and computational feasibility
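The proportionality requirement behind these contributions can be illustrated with a minimal sketch, assuming groups are given as membership predicates over outputs; the function names, the sample data, and the tolerance `tol` are illustrative choices, not from the paper:

```python
def group_proportions(samples, groups):
    """Fraction of samples falling in each (possibly overlapping) group.

    `groups` maps a group name to a membership predicate (hypothetical
    representation; the paper treats groups abstractly as subsets).
    """
    n = len(samples)
    return {name: sum(member(x) for x in samples) / n
            for name, member in groups.items()}

def is_representative(train, generated, groups, tol=0.05):
    """True if generated group proportions match training proportions
    within an illustrative tolerance `tol`."""
    p_train = group_proportions(train, groups)
    p_gen = group_proportions(generated, groups)
    return all(abs(p_train[g] - p_gen[g]) <= tol for g in groups)

# Toy illustration: strings grouped by first letter.
groups = {"starts_a": lambda s: s.startswith("a"),
          "starts_b": lambda s: s.startswith("b")}
train = ["apple"] * 6 + ["banana"] * 4        # 60% / 40%
generated = ["avocado"] * 3 + ["berry"] * 2   # 60% / 40%
print(is_representative(train, generated, groups))  # True
```

This only checks an empirical sample against the training proportions; the paper's results concern when a generator achieving such proportions exists at all, and whether it can be computed from membership queries.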