iBERT: Interpretable Style Embeddings via Sense Decomposition

📅 2025-10-10

📈 Citations: 0

✨ Influential: 0

career value

144K/year

🤖 AI Summary

This work addresses the lack of intrinsic interpretability and fine-grained controllability in text embeddings. We propose a Structured Interpretable Text Encoder (SITE) that represents each token as a sparse, non-negative combination of context-independent semantic vectors, explicitly disentangling stylistic attributes (e.g., formality, misspellings, emoji usage) from semantic content and enabling modular control. Built upon the BERT architecture, SITE incorporates a sparse non-negative mixture mechanism and a hierarchical pooling strategy to jointly support token-level and sentence-level modeling. On the STEL benchmark, SITE achieves an ~8-percentage-point improvement in style representation accuracy over SBERT while maintaining competitive performance on downstream tasks. These results demonstrate SITE’s effectiveness in multi-attribute disentanglement, generalization under mixed supervision, and precise stylistic controllability.

Technology Category

Application Category

📝 Abstract

We present iBERT (interpretable-BERT), an encoder to produce inherently interpretable and controllable embeddings - designed to modularize and expose the discriminative cues present in language, such as stylistic and semantic structure. Each input token is represented as a sparse, non-negative mixture over k context-independent sense vectors, which can be pooled into sentence embeddings or used directly at the token level. This enables modular control over representation, before any decoding or downstream use. To demonstrate our model's interpretability, we evaluate it on a suite of style-focused tasks. On the STEL benchmark, it improves style representation effectiveness by ~8 points over SBERT-style baselines, while maintaining competitive performance on authorship verification. Because each embedding is a structured composition of interpretable senses, we highlight how specific style attributes - such as emoji use, formality, or misspelling can be assigned to specific sense vectors. While our experiments center on style, iBERT is not limited to stylistic modeling. Its structural modularity is designed to interpretably decompose whichever discriminative signals are present in the data - enabling generalization even when supervision blends stylistic and semantic factors.

Problem

Research questions and friction points this paper is trying to address.

Creates interpretable embeddings to reveal stylistic and semantic language cues

Enables modular control over representations before downstream applications

Decomposes discriminative signals to handle blended stylistic and semantic factors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse non-negative mixture over sense vectors

Modular control over representation before decoding

Interpretable decomposition of discriminative signals

🔎 Similar Papers

No similar papers found.