🤖 AI Summary
This work addresses two key challenges in text simplification: the difficulty of controlling textual complexity and the large parameter footprint of existing models. We propose a novel paradigm that models simplification transformations directly in sentence embedding space. Specifically, we employ a small feed-forward neural network to learn the mapping between embeddings of high- and low-complexity sentence pairs within a fixed pre-trained embedding space, and generate simplified text by decoding the predicted target embeddings, avoiding the parameter and computational overhead of conventional Seq2Seq models or large language models. To our knowledge, this is the first empirical demonstration that simplification transformations are learnable in embedding space and generalize well. Our model, with only a few million parameters, performs on par with state-of-the-art methods on English benchmarks. Moreover, it transfers zero-shot to the medical domain (MedEASI) and to multilingual settings (Spanish, German), showing encouraging cross-domain and cross-lingual robustness.
📝 Abstract
Sentence embeddings can be decoded to give approximations of the original texts used to create them. We explore this effect in the context of text simplification, demonstrating that texts reconstructed from embeddings preserve their complexity levels. We experiment with a small feed-forward neural network that effectively learns a transformation between sentence embeddings representing high-complexity and low-complexity texts. We compare against Seq2Seq and LLM-based approaches, showing encouraging results in our much smaller learning setting. Finally, we demonstrate the applicability of our transformation to an unseen simplification dataset (MedEASI), as well as to datasets in languages outside the training data (ES, DE). We conclude that learning transformations in sentence embedding space is a promising direction for future research, with the potential to unlock small but powerful models for text simplification and other natural language generation tasks.