LinguaSynth: Heterogeneous Linguistic Signals for News Classification

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor interpretability, high computational cost, and black-box opacity of deep learning models in news classification, this paper proposes a lightweight, interpretable classification framework based on multi-source linguistic feature fusion. The method systematically integrates five heterogeneous linguistic signals: lexical (part-of-speech tags), syntactic (dependency structures), entity-level (named entity recognition), word-level (Word2Vec), and document-level (Doc2Vec) representations, and empirically validates their complementary contributions to distributional semantics via feature interaction analysis. A transparent logistic regression classifier serves as the prediction module. Evaluated on the 20 Newsgroups dataset, the framework achieves 84.89% accuracy, outperforming the TF-IDF baseline by 3.32%, while maintaining low computational overhead and full model transparency. This work establishes a paradigm for efficient, interpretable NLP, reconciling strong performance with rigorous explainability and practical deployability.
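The fusion idea described above can be sketched with scikit-learn's `FeatureUnion`: several feature views are extracted independently, concatenated, and fed to a single logistic regression. Note this is a minimal illustration, not the paper's implementation; the two TF-IDF views below are stand-ins for the five signal families (POS, dependency, NER, Word2Vec, Doc2Vec), and the toy documents and labels are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Toy corpus (hypothetical; stands in for 20 Newsgroups documents).
docs = [
    "the match ended in a draw",
    "new gpu drivers shipped last week",
    "the home team won the final",
    "firmware updated for the graphics card",
]
labels = [0, 1, 0, 1]  # 0 = sports, 1 = tech

# Two sparse views illustrate the fusion step; LinguaSynth instead
# concatenates lexical, syntactic, entity, word-level, and
# document-level representations before the same kind of classifier.
features = FeatureUnion([
    ("word", TfidfVectorizer()),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])

# Transparent linear classifier on the concatenated feature space.
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(docs, labels)
preds = model.predict(docs)
```

Because the final stage is linear over the concatenated blocks, each block's contribution to a prediction remains directly attributable, which is the property the framework trades depth for.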

📝 Abstract
Deep learning has significantly advanced NLP, but its reliance on large black-box models raises critical interpretability and computational-efficiency concerns. This paper proposes LinguaSynth, a novel text classification framework that strategically integrates five complementary linguistic feature types (lexical, syntactic, entity-level, word-level semantic, and document-level semantic) within a transparent logistic regression model. Unlike transformer-based architectures, LinguaSynth maintains interpretability and computational efficiency, achieving an accuracy of 84.89 percent on the 20 Newsgroups dataset and surpassing a robust TF-IDF baseline by 3.32 percent. Through rigorous feature interaction analysis, we show that syntactic and entity-level signals provide essential disambiguation and effectively complement distributional semantics. LinguaSynth sets a new benchmark for interpretable, resource-efficient NLP models and challenges the prevailing assumption that deep neural networks are necessary for high-performing text classification.
Problem

Research questions and friction points this paper is trying to address.

Improving interpretability in NLP models
Enhancing computational efficiency for text classification
Integrating diverse linguistic signals effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates five complementary linguistic feature types
Uses transparent logistic regression model
Achieves high accuracy without deep neural networks