Efficient Extractive Text Summarization for Online News Articles Using Machine Learning

📅 2025-09-19
🤖 AI Summary
To address information overload in online news, this paper proposes a machine learning-based extractive text summarization method. It uses BERT to obtain sentence-level semantic embeddings and formulates summarization as a binary classification task, comparing logistic regression, feed-forward neural networks, and LSTM models. The LSTM model captures sequential dependencies between sentences and outperforms the Lede-3 baseline and the simpler models in both F1 score and ROUGE-1 on the Cornell Newsroom dataset (1.3 million article-summary pairs). Its accuracy, together with its computational efficiency and simple architecture, makes it practical to deploy. The work targets content management for online news platforms, where automated summaries can improve content accessibility and user engagement.

📝 Abstract
In the age of information overload, content management for online news articles relies on efficient summarization to enhance accessibility and user engagement. This article addresses the challenge of extractive text summarization by employing advanced machine learning techniques to generate concise and coherent summaries while preserving the original meaning. Using the Cornell Newsroom dataset, comprising 1.3 million article-summary pairs, we developed a pipeline leveraging BERT embeddings to transform textual data into numerical representations. By framing the task as a binary classification problem, we explored various models, including logistic regression, feed-forward neural networks, and long short-term memory (LSTM) networks. Our findings demonstrate that LSTM networks, with their ability to capture sequential dependencies, outperform baseline methods like Lede-3 and simpler models in F1 score and ROUGE-1 metrics. This study underscores the potential of automated summarization in improving content management systems for online news platforms, enabling more efficient content organization and enhanced user experiences.
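The Lede-3 baseline and the ROUGE-1 metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the regex sentence splitter, the lowercase alphanumeric tokenizer, and the function names `split_sentences`, `lede_3`, and `rouge_1` are assumptions made for illustration, and published ROUGE implementations usually add options such as stemming.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def lede_3(article):
    """Lede-3 baseline: use the first three sentences of the article as the summary."""
    return " ".join(split_sentences(article)[:3])


def tokens(text):
    """Lowercase alphanumeric tokens (assumption: no stemming, no stopword removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate, reference):
    """Return (precision, recall, F1) based on clipped unigram overlap."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # each unigram counted at most min(cand, ref) times
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Scoring `lede_3(article)` against the reference summary with `rouge_1` gives the kind of baseline number that a learned sentence selector would need to beat.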
Problem

Research questions and friction points this paper is trying to address.

Efficient extractive summarization for online news articles
Using machine learning to generate concise coherent summaries
Addressing information overload with automated content management
Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT embeddings for text representation
LSTM networks for sequential dependencies
Binary classification for extractive summarization
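Framing extractive summarization as binary classification means each sentence receives a label (1 = include in the summary, 0 = leave out), and a model predicts that label from the sentence representation. The sketch below shows only the framing, under stated assumptions: the word-coverage labeling rule, the 0.5 thresholds, and the names `label_sentences`, `select_summary`, and `f1_score` are hypothetical and are not taken from the paper, which does not describe its labeling procedure here. The probabilities passed to `select_summary` stand in for the output of any classifier, such as an LSTM over BERT sentence embeddings.

```python
import re


def word_set(text):
    """Distinct lowercase alphanumeric tokens in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def label_sentences(sentences, summary, threshold=0.5):
    """Assumed labeling rule: 1 if at least `threshold` of a sentence's
    distinct words also occur in the reference summary, else 0."""
    summary_words = word_set(summary)
    labels = []
    for sentence in sentences:
        words = word_set(sentence)
        coverage = len(words & summary_words) / len(words) if words else 0.0
        labels.append(1 if coverage >= threshold else 0)
    return labels


def select_summary(sentences, probabilities, cutoff=0.5):
    """Keep, in article order, the sentences whose predicted probability reaches the cutoff."""
    return [s for s, p in zip(sentences, probabilities) if p >= cutoff]


def f1_score(gold, predicted):
    """Binary F1 over sentence labels, with 1 as the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the selected sentences stay in article order, the extracted summary preserves the original wording and sequence, which is what distinguishes extractive from abstractive summarization.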
Sajib Biswas
Doctoral Candidate, Florida State University
Deep Learning, Large Language Model, Software Reverse Engineering
Milon Biswas
Towson University, 8000 York Rd, Towson, MD, 21252, United States
Arunima Mandal
Florida State University, 222 S Copeland St, Tallahassee, FL, 32306, United States
Fatema Tabassum Liza
Florida State University, 222 S Copeland St, Tallahassee, FL, 32306, United States
Joy Sarker
Bangladesh Agricultural University, BAU Main Road, Mymensingh, 2202, Bangladesh