Page Classification for Print Imaging Pipeline

📅 2017-01-29

🏛️ Color Imaging: Displaying, Processing, Hardcopy, and Applications

📈 Citations: 1

✨ Influential: 0

career value

177K/year

🤖 AI Summary

Existing coarse-grained page classification—categorizing document pages into only three types (text-only, image-only, or mixed text-and-image)—severely limits imaging quality optimization in digital copiers/printers. To address this, we propose a fine-grained, imaging-pipeline-oriented five-class page classification scheme: text-only, image-only, mixed text-and-image, receipts, and highlighted text. This work is the first to formally define and recognize the receipt and highlighted-text categories. We design four domain-specific handcrafted features to enhance discriminability for complex layouts and localized annotations. A multi-dimensional feature fusion classifier is built upon SVM, integrating statistical features, edge distribution, connected-component properties, and regional contrast. Evaluated on a real-world printed image dataset, our method achieves a mean classification accuracy of 98.2%, significantly outperforming baseline three-class approaches. The solution has been successfully deployed in commercial imaging systems.

Technology Category

Application Category

📝 Abstract

Digital copiers and printers are widely used nowadays. One of the most important things people care about is copying or printing quality. In order to improve it, we previously came up with an SVM-based classification method to classify images with only text, only pictures or a mixture of both based on the fact that modern copiers and printers are equipped with processing pipelines designed specifically for different kinds of images. However, in some other applications, we need to distinguish more than three classes. In this paper, we develop a more advanced SVM-based classification method using four more new features to classify 5 types of images which are text, picture, mixed, receipt and highlight.

Problem

Research questions and friction points this paper is trying to address.

Classify 5 image types for printers

Enhance SVM method with new features

Improve copier and printer output quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Advanced SVM-based classification method

Uses four additional new features

Classifies five distinct image types

🔎 Similar Papers

Chronicling Germany: An Annotated Historical Newspaper Dataset