Page Classification for Print Imaging Pipeline

📅 2017-01-29
🏛️ Color Imaging: Displaying, Processing, Hardcopy, and Applications
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing coarse-grained page classification—categorizing document pages into only three types (text-only, image-only, or mixed text-and-image)—severely limits imaging quality optimization in digital copiers/printers. To address this, we propose a fine-grained, imaging-pipeline-oriented five-class page classification scheme: text-only, image-only, mixed text-and-image, receipts, and highlighted text. This work is the first to formally define and recognize the receipt and highlighted-text categories. We design four domain-specific handcrafted features to enhance discriminability for complex layouts and localized annotations. A multi-dimensional feature fusion classifier is built upon SVM, integrating statistical features, edge distribution, connected-component properties, and regional contrast. Evaluated on a real-world printed image dataset, our method achieves a mean classification accuracy of 98.2%, significantly outperforming baseline three-class approaches. The solution has been successfully deployed in commercial imaging systems.

📝 Abstract
Digital copiers and printers are now in widespread use, and output quality is one of the properties users care about most. Modern copiers and printers are equipped with processing pipelines designed for specific kinds of images, so to improve quality we previously developed an SVM-based method that classifies a page image as text only, picture only, or a mixture of both. Some applications, however, need to distinguish more than three classes. In this paper, we develop a more advanced SVM-based classification method that uses four additional features to classify five types of images: text, picture, mixed, receipt, and highlight.
Problem

Research questions and friction points this paper is trying to address.

Classify 5 image types for printers
Enhance SVM method with new features
Improve copier and printer output quality
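
The reason classification matters here is that each page type is meant to go through its own processing pipeline. The sketch below illustrates that routing step only. The handler functions, the `PIPELINES` table, and `route_page` are hypothetical names introduced for illustration; the abstract does not describe what each pipeline actually does.

```python
from typing import Callable, Dict


# Placeholder handlers (hypothetical): a real pipeline would transform the
# page image; here each one just tags the page id so routing is visible.
def text_pipeline(page_id: str) -> str:
    return f"text-pipeline({page_id})"


def picture_pipeline(page_id: str) -> str:
    return f"picture-pipeline({page_id})"


def mixed_pipeline(page_id: str) -> str:
    return f"mixed-pipeline({page_id})"


def receipt_pipeline(page_id: str) -> str:
    return f"receipt-pipeline({page_id})"


def highlight_pipeline(page_id: str) -> str:
    return f"highlight-pipeline({page_id})"


# One entry per class named in the abstract.
PIPELINES: Dict[str, Callable[[str], str]] = {
    "text": text_pipeline,
    "picture": picture_pipeline,
    "mixed": mixed_pipeline,
    "receipt": receipt_pipeline,
    "highlight": highlight_pipeline,
}


def route_page(page_id: str, predicted_class: str) -> str:
    """Send a page to the pipeline that matches its predicted class."""
    try:
        handler = PIPELINES[predicted_class]
    except KeyError:
        raise ValueError(f"unknown page class: {predicted_class!r}")
    return handler(page_id)
```

With a three-class classifier, receipts and highlighted pages would have to fall into one of the first three entries; the five-class scheme gives them their own.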
Innovation

Methods, ideas, or system contributions that make the work stand out.

Advanced SVM-based classification method
Uses four additional new features
Classifies five distinct image types
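
As a rough illustration of five-class classification with an SVM, the sketch below trains a one-vs-rest linear SVM by subgradient descent on the hinge loss, using only the Python standard library. It is not the authors' implementation: the listing does not specify the kernel, the multi-class strategy, or the feature definitions, so the 5-D feature vectors here are synthetic and every function name and hyperparameter is an assumption.

```python
import random

CLASSES = ["text", "picture", "mixed", "receipt", "highlight"]


def make_synthetic_pages(n_per_class, seed):
    """Return (features, labels): toy 5-D vectors, one noisy cluster per class."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(len(CLASSES)):
        for _ in range(n_per_class):
            # Hypothetical features: dimension k is large for class k.
            X.append([(3.0 if d == k else 0.0) + rng.gauss(0.0, 0.3)
                      for d in range(len(CLASSES))])
            y.append(k)
    return X, y


def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_one_vs_rest(X, y, epochs=30, lr=0.01, lam=0.001, seed=0):
    """Train one linear SVM per class (that class vs. all others)."""
    rng = random.Random(seed)
    dim = len(X[0])
    weights = [[0.0] * dim for _ in CLASSES]
    biases = [0.0] * len(CLASSES)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            for k in range(len(CLASSES)):
                target = 1.0 if y[i] == k else -1.0
                w = weights[k]
                # Hinge loss is active only when the margin is below 1.
                violated = target * score(w, biases[k], X[i]) < 1.0
                for d in range(dim):
                    grad = lam * w[d] - (target * X[i][d] if violated else 0.0)
                    w[d] -= lr * grad
                if violated:
                    biases[k] += lr * target
    return weights, biases


def predict(weights, biases, x):
    """Pick the class whose one-vs-rest SVM gives the highest score."""
    scores = [score(w, b, x) for w, b in zip(weights, biases)]
    return CLASSES[scores.index(max(scores))]


def accuracy(weights, biases, X, y):
    hits = sum(predict(weights, biases, x) == CLASSES[label]
               for x, label in zip(X, y))
    return hits / len(X)


if __name__ == "__main__":
    X_train, y_train = make_synthetic_pages(20, seed=0)
    X_test, y_test = make_synthetic_pages(10, seed=1)
    weights, biases = train_one_vs_rest(X_train, y_train)
    print("held-out accuracy:", accuracy(weights, biases, X_test, y_test))
```

The synthetic clusters are deliberately well separated, so the accuracy printed here says nothing about performance on real scanned pages. In practice the feature vectors would come from image analysis of each page, and a library SVM would likely replace the hand-written training loop.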
Shaoyuan Xu
Amazon Inc., Senior Applied Scientist
NLP, LLMs, Multi-modality Learning, Machine Learning, Computer Vision
Cheng Lu
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906, U.S.A.
Mark Q. Shaw
HP Inc., Boise, ID 83714, U.S.A.
Péter Bauer
HP Inc., Boise, ID 83714, U.S.A.
J. Allebach
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906, U.S.A.