EDC: Equation Discovery for Classification

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing equation discovery (ED) techniques primarily focus on regression tasks, leaving binary classification comparatively underexplored. This paper introduces EDC, an ED framework tailored for binary classification and designed to learn concise, interpretable analytic functions that characterize the location and shape of decision boundaries. EDC uses a lightweight, extensible grammar that supports the integration of domain knowledge, and it optimizes model structure and parameters jointly. Its models take an additive form, with linear, quadratic, exponential, and interaction terms, and it uses evolutionary algorithms to search the space of expressions. Experiments on synthetic and real-world datasets show that EDC outperforms existing ED-based classification methods and achieves performance comparable to state-of-the-art black-box models, while remaining interpretable, structurally simple, and resistant to overfitting.

📝 Abstract
Equation Discovery (ED) techniques have shown considerable success in regression tasks, where they are used to discover concise and interpretable models (Symbolic Regression). In this paper, we propose a new ED-based binary classification framework. Our proposed method, EDC, finds analytical functions of manageable size that specify the location and shape of the decision boundary. In extensive experiments on artificial and real-life data, we demonstrate that EDC is able to discover both the structure of the target equation and the values of its parameters, outperforming the current state-of-the-art ED-based classification methods and achieving performance comparable to the overall state of the art in binary classification. We suggest a grammar of modest complexity that appears to work well on the tested datasets, but argue that the exact grammar, and thus the complexity of the models, is configurable; in particular, domain-specific expressions can be included in the pattern language where required. The presented grammar consists of a series of summands (additive terms) that include linear, quadratic, and exponential terms, as well as products of two features (producing hyperbolic curves ideal for capturing XOR-like dependencies). The experiments demonstrate that this grammar allows fairly flexible decision boundaries while not being so rich as to cause overfitting.
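To make the grammar concrete, here is a minimal sketch of how an additive decision function built from such summands could be evaluated. It is not the authors' implementation: the term encoding (`kind`, `idx`, `coef`), the function names, the plain `exp(x)` form of the exponential term, and the rule "class 1 when the value is positive" are all assumptions chosen for illustration. The example shows that a single product term is enough to separate XOR-like data.

```python
import math

def decision_value(x, terms, bias=0.0):
    """Sum the additive terms for one sample x; the boundary is where this equals 0.

    Each term is a hypothetical (kind, idx, coef) triple, not the paper's encoding.
    """
    total = bias
    for kind, idx, coef in terms:
        if kind == "linear":
            total += coef * x[idx]
        elif kind == "quadratic":
            total += coef * x[idx] ** 2
        elif kind == "exp":
            total += coef * math.exp(x[idx])
        elif kind == "product":
            i, j = idx  # product of two features
            total += coef * x[i] * x[j]
        else:
            raise ValueError(f"unknown term kind: {kind}")
    return total

def predict(x, terms, bias=0.0):
    """Assumed convention: class 1 on the positive side of the boundary."""
    return 1 if decision_value(x, terms, bias) > 0 else 0

# XOR-like data: label 1 when the two features have opposite signs.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [0, 0, 1, 1]

# A single product term, f(x) = -x0 * x1, has the hyperbola-shaped level sets
# mentioned in the abstract and separates all four points.
xor_terms = [("product", (0, 1), -1.0)]
predictions = [predict(p, xor_terms) for p in points]
```

No purely linear function can classify these four points correctly, which is why the product summand earns its place in the grammar.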
Problem

Research questions and friction points this paper is trying to address.

Extends equation discovery from regression to binary classification tasks
Finds analytical functions defining decision boundary location and shape
Discovers both equation structure and parameter values for classification
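As an illustration of what "structure and parameter values" could mean here, one way to write a decision function assembled from the term types named in the abstract is shown below. The symbols ($\theta_0$, $a_i$, $b_i$, $c_i$, $d_{ij}$), the unscaled exponent, and the thresholding convention are assumptions for this sketch, not notation taken from the paper.

```latex
f(\mathbf{x}) = \theta_0
  + \sum_{i \in S_1} a_i x_i
  + \sum_{i \in S_2} b_i x_i^2
  + \sum_{i \in S_3} c_i e^{x_i}
  + \sum_{(i,j) \in S_4} d_{ij}\, x_i x_j,
\qquad
\hat{y} = \mathbb{1}\left[ f(\mathbf{x}) > 0 \right]
```

Under this reading, the structure is the choice of index sets $S_1, \dots, S_4$ (which summands appear at all), the parameters are the coefficients, and the decision boundary is the set of points where $f(\mathbf{x}) = 0$.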
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses equation discovery for binary classification
Finds analytical functions defining decision boundaries
Employs configurable grammar with domain-specific expressions
Guus Toussaint
LIACS, Leiden University, Leiden, Netherlands
Arno Knobbe
LIACS, Leiden University
Data Mining, Data Science, Sports