🤖 AI Summary
This study asks whether near-optimal information aggregation is achievable for binary classification on a directed acyclic graph in which each agent observes only a subset of the feature columns of a shared dataset. In the distributed logistic regression protocol it analyzes, agents act sequentially: each combines the predictions received from its parent nodes with its local features, trains a local logistic model, and forwards its own predictions downstream. The work extends the information-aggregation analysis of Kearns, Roth, and Ryu from linear regression to logistic regression and identifies network depth as a fundamental performance bottleneck. The authors prove an excess loss upper bound of $O(M/\sqrt{D})$ when every $M$ consecutive nodes collectively cover all features, and construct instances with excess loss of at least $\Omega(k/D)$, where $k$ denotes the feature dimension and $D$ the path depth.
📝 Abstract
We study networked binary classification on a directed acyclic graph (DAG) where each agent observes only a subset of the feature columns of a shared dataset. Agents act sequentially along the DAG: each receives prediction columns from its parents (if any), augments its local features with these columns, fits a logistic predictor by minimizing binary cross-entropy (BCE), and forwards its prediction column to its outgoing neighbors. We ask whether this sequential distributed training procedure achieves information aggregation, meaning that some agent attains small excess loss compared to the best logistic predictor trained with access to all feature columns.
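The protocol described above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: each agent forwards its fitted logits (in line with the sequential logit-passing protocol the paper analyzes), each local model is fit by full-batch gradient descent with an intercept, and all losses are measured on the training sample. The helper names (`fit_logistic`, `run_protocol`, `bce_from_logits`) and the toy instance are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_from_logits(logits, y):
    """Mean binary cross-entropy, computed as log(1 + e^z) - y*z in stable form."""
    total = 0.0
    for z, t in zip(logits, y):
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - t * z
    return total / len(y)


def fit_logistic(rows, y, steps=300, lr=0.5):
    """Full-batch gradient descent on mean BCE (assumed solver); returns (w, b)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for row, t in zip(rows, y):
            r = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - t
            for j in range(d):
                gw[j] += r * row[j]
            gb += r
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def run_protocol(X, y, order, parents, views):
    """Sequential logit passing: each agent fits on its own feature columns plus
    its parents' logit columns, then forwards its own logit column downstream."""
    logits = {}
    for v in order:  # `order` must be a topological order of the DAG
        rows = [[x[j] for j in views[v]] + [logits[p][i] for p in parents[v]]
                for i, x in enumerate(X)]
        w, b = fit_logistic(rows, y)
        logits[v] = [sum(wi * ri for wi, ri in zip(w, r)) + b for r in rows]
    return logits


# Hypothetical toy instance: k = 2 features and a path of 3 agents whose views
# alternate, so every 2 consecutive agents jointly see all features.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(1.5 * a - 1.5 * b) else 0 for a, b in X]
order = [0, 1, 2]
parents = {0: [], 1: [0], 2: [1]}
views = {0: [0], 1: [1], 2: [0]}

logits = run_protocol(X, y, order, parents, views)
per_agent = [bce_from_logits(logits[v], y) for v in order]

# Reference: a single logistic model trained with access to all feature columns.
w_full, b_full = fit_logistic(X, y)
full_loss = bce_from_logits(
    [sum(wi * xi for wi, xi in zip(w_full, x)) + b_full for x in X], y)

for v, loss in zip(order, per_agent):
    print(f"agent {v}: train BCE {loss:.4f}, excess {loss - full_loss:+.4f}")
print(f"full-feature reference: {full_loss:.4f}")
```

Because every parent logit is an affine function of the features, each agent's predictor stays inside the full-feature logistic class, so at exact optima its excess loss is non-negative. With a fixed number of gradient steps, the printed excess values also include some optimization error.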
This question was studied for linear regression under squared loss by Kearns, Roth, and Ryu (SODA 2026). Extending their guarantees to classification is nontrivial because their analysis relies on quadratic structure that does not directly transfer to BCE with a logistic link. We analyze the resulting sequential logit-passing protocol and prove (i) an excess loss upper bound of $O(M/\sqrt{D})$ on depth-$D$ paths, under the condition that every contiguous subsequence of $M$ agents collectively observes all features, and (ii) a close lower bound: there exist instances with excess loss at least $\Omega(k/D)$, where $k$ is the dimension of the feature space. Together, these results identify network depth as a fundamental bottleneck for information aggregation in networked logistic regression.
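The coverage condition in (i) can be checked mechanically for a path. The sketch below is our own illustration, assuming each agent's view is given as a list of feature indices in `range(k)`; `covers_every_window` and `smallest_window` are hypothetical helper names, not taken from the paper.

```python
def covers_every_window(views, k, M):
    """True if every run of M consecutive agents on a path jointly observes
    all k feature columns; views[i] lists the columns agent i sees."""
    if M < 1 or M > len(views):
        return False
    required = set(range(k))
    for start in range(len(views) - M + 1):
        seen = set().union(*views[start:start + M])
        if not required.issubset(seen):
            return False
    return True


def smallest_window(views, k):
    """Smallest M for which the condition holds, or None if even the whole
    path misses some feature. Valid M are upward closed, so a scan suffices."""
    for M in range(1, len(views) + 1):
        if covers_every_window(views, k, M):
            return M
    return None


# Round-robin assignment of single features: agent i sees column i % k.
k, D = 3, 9
round_robin = [[i % k] for i in range(D)]
print(smallest_window(round_robin, k))  # 3

# If one feature appears only once, at the end of the path, M must reach D.
late = [[0], [1], [0], [1], [0], [1], [2]]
print(smallest_window(late, 3))  # 7
```

For the round-robin assignment, $M = k$, so the upper bound in (i) specializes to $O(k/\sqrt{D})$, which can be set directly beside the $\Omega(k/D)$ rate in (ii).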