🤖 AI Summary
This paper addresses the problem of learning the topological structure of directed acyclic graph (DAG) Gaussian graphical models under the equal-variance assumption. Existing methods require a number of samples that grows polynomially with the condition number of the covariance matrix, which limits their applicability in high-dimensional settings. To overcome this bottleneck, we propose an algorithm whose sample complexity is independent of the condition number and depends only on the maximum in-degree $d$ and $\log n$ (where $n$ is the number of nodes). An information-theoretic lower bound matches this upper bound up to a factor of $d$, so the characterization is nearly tight. Under the additional assumption that all variances are bounded, a polynomial-time variant also recovers the graph, at the cost of an extra polynomial dependence of the sample complexity on $d$. The approach leverages structural properties of equal-variance DAGs, integrating statistical hypothesis testing with graph-structure recovery techniques. Experiments on synthetic data confirm the theoretical predictions and demonstrate substantial improvements over baseline methods that are sensitive to the condition number.
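The structural property that makes equal-variance DAGs identifiable can be illustrated at the population level. In the linear model $X = B^\top X + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$, a node whose parents have all been conditioned on has conditional variance exactly $\sigma^2$, while any other remaining node has strictly larger conditional variance. Repeatedly picking the node with the smallest conditional variance therefore yields a valid topological order. The sketch below is a classical greedy ordering idea, not the algorithm proposed in this paper. It assumes the exact covariance is known (no sampling), and the function names and the `B[i, j]` edge convention are hypothetical.

```python
import numpy as np

def equal_variance_covariance(B, sigma2=1.0):
    """Covariance of X = B^T X + eps with eps ~ N(0, sigma2 * I).

    Hypothetical convention: B[i, j] != 0 encodes an edge i -> j.
    """
    n = B.shape[0]
    A = np.linalg.inv(np.eye(n) - B.T)  # X = A @ eps
    return sigma2 * A @ A.T

def conditional_variance(Sigma, j, S):
    """Var(X_j | X_S), computed as a Schur complement of Sigma."""
    if not S:
        return Sigma[j, j]
    S = list(S)
    s_jS = Sigma[j, S]
    s_SS = Sigma[np.ix_(S, S)]
    return Sigma[j, j] - s_jS @ np.linalg.solve(s_SS, s_jS)

def topological_order(Sigma):
    """Greedy top-down ordering: repeatedly select the remaining node with
    the smallest conditional variance given the nodes already ordered."""
    n = Sigma.shape[0]
    order, remaining = [], set(range(n))
    while remaining:
        j = min(remaining, key=lambda k: conditional_variance(Sigma, k, order))
        order.append(j)
        remaining.remove(j)
    return order

# Hypothetical example: edges 2 -> 0 -> 1 with unit noise variance.
B = np.zeros((3, 3))
B[2, 0], B[0, 1] = 0.8, 1.5
Sigma = equal_variance_covariance(B)
print(topological_order(Sigma))  # [2, 0, 1]
```

With data, these conditional variances would have to be estimated from samples; the sketch sidesteps that entirely and only shows why the ordering is identifiable.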
📝 Abstract
We study the problem of learning the topology of a directed Gaussian Graphical Model under the equal-variance assumption, where the graph has $n$ nodes and maximum in-degree $d$. Prior work has established that $O(d \log n)$ samples are sufficient for this task. However, an important factor that is often overlooked in these analyses is the dependence on the condition number of the covariance matrix of the model. Indeed, all algorithms from prior work require a number of samples that grows polynomially with this condition number. In many cases this is unsatisfactory, since the condition number could grow polynomially with $n$, rendering these prior approaches impractical in high-dimensional settings. In this work, we provide an algorithm that recovers the underlying graph and prove that the number of samples required is independent of the condition number. Furthermore, we establish lower bounds that nearly match the upper bound up to a $d$-factor, thus providing an almost tight characterization of the true sample complexity of the problem. Moreover, under a further assumption that all the variances of the variables are bounded, we design a polynomial-time algorithm that recovers the underlying graph, at the cost of an additional polynomial dependence of the sample complexity on $d$. We complement our theoretical findings with simulations on synthetic datasets that confirm our predictions.
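To see how the condition number can grow polynomially with $n$ even when $d$ is fixed, consider a hypothetical chain $X_1 \to X_2 \to \cdots \to X_n$ with unit edge weights and unit noise variance, so $d = 1$. Its covariance is $\Sigma_{ij} = \min(i, j)$, and the sketch below, an illustration chosen here rather than an example taken from the paper, computes its condition number numerically for increasing $n$. The helper name `chain_covariance` and the unit-weight setting are assumptions.

```python
import numpy as np

def chain_covariance(n, weight=1.0, sigma2=1.0):
    """Covariance of the equal-variance chain X_1 -> ... -> X_n, where
    X_i = weight * X_{i-1} + eps_i and Var(eps_i) = sigma2 for every i."""
    B = np.zeros((n, n))
    for i in range(n - 1):
        B[i, i + 1] = weight  # edge i -> i+1
    A = np.linalg.inv(np.eye(n) - B.T)  # X = A @ eps
    return sigma2 * A @ A.T

# Doubling n roughly quadruples the condition number, i.e. it grows like n^2.
for n in (10, 20, 40, 80):
    kappa = np.linalg.cond(chain_covariance(n))
    print(n, round(float(kappa), 1), round(float(kappa) / n**2, 3))
```

The ratio $\kappa / n^2$ stays roughly constant, so any sample bound that scales polynomially with the condition number would scale polynomially with $n$ on this graph, despite its in-degree being $1$.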