On adaptivity and minimax optimality of two-sided nearest neighbors

📅 2024-11-20
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper addresses matrix completion under high missingness using a non-smooth, nonlinear latent factor model, where the nonlinearity belongs to a Hölder class that need not be Lipschitz continuous. To overcome the limitations of existing methods under low smoothness and severe missingness, the authors propose a two-sided nearest-neighbor (NN) estimator. Theoretically, the method adapts to the unknown Hölder exponent: its mean squared error (MSE) upper bound depends on the true smoothness level without requiring prior knowledge of the Hölder parameter, and under regularity conditions it matches the rate of an oracle that knows both the row and column latent factors, even when many entries are missing deterministically. Numerical simulations and an analysis of the HeartSteps mobile health dataset demonstrate its robustness and performance in highly missing, strongly nonlinear settings.

📝 Abstract
Nearest neighbor (NN) algorithms have been extensively used for missing data problems in recommender systems and sequential decision-making systems. Prior theoretical analysis has established favorable guarantees for NN when the underlying data is sufficiently smooth and the missingness probabilities are lower bounded. Here we analyze NN with non-smooth non-linear functions with vast amounts of missingness. In particular, we consider matrix completion settings where the entries of the underlying matrix follow a latent non-linear factor model, with the non-linearity belonging to a Hölder function class that is less smooth than Lipschitz. Our results establish the following favorable properties for a suitable two-sided NN: (1) the mean squared error (MSE) of NN adapts to the smoothness of the non-linearity, (2) under certain regularity conditions, the NN error rate matches the rate obtained by an oracle equipped with the knowledge of both the row and column latent factors, and finally (3) NN's MSE is non-trivial for a wide range of settings even when several matrix entries might be missing deterministically. We support our theoretical findings via extensive numerical simulations and a case study with data from a mobile health study, HeartSteps.
Problem

Research questions and friction points this paper is trying to address.

Addresses matrix completion with non-smooth nonlinear factor models
Analyzes nearest neighbors under extreme deterministic missing data
Establishes adaptive minimax optimality for two-sided NN method
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-sided nearest neighbors algorithm
Adapts to non-smooth Hölder functions
Handles deterministic missing entries
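The two-sided NN idea can be sketched as follows: to fill entry (i, j), average the observed entries whose row is close to row i *and* whose column is close to column j, with closeness measured over commonly observed entries. This is a minimal illustrative sketch, not the paper's exact estimator; the function name and the radius parameters `eta_row`/`eta_col` are assumptions, and the paper's analysis covers the choice of these radii.

```python
import numpy as np

def two_sided_nn(X, mask, eta_row, eta_col):
    """Impute missing entries of X (mask True where observed) by averaging
    entries lying in rows within eta_row of the target row and columns
    within eta_col of the target column (distances = mean squared
    difference over commonly observed entries)."""
    n, m = X.shape

    def row_dist(a, b):
        common = mask[a] & mask[b]          # columns observed in both rows
        if not common.any():
            return np.inf
        return np.mean((X[a, common] - X[b, common]) ** 2)

    def col_dist(a, b):
        common = mask[:, a] & mask[:, b]    # rows observed in both columns
        if not common.any():
            return np.inf
        return np.mean((X[common, a] - X[common, b]) ** 2)

    X_hat = X.astype(float).copy()
    for i in range(n):
        near_rows = [r for r in range(n) if row_dist(i, r) <= eta_row]
        for j in range(m):
            if mask[i, j]:
                continue                    # keep observed entries as-is
            near_cols = [c for c in range(m) if col_dist(j, c) <= eta_col]
            vals = [X[r, c] for r in near_rows for c in near_cols
                    if mask[r, c]]
            if vals:                        # average over the 2-sided block
                X_hat[i, j] = np.mean(vals)
    return X_hat
```

On a noiseless rank-1 matrix with one entry hidden, the duplicate row and column let the estimator recover the missing value exactly; with noise, averaging over the two-sided neighborhood trades bias (radius size) against variance (neighborhood count), which is where the adaptivity analysis enters.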