🤖 AI Summary
Problem: Many counterintuitive phenomena in deep learning, such as double descent, grokking, and the lottery ticket hypothesis, are studied in isolation; this work often lacks empirical validation in real-world applications and does little to advance a unified theoretical understanding. Method: This paper proposes a “phenomenon-driven theory construction” paradigm that treats such phenomena as diagnostic probes for testing and refining general theories rather than as standalone objects of study. Through systematic literature analysis, critical reflection on research methodologies, and cross-case comparative studies, we identify two deficiencies in current phenomenon research: insufficient empirical grounding and poor theoretical integration. Contribution/Results: We establish a methodological principle that phenomenon research must explicitly serve the evolution of general theory; introduce a practical evaluation framework for phenomenon significance based on reproducibility, cross-task generalizability, and theoretical embeddability; and provide actionable pathways to enhance both the efficiency and practical relevance of foundational deep learning research.
📝 Abstract
Developing a better understanding of surprising or counterintuitive phenomena has constituted a significant portion of deep learning research in recent years. These include double descent, grokking, and the lottery ticket hypothesis, among many others. Works in this area often develop ad hoc hypotheses that attempt to explain these observed phenomena on an isolated, case-by-case basis. This position paper asserts that, in many prominent cases, there is little evidence that these phenomena appear in real-world applications, and that these efforts may be inefficient in driving progress in the broader field. Consequently, we argue against viewing them as isolated puzzles that require bespoke resolutions or explanations. Nevertheless, we suggest that deep learning phenomena still offer research value by providing unique settings in which we can refine our broad explanatory theories of more general deep learning principles. This position is reinforced by analyzing the research outcomes of several prominent examples of these phenomena from the recent literature. We revisit the research community's current norms for approaching these problems and propose practical recommendations for future research, aiming to ensure that progress on deep learning phenomena is well aligned with the ultimate pragmatic goal of progress in the broader field of deep learning.