🤖 AI Summary
Users often struggle to find theorems in mathlib4 because they are unfamiliar with its naming conventions and documentation strings. To address this, the paper introduces LeanSearch, the first semantic search engine tailored to the Lean mathematical library. Methodologically: (1) we construct the first evaluable benchmark for semantic search that maps natural-language queries to formal theorems; (2) we propose a joint encoding strategy for theorems and their associated docstrings to build a customized dense semantic index over mathlib4; and (3) we implement an end-to-end embedding-based retrieval system. Our contributions are a reproducible, evaluable semantic search infrastructure for mathlib4; a publicly accessible service (leansearch.net); and improved retrieval accuracy and onboarding experience for novice users, which in turn supports collaborative formalization within the Lean community.
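The pipeline described above (jointly encode each theorem with its docstring, index the resulting vectors, then rank by similarity to an informal query) can be sketched as follows. This is a minimal illustration under stated assumptions, not the LeanSearch implementation: a toy bag-of-words vector stands in for the neural embedding model, the three-entry corpus is hypothetical (the theorem names exist in mathlib4, but the statements are simplified and the docstrings are written for this example), and the helper names `embed`, `build_index`, and `search` are invented here.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus: (name, simplified formal statement, illustrative docstring).
CORPUS = [
    ("Nat.add_comm", "theorem Nat.add_comm (n m : ℕ) : n + m = m + n",
     "Addition of natural numbers is commutative."),
    ("Nat.mul_comm", "theorem Nat.mul_comm (n m : ℕ) : n * m = m * n",
     "Multiplication of natural numbers is commutative."),
    ("abs_nonneg", "theorem abs_nonneg (a : α) : 0 ≤ |a|",
     "The absolute value of an element is nonnegative."),
]


def embed(text):
    """Toy stand-in for a neural encoder: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_index(corpus):
    """Joint encoding: each entry is embedded as statement + docstring."""
    return [(name, embed(statement + " " + doc)) for name, statement, doc in corpus]


def search(index, query, k=2):
    """Return the names of the k entries most similar to the informal query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


INDEX = build_index(CORPUS)
```

Calling `search(INDEX, "absolute value is nonnegative", k=1)` ranks the `abs_nonneg` entry first, because the query shares words with its docstring even though it shares none with the formal name. A real system would replace `embed` with a learned text-embedding model and the sorted scan with an approximate nearest-neighbour index.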
📝 Abstract
The interactive theorem prover Lean enables the verification of formal mathematical proofs and is backed by a growing community. Central to this ecosystem is its mathematical library, mathlib4, which lays the groundwork for formalizing an expanding range of mathematical theories. However, searching for theorems in mathlib4 can be challenging: to search successfully, users often need to be familiar with its naming conventions or documentation strings. A semantic search engine that can be used easily by people with varying familiarity with mathlib4 is therefore valuable. In this paper, we present a semantic search engine (https://leansearch.net/) for mathlib4 that accepts informal queries and returns the relevant theorems. We also establish a benchmark for assessing the performance of various search engines for mathlib4.
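To make the benchmarking idea concrete, the sketch below scores a search engine's ranked output against labelled relevant theorems. The abstract does not name the metrics used, so recall@k is an assumption chosen for illustration; the `RUNS` data and the function names are likewise hypothetical and do not come from the paper's benchmark.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant theorems that appear in the top k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked results, relevant set) pairs, one per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical output of a search engine on two queries, with labelled answers.
RUNS = [
    (["Nat.mul_comm", "Nat.add_comm", "abs_nonneg"], {"Nat.mul_comm"}),
    (["Nat.add_comm", "abs_nonneg", "Nat.mul_comm"], {"abs_nonneg"}),
]
```

With this data, the first query has its relevant theorem at rank 1 and the second at rank 2, so the mean recall rises from k = 1 to k = 2. Running the same labelled queries through several engines and comparing these scores is one simple way to compare them.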