Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

📅 2025-10-27
🤖 AI Summary
This study addresses the challenge of generating molecular structures for drug design that are syntactically valid, chemically feasible, and interpretable. It proposes the Symbolic Neural Generator (SNG) framework, which combines a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM): symbolic specifications learned from a small set of instances, sometimes just one, constrain the conditional information given to the neural generator, and any generated instance that violates the specification is rejected. SNG is given a formal semantics built from base and fibre partially ordered sets, with a probabilistic extension in which an SNG results from a search over a weighted partial ordering. Its output is a triple: a symbolic description, a set of generated molecules that satisfy it, and an associated weight. On benchmark problems with well-understood targets, performance is statistically comparable to state-of-the-art methods; on exploratory problems with poorly understood targets, generated molecules show binding affinities on par with leading clinical candidates. Domain experts found the symbolic specifications useful as preliminary filters and identified several generated molecules as viable for synthesis and wet-lab testing. The core contribution is the integration of formal symbolic reasoning into neural generation for molecular design, aimed at improving reliability and interpretability.

📝 Abstract
We investigate a relatively underexplored class of hybrid neurosymbolic models integrating symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In \textit{Symbolic Neural Generators} (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural-based generator, which rejects any instance violating the symbolic specification. Like other neurosymbolic approaches, SNG exploits the complementary strengths of symbolic and neural methods. The outcome of an SNG is a triple $(H, X, W)$, where $H$ is a symbolic description of feasible instances constructed from data, $X$ a set of generated new instances that satisfy the description, and $W$ an associated weight. We introduce a semantics for such systems, based on the construction of appropriate \textit{base} and \textit{fibre} partially-ordered sets combined into an overall partial order, and outline a probabilistic extension relevant to practical applications. In this extension, SNGs result from searching over a weighted partial ordering. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, with several generated molecules identified as viable for synthesis and wet-lab testing.
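
The generate-and-reject loop and the $(H, X, W)$ triple described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `Constraint`, `SNGResult`, and `run_sng` are hypothetical, the constraints are toy string checks rather than real chemistry, the proposal list stands in for LLM output, and using the acceptance rate as $W$ is a placeholder, since the abstract does not say how the weight is computed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical encoding: an instance is a plain string (a SMILES-like token
# sequence) and a specification is a list of named boolean constraints.
Constraint = Tuple[str, Callable[[str], bool]]


@dataclass
class SNGResult:
    H: List[str]  # names of the constraints that make up the description
    X: List[str]  # proposals that satisfy every constraint
    W: float      # placeholder weight: fraction of proposals accepted


def satisfies(instance: str, spec: List[Constraint]) -> bool:
    """Return True only if the instance meets every constraint in the spec."""
    return all(check(instance) for _, check in spec)


def run_sng(spec: List[Constraint], proposals: Iterable[str]) -> SNGResult:
    """Keep proposals that satisfy the spec and reject all others."""
    proposals = list(proposals)
    accepted = [p for p in proposals if satisfies(p, spec)]
    weight = len(accepted) / len(proposals) if proposals else 0.0
    return SNGResult(H=[name for name, _ in spec], X=accepted, W=weight)


# Toy specification (assumed, not taken from the paper).
spec: List[Constraint] = [
    ("mentions_nitrogen", lambda s: "n" in s.lower()),
    ("at_most_12_chars", lambda s: len(s) <= 12),
]

# Stand-in for strings proposed by a neural generator.
proposals = ["CCN", "CCO", "c1ccncc1", "CCCCCCCCCCCCCN"]

result = run_sng(spec, proposals)
print(result.H, result.X, result.W)
```

In this sketch the symbolic side only filters after generation; the abstract also describes the specification constraining the conditional information passed to the generator, which a fuller version would model by building the prompt from `spec`.
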
Problem

Research questions and friction points this paper is trying to address.

Combines symbolic learning with neural reasoning for data generation
Generates drug molecules satisfying formal symbolic specifications
Addresses early-stage drug discovery for poorly understood targets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid neurosymbolic models combine symbolic learning with neural reasoning
Symbolic learners derive logical constraints from minimal data instances
Neural generators produce data adhering to formal symbolic specifications
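
How a symbolic learner might derive candidate constraints from a single instance can be sketched with a propositional toy. This is not the paper's ILP system: the feature names are invented, a specification is assumed to be a plain conjunction of feature literals, and the subset relation is used as a simple generality ordering to show how candidate specifications form a partial order.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, Set

# Hypothetical encoding: a specification is a conjunction of feature literals.
Spec = FrozenSet[str]


def covers(spec: Spec, instance: Set[str]) -> bool:
    """A conjunctive spec covers an instance if every literal holds for it."""
    return spec <= instance


def more_general(a: Spec, b: Spec) -> bool:
    """a is at least as general as b when a's literals are a subset of b's."""
    return a <= b


def generalisations(seed: Set[str]) -> Iterator[Spec]:
    """Yield every spec covering the seed, most specific first."""
    literals = sorted(seed)
    for size in range(len(literals), -1, -1):
        for subset in combinations(literals, size):
            yield frozenset(subset)


# One seed instance described by invented features.
seed = {"has_ring", "has_amide", "mol_weight_lt_500"}
candidates = list(generalisations(seed))

print(len(candidates))         # number of candidate specifications
print(sorted(candidates[0]))   # most specific: all literals of the seed
print(sorted(candidates[-1]))  # most general: the empty conjunction
```

A learner would then pick among these candidates, for instance by scoring them, which is where a weighted ordering of the kind mentioned in the abstract would come in.
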