🤖 AI Summary
Addressing key challenges in molecular odor prediction—including non-smooth target modeling, strong coupling of mixed-dimensional features, and severe label imbalance—this work proposes a harmonic modulation feature mapping mechanism and a cheminformatics-guided dynamic weighting loss function. The former employs frequency-adaptive mapping to decouple heterogeneous multi-source features, enhancing structural representation independence; the latter dynamically reweights samples based on label co-occurrence priors to mitigate long-tail distribution bias. Additionally, we integrate feature importance learning, molecular ensemble optimization, and interpretability constraints to improve model robustness and mapping transparency. Extensive experiments across multiple benchmark datasets demonstrate that our method significantly outperforms existing state-of-the-art models, achieving up to a 12.6% improvement in F1-score for minority odor classes. Moreover, the framework enables mechanistically interpretable inference from molecular structure to odor descriptors.
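The reweighting idea can be illustrated with a much simpler stand-in. The summary does not give the loss formula, so the sketch below is not the paper's method: it swaps the co-occurrence prior for plain inverse label frequency, and the function names `inverse_frequency_weights` and `weighted_bce` and the `smoothing` parameter are hypothetical. It only shows how rarer odor labels can be given a larger share of a multi-label binary cross-entropy loss.

```python
import numpy as np


def inverse_frequency_weights(labels, smoothing=1.0):
    """Per-label weights that grow as a label becomes rarer.

    labels: (n_samples, n_labels) binary matrix.
    Assumption: inverse label frequency stands in for the paper's
    co-occurrence-based prior, which is not specified in this text.
    """
    labels = np.asarray(labels, dtype=float)
    n_samples = labels.shape[0]
    positives = labels.sum(axis=0)
    weights = (n_samples + smoothing) / (positives + smoothing)
    # Normalize so the average weight across labels is 1.
    return weights / weights.mean()


def weighted_bce(probs, labels, weights, eps=1e-7):
    """Multi-label binary cross-entropy with per-label positive weights."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    pos_term = -weights * labels * np.log(probs)
    neg_term = -(1.0 - labels) * np.log(1.0 - probs)
    return float((pos_term + neg_term).mean())


# Toy data: label 0 is common, label 1 is rare.
toy_labels = np.array([[1, 0], [1, 0], [1, 1], [1, 0]])
toy_probs = np.full(toy_labels.shape, 0.5)
toy_weights = inverse_frequency_weights(toy_labels)
toy_loss = weighted_bce(toy_probs, toy_labels, toy_weights)
```

In this toy setup the rare label receives the larger weight, so a missed positive for it costs more than a missed positive for the common label.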
📝 Abstract
Molecular odor prediction has broad potential in fields such as chemistry, pharmaceuticals, and environmental science, where it can speed up the design of new materials and improve environmental monitoring. However, current methods face two main challenges. First, existing models struggle with non-smooth objective functions and the complexity of mixed feature dimensions. Second, datasets suffer from severe label imbalance, which hampers model training, particularly for minority class labels. To address these issues, we introduce a novel feature mapping method and a molecular ensemble optimization loss function. By incorporating feature importance learning and frequency modulation, our model adaptively adjusts the contribution of each feature and captures the intricate relationship between molecular structures and odor descriptors. The feature mapping preserves feature independence, while frequency modulation lets the model use molecular features more efficiently. Furthermore, the proposed loss function dynamically adjusts label weights, improves structural consistency, and strengthens label correlations, addressing both data imbalance and label co-occurrence. Experimental results show that our method significantly improves the accuracy of molecular odor prediction across various deep learning models, demonstrating its potential for molecular structure representation and chemoinformatics.
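One way to picture a feature mapping that combines frequency modulation with learned feature importance is sketched below. The abstract does not state the actual formulation, so everything here is an assumption for illustration: the name `harmonic_feature_map`, the sine/cosine form, the per-feature frequencies `freqs`, and the softmax over `importance_logits` are all hypothetical. Each feature is transformed on its own, with no mixing across features, which is one simple reading of "preserves feature independence".

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def harmonic_feature_map(x, freqs, importance_logits):
    """Map each feature to a (sin, cos) pair scaled by its importance.

    x: (n_samples, n_features) molecular descriptors.
    freqs: (n_features,) per-feature frequencies (hypothetical parameters).
    importance_logits: (n_features,) scores turned into weights via softmax.
    Returns an array of shape (n_samples, 2 * n_features).
    """
    x = np.asarray(x, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    importance = softmax(np.asarray(importance_logits, dtype=float))
    phase = x * freqs  # elementwise: feature j only ever sees freqs[j]
    return np.concatenate(
        [importance * np.sin(phase), importance * np.cos(phase)], axis=1
    )


# Toy check: zero inputs with equal importance logits.
mapped = harmonic_feature_map(
    np.zeros((3, 4)), freqs=np.ones(4), importance_logits=np.zeros(4)
)
```

In a trained model `freqs` and `importance_logits` would be learned parameters; here they are fixed arrays so that the mapping itself is easy to inspect.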