Open Access

Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey

Peng Cheng National Laboratory, Shenzhen 518000, China, and also with School of Future Technology, South China University of Technology, Guangzhou 511442, China
Peng Cheng National Laboratory, Shenzhen 518000, China, and also with School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
Peng Cheng National Laboratory, Shenzhen 518000, China

Abstract

The precise prediction of molecular properties is essential for advances in drug development, particularly virtual screening and compound optimization. Numerous recently introduced deep learning-based methods have shown remarkable potential for Molecular Property Prediction (MPP), especially in improving accuracy and yielding insights into molecular structure. Yet two critical questions arise: does integrating domain knowledge augment the accuracy of molecular property prediction, and does multi-modal data fusion yield more precise results than single-source methods? To explore these questions, we comprehensively review and quantitatively analyze recent deep learning methods across various benchmarks. We find that integrating molecular domain knowledge significantly improves MPP for both regression and classification tasks. Specifically, regression improvements, measured by reductions in Root Mean Square Error (RMSE), reach up to 4.0%, while classification improvements, measured by the Area Under the Receiver Operating Characteristic curve (ROC-AUC), reach up to 1.7%. We also find that, as measured by ROC-AUC, augmenting 2D graphs with 3D information improves classification performance by up to 13.2%, and that enriching 2D graphs with 1D SMILES boosts multi-modal learning performance on regression tasks by up to 9.1%. These two consolidated insights offer crucial guidance for future advances in drug discovery.
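The percentage figures above are relative changes in each metric between a baseline model and its knowledge- or modality-enhanced counterpart. A minimal sketch of that computation is shown below; the numeric scores are illustrative placeholders, not values taken from the survey's benchmarks.

```python
def relative_rmse_reduction(rmse_baseline: float, rmse_enhanced: float) -> float:
    """Percentage reduction in RMSE (lower RMSE is better, so a drop is a gain)."""
    return 100.0 * (rmse_baseline - rmse_enhanced) / rmse_baseline

def relative_auc_gain(auc_baseline: float, auc_enhanced: float) -> float:
    """Percentage gain in ROC-AUC (higher AUC is better)."""
    return 100.0 * (auc_enhanced - auc_baseline) / auc_baseline

# Hypothetical baseline vs. enhanced scores, chosen to mirror the
# magnitudes quoted in the abstract:
print(round(relative_rmse_reduction(0.800, 0.768), 1))  # → 4.0
print(round(relative_auc_gain(0.820, 0.834), 1))        # → 1.7
```

Reporting relative rather than absolute deltas lets improvements be compared across datasets whose metric scales differ (e.g., solubility RMSE in log-mol/L vs. toxicity ROC-AUC).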

References

[1]

J. Shen and C. A Nicolaou, Molecular property prediction: Recent trends in the era of artificial intelligence, Drug Discov Today Technol., vol. 32–33, pp. 29–36, 2019.

[2]

Z. Li, M. Jiang, S. Wang, and S. Zhang, Deep learning methods for molecular representation and property prediction, Drug Discov. Today, vol. 27, no. 12, p. 103373, 2022.

[3]
H. Ma, C. Yan, Y. Guo, S. Wang, Y. Wang, H. Sun, and J. Huang, Improving molecular property prediction on limited data with deep multi-label learning, in Proc. 2020 IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM ), Seoul, Republic of Korea, 2020, pp. 2779–2784.
[4]

X. Lin, Z. Quan, Z. J. Wang, H. Huang, and X. Zeng, A novel molecular representation with BiGRU neural networks for learning atom, Brief. Bioinform., vol. 21, no. 6, pp. 2099–2111, 2020.

[5]

Q. Lv, G. Chen, L. Zhao, W. Zhong, and C. Y. C. Chen, Mol2Context-vec: Learning molecular representation from context awareness for drug discovery, Brief. Bioinform., vol. 22, no. 6, p. bbab317, 2021.

[6]

S. Han, H. Fu, Y. Wu, G. Zhao, Z. Song, F. Huang, Z. Zhang, S. Liu, and W. Zhang, HimGNN: A novel hierarchical molecular graph representation learning framework for property prediction, Brief. Bioinform., vol. 24, no. 5, p. bbad305, 2023.

[7]

G. Bouritsas, F. Frasca, S. Zafeiriou, and M. M. Bronstein, Improving graph neural network expressivity via subgraph isomorphism counting, IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 1, pp. 657–668, 2023.

[8]
Y. Song, S. Zheng, Z. Niu, Z. H. Fu, Y. Lu, and Y. Yang, Communicative representation learning on attributed molecular graphs, in Proc. 29 th Int. Joint Conf. Artificial Intelligence, Yokohama, Japan, 2020, pp. 2831–2838.
[9]
H. Li, D. Zhao, and J. Zeng, KPGT: Knowledge-guided pre-training of graph transformer for molecular property prediction, in Proc. 28 th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, Washington, DC, USA, 2022, pp. 857–867.
[10]

Z. Xiong, D. Wang, X. Liu, F. Zhong, X. Wan, X. Li, Z. Li, X. Luo, K. Chen, H. Jiang, et al., Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism, J. Med. Chem., vol. 63, no. 16, pp. 8749–8760, 2020.

[11]
Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang, and J. Huang, Self-supervised graph transformer on large-scale molecular data, in Proc. 34 th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 12559–12571.
[12]

J. Ross, B. Belgodere, V. Chenthamarakshan, I. Padhi, Y. Mroueh, and P. Das, Large-scale chemical language representations capture molecular structure and properties, Nat. Mach. Intell., vol. 4, no. 12, pp. 1256–1264, 2022.

[13]
S. Yin and G. Zhong, LGI-GT: Graph transformers with local and global operators interleaving, in Proc. 32 nd Int. Joint Conf. Artificial Intelligence, Macao, China, 2023, pp. 4504–4512.
[14]
S. Luo, T. Chen, Y. Xu, S. Zheng, T. Y. Liu, L. Wang, and D. He, One transformer can understand both 2D & 3D molecular data, arXiv preprint arXiv: 2210.01765, 2023.
[15]

X. Zeng, H. Xiang, L. Yu, J. Wang, K. Li, R. Nussinov, and F. Cheng, Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework, Nat. Mach. Intell., vol. 4, no. 11, pp. 1004–1016, 2022.

[16]

J. H. Chen and Y. J. Tseng, Different molecular enumeration influences in deep learning: An example using aqueous solubility, Brief. Bioinform., vol. 22, no. 3, p. bbaa092, 2021.

[17]

S. Liu, J. Li, K. C. Bennett, B. Ganoe, T. Stauch, M. Head-Gordon, A. Hexemer, D. Ushizima, and T. Head-Gordon, Multiresolution 3D-DenseNet for chemical shift prediction in NMR crystallography, J. Phys. Chem. Lett., vol. 10, no. 16, p. 4558–4565, 2019.

[18]
S. Liu, H. Wang, W. Liu, J. Lasenby, H. Guo, and J. Tang, Pre-training molecular graph representation with 3D geometry, arXiv preprint arXiv: 2110.07728, 2022.
[19]
S. Li, J. Zhou, T. Xu, D. Dou, and H. Xiong, GeomGCL: Geometric graph contrastive learning for molecular property prediction, in Proc. 36 th AAAI Conf. Artificial Intelligence, Virtual Event, 2022, pp. 4541–4549.
[20]
J. Zhu, Y. Xia, L. Wu, S. Xie, T. Qin, W. Zhou, H. Li, and T. Y. Liu, Unified 2D and 3D pre-training of molecular representations, in Proc. 28 th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, Washington, DC, USA, 2022, pp. 2626–2636.
[21]
Z. Guo, W. Yu, C. Zhang, M. Jiang, and N. V. Chawla, GraSeq: Graph and sequence fusion learning for molecular property prediction, in Proc. 29 th ACM Int. Conf. Information & Knowledge Management, Virtual Event, 2020, pp. 435–443.
[22]

Y. Wang, J. Wang, Z. Cao, and A. B. Farimani, Molecular contrastive learning of representations via graph neural networks, Nat. Mach. Intell., vol. 4, no. 3, pp. 279–287, 2022.

[23]

Y. Fang, Q. Zhang, N. Zhang, Z. Chen, X. Zhuang, X. Shao, X. Fan, and H. Chen, Knowledge graph-enhanced molecular contrastive learning with functional prompt, Nat. Mach. Intell., vol. 5, no. 5, pp. 542–553, 2023.

[24]

H. Li, R. Zhang, Y. Min, D. Ma, D. Zhao, and J. Zeng, A knowledge-guided pre-training framework for improving molecular representation learning, Nat. Commun., vol. 14, no. 1, p. 7568, 2023.

[25]
Z. Hao, C. Lu, Z. Huang, H. Wang, Z. Hu, Q. Liu, E. Chen, and C. Lee, ASGN: An active semi-supervised graph neural network for molecular property prediction, in Proc. 26 th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, Virtual Event, 2020, pp. 731–752.
[26]
F. Y. Sun, J. Hoffmann, V. Verma, and J. Tang, InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization, arXiv preprint arXiv: 1908.01000, 2020.
[27]

D. Zhang, W. Feng, Y. Wang, Z. Qi, Y. Shan, and J. Tang, DropConn: Dropout connection based random GNNs for molecular property prediction, IEEE Trans. Knowl. Data Eng., vol. 36, no. 2, pp. 518–529, 2024.

[28]
Y. Sun, Y. Chen, W. Ma, W. Huang, K. Liu, Z. Ma, W. Y. Ma, and Y. Lan, PEMP: Leveraging physics properties to enhance molecular property prediction, in Proc. 31 st ACM Int. Conf. Information & Knowledge Management, Atlanta, GA, USA, 2022, pp. 3505–3513.
[29]
W. Chen, A. Tripp, and J. M. Hernández-Lobato, Meta-learning adaptive deep kernel Gaussian processes for molecular property prediction, arXiv preprint arXiv: 2205.02708, 2023.
[30]
X. Zhuang, Q. Zhang, B. Wu, K. Ding, Y. Fang, and H. Chen, Graph sampling-based meta-learning for molecular property prediction, arXiv preprint arXiv: 2306.16780, 2023.
[31]

S. Biswas, Y. Chung, J. Ramirez, H. Wu, and W. H. Green, Predicting critical properties and acentric factors of fluids using multitask machine learning, J. Chem. Inf. Model., vol. 63, no. 15, pp. 4574–4588, 2023.

[32]

Z. Tan, Y. Li, W. Shi, and S. Yang, A multitask approach to learn molecular properties, J. Chem. Inf. Model., vol. 61, no. 8, pp. 3824–3834, 2021.

[33]

Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande, MoleculeNet: A benchmark for molecular machine learning, Chem. Sci., vol. 9, no. 2, pp. 513–530, 2018.

[34]

D. Weininger, SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, J. Chem. Inf. Comput. Sci., vol. 28, no. 1, pp. 31–36, 1988.

[35]
D. Weininger, A. Weininger, and J. L. Weininger, SMILES. 2. Algorithm for generation of unique smiles notation, J. Chem. Inf. Comput. Sci., vol. 29, no. 2, pp. 97–101, 1989.
[36]
D. Weininger, SMILES. 3. DEPICT. Graphical depiction of chemical structures, J. Chem. Inf. Comput. Sci., vol. 30, no. 3, pp. 237–243, 1990.
[37]

D. Rogers and M. Hahn, Extended-connectivity fingerprints, J. Chem. Inf. Model., vol. 50, no. 5, pp. 742–754, 2010.

[38]

J. L. Durant, B. A. Leland, D. R. Henry, and J. G. Nourse, Reoptimization of mdl keys for use in drug discovery, J. Chem. Inf. Comput. Sci., vol. 42, no. 6, pp. 1273–1280, 2002.

[39]

M. Krenn, F. Häse, A. Nigam, P. Friederich, and A. Aspuru-Guzik, Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation, Mach. Learn.: Sci. Technol., vol. 1, no. 4, p. 045024, 2020.

[40]
A. D. McNaught and A. Wilkinson, Compendium of Chemical Terminology, 2nd ed. Oxford, UK: Blackwell Science, 1997.
[41]

S. R. Heller, A. McNaught, I. Pletnev, S. Stein, and D. Tchekhovskoi, InChi, the IUPAC international chemical identifier, J. Cheminform., vol. 7, no. 1, p. 23, 2015.

[42]
G. Landrum, RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling, Greg Landrum, vol. 8, p. 31, 2013.
[43]

W. L. DeLano, PyMOL: An open-source molecular graphics tool, CCP4 Newsl. Protein Crystallogr, vol. 40, no. 1, pp. 82–92, 2002.

[44]

J. Sunseri and D. R. Koes, Libmolgrid: Graphics processing unit accelerated molecular gridding for deep learning applications, J. Chem. Inf. Model., vol. 60, no. 3, pp. 1079–1084, 2020.

[45]

J. Degen, C. Wegscheid-Gerlach, A. Zaliani, and M. Rarey, On the art of compiling and using ‘drug-like’ chemical fragment spaces, ChemMedChem, vol. 3, no. 10, pp. 1503–1507, 2008.

[46]
X. Q. Lewell, D. B. Judd, S. P. Watson, and M. M. Hann, RECAP-retrosynthetic combinatorial analysis procedure: A powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry, J. Chem. Inf. Comput. Sci., vol. 38, no. 3, pp. 511–522, 1998.
[47]

G. W. Bemis and M. A. Murcko, The properties of known drugs. 1. Molecular frameworks, J. Med. Chem., vol. 39, no. 15, pp. 2887–2893, 1996.

[48]

T. Liu, M. Naderi, C. Alvin, S. Mukhopadhyay, and M. Brylinski, Break down in order to build up: Decomposing small molecules for fragment-based drug design with eMolFrag, J. Chem. Inf. Model., vol. 57, no. 4, pp. 627–631, 2017.

[49]
F. Kruger, N. Stiefl, and G. A. Landrum, rdScaffoldNetwork: The scaffold network implementation in RDKit, J. Chem. Inf. Model., vol. 60, no. 7, pp. 3331–3335, 2020.
[50]

C. K. Wu, X. C. Zhang, Z. J. Yang, A. P. Lu, T. J. Hou, and D. S. Cao, Learning to SMILES: Ban-based strategies to improve latent representation learning from molecules, Brief. Bioinform., vol. 22, no. 6, p. bbab327, 2021.

[51]

K. Yang, K. Swanson, W. Jin, C. Coley, P. Eiden, H. Gao, A. Guzman-Perez, T. Hopper, B. Kelley, M. Mathea, et al., Analyzing learned molecular representations for property prediction, J. Chem. Inf. Model., vol. 59, no. 8, pp. 3370–3388, 2019.

[52]

Y. Ji, G. Wan, Y. Zhan, and B. Du, Metapath-fused heterogeneous graph network for molecular property prediction, Inf. Sci., vol. 629, pp. 155–168, 2023.

[53]

X. Fang, L. Liu, J. Lei, D. He, S. Zhang, J. Zhou, F. Wang, H. Wu, and H. Wang, Geometry-enhanced molecular representation learning for property prediction, Nat. Mach. Intell., vol. 4, no. 2, pp. 127–134, 2022.

[54]
G. Zhou, Z. Gao, Q. Ding, H. Zheng, H. Xu, Z. Wei, L. Zhang, and G. Ke, Uni-Mol: A universal 3D molecular representation learning framework, chemRxiv. doi: 10.26434/chemrxiv-2022-jjm0j-v4.
[55]
S. Chithrananda, G. Grand, and B. Ramsundar, ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction, arXiv preprint arXiv: 2010.09885, 2020.
[56]
A. Yüksel, E. Ulusoy, A. Ünlü, and T. Doǧan, SELFormer: Molecular representation learning via SELFIES language models, Mach. Learn. : Sci. Technol., vol. 4, no. 2, p. 025035, 2023.
[57]
X. C. Zhang, J. C. Yi, G. P. Yang, C. K. Wu, T. J. Hou, and D. S. Cao, ABC-Net: A divide-and-conquer based deep learning architecture for SMILES recognition from molecular images, Brief. Bioinform., vol. 23, no. 2, p. bbac033, 2022.
[58]

S. Liu, W. Nie, C. Wang, J. Lu, Z. Qiao, L. Liu, J. Tang, C. Xiao, and A. Anandkumar, Multi-modal molecule structure–text model for text-based retrieval and editing, Nat. Mach. Intell., vol. 5, no. 12, pp. 1447–1457, 2023.

[59]
P. Liu, X. Qiu, X. Chen, S. Wu, and X. Huang, Multi-timescale long short-term memory neural network for modelling sentences and documents, in Proc. 2015 Conf. Empirical Methods in Natural Language Processing, Lisbon, Portugal, 2015, pp. 2326–2335.
[60]
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, Gated feedback recurrent neural networks, in Proc. 32 nd Int. Conf. Int. Conf. Machine Learning, Lille, France, 2015, pp. 2067–2075.
[61]

A. L. Nazarova, L. Yang, K. Liu, A. Mishra, R. K. Kalia, K. I. Nomura, A. Nakano, P. Vashishta, and P. Rajak, Dielectric polymer property prediction using recurrent neural networks with optimizations, J. Chem. Inf. Model., vol. 61, no. 5, pp. 2175–2186, 2021.

[62]

Z. Wang, Y. Su, W. Shen, S. Jin, J. H. Clark, J. Ren, and X. Zhang, Predictive deep learning models for environmental properties: The direct calculation of octanol–water partition coefficients from molecular graphs, Green Chem., vol. 21, no. 16, pp. 4555–4565, 2019.

[63]

M. Withnall, E. Lindelöf, O. Engkvist, and H. Chen, Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction, J. Cheminform., vol. 12, no. 1, p. 1, 2020.

[64]

P. Li, Y. Li, C. Y. Hsieh, S. Zhang, X. Liu, H. Liu, S. Song, and X. Yao, TrimNet: Learning molecular representation from triplet messages for biomedicine, Brief. Bioinform., vol. 22, no. 4, p. bbaa266, 2021.

[65]
X. Zhang, C. Chen, Z. Meng, Z. Yang, H. Jiang, and X. Cui, CoAtGIN: Marrying convolution and attention for graph-based molecule property prediction, in Proc. 2022 IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM ), Las Vegas, NV, USA, 2022, pp. 374–379.
[66]

X. Fan, M. Gong, Y. Wu, A. K. Qin, and Y. Xie, Propagation enhanced neural message passing for graph representation learning, IEEE Trans. Knowl. Data Eng., vol. 35, no. 2, pp. 1952–1964, 2023.

[67]

Y. Li, P. Li, X. Yang, C. Y. Hsieh, S. Zhang, X. Wang, R. Lu, H. Liu, and X. Yao, Introducing block design in graph neural networks for molecular properties prediction, Chem. Eng. J., vol. 414, p. 128817, 2021.

[68]
H. Ma, Y. Bian, Y. Rong, W. Huang, T. Xu, W. Xie, G. Ye, and J. Huang, Multi-view graph neural networks for molecular property prediction, arXiv preprint arXiv: 2005.13607, 2020.
[69]

X. Liu, X. Wang, J. Wu, and K. Xia, Hypergraph-based persistent cohomology (HPC) for molecular representations in drug design, Brief. Bioinform., vol. 22, no. 5, p. bbaa411, 2021.

[70]
J. Feng, Z. Wang, Y. Li, B. Ding, Z. Wei, and H. Xu, MGMAE: Molecular representation learning by reconstructing heterogeneous graphs with a high mask ratio, in Proc. 31 st ACM Int. Conf. Information & Knowledge Management, Atlanta, GA, USA, 2022, pp. 509–519.
[71]

T. Hasebe, Knowledge-embedded message-passing neural networks: Improving molecular property prediction with human knowledge, ACS Omega, vol. 6, no. 42, pp. 27955–27967, 2021.

[72]
S. Yang, Z. Li, G. Song, and L. Cai, Deep molecular representation learning via fusing physical and chemical information, in Proc. Annu. Conf. Neural Information Processing Systems, Virtual Event, 2021, pp. 16346–16357.
[73]

X. Zang, X. Zhao, and B. Tang, Hierarchical molecular graph self-supervised learning for property prediction, Commun. Chem., vol. 6, no. 1, p. 34, 2023.

[74]

N. Liu, S. Jian, D. Li, Y. Zhang, Z. Lai, and H. Xu, Hierarchical adaptive pooling by capturing high-order dependency for graph representation learning, IEEE Trans. Knowl. Data Eng., vol. 35, no. 4, pp. 3952–3965, 2023.

[75]

J. Gao, J. Gao, X. Ying, M. Lu, and J. Wang, Higher-order interaction goes neural: A substructure assembling graph attention network for graph classification, IEEE Trans. Knowl. Data Eng., vol. 35, no. 2, pp. 1594–1608, 2023.

[76]

X. B. Ye, Q. Guan, W. Luo, L. Fang, Z. R. Lai, and J. Wang, Molecular substructure graph attention network for molecular property identification in drug discovery, Pattern Recog., vol. 128, p. 108659, 2022.

[77]

W. Zhu, Y. Zhang, D. Zhao, J. Xu, and L. Wang, HiGNN: A hierarchical informative graph neural network for molecular property prediction equipped with feature-wise attention, J. Chem. Inf. Model., vol. 63, no. 1, pp. 43–55, 2023.

[78]
C. Lu, Q. Liu, C. Wang, Z. Huang, P. Lin, and L. He, Molecular property prediction: A multilevel quantum interactions modeling perspective, in Proc. 33 rd AAAI Conf. Artificial Intelligence, Honolulu, HI, USA, 2019, pp. 1052–1060.
[79]
M. Fey, J. G. Yuen, and F. Weichert, Hierarchical inter-message passing for learning on molecular graphs, arXiv preprint arXiv: 2006.12179, 2020.
[80]
F. Wu, D. Radev, and S. Z. Li, Molformer: Motif-based transformer on 3D heterogeneous molecular graphs, in Proc. 37 th AAAI Conf. Artificial Intelligence, Washington, DC, USA, 2023, pp. 5312–5320.
[81]
F. B. Fuchs, D. E. Worrall, V. Fischer, and M. Welling, SE(3)-transformers: 3D roto-translation equivariant attention networks, arXiv preprint arXiv: 2006.10503, 2020.
[82]
K. T. Schütt, O. T. Unke, and M. Gastegger, Equivariant message passing for the prediction of tensorial properties and molecular spectra, arXiv preprint arXiv: 2102.03150, 2021.
[83]
J. Brandstetter, R. Hesselink, E. van der Pol, E. J. Bekkers, and M. Welling, Geometric and physical quantities improve E(3) equivariant message passing, arXiv preprint arXiv: 2110.02905, 2022.
[84]
J. Gasteiger, F. Becker, and S. Günnemann, GemNet: Universal directional graph neural networks for molecules, arXiv preprint arXiv: 2106.08903, 2022.
[85]
J. Gasteiger, S. Giri, J. T. Margraf, and S. Günnemann, Fast and uncertainty-aware directional message passing for non-equilibrium molecules, arXiv preprint arXiv: 2011.14115, 2022.
[86]
M. Shuaibi, A. Kolluru, A. Das, A. Grover, A. Sriram, Z. Ulissi, and C. L. Zitnick, Rotation invariant graph neural networks using spin convolutions, arXiv preprint arXiv: 2106.09575, 2021.
[87]
S. Wang, Y. Guo, Y. Wang, H. Sun, and J. Huang, SMILES-BERT: Large scale unsupervised pre-training for molecular property prediction, in Proc. 10 th ACM Int. Conf. Bioinformatics, Computational Biology and Health Informatics, Niagara Falls, NY, USA, 2019, pp. 429–436.
[88]
Y. Wang, X. Chen, Y. Min, and J. Wu, MolCloze: A unified cloze-style self-supervised molecular structure learning model for chemical property prediction, in Proc. 2021 IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM ), Houston, TX, USA, 2021, pp. 2896–2903.
[89]

B. Winter, C. Winter, J. Schilling, and A. Bardow, A smile is all you need: Predicting limiting activity coefficients from smiles with natural language processing, Digit Discov, vol. 1, no. 6, pp. 859–869, 2022.

[90]

J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu, RoFormer: Enhanced transformer with rotary position embedding, Neurocomputing, vol. 568, p. 127063, 2024.

[91]
Ł. Maziarka, T. Danel, S. Mucha, K. Rataj, J. Tabor, and S. Jastrzebski, Molecule attention transformer, arXiv preprint arXiv: 2002.08264, 2020.
[92]
W. Park, W. Chang, D. Lee, J. Kim, and S. W. Hwang, GRPE: Relative positional encoding for graph transformer, arXiv preprint arXiv: 2201.12787, 2022.
[93]
M. S. Hussain, M. J. Zaki, and D. Subramanian, Global self-attention as a replacement for graph convolution, in Proc. 28 th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, Washington, DC, USA, 2022, pp. 655–665.
[94]
D. Masters, J. Dean, K. Klaser, Z. Li, S. Maddrell-Mander, A. Sanders, H. Helal, D. Beker, L. Rampášek, and D. Beaini, GPS++: An optimised hybrid MPNN/transformer for molecular property prediction, arXiv preprint arXiv: 2212.02229, 2022.
[95]
Z. Chen, H. Tan, T. Wang, T. Shen, T. Lu, Q. Peng, C. Cheng, and Y. Qi, Graph propagation transformer for graph representation learning, arXiv preprint arXiv: 2305.11424, 2023.
[96]

G. P. Ren, K. J. Wu, and Y. He, Enhancing molecular representations via graph transformation layers, J. Chem. Inf. Model., vol. 63, no. 9, pp. 2679–2688, 2023.

[97]

J. Gao, Z. Shen, Y. Xie, J. Lu, Y. Lu, S. Chen, Q. Bian, Y. Guo, L. Shen, J. Wu, et al., TransFoxMol: Predicting molecular property with focused attention, Brief. Bioinform., vol. 24, no. 5, p. bbad306, 2023.

[98]

Y. Jiang, S. Jin, X. Jin, X. Xiao, W. Wu, X. Liu, Q. Zhang, X. Zeng, G. Yang, and Z. Niu, Pharmacophoric-constrained heterogeneous graph transformer model for molecular property prediction, Commun. Chem., vol. 6, no. 1, p. 60, 2023.

[99]

M. Hirohara, Y. Saito, Y. Koda, K. Sato, and Y. Sakakibara, Convolutional neural network based on smiles representation of compounds for detecting chemical motif, BMC Bioinformatics, vol. 19, no. S19, p. 526, 2018.

[100]

P. Jiang, Y. Chi, X. S. Li, Z. Meng, X. Liu, X. S. Hua, and K. Xia, Molecular persistent spectral image (Mol-PSI) representation for machine learning models in drug design, Brief. Bioinform., vol. 23, no. 1, p. bbab527, 2022.

[101]
D. Kuzminykh, D. Polykovskiy, A. Kadurin, A. Zhebrak, I. Baskov, S. Nikolenko, R. Shayakhmetov, and A. Zhavoronkov, 3D molecular representations based on the wave transform for convolutional neural networks, Mol. Pharm., vol. 15, no. 10, pp. 4378–4385, 2018.
[102]
H. Cai, H. Zhang, D. Zhao, J. Wu, and L. Wang, FP-GNN: A versatile deep learning architecture for enhanced molecular property prediction, Brief. Bioinform., vol. 23, no. 6, p. bbac408, 2022.
[103]

X. Wang, Z. Li, M. Jiang, S. Wang, S. Zhang, and Z. Wei, Molecule property prediction based on spatial graph embedding, J. Chem. Inf. Model., vol. 59, no. 9, pp. 3817–3828, 2019.

[104]

J. Liu, X. Lei, Y. Zhang, and Y. Pan, The prediction of molecular toxicity based on BiGRU and GraphSAGE, Comput. Biol. Med., vol. 153, p. 106524, 2023.

[105]
Y. Luo, K. Yang, M. Hong, X. Liu, and Z. Nie, MolFM: A multimodal molecular foundation model, arXiv preprint arXiv: 2307.09484, 2023.
[106]
Y. Sun, M. Islam, E. Zahedi, M. Kuenemann, H. Chouaib, and P. Hu, Molecular property prediction based on bimodal supervised contrastive learning, in Proc. 2022 IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM ), Las Vegas, NV, USA, 2022, pp. 394–397.
[107]
P. Liu, Y. Ren, J. Tao, and Z. Ren, GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text, arXiv preprint arXiv: 2308.06911, 2024.
[108]

Q. Tang, F. Nie, Q. Zhao, and W. Chen, A merged molecular representation deep learning method for blood–brain barrier permeability prediction, Brief. Bioinform., vol. 23, no. 5, p. bbac357, 2022.

[109]

T. Zhang, S. Chen, A. Wulamu, X. Guo, Q. Li, and H. Zheng, TransG-Net: Transformer and graph neural network based multi-modal data fusion network for molecular properties prediction, Appl. Intell., vol. 53, no. 12, pp. 16077–16088, 2023.

[110]

D. Chen, K. Gao, D. D. Nguyen, X. Chen, Y. Jiang, G. W. Wei, and F. Pan, Algebraic graph-assisted bidirectional transformers for molecular property prediction, Nat. Commun., vol. 12, no. 1, p. 3521, 2021.

[111]

W. X. Shen, X. Zeng, F. Zhu, Y. L. Wang, C. Qin, Y. Tan, Y. Y. Jiang, and Y. Z. Chen, Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations, Nat. Mach. Intell., vol. 3, no. 4, pp. 334–343, 2021.

[112]
Y. Liu, L. Wang, M. Liu, X. Zhang, B. Oztekin, and S. Ji, Spherical message passing for 3D molecular graphs, arXiv preprint arXiv: 2102.05013, 2022.
[113]

Z. Wang, M. Liu, Y. Luo, Z. Xu, Y. Xie, L. Wang, L. Cai, Q. Qi, Z. Yuan, T. Yang, et al., Advanced graph and sequence neural networks for molecular property prediction and drug discovery, Bioinformatics, vol. 38, no. 9, pp. 2579–2586, 2022.

[114]
J. Zhu, Y. Xia, L. Wu, S. Xie, W. Zhou, T. Qin, H. Li, and T. Y. Liu, Dual-view molecular pre-training, in Proc. 29 th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, Long Beach, CA, USA, 2023, pp. 3615–3627.
[115]
M. Sun, J. Xing, H. Wang, B. Chen, and J. Zhou, MoCL: Data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph, in Proc. 27 th ACM SIGKDD Conf. Knowledge Discovery & Data Mining, Singapore, 2021, p. 3585–3594.
[116]

Y. Wang, R. Magar, C. Liang, and A. B. Farimani, Improving molecular contrastive learning via faulty negative mitigation and decomposed fragment contrast, J. Chem. Inf. Model., vol. 62, no. 11, pp. 2713–2725, 2022.

[117]
Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, Graph contrastive learning with augmentations, in Proc. 34 th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 5812–5823.
[118]
J. Xia, C. Zhao, B. Hu, Z. Gao, C. Tan, Y. Liu, S. Li, and S. Z. Li, Mole-BERT: Rethinking pre-training graph neural networks for molecules, chemRxiv. doi: 10.26434/chemrxiv-2023-dngg4.
[119]

Z. Wu, D. Jiang, J. Wang, X. Zhang, H. Du, L. Pan, C. Y. Hsieh, D. Cao, and T. Hou, Knowledge-based BERT: A method to extract molecular features like computational chemists, Brief. Bioinform., vol. 23, no. 3, p. bbac131, 2022.

[120]

Z. Wu, J. Wang, H. Du, D. Jiang, Y. Kang, D. Li, P. Pan, Y. Deng, D. Cao, C. Y. Hsieh, et al., Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking, Nat. Commun., vol. 14, no. 1, p. 2585, 2023.

[121]
S. Kim, J. Nam, J. Kim, H. Lee, S. Ahn, and J. Shin, Fragment-based multi-view molecular contrastive learning, in Proc. ICLR 2023, https://openreview.net/forum?id=9lGwd4q8KJc, 2024.
[122]
F. Wu, H. Qin, S. Li, S. Z. Li, X. Zhan, and J. Xu, InstructBio: A large-scale semi-supervised learning paradigm for biochemical problems, arXiv preprint arXiv: 2304.03906, 2023.
[123]
Q. Lv, G. Chen, Z. Yang, W. Zhong, and C. Y. C. Chen, Meta learning with graph attention networks for low-data drug discovery, IEEE Trans. Neural Netw. Learn. Syst.
[124]
J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in Proc. 2019 Conf. North American Chapter of the Association for Computational Linguistics : Human Language Technologies, Volume 1 (Long and Short Papers ), Minneapolis, MN, USA, 2019, pp. 4171–4186.
[125]
L. Floridi and M. Chiriatti, GPT-3: Its nature, scope, limits, and consequences, Minds Mach., vol. 30, pp. 681–694, 2020.
[126]
X. C. Zhang, C. K. Wu, Z. J. Yang, Z. X. Wu, J. C. Yi, C. Y. Hsieh, T. J. Hou, and D. S. Cao, MG-BERT: Leveraging unsupervised atomic representation learning for molecular property prediction, Brief. Bioinform., vol. 22, no. 6, p. bbab152, 2021.
[127]
W. Ahmad, E. Simon, S. Chithrananda, G. Grand, and B. Ramsundar, ChemBERTa-2: Towards chemical foundation models, arXiv preprint arXiv: 2209.01712, 2022.
[128]

R. Irwin, S. Dimitriadis, J. He, and E. J. Bjerrum, Chemformer: A pre-trained transformer for computational chemistry, Mach. Learn.: Sci. Technol., vol. 3, no. 1, p. 015022, 2022.

[129]
W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec, Strategies for pre-training graph neural networks, arXiv preprint arXiv: 1905.12265, 2020.
[130]
J. Godwin, M. Schaarschmidt, A. Gaunt, A. Sanchez-Gonzalez, Y. Rubanova, P. Veličković, J. Kirkpatrick, and P. Battaglia, Simple GNN regularisation for 3D molecular property prediction & beyond, arXiv preprint arXiv: 2106.07971, 2022.
[131]
S. Liu, H. Guo, and J. Tang, Molecular geometry pretraining with SE(3)-invariant denoising distance matching, arXiv preprint arXiv: 2206.13602, 2023.
[132]
S. Feng, Y. Ni, Y. Lan, Z. M. Ma, and W. Y. Ma, Fractional denoising for 3D molecular pre-training, in Proc. 40 th Int. Conf. Machine Learning, Honolulu, HI, USA, 2023, pp. 9938–9961.
[133]
R. Jiao, J. Han, W. Huang, Y. Rong, and Y. Liu, Energy-motivated equivariant pretraining for 3D molecular graphs, in Proc. 37 th AAAI Conf. Artificial Intelligence, Washington, DC, USA, 2023, pp. 8096–8104.
[134]
X. Gao, W. Gao, W. Xiao, Z. Wang, C. Wang, and L. Xiang, Supervised pretraining for molecular force fields and properties prediction, arXiv preprint arXiv:2211.14429, 2022.
[135] X. Wang, H. Zhao, W. W. Tu, and Q. Yao, Automated 3D pre-training for molecular property prediction, in Proc. 29th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, Long Beach, CA, USA, 2023, pp. 2419–2430.
[136] L. Zeng, L. Li, and J. Li, MolKD: Distilling cross-modal knowledge in chemical reactions for molecular property prediction, arXiv preprint arXiv:2305.01912, 2023.
[137] J. Broberg, M. Bånkestad, and E. Ylipää, Pre-training transformers for molecular property prediction using reaction prediction, arXiv preprint arXiv:2207.02724, 2022.
[138] X. C. Zhang, C. K. Wu, J. C. Yi, X. X. Zeng, C. Q. Yang, A. P. Lu, T. J. Hou, and D. S. Cao, Pushing the boundaries of molecular property prediction for drug discovery with multitask learning BERT enhanced by SMILES enumeration, Research, vol. 2022, p. 0004, 2022.
[139] H. Abdel-Aty and I. R. Gould, Large-scale distributed training of transformers for chemical fingerprinting, J. Chem. Inf. Model., vol. 62, no. 20, pp. 4852–4862, 2022.
[140] Z. Zheng, Y. Tan, H. Wang, S. Yu, T. Liu, and C. Liang, CasANGCL: Pre-training and fine-tuning model based on cascaded attention network and graph contrastive learning for molecular property prediction, Brief. Bioinform., vol. 24, no. 1, p. bbac566, 2023.
[141] X. Guan and D. Zhang, T-MGCL: Molecule graph contrastive learning based on transformer for molecular property prediction, IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 20, no. 6, pp. 3851–3862, 2023.
[142] H. Liu, Y. Huang, X. Liu, and L. Deng, Attention-wise masked graph contrastive learning for predicting molecular property, Brief. Bioinform., vol. 23, no. 5, p. bbac303, 2022.
[143] S. Lin, C. Liu, P. Zhou, Z. Y. Hu, S. Wang, R. Zhao, Y. Zheng, L. Lin, E. Xing, and X. Liang, Prototypical graph contrastive learning, IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 2, pp. 2747–2758, 2024.
[144] J. Cui, H. Chai, Y. Gong, Y. Ding, Z. Hua, C. Gao, and Q. Liao, MocGCL: Molecular graph contrastive learning via negative selection, in Proc. 2023 Int. Joint Conf. Neural Networks (IJCNN), Gold Coast, Australia, 2023, pp. 1–8.
[145] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, Momentum contrast for unsupervised visual representation learning, in Proc. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, WA, USA, 2020, pp. 9726–9735.
[146] M. J. Zaki and W. Meira Jr, Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge, UK: Cambridge University Press, 2014.
[147] Y. Wang, Y. Min, E. Shao, and J. Wu, Molecular graph contrastive learning with parameterized explainable augmentations, in Proc. 2021 IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 2021, pp. 1558–1563.
[148] M. Liu, Y. Yang, X. Gong, L. Liu, and Q. Liu, HierMRL: Hierarchical structure-aware molecular representation learning for property prediction, in Proc. 2022 IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 2022, pp. 386–389.
[149] J. Wang, J. Guan, and S. Zhou, Molecular property prediction by contrastive learning with attention-guided positive sample selection, Bioinformatics, vol. 39, no. 5, p. btad258, 2023.
[150] K. Moon, H. J. Im, and S. Kwon, 3D graph contrastive learning for molecular property prediction, Bioinformatics, vol. 39, no. 6, p. btad371, 2023.
[151] T. Kuang, Y. Ren, and Z. Ren, 3D-Mol: A novel contrastive learning framework for molecular property prediction with 3D information, arXiv preprint arXiv:2309.17366, 2024.
[152] X. Wu, J. Duan, Y. Pan, and M. Li, Medical knowledge graph: Data sources, construction, reasoning, and applications, Big Data Mining and Analytics, vol. 6, no. 2, pp. 201–217, 2023.
[153] R. Hua, X. Wang, C. Cheng, Q. Zhu, and X. Zhou, A chemical domain knowledge-aware framework for multi-view molecular property prediction, in Proc. 7th China Conf. Knowledge Graph and Semantic Computing Evaluations, Qinhuangdao, China, 2022, pp. 1–11.
[154] Y. Fang, Q. Zhang, H. Yang, X. Zhuang, S. Deng, W. Zhang, M. Qin, Z. Chen, X. Fan, and H. Chen, Molecular contrastive learning with chemical element knowledge graph, in Proc. 36th AAAI Conf. Artificial Intelligence, Virtual Event, 2022, pp. 3968–3976.
[155] M. Xu, H. Wang, B. Ni, H. Guo, and J. Tang, Self-supervised graph-level representation learning with local and global structure, in Proc. 38th Int. Conf. Machine Learning, Virtual Event, 2021, pp. 11548–11558.
[156] X. Xu, C. Deng, Y. Xie, and S. Ji, Group contrastive self-supervised learning on graphs, IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 3, pp. 3169–3180, 2023.
[157] X. Shen, Y. Liu, Y. Wu, and L. Xie, MoLGNN: Self-supervised motif learning graph neural network for drug discovery, in Proc. Machine Learning for Molecules Workshop at NeurIPS 2020, https://ml4molecules.github.io/papers2020/ML4Molecules_2020_paper_4.pdf, 2024.
[158] X. Luo, W. Ju, M. Qu, Y. Gu, C. Chen, M. Deng, X. S. Hua, and M. Zhang, CLEAR: Cluster-enhanced contrast for self-supervised graph representation learning, IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 1, pp. 899–912, 2024.
[159] R. Benjamin, U. Singer, and K. Radinsky, Graph neural networks pretraining through inherent supervision for molecular property prediction, in Proc. 31st ACM Int. Conf. Information & Knowledge Management, Atlanta, GA, USA, 2022, pp. 2903–2912.
[160] G. Shi, Y. Zhu, J. K. Liu, and X. Li, HeGCL: Advance self-supervised learning in heterogeneous graph-level representation, IEEE Trans. Neural Netw. Learn. Syst.
[161] A. Xie, Z. Zhang, J. Guan, and S. Zhou, Self-supervised learning with chemistry-aware fragmentation for effective molecular property prediction, Brief. Bioinform., vol. 24, no. 5, p. bbad296, 2023.
[162] Z. Ji, R. Shi, J. Lu, F. Li, and Y. Yang, ReLMole: Molecular representation learning based on two-level graph similarities, J. Chem. Inf. Model., vol. 62, no. 22, pp. 5361–5372, 2022.
[163] R. Hadsell, S. Chopra, and Y. LeCun, Dimensionality reduction by learning an invariant mapping, in Proc. 2006 IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR'06), New York, NY, USA, 2006, pp. 1735–1742.
[164] G. A. Pinheiro, J. L. F. Da Silva, and M. G. Quiles, SMICLR: Contrastive learning on multiple molecular representations for semisupervised and unsupervised representation learning, J. Chem. Inf. Model., vol. 62, no. 17, pp. 3948–3960, 2022.
[165] C. Zhang, X. Yan, and Y. Liu, Pseudo-siamese neural network based graph and sequence representation learning for molecular property prediction, in Proc. 2022 IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 2022, pp. 3911–3913.
[166] H. Stärk, D. Beaini, G. Corso, P. Tossou, C. Dallago, S. Günnemann, and P. Liò, 3D infomax improves GNNs for molecular property prediction, in Proc. 39th Int. Conf. Machine Learning, Baltimore, MD, USA, 2022, pp. 20479–20502.
[167] Y. Zhu, D. Chen, Y. Du, Y. Wang, Q. Liu, and S. Wu, Molecular contrastive pretraining with collaborative featurizations, J. Chem. Inf. Model., vol. 64, no. 4, pp. 1112–1122, 2024.
[168] A. Tarvainen and H. Valpola, Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, in Proc. 31st Int. Conf. Advances in Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 1195–1204.
[169] J. Chen, Y. W. Si, C. W. Un, and S. W. I. Siu, Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network, J. Cheminform., vol. 13, no. 1, p. 93, 2021.
[170] D. Berthelot, N. Carlini, I. Goodfellow, A. Oliver, N. Papernot, and C. Raffel, MixMatch: A holistic approach to semi-supervised learning, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 5049–5059.
[171] K. Yu, S. Visweswaran, and K. Batmanghelich, Semi-supervised hierarchical drug embedding in hyperbolic space, J. Chem. Inf. Model., vol. 60, no. 12, pp. 5647–5657, 2020.
[172] H. Ma, F. Jiang, Y. Rong, Y. Guo, and J. Huang, Robust self-training strategy for various molecular biology prediction tasks, in Proc. 13th ACM Int. Conf. Bioinformatics, Computational Biology and Health Informatics, Northbrook, IL, USA, 2022, pp. 1–5.
[173] Z. Zhang and M. R. Sabuncu, Generalized cross entropy loss for training deep neural networks with noisy labels, in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montréal, Canada, 2018, pp. 8792–8802.
[174] G. Liu, T. Zhao, E. Inae, T. Luo, and M. Jiang, Semi-supervised graph imbalanced regression, arXiv preprint arXiv:2305.12087, 2023.
[175] A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik, and S. Savarese, Taskonomy: Disentangling task transfer learning, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 3712–3722.
[176] X. Li, X. Yan, Q. Gu, H. Zhou, D. Wu, and J. Xu, DeepChemStable: Chemical stability prediction with an attention-based graph convolution network, J. Chem. Inf. Model., vol. 59, no. 3, pp. 1044–1049, 2019.
[177] X. Chen and K. He, Exploring simple siamese representation learning, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021, pp. 15745–15753.
[178] H. Li, X. Zhao, S. Li, F. Wan, D. Zhao, and J. Zeng, Improving molecular property prediction through a task similarity enhanced transfer learning strategy, iScience, vol. 25, no. 10, p. 105231, 2022.
[179] W. Ju, Z. Liu, Y. Qin, B. Feng, C. Wang, Z. Guo, X. Luo, and M. Zhang, Few-shot molecular property prediction via hierarchically structured learning on relation graphs, Neural Netw., vol. 163, pp. 122–131, 2023.
[180] C. Q. Nguyen, C. Kreatsoulas, and K. M. Branson, Meta-learning GNN initializations for low-resource molecular property prediction, arXiv preprint arXiv:2003.05996, 2020.
[181] L. Torres, J. P. Arrais, and B. Ribeiro, Few-shot learning via graph embeddings with convolutional networks for low-data molecular property prediction, Neural Comput. Appl., vol. 35, no. 18, pp. 13167–13185, 2023.
[182] H. S. de Ocáriz Borde and F. Barbero, Graph neural network expressivity and meta-learning for molecular property regression, arXiv preprint arXiv:2209.13410, 2022.
[183] K. P. Ham and L. Sael, Evidential meta-model for molecular property prediction, Bioinformatics, vol. 39, no. 10, p. btad604, 2023.
[184] Z. Meng, Y. Li, P. Zhao, Y. Yu, and I. King, Meta-learning with motif-based task augmentation for few-shot molecular property prediction, in Proc. 2023 SIAM Int. Conf. Data Mining (SDM), Minneapolis-St. Paul Twin Cities, MN, USA, 2023, pp. 811–819.
[185] Z. Guo, C. Zhang, W. Yu, J. Herr, O. Wiest, M. Jiang, and N. V. Chawla, Few-shot graph learning for molecular property prediction, in Proc. Web Conf. 2021, Ljubljana, Slovenia, 2021, pp. 2559–2567.
[186] Y. Wang, A. Abuduweili, Q. Yao, and D. Dou, Property-aware relation networks for few-shot molecular property prediction, arXiv preprint arXiv:2107.07994, 2021.
[187] S. Yao, Z. Feng, J. Song, L. Jia, Z. Zhong, and M. Song, Chemical property relation guided few-shot molecular property prediction, in Proc. 2022 Int. Joint Conf. Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1–8.
[188] J. Dong, N. N. Wang, Z. J. Yao, L. Zhang, Y. Cheng, D. Ouyang, A. P. Lu, and D. S. Cao, ADMETlab: A platform for systematic ADMET evaluation based on a comprehensively collected ADMET database, J. Cheminform., vol. 10, p. 29, 2018.
[189] D. van Tilborg, A. Alenicheva, and F. Grisoni, Exposing the limitations of molecular machine learning with activity cliffs, J. Chem. Inf. Model., vol. 62, no. 23, pp. 5938–5951, 2022.
[190] Y. Ji, L. Zhang, J. Wu, B. Wu, L. Li, L. K. Huang, T. Xu, Y. Rong, J. Ren, D. Xue, et al., DrugOOD: Out-of-distribution dataset curator and benchmark for AI-aided drug discovery–a focus on affinity prediction problems with noise annotations, in Proc. 37th AAAI Conf. Artificial Intelligence, Washington, DC, USA, 2023, pp. 8023–8031.
[191] S. Chmiela, A. Tkatchenko, H. E. Sauceda, I. Poltavsky, K. T. Schütt, and K. R. Müller, Machine learning of accurate energy-conserving molecular force fields, Sci. Adv., vol. 3, no. 5, p. e1603015, 2017.
[192] C. Morris, N. M. Kriege, F. Bause, K. Kersting, P. Mutzel, and M. Neumann, TUDataset: A collection of benchmark datasets for learning with graphs, arXiv preprint arXiv:2007.08663, 2020.
[193] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, Open graph benchmark: Datasets for machine learning on graphs, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 22118–22133.
[194] A. Wojtuch, T. Danel, S. Podlewska, and Ł. Maziarka, Extended study on atomic featurization in graph neural networks for molecular property prediction, J. Cheminform., vol. 15, no. 1, p. 81, 2023.
[195] Z. Zeng, Y. Yao, Z. Liu, and M. Sun, A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals, Nat. Commun., vol. 13, no. 1, p. 862, 2022.
[196] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2019.
[197] B. Su, D. Du, Z. Yang, Y. Zhou, J. Li, A. Rao, H. Sun, Z. Lu, and J. R. Wen, A molecular multimodal foundation model associating molecule graphs with natural language, arXiv preprint arXiv:2209.05481, 2022.
[198] X. Tang, A. Tran, J. Tan, and M. B. Gerstein, MolLM: A unified language model to integrate biomedical text with 2D and 3D molecular representations, bioRxiv, doi: 10.1101/2023.11.25.568656.
Big Data Mining and Analytics
Pages 858-888
Cite this article:
Kuang T, Liu P, Ren Z. Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey. Big Data Mining and Analytics, 2024, 7(3): 858-888. https://doi.org/10.26599/BDMA.2024.9020028

Received: 18 February 2024
Revised: 10 April 2024
Accepted: 15 April 2024
Published: 28 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).