Pretraining on small molecules can make AI-driven drug discovery more effective. However, conventional masked language model pretraining is poorly suited to molecules because of their limited vocabulary size and non-sequential structure. To overcome these challenges, we propose FragAdd, a strategy that adds a chemically implausible molecular fragment to the input molecule. This approach incorporates rich local information and yields a high-quality graph representation, which is advantageous for tasks such as virtual screening. Building on this, we develop a virtual screening protocol for identifying binders of estrogen receptor alpha, a nuclear receptor. Our results demonstrate a significant improvement in the binding capacity of the retrieved molecules. We also show that the FragAdd strategy can be combined with other self-supervised methods to further expedite the drug discovery process.
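The fragment-addition step described above can be illustrated with a small sketch. This is not the authors' implementation: the `MolGraph` class, the three-entry `FRAGMENTS` vocabulary, the `add_fragment` helper, and the choice of the fragment index as the training label are all assumptions made for illustration. The sketch attaches a randomly chosen fragment to a randomly chosen atom without any valence check, so the resulting graph may be chemically implausible.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph (hypothetical): element symbols plus undirected bonds."""
    atoms: list
    bonds: set = field(default_factory=set)  # pairs stored as (low_index, high_index)


# Hypothetical fragment vocabulary; atom 0 of each fragment is its attachment point.
FRAGMENTS = [
    MolGraph(["C"]),                              # methyl-like
    MolGraph(["O"]),                              # hydroxyl-like
    MolGraph(["C", "O", "O"], {(0, 1), (0, 2)}),  # carboxyl-like
]


def add_fragment(mol, rng):
    """Return (corrupted graph, fragment label, attachment site).

    No valence rules are checked, so the corrupted graph is not
    guaranteed to be a valid molecule.
    """
    label = rng.randrange(len(FRAGMENTS))
    frag = FRAGMENTS[label]
    site = rng.randrange(len(mol.atoms))
    offset = len(mol.atoms)  # index of the fragment's first atom in the new graph

    atoms = mol.atoms + frag.atoms
    bonds = set(mol.bonds)
    bonds |= {(a + offset, b + offset) for a, b in frag.bonds}
    bonds.add((site, offset))  # bond joining the molecule to the fragment
    return MolGraph(atoms, bonds), label, site


# Usage: corrupt a toy ethanol graph (C-C-O, heavy atoms only).
ethanol = MolGraph(["C", "C", "O"], {(0, 1), (1, 2)})
corrupted, label, site = add_fragment(ethanol, random.Random(0))
```

Under these assumptions, a graph encoder would receive `corrupted` as input and be trained to recover `label` (and possibly `site`). The abstract does not state the exact prediction target, so that pairing is a guess.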
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).