Generating novel molecules that satisfy specific property requirements is a challenging task in modern drug discovery: it requires optimizing a given objective while obeying chemical rules. Herein, we aim to optimize a given source molecule so that the generated molecule has the desired properties. We use Matched Molecular Pairs (MMPs), each consisting of a source and a target molecule, and select logD and solubility as the properties to optimize. The main innovation lies in the calculations of a specific transformer, described from the perspective of matrix dimensions. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. In the experiments, we screen the data by the proportion of heavy atoms to all atoms in the groups and select 12365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with baseline models on their ability to generate molecules with the specified properties. Results show that the transformer model can accurately optimize the source molecules to satisfy the specified properties.
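The abstract does not spell out how threshold intervals and state changes encode the two properties, so the following is only a minimal sketch of one plausible scheme. It assumes that the logD change between source and target is binned into fixed-width intervals and that solubility is reduced to a low/high state change. The bin width, the clipping range, the solubility cut-off, the token names, and all function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical parameters, chosen only for illustration (not from the paper).
LOGD_BIN_WIDTH = 0.5         # width of each logD-change interval
LOGD_MAX_ABS = 2.0           # larger changes fall into open-ended intervals
SOLUBILITY_THRESHOLD = 1.7   # assumed cut-off between "low" and "high"


def encode_logd_change(source_logd: float, target_logd: float) -> str:
    """Map the logD difference of a pair to a half-open interval token."""
    delta = target_logd - source_logd
    if delta <= -LOGD_MAX_ABS:
        return f"LogD_(-inf, {-LOGD_MAX_ABS:.1f}]"
    if delta > LOGD_MAX_ABS:
        return f"LogD_({LOGD_MAX_ABS:.1f}, inf)"
    # Intervals have the form (lower, upper], so round up to the next edge.
    upper = math.ceil(delta / LOGD_BIN_WIDTH) * LOGD_BIN_WIDTH
    lower = upper - LOGD_BIN_WIDTH
    return f"LogD_({lower:.1f}, {upper:.1f}]"


def encode_solubility_change(source_sol: float, target_sol: float) -> str:
    """Map a pair of solubility values to a state-change token."""
    def state(value: float) -> str:
        return "high" if value > SOLUBILITY_THRESHOLD else "low"

    source_state, target_state = state(source_sol), state(target_sol)
    if source_state == target_state:
        return "Solubility_no_change"
    return f"Solubility_{source_state}->{target_state}"


def build_model_input(source_smiles: str,
                      source_props: dict,
                      target_props: dict) -> list:
    """Prepend the property-change tokens to the source SMILES string.

    SMILES tokenization is deliberately left out; the string is kept whole.
    """
    return [
        encode_logd_change(source_props["logD"], target_props["logD"]),
        encode_solubility_change(source_props["solubility"],
                                 target_props["solubility"]),
        source_smiles,
    ]


example = build_model_input(
    "CCO",
    {"logD": 1.0, "solubility": 1.0},
    {"logD": 1.7, "solubility": 2.5},
)
print(example)
```

Under these assumptions, a sequence-to-sequence model would receive the desired property changes as leading tokens and be trained to emit the target molecule of the MMP, so that the condition is expressed in the same vocabulary as the input.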
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).