Open Access

Molecular Generation and Optimization of Molecular Properties Using a Transformer Model

School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Abstract

Generating novel molecules with specific properties is a challenging task in modern drug discovery, because a given objective must be optimized while chemical rules remain satisfied. Herein, we aim to optimize a given source molecule so that the generated molecule has the desired properties. Matched Molecular Pairs (MMPs), each of which contains a source molecule and a target molecule, are used as data, and logD and solubility are selected as the properties to optimize. The main innovation lies in describing the calculations of the transformer from the perspective of matrix dimensions. Threshold intervals and state changes are then used to encode logD and solubility for the subsequent tests. During the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. The transformer models are compared with baseline models with respect to their ability to generate molecules with specific properties. The results show that the transformer model can accurately optimize the source molecules to satisfy the specified properties.
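The property encoding mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that the change in logD between source and target molecule is mapped to a threshold-interval token and that the change in solubility is mapped to a state-change token; the abstract does not spell out this assignment. The bin width, clipping bound, solubility cut-off, token names, and the function names `encode_logd_change`, `encode_solubility_change`, and `build_model_input` are all hypothetical and chosen only for illustration, not taken from the paper.

```python
import math
from typing import List

# All constants below are assumptions for illustration; the abstract gives no values.
LOGD_BIN_WIDTH = 0.5         # hypothetical width of one logD-change interval
LOGD_MAX_ABS = 2.0           # hypothetical bound; larger changes go to open-ended bins
SOLUBILITY_THRESHOLD = 50.0  # hypothetical cut-off between "low" and "high" solubility


def encode_logd_change(source_logd: float, target_logd: float) -> str:
    """Map the logD difference (target - source) to a threshold-interval token."""
    delta = target_logd - source_logd
    if delta <= -LOGD_MAX_ABS:
        return f"LogD_(-inf,{-LOGD_MAX_ABS:.1f}]"
    if delta > LOGD_MAX_ABS:
        return f"LogD_({LOGD_MAX_ABS:.1f},inf)"
    # Half-open bins (lower, upper]: the upper edge is the next multiple of the width.
    upper = math.ceil(delta / LOGD_BIN_WIDTH) * LOGD_BIN_WIDTH
    lower = upper - LOGD_BIN_WIDTH
    return f"LogD_({lower:.1f},{upper:.1f}]"


def encode_solubility_change(source_sol: float, target_sol: float) -> str:
    """Map a pair of solubility values to a state-change token."""
    def state(value: float) -> str:
        return "high" if value > SOLUBILITY_THRESHOLD else "low"

    src_state, tgt_state = state(source_sol), state(target_sol)
    if src_state == tgt_state:
        return "Solubility_no_change"
    return f"Solubility_{src_state}->{tgt_state}"


def build_model_input(
    source_smiles: str,
    source_logd: float,
    target_logd: float,
    source_sol: float,
    target_sol: float,
) -> List[str]:
    """Prepend the property-change tokens to the tokenized source molecule.

    Character-level tokenization is a simplification: it splits multi-character
    atoms such as "Cl" or "Br", which a real SMILES tokenizer would keep whole.
    """
    property_tokens = [
        encode_logd_change(source_logd, target_logd),
        encode_solubility_change(source_sol, target_sol),
    ]
    return property_tokens + list(source_smiles)


if __name__ == "__main__":
    print(build_model_input("CCO", 1.0, 1.3, 10.0, 80.0))
```

In a setup of this kind, the resulting token sequence would serve as the encoder input, and the target molecule of the MMP would serve as the decoder output during training. At test time, the desired property changes are supplied as tokens alongside the source molecule.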

Big Data Mining and Analytics
Pages 142-155
Cite this article:
Xu Z, Lei X, Ma M, et al. Molecular Generation and Optimization of Molecular Properties Using a Transformer Model. Big Data Mining and Analytics, 2024, 7(1): 142-155. https://doi.org/10.26599/BDMA.2023.9020009

Received: 06 January 2023
Revised: 27 March 2023
Accepted: 04 May 2023
Published: 25 December 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).