SELFIES and the future of molecular string representations
Max Planck Institute for the Science of Light · Fordham University · and 20 other institutions
Abstract
Artificial intelligence (AI) and machine learning (ML) are increasingly popular for a broad range of challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, and the design of new molecules. For each of these tasks, the machine needs to read and write fluently in a chemical language. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings. Most pertinently, most combinations of symbols lead to invalid results with no valid…
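The claim that most combinations of SMILES symbols are invalid can be illustrated with a toy experiment. The sketch below is not from the paper: the alphabet, the function names `passes_toy_syntax_check` and `fraction_passing`, and the checker itself are hypothetical. The checker enforces only three SMILES syntax rules (the string starts with an atom, branch parentheses are balanced and properly nested, and every ring-closure digit is both opened and closed). Because a real SMILES parser also checks valence, bond placement, and much more, this toy check overestimates the fraction of valid strings.

```python
import random

# Hypothetical toy alphabet of SMILES symbols, chosen only for illustration.
ATOMS = {"C", "O", "N"}
ALPHABET = ["C", "O", "N", "=", "(", ")", "1", "2"]


def passes_toy_syntax_check(smiles: str) -> bool:
    """Return True if `smiles` satisfies three basic SMILES syntax rules.

    Checked: (1) the string starts with an atom symbol, (2) branch
    parentheses are balanced and never close before they open, and
    (3) every ring-closure digit appears in open/close pairs.
    NOT checked: valence, bond-symbol placement, empty branches, etc.
    """
    if not smiles or smiles[0] not in ATOMS:
        return False
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a branch closed before it was opened
                return False
        elif ch.isdigit():
            # First occurrence opens a ring bond, the next one closes it.
            open_rings ^= {ch}
    return depth == 0 and not open_rings


def fraction_passing(length: int, trials: int, seed: int = 0) -> float:
    """Fraction of uniformly random symbol strings that pass the toy check."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        s = "".join(rng.choice(ALPHABET) for _ in range(length))
        if passes_toy_syntax_check(s):
            passed += 1
    return passed / trials


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, fraction_passing(n, 10_000))
```

Even under these lenient rules, only a minority of random strings pass, because the first symbol alone rules out five of the eight symbols. The gap only widens once chemical constraints are added, which is the shortcoming that motivates representations such as SELFIES.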
Citation impact
- FWCI: 17.78
- Percentile: 100%
- References: 198
Authors: 31
Topics & keywords
- Cheminformatics
- Computer science
- String (physics)
- Artificial intelligence
- Interpretability
- Context (archaeology)
- Popularity
- Representation (politics)
- Quality Education
Funding
- National Science Foundation (Awards: 2037745, DMR-1928882)
- U.S. Department of Energy
- Amazon Web Services
- Deutsche Forschungsgemeinschaft (Award: NFDI4-1)
- Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Award: 191127)
- Fundação de Amparo à Pesquisa do Estado de São Paulo (Awards: 2021/, 2021/01633-3)
- Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
- Austrian Science Fund (Award: J4309)
- Universität Wien
- Consejo Nacional de Ciencia y Tecnología (Award: CVU 105568)
- University of Toronto
- National Centre of Competence in Research Robotics (Award: P2ELP2_195155)
- National Institutes of Health (Award: R35GM137966)
- Office of Science
- Horizon 2020 Framework Programme
- Natural Sciences and Engineering Research Council of Canada
- European Research Council
- Horizon 2020 (Award: 666983)
- National Institute of General Medical Sciences