Reoptimization of MDL Keys for Use in Drug Discovery
Information Systems Laboratories (United States)
Abstract
For a number of years, MDL products have exposed both 166-bit and 960-bit keysets based on 2D descriptors. These keysets were originally constructed and optimized for substructure searching. We report on improvements in the performance of MDL keysets that have been reoptimized for use in molecular similarity. Classification performance on a test data set of 957 compounds increased from 0.65 for the 166-bit keyset and 0.67 for the 960-bit keyset to 0.71 for a surprisal S/N-pruned keyset containing 208 bits and 0.71 for a genetic-algorithm-optimized keyset containing 548 bits. We present an overview of the underlying technology supporting the definition of descriptors and the encoding of these descriptors into…
Citation impact
- FWCI: 8.40
- Percentile: 100%
- References: 34
Authors
- 4
Topics & keywords
- Pruning
- Basis (linear algebra)
- Set (abstract data type)
- Minimum description length
- Reduction (mathematics)
- Similarity (geometry)
- Computer science
- Selection (genetic algorithm)
- Good health and well-being