Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise
Indexed in Crossref and PubMed
Abstract
As part of the ongoing quest to find or construct large data sets for use in validating new machine learning (ML) approaches for bioactivity prediction, it has become distressingly common for researchers to combine literature IC50 data generated using different assays into a single data set. It is well-known that there are many situations where this is a scientifically risky thing to do, even when the assays are against exactly the same target, but the risks of assays being incompatible are even higher when pulling data from large collections of literature data like ChEMBL. Here, we estimate the amount of noise present in combined data sets using cases where measurements for the same compound are reported in…
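The abstract describes estimating noise by comparing measurements of the same compound reported in different assays. Below is a minimal sketch of that idea. It is not the authors' published procedure. It assumes activity records shaped as hypothetical (compound ID, assay ID, IC50 in nM) tuples and converts each IC50 to pIC50 = -log10(IC50 in molar). It then pairs measurements of the same compound that come from two different assays and reports the mean absolute pIC50 difference along with the fraction of pairs that differ by more than an illustrative 1 log-unit threshold. The function names, record layout, example data, and threshold are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations
from statistics import mean


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return 9.0 - math.log10(ic50_nm)


def cross_assay_pairs(records):
    """Return pIC50 pairs for the same compound measured in two *different* assays.

    `records` is an iterable of hypothetical (compound_id, assay_id, ic50_nm) tuples.
    Replicates within a single assay are deliberately not paired.
    """
    by_compound = defaultdict(list)
    for compound_id, assay_id, ic50_nm in records:
        by_compound[compound_id].append((assay_id, pic50(ic50_nm)))

    pairs = []
    for measurements in by_compound.values():
        for (assay_1, p1), (assay_2, p2) in combinations(measurements, 2):
            if assay_1 != assay_2:
                pairs.append((p1, p2))
    return pairs


def noise_summary(pairs, threshold=1.0):
    """Summarize cross-assay disagreement; `threshold` is an illustrative choice in log units."""
    diffs = [abs(p1 - p2) for p1, p2 in pairs]
    if not diffs:
        return {"n_pairs": 0, "mean_abs_diff": None, "frac_above_threshold": None}
    return {
        "n_pairs": len(diffs),
        "mean_abs_diff": mean(diffs),
        "frac_above_threshold": sum(d > threshold for d in diffs) / len(diffs),
    }


# Hypothetical toy data, not taken from the paper or from ChEMBL.
records = [
    ("cpd1", "assayA", 10.0),    # pIC50 8.0
    ("cpd1", "assayB", 1000.0),  # pIC50 6.0 -> disagrees with assayA by 2 log units
    ("cpd2", "assayA", 100.0),   # pIC50 7.0
    ("cpd2", "assayC", 100.0),   # pIC50 7.0 -> agrees exactly
    ("cpd3", "assayB", 50.0),    # single measurement, contributes no pair
]

pairs = cross_assay_pairs(records)
print(noise_summary(pairs))
```

Restricting pairs to different assays is the design choice that matters here. It isolates the disagreement introduced by combining sources, rather than the replicate variability within a single assay.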
Citation impact
- Total citations: 135
- FWCI: 44.61
- Percentile: 100%
- References: 14
Keywords
- ChEMBL
- Metadata
- Set
- Noise
- Computer science
- Data curation
- Data mining
- Order