Article · Journal of Chemical Information and Modeling · Feb 23, 2024 · Hybrid open access

Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise

ETH Zurich

Indexed in Crossref and PubMed

Abstract

As part of the ongoing quest to find or construct large data sets for use in validating new machine learning (ML) approaches for bioactivity prediction, it has become distressingly common for researchers to combine literature IC50 data generated using different assays into a single data set. It is well-known that there are many situations where this is a scientifically risky thing to do, even when the assays are against exactly the same target, but the risks of assays being incompatible are even higher when pulling data from large collections of literature data like ChEMBL. Here, we estimate the amount of noise present in combined data sets using cases where measurements for the same compound are reported in…
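The abstract is truncated before the method is described in full, but its core idea is to look at compounds whose activity is reported by more than one source and measure how much those reports disagree. The sketch below illustrates that general idea under stated assumptions and is not the authors' procedure. It assumes a hypothetical input format of `(compound_id, assay_id, ic50_nM)` tuples, converts each IC50 to pIC50 (−log10 of the molar IC50; pKi is obtained the same way from Ki), and collects the absolute pIC50 difference for every pair of assays that measured the same compound. The helper names and the 0.3 and 1.0 log-unit thresholds are illustrative choices, not values taken from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations
from statistics import median


def pic50_from_nM(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return -math.log10(ic50_nm * 1e-9)


def pairwise_differences(records):
    """Absolute pIC50 differences between assays that measured the same compound.

    `records` is an iterable of (compound_id, assay_id, ic50_nM) tuples
    (a hypothetical format chosen for this sketch).
    """
    by_compound = defaultdict(dict)
    for compound, assay, ic50_nm in records:
        # Keep one value per (compound, assay); a later entry overwrites an earlier one.
        by_compound[compound][assay] = pic50_from_nM(ic50_nm)

    diffs = []
    for values in by_compound.values():
        # Compounds reported by a single assay yield no pairs and are skipped.
        for a, b in combinations(sorted(values), 2):
            diffs.append(abs(values[a] - values[b]))
    return diffs


def summarize(diffs, thresholds=(0.3, 1.0)):
    """Median absolute difference and the fraction of pairs exceeding each threshold."""
    if not diffs:
        raise ValueError("no compound was measured in more than one assay")
    summary = {"n_pairs": len(diffs), "median_abs_diff": median(diffs)}
    for t in thresholds:
        summary[f"frac_gt_{t}"] = sum(d > t for d in diffs) / len(diffs)
    return summary


if __name__ == "__main__":
    records = [
        ("cpd1", "assayA", 10.0),
        ("cpd1", "assayB", 300.0),
        ("cpd2", "assayA", 50.0),
        ("cpd2", "assayC", 50.0),
        ("cpd3", "assayB", 1000.0),
    ]
    print(summarize(pairwise_differences(records)))
```

Working in log units means a tenfold disagreement in IC50 counts the same regardless of a compound's potency. A real analysis would also need to decide how to handle repeated measurements within one assay and whether IC50 and Ki values may be mixed at all, which is the kind of question the paper's title raises.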

Citation impact

  • Total citations: 135
  • FWCI (field-weighted citation impact): 44.61
  • Percentile: 100%
  • References: 14

Authors: 2

Topics & keywords

Keywords
  • chEMBL
  • Metadata
  • Set (abstract data type)
  • Noise (video)
  • Computer science
  • Data curation
  • Data mining
  • Order (exchange)