Detecting Novel Associations in Large Data Sets
Broad Institute · University of Oxford · +7 more institutions
Abstract
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut…
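To make the idea of a grid-based dependence score concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes the form of MIC given in the published paper rather than in this abstract (the largest mutual information over grids whose cell count stays below n^0.6, normalized by the log of the smaller grid dimension), and it replaces the paper's grid optimization with plain equal-frequency binning, so it can only underestimate the true MIC. The names `equal_frequency_bins`, `mutual_information`, and `mic_sketch` are hypothetical.

```python
import numpy as np

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * k) // len(values)

def mutual_information(x_bins, y_bins, a, b):
    """Mutual information (in nats) of the a-by-b grid of bin counts."""
    counts = np.zeros((a, b))
    np.add.at(counts, (x_bins, y_bins), 1)
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal over x bins, shape (a, 1)
    py = p.sum(axis=0, keepdims=True)   # marginal over y bins, shape (1, b)
    nz = p > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

def mic_sketch(x, y, exponent=0.6):
    """Simplified MIC-style score in [0, 1].

    Assumption: only equal-frequency grids are searched, a simplification
    of the grid search described in the paper, so this is a lower bound.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** exponent         # maximum number of grid cells
    best = 0.0
    a = 2
    while a * 2 <= budget:
        xb = equal_frequency_bins(x, a)
        b = 2
        while a * b <= budget:
            yb = equal_frequency_bins(y, b)
            score = mutual_information(xb, yb, a, b) / np.log(min(a, b))
            best = max(best, score)
            b += 1
        a += 1
    return best
```

In this sketch a noiseless line such as `mic_sketch(x, x)` scores 1.0, a noiseless parabola also scores close to 1 even though its linear correlation is near zero, and independent samples score close to 0.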
Citation impact
- FWCI: 145.93
- Percentile: 100%
- References: 51
Authors (9)
- David N. Reshef (corresponding author)
Broad Institute, University of Oxford, IIT@MIT, Massachusetts Institute of Technology
- Yakir Reshef (corresponding author)
Broad Institute, Harvard College Observatory
- Hilary K. Finucane
Weizmann Institute of Science
- Sharon R. Grossman
Broad Institute, Harvard University, Center for Systems Biology
- Gil McVean
Centre for Human Genetics, University of Oxford
Topics & keywords
- Computer science
- Data mining