Correcting Sample Selection Bias by Unlabeled Data
Max Planck Society · Max Planck Institute for Biological Cybernetics
Abstract
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
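As a rough illustration of what matching distributions in feature space can look like, the sketch below chooses nonnegative weights on the training points so that their weighted mean in a kernel feature space approaches the mean of the test points. It is not taken from the paper: the RBF kernel, the bandwidth `gamma`, the weight cap `upper`, the projected-gradient solver, and the function names `rbf_kernel` and `matching_weights` are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    d2 = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def matching_weights(x_train, x_test, gamma=1.0, upper=10.0, steps=2000):
    """Weights beta that bring the weighted training mean close to the test mean
    in the kernel feature space (hypothetical helper, assumed setup).

    Minimises 0.5 * beta^T K beta - kappa^T beta subject to 0 <= beta <= upper,
    which equals the squared distance between the two feature-space means
    up to a positive scale factor and an additive constant.
    """
    n, m = len(x_train), len(x_test)
    K = rbf_kernel(x_train, x_train, gamma)
    kappa = (n / m) * rbf_kernel(x_train, x_test, gamma).sum(axis=1)

    # Step size 1 / L, where L is the largest eigenvalue of K
    # (eigvalsh returns eigenvalues in ascending order).
    lr = 1.0 / np.linalg.eigvalsh(K)[-1]

    beta = np.ones(n)  # start from uniform weights
    for _ in range(steps):
        grad = K @ beta - kappa
        # Gradient step followed by projection onto the box [0, upper].
        beta = np.clip(beta - lr * grad, 0.0, upper)
    return beta
```

The returned weights could then be passed as per-sample weights to a learner fitted on the training set. No density is estimated at any point; only kernel evaluations between training and test inputs are needed.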
Citation impact
- FWCI: 26.97
- Percentile: 100%
- References: 26
Authors
- Jiayuan Huang (corresponding author)
  Max Planck Society, Max Planck Institute for Biological Cybernetics
- Alexander J. Smola
  Max Planck Institute for Biological Cybernetics, Max Planck Society
- Arthur Gretton
  Max Planck Society, Max Planck Institute for Biological Cybernetics
- Karsten Borgwardt
  Max Planck Society, Max Planck Institute for Biological Cybernetics
- Bernhard Schölkopf
  Max Planck Institute for Biological Cybernetics, Max Planck Society
Topics & keywords
- Selection bias
- Selection (genetic algorithm)
- Sample (material)
- Sampling bias
- Computer science
- Artificial intelligence
- Statistics
- Pattern recognition (psychology)