Feature selection, L1 vs. L2 regularization, and rotational invariance

Ng, Andrew Y.

doi:10.1145/1015330.1015435

Article; Jan 1, 2004; Closed access

Andrew Y. Ng

Stanford University

Indexed in Crossref

Abstract

We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that using L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well") grows only logarithmically in the number of irrelevant features. This logarithmic rate matches the best known bounds for feature selection, and indicates that L1 regularized logistic regression can be effective even if there are exponentially many irrelevant features as there are training examples. We also give a lower bound showing that any rotationally invariant…
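The setting in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the paper's method or experiment: it fits L1- and L2-regularized logistic regression with a simple proximal gradient loop on data where only 3 of 500 features carry signal. All names (`fit_logreg`, `sample`, `w_true`), the data sizes, the signal strength, and the penalty weight `lam=0.1` are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    # Clip the logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logreg(X, y, lam, penalty, n_iter=500):
    """Proximal gradient descent on mean log-loss plus an L1 or L2 penalty."""
    n, d = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature
    # of the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / n
        z = w - step * grad
        if penalty == "l1":
            # Soft-thresholding: proximal operator of lam * ||w||_1.
            w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        elif penalty == "l2":
            # Shrinkage: proximal operator of (lam / 2) * ||w||_2^2.
            w = z / (1.0 + step * lam)
        else:
            raise ValueError("penalty must be 'l1' or 'l2'")
    return w


def accuracy(w, X, y):
    # Predict class 1 when the linear score is positive (no intercept).
    return float(np.mean((X @ w > 0) == (y > 0.5)))


# Hypothetical synthetic problem: 3 relevant features, 497 irrelevant ones.
rng = np.random.default_rng(0)
n_train, n_test, d, n_relevant = 200, 2000, 500, 3
w_true = np.zeros(d)
w_true[:n_relevant] = 3.0


def sample(n):
    X = rng.standard_normal((n, d))
    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
    return X, y


X_train, y_train = sample(n_train)
X_test, y_test = sample(n_test)

w_l1 = fit_logreg(X_train, y_train, lam=0.1, penalty="l1")
w_l2 = fit_logreg(X_train, y_train, lam=0.1, penalty="l2")
acc_l1 = accuracy(w_l1, X_test, y_test)
acc_l2 = accuracy(w_l2, X_test, y_test)

print("L1: nonzero weights =", np.count_nonzero(w_l1), "test accuracy =", acc_l1)
print("L2: nonzero weights =", np.count_nonzero(w_l2), "test accuracy =", acc_l2)
```

The L1 proximal step sets small coordinates exactly to zero, so the fitted model tends to ignore most irrelevant features, while the L2 step only rescales the weights and leaves them dense. Varying `d` with `n_train` fixed gives a rough feel for how each penalty copes with more irrelevant features. It is no substitute for the bounds stated in the abstract.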

Citation impact

Total citations: 1,583

FWCI: 30.06
Percentile: 100%
References: 29

Authors

Andrew Y. Ng (corresponding author)
Stanford University

Topics & keywords

Keywords

Overfitting
Regularization (linguistics)
Logarithm
Logistic regression
Feature selection
Pattern recognition (psychology)
Artificial intelligence
Mathematics

Funding

Defense Advanced Research Projects Agency