Preprint | The Innovation Informatics | Jan 1, 2026 | Hybrid OA

Informative missingness and its implications in semi-supervised learning

Indexed in: arXiv, Crossref, DataCite

Abstract

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance prediction performance. This defines an incomplete-data problem, which can be formulated statistically within the likelihood framework for finite mixture models, which can in turn be fitted using the expectation-maximisation (EM) algorithm. Ideally, one would prefer a completely labelled sample, since a labelled observation would be expected to provide more information than an unlabelled one. However, when the mechanism governing label absence depends on the observed…
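The abstract frames SSL as maximum likelihood for a finite mixture in which some class labels are missing, fitted by EM. The sketch below illustrates that general idea only. It fits a two-class univariate Gaussian mixture to a partially labelled sample using NumPy. Everything specific in it is an assumption of this sketch and not the authors' model: the Gaussian components, the function names `em_partially_labelled`, `weighted_densities` and `log_likelihood`, and the `-1` encoding for a missing label. Each labelled point contributes log(pi_{y_i} f_{y_i}(x_i)) to the log-likelihood. Each unlabelled point contributes log(sum_k pi_k f_k(x_i)). The mechanism that decides which labels are missing is not modelled at all. That simplification is only justified when the mechanism is ignorable (for example, labels missing completely at random), so the sketch does not capture the informative missingness the paper addresses.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Univariate Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def weighted_densities(x, pi, mu, var):
    """Return an (n, K) array with entries pi_k * f_k(x_i)."""
    return np.column_stack(
        [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(len(pi))]
    )


def log_likelihood(dens, lab_idx, unl_idx, y):
    """Partial-classification log-likelihood; the missingness mechanism is ignored."""
    # Labelled points contribute log(pi_{y_i} f_{y_i}(x_i)).
    ll_lab = np.log(dens[lab_idx, y[lab_idx]]).sum()
    # Unlabelled points contribute log(sum_k pi_k f_k(x_i)).
    ll_unl = np.log(dens[unl_idx].sum(axis=1)).sum()
    return ll_lab + ll_unl


def em_partially_labelled(x, y, n_iter=500, tol=1e-8):
    """EM for a two-class univariate Gaussian mixture (hypothetical helper).

    y holds 0 or 1 for labelled points and -1 where the label is missing,
    an encoding chosen for this sketch. Returns (pi, mu, var, log_lik).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    K = 2
    lab_idx = np.flatnonzero(y >= 0)
    unl_idx = np.flatnonzero(y < 0)
    # Initialise every parameter from the labelled observations alone.
    x_lab, y_lab = x[lab_idx], y[lab_idx]
    pi = np.array([np.mean(y_lab == k) for k in range(K)])
    mu = np.array([x_lab[y_lab == k].mean() for k in range(K)])
    var = np.array([x_lab[y_lab == k].var() for k in range(K)])

    prev_ll = -np.inf
    for _ in range(n_iter):
        dens = weighted_densities(x, pi, mu, var)
        ll = log_likelihood(dens, lab_idx, unl_idx, y)
        # In exact arithmetic EM never decreases ll; stop once it stalls.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # E-step: posterior class probabilities for unlabelled points;
        # labelled points keep fixed 0/1 class indicators.
        tau = dens / dens.sum(axis=1, keepdims=True)
        tau[lab_idx] = np.eye(K)[y_lab]
        # M-step: weighted updates of proportions, means and variances.
        nk = tau.sum(axis=0)
        pi = nk / len(x)
        mu = (tau * x[:, None]).sum(axis=0) / nk
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / nk

    final_ll = log_likelihood(weighted_densities(x, pi, mu, var), lab_idx, unl_idx, y)
    return pi, mu, var, final_ll


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data: 300 points per class, and only 20 per class keep a label.
    x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])
    y = np.full(600, -1)
    y[:20] = 0
    y[300:320] = 1
    pi, mu, var, ll = em_partially_labelled(x, y)
    print("pi:", pi, "mu:", mu, "var:", var, "log-lik:", ll)
```

The labelled observations enter with fixed class indicators, so they also pin down which mixture component corresponds to which class. This sketch therefore needs no separate step to resolve label switching.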

Citation impact

  • Total citations: 5
  • FWCI: 116.53
  • Percentile: 100%
  • References: 0

Too recent for citation history.

Authors: 3

Topics & keywords

Keywords
  • Missing data
  • Classifier
  • Inference
  • Class
  • Mechanism
  • Statistical model
  • Imputation (statistics)
  • Statistical inference