Informative missingness and its implications in semi-supervised learning

Wu, Jinran; Wang, You‐Gan; McLachlan, Geoffrey J.

doi:10.59717/j.xinn-inform.2026.100033

Preprint, The Innovation Informatics, Jan 1, 2026 (hybrid open access)

Indexed in: arXiv, Crossref, DataCite

Abstract

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance prediction performance. This defines an incomplete-data problem, which can be formulated statistically within the likelihood framework for finite mixture models, fitted using the expectation-maximisation (EM) algorithm. Ideally, one would prefer a completely labelled sample, since a labelled observation would be expected to provide more information than an unlabelled one. However, when the mechanism governing label absence depends on the observed…
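The abstract frames SSL as fitting a finite mixture model by EM to partially labelled data. The following is a minimal sketch of that classical formulation only, under stated assumptions: a univariate Gaussian mixture with a common variance, labelled points keeping their known class, and the missing-label mechanism simply ignored (labels treated as missing completely at random). It does not model the informative missingness that the paper studies, and it is not the authors' method. All function names (`ssl_em`, `observed_loglik`, `classify`, etc.) are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def observed_loglik(x, labels, props, means, var) -> float:
    """Observed-data log-likelihood, assuming the missing-label mechanism is ignorable."""
    total = 0.0
    for xj, yj in zip(x, labels):
        terms = [math.log(p) + log_normal_pdf(xj, mu, var) for p, mu in zip(props, means)]
        # Labelled point: its own class term. Unlabelled point: the mixture density.
        total += terms[yj] if yj is not None else log_sum_exp(terms)
    return total


def ssl_em(x: List[float], labels: List[Optional[int]], n_classes: int = 2,
           n_iter: int = 500, tol: float = 1e-10) -> Tuple[List[float], List[float], float]:
    """EM for a univariate Gaussian mixture with common variance on partially
    labelled data (labels[j] is a class index, or None if unlabelled)."""
    n, g = len(x), n_classes
    labelled = [(xj, yj) for xj, yj in zip(x, labels) if yj is not None]
    counts = [sum(1 for _, yj in labelled if yj == i) for i in range(g)]
    if min(counts) == 0:
        raise ValueError("each class needs at least one labelled observation")
    # Initialise from the labelled observations only.
    props = [c / len(labelled) for c in counts]
    means = [sum(xj for xj, yj in labelled if yj == i) / counts[i] for i in range(g)]
    var = sum((xj - means[yj]) ** 2 for xj, yj in labelled) / len(labelled)
    if var <= 0.0:
        var = 1.0  # fallback when labelled points sit exactly on their class means
    ll_old = -math.inf
    for _ in range(n_iter):
        # E-step: labelled points keep their known class; unlabelled points get
        # posterior class probabilities under the current parameters.
        tau = []
        for xj, yj in zip(x, labels):
            if yj is not None:
                tau.append([1.0 if i == yj else 0.0 for i in range(g)])
            else:
                logs = [math.log(props[i]) + log_normal_pdf(xj, means[i], var) for i in range(g)]
                norm = log_sum_exp(logs)
                tau.append([math.exp(lv - norm) for lv in logs])
        # M-step: weighted proportions, means and pooled variance over all n points.
        # (No guard against a degenerate zero variance; this is only a sketch.)
        sums = [sum(t[i] for t in tau) for i in range(g)]
        props = [s / n for s in sums]
        means = [sum(t[i] * xj for t, xj in zip(tau, x)) / sums[i] for i in range(g)]
        var = sum(t[i] * (xj - means[i]) ** 2 for t, xj in zip(tau, x) for i in range(g)) / n
        ll = observed_loglik(x, labels, props, means, var)
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return props, means, var


def classify(x_new: float, props, means, var) -> int:
    """Assign x_new to the class with the largest posterior probability."""
    scores = [math.log(p) + log_normal_pdf(x_new, mu, var) for p, mu in zip(props, means)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with four labelled points and six unlabelled ones, `ssl_em([-2.1, -1.9, 1.9, 2.1, -2.0, -2.2, -1.8, 2.0, 2.2, 1.8], [0, 0, 1, 1, None, None, None, None, None, None])` uses the unlabelled points through their posterior weights in the M-step. A labelled point enters the likelihood through a single class term, whereas an unlabelled point contributes only the mixture density; the abstract's point is that this comparison can change once label absence depends on the observed data.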

Citation impact

Total citations: 5
FWCI (field-weighted citation impact): 116.53
Percentile: 100%
References: 0

Too recent for citation history.

Authors: 3

Keywords

Missing data
Classifier
Inference
Class
Mechanism
Statistical model
Imputation (statistics)
Statistical inference
