Weighted log-likelihood
In the previous example, we considered a single log-likelihood for both labeled and unlabeled samples:
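The original formula is not reproduced here; a standard way to write such a joint objective for a mixture with parameters θ, where the first N samples are labeled and the remaining M are unlabeled and every term carries the same unit weight, is:

$$\log L(\theta \mid X, Y) = \sum_{i=1}^{N} \log p(x_i, y_i \mid \theta) \;+\; \sum_{j=N+1}^{N+M} \log \sum_{k=1}^{K} p(x_j, y_j = k \mid \theta)$$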
This is equivalent to saying that we trust the unlabeled points just as much as the labeled ones. However, in some contexts, this assumption can lead to completely wrong estimates, as shown in the following graph:
Biased final Gaussian mixture configuration
In this case, the means and covariance matrices of both Gaussian distributions have been biased by the unlabeled points, and the resulting density estimation is clearly wrong. When this phenomenon happens, the best choice is a doubly weighted log-likelihood. If the first N samples are labeled and the following M are unlabeled, the log-likelihood can be expressed as follows:
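Under the same notation, a plausible form of this weighted objective, where λ scales only the unlabeled contribution, is:

$$\log L(\theta \mid X, Y) = \sum_{i=1}^{N} \log p(x_i, y_i \mid \theta) \;+\; \lambda \sum_{j=N+1}^{N+M} \log \sum_{k=1}^{K} p(x_j, y_j = k \mid \theta)$$

With λ = 1, this reduces to the single log-likelihood above, while smaller values shrink the influence of the unlabeled sum.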
In the previous formula, the term λ, if less than 1, underweights the unlabeled terms, giving more importance to the labeled dataset. The modifications to the algorithm are trivial, because each unlabeled weight has to be scaled by λ, which reduces its estimated probability. In Semi-Supervised Learning, Chapelle O., Schölkopf B., Zien A. (Eds.), The MIT Press, the reader can find a very detailed discussion about the choice of λ. There are no golden rules; however, a possible strategy is to base the choice on cross-validation performed on the labeled dataset. Another (more complex) approach is to try several increasing values of λ and pick the first one for which the log-likelihood reaches its maximum. I recommend the aforementioned book for further details and strategies.
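As a concrete illustration of this scaling, the following sketch shows one way to down-weight the unlabeled responsibilities by λ inside an EM loop for a Gaussian mixture. The function weighted_em_gmm, its parameters, and the initialization from the labeled subset are illustrative assumptions rather than the exact procedure described so far; only standard NumPy and SciPy calls are used.

```python
import numpy as np
from scipy.stats import multivariate_normal


def weighted_em_gmm(X_lab, y_lab, X_unl, n_classes, lam=0.5, n_iter=50, reg=1e-6):
    """EM for a Gaussian mixture where each unlabeled responsibility is scaled
    by lam (0 < lam <= 1). Labeled points keep fixed, unit-weight assignments."""
    d = X_lab.shape[1]
    X_all = np.vstack([X_lab, X_unl])
    N, M = len(X_lab), len(X_unl)

    # Responsibility matrix: labeled rows are one-hot, unlabeled rows start uniform
    R = np.zeros((N + M, n_classes))
    R[np.arange(N), y_lab] = 1.0
    R[N:] = 1.0 / n_classes

    # Per-sample weights: 1 for labeled samples, lam for unlabeled ones
    sample_w = np.concatenate([np.ones(N), np.full(M, lam)])

    # Initialize priors, means, and covariances from the labeled subset only
    priors = np.array([np.mean(y_lab == k) for k in range(n_classes)])
    means = np.array([X_lab[y_lab == k].mean(axis=0) for k in range(n_classes)])
    covs = np.array([np.cov(X_lab[y_lab == k].T) + reg * np.eye(d)
                     for k in range(n_classes)])

    for _ in range(n_iter):
        # E-step: update responsibilities for the unlabeled block only
        pdfs = np.column_stack([
            priors[k] * multivariate_normal.pdf(X_unl, mean=means[k], cov=covs[k])
            for k in range(n_classes)
        ])
        R[N:] = pdfs / pdfs.sum(axis=1, keepdims=True)

        # M-step: every unlabeled responsibility is scaled by lam before the updates,
        # so unlabeled points contribute less to priors, means, and covariances
        W = R * sample_w[:, None]
        Nk = W.sum(axis=0)
        priors = Nk / Nk.sum()
        means = (W.T @ X_all) / Nk[:, None]
        for k in range(n_classes):
            diff = X_all - means[k]
            covs[k] = (W[:, k][:, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)

    return priors, means, covs


# Hypothetical usage: two labeled clusters plus many unlabeled points
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unl = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (200, 2))])
priors, means, covs = weighted_em_gmm(X_lab, y_lab, X_unl, n_classes=2, lam=0.3)
```

With this structure, λ can be selected as suggested above: for example, by refitting the mixture for a grid of increasing λ values and scoring each configuration with cross-validation on the labeled dataset, or by stopping at the first value for which the log-likelihood reaches its maximum.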