function [fu, fu_CMN] = harmonic_function(W, fl)
% [fu, fu_CMN] = harmonic_function(W, fl)
%
% Semi-supervised learning with basic harmonic function.
% That is eq (5) in "Semi-Supervised Learning Using Gaussian Fields
% and Harmonic Functions". Xiaojin Zhu, Zoubin Ghahramani, John Lafferty.
% The Twentieth International Conference on Machine Learning (ICML-2003).
%
% Input:
% W: n*n weight matrix. The first L entries(row,col) are for labeled data,
% the rest for unlabeled data. W has to be symmetric, and all
% entries has to be non-negative. Also note the graph may be disconnected,
% but each connected subgraph has to have at least one labeled point.
% This is to make sure the sub-Laplacian matrix is invertible.
% fl: L*c label matrix. Each line is for a labeled point, in
% one-against-all encoding (all zero but one 1). For example in binary
% classification each line would be either "0 1" or "1 0".
%
% Output:
% fu: The harmonic solution, i.e. eq (5) in the ICML paper.
% (n-L)*c label matrix for the unlabeled points. Each row is for
% an unlabeled point, and each column for a class. The class with the
% largest value is the predicted class of the unlabeled point.
% fu_CMN: The solution after applying Class Mass Normalization (CMN), as
% in eq (9) of the ICML paper. The class proportions are the maximum
% likelihood (frequency) estimate from labeled data fl. The CMN heuristic
% is known to sometimes improve classification accuracy. As before,
% the class with the largest value in a row is the predicted class.
%
% Note:
% If there is a warning "Matrix is singular to working precision.",
% and fu is all NaN, it usually means the original graph is disconnected,
% and some connected components have no labeled data. To solve the problem:
% 1. Try a different graph
% 2. Use a different algorithm, e.g. the Gaussian Field kernels
%
%
% Please note there is no warranty. In fact it is not even a software, but
% merely research code. Feel free to modify it for your work.
% Xiaojin Zhu, zhuxj@cs.cmu.edu
% 2004
l = size(fl, 1); % the number of labeled points
n = size(W, 1); % total number of points
% the graph Laplacian L=D-W
L = diag(sum(W)) - W;
% the harmonic function.
fu = - inv(L(l+1:n, l+1:n)) * L(l+1:n, 1:l) * fl;
% compute the CMN solution
q = sum(fl)+1; % the unnormalized class proportion estimate from labeled data, with Laplace smoothing
fu_CMN = fu .* repmat(q./sum(fu), n-l, 1);