Let me start this with a happy note. I am really glad for my proposal for “Estimators of large-scale sparse Gaussian” has been accepted in Google summer of code, 2013. I’m really looking forward to an awesome summer. I promised myself to write more often regarding this, but I’ve been really busy with a task that was needed for this project. I hope I can include some of that experience in this post too.
Some theoretical background about the project –
The aim of this project is to estimate log-determinant (up to an arbitrary precision) of a very large sparse precision matrix (inverse of covariance matrix) that arises in the log-likelihood expression of a multivariate Gaussian distribution. A direct method (like the one that I already added there in Shogun, CStatistics::log_det()) relies on Cholesky factorization of the matrix, which, in practice, may not be that sparse and often cannot be even stored in the memory and hence are not feasible in most of the practical scenarios . The idea of this project borrows a concept from complex analysis (Cauchy’s integral formula for matrix functions) which represents a matrix function using a contour integral in the complex plane. Rational approximation of this integral for matrix function (logarithm of a matrix in this case) times a vector leads to a shifted family of linear systems with complex shifts, weights and constant, which can be estimated up to an arbitrary precision.
The task for estimating log-determinant here then becomes to sample the trace of the log of matrix using a set of vectors (called probing vectors, generated using greedy graph coloring), and in that expression, fit the log-matrix times vector using the rational approximation formula,
There has been quite a few changes in the design and structure of the framework than I have initially had in mind. Heiko suggested some really cool ideas regarding the way it should work.