Labeled Latent Dirichlet Allocation
This code implements a “soft” clustering method we call Labeled Latent Dirichlet Allocation (LLDA). LLDA is an instance of a general family of probabilistic models known as probabilistic graphical models, which provide a Bayesian framework for representing joint probability distributions over collections of variables and for computing posterior distributions over subsets of those variables. The algorithm clusters genes and experiments without requiring that a given gene or drug appear in only one cluster. The model also incorporates the functional annotations of known genes to guide the clustering. The procedure is optimized for so-called “fitness” data, in which information about each gene is encoded as the growth defect of a strain deleted for that gene, measured over a set of conditions (PMID: 15919724).
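To illustrate what "soft" clustering with annotation guidance means, here is a minimal sketch of a generic label-constrained LDA collapsed Gibbs sampler. It is not the LLDA model or the code in llda_code.zip. It assumes that fitness defect scores have already been discretized into nonnegative integer counts per (gene, condition) pair, and that each annotated gene's functional labels map to a set of allowed cluster indices. Under these assumptions, genes play the role of "documents" and conditions the role of "words". The function name labeled_lda_gibbs, its parameters (alpha, beta, n_iter), and the toy data are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set


def labeled_lda_gibbs(
    counts: Sequence[Sequence[int]],
    n_topics: int,
    allowed: Sequence[Optional[Set[int]]],
    alpha: float = 0.5,
    beta: float = 0.1,
    n_iter: int = 200,
    seed: int = 0,
) -> List[List[float]]:
    """Hypothetical sketch: return per-gene soft cluster memberships (theta).

    counts[g][c] is an ASSUMED nonnegative integer (discretized fitness defect)
    for gene g under condition c. allowed[g] is the set of clusters an annotated
    gene may use; None or an empty set means the gene is unannotated.
    """
    rng = random.Random(seed)
    n_genes = len(counts)
    n_conds = len(counts[0])
    # Expand each gene's count vector into a list of tokens (condition indices).
    tokens = [[c for c, n in enumerate(row) for _ in range(n)] for row in counts]
    # Annotated genes are restricted to their label clusters; others may use any.
    topics_for = [sorted(a) if a else list(range(n_topics)) for a in allowed]

    ndk = [[0] * n_topics for _ in range(n_genes)]  # gene-cluster counts
    nkw = [[0] * n_conds for _ in range(n_topics)]  # cluster-condition counts
    nk = [0] * n_topics                             # total tokens per cluster
    z: List[List[int]] = []                         # cluster assignment per token

    # Random initial assignment, respecting each gene's allowed clusters.
    for d, toks in enumerate(tokens):
        zd = []
        for w in toks:
            k = rng.choice(topics_for[d])
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    # Collapsed Gibbs sweeps: resample each token's cluster given all the others.
    for _ in range(n_iter):
        for d, toks in enumerate(tokens):
            ks = topics_for[d]
            for i, w in enumerate(toks):
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_conds * beta)
                    for t in ks
                ]
                k = rng.choices(ks, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean membership: a gene can belong partly to several clusters.
    theta = []
    for d in range(n_genes):
        ks = topics_for[d]
        total = len(tokens[d]) + alpha * len(ks)
        row = [0.0] * n_topics
        for t in ks:
            row[t] = (ndk[d][t] + alpha) / total
        theta.append(row)
    return theta


# Toy example (hypothetical data): 4 genes x 4 conditions, 2 clusters.
counts = [[5, 4, 0, 0], [0, 0, 5, 4], [4, 5, 0, 0], [3, 3, 3, 3]]
allowed = [{0}, {1}, None, None]  # genes 0 and 1 annotated; genes 2 and 3 unknown
theta = labeled_lda_gibbs(counts, n_topics=2, allowed=allowed, n_iter=100)
for g, row in enumerate(theta):
    print(g, [round(p, 2) for p in row])
```

In this sketch the annotated genes anchor each cluster to a characteristic set of conditions. Unannotated genes receive a membership vector over all clusters rather than a single hard label. The per-cluster condition counts (nkw) play the role of a condition clustering, but the sketch does not return them.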
Download llda_code.zip.