The Department of Energy’s Systems Biology Knowledgebase (KBase) is an ambitious program to disrupt the way we do biological systems science for prediction and engineering. It is an open-source, extensible platform that runs on DOE high-end infrastructure and supports collaborative, reproducible, reusable, and openly publishable analyses of organisms and their communities to predict and design their functions.
Visit KBase.us.


Microbes Online

A comparative functional genomic database and workbench with advanced tools for phylogenetic analysis and annotation, functional data storage/display/analysis, metabolic analysis and metafunctional genomics.
Visit Microbes Online.



A database of gene regulatory interactions in microbes based on literature and an expertly curated database of transcription factor binding sites underlying a suite of prediction tools.

Visit RegTransBase.



Capture, visualize and analyze transcription factor regulons reconstructed by a comparative genomic approach.
Visit RegPrecise.



Genomic reconstruction of transcriptional regulons in groups of closely related prokaryotic genomes. Regulons are described as gene sets with shared regulatory sites.
Visit RegPredict.



An old but flexible simulator for complex gene expression circuits. Used initially to model complex promoter, elongation, and expression dynamics of the bacteriophage lambda lysis/lysogeny switch.
Download Simulac.zip.



A now very old but still surprisingly effective correlation-based method of reconstructing molecular reaction networks from time-series data.
Download Deduce.zip.
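
To make the idea of correlation-based network reconstruction concrete, here is a minimal Python sketch. It is not the Deduce code. It assumes a simple scheme: compute the Pearson correlation between every pair of time series at several time lags, then keep the pairs whose strongest correlation passes a threshold. The function names, the `max_lag` and `threshold` parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def best_lagged_correlation(x, y, max_lag):
    """Return (correlation, lag) with the largest absolute correlation.

    A positive lag means y follows x by that many time steps.
    """
    best = (0.0, 0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        r = pearson(a, b)
        if abs(r) > abs(best[0]):
            best = (r, lag)
    return best


def infer_edges(series, max_lag=2, threshold=0.9):
    """Link every pair of species whose best lagged correlation is strong.

    `series` maps a species name to its time series (hypothetical format).
    """
    edges = []
    for u, v in combinations(sorted(series), 2):
        r, lag = best_lagged_correlation(series[u], series[v], max_lag)
        if abs(r) >= threshold:
            edges.append((u, v, round(r, 3), lag))
    return edges


# Toy data: B is a copy of A delayed by one step; C never changes.
toy = {
    "A": [0, 0, 1, 3, 2, 1, 0, 0, 0, 0],
    "B": [0, 0, 0, 1, 3, 2, 1, 0, 0, 0],
    "C": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}
edges = infer_edges(toy)
```

In this toy case the only edge recovered links A and B at a lag of one step. Real data would need noise handling and a principled threshold, which this sketch leaves out.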

Labelled Latent Dirichlet Allocation

This code implements a “soft” clustering methodology we call Labelled Latent Dirichlet Allocation (LLDA). The LLDA model is an instance of a general family of probabilistic models known as probabilistic graphical models. Probabilistic graphical models provide a general Bayesian framework for representing joint probability distributions over collections of variables, and for computing posterior distributions on subsets of those variables. The algorithm clusters genes and experiments without requiring that a given gene or drug appear in only one cluster. The model also incorporates the functional annotation of known genes to guide the clustering procedure. The procedure is optimized to work with so-called “fitness” data, in which the gene information is encoded in the growth defect of a strain deleted for that gene over a set of conditions (PMID: 15919724).
Download llda_code.zip.


Meta Microbes Online

As of October 2010, Meta.MicrobesOnline.org contained over 1600 genomes from bacterial, archaeal, and microeukaryotic isolates and offered combined phylogenetic gene tree analysis of millions of genes from over 150 ecological and organismal metagenomes. These trees are built using our FastTree program, which offers rapid, highly accurate tree building, even for very large trees. Such combined analysis is superior to BLAST-based homology approaches because trees place genes from environmental samples into an evolutionary context and permit more precise functional grouping within a gene family. Trees may also identify phylogenetic markers that aid in assigning species to environmental sequence fragments, which helps determine which community members are responsible for which roles.
Visit meta.microbesonline.org.



The Google-Like Application for Metabolic Maps (GLAMM) provides an interactive browsing experience for metabolic networks, modeled on the one offered by web mapping technology. GLAMM is integrated with the MicrobesOnline.org web resource, giving a researcher access to the many powerful comparative genomic and functional analysis tools present in MicrobesOnline.
Visit GLAMM.



FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100 to 1,000 times faster than PhyML 3.0 or RAxML 7. FastTree is open-source software; you can download the code below.
Visit FastTree.

D-Tailor


DNA-Tailor (D-Tailor) is a fully extendable software framework for biological sequence analysis and multi-objective sequence design. D-Tailor permits the seamless integration of an arbitrary number of sequence analysis tools into a Monte-Carlo algorithm that evolves synthetic sequences towards user-defined goals.
Visit D-Tailor.
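
As a rough illustration of how a Monte-Carlo algorithm can evolve a sequence towards a user-defined goal, here is a minimal Python sketch. It does not use D-Tailor or its API. It assumes a single objective (a target GC content), single-base mutations, and a Metropolis acceptance rule. The names `gc_content` and `design_sequence`, and the `steps` and `temperature` parameters, are hypothetical.

```python
import math
import random


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def design_sequence(seed_seq, target_gc, steps=5000, temperature=0.01, rng=None):
    """Evolve seed_seq towards target_gc with a Metropolis Monte-Carlo walk.

    Returns the best sequence seen and its distance from the target.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    current = list(seed_seq)
    score = abs(gc_content(current) - target_gc)
    best, best_score = current[:], score
    for _ in range(steps):
        # Propose a single-base substitution at a random position.
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice([b for b in "ACGT" if b != old])
        new_score = abs(gc_content(current) - target_gc)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the score gets worse.
        accept = new_score <= score or rng.random() < math.exp(
            (score - new_score) / temperature
        )
        if accept:
            score = new_score
            if score < best_score:
                best, best_score = current[:], score
        else:
            current[pos] = old  # reject: undo the mutation
    return "".join(best), best_score


designed, distance = design_sequence("A" * 40, target_gc=0.5)
```

A multi-objective design would replace the single score with a combination of several analysis results. How those are weighed is a design choice this sketch does not address.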