  LEfSe

LDA Effect Size (LEfSe) (Segata et al., 2010) is an algorithm for high-dimensional biomarker discovery and explanation that identifies genomic features (genes, pathways, or taxa) characterizing the differences between two or more biological conditions (or classes; see the figure below). It emphasizes both statistical significance and biological relevance, allowing researchers to identify differentially abundant features that are also consistent with biologically meaningful categories (subclasses). LEfSe first robustly identifies features that are statistically different among biological classes. It then performs additional tests to assess whether these differences are consistent with respect to expected biological behavior.

Specifically, LEfSe first uses the non-parametric factorial Kruskal-Wallis (KW) sum-rank test to detect features with significant differential abundance with respect to the class of interest. Biological significance is subsequently investigated with a set of pairwise tests among subclasses using the (unpaired) Wilcoxon rank-sum test. As a last step, LEfSe uses Linear Discriminant Analysis to estimate the effect size of each differentially abundant feature and, if desired by the investigator, to perform dimension reduction.
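The first of these steps can be illustrated with a minimal sketch in plain Python. It is not LEfSe's own code: the function names, the example data, and the 0.05 threshold are assumptions made for illustration. It handles only two classes (so the chi-square approximation has one degree of freedom), omits the tie correction, and leaves out the subclass Wilcoxon tests and the LDA step.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def kruskal_wallis_two_classes(class_a, class_b):
    """Kruskal-Wallis H statistic (no tie correction) and its p-value.

    With two classes, H is compared to a chi-square distribution with one
    degree of freedom, whose upper tail is erfc(sqrt(H / 2)).
    """
    groups = [class_a, class_b]
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)
    p_value = math.erfc(math.sqrt(max(h, 0.0) / 2.0))
    return h, p_value


def screen_features(abundances, alpha=0.05):
    """Keep features whose KW p-value is below alpha (alpha is an assumed cutoff)."""
    kept = []
    for feature, (class_a, class_b) in abundances.items():
        _, p_value = kruskal_wallis_two_classes(class_a, class_b)
        if p_value < alpha:
            kept.append(feature)
    return kept


# Hypothetical abundances of two features in two classes.
example = {
    "taxon_X": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    "taxon_Y": ([1, 3, 5, 7], [2, 4, 6, 8]),
}
print(screen_features(example))
```

In this toy data only taxon_X separates the two classes cleanly, so it is the only feature that would move on to the subclass tests and the effect size estimate.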

LEfSe consists of six modules performing the following steps (see the figure below).

  • A) Format Data for LEfSe: selects the structure of the problem (classes, subclasses, subjects) and formats the tabular abundance data for the B module
  • B) LDA Effect Size (LEfSe): performs the analysis using the data formatted with module A and provides input for the visualization modules (C, D, E, F)
  • C) Plot LEfSe Results: graphically reports the discovered biomarkers (output of B) with their effect sizes
  • D) Plot Cladogram: graphically represents the discovered biomarkers (output of B) in a taxonomic tree specified by the hierarchical feature names (not available for non-hierarchical features)
  • E) Plot One Feature: plots the raw values of a feature (biomarker or not) as an abundance histogram with classes and subclasses structure (only one feature at a time)
  • F) Plot Differential Features: plots the raw values of all features (biomarkers or not) as abundance histograms with classes and subclasses structure and provides a zip archive of the figures

  Volcano plot

A volcano plot combines two indicators, fold change and p-value, so that genes differentially expressed between two samples can be screened intuitively. A t-test is used to assess the significance of the expression differences between the two samples. The log2 of the fold change is plotted on the abscissa, and the negative log10 of the t-test p-value, -log10(p-value), on the ordinate. Applying filter conditions (for example, a fold change greater than 1.5 and P < 0.05) then selects the significantly differentially expressed genes for subsequent research.
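As a rough illustration of how the plot coordinates and the filter are derived, the sketch below assumes that the fold change and the t-test p-value of each gene have already been computed. The function name, the gene names, and the example values are hypothetical. The cutoffs match the example thresholds mentioned above, and the fold-change cutoff is applied in both directions on the log2 scale.

```python
import math


def volcano_points(genes, fc_cutoff=1.5, p_cutoff=0.05):
    """Turn (fold_change, p_value) pairs into volcano plot coordinates.

    Returns a list of (name, log2_fold_change, neg_log10_p, significant).
    A gene is flagged when it changes by more than fc_cutoff in either
    direction and its p-value is below p_cutoff.
    """
    log2_cutoff = math.log2(fc_cutoff)
    points = []
    for name, (fold_change, p_value) in genes.items():
        x = math.log2(fold_change)   # abscissa
        y = -math.log10(p_value)     # ordinate
        significant = abs(x) > log2_cutoff and p_value < p_cutoff
        points.append((name, x, y, significant))
    return points


# Hypothetical input: gene -> (fold change, t-test p-value).
genes = {
    "geneA": (2.0, 0.001),   # up-regulated and significant
    "geneB": (1.1, 0.5),     # small change, not significant
    "geneC": (0.25, 0.01),   # down-regulated and significant
}

for name, x, y, significant in volcano_points(genes):
    print(f"{name}\t{x:.2f}\t{y:.2f}\t{significant}")
```

The (x, y) pairs can be passed to any plotting tool. The flagged genes are the points in the upper left and upper right regions of the plot.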

  PICRUSt

PICRUSt: Phylogenetic Investigation of Communities by Reconstruction of Unobserved States.
The PICRUSt project aims to support prediction of the unobserved character states in a community of organisms from phylogenetic information about the organisms in that community. The primary application is to predict gene family abundance (e.g. the metagenome) in environmental DNA samples for which only marker gene (e.g. 16S rRNA gene) data are available. This is an open source, international, collaborative bioinformatics project developed in the Huttenhower, Beiko, Langille, Vega Thurber, Knight and Caporaso labs.

First Step
Run 'Normalize by Copy Number' to correct your OTU table for differences in 16S copy number among OTUs.
Second Step
Run 'Predict Metagenome' on the output of the previous step (the normalized OTU table) to get your metagenome predictions.
Description of PICRUSt: Normalize by Copy Number module
This module corrects the abundance of each OTU to better reflect the true organism abundance by normalizing by PICRUSt's prediction of 16S copy number for each OTU.
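The arithmetic behind the two steps can be sketched as follows. This is a simplified illustration, not PICRUSt's implementation: the function names, OTU identifiers, copy numbers, and gene contents are made up. In practice the 16S copy numbers and per-OTU gene contents come from PICRUSt's precomputed predictions.

```python
def normalize_by_copy_number(otu_counts, copy_numbers):
    """Step 1: divide each OTU count by its predicted 16S copy number."""
    return {otu: count / copy_numbers[otu] for otu, count in otu_counts.items()}


def predict_metagenome(normalized_counts, gene_content):
    """Step 2: sum, over OTUs, normalized abundance times gene copies per genome."""
    prediction = {}
    for otu, abundance in normalized_counts.items():
        for gene, copies in gene_content[otu].items():
            prediction[gene] = prediction.get(gene, 0.0) + abundance * copies
    return prediction


# Hypothetical single-sample inputs.
otu_counts = {"otu1": 100, "otu2": 30}      # observed 16S read counts
copy_numbers = {"otu1": 5, "otu2": 2}       # predicted 16S copies per genome
gene_content = {                            # predicted gene copies per genome
    "otu1": {"K00001": 1, "K00002": 2},
    "otu2": {"K00001": 3},
}

normalized = normalize_by_copy_number(otu_counts, copy_numbers)
print(normalized)
print(predict_metagenome(normalized, gene_content))
```

Here otu1 has more reads than otu2, but after normalization the two organisms are estimated to be similarly abundant (20 versus 15), which changes their relative contribution to the predicted gene family abundances.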

For more information please visit: http://picrust.github.com/

  Tax4Fun

Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data.
Tax4Fun is an open-source R package that predicts the functional capabilities of microbial communities based on 16S datasets. Tax4Fun is applicable to output obtained from the SILVAngs web server or from running QIIME (Caporaso et al., 2010) against the SILVA database (Quast et al., 2013).
Further, the Tax4Fun package implements the MoP-Pro approach for whole metagenome shotgun sequencing data (Aßhauer and Meinicke, 2013). MoP-Pro implements a shortcut to estimate the metabolic profile of a metagenome: the taxonomic profile of the metagenome is linked to a set of pre-computed metabolic reference profiles. Combining the taxonomic abundance estimates, obtained through the fast method Taxy-Pro (Klingenberg et al., 2013), with the metabolic reference profiles, based on the KEGG database (Kanehisa and Goto, 2000; Kanehisa et al., 2014), gives the metabolic profiling approach unrivaled speed.
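The idea of linking a taxonomic profile to pre-computed reference profiles can be pictured as a weighted mixture. The sketch below is written in Python for illustration and is not the Tax4Fun R API. It assumes that each taxon's reference profile is a set of KEGG ortholog fractions summing to 1. The function name, taxa, and KO labels are hypothetical.

```python
def functional_profile(taxon_abundance, reference_profiles):
    """Mix per-taxon reference profiles, weighted by relative taxon abundance."""
    total = sum(taxon_abundance.values())
    profile = {}
    for taxon, abundance in taxon_abundance.items():
        weight = abundance / total  # relative abundance of this taxon
        for ko, fraction in reference_profiles[taxon].items():
            profile[ko] = profile.get(ko, 0.0) + weight * fraction
    return profile


# Hypothetical taxonomic profile and pre-computed reference profiles.
taxon_abundance = {"taxonA": 3, "taxonB": 1}
reference_profiles = {
    "taxonA": {"KO_1": 0.5, "KO_2": 0.5},
    "taxonB": {"KO_1": 1.0},
}

print(functional_profile(taxon_abundance, reference_profiles))
```

Because the reference profiles are computed ahead of time, only this weighted sum has to be evaluated per sample, which is consistent with the speed argument made above.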

For more information please visit: http://tax4fun.gobics.de/