MULTIVARIATE ANALYSIS OF METAGENOMES – AN UNDERGRADUATE REU STORY
TITLE:
MULTIVARIATE ANALYSIS OF METAGENOMES – AN UNDERGRADUATE REU STORY
DATE:
Friday, October 23rd, 2009
TIME:
3:30 PM
LOCATION:
GMCS 214
SPEAKER:
Elizabeth Dinsdale, PhD,
Assistant Professor,
Biology Department,
San Diego State University
ABSTRACT:
Microbial activity shapes the health of individual organisms, entire ecosystems and the planet. Metagenomes, which are random samples of the microbial genomes within an environment, are constructed to explore variations in microbial activities. Bioinformatics analysis of the metagenome sequences describes the metabolic processes that are important for the growth and survival of the microbes in a given environment. Because the number of metagenomes is increasing exponentially, it is challenging to analyze them together and present biological interpretations across all datasets. To address this problem, seven Math REU summer students conducted a statistical comparison across 203 metagenomes, a dataset of about 2 billion base pairs (bp) of DNA sequence. In this talk, I will discuss their statistical analyses, which included both supervised and unsupervised techniques. The former require input from the researcher to define the groupings, whereas in the latter the groupings are produced by the statistical analysis itself. We demonstrated that combining k-means silhouette analysis to determine the number of groups, the random forest variable importance plot to identify important variables, and canonical discriminant analysis to cluster the metagenomes explained 91.2% of the variance with an error rate of 12.5%. Our results show that the metabolic profiles provided by metagenomes distinguish the activity of microbial communities from different environments with high accuracy.
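The abstract does not include the students' code. As an illustration only, the following is a minimal sketch of how silhouette analysis can pick the number of k-means groups. It assumes each metagenome is summarized as a numeric feature vector compared with Euclidean distance, and it runs on synthetic 2-D data rather than the 203 metagenomes. The function names `kmeans`, `silhouette` and `choose_k` are hypothetical and written in plain Python. For each point, the silhouette width is s(i) = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster. The k with the highest mean width is chosen.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, n_init=10, iters=100, seed=0):
    """Lloyd's k-means with several random restarts; keeps the lowest-inertia labels."""
    best_labels, best_inertia = None, math.inf
    for run in range(n_init):
        rng = random.Random(seed + run)
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(iters):
            new_labels = [nearest(p, centroids) for p in points]
            if new_labels == labels:  # assignments stopped changing: converged
                break
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an empty cluster keeps its previous centroid
                    centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(points, labels):
    """Mean silhouette width over all points; -1.0 if fewer than two clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; scored so it is never chosen
    total = 0.0
    for i, p in enumerate(points):
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i])
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points, k_values):
    """Return the k with the highest mean silhouette width, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Synthetic example: three well-separated groups of 20 points each.
demo_rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
demo_points = [(cx + demo_rng.gauss(0, 0.5), cy + demo_rng.gauss(0, 0.5))
               for cx, cy in centers for _ in range(20)]
best_k, scores = choose_k(demo_points, range(2, 6))
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

On real data, library routines such as scikit-learn's `silhouette_score` or the `silhouette` function in R's `cluster` package would normally replace the hand-written versions above. The random forest and canonical discriminant steps mentioned in the abstract are not sketched here.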
HOST:
Robert Edwards