FINDING A NOVEL WAY FOR FAST SEQUENCE ALIGNMENT AND EXPLOITING INFORMATION THEORY IN BACTERIAL GENOMES AND COMPLETE PHAGES
TITLE:
FINDING A NOVEL WAY FOR FAST SEQUENCE ALIGNMENT AND EXPLOITING
INFORMATION THEORY IN BACTERIAL GENOMES AND COMPLETE PHAGES
DATE:
Friday, Aug 30th, 2013
TIME:
3:30 PM
LOCATION:
GMCS 214
SPEAKER:
Sajia Akhter.
Computational Science Research Center at SDSU.
ABSTRACT:
The invention of next-generation sequencing (NGS) technology provides
the capability of generating high-throughput, low-cost sequencing data,
and is used by scientists to address a diverse range of biological
problems. Several data analysis algorithms have been developed in the
last few years to best exploit NGS data. New tools and methods have
also been implemented to better understand these data.
This talk presents several novel techniques involving NGS
datasets. The first, qudaich, is a novel sequence aligner that can
serve as a key part of NGS data analysis. Qudaich generates pairwise
local alignments of a query dataset against a database. It can
efficiently process large volumes of data and is well suited to
next-generation read datasets. The aligner handles both DNA and
protein sequences and tries to generate the best possible alignment
for each query sequence. Compared with other contemporary aligners,
qudaich performs better in both execution time and accuracy.
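For readers unfamiliar with pairwise local alignment, the idea can be
sketched with the classic Smith-Waterman dynamic program. This is a
textbook illustration only, not qudaich's algorithm (the abstract does
not describe its internals); the function name and the scoring values
(match, mismatch, gap) are assumptions chosen for the example.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score.

    Illustrative only: scoring values are arbitrary assumptions and a
    linear gap penalty is used. This is NOT how qudaich works.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def best_hit(query, database):
    """Return (name, score) of the highest-scoring database sequence."""
    return max(
        ((name, local_alignment_score(query, seq)) for name, seq in database.items()),
        key=lambda hit: hit[1],
    )
```

This exhaustive approach takes time proportional to the product of the
sequence lengths for every query and database pair, which is the cost
that fast aligners for large read datasets are designed to avoid.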
Next, I show different ways to extract useful genomic information
from NGS data, which in turn point to promising directions for
solving existing biological problems such as prophage prediction.
Prophages are viruses that have integrated into, and replicate as
part of, the bacterial genome. These genetic elements can have a
tremendous impact on their hosts. Most other phage-finding tools rely
mainly on homology-based approaches for prophage prediction, which
limits the de novo discovery of novel prophages.
This work also presents a novel algorithm, PhiSpy, to predict
prophages in bacterial genomes. PhiSpy combines similarity-based and
composition-based strategies to identify prophages. It finds 94% of
the known prophages in 50 complete bacterial genomes, with a 6% false
negative rate and a 0.66% false positive rate. As a result, PhiSpy
successfully predicted a larger set of prophages than other
prophage-finding applications.
Finally, this work also demonstrates that information theory
can be effectively applied to find informative sequences, to predict
the lifestyle restrictions of an organism, and to analyze the
deviation of the amino acid utilization profile in different metabolic
processes in different organisms.
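As a rough illustration of how information theory can score a
sequence, the sketch below computes the Shannon entropy of a
sequence's k-mer distribution. The function name and the choice of
k-mers as the unit of analysis are assumptions made for this example;
the abstract does not specify which measures the speaker uses.

```python
import math
from collections import Counter


def kmer_entropy(sequence, k=1):
    """Shannon entropy (in bits) of the k-mer distribution of a sequence.

    Hypothetical helper for illustration; not taken from the talk.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0.0  # sequence shorter than k: no k-mers to count
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mer frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A homopolymer such as "AAAA" has zero entropy, while a sequence that
uses all four nucleotides equally often reaches the 2-bit maximum for
single bases.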
Together, these tools will enable the next generation of sequence
analyses using next-generation sequencing data.
HOST:
Dr. Jose Castillo.