|
ABSTRACT:
|
The purpose of our study was to develop a phylogeny-based (tree-parsing)
algorithm for automated gene function annotation. Computer-based functional
annotations of molecular sequences (DNA and proteins) based on sequence
similarity matching and pattern recognition have proven to be powerful tools
in molecular biology. The BLAST algorithm, in particular, has been used to
annotate millions of genes and has been highly successful at identifying the
biological function of numerous sequences. However, similarity searching
algorithms, by themselves, cannot
distinguish orthologous sequences (homologs that diverged through
speciation) from paralogous sequences (homologs that arose through an
ancestral duplication event). Using an example with bacterial porin genes, we
show how reliance on automated BLAST searches can lead to confusion and
extensive mis-annotation of bacterial sequences. Our phylogenetic analysis of
class 1 porin genes found numerous instances of incorrectly annotated
sequences.
Interestingly, this problem was not always solved through comparative
analysis of gene position in related bacterial genomes, and we found strong
evidence of allele swapping. Not only did our phylogenetic analysis greatly
improve the quality of functional annotations, but we also uncovered a
potential new functional class of porin proteins. Based on this phylogenetic
analysis of porin sequences, we have developed an algorithm for automated
phylogenetic functional annotation of sequences. We show that this algorithm
works well for porin annotation and demonstrate how it might be applied to
functional annotation of other genes.
|
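The abstract does not spell out the tree-parsing procedure, so the following is only a minimal sketch of one way such an annotation step could work, not the authors' algorithm. It assumes the tree is given as nested tuples with leaf names as strings, and that a query sequence inherits a label only when every annotated sequence in its smallest annotated enclosing clade agrees. The function names, the sequence names, and the labels `class_A` and `class_B` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

# A tree is either a leaf name (str) or a tuple of subtrees (assumed encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf names under a clade, left to right."""
    if isinstance(tree, str):
        return [tree]
    names: List[str] = []
    for child in tree:
        names.extend(leaves(child))
    return names


def path_to_leaf(tree: Tree, name: str) -> Optional[List[Tree]]:
    """Return the clades from the root down to the named leaf, or None."""
    if isinstance(tree, str):
        return [tree] if tree == name else None
    for child in tree:
        sub = path_to_leaf(child, name)
        if sub is not None:
            return [tree] + sub
    return None


def annotate(tree: Tree, known: Dict[str, str], query: str) -> Optional[str]:
    """Label `query` from the smallest enclosing clade that holds annotated leaves.

    Returns the label if all annotated leaves in that clade agree, and None
    if they conflict or if no annotated leaf exists anywhere in the tree.
    """
    path = path_to_leaf(tree, query)
    if path is None:
        raise ValueError(f"{query!r} is not a leaf of the tree")
    # Walk upward from the query's parent clade toward the root.
    for clade in reversed(path[:-1]):
        labels = {known[leaf] for leaf in leaves(clade)
                  if leaf != query and leaf in known}
        if len(labels) == 1:
            return labels.pop()
        if len(labels) > 1:
            return None  # conflicting neighbours: leave unannotated
    return None


# Hypothetical example: two clades, each with reference sequences and a query.
example_tree: Tree = (
    (("ref_a1", "query_1"), "ref_a2"),
    (("ref_b1", "query_2"), "ref_b2"),
)
example_known = {
    "ref_a1": "class_A", "ref_a2": "class_A",
    "ref_b1": "class_B", "ref_b2": "class_B",
}

if __name__ == "__main__":
    for q in ("query_1", "query_2"):
        print(q, annotate(example_tree, example_known, q))
```

Returning `None` on conflict is a deliberately conservative choice: a sequence whose nearest annotated relatives disagree is left for manual review rather than assigned the label of its best similarity hit.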