Protist Genomics Project

In this genomics project we will sequence about 30,000 randomly selected cDNA or genomic clones from three diverse and likely early-diverging protists: Spironucleus barkhanus (Atlantic salmon parasite), Trichomonas vaginalis (human parasite) and Naegleria gruberi (soil amoeboflagellate). From each organism we will sequence 5000 Expressed Sequence Tags (ESTs) and 5000 Genome Survey Sequences (GSSs). This DNA sequence survey will provide data on their expression patterns and genomic organization. All of the randomly selected, partial sequences will be compared against sequence databases to identify coding regions of evolutionary, biochemical and/or therapeutic interest. In addition to the immediate utility of identifying individual genes for further study, these data will provide information on the feasibility of more complete genome studies in these organisms. Finally, more extensive comparative sequence analyses will be carried out to: a) determine a robust phylogenetic position for these protists in the tree of eukaryotes, b) determine whether lateral gene transfer is an important force in protistan genome evolution, and c) determine the presence (or apparent absence) of introns in genes of these species and, if present, their genomic prevalence and location.

Deep eukaryotic phylogeny

The development of technology for sequencing genes has revolutionized evolutionary biology. By comparing the gene sequences of distantly related organisms, we are now able to reconstruct their genealogy (phylogeny). A "tree of life" derived from comparisons of the small subunit ribosomal RNA (SSU rRNA) genes indicates that the living world can be divided into three major groups: the Eukaryota (with nucleated cells), the Archaebacteria and the Eubacteria (both with non-nucleated cells).

Eukaryotes include the multicellular kingdoms of animals, plants, fungi and a myriad of single-celled organisms, the protists. The SSU rRNA gene tree of eukaryotes indicates that several protist groups with simple cell structures diverged first, followed by a series of branches leading to a vast radiation of multicellular and protistan organisms. However, this picture of early evolution is increasingly challenged by new data. Trees of genes encoding proteins conflict with the rRNA-based phylogeny in reconstructing the deepest relationships amongst eukaryotes. Moreover, recent studies indicate that the deep structure of the SSU rRNA and protein trees of eukaryotes could be dominated by artifacts arising from inadequacies in our tree-building methods. These problems hint that single-gene trees of eukaryotes may be inadequate. In this proposal we will address these problems in two ways:

First, we will obtain the sequences of three additional protein genes from a variety of well-described and newly-discovered protists that are potentially deeply-branching. By combining these sequences with several other gene sequences from these organisms, we can assemble a large multi-gene dataset containing much more information than previously available. Sophisticated analyses of this information-rich dataset will then be performed to derive a more robust tree of eukaryotes.
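As a rough illustration of what combining several gene sequences into a multi-gene dataset can involve, the sketch below concatenates per-gene alignments into a single "supermatrix", padding with gap characters wherever a gene is missing for an organism. This is only a minimal sketch under stated assumptions, not the analysis pipeline of this project: the function name concatenate_alignments, the shortened taxon labels and the toy sequences are all hypothetical placeholders.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Concatenate per-gene alignments into one combined alignment.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    A taxon absent from a gene is padded with `missing` characters so that
    every row of the combined alignment has the same length.
    """
    # Union of all taxa seen in any gene, in a stable order.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        (length,) = lengths
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy, made-up alignments (hypothetical placeholder data, not real sequences).
gene_a = {"Spironucleus": "MKV", "Trichomonas": "MRV", "Naegleria": "MKI"}
gene_b = {"Spironucleus": "GAT-", "Naegleria": "GATT"}  # Trichomonas missing

combined = concatenate_alignments([gene_a, gene_b])
for taxon, seq in combined.items():
    print(taxon, seq)
```

Padding missing genes with gaps keeps every organism in the combined dataset even when not all genes have been sequenced for it. Whether to do that or to drop incompletely sampled taxa is a design choice that depends on the analysis.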

Second, we will systematically investigate to what degree the deep structure of molecular trees of eukaryotes (and conflict between these trees) is due to tree-building artifacts. To accomplish this, we will develop and apply new computational methods to identify, and correct for, known sources of error in individual and combined gene phylogenies of eukaryotes.

These studies will lead to a better understanding of the early history of life on Earth, the causes of conflicts between deep molecular phylogenies, and the forces that influence molecular evolution at the deepest level.

Modeling Protein & Genome Evolution