Gene evolution in genomic context: Integrating genomic location into gene evolution models (Visiting Scholar)
One of the surprising discoveries from whole-genome sequencing of multiple species was the extent of structural rearrangement between genomes. Different evolutionary forces can shape the non-random organization of genes within a genome, and a gene's physical location can in turn affect its fate. Yet even with several closely related genomes completed, the statistical modeling of genome rearrangements remains a formidable task. I propose that a feasible approach is to start from the genes and study how homologous genes are created, clustered, and dispersed in the genome. By learning how each gene family is distributed in the genome, we can learn how whole genomes took shape. I plan to model gene duplication, loss, and transposition using the birth-death-migration (BDM) process developed in demography. Integrating the BDM process into existing models of gene family evolution will allow us to infer sequence substitutions, duplications and losses, and transpositions of genes simultaneously. The integrated model can infer correct reconciliations in cases where the sequences and gene tree alone are not informative, and it will also improve the identification of gene transpositions by using the genetic distance between genes. More importantly, genome-wide identification of gene transpositions will offer insight into the non-random organization of genes within a genome.
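To make the birth-death-migration idea concrete, the following is a minimal simulation sketch, not the proposed model itself. It assumes a single gene family whose copies sit at a small number of discrete locations (for example, chromosomes), constant per-copy rates for duplication (birth), loss (death), and transposition (migration), duplicates that stay at the parent's location, and a transposition target chosen uniformly among the other locations. The function name `simulate_bdm` and all parameters are hypothetical, chosen only for illustration.

```python
import random


def simulate_bdm(counts, birth, death, migration, t_max, seed=0):
    """Gillespie-style simulation of a linear birth-death-migration process.

    counts    -- initial number of gene copies at each location (assumed discrete)
    birth     -- per-copy duplication rate (hypothetical parameter)
    death     -- per-copy loss rate (hypothetical parameter)
    migration -- per-copy transposition rate (hypothetical parameter)
    t_max     -- total time to simulate
    Returns the list of copy counts per location at time t_max.
    """
    rng = random.Random(seed)
    counts = list(counts)
    per_copy_rate = birth + death + migration
    t = 0.0
    while True:
        n = sum(counts)
        if n == 0 or per_copy_rate == 0:
            break  # family extinct, or no events can occur
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(n * per_copy_rate)
        if t > t_max:
            break
        # Pick the location of the affected copy, weighted by copy number.
        src = rng.choices(range(len(counts)), weights=counts)[0]
        u = rng.random() * per_copy_rate
        if u < birth:
            counts[src] += 1  # duplication: new copy stays at the same location
        elif u < birth + death:
            counts[src] -= 1  # loss
        else:
            # Transposition: move one copy to a different location (uniform choice).
            others = [i for i in range(len(counts)) if i != src]
            if others:
                dst = rng.choice(others)
                counts[src] -= 1
                counts[dst] += 1
    return counts


if __name__ == "__main__":
    # One family starting as a single copy on the first of three locations.
    print(simulate_bdm([1, 0, 0], birth=1.0, death=0.5, migration=0.2, t_max=3.0))
```

Repeating such runs with different seeds gives a rough picture of how tandem clusters and dispersed copies could arise under the stated assumptions. An inference framework would need the corresponding likelihood rather than forward simulation.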