Improved probabilistic models of insertion/deletion for phylogenetic inference (Visiting Scholar)
This resource has been withdrawn from the Duke Digital Repository.
NESCent materials were deaccessioned from the Duke Digital Repository in March 2026. Contact Hilmar Lapp (hilmar.lapp@duke.edu) with questions about NESCent.
- Title:
- Improved probabilistic models of insertion/deletion for phylogenetic inference (Visiting Scholar)
- Permalink:
- https://idn.duke.edu/ark:/87924/r4ww6c
- Temporal:
- 2009-2012
- Creator:
- Redelings, Benjamin
- Type:
- Collection
- Description:
- Project
- Abstract:
- NESCent Project: Recent advances in statistical methodology allow phylogeny inference to make use of information in insertions and deletions, and to average over uncertainty in multiple sequence alignments. However, the accuracy of these methods could be improved by including some key features of the biological process that generates insertion and deletion mutations (indels). Two of these features are (I) spatial variation in the rate of insertion and deletion, and (II) higher rates for variation in the number of tandem repeats (VNTR). Ignoring spatial variation in insertion/deletion rates can decrease phylogenetic accuracy because the evidential weight of a shared indel is determined by the local indel rate. Proteins have higher indel rates in regions that are exposed to solvent, and so indels in those regions should be down-weighted relative to indels that occur in the hydrophobic core. Additionally, when handling nearly neutral sequences such as intergenic spacers, ignoring VNTR mutations can undermine phylogeny inference by giving shared changes of these types too much weight. I propose to extend the software BAli-Phy, which jointly estimates alignments and phylogenies, to handle indel hotspots. I have developed a simple transducer-based model for multiple alignments that allows each column to fall into a fast or slow rate category and clusters fast columns together. I am developing MCMC transition kernels to Gibbs-sample alignments and column labels simultaneously. Additionally, I plan to use importance sampling on posterior samples from BAli-Phy to correctly weight VNTR mutations. I will then estimate indel rate heterogeneity and the VNTR rate increase in several data sets.
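The abstract's idea that alignment columns fall into a fast or slow rate category, with fast columns clustered together, can be illustrated with a two-state Markov chain over column labels. The sketch below is not the project's transducer model and does not use BAli-Phy; the function name `label_log_prior`, the labels `"F"` and `"S"`, and the probabilities `p_stay_fast` and `p_stay_slow` are assumptions chosen only for illustration. It shows that when both states are "sticky", a labeling with contiguous fast columns receives a higher prior probability than one with the same number of fast columns scattered along the alignment.

```python
import math


def label_log_prior(labels, p_stay_fast=0.9, p_stay_slow=0.95):
    """Log prior of a string of column labels under a two-state Markov chain.

    labels: string of 'F' (fast indel-rate column) and 'S' (slow column).
    p_stay_fast, p_stay_slow: hypothetical self-transition probabilities;
    values near 1 make runs of identical labels more likely (clustering).
    """
    if not labels:
        return 0.0

    p_fast_to_slow = 1.0 - p_stay_fast
    p_slow_to_fast = 1.0 - p_stay_slow

    # Stationary distribution of the chain, used for the first column.
    pi_fast = p_slow_to_fast / (p_slow_to_fast + p_fast_to_slow)
    start = {"F": pi_fast, "S": 1.0 - pi_fast}

    trans = {
        ("F", "F"): p_stay_fast,
        ("F", "S"): p_fast_to_slow,
        ("S", "S"): p_stay_slow,
        ("S", "F"): p_slow_to_fast,
    }

    logp = math.log(start[labels[0]])
    for prev, cur in zip(labels, labels[1:]):
        logp += math.log(trans[(prev, cur)])
    return logp


# Two labelings with the same number of fast columns (four each).
clustered = "SSSSFFFFSSSS"
scattered = "SFSSFSSFSSFS"

print(label_log_prior(clustered) > label_log_prior(scattered))
```

In a full model, a prior of this kind would be combined with an alignment likelihood in which the indel rate of each column depends on its label; that part is omitted here.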