Biostatistics Seminar - "Cluster analysis of genetic sequence data via the Gap Procedure"
Irene Vrbik, PhD
Postdoctoral Fellow, Department of Mathematics and Statistics
ALL ARE WELCOME
Abstract:
Phylogenetic clustering typically involves estimating a phylogenetic tree and identifying groups of sequences that have small pairwise genetic distances and sufficiently high clade support (either bootstrap values or posterior probabilities). In this talk, we explore a simple distance-based clustering algorithm, called the Gap Procedure, which uses gaps in sorted pairwise distances to suggest a natural divide between group members and non-members. We show that the clusters found using the Gap Procedure agree closely with those from computationally expensive gold-standard techniques on well-separated groups of HIV DNA sequence data. Simulation studies are also presented to illustrate the scenarios in which this fast and easy-to-implement algorithm may be employed and, more importantly, when more sophisticated methods are required.
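The idea of cutting at a gap in sorted pairwise distances can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name gap_clusters, the rule of cutting at the single largest gap, the merging of neighbours into connected groups, and the toy data are all assumptions made for illustration.

```python
def gap_clusters(dist):
    """Label points by cutting each row of sorted distances at its largest gap.

    `dist` is a symmetric square matrix (list of lists) of pairwise distances.
    Assumption: a point's neighbours are the points that fall before the
    largest gap in its sorted distances, and clusters are the connected
    groups formed by these neighbour links.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Distances from point i to every other point, smallest first.
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        if len(others) < 2:
            continue  # no gap can be computed from fewer than two distances
        gaps = [others[k + 1][0] - others[k][0] for k in range(len(others) - 1)]
        cut = gaps.index(max(gaps))  # position of the largest gap
        for _, j in others[:cut + 1]:
            parent[find(i)] = find(j)  # link i with each neighbour

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy example (made-up data): two well-separated groups on a line.
positions = [0, 1, 2, 50, 51, 52]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
print(gap_clusters(toy_dist))
```

In this sketch every point with at least two distances is given at least one neighbour, so it cannot report isolated sequences, and it says nothing about how the procedure behaves when groups overlap.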
Bio: