We have long expected that complete genomic information will provide us with the tools to determine relationships among organisms, from strains of pathogens to the whole tree of life. However, while whole-genome sequencing is now possible, extracting information from these data remains challenging. I have developed software (SISRS) to easily identify phylogenetically informative data from raw next-generation sequencing data. We have used SISRS data to show that non-coding loci provide more overall signal and a higher proportion of phylogenetic signal than coding loci, and that different types of loci (e.g., coding sequences vs. introns) have surprisingly consistent levels of information across time scales.
This research is funded by NSF grant DEB-2100217 to R. Schwartz.