Several well-documented evolutionary processes are known to cause conflict between species-level and gene-level phylogenies. Three of the most challenging processes for species tree inference are incomplete lineage sorting, hybridisation and gene duplication, the last of which may result in unwarranted comparisons of paralogous genes. Several existing methods have dealt with these processes, but none has yet been able to untangle all three at once. Here, we propose a step-wise method by which these processes can be discerned using information on genomic location coupled with coalescent simulations. In the first step, highly discordant genes within genomic blocks (putative paralogues) are identified and excluded from the dataset and, in the second step, blocks of linked genes are grouped according to their hybrid history. Existing multispecies coalescent software can then be applied to recover the principal tree(s) that make up the species tree/network without violating the underlying model. The potential of the approach is evaluated on simulated data derived from a species network composed of nine species, of which one is of hybrid origin, and displaying a single gene duplication that leads to paralogous comparisons. We apply our method to an empirical set of 12 genes from seven species sampled in the plant genus Medicago that display phylogenetic discordance. We identify the causes of the discordance and demonstrate that the Medicago orbicularis lineage experienced an episode of ancient hybridisation. Our results suggest that the approach holds promise as a new way to explore phylogenetic sequence data that can significantly improve species tree inference in the presence of hybridisation and undetected paralogy or other causes leading to extremely discordant gene trees.
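The first step described above (flagging highly discordant genes within a genomic block) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each gene tree is summarised as a set of rooted clades, it uses a simple symmetric-difference distance as the discordance measure, and it takes a fixed `threshold` as input, whereas the method derives its expectations from coalescent simulations. The names `clade_distance`, `flag_discordant_genes` and `clades`, as well as the toy data, are hypothetical.

```python
from statistics import mean


def clades(*groups):
    """Build a clade set from strings of single-letter taxon names (toy helper)."""
    return frozenset(frozenset(g) for g in groups)


def clade_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees (symmetric difference)."""
    return len(tree_a ^ tree_b)


def flag_discordant_genes(block, threshold):
    """Return genes whose mean distance to the other genes in the block exceeds threshold.

    `threshold` is an assumed input here; in the method described above it
    would be informed by coalescent simulations rather than chosen by hand.
    """
    flagged = []
    for name, tree in block.items():
        others = [t for n, t in block.items() if n != name]
        if not others:
            continue  # a single-gene block gives nothing to compare against
        if mean(clade_distance(tree, t) for t in others) > threshold:
            flagged.append(name)
    return flagged


# Toy block of four linked genes on taxa A-D; gene4 has a conflicting topology.
block = {
    "gene1": clades("AB", "ABC"),
    "gene2": clades("AB", "ABC"),
    "gene3": clades("AB", "ABC"),
    "gene4": clades("CD", "BCD"),
}

print(flag_discordant_genes(block, threshold=2.0))
```

In this toy block only `gene4` is flagged as a putative paralogue. A real analysis would replace the hand-picked threshold with a distribution of distances simulated under the multispecies coalescent.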