Significant Pattern Mining

We developed a novel method in the relatively new field of significant pattern mining. The method detects statistically significant patterns in high-dimensional datasets while remaining runtime efficient and statistically sound. The algorithm can be applied to collections of sequences, allows one to account for dependencies between objects, and controls the family-wise error rate.

Genome-wide detection of intervals of genetic heterogeneity associated with complex traits

Motivation: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals.
Results: Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping.
Conclusions: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes.
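
To make the problem setting concrete, the following is a minimal brute-force sketch of interval testing, not the algorithm from the paper. It rests on several assumptions that the text above does not state: SNPs are encoded as binary values, an interval is summarised by a logical OR over its SNPs (a sample counts as a carrier if it has a variant anywhere in the interval), association is measured with Fisher's exact test, and the family-wise error rate is controlled with a plain Bonferroni correction over all candidate intervals. The function names and the toy data are hypothetical. This naive enumeration has exactly the computational and statistical-power limitations described in the Motivation, which the published method is designed to overcome.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def scan_intervals(genotypes, labels, max_len, alpha=0.05):
    """Test every contiguous SNP interval of up to max_len SNPs (brute force).

    genotypes: list of per-sample lists of 0/1 SNP values (assumed encoding).
    labels:    list of 0/1 phenotype values (1 = case, 0 = control).
    Returns (start, end, p_value) for half-open intervals [start, end) whose
    p-value passes the Bonferroni-corrected threshold.
    """
    n_snps = len(genotypes[0])
    candidates = []
    for start in range(n_snps):
        for end in range(start + 1, min(start + max_len, n_snps) + 1):
            # Assumed pooling rule: carrier if any SNP in the interval is 1.
            meta = [int(any(row[start:end])) for row in genotypes]
            a = sum(1 for m, y in zip(meta, labels) if m and y)
            b = sum(1 for m, y in zip(meta, labels) if m and not y)
            c = sum(1 for m, y in zip(meta, labels) if not m and y)
            d = sum(1 for m, y in zip(meta, labels) if not m and not y)
            candidates.append((start, end, fisher_exact_two_sided(a, b, c, d)))
    # Bonferroni: divide alpha by the number of intervals tested.
    threshold = alpha / len(candidates)
    return [hit for hit in candidates if hit[2] <= threshold]


# Toy data (hypothetical): half of the cases carry a variant at SNP 1,
# the other half at SNP 2; controls carry neither.
cases = [[int(i == 0), int(i < 5), int(i >= 5), 0] for i in range(10)]
controls = [[int(i == 0), 0, 0, 0] for i in range(10)]
genotypes = cases + controls
labels = [1] * 10 + [0] * 10

hits = scan_intervals(genotypes, labels, max_len=2)
print(hits)
```

On this toy data, only the half-open interval (1, 3), which covers SNPs 1 and 2, passes the corrected threshold. Neither SNP is significant on its own, which illustrates how an interval exhibiting genetic heterogeneity can be missed by single-locus testing. The number of candidate intervals grows quickly with the number of SNPs and the maximum interval length, so the Bonferroni threshold becomes very strict on genome-wide data.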

Efficient C/C++ code can be downloaded here: C/C++ Code

Publication
  • Genome-wide detection of intervals of genetic heterogeneity associated with complex traits.
    F Llinares-López, DG Grimm, DA Bodenham, U Gieraths, M Sugiyama, B Rowan, KM Borgwardt 
    Bioinformatics (2015) 31(12):i303-i310