Network-Guided Feature Selection

This page presents two novel algorithms that we have developed to perform network-guided feature selection using graph cuts. The first is a single-task feature selection method; the second can additionally exploit multiple correlated tasks to improve performance when the target variables are correlated.

Method 1: SConES

SConES, short for "Selecting Connected Explanatory SNPs", is a network-guided multi-locus association mapping method. It discovers sets of genetic loci that are maximally associated with a phenotype while tending to be connected on an underlying network. This network can be constructed from a gene-gene interaction network (based on proximity), or in any other way that encodes the belief that neighboring SNPs should tend to be selected together.

The method can also be applied to other feature selection tasks that are based on an underlying network, such as social network analysis.
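The selection problem described above can be sketched as a minimum cut on a small graph. The sketch below is only an illustration under stated assumptions, not the code of any implementation listed on this page. It assumes the objective rewards each selected feature with its association score, charges a sparsity penalty eta per selected feature, and charges lam times the edge weight for every network edge that joins a selected and an unselected feature. The names scones_select and min_cut_source_side, the toy scores and the toy chain network are all hypothetical.

```python
from collections import defaultdict, deque


def min_cut_source_side(capacity, source, sink):
    """Run Edmonds-Karp max-flow and return the nodes on the source side of a minimum cut."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual[u][v] += cap
            residual[v][u] += 0.0  # make sure the reverse arc exists

    def reachable_from_source():
        # Breadth-first search along arcs that still have residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path is left: the reachable set is the source side of the cut.
            return set(parent)
        # Find the bottleneck capacity on the path found by the search.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push that much flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def scones_select(scores, edges, eta, lam):
    """Select features under the assumed objective (hypothetical helper).

    scores: dict feature -> association score
    edges:  dict (i, j) -> weight of the undirected network edge
    Feature identifiers must differ from the reserved names "s" and "t".
    """
    cap = defaultdict(dict)
    for i, c in scores.items():
        if c > eta:
            cap["s"][i] = c - eta   # gain lost if feature i is left out
        elif c < eta:
            cap[i]["t"] = eta - c   # cost paid if feature i is selected
    for (i, j), w in edges.items():
        # Separating two neighbouring features costs lam * w.
        cap[i][j] = cap[i].get(j, 0.0) + lam * w
        cap[j][i] = cap[j].get(i, 0.0) + lam * w
    source_side = min_cut_source_side(cap, "s", "t")
    return sorted(n for n in source_side if n != "s")


# Toy data (made up): six features on a chain 0-1-2-3-4-5.
scores = {0: 5.0, 1: 1.0, 2: 5.0, 3: 0.0, 4: 0.0, 5: 3.0}
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (4, 5): 1.0}

print(scones_select(scores, edges, eta=2.0, lam=1.5))  # [0, 1, 2]
print(scones_select(scores, edges, eta=2.0, lam=0.0))  # [0, 2, 5]
```

With lam set to 0 the network is ignored and the selection reduces to thresholding the scores at eta. With lam = 1.5, feature 1 is selected despite its low score because it connects features 0 and 2, while feature 5 is dropped because its only neighbour is unselected.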

We provide several versions of this method in different languages:

  • Matlab implementation, which contains the optimisation function for solving the SConES optimisation problem: Matlab Code
  • C/C++ implementation with Python interfaces tailored for genetics data in the easyGWASCore framework: easyGWASCore
  • In the sfan (selecting features as (network) nodes) library: sfan

Publication
  • Efficient network-guided multi-locus association mapping with graph cuts. 
    CA Azencott, D Grimm, M Sugiyama, Y Kawahara and K Borgwardt
    Bioinformatics 2013, 29(13):i171-i179

Method 2: Multi-SConES

Multi-SConES is a multi-task version of SConES. It performs multi-task feature selection coupled with multiple network regularizers, and is solved with a maximum-flow algorithm.

Again, we provide several versions of this method in different languages:

  • R implementation for solving the Multi-SConES optimisation problem: R Code
  • C/C++ implementation with Python interfaces tailored for genetics data in the easyGWASCore framework: easyGWASCore

Publication
  • Multi-task feature selection with multiple networks via maximum flows. 
    M Sugiyama, CA Azencott, D Grimm, Y Kawahara and K Borgwardt
    SIAM International Conference on Data Mining (SDM 2014)