Detecting Risk Gene and Pathogenic Brain Region in EMCI Using a Novel GERF Algorithm Based on Brain Imaging and Genetic Data

Abstract

Fusion analysis of disease-related multi-modal data is becoming increasingly important for illuminating the pathogenesis of complex brain diseases. However, because multi-modal datasets are small in sample size and high in dimension, current machine learning methods do not yet select fusion features with high accuracy and reliability. In this paper, we propose a genetic-evolutionary random forest (GERF) algorithm to discover the risk genes and disease-related brain regions of early mild cognitive impairment (EMCI) from genetic data and resting-state functional magnetic resonance imaging (rs-fMRI) data. A classical correlation analysis method is used to explore the association between brain regions and genes and to construct fusion features. The genetic-evolutionary idea is introduced to enhance classification performance and to extract the optimal features effectively. The proposed GERF algorithm is evaluated on the public Alzheimer's Disease Neuroimaging Initiative (ADNI) database, and the results show that it achieves satisfactory classification accuracy in small-sample learning. Moreover, we compare the GERF algorithm with other methods to demonstrate its superiority. Furthermore, we propose an overall framework for detecting pathogenic factors, which can be applied accurately and efficiently to the multi-modal data analysis of EMCI and can be extended to other diseases. This work provides novel insight for the early diagnosis and clinicopathologic analysis of EMCI, which can help clinicians control further deterioration of the disease and supports the accurate application of transcranial magnetic stimulation.