Perelman School of Medicine at the University of Pennsylvania

Shen Lab


The central theme of the lab is developing computational and informatics methods for integrative analysis of multimodal imaging data, high-throughput "omics" data, cognitive and other biomarker data, electronic health record data, and rich biological knowledge such as pathways and networks, with applications to various complex disorders. Our research interests include medical image computing, bioinformatics, machine learning, network science, visual analytics, and big data science in biomedicine. The following are some of our research activities.

Integrative Bioinformatics Approaches to Human Brain Genomics and Connectomics

This project is supported by an NIBIB R01 award (R01 EB022574). We are working on three proposed aims: (1) develop a novel computational pipeline for systematic characterization of the structural connectome, optimized for imaging genomics; (2) develop novel bioinformatics strategies for determining the genetic basis of the structural connectome; and (3) develop a visual analytic software system for interactive visual exploration and mining of fiber tracts and brain networks with their genetic determinants and functional outcomes. We are using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Human Connectome Project (HCP) cohorts as test beds to develop methods and tools with potential for a better understanding of the interplay among genes, brain connectivity, and function.

Mining Drug Interaction Induced Adverse Effects (ADEs) from Health Record Databases

This project is supported by NSF awards (IIS 1827472 and IIS 1622526). Recent advances in large-scale electronic health record database techniques provide exciting new opportunities for the study of drug safety. Drug-drug interactions (DDIs), a major cause of adverse drug events (ADEs), are a serious global health concern and a severe detriment to public health. The scale of DDIs involving three or more drugs (also called high-order DDIs) poses a prohibitive challenge for molecular pharmacology and clinical research, which motivates alternative strategies such as mining health record data. This project aims to develop large-scale computational strategies and effective software tools for mining high-order DDI effects from health record databases, in order to yield novel discoveries in drug safety and ultimately to benefit national health and well-being.

Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining

This project is supported by an NSF award (IIS 1837964). Recent advances in multimodal brain imaging and high-throughput genotyping and sequencing techniques provide exciting new opportunities to improve our understanding of brain structure and neural dynamics, their genetic architecture, and their influences on cognition and behavior. However, data privacy and security issues have inhibited data sharing across institutions. Emerging multi-site collaborative data analysis can address these issues and facilitate the sharing of data and computing resources. This project seeks to design new, efficient asynchronous distributed machine learning algorithms with rigorous theoretical foundations for multi-site collaborative brain big data mining, and to create large-scale computational strategies and effective software tools that reveal sophisticated relationships among heterogeneous brain data.

Bioinformatics Strategies for Multidimensional Brain Imaging Genetics

This project is supported by an NLM R01 award (R01 LM011360). We have been working on producing novel bioinformatics algorithms and tools for comprehensive joint analysis of large-scale heterogeneous imaging genomics data, using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database as a test bed. We have published a variety of novel machine learning models for effective mining of complex imaging genomic associations, including structured sparse regression models, structured sparse canonical correlation analysis (SCCA) models, and gene-gene and gene-environment interaction models. We have also developed a novel imaging genetic enrichment analysis (IGEA) framework for identifying high-level associations between gene sets and brain circuitries, and a novel network-based machine learning framework for identifying phenotype-relevant functional modules from tissue-specific biological networks. We are working on developing novel machine learning and bioinformatics strategies for integrating brain genomics, transcriptomics and anatomics.

Genetic and Multi-Omic Analysis of Quantitative Phenotypes in AD

This focus aims to investigate the role of genetic variation in disordered brain function using neuroimaging and biomarkers as phenotypes. Besides the method development work described above, we also employ state-of-the-art methods to perform genetic analysis of quantitative phenotypes in AD. ADNI (U01 AG024904) is a landmark study in AD, and Dr. Shen served as a Co-Leader of its Genetics Core between 2009 and 2017. Using data from ADNI and local cohorts, we have completed a series of candidate gene and genome-wide association studies (GWAS) of structural and molecular neuroimaging data and other biomarker data (e.g., cerebrospinal fluid, plasma proteomics, cognition) in mild cognitive impairment (MCI) and AD. These studies yielded many interesting genetic findings in relation to quantitative phenotypes. Given the broadened landscape of the ADNI multi-omic domain (e.g., including data from the genome, epigenome, transcriptome, proteome, and metabolome), we are interested in expanding the scope of our imaging omics studies from the genomic domain to the multi-omic domain.

Multidimensional Data Mining and Biomarker Discovery

This topic aims to identify biomarkers from multidimensional data sets, including multimodal imaging data, high-throughput omics data, and fluid biomarker data, for predicting cognitive and diagnostic outcomes. This work was partially supported by a completed NSF project (IIS-1117335), where we proposed and applied a series of sparse machine learning methods to the ADNI cohort for mining multidimensional imaging, omics and fluid biomarker data and discovering disease-sensitive and/or cognition-relevant biomarkers. These approaches include machine learning models for sparse Bayesian classification, structured sparse multi-task regression, sparse learning for joint classification and regression, multi-modal multi-task learning, and multi-task longitudinal learning. Given the scale and complexity of the multidimensional imaging, omics and biomarker data, we are interested in refining our models for multidimensional data integration and longitudinal learning, as well as in addressing big data analytics challenges.

Biomedical Image Computing

This focus aims to develop and apply image and shape computing methods for analyzing MRI, PET, CT and other 3D imaging data. We have made a variety of contributions to the enhancement of the spherical harmonic (SPHARM) shape modeling technique by addressing its fundamental challenges, including generalization, scalability, and flexibility. Supported by an NIBIB project (R03 EB008674), we developed and released SPHARM-MAT, a SPHARM-based software toolkit for brain imaging. We have applied SPHARM to various biomedical applications, including hippocampal atrophy in brain disorders, cortical analysis in autism, thalamic atrophy in multiple sclerosis, cardiac motion analysis, and evolutionary biology. Besides SPHARM, we have also developed image processing methods for studying craniofacial dysmorphology in fetal alcohol spectrum disorder and spatiotemporal modeling of lung nodules. We are interested in developing novel methods for morphometric analysis of hippocampal subfields as well as image processing and machine learning methods for diagnosing dental hard-tissue conditions.