Perelman School of Medicine at the University of Pennsylvania

Long Research Group

Software

Multiple Imputation for High-dimensional Incomplete Data

MIHD: R package for multiple imputation for high-dimensional incomplete data.

References:

  • Y. Zhao and Q. Long, “Multiple imputation in the presence of high-dimensional data,” Statistical Methods in Medical Research, p. 962280213511027, 2013.
  • Y. Deng, C. Chang, M. S. Ido, and Q. Long, “Multiple imputation for general missing data patterns in the presence of high-dimensional data,” Scientific Reports, vol. 6, art. no. 21689, 2016.

Package:

  • MIHD for Windows OS
  • MIHD for Mac OS

Manual: MIHD


Bootstrap Imputation with Variable Selection

BISS: R package for implementing bootstrap imputation with variable selection.

Reference: Q. Long and B. A. Johnson, “Variable selection in the presence of missing data: resampling and imputation,” Biostatistics, vol. 16, iss. 3, pp. 596-610, 2015.

Package: BISSpkg 1.0

Manual: BISS


Knowledge-guided Sparse PCA

fgsPCA: Matlab code to perform structured sparse PCA.

Reference: Z. Li, S. Safo, and Q. Long, "Incorporating Biological Information in Sparse Principal Component Analysis with Application to Genomic Data", BMC Bioinformatics 18.1 (2017): 332.

Matlab Code: fgsPCA


Scalable Bayesian Variable Selection for Structured High-dimensional Data

EMSHS: R code implementing an EM algorithm for a Bayesian shrinkage approach that incorporates structural information.

Reference: Chang, C., Kundu, S., & Long, Q. (2018). Scalable Bayesian variable selection for structured high‐dimensional data. Biometrics. (https://onlinelibrary.wiley.com/doi/full/10.1111/biom.12882)

Package: EMSHS R package on CRAN


Sparse Linear Discriminant Analysis in Structured Covariates Space

sSLDA: Matlab code to perform structured sparse LDA.

Reference: Safo, S.E., and Long, Q. (2016) Sparse linear discriminant analysis in structured covariates space. Statistical Analysis and Data Mining: The ASA Data Science Journal, 12(2), pp.56-69.

Matlab Code: sSLDA


Structured Sparse CCA

sSCCA: Matlab code to perform structured sparse CCA.

Reference: S. Safo, S. Li and Q. Long, "Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information", Biometrics 74.1 (2018): 300-312.

Matlab Code: sSCCA_v2


Penalized Co-Inertia Analysis

pCIA: R package for implementing penalized co-inertia analysis for two datasets.

Reference: E. Min, S. Safo, and Q. Long, “Penalized Co-Inertia Analysis with Applications to –Omics Data”, Bioinformatics, 2019, 35(6):1018-25.

Package: pCIA_0.9

Manual: pCIA


Distributed Learning from Multiple EHR Databases

Distributed Learning Predictor: Python library for learning from multiple databases and building predictive models based on distributed noise contrastive estimation (Distributed NCE).

Reference: Li, Z., Roberts, K.E., Jiang, X., and Long, Q. Distributed Learning from Multiple EHR Databases: Contextual Embedding Models for Medical Events. Journal of Biomedical Informatics, 2019, 92, p.103138. 

GitHub: https://github.com/ziyili20/DistributedLearningPredictor


Sparse Multiple Co-Inertia Analysis

pmCIA: R package for performing sparse multiple co-inertia analysis on multiple datasets.

Reference: Min, E.J. and Long, Q., 2020. Sparse multiple co-Inertia analysis with application to integrative analysis of multi-Omics data. BMC Bioinformatics, 21, pp.1-12.

Package: pmCIA_0.9


Graph-guided Bayesian SVM

Graph-guided Bayesian SVM: Matlab code for the graph-guided Bayesian SVM.

Reference: Wenli Sun, Changgee Chang, and Qi Long, "Graph-guided Bayesian SVM with Adaptive Structured Shrinkage Prior for high-dimensional data"

Package: GSVM-adaptive.zip


Distributed Multiple Imputation

Distributed Multiple Imputation: R code for the simulations reported in the paper.

Reference: Changgee Chang, Yi Deng, Xiaoqian Jiang, and Qi Long. (2020) "Multiple Imputation for Analysis of Incomplete Data in Distributed Health Data Networks" Nature Communications, 11(1):5467.

R Codes: Distributed-MI.zip

GitHub: https://github.com/changgee/MIDist  


Deep Learning with Gaussian Differential Privacy

Deep Learning with Gaussian Differential Privacy: Python code.

Reference: Bu, Z., Dong, J., Long, Q., and Su, W. (2020) Deep Learning with Gaussian Differential Privacy. Harvard Data Science Review, 2(3):1-48.

GitHub: https://github.com/woodyx218/Deep-Learning-with-GDP-Tensorflow

Python Library for TensorFlow Privacy including Gaussian DP: https://github.com/tensorflow/privacy


Bayesian Graphical Models of Single-Cell RNA-Sequencing Data

Accounting for Technical Noise in Bayesian Graphical Models of Single-Cell RNA-Sequencing Data: Python code.

Reference: Oh, J., Chang, C. and Long, Q. (2021) Accounting for Technical Noise in Bayesian Graphical Models of Single-Cell RNA-Sequencing Data. Biostatistics, in press

GitHub: https://github.com/jihwan05/scLGM


Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data

Multiple Imputation with Neural Network Gaussian Process: Python code.

Reference: Oh, J., Chang, C. and Long, Q. (2021) Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data. Asian Conference on Machine Learning 2022 (ACML 2022), in press.

GitHub: https://github.com/bestadcarry/MI-NNGP
