Qi Long, Ph.D.

Professor of Biostatistics in Biostatistics and Epidemiology
Director, Biostatistics and Bioinformatics Core, Abramson Cancer Center
Senior Scholar, Center for Clinical Epidemiology and Biostatistics
Associate Director, Penn Institute for Biomedical Informatics
Director, Center for Cancer Data Science
Professor, Department of Computer and Information Science, School of Engineering and Applied Science
Senior Fellow, Penn Leonard Davis Institute
Professor, Department of Statistics and Data Science, The Wharton School
Vice Chair of Faculty Professional Development, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine
Associate Director for Quantitative Data Science, Abramson Cancer Center
Department: Biostatistics and Epidemiology

Contact information
Department of Biostatistics, Epidemiology and Informatics
Perelman School of Medicine
University of Pennsylvania
201 Blockley Hall
423 Guardian Drive
Philadelphia, PA 19104
Office: 215-573-0659
Fax: 215-573-1050
Education:
B.S. (Biochemistry)
School of Gifted Young, University of Science and Technology of China, Hefei, Anhui, China, 1998.
M.S. (Biostatistics)
University of Michigan, Ann Arbor, MI, 2003.
Ph.D. (Biostatistics)
University of Michigan, Ann Arbor, MI, 2005.

Description of Research Expertise

Dr. Long's research combines novel statistical and ML/AI research with impactful biomedical research, each of which reinforces the other. Its thrust is to develop robust statistical and machine learning methods for advancing precision medicine. Specifically, he has developed methods for the analysis of big health data (-omics, EHRs, and mHealth data), missing data, causal inference, data privacy, algorithmic fairness, Bayesian analysis, and clinical trials. Dr. Long's methodological research has been supported by the National Institutes of Health (NIH), the Patient-Centered Outcomes Research Institute (PCORI), the National Science Foundation (NSF), and the Advanced Research Projects Agency for Health (ARPA-H).

Dr. Long has directed the Statistical and Data Coordinating Center for national research networks and large-scale multi-site clinical studies, supervising a team of database administrators and programmers, application developers, and statistical analysts. He currently co-directs (with Dr. Nicola Mason at Penn Vet) the Coordinating Center for the Premedical Cancer Immunotherapy Network for Canine Trials (PRECINCT), part of NCI's Cancer Moonshot Initiative.

Dr. Long is the founding Director of the Center for Cancer Data Science, and Associate Director for Cancer Informatics of the Penn Institute for Biomedical Informatics. He also directs the Biostatistics and Bioinformatics Core in the Abramson Cancer Center at the University of Pennsylvania.

Dr. Long is an elected fellow of the American Association for the Advancement of Science (AAAS), an elected fellow of the American Statistical Association (ASA), and an elected member of the International Statistical Institute (ISI).

Selected Publications

Orcutt, X., Chen, K., Mamtani, R., Long, Q.# and Parikh, R.B.#: Evaluating Generalizability of Results from Landmark Randomized Controlled Trials in Oncology To Real-World Patients using Machine Learning-Based Emulated Trials. Nature Medicine, 2025 Notes: https://doi.org/10.1038/s41591-024-03352-5. #joint senior/corresponding authors.

Li, X., Ruan, F., Wang, H., Long, Q.# and Su, W.J.#: A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules. Annals of Statistics, in press, 2025 Notes: #joint senior/corresponding authors.

Chang C, Jang A, Manatunga A, Taylor A.T., Long, Q.: A Bayesian Latent Class Model to Predict Kidney Obstruction Based on Renography and Expert Ratings in the Absence of Gold Standard. Journal of the American Statistical Association 115(532): 1645-1663, 2020.

Wu, Y., Keoliya, M., Chen, K., Velingker, N., Li, Z., Getzen, E., Long, Q., Naik, M., Parikh, R. and Wong, E.: DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation. ICML 2024, 2024.

Zhou, Z., Ataee Tarzanagh, D., Hou, B., Tong, B., Xu, J., Feng, Y., Long, Q.# and Shen, L.#: Fair Canonical Correlation Analysis. 2023 Conference on Neural Information Processing Systems (NeurIPS 2023), 2023 Notes: #joint senior/corresponding authors.

Getzen, E.J., Ungar, L., Mowery, D., Jiang, X., and Long, Q.: Mining for Equitable Health: Assessing the Impact of Missing Data in Electronic Health Records. Journal of Biomedical Informatics 139: 104269, 2023.

Zhang, Y., Long, Q.: Assessing Fairness in the Presence of Missing Data. 2021 Conference on Neural Information Processing Systems (NeurIPS 2021) 34: 16007-16019, 2021.

Fang, C., He, H., Long, Q., Su, W.: Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training. Proceedings of the National Academy of Sciences (PNAS) 118(43): e2103091118, 2021.

Chang, C., Deng, Y., Jiang, X., Long, Q.: Multiple Imputation for Analysis of Incomplete Data in Distributed Health Data Networks. Nature Communications 11(1): 5467, 2020.

Bu, Z., Dong, J., Long, Q., Su, W.: Deep Learning with Gaussian Differential Privacy. Harvard Data Science Review 2(3): 1-48, 2020.

Zheng, Q., Dong, J., Long, Q., Su, W.: Sharp Composition Bounds for Gaussian Differential Privacy via Edgeworth Expansion. Proceedings of the 37th International Conference on Machine Learning (ICML 2020) 119: 11420-11435, 2020.

Zhao, Y., Chang, C., and Long, Q.: Knowledge-guided statistical learning methods for analysis of high-dimensional -omics data in precision oncology. JCO Precision Oncology 3: 1-9, 2019.

Min EJ, Safo SE, Long Q: Penalized co-inertia analysis with applications to -omics data. Bioinformatics 35(6): 1018-1025, 2019 Notes: doi: 10.1093/bioinformatics/bty726.

Li Z, Roberts K, Jiang X, Long Q: Distributed Learning from Multiple EHR Databases: Contextual Embedding Models for Medical Events. Journal of Biomedical Informatics 92: 103138, 2019 Notes: doi: 10.1016/j.jbi.2019.103138. Epub 2019 Feb 27.

Zhao, Y.*, Chung, M., Johnson, B.A., Moreno, C.S., and Long, Q.: Hierarchical feature selection incorporating known and novel biological information: Identifying genomic features related to prostate cancer recurrence. Journal of the American Statistical Association 111(516): 1427-1439, 2016 Notes: *An earlier version won Yize Zhao the David P. Byar Travel Award from American Statistical Association’s Biometrics Section 2014.

Chang C, Kundu S, Long Q: Scalable Bayesian variable selection for structured high-dimensional data. Biometrics 74(4): 1372-1382, 2018 Notes: doi: 10.1111/biom.12882. Epub 2018 May 8.

Safo, S.E., Li, S., and Long, Q.: Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics 74(1): 300-312, 2018.

Long, Q., Xu, J., Osunkoya, A.O., Sannigrahi, S., Johnson, B.A., Zhou, W., Gillespie, T., Park, J.Y., Nam, R.K., Sugar, L., Stanimirovic, A., Seth, A.K., Petros, J.A., and Moreno, C.S.: Global transcriptome analysis of formalin-fixed prostate cancer specimens identifies biomarkers of disease recurrence. Cancer Research 74(12): 3228-3237, 2014.

Long, Q., Little, R.J., and Lin, X.: Causal inference in hybrid intervention trials involving treatment choice. Journal of the American Statistical Association 103(482): 474-484, 2008.

Last updated: 03/10/2025
The Trustees of the University of Pennsylvania