
Faculty

Qi Long, Ph.D.

Professor of Biostatistics in Biostatistics and Epidemiology
Department: Biostatistics and Epidemiology
Contact information
Department of Biostatistics, Epidemiology and Informatics
Perelman School of Medicine
University of Pennsylvania
201 Blockley Hall
423 Guardian Drive
Philadelphia, PA 19104
Office: 215-573-0659
Fax: 215-573-1050

Education:
B.S. (Biochemistry), School of Gifted Young, University of Science and Technology of China, Hefei, Anhui, China, 1998.
M.S. (Biostatistics), University of Michigan, Ann Arbor, MI, 2003.
Ph.D. (Biostatistics), University of Michigan, Ann Arbor, MI, 2005.

Description of Research Expertise

Dr. Long's research program bridges novel statistical and ML/AI research with impactful biomedical research. Its main thrust is to develop robust statistical and ML/AI methods and models for advancing precision medicine and population health. Specifically, he has developed responsible statistical methods and ML/AI models for (integrative) analysis of big health data (such as -omics data, electronic health records data, and imaging data), missing data, causal inference, data privacy, algorithmic fairness, Bayesian methods, and clinical trials. More recently, Dr. Long's research has branched into large language models (LLMs), foundation models, and AI agents for biomedicine. Dr. Long's methodological research has been supported by the National Institutes of Health (NIH), the Patient-Centered Outcomes Research Institute (PCORI), the National Science Foundation (NSF), and the Advanced Research Projects Agency for Health (ARPA-H).

Dr. Long has directed the Statistical and Data Coordinating Center for national research networks and large-scale multi-site clinical studies, supervising a team of database administrators, programmers, application developers, and statistical analysts. He currently co-directs (with Dr. Nicola Mason at Penn Vet) the Coordinating Center for the Premedical Cancer Immunotherapy Network for Canine Trials (PRECINCT), part of NCI’s Cancer Moonshot Initiative.

Dr. Long is the founding Director of the Center for Cancer Data Science, Associate Director for Cancer Informatics of the Penn Institute for Biomedical Informatics, and Associate Director for Quantitative Data Science of the Abramson Cancer Center at the University of Pennsylvania.

Dr. Long is an elected fellow of the American Association for the Advancement of Science (AAAS), the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), and the International Statistical Institute (ISI).

Selected Publications

Orcutt, X., Chen, K., Mamtani, R., Long, Q.# and Parikh, R.B.#: Evaluating Generalizability of Results from Landmark Randomized Controlled Trials in Oncology To Real-World Patients using Machine Learning-Based Emulated Trials. Nature Medicine, 2025. https://doi.org/10.1038/s41591-024-03352-5. Notes: #joint senior/corresponding authors.

Li, X., Ruan, F., Wang, H., Long, Q.# and Su, W.J.#: A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules. Annals of Statistics, in press, 2025. Notes: #joint senior/corresponding authors.

Chang C, Jang A, Manatunga A, Taylor A.T., Long, Q.: A Bayesian Latent Class Model to Predict Kidney Obstruction Based on Renography and Expert Ratings in the Absence of Gold Standard. Journal of the American Statistical Association 115(532): 1645-1663, 2020.

Wu, Y., Keoliya, M., Chen, K., Velingker, N., Li, Z., Getzen, E., Long, Q., Naik, M., Parikh, R. and Wong, E.: DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation. ICML 2024, 2024.

Zhou, Z., Ataee Tarzanagh, D., Hou, B., Tong, B., Xu, J., Feng, Y., Long, Q.# and Shen, L.#: Fair Canonical Correlation Analysis. 2023 Conference on Neural Information Processing Systems (NeurIPS 2023), 2023. Notes: #joint senior/corresponding authors.

Getzen, E.J., Ungar, L., Mowery, D., Jiang, X., and Long, Q.: Mining for Equitable Health: Assessing the Impact of Missing Data in Electronic Health Records. Journal of Biomedical Informatics 139: 104269, 2023.

Zhang Y., Long, Q.: Assessing Fairness in the Presence of Missing Data. 2021 Conference on Neural Information Processing Systems (NeurIPS 2021) 34: 16007-16019, 2021.

Fang, C., He, H., Long, Q., Su, W.: Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training. Proceedings of the National Academy of Sciences (PNAS) 118(43): e2103091118, 2021.

Chang, C., Deng, Y., Jiang, X., Long, Q.: Multiple Imputation for Analysis of Incomplete Data in Distributed Health Data Networks. Nature Communications 11(1): 5467, 2020.

Bu, Z., Dong, J., Long, Q., Su, W.: Deep Learning with Gaussian Differential Privacy. Harvard Data Science Review 2(3): 1-48, 2020.

Zheng, Q., Dong, J., Long, Q., Su, W.: Sharp Composition Bounds for Gaussian Differential Privacy via Edgeworth Expansion. Proceedings of the 37th International Conference on Machine Learning (ICML 2020) 119: 11420-11435, 2020.

Zhao, Y., Chang, C., and Long, Q.: Knowledge-guided statistical learning methods for analysis of high-dimensional -omics data in precision oncology. JCO Precision Oncology 3: 1-9, 2019.

Min EJ, Safo SE, Long Q: Penalized co-inertia analysis with applications to -omics data. Bioinformatics 35(6): 1018-1025, 2019. Notes: doi: 10.1093/bioinformatics/bty726.

Li Z, Roberts K, Jiang X, Long Q: Distributed Learning from Multiple EHR Databases: Contextual Embedding Models for Medical Events. Journal of Biomedical Informatics 92: 103138, 2019. Notes: doi: 10.1016/j.jbi.2019.103138. Epub 2019 Feb 27.

Zhao, Y.*, Chung, M., Johnson, B.A., Moreno, C.S., and Long, Q.: Hierarchical feature selection incorporating known and novel biological information: Identifying genomic features related to prostate cancer recurrence. Journal of the American Statistical Association 111(516): 1427-1439, 2016. Notes: *An earlier version won Yize Zhao the David P. Byar Travel Award from the American Statistical Association’s Biometrics Section, 2014.

Chang C, Kundu S, Long Q: Scalable Bayesian variable selection for structured high-dimensional data. Biometrics 74(4): 1372-1382, 2018. Notes: doi: 10.1111/biom.12882. Epub 2018 May 8.

Safo, S.E., Li, S., and Long, Q.: Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics 74(1): 300-312, 2018.

Long, Q., Xu, J., Osunkoya, A.O., Sannigrahi, S., Johnson, B.A., Zhou, W., Gillespie, T., Park, J.Y., Nam, R.K., Sugar, L., Stanimirovic, A., Seth, A.K., Petros, J.A., and Moreno, C.S.: Global transcriptome analysis of formalin-fixed prostate cancer specimens identifies biomarkers of disease recurrence. Cancer Research 74(12): 3228-3237, 2014.

Long, Q., Little, R.J., and Lin, X.: Causal inference in hybrid intervention trials involving treatment choice. Journal of the American Statistical Association 103(482): 474-484, 2008.

Last updated: 11/10/2025
The Trustees of the University of Pennsylvania