Faculty
Kai Wang, Ph.D.
Professor of Pathology and Laboratory Medicine
Department: Pathology and Laboratory Medicine
Graduate Group Affiliations
Contact information
3501 Civic Center Blvd, CTRB 6004
Children's Hospital of Philadelphia
Philadelphia, PA 19104
Office: 267-425-9573
Fax: 215-590-3660
Education:
B.S. (Biochemistry & Molecular Biology)
Peking University, 2000.
M.S. (Tumor Biology)
Mayo Clinic, 2002.
Ph.D. (Microbiology & Computational Biology)
University of Washington, 2005.
Description of Research Expertise
The research in our laboratory aims to develop novel genomics and bioinformatics methods to improve the diagnosis, treatment, and prognosis of rare diseases, and ultimately to facilitate the implementation of genomic medicine at scale. A detailed description of our research and rotation projects can be found on our lab website (https://wglab.org). In summary, our research can be divided into several areas.

First, we are developing analytical pipelines for whole genome and whole exome sequencing data, all the way from FASTQ/FAST5/POD5 files to biological insights. Some examples of computational tools include ANNOVAR, InterVar, CancerVar, Phenolyzer, Phen2Gene and PhenoSV. These approaches enhance the interpretation of sequencing data by uncovering functional content and providing clinically relevant insights.

Furthermore, we are developing genomic assays and methods to analyze long-read data, such as those generated from PacBio and Oxford Nanopore sequencing. These methods aid in identifying causal genetic variants in cases that elude diagnosis by traditional whole genome or exome sequencing and enable the detection of aberrant DNA and RNA methylation patterns. Some examples of computational tools include RepeatHMM, LinkedSV, ContextSV, NanoRepeat, LIQA, DeepMod and DeepMod2.

Finally, we are developing Artificial Intelligence (AI) and Machine Learning (ML) approaches to correlate genotype with phenotype, and to better understand the phenotypic heterogeneity of inherited diseases. We believe that multimodal AI holds the potential to transform our understanding of biology and medicine; what remains is to develop the right algorithms to fully harness its power. Some examples of computational tools include EHR-Phenolyzer, PhenoGPT, MutFormer and GestaltMML.
Selected Publications
Ahsan MU, Gouru A, Chan J, Zhou W, Wang K.: A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing. Nat Commun 15: 1448, Feb 2024.
Gracia-Diaz C, Perdomo JE, Khan ME, Roule T, Disanza BL, Cajka GG, Lei S, Gagne AL, Maguire JA, Shalem O, Bhoj EJ, Ahrens-Nicklas RC, French DL, Goldberg EM, Wang K, Glessner JT, Akizu N.: KOLF2.1J iPSCs carry CNVs associated with neurodevelopmental disorders. Cell Stem Cell 31: 288-289, Mar 2024.
Jiang TT, Fang L, Wang K.: Deciphering "the language of nature": A transformer-based language model for deleterious mutations in proteins. Innovation (Camb) 4: 100487, Jul 2023.
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K.: Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns (N Y) 5: 100887, Dec 2023.
Xu Z, Li Q, Marchionni L, Wang K.: PhenoSV: interpretable phenotype-aware model for the prioritization of genes affected by structural variants. Nat Commun 14: 7805, Nov 2023.
Fang L, Monteys AM, Dürr A, Keiser M, Cheng C, Harapanahalli A, Gonzalez-Alegre P, Davidson BL, Wang K.: Haplotyping SNPs for allele-specific gene editing of the expanded huntingtin allele using long-read sequencing. HGG Adv 4: 100146, Sep 2022.
Fang L, Liu Q, Monteys AM, Gonzalez-Alegre P, Davidson BL, Wang K: DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing. Genome Biol 23(1): 108, April 2022 Notes: DOI: 10.1186/s13059-022-02670-6.
Ahsan U, Liu Q, Fang L, Wang K: NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks. Genome Biol 22(1): 261, Sep 2021 Notes: DOI: 10.1186/s13059-021-02472-2.
Havrilla JM, Liu C, Dong X, Weng C, Wang K: PhenCards: a data resource linking human phenotype information to biomedical knowledge. Genome Med 13(1): 91, May 2021 Notes: DOI: 10.1186/s13073-021-00909-8.
Hu Y, Fang L, Chen X, Zhong JF, Li M, Wang K: LIQA: Long-read Isoform Quantification and Analysis. Genome Biol 22: 182, June 2021 Notes: DOI: 10.1186/s13059-021-02399-8.
Georgieva D, Liu Q, Wang K*, Egli D*: Detection of Base Analogs Incorporated During DNA Replication by Nanopore Sequencing. Nucleic Acids Res 48(15): e88, September 2020 Notes: DOI: 10.1093/nar/gkaa517.
Doostparast Torshizi A, Armoskus C, Zhang H, Forrest MP, Zhang S, Souaiaia T, Evgrafov OV, Knowles JA, Duan J*, Wang K*: Deconvolution of Transcriptional Networks Identified TCF4 as a Master Regulator in Schizophrenia. Sci Adv 5(9): eaau4139, September 2019.
Fang L, Kao C, Gonzalez MV, Mafra FA, Pellegrino R, Li M, Wenzel S, Wimmer K, Hakonarson H, Wang K: LinkedSV for detection of mosaic structural variants from linked-read exome and genome sequencing data. Nat Commun 10(1): 5585, December 2019 Notes: DOI: 10.1038/s41467-019-13397-7.
Liu Q, Fang L, Yu G, Wang D, Xiao CL, Wang K: Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat Commun 10(1): 2449, June 2019 Notes: DOI: 10.1038/s41467-019-10168-2.
© The Trustees of the University of Pennsylvania | Site best viewed in a supported browser. | Site Design: PMACS Web Team.