Dr. Stoeckert directs the Computational Biology and Informatics Laboratory. The goal of our work is to help make sense of the enormous amount of biomedical data generated by high-throughput genomic approaches and to synthesize those data into something more than the sum of their parts. To that end, we are developing tools that enable researchers to mine and integrate data from a variety of sources and experiment types.
The first step in that process is developing data warehouses that collect and store information in a usable form. In one such project, we have been working with David S. Roos, Ph.D., E. Otis Kendall Professor of Biology at Penn, and Jessica Kissinger, Ph.D., at the University of Georgia, to develop a bioinformatics resource center for eukaryotic pathogens, funded by the National Institute of Allergy and Infectious Diseases and the Bill and Melinda Gates Foundation. Within the resource center, we have built databases that serve research communities interested in specific pathogens. For example, PlasmoDB houses information on the parasites that cause malaria.
To maximize the utility of data warehouses, we need ways to represent and store data that enable researchers to make connections between experiments and between data from different types of experiments. Therefore, part of my group works on knowledge representation and ontology development; ontologies standardize data through controlled vocabularies and defined relationships between terms. Our goal is to provide tools, including ontologies, that allow people to annotate their experiments or mark up their papers so that other researchers can efficiently search for and combine particular kinds of results from a variety of sources.
We work with several groups on ontology projects, including the Ontology for Biomedical Investigations Consortium. I have also been involved in a number of standards projects over the years and am currently president of the MGED Society, which promotes data sharing and the standardized representation of data, particularly from genomic experiments.
In addition to building systems that help other researchers maximize the value of their data, my team is involved in model building and network analysis with the aim of discovering new insights into biology. One area we focus on is type I diabetes. As members of the Beta Cell Biology Consortium, we have established a data warehouse that houses datasets from consortium members. Our role has also been to help the consortium integrate information from those datasets, as well as from key datasets from outside the consortium, and to put those data into the context of beta cell development and diabetes.
For example, whereas many researchers stop at the list of genes produced by a microarray experiment, we try to go beyond list making and use computational methods to uncover connections between genes. To do that, we are building gene networks from expression data and from a variety of other sources, including published information on known interactions and computational analyses that predict interactions between genes. Once these networks are built, we can visualize interacting partners and show where and when they are important in beta cell function and development.
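As a rough illustration of the general idea only, the Python sketch below links two genes when their expression profiles are strongly correlated (Pearson |r| at or above a threshold) or when a known interaction is supplied from another source, and records which kinds of evidence support each edge. The gene names, expression values, the 0.8 threshold, and the helper names `pearson`, `build_network`, and `neighbors` are all hypothetical; this is not the laboratory's actual pipeline or data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def build_network(expression, known_interactions, threshold=0.8):
    """Map each unordered gene pair to the set of evidence types linking it.

    Hypothetical scheme: 'coexpression' edges come from |r| >= threshold,
    'literature' edges from a supplied list of known interactions.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.setdefault(frozenset((g1, g2)), set()).add("coexpression")
    for g1, g2 in known_interactions:
        edges.setdefault(frozenset((g1, g2)), set()).add("literature")
    return edges


def neighbors(edges, gene):
    """Sorted list of genes directly connected to `gene`."""
    return sorted(g for pair in edges if gene in pair for g in pair if g != gene)


# Toy, made-up profiles across five hypothetical time points.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0, 1.0],
}
known = [("geneA", "geneD")]  # hypothetical interaction from another source

network = build_network(expression, known)
print(neighbors(network, "geneA"))  # ['geneB', 'geneC', 'geneD']
```

Here geneD is uncorrelated with geneA's expression profile but is still connected through the supplied interaction, which is the kind of connection that combining several evidence sources is meant to surface. A real analysis would also need to handle multiple testing, noisy measurements, and the reliability of each evidence source.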
The approaches and tools we develop in one research area can often be applied to another. For instance, we are using network analysis techniques like those designed for the beta cell to study primitive blood cell development in mammals with Jim Palis, M.D., at the University of Rochester. We are also starting to apply our data integration and analysis approaches to high-throughput sequencing data, including RNA-seq and ChIP-seq data.
Aurrecoechea C, Barreto A, Brestelli J, Brunk BP, Cade S, Doherty R, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Hu S, Iodice J, Kissinger JC, Kraemer ET, Li W, Pinney DF, Pitts B, Roos DS, Srinivasamoorthy G, Stoeckert CJ Jr, Wang H, Warrenfeltz S: EuPathDB: the eukaryotic pathogen database. Nucleic Acids Res. 41(D1): D684-91, January 2013.
Kingsley PD, Greenfest-Allen E, Frame JM, Bushnell TP, Malik J, McGrath KE, Stoeckert CJ Jr, Palis J: Ontogeny of erythroid gene expression. Blood, December 2012.
Choi E, Kraus MR, Lemaire LA, Yoshimoto M, Vemula S, Potter LA, Manduchi E, Stoeckert CJ Jr, Grapin-Botton A, Magnuson MA: Dual lineage-specific expression of Sox17 during mouse embryogenesis. Stem Cells 30(10): 2297-308, October 2012.
Parikh PP, Zheng J, Logan-Klumper F, Stoeckert CJ Jr, Louis C, Topalis P, Protasio AV, Sheth AP, Carrington M, Berriman M, Sahoo SS: The Ontology for Parasite Lifecycle (OPL): towards a consistent vocabulary of lifecycle stages in parasitic organisms. J Biomed Semantics 3(1): 5, May 2012.
Hald J, Galbo T, Rescan C, Radzikowski L, Sprinkel AE, Heimberg H, Ahnfelt-Rønne J, Jensen J, Scharfmann R, Gradwohl G, Kaestner KH, Stoeckert C Jr, Jensen JN, Madsen OD: Pancreatic islet and progenitor cell surface markers with cell sorting potential. Diabetologia 55(1): 154-65, January 2012.
Zheng J, Stoyanovich J, Manduchi E, Liu J, Stoeckert CJ Jr: AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments. Database (Oxford) 2011: bar045, December 2011.
Dorrell C, Schug J, Lin CF, Canaday PS, Fox AJ, Smirnova O, Bonnah R, Streeter PR, Stoeckert CJ Jr, Kaestner KH, Grompe M: Transcriptomes of the major human pancreatic cell types. Diabetologia 54(11): 2832-44, November 2011.
Grant GR, Farkas MH, Pizarro A, Lahens N, Schug J, Brunk B, Stoeckert CJ Jr, Hogenesch JB, Pierce EA: Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq Unified Mapper (RUM). Bioinformatics, July 2011 [Epub ahead of print].
Civelek M, Manduchi E, Riley RJ, Stoeckert CJ Jr, Davies PF: Coronary artery endothelial transcriptome in vivo: identification of endoplasmic reticulum stress and enhanced reactive oxygen species by gene connectivity network analysis. Circ Cardiovasc Genet. 4(3): 243-52, June 2011.
Fischer S, Aurrecoechea C, Brunk BP, Gao X, Harb OS, Kraemer ET, Pennington C, Treatman C, Kissinger JC, Roos DS, Stoeckert CJ: The Strategies WDK: a graphical search interface and web development kit for functional genomics databases. Database (Oxford) 2011: bar027, June 2011.
Last updated: 12/28/2012
The Trustees of the University of Pennsylvania