




"Nonparametric Analysis of Competing Risks Data with Event Category Missing at Random"
Natalia Gouskova, PhD
Senior Biostatistician
Department of Biostatistics
UNC Gillings School of Global Public Health
Abstract: In the competing risks setup, the data for each subject consist of an event time, a censoring indicator, and an event category. However, the event category is sometimes missing, for example when the date of death is known but the cause of death is not available. In such situations, treating subjects with a missing event category as censored leads to underestimation of the hazard functions. We suggest nonparametric estimators for the cumulative cause-specific hazards and the cumulative incidence functions that use local polynomial regression techniques to estimate the contribution of an event with missing category to each of the cause-specific hazards, and we derive the properties of these estimators. The method is illustrated using data on infections in patients from the United States Cystic Fibrosis Foundation Patient Registry.
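As a rough illustration of the idea in this abstract, the sketch below builds a Nelson-Aalen-type cumulative cause-specific hazard in which an event with a missing category contributes a fractional count. It is not the speaker's estimator: the fractional count comes from a local-constant (degree-zero) kernel regression rather than a general local polynomial, and the function names, the coding of `cause` (0 = censored, -1 = missing category, 1..K = known cause), the Gaussian kernel, the bandwidth, and the no-ties assumption are all assumptions made for this example.

```python
import numpy as np

def cause_probability(t, known_times, known_is_cause, bandwidth):
    """Local-constant kernel estimate of P(cause = j | event at time t).

    Uses only events whose category is known. Assumes at least one such event.
    """
    weights = np.exp(-0.5 * ((known_times - t) / bandwidth) ** 2)
    return float(np.sum(weights * known_is_cause) / np.sum(weights))

def cumulative_cause_specific_hazard(times, cause, j, bandwidth=1.0):
    """Cumulative hazard for cause j with fractional counts for missing causes.

    cause coding (hypothetical): 0 = censored, -1 = event with missing
    category, 1..K = event with known category. Assumes no tied times.
    """
    times = np.asarray(times, dtype=float)
    cause = np.asarray(cause)
    order = np.argsort(times, kind="stable")
    times, cause = times[order], cause[order]

    known = cause > 0
    known_times = times[known]
    known_is_cause = (cause[known] == j).astype(float)

    n = len(times)
    hazard = np.zeros(n)
    total = 0.0
    for i in range(n):
        at_risk = n - i  # subjects still under observation just before times[i]
        if cause[i] == j:
            contribution = 1.0
        elif cause[i] == -1:
            # Missing category: count the event fractionally toward cause j.
            contribution = cause_probability(
                times[i], known_times, known_is_cause, bandwidth
            )
        else:
            contribution = 0.0  # censored, or an event of another known cause
        total += contribution / at_risk
        hazard[i] = total
    return times, hazard
```

Because the fractional counts for a missing-category event sum to one across causes, the cause-specific hazards from this sketch add up to the all-cause Nelson-Aalen estimate, whereas treating those events as censored would drop them from every cause.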
“Cancer Absolute Risk Projection with Incomplete Predictor Variables”
Lu Chen
PhD Candidate
Graduate Program in Biostatistics
Graduate Group in Epidemiology and Biostatistics
Dissertation Advisor: Jinbo Chen, PhD
Committee Chair: Hongzhe Li, PhD
Committee Members: Emily F. Conant, MD, Daniel F. Heitjan, PhD, and Andrea B. Troxel, ScD
Abstract: A popular approach to projecting cancer absolute risk is to integrate a relative risk function of predictors with hazard rates obtained from different sources. When assessing the added value of candidate risk predictors, it is very common that data on standard risk predictors are fully available from a frequency-matched case-control study, but data on candidate predictors are available only for a subset of cases and controls. In the first project, we developed statistical measures for quantifying the predictive accuracy of cancer absolute risk prediction models, accommodating incomplete predictor variables. We particularly focused on a measure that is useful for evaluating the efficiency of model-based cancer screening: the proportion of cases that can be captured by screening only people with high projected risk. In the second project, using a logistic regression model to describe the relationship between cancer status and all risk predictors, we developed a novel semiparametric maximum likelihood approach under the rare disease approximation for the estimation of relative risk parameters and the distribution of candidate predictors. Through theoretical and simulation studies, we showed that our estimator is consistent and asymptotically normal and has improved statistical efficiency. In the third project, we applied the statistical methods developed in the first two to evaluate the added value of percent mammographic density and breast cancer risk SNPs in breast cancer absolute risk projection. Our results showed that the two sets of predictors had similar added value and can lead to more efficient model-based screening for breast cancer.
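The screening measure named in this abstract can be illustrated in its simplest form. The sketch below assumes a fully observed cohort with one projected risk per person, which is a simplification: it does not handle the case-control sampling or the incomplete predictors that the dissertation addresses. The function name and arguments are hypothetical.

```python
import numpy as np

def proportion_cases_captured(risk, is_case, screen_fraction):
    """Share of all cases found among the highest-risk `screen_fraction` of people.

    Simplifying assumption: every person has a projected risk and a known
    case status (no missing predictors, no case-control sampling weights).
    """
    risk = np.asarray(risk, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    n_screened = int(np.ceil(screen_fraction * len(risk)))
    # Indices of the people with the highest projected risk.
    screened = np.argsort(-risk, kind="stable")[:n_screened]
    return float(is_case[screened].sum() / is_case.sum())
```

For example, with projected risks `[0.9, 0.8, 0.1, 0.2, 0.7]` and cases in the first and last positions, screening the top 40% of people captures one of the two cases.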
“Case Fatality and Population Mortality Associated with Anaphylaxis in the United States”
Larry MA, PhD
Johnson & Johnson Consumer Products Company
"Issues and challenges to analyze Genome-wide Methylation data and possible solutions"
Hemant K. Tiwari, PhD
William "Student" Sealy Gosset Professor
Head, Section on Statistical Genetics
Department of Biostatistics
University of Alabama at Birmingham School of Public Health
Abstract: For complex diseases there often exists the problem of missing heritability. It is continually debated whether this is because of a highly complex genetic architecture that is not accounted for, or because there are greater heritable contributions unrelated to the DNA sequence itself, which are the subject of epigenetics. An epigenetic modification is defined as any alteration to DNA that does not affect the sequence itself but serves a function and is retained when the cell divides. The study of epigenetics may provide vital information for better understanding this phenotypic variability among individuals. The most widely studied and perhaps the foundational epigenetic modification is DNA methylation. However, there are challenges to DNA methylation data analysis, specifically the analysis of DNA from the Illumina Methyl450 array, which is a relatively new area that presents both computational and statistical challenges. This talk will cover issues with quality control, genome-wide heritability estimation, and association methods for DNA methylation, as well as provide possible solutions.
“How Much Can Blocking and Randomization Improve Molecular Biomarker Discovery?”
A Block Randomized Study of microRNAs in Gynecologic Tumors
Li-Xuan Qin, PhD
Memorial Sloan-Kettering Cancer Center and
Weill Medical College of Cornell University, NY
“Statistical Quantitative Imaging”
Russell Takeshi Shinohara, PhD
Assistant Professor of Biostatistics in Biostatistics and
Epidemiology, University of Pennsylvania, Perelman School of Medicine
“Emergence, control and re-emergence of Trypanosoma cruzi in Southern Peru”
Michael Z. Levy, PhD
Assistant Professor of Epidemiology
University of Pennsylvania, Perelman School of Medicine
"Fast Covariance Estimation for High-Dimensional Functional Data"
David Ruppert, PhD
Professor
Department of Statistical Science
Cornell University
Abstract: High-dimensional functional data are becoming increasingly common. For such data, we propose fast methods for smooth estimation of the covariance function. These methods scale up linearly with J, the number of observations per function. Most available methods and software cannot smooth covariance matrices of dimension J greater than 500; a recently introduced sandwich smoother is an exception, but it is not adapted to smooth covariance matrices of large dimensions, such as J = 10,000. We introduce two new methods that circumvent this problem: 1) an extremely fast implementation of the sandwich smoother for covariance smoothing; and 2) a two-step procedure that first obtains the singular value decomposition of the data matrix and then smooths the eigenvectors. In high dimensions, these new approaches are at least an order of magnitude faster than standard methods and drastically reduce memory requirements. The new approaches provide instantaneous (a few seconds) smoothing for matrices of dimension J = 10,000 and very fast (< 10 minutes) smoothing for J = 100,000.
This is joint work with Luo Xiao, Ciprian Crainiceanu, and Vadim Zipunnikov.
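The two-step procedure mentioned in this abstract (take the singular value decomposition of the data matrix, then smooth the eigenvectors) can be sketched as follows. This is only an illustration under stated assumptions, not the speakers' implementation: the smoother here is a second-difference penalized (Whittaker-type) smoother solved with a dense linear system, which does not scale linearly in J, and the function names, the `rank` truncation, and the penalty parameter `lam` are hypothetical.

```python
import numpy as np

def smooth_columns(V, lam):
    """Smooth each column of V with a second-difference penalty.

    Solves (I + lam * D'D) v_smooth = v, where D takes second differences.
    Dense solve for clarity; a banded solver would be needed for large J.
    """
    J = V.shape[0]
    D = np.diff(np.eye(J), n=2, axis=0)  # (J - 2) x J second-difference matrix
    return np.linalg.solve(np.eye(J) + lam * D.T @ D, V)

def smooth_covariance_svd(Y, rank, lam=10.0):
    """Two-step covariance smoothing for an n x J data matrix Y.

    Step 1: SVD of the column-centered data matrix.
    Step 2: smooth the leading right singular vectors (eigenvectors of the
    sample covariance). Returns the factors instead of the J x J matrix,
    so memory stays at O(J * rank).
    """
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    eigenvalues = s[:rank] ** 2 / (n - 1)
    V_smooth = smooth_columns(Vt[:rank].T, lam)
    return V_smooth, eigenvalues

def covariance_entry(V_smooth, eigenvalues, j, k):
    """Entry (j, k) of the smoothed covariance V diag(eigenvalues) V'."""
    return float(np.sum(V_smooth[j] * eigenvalues * V_smooth[k]))
```

Keeping the result in factored form is a design choice made for this sketch: individual covariance entries can be evaluated on demand without ever forming the full J x J matrix. With `lam=0` and full rank, the factors reproduce the ordinary sample covariance.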
"Bayesian Modeling of Epigenetic Variation in Multiple Human Cell Types"
Yu Zhang, PhD
Associate Professor of Statistics
Department of Statistics
Pennsylvania State University
Abstract: With massive amounts of sequencing data generated for many chromatin modifications in a variety of cell/tissue types, the chief challenge is to build effective and quantitative models explaining how the dynamics in multiple epigenomes lead to differential gene expression and diverse phenotypes. Current state-of-the-art approaches for characterizing epigenetic landscapes rely on genome segmentation, yet existing segmentation tools ignore the critical information of position specificity of epigenetic events and often treat all epigenomes equally without considering cell type-specific regulation in local regions. We developed a unified Bayesian framework for jointly annotating multiple epigenomes and detecting differential regulation among multiple tissues and cell types over regions of varying sizes. The method, called IDEAS (integrative and discriminative epigenome annotation system), achieves superior power and accuracy over existing methods by modeling both position-specific and cell type-specific regulatory activities. Using 84 genome-wide epigenetic data sets in 6 cell types from ENCODE, we identified epigenetic variations that are strongly associated with differential gene expression. The detected regions are significantly enriched in genetic variants associated with complex phenotypes that are highly relevant to the corresponding cell types. They yielded much stronger enrichment scores than those achievable by existing approaches. Our analysis of cell type specificity could be of great importance in elucidating the interplay between genetic variants, gene regulation and diseases.
This is joint work with Feng Yue and Ross Hardison.
"Shrinkage methods utilizing auxiliary information from external data sources to improve prediction models with many covariates"
Bhramar Mukherjee, PhD
Professor
Department of Biostatistics
University of Michigan
Abstract: We consider predicting an outcome Y using a large number of covariates X. However, most of the data we have to fit the model contains only Y and W, which is a noisy surrogate for X, and only on a small number of observations do we observe Y, X, and W. We develop ridge-type shrinkage methods that trade off between bias and variance in a data-adaptive way to yield smaller prediction error using information from both datasets. We also demonstrate how the problem can be treated in a full Bayesian context with different forms of adaptive shrinkage. Finally, we introduce the notion of a hyperpenalty for guiding choices of the tuning parameter to perform adaptive shrinkage.
Our work is motivated by the rapid development of genomic assay technologies. In our application, mRNA expression of a selected number of genes is measured by both quantitative real-time polymerase chain reaction (qRT-PCR, X) and microarray technology (W) on a small number of lung cancer patients. In addition, only microarray measurements (W) are available on a larger number of patients. For future patients, the goal is to predict survival time (Y) using qRT-PCR (X). The question of interest is whether the large dataset containing only W can aid prediction of Y using X.
The high dimensionality of the problem, the large fraction of missing covariate information, and the fact that we are interested in a prediction model for Y|X (rather than Y|W) make this a nonstandard statistical problem. The general idea of integrating/leveraging information from existing diverse data sources to boost prediction has broader application in contemporary scientific studies. This is joint work with Philip S. Boonstra and Jeremy MG Taylor from the Department of Biostatistics, University of Michigan.