The 2016 Conference will bring together leading scientists who will discuss current statistical issues in adaptive clinical trial designs. Registration opens January 4, 2016. Participants from all sectors with an interest in clinical trials methodology are encouraged to register. For more information, please see the event page at http://www.med.upenn.edu/cceb/biostat/ClinTrials16_index.shtml
“Trend-in-Trend: A Novel Epidemiologic-Ecologic Design for Causal Inference”
Sean Hennessy, PharmD, PhD
Professor of Epidemiology in Biostatistics and Epidemiology
University of Pennsylvania
“Penalized Nonlinear Mixed Effects Model to Identify Biomarkers for Predicting Disease Progression”
Huaihou Chen, PhD
Assistant Professor
Department of Biostatistics
University of Florida
Abstract: Precisely modeling disease progression is crucial for research on neurodegenerative disorders, since early intervention before clinical diagnosis in premanifest subjects is expected to be more effective. Early prediction of disease abnormality at the preclinical stage may be possible by assessing prodromal neuroimaging biomarkers, which are indicative of the underlying disease pathology and are often high-dimensional. As observed in many studies including PREDICT-HD, longitudinal measurements of clinical outcomes such as motor or cognitive symptoms often present nonlinear sigmoid shapes, where inflection points in the trajectories mark a critical time when disease progression stops accelerating. Therefore, to identify neuroimaging biomarkers predicting the disease progression, we propose a nonlinear mixed effects model based on a sigmoid function to predict longitudinal clinical outcomes. The proposed method models subject-specific inflection points of disease progression using a high-dimensional linear predictor of the neuroimaging biomarkers. We propose an EM-based method to fit the nonlinear model to avoid direct optimization of a highly nonlinear objective function. Variable selection is incorporated in the algorithm in order to identify important biomarkers for disease progression. We apply the proposed method to the PREDICT-HD study to select brain subcortical regional volumes for predicting motor and cognitive function degeneration. Our results reveal that brain atrophy in the striatum and expansion of the ventricular system are highly associated with the inflection points of motor and cognitive function. Furthermore, these inflection points may precede clinically defined disease onset by as early as a decade and thus the brain atrophy in the striatum and expansion of the ventricular system can be useful biomarkers for early diagnosis of Huntington's disease.
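As a rough illustration of the model class described in the abstract above (not the speaker's actual specification), a sigmoid mixed effects model with a biomarker-driven inflection point could be written as follows. Every symbol here is an assumption introduced for illustration: y_ij is the clinical outcome of subject i at visit time t_ij, a and b are the plateau and slope of the curve, tau_i is the subject-specific inflection point, x_i is the vector of neuroimaging biomarkers, and the lasso-type penalty is only one possible way to carry out variable selection.

```latex
y_{ij} = \frac{a}{1 + \exp\{-b\,(t_{ij} - \tau_i)\}} + \varepsilon_{ij},
\qquad
\tau_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i,
\qquad
u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)

% One possible penalized criterion for selecting biomarkers (assumed form):
\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}}
\Big\{ \ell(\boldsymbol{\beta}) - \lambda \sum_{k} |\beta_k| \Big\}
```

Under this sketch, biomarkers with nonzero estimated coefficients shift the time at which a subject's trajectory stops accelerating.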
"Dynamic Prediction with Time-to-Event Outcome”
Yingye Zheng, PhD
Affiliate Professor
Department of Biostatistics
University of Washington
Abstract: Long-term follow-up is common in many medical investigations where the interest lies in predicting patients' risks for a future adverse outcome using repeatedly measured predictors over time. A key quantity in this setting is the likelihood of developing an adverse outcome among individuals who have survived up to a given time, given their covariate information up to that time. Simple, yet reliable, methodology for updating the predicted risk of disease progression using longitudinal markers remains elusive. Two main approaches have been considered in the literature. One approach, based on joint modeling (JM) of failure time and longitudinal covariate process, derives such longitudinal predictive probability from the joint probability of a longitudinal marker and an event at a given time. A second approach, the partly conditional (PC) modeling, directly models the predictive probability conditional on survival up to a landmark time and information accrued by that time. In this paper we propose new two-stage PC models for longitudinal prediction that are more flexible than joint modeling and improve the prediction accuracy over existing PC models. We provide procedures for making inference regarding future risk for an individual with longitudinal measures up to a given time. In addition, we conduct comprehensive simulations to evaluate both JM and PC approaches in order to provide practical guidance on modeling choices.
“Case-Cohort Studies for Multiple Diseases”
Jianwen Cai, PhD
Distinguished Professor
Department of Biostatistics
University of North Carolina at Chapel Hill
Abstract: The case-cohort study design is one of the commonly used methods to reduce cost in large observational cohort studies. The design consists of a random sample of the entire cohort, called the subcohort, and all the subjects with the disease of interest. One important advantage of the case-cohort study design is the ability to re-use the same subcohort when several diseases are of interest. In multiple case-cohort studies, covariates collected on subjects with other diseases are available when estimating the risk effect on one disease. Usually, the analysis is done separately for each disease, ignoring data collected on subjects with the other diseases. We propose more efficient estimators to make better use of the available information. We consider the additive hazards models for stratified case-cohort studies with rare and non-rare diseases and investigate both joint and separate analyses. An estimating equation approach with a new weight function is proposed and the proposed estimators are shown to be consistent and asymptotically normally distributed. Simulation studies show that the proposed methods using all available information gain efficiency. We apply our proposed method to data from the Atherosclerosis Risk in Communities (ARIC) study.
“Assessing Time-Varying Causal Effect Moderation in Mobile Health”
Daniel Almirall, PhD
Research Assistant Professor
Survey Research Center
Institute for Social Research, University of Michigan
Abstract: In mobile health (mHealth) for behavior change and maintenance, interventions are frequent and momentary. Typically, a great deal of information on patient states (e.g., stress), environmental factors, and behavioral responses is generated over time. Such intensive longitudinal data is often collected by self-report or passively with the aid of sensors. One way in which intensive longitudinal intervention data may aid the design of a mobile intervention is the examination of effect moderation; that is, inference about which factors strengthen or weaken the response to just-in-time interventions. In this setting, treatments, outcomes, and candidate moderators are all time-varying. This paper introduces a formal definition for moderated effects in terms of potential outcomes, suitable for intensive longitudinal data, and it develops and compares three estimation strategies (an inverse-probability-of-treatment weighted approach, a treatment centering approach, and routine regression) for investigating these moderated effects using primarily standard software. The approach is illustrated using BASICS-Mobile, a smartphone-based intervention designed to curb heavy drinking and smoking among college students. This is joint work with Audrey Boruvka, Katie Witkiewitz and Susan A. Murphy.
“On the Design of a Stratified Biomarker Trial in the Presence of Measurement Error”
Susan Halabi, PhD
Professor
Department of Biostatistics & Bioinformatics
Duke University School of Medicine
Abstract: Identifying patients who benefit from a specific treatment based on their molecular history is one of the most important topics in clinical oncology. The stratified biomarker design has become a popular design due to its ability to address various questions. The stratified biomarker design has also been used to test for a treatment-biomarker interaction in predicting an outcome. Many biomarkers, however, are tissue-based, and hence are heterogeneous. Thus, biomarker levels are measured with error, which can have an adverse impact on the power of a stratified biomarker clinical trial. We investigate analytically and numerically the impact of biomarker misclassification on the coverage of the confidence intervals and the power for testing biomarker-treatment arm interaction. We propose sample size formulae for both binary and time-to-event endpoints subject to censoring, and apply the proposed sample size formulae to the design of a renal cancer trial.
“Bridging Health Disparities Gaps through the Use of Medical Legal Partnership”
Omar Martinez, JD, MPH, MS
Assistant Professor, School of Social Work
College of Public Health
Temple University
Dissertation Defense “The COURAGE Trial: A Phase II Randomized Clinical Trial to Evaluate the Dose-Response Effects of Exercise on Prognostic Biomarkers Among Colon Cancer Survivors” Justin Brown Division of Epidemiology Graduate Group in Epidemiology and Biostatistics
Abstract: The high rate of recurrent disease is a critical barrier to promoting the health and longevity of colon cancer survivors. Observational epidemiologic data suggest physical activity after diagnosis of colon cancer reduces the risk of cancer recurrence, cancer-specific mortality, and all-cause mortality. However, the biologic mechanisms that mediate the relationship between physical activity and disease outcomes among colon cancer survivors have not been characterized. Excess visceral adipose tissue and hyperinsulinemia promote the growth and progression of existing micro-metastases and the development of new distant metastases. Exercise reduces visceral adipose tissue and hyperinsulinemia among non-diabetic persons with obesity. However, it is unknown if exercise alters visceral adipose tissue and hyperinsulinemia among colon cancer survivors. To address this knowledge gap, we conducted a phase II, randomized, six-month, dose-response exercise trial that compared 150 min/wk or 300 min/wk of moderate-intensity aerobic exercise to a usual care control group among 39 colon cancer survivors. We examine the efficacy of exercise to reduce visceral adipose tissue and fasting insulin. To understand the generalizability of this randomized trial, we examine demographic, clinical, and geographic characteristics of trial participants as compared to the state cancer registry population from which they were recruited. The findings from this trial will inform key design aspects for a phase III randomized controlled trial to examine the efficacy of exercise to reduce the risk of recurrent disease and death among colon cancer survivors.
Dissertation Advisor: Kathryn H. Schmitz, MPH Committee Chair: Andrea B. Troxel, ScD Committee: Nevena Damjanov, MD, Michael R. Rickels, MD, MS, Babette S. Zemel, PhD
"Medication Therapy Weak Spots and the Risk of Falling in the Nursing Home Setting"
Richard D Boyce, PhD
Assistant Professor of Biomedical Informatics
Faculty, Center for Pharmaceutical Policy and Prescribing
Faculty, Geriatric Pharmaceutical Outcomes and Gero-Informatics Research and Training Program
University of Pittsburgh
Dissertation Defense "Doubly robust causal inference with complex parameters" Edward H. Kennedy, MS Division of Biostatistics Graduate Group in Epidemiology and Biostatistics
Abstract: Semiparametric doubly robust methods for causal inference help protect against bias due to model misspecification, while also reducing sensitivity to the curse of dimensionality (e.g., when high-dimensional covariate adjustment is necessary). However, doubly robust methods have not yet been developed in numerous important settings. In particular, standard semiparametric theory mostly only considers independent and identically distributed samples and smooth parameters that can be estimated at classical root-n rates. In this dissertation we extend this theory and develop novel methodology for three settings outside these bounds: (1) matched cohort studies, (2) nonparametric dose-response estimation, and (3) complex high-dimensional effects with continuous instrumental variables. In Chapter 1 we show that, for matched cohort studies, efficient and doubly robust estimators of effects on the treated are computationally equivalent to standard estimators that ignore the non-standard sampling. We also show that matched cohort studies are often more efficient than random sampling for estimating effects on the treated, and derive the optimal number of matches for given matching variables. We apply our methods in a study of the effect of hysterectomy on the risk of cardiovascular disease. In Chapter 2 we develop a novel approach for causal dose-response curve estimation that is doubly robust without requiring any parametric assumptions, and which naturally incorporates general off-the-shelf machine learning. We derive asymptotic properties for a kernel-based version of our approach and propose a data-driven method for bandwidth selection. The methods are used to study the effect of hospital nurse staffing on excess readmissions penalties. In Chapter 3 we develop novel estimators of the local instrumental variable curve, which represents the treatment effect among compliers who would take treatment when the instrument passes some threshold. 
Our methods do not require parametric assumptions, allow for flexible data-adaptive estimation of effect modification, and are doubly robust. We derive asymptotic properties under weak conditions, and use the methods to study infant mortality effects of neonatal intensive care units with high versus low technical capacity, using travel time as an instrument.
Dissertation Advisors: Dylan Small, PhD, and Marshall Joffe, MD, MPH, PhD Committee Chair: Russell Taki Shinohara, PhD Committee: Harold Feldman, MD, MSCE, Zongming Ma, PhD
Dissertation Defense "Sparse Simultaneous Signal Detection with Applications in Complex Disease GWAS" Julie Kobie, MS Division of Biostatistics Graduate Group in Epidemiology and Biostatistics
Abstract: Studying complex diseases, such as autoimmune diseases, can lead to the detection of pleiotropic loci with otherwise small effects. Through the detection of pleiotropic loci, the genetic architecture of these complex diseases can be better defined, allowing for subsequent improvements in their treatment and prevention efforts. Here, I investigate the genetic relatedness of complex diseases through the detection and quantification of simultaneous disease-associated genetic variants. I propose two max-type statistics, with and without an added level of dependency on the directions of the genetic effects, that globally test whether a pair of complex diseases shares at least one disease-associated genetic variant. The proposed global tests are based on the simultaneity of complex disease-associated genetic variants, allowing for the determination of exact p-values from a permutation distribution assuming independence. While an independence assumption is often imposed on genetic variants, I propose a perturbation procedure for evaluating the statistical significance of one of the proposed global tests, preserving the inherent dependency structure among genetic variants. I extend that global test beyond the detection of genetic relatedness at identical genetic variants, to the detection of genetic relatedness within dependency-defined windows across the genome. With the proposed methods, I identify pairs of pediatric autoimmune diseases that exhibit evidence of genetic sharing, such as Crohn’s disease and ulcerative colitis.
This presentation is sponsored by the Institute for Biomedical Informatics (IBI), for potential recruitment to the DBE and IBI.
Arjun Krishnan, Ph.D.
Associate Research Scholar
Lewis-Sigler Institute for Integrative Genomics
“Bringing genomic data into focus for understanding multicellular function and disease”
Our body has more than 200 types of cells and tissues, each performing a highly specialized function. This diversity emerges from how the ~25,000 genes in our genome interact in distinct ways in different tissues/cell-types. Deciphering these tissue-specific gene networks is experimentally intractable, and yet, fundamental to our understanding of gene functions and disease-gene associations.
In this talk, I will describe a Bayesian framework that we recently developed that integrates thousands of genomic datasets to predict tissue-specific relationships between genes in each of 144 specific human cell-types and tissues (available at giant.princeton.edu). I will show examples to illustrate how the resulting networks predict tissue-specific molecular response to perturbation, and the changing roles of multifunctional genes.
I will use autism spectrum disorder (ASD) to further elaborate on how tissue-networks are valuable in generating hypotheses about the molecular basis of human diseases. Using an evidence-weighted machine learning approach that utilizes the human brain-specific functional gene network, we have produced the first genome-wide prediction of autism-associated genes. We have further established how the large set of ASD genes, including a host of novel candidates, converges on a smaller number of key cellular pathways and specific early developmental stages of the brain (available at asd.princeton.edu).
Manifesting in early development and being five times more common among boys than among girls, ASD is one among several diseases whose incidence/risk varies dramatically across the human lifespan and between the sexes. I will conclude by broadly laying out my future goals in expanding our genomics toolkit to address how genes and their interactions shape health, disease, and therapy in an age-, sex-, and tissue-specific manner.
“The Spread of Health Behaviors in Social Networks”
Damon Centola, PhD
Annenberg School for Communication
University of Pennsylvania
Dissertation Defense "Sudden Cardiac Arrest: Novel Uses of Risk Standardization and Post-arrest Body Temperature to Improve Outcomes" Anne Grossestreuer Division of Epidemiology Graduate Group in Epidemiology and Biostatistics
Abstract: Sudden cardiac arrest is a leading cause of death and disability in the US, with over 500,000 events annually and <20% surviving to hospital discharge. Half of survivors suffer some degree of neurologic disability from massive ischemic injury and subsequent reperfusion processes. It therefore is vital to evaluate cardiac arrest at both population and clinical levels to improve outcomes. In response, this dissertation had three objectives. First, we examined whether hospital performance could be benchmarked using administrative data, which is more common than registry data. Two risk standardization models were developed using logistic regression involving 2453 patients treated from 2000-2015 at University of Pennsylvania Health System hospitals. Registry and administrative data were accessed for all patients and used to develop separate risk standardization models with survival to hospital discharge as the outcome and the registry model considered the “gold standard.” The administrative model had a receiver operating characteristic (ROC) area of 0.891 (95% CI: 0.876-0.905) compared to a registry area of 0.907 (95% CI: 0.895-0.919), indicating that risk standardization can be performed using administrative data. Second, serial temperatures were collected during the 72 hours following targeted temperature management (TTM) and rewarming on 465 TTM-treated patients from the Penn Alliance for Therapeutic Hypothermia (PATH) registry, of whom 179 (38.5%) had at least one pyrexic temperature (≥38°C). Higher maximum temperature was associated with worse neurologic outcome and lower survival in pyrexic patients. Pyrexia duration and outcomes were not related, unless duration was calculated as hours at or above 38.8°C; at those elevated temperatures, longer duration was associated with worse neurologic and survival outcomes.
Third, serial temperatures were collected during the 72 hours post-arrest on 578 PATH patients not treated with TTM; 228 (39.5%) had at least one pyrexic temperature. Worse neurologic and survival outcomes were associated with increasing maximum temperature, the combination of higher maximum temperatures and longer durations at an elevated temperature, and timing of onset of pyrexia between 10.2-24.5 hours post-arrest. This work establishes the potential for using administrative data to create new opportunities to compare hospital performance regarding cardiac arrest and extends knowledge on clinical implications of post-arrest temperature on outcomes.
Dissertation Advisor: Benjamin Abella, MD, MPhil Committee Chair: Douglas Wiebe, PhD Committee: David Gaieski, MD, Michael Donnino, MD
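For readers unfamiliar with the ROC area quoted in the abstract above, it can be read as the probability that a randomly chosen survivor receives a higher predicted score than a randomly chosen non-survivor. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the labels and scores are made up for illustration; they are not data or code from the dissertation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the event of interest (e.g., survival to discharge), else 0.
    scores: model-predicted scores; higher means the event is judged more likely.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four patients, two of whom survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of patients, which is acceptable for a small illustration; rank-based formulas give the same value more efficiently on large data sets.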
Dissertation Defense "Statistical Methods for Human Microbiome Studies" Pixu Shi, MS Division of Biostatistics Graduate Group in Epidemiology and Biostatistics
Abstract: In human microbiome studies, sequencing reads data are often summarized as counts of bacterial taxa at various taxonomic levels. In this thesis, we investigate the relation between these counts and other variables. We first consider regression analysis with bacterial counts normalized into compositional data as covariates. In order to satisfy the subcompositional coherence of the results, linear models with a set of linear constraints on the regression coefficients are introduced. A penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed. A method is also proposed to obtain de-biased estimates of the regression coefficients that are asymptotically unbiased and have a joint asymptotic multivariate normal distribution. This provides valid confidence intervals of the regression coefficients and can be used to obtain the p-values. Simulation shows the validity of the confidence intervals and smaller variances of the de-biased estimates when the linear constraints are imposed. The proposed methods are applied to a gut microbiome data set and identify four bacterial genera that are associated with the body mass index after adjusting for the total fat and caloric intakes.
We then consider the problem of testing difference between two repeated measurements of microbiome data from the same subjects. Multiple measurements of microbiome from the same subject are often obtained to assess the difference in microbial composition across body sites or time points. Existing models for such count data are limited in modeling the covariance structure of the counts and in handling paired multinomial data. A new probability distribution is proposed for paired multinomial count data, which allows flexible covariance structure of the counts and can be used to model repeated measured multivariate counts. Based on this new distribution, a test statistic is developed for testing the difference in compositions of paired multivariate count data. The proposed test can be applied to count data observed on a taxonomic tree in order to test difference in microbiome compositions and to identify subtrees with different subcompositions. Simulation shows that the proposed test has correct type 1 errors and increased power compared to some commonly used methods.
Dissertation Advisor: Hongzhe Li, PhD Committee Chair: Nandita Mitra, PhD Committee: Frederic D. Bushman, PhD, Tony Cai, PhD, James Lewis, MD, MSCE
Dissertation Defense "Doubly Robust and Machine Learning Approaches for Economic Evaluation Using Observational Data" Jiaqi Li, MS Division of Biostatistics Graduate Group in Epidemiology and Biostatistics
Abstract: Policy makers are often interested in the economic evaluation of health care interventions in their decision making. However, proper cost effectiveness (CE) analysis is complicated by the need to account for unique features of cost data including informative censoring and distributional heterogeneity. In addition, medical costs are often collected from observational claims data which are susceptible to confounding.
We propose a doubly robust (DR) method based on propensity scores for estimating CE. This approach accounts for informative censoring and allows for the incorporation of cost history via inverse probability weighting and partitioning. We then investigate an ensemble machine learning approach to choose among popular cost models to estimate outcome parameters in the DR approach and to choose among various parametric and non-parametric propensity score models. We analytically demonstrate that this approach is unbiased. Our simulation studies confirm that the proposed DR approach performs well even under misspecification of either the PS model or the outcome model. We apply this approach to a cost-effectiveness analysis of two competing lung cancer surveillance procedures, CT versus chest X-ray, using SEER-Medicare data. Lastly, we explore Big Data tools and other machine learning algorithms that can be used for cost prediction.
Biostatistics in Practice II/MS Thesis Presentation “Estimating the predictive value of continuous markers for censored survival data using a likelihood ratio approach” Andrew Smith Advisors: Wei-Ting Hwang, PhD, John Christodouleas, MD, MPH
Biostatistics in Practice II/MS Thesis Presentation “Two-stage gatekeeping procedures for use in multi-arm trials” Leah Suttner Division of Biostatistics Graduate Group in Epidemiology and Biostatistics Advisors: Andrea B. Troxel, ScD and David A. Asch, MD, MBA
Biostatistics in Practice II/MS Thesis Presentations
1:30 p.m. - 2:00 p.m. “Phenotype Validation in Electronic Medical Health Records-Based Genetic Association Studies” Lu Wang Division of Biostatistics Graduate Group in Epidemiology and Biostatistics Advisors: Jinbo Chen, PhD and Jason H. Moore, PhD
2:00 p.m. - 2:30 p.m. “A Comparison of Nonparametric Two Sample Tests for the Human Gut Microbiome” Lu Wang Division of Biostatistics Graduate Group in Epidemiology and Biostatistics Advisors: Hongzhe Li, PhD and Frederic D. Bushman, PhD
2:00 p.m. - 2:30 p.m. "ERRBS analysis from amniocytes exposed to diabetes in pregnancy" Yan Che Division of Biostatistics Graduate Group in Epidemiology and Biostatistics Advisors: Mingyao Li, PhD and Sara E. Pinney, MD, MS
Biostatistics in Practice II/MS Thesis Presentation “Categorical predictors based on optimal cut points in regression modeling: Evaluation of Lymph node count and survival of melanoma patients” Sidan He Advisors: Phyllis A. Gimotty, PhD, Giorgos C. Karakousis, MD, FACS