"Causal and Statistical Inference with Social Network Data: Massive Challenges and Meager Progress”
Elizabeth Ogburn, PhD, Assistant Professor, Department of Biostatistics, Johns Hopkins University
Abstract: Interest in and availability of social network data have led to increasing attempts to make causal and statistical inferences using data collected from subjects linked by social network ties. But inference about all kinds of estimands, from simple sample means to complicated causal peer effects, is challenging when only a single network of non-independent observations is available. There is a dearth of principled methods for dealing with the dependence that such observations can manifest. We demonstrate the dangerously anticonservative inference that can result from a failure to account for network dependence, explain why results on spatial and temporal dependence are not immediately applicable to this new setting, and describe a few different avenues toward valid statistical and causal inference using social network data.
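As a toy illustration of the phenomenon the abstract warns about (this sketch is not material from the talk), the simulation below generates outcomes that are correlated along the ties of a simple ring network and compares the usual i.i.d. standard error of the sample mean with the mean's true sampling variability; the naive standard error understates the truth, so nominal confidence intervals are anticonservative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_network_sample(n=100, rho=0.4, reps=2000):
    """Outcomes borrow from their two ring-network neighbours, inducing
    positive dependence along network ties. Compare the naive i.i.d.
    standard error of the sample mean with its true sampling spread."""
    means, naive_ses = [], []
    for _ in range(reps):
        eps = rng.normal(size=n)
        # each node's outcome is its own noise plus a share of each neighbour's
        y = eps + rho * (np.roll(eps, 1) + np.roll(eps, -1))
        means.append(y.mean())
        naive_ses.append(y.std(ddof=1) / np.sqrt(n))
    return np.std(means), np.mean(naive_ses)

true_se, naive_se = simulate_network_sample()
# true_se exceeds naive_se: intervals built from the naive standard
# error are too short, i.e. anticonservative
```

Here the true variance of the mean picks up the positive covariances between tied observations, which the i.i.d. formula ignores entirely.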
"Scalable Bayesian Variable Selection for Structured High-Dimensional Data via Adaptive Shrinkage Priors”
Qi Long, PhD, Associate Professor, Department of Biostatistics and Bioinformatics, Emory University
Abstract: Variable selection for structured covariates lying on an underlying known graph has received considerable attention, with the primary focus being on discrete mixture approaches. However, most of the existing approaches are not scalable to high-dimensional settings, for example, in genomic studies involving tens of thousands of genes lying on known pathways. We propose a Bayesian shrinkage approach which incorporates prior information through shrinkage parameters, with the coefficients for two connected variables in the graph being encouraged to have a similar degree of shrinkage. We fit our model via a computationally efficient EM algorithm which is scalable to high-dimensional settings (p ~ 100,000). We establish theoretical properties for fixed as well as increasing dimensions, even when the number of variables increases faster than the sample size. We demonstrate the advantages of our approach via a simulation study, and apply the method to a real data example. This is joint work with Changgee Chang and Suprateek Kundu.
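To give a flavor of graph-guided shrinkage (a hypothetical stand-in, not the authors' EM algorithm or priors), the sketch below runs a coordinate-descent lasso-type fit in which each coefficient carries its own shrinkage level, updated adaptively from the current estimate and then smoothed toward the average level of its neighbours in a known graph, so that connected variables are shrunk by a similar amount.

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def graph_guided_lasso(X, y, adj, lam0=0.1, smooth=0.5, n_iter=100):
    """Toy sketch: per-coefficient shrinkage levels lam[j] are set
    adaptively (smaller coefficients are shrunk harder) and then
    averaged with the levels of graph neighbours, mimicking the idea
    that connected variables share a similar degree of shrinkage."""
    n, p = X.shape
    beta = np.zeros(p)
    lam = np.full(p, lam0)
    deg = np.maximum(adj.sum(axis=1), 1)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual
            z = X[:, j] @ r / n
            beta[j] = soft_threshold(z, lam[j]) / (X[:, j] @ X[:, j] / n)
        lam_new = lam0 / (np.abs(beta) + 0.1)             # adaptive levels
        lam = (1 - smooth) * lam_new + smooth * (adj @ lam_new) / deg
    return beta

# usage: chain graph over p variables, first block truly active
n, p = 200, 20
X = rng.normal(size=(n, p))
beta_true = np.r_[np.full(5, 2.0), np.zeros(p - 5)]
y = X @ beta_true + rng.normal(size=n)
adj = np.zeros((p, p))
for j in range(p - 1):
    adj[j, j + 1] = adj[j + 1, j] = 1.0
beta_hat = graph_guided_lasso(X, y, adj)
```

The graph-smoothing step is what distinguishes this from a plain adaptive lasso: a noise variable adjacent to strong signals is shrunk a bit less, and a signal variable adjacent to noise a bit more, reflecting the borrowing of shrinkage information across edges.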
"Young Driver Safety: Graduated Driver Licensing, Distracted Driving and Beyond"
Motao Zhu, MD, PhD, Department of Epidemiology
School of Public Health, West Virginia University
"New Methods for Analyzing Data from 16s rRNA Microbiome Studies”
Glen Satten, PhD
Centers for Disease Control and Prevention
"Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity"
Veronika Ročková, PhD, Postdoctoral Research Associate, Department of Statistics, The Wharton School
Abstract: Rotational post-hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods also serve to achieve this goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity-inducing priors. These automatic rotations to sparsity are embedded within a PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for posterior mode detection. By iterating between soft-thresholding of small factor loadings and transformations of the factor basis, we obtain (a) dramatic accelerations, (b) robustness against poor initializations and (c) better oriented sparse solutions. To avoid the pre-specification of the factor cardinality, we extend the loading matrix to have infinitely many columns with the Indian Buffet Process (IBP) prior. The factor dimensionality is learned from the posterior, which is shown to concentrate on sparse matrices. Our deployment of PXL-EM performs a dynamic posterior exploration, outputting a solution path indexed by a sequence of spike-and-slab priors. For accurate recovery of the factor loadings, we deploy the Spike-and-Slab LASSO prior, a two-component refinement of the Laplace prior (Rockova 2015). A companion criterion, motivated as an integral lower bound, is provided to effectively select the best recovery. The potential of the proposed procedure is demonstrated on both simulated and real high-dimensional gene expression data of a size that would render posterior simulation impractical.