Sandra Griffith
PhD Candidate
Division of Biostatistics
Department of Biostatistics and Epidemiology
Dissertation Advisor: Daniel F. Heitjan, PhD
Committee Chair: Jason A. Roy, PhD
Committee Members: Kevin G. Lynch, PhD and Robert A. Schnoll, PhD
Abstract: Measures of daily cigarette consumption, like many self-reported numerical data, exhibit a form of measurement error termed heaping. Heaping occurs when quantities are reported with varying levels of precision, often as round numbers. Because heaping can introduce substantial bias into estimates, conclusions drawn from heaped data are suspect. Because more precise measurements are seldom available, methods for estimating the true underlying distribution from heaped data depend on unverifiable assumptions about the heaping mechanism. A doubly coded dataset, containing both a conventional retrospective recall measurement (timeline followback) and an instantaneous measurement not subject to heaping (ecological momentary assessment), motivates this dissertation and allows us to model the heaping mechanism.
We take three approaches to this problem. First, we develop a nonparametric method that estimates heaping probabilities directly where possible and calculates the others by smoothing, interpolation, and subtraction. Next, we use the motivating data as a calibration dataset, allowing us to create a predictive model for imputation. We apply this model to multiply impute precise cigarette counts for data from a randomized, placebo-controlled trial of bupropion in which only heaped cigarette counts are available. Finally, we build on findings from the first two approaches to develop a more flexible model that forgoes the restrictive rounding framework of previous models. Rather than assuming that subjects round off when providing self-reported counts, we posit that numbers possess an intrinsic gravity that tends to attract subjects to characteristically round numerals. We outline procedures for parameterizing and estimating such a model and apply it to the motivating data. Our findings suggest that the self-reporting process is more complex than the mechanism assumed in conventional rounding-based models. Although we apply these models exclusively to smoking cessation data, they are widely applicable to many types of self-reported count data.
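
To make the heaping phenomenon concrete, the following is a minimal simulation sketch of a conventional rounding-based reporting mechanism, the kind of model the abstract describes as restrictive. It is not the dissertation's method: the function names (`nearest_multiple`, `heaped_report`, `simulate`), the rounding probabilities, and the distribution of true counts are all hypothetical choices made purely for illustration.

```python
import random


def nearest_multiple(count, base):
    """Round a non-negative integer count to the nearest multiple of base (ties go up)."""
    return base * ((count + base // 2) // base)


def heaped_report(true_count, rng, p_round10=0.3, p_round5=0.3):
    """Return a self-reported count under a simple, hypothetical rounding mechanism.

    With probability p_round10 the subject rounds to the nearest 10, with
    probability p_round5 to the nearest 5, and otherwise reports exactly.
    These probabilities are illustrative assumptions, not estimates.
    """
    u = rng.random()
    if u < p_round10:
        return nearest_multiple(true_count, 10)
    if u < p_round10 + p_round5:
        return nearest_multiple(true_count, 5)
    return true_count


def simulate(n, seed=0):
    """Simulate n (true, reported) pairs of daily cigarette counts."""
    rng = random.Random(seed)
    # Hypothetical distribution of true counts: roughly normal, truncated at zero.
    true_counts = [max(0, round(rng.gauss(18, 6))) for _ in range(n)]
    reported = [heaped_report(c, rng) for c in true_counts]
    return true_counts, reported


def share_multiples_of_5(counts):
    """Fraction of counts that fall exactly on a multiple of 5."""
    return sum(c % 5 == 0 for c in counts) / len(counts)


if __name__ == "__main__":
    true_counts, reported = simulate(5000, seed=1)
    print("share of multiples of 5, true counts:    ", share_multiples_of_5(true_counts))
    print("share of multiples of 5, reported counts:", share_multiples_of_5(reported))
```

In this sketch the reported counts pile up on multiples of 5 and 10 even though the true counts do not, which is the signature of heaping. A doubly coded dataset of the kind described above would supply both columns directly, so the mechanism could be studied rather than assumed.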