CHIMERA: Clustering of Heterogeneous Disease Effects via Distribution Matching of Imaging Patterns

Many brain disorders and diseases exhibit heterogeneous symptoms and imaging characteristics, as shown in figure (A). This heterogeneity is typically not captured by commonly adopted neuroimaging analyses, which seek only a single main imaging pattern that differentiates two groups (e.g., patients and controls, or clinical progressors and non-progressors). On the other hand, standard data-driven clustering methods may group patients according to the largest sources of data variability, which may not be induced by the disease. The proposed probabilistic clustering approach, CHIMERA, illustrated in figure (B), models the pathological process as a combination of multiple regularized transformations from the normal control population to the patient population, while controlling for similarity in covariates (e.g., age, gender, height). It therefore seeks to identify multiple imaging patterns that relate to disease effects and to better characterize disease heterogeneity.

Disease subtyping has become increasingly important, as many diseases and disorders previously classified under a single umbrella are highly heterogeneous, both biologically and clinically. HYDRA [7] and CHIMERA [8] use discriminative and generative machine learning methods, respectively, to determine multiple patterns by which healthy individuals differ from patients, thereby determining imaging signatures of different disease subtypes. HYDRA combines semi-supervised clustering with SVM-like classification, based on a convex polytope that separates a control population from a heterogeneous diseased population:


A comparison group (gray), such as a group of healthy individuals, is separated from a heterogeneous target group, e.g., a patient population (red), via a convex polytope. Three disease subtypes are identified in this example.
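The polytope idea can be illustrated with a minimal sketch. This is not HYDRA's implementation: the hyperplanes below are hand-picked rather than learned, and the function names, the sign convention (controls score negative on every face), and the rule of assigning a patient to the face with the largest score are assumptions made for illustration only.

```python
import numpy as np

def polytope_scores(X, W, b):
    """Signed score of each sample against each face, shape (n_samples, n_faces)."""
    return X @ W.T + b

def classify_and_subtype(X, W, b):
    """Label samples outside the polytope as patients and pick a subtype per patient.

    Assumed convention: a sample is inside the polytope (control-like) when its
    score is <= 0 on every face, and outside (patient-like) otherwise.
    """
    scores = polytope_scores(X, W, b)
    is_patient = (scores > 0).any(axis=1)          # violates at least one face
    subtype = scores.argmax(axis=1)                # face with the largest score
    subtype = np.where(is_patient, subtype, -1)    # -1 marks control-like samples
    return is_patient, subtype

# Hypothetical 2-D polytope: a triangle with three outward-pointing face normals.
W = np.array([[1.0, 0.0],
              [-0.5, np.sqrt(3) / 2],
              [-0.5, -np.sqrt(3) / 2]])
b = np.array([-1.0, -1.0, -1.0])

# One control-like sample at the origin and one sample beyond each face.
X = np.array([[0.0, 0.0], [2.0, 0.0], [-1.0, 2.0], [-1.0, -2.0]])

is_patient, subtype = classify_and_subtype(X, W, b)
print(is_patient.tolist())  # [False, True, True, True]
print(subtype.tolist())     # [-1, 0, 1, 2]
```

Each face of the polytope plays the role of one linear classifier, so the number of faces corresponds to the number of candidate subtypes in this toy setting.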

CHIMERA uses a generative method that assumes that statistical distributions of patients are derived from those of controls under a number of transformations, which capture the (heterogeneous) pathologic processes:


A control group (e.g., healthy individuals; blue) is mapped to a heterogeneous target population (e.g., patients; yellow) via a number of transformations Ti, constrained by underlying covariates such as age, sex, and scanner. The probability density function of the yellow group is then generated from the probability density function of the blue control group via these transformations. These transformations identify the pathological processes of separate disease subtypes.
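The generative view can be sketched in a few lines. This is a simplified illustration, not CHIMERA's actual algorithm: it assumes the transformations Ti are affine maps that are already known, models the transformed control density with an isotropic Gaussian kernel of hypothetical width `sigma`, and omits the covariate constraints entirely. All function and variable names are illustrative.

```python
import numpy as np

def transform_controls(X_ctrl, A, t):
    """Apply the affine map T(x) = A x + t to every control sample."""
    return X_ctrl @ A.T + t

def subtype_responsibilities(Y, X_ctrl, transforms, sigma=1.0):
    """Soft-assign each patient in Y to each transformation.

    The density generated by transformation k is approximated by a Gaussian
    kernel estimate centred on the transformed controls (an assumption).
    Returns an array of shape (n_patients, n_transforms) whose rows sum to 1.
    """
    lik = np.zeros((Y.shape[0], len(transforms)))
    for k, (A, t) in enumerate(transforms):
        Z = transform_controls(X_ctrl, A, t)                    # (n_ctrl, d)
        d2 = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)  # squared distances
        lik[:, k] = np.exp(-d2 / (2.0 * sigma**2)).mean(axis=1)
    total = np.maximum(lik.sum(axis=1, keepdims=True), np.finfo(float).tiny)
    return lik / total

# Hypothetical data: controls clustered around the origin in 2-D.
rng = np.random.default_rng(0)
X_ctrl = rng.normal(scale=0.5, size=(50, 2))

# Two hypothetical disease effects, each a pure shift of the control pattern.
transforms = [(np.eye(2), np.array([3.0, 0.0])),
              (np.eye(2), np.array([0.0, 3.0]))]

# Two patients, one near each transformed control cloud.
Y = np.array([[3.0, 0.0], [0.0, 3.0]])

resp = subtype_responsibilities(Y, X_ctrl, transforms)
print(resp.argmax(axis=1).tolist())  # [0, 1]
```

In a full method the transformations would be estimated from the data, for example by alternating between soft assignments like these and updates of each transformation, with covariate similarity entering the matching between patients and controls.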
