Unsupervised machine learning of radiomic features for predicting treatment response and overall survival of early stage non-small cell lung cancer patients treated with stereotactic body radiation therapy

Feature selection and dimensionality reduction are important techniques for alleviating the curse of dimensionality (small sample size combined with large feature dimensionality) and improving prediction performance in pattern recognition studies. Most feature selection techniques are designed in a supervised setting: they identify discriminative features by optimizing the performance of prediction models on validation datasets and are therefore prone to overfitting the training data in small sample size studies. Feature dimensionality reduction techniques such as principal component analysis, on the other hand, learn a new representation that characterizes the original features in a lower-dimensional feature space in an unsupervised setting. However, the low-dimensional representation is not necessarily informative for building prediction models, because no outcome-relevant guidance is used in either the feature extraction or the dimensionality reduction step. To narrow the gap between supervised feature selection and unsupervised dimensionality reduction, we introduce unsupervised two-way clustering analysis methods that reduce feature dimensionality and learn meta-features by simultaneously identifying sub-clusters of samples and sub-clusters of features. In particular, the sub-clusters of features capture covariation among the high-dimensional features to generate a low-dimensional representation, while the sub-clusters of samples characterize samples with different feature patterns and in turn serve as weak supervision. This weak supervision can lead to a more informative dimensionality reduction, one that captures the differences in feature patterns between sub-clusters of samples. These methods have been successfully used in radiomics studies [1, 2] and are generally applicable to other imaging studies for identifying heterogeneous subgroups and simultaneously improving performance in predictive modeling [9, 10].
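To make the idea of two-way clustering concrete, the following is a minimal sketch, not the algorithm used in the cited studies. It assumes a simple alternating scheme: features are clustered, each sample is summarized by its mean value within every feature cluster (the meta-features), samples are clustered on those meta-features, and the feature clusters are then refined using the sample clusters. The function names (`kmeans`, `block_means`, `two_way_cluster`), the use of k-means, and the toy matrix `X` are all illustrative assumptions.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def block_means(rows, col_labels, k):
    """For every row, average its entries within each column cluster."""
    out = []
    for row in rows:
        means = []
        for j in range(k):
            vals = [v for v, lab in zip(row, col_labels) if lab == j]
            means.append(sum(vals) / len(vals) if vals else 0.0)
        out.append(means)
    return out


def two_way_cluster(X, n_sample_clusters, n_feature_clusters, n_rounds=5):
    """Alternate between clustering samples and clustering features."""
    Xt = [list(col) for col in zip(*X)]  # transpose: features as rows
    feature_labels = kmeans(Xt, n_feature_clusters)
    sample_labels = None
    for _ in range(n_rounds):
        # Samples are clustered on their per-feature-cluster means.
        meta = block_means(X, feature_labels, n_feature_clusters)
        sample_labels = kmeans(meta, n_sample_clusters)
        # Features are re-clustered on their per-sample-cluster means,
        # so the sample grouping acts as weak supervision.
        profiles = block_means(Xt, sample_labels, n_sample_clusters)
        feature_labels = kmeans(profiles, n_feature_clusters)
    meta = block_means(X, feature_labels, n_feature_clusters)
    return sample_labels, feature_labels, meta


# Toy data (hypothetical): 6 samples x 6 features with two planted blocks.
X = [
    [5.0, 5.1, 4.9, 0.1, 0.0, 0.2],
    [5.2, 4.8, 5.0, 0.0, 0.1, 0.1],
    [4.9, 5.0, 5.1, 0.2, 0.1, 0.0],
    [0.1, 0.0, 0.2, 5.0, 5.1, 4.9],
    [0.0, 0.2, 0.1, 5.1, 4.9, 5.0],
    [0.2, 0.1, 0.0, 4.9, 5.0, 5.2],
]
sample_labels, feature_labels, meta = two_way_cluster(X, 2, 2)
```

In this sketch the returned `meta` matrix has one column per feature cluster and would play the role of the low-dimensional representation passed to a downstream prediction model; a real radiomics pipeline would typically standardize features first and choose the number of clusters from the data.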

Prediction modeling with unsupervised learning of radiomic features, including radiomic feature extraction, unsupervised two-way clustering for meta-feature extraction, and clinical outcome analyses of patients at both group and individual subject levels.