A deep learning model to enhance the classification of primary bone tumors based on incomplete multimodal images in X-ray, CT, and MRI

In this study, we developed the PBTC-TransNet fusion model, which leverages incomplete multimodal images from X-ray, CT, and MRI, along with clinical characteristics, to accurately classify PBTs as benign, intermediate, or malignant. The model demonstrated good performance, achieving a micro-average AUC of 0.847 on the internal test set and a micro-average AUC of 0.782 on the external test set.
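For readers unfamiliar with the metric, a micro-average AUC for a three-class problem can be obtained by flattening the one-vs-rest labels and predicted probabilities across all classes and computing a single binary AUC. The following is a minimal sketch of that calculation, not the evaluation code used in this study; the function names `binary_auc` and `micro_average_auc` and the class coding (0 = benign, 1 = intermediate, 2 = malignant) are illustrative assumptions.

```python
def binary_auc(labels, scores):
    """AUC from the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_average_auc(true_classes, class_probs, n_classes=3):
    """Flatten one-vs-rest labels and scores, then compute one binary AUC.

    Assumed class coding (illustrative): 0 benign, 1 intermediate, 2 malignant.
    """
    flat_labels, flat_scores = [], []
    for y, probs in zip(true_classes, class_probs):
        for c in range(n_classes):
            flat_labels.append(1 if y == c else 0)
            flat_scores.append(probs[c])
    return binary_auc(flat_labels, flat_scores)
```

Because every patient contributes one label-score pair per class, the micro-average weights classes by their frequency, which is worth keeping in mind when class sizes are unbalanced.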

Accurate classification of PBTs is essential to ensure their effective treatment and management [3]. Previous studies have built single-modal models based on EfficientNet or Mask-RCNN-X101 to classify PBTs, achieving an AUC of 0.79 and an accuracy of 80.2% on the external test set, respectively [11, 12]. Our team proposed an ensemble multi-task framework that simultaneously detects, segments, and classifies PBTs and bone infections and subclassifies the benign, intermediate, and malignant PBTs on MRI [13]. While these approaches represent significant advancements, they are primarily limited to single-modal data, and their clinical applicability diminishes when faced with the common issue of incomplete multimodal images in real-world settings [11,12,13, 25]. In contrast, our current study addresses this critical gap by developing the PBTC-TransNet fusion model, which is specifically designed to handle incomplete multimodal images from X-ray, CT, and MRI, alongside clinical characteristics. This design enables our model to maintain robust performance even when certain imaging modalities are unavailable, thereby enhancing its clinical relevance and applicability in diverse healthcare environments. We selected EfficientNet, which is recognized as a state-of-the-art (SOTA) model for classification tasks, as the baseline model for quantitative comparison. This choice was made due to EfficientNet’s proven effectiveness and efficiency in medical image classification [12, 26, 27]. By comparing our PBTC-TransNet fusion model with this baseline, we demonstrated that our approach not only matches or exceeds the performance of these widely used models but also effectively handles cases where certain imaging modalities are missing. Machine learning techniques to discriminate bone lesions have achieved relatively good performance across various imaging modalities, with reported AUC values ranging from 0.73 to 0.96 in several cohort studies [28]. 
However, most studies that have developed machine learning classification models for bone tumors are preliminary and limited by small sample sizes and retrospective analyses [28]. To our knowledge, there are currently no studies specifically focused on bone tumor classification models that utilize multimodal fusion methods. Nevertheless, some studies highlight the promise of multimodal techniques, which can integrate data from different imaging modalities along with clinical information, offering significant potential for improving diagnostic accuracy and comprehensiveness [29, 30]. In this study, we developed the PBTC-TransNet fusion model to fully utilize patients’ incomplete multimodal images and clinical characteristics; we expect this design to make the model applicable to a broader PBT population and better suited to real clinical scenarios.

Strategies for handling samples with missing modalities include direct discarding, data imputation techniques, and separate model training based on the available data for each modality [15, 31, 32]. Nevertheless, these strategies have limitations, such as ignoring valuable information, introducing unnecessary noise, or failing to exploit the correlations across multiple modalities, which can compromise classification performance [15, 16]. To effectively integrate information from different image modalities, we used Transformer networks and Bernoulli indicators in our study. A Bernoulli indicator was assigned to each modality to simulate real-world scenarios in which multimodal images might be incomplete [33]. This allowed our model to adapt to situations where certain imaging modalities are unavailable. The Transformer used the attention mechanism to establish long-range dependencies both within and across imaging modalities, facilitating the efficient fusion of information from multiple modalities [17]. Moreover, to address missing clinical characteristics, we implemented an iterative imputation strategy and integrated the imputed characteristics into the PBTC-TransNet fusion model to simulate the actual diagnostic process of radiologists. These design strategies enhanced the model’s ability to account for incomplete clinical information, thereby improving its overall performance and clinical relevance. By effectively addressing missing clinical characteristics and mimicking real-world diagnostic scenarios, our model demonstrates promising potential for widespread adoption and generalizability in diverse clinical settings. The SHAP analysis highlights that age, pain, and overall location are crucial for distinguishing benign, intermediate, and malignant PBTs.
These insights from the SHAP analysis suggest that focusing on these clinical characteristics in practice could enhance diagnostic accuracy and improve patient outcomes. Integrating such detailed impact analyses into predictive models can provide more robust and clinically relevant tools for radiologists.
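The idea of per-modality Bernoulli indicators can be illustrated with a small sketch. It assumes, purely for illustration, that each available modality is kept with probability `keep_prob` during training, that at least one modality is always retained, and that dropped modalities are excluded from a softmax over modality scores; this softmax is a simplified stand-in for attention masking in a Transformer. The names `sample_modality_indicators`, `masked_attention_weights`, and `keep_prob` are hypothetical and do not describe the actual PBTC-TransNet implementation.

```python
import math
import random

MODALITIES = ("xray", "ct", "mri")


def sample_modality_indicators(available, keep_prob=0.7, rng=random):
    """Draw one Bernoulli indicator per modality (1 = use, 0 = treat as missing).

    Modalities the patient never had stay at 0. keep_prob is an assumed value.
    """
    present = [m for m in MODALITIES if available.get(m, False)]
    if not present:
        raise ValueError("at least one modality must be available")
    indicators = {m: 0 for m in MODALITIES}
    for m in present:
        indicators[m] = 1 if rng.random() < keep_prob else 0
    # Assumption: never drop everything, so the model always sees some image.
    if sum(indicators.values()) == 0:
        indicators[rng.choice(present)] = 1
    return indicators


def masked_attention_weights(scores, indicators):
    """Softmax over modality scores, with masked modalities given weight 0."""
    kept = [m for m in MODALITIES if indicators[m] == 1]
    peak = max(scores[m] for m in kept)  # subtract the max for stability
    exps = {m: math.exp(scores[m] - peak) for m in kept}
    total = sum(exps.values())
    return {m: (exps[m] / total if m in exps else 0.0) for m in MODALITIES}
```

In this sketch, a patient with only X-ray and MRI would always have a CT indicator of 0, so the CT score never influences the fused weights regardless of its value.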

Accurate diagnosis and timely treatment are crucial for patients with malignant PBTs to prevent disease progression and potentially life-threatening complications [1]. Our PBTC-TransNet fusion model demonstrated strong performance in identifying malignant PBTs, achieving high accuracies of 82.6% on the internal test set and 79.0% on the external test set. Visualization analysis revealed that our model effectively recognized characteristic imaging manifestations of malignant PBTs, such as tumor bone, soft‑tissue mass, and invasive periosteal reaction (Fig. 5c and f). Moreover, we observed that many patients with malignant PBTs had complete multimodal images in the internal (112 of 263 patients) and external (21 of 62 patients) test sets, which provided valuable information and clues for classification. Unfortunately, our model occasionally failed to diagnose tumors with atypical imaging appearances, such as those lacking evident bone destruction (Fig. 6d). Future research efforts should focus on enhancing the model’s ability to recognize such atypical imaging cases, thereby further improving classification accuracy and clinical utility.

Accurate and timely diagnosis of benign PBTs based on medical images is important for avoiding unnecessary, expensive, and invasive examinations [34]. The PBTC-TransNet fusion model displayed satisfactory classification performance with an AUC of 0.827 for benign PBTs on the internal test set. Many benign PBTs with typical imaging manifestations are easily identified (Fig. 5b). For example, among the 195 osteochondromas included in our study, which typically present as bony protuberances with well-defined boundaries extending into soft tissue, the fusion model correctly identified 180 cases [1]. This highlights the model's ability to accurately recognize benign PBTs with typical imaging manifestations, thereby facilitating their timely diagnosis and appropriate management.

Classifying intermediate PBTs before surgery poses a significant challenge due to their potential to present both benign and malignant imaging features [1]. Previous studies have attempted binary classification models based on radiographs and MRI for differentiating benign and malignant bone lesions [11, 12], but these models are not sufficient for intermediate PBTs, as they cannot be simply categorized as benign PBTs and often demand subsequent interventions beyond those typically prescribed for benign lesions [1, 35]. Another study proposed a triple classification model that incorporated patient clinical characteristics and radiographs, achieving high accuracy (85.1%) in classifying intermediate PBTs [36]. In comparison, our PBTC-TransNet fusion model, which is based on incomplete multimodal imaging, showed comparable performance, with an accuracy of 83.3% on the internal test set. Remarkably, even when applied to the external test set, which mainly included patients who underwent only X-ray imaging (40.4%), limiting the available information for classification (Table S4), the PBTC-TransNet fusion model achieved an accuracy of 78.2%. This result underscores the robust generalization capabilities of the PBTC-TransNet fusion model across different datasets and imaging modalities.

A model that robustly deals with incomplete data subsets from various modalities demonstrates strong applicability in real-world clinical settings [17]. The model exhibited the best classification results (micro-average AUC: 0.909) within the patient subgroup solely reliant on X-rays on the internal test set, despite the inherent limitations of X-rays, such as superimposition and limited soft-tissue resolution [1]. This success was largely attributable to the high proportion of these patients (90 of 210) with osteochondroma, which typically has readily identifiable imaging manifestations. However, heterogeneous and rare PBTs are difficult to classify and often require further MRI examination in clinical practice [37]. For patient subgroups that underwent MRI examinations (MRI, X + MRI, CT + MRI, and X + CT + MRI) on the internal test set, the PBTC-TransNet fusion model achieved slightly lower classification performance. We attribute this to the larger number of patients in these subgroups with difficult-to-classify rare PBTs, such as lymphoma and haemangioma (Table S4) [38, 39]. On the external test set, the stratified analysis results were less conclusive, with wide 95% CIs for several patient subgroups (CT, X + MRI, and CT + MRI), primarily due to the limited sample size (Table 2). In the future, prospective multicenter studies with larger datasets are needed to validate the model’s classification performance in real-world clinical practice settings. Other imaging techniques can also provide additional information for the classification of bone tumors [7]. For example, diffusion-weighted imaging has been proven to provide valuable information for characterizing benign and malignant musculoskeletal tumors [40]. Including this imaging sequence in future studies may further improve the accuracy of tumor classification.
The stratified analysis revealed that the PBTC-TransNet fusion model performed optimally in the 11–19 years age group and among female patients, while performance was comparatively lower in older age groups and among male patients. These findings underscore the importance of considering demographic factors in model development and highlight areas for future optimization.
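The link between small subgroup size and wide 95% CIs can be illustrated with a percentile bootstrap. The sketch below is an assumption-laden example rather than the procedure used in this study: the method by which the reported CIs were computed is not restated here, accuracy is used instead of AUC for simplicity, and the function name `bootstrap_ci` and its parameters are hypothetical.

```python
import random


def bootstrap_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy from a list of 0/1 outcomes.

    `correct` holds 1 for a correctly classified patient and 0 otherwise.
    """
    rng = random.Random(seed)
    n = len(correct)
    stats = []
    for _ in range(n_resamples):
        # Resample patients with replacement and recompute accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

With the same underlying accuracy, a subgroup of 10 patients yields a much wider interval than a subgroup of several hundred, which is the pattern expected for sparsely populated modality combinations.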

Our study had several limitations. First, there may be potential selection bias because we only retrospectively studied histopathologically confirmed cases of PBTs, excluding clinically diagnosed cases. Second, despite the proven value of dynamic contrast-enhanced images in PBT diagnosis, we did not incorporate them into our model development. This decision was influenced by factors such as the variability in patient compliance with contrast-enhanced imaging, as well as concerns regarding risks associated with gadolinium deposition and patient anxiety [41,42,43]. Third, we did not perform visualization for our models, which might limit their clinical application due to the inherent “black-box” nature of DL techniques. Future work will focus on implementing visualization methods, such as Grad-CAM, to interpret and gain deeper insights into the predictions made by our model. In addition, we have yet to quantify the impact of different modality combinations on the model’s performance. Subsequent research will aim to assess the clinical benefits of our model across different modality combinations. Fourth, the external test set was relatively small, with only 262 patients, which may limit the robustness of the validation. While the model's consistent performance across datasets provides some reassurance, we plan to include larger and more diverse external datasets in future studies to further validate the model's generalizability. Finally, due to the retrospective nature and data limitations of our study, we were unable to include patient outcomes and follow-up data. Future prospective studies will address this by collecting detailed patient outcomes to better assess the long-term impact and clinical utility of the PBTC-TransNet model.
