Survival outcome prediction of esophageal squamous cell carcinoma patients based on radiomics and mutation signature

Clinical characteristics of patients

A total of 205 patients (140 men, 65 women) with a mean age of 60.3 ± 7.7 years were included. Using stratified random sampling, the patients were divided at a ratio of 7:3 into a training cohort (109 men, 44 women) and a test cohort (31 men, 21 women). The clinical characteristics and statistics of the training and test cohorts are summarized in Table 1. There were no significant differences in age, gender, tumor location, drinking history, smoking history, genetic alterations, depth of invasion, TNM stage, or lymph node metastasis between the training and test cohorts (p > 0.05), which supports their use as training and test cohorts. In the Kaplan–Meier curves shown in Fig. 2, the patients were divided into three groups according to TNM stage, and the groups with a higher TNM stage had a worse prognosis; the log-rank test showed that survival differed significantly among the three groups (p < 0.05). When the patients were divided by lymph node metastasis into a metastasis group and a non-metastasis group, the metastasis group had a worse prognosis, and survival differed significantly between the two groups according to the log-rank test. Therefore, TNM stage and lymph node metastasis status are closely related to the prognosis of patients with ESCC.
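For readers unfamiliar with the survival curves referred to here, the following is a minimal sketch of the Kaplan–Meier product-limit estimator in plain Python. It is illustrative only and is not the software used in the study; the function name `kaplan_meier` and the toy follow-up data are assumptions introduced for the example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up durations (hypothetical units, e.g. months)
    events : 1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # deaths and censored patients both leave
    return curve


if __name__ == "__main__":
    # Toy data (not from the study): five patients, two of them censored
    toy_times = [5, 8, 8, 12, 15]
    toy_events = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(toy_times, toy_events):
        print(f"t={t}: S(t)={s:.3f}")
```

Computing one such curve per TNM stage group (or per metastasis group) gives the kind of comparison plotted in Fig. 2; the log-rank test then assesses whether the curves differ.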

Fig. 2

Kaplan–Meier curve for the TNM stage (a) and lymph node metastasis (b) groups

Selection of the radiomics features and construction of the Rad-score

A total of 842 features were extracted from the CT images. After screening by the Spearman correlation coefficient, 359 features were retained, with all pairwise correlations below 0.95 (Fig. S1a). Univariate statistical tests (p < 0.05) reduced these to 52 features. LASSO regression was then applied to the training cohort. As lambda increased, the number of features with nonzero coefficients gradually decreased. At lambda.min = 0.044, 17 features had nonzero coefficients (Fig. 3a and b). Finally, the 17 features were used to construct the multivariate Cox regression model, and stepwise regression was used for a second round of feature screening. The model ultimately included 8 features: first-order mean HHH, glcm idmn HHH, glcm cluster shade HLL, glcm correlation LHL, glrlm run entropy LLL, glszm large area high gray level emphasis LLL, glszm size zone nonuniformity normalized LLL and glszm gray level variance (Fig. S1b). The ability of the radiomics signature to discriminate survival status was assessed by ROC analysis in both cohorts (Fig. 3c). The AUC of the radiomics model in the training cohort was 0.834 (95% CI, 0.767–0.900), and that in the test cohort was 0.733 (95% CI, 0.574–0.892).
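The first screening step can be illustrated with a short sketch. The text states only that pairwise Spearman correlations were kept below 0.95, so the greedy keep-or-drop order used here is an assumption, as are the function names and the toy feature values.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_redundant(features, threshold=0.95):
    """Greedy filter (assumed order): keep a feature only if its |rho|
    with every feature kept so far is below the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy feature table (not from the study): "b" is monotone in "a"
    toy = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [3, 1, 4, 1, 5]}
    print(drop_redundant(toy))
```

With a greedy filter the surviving set depends on the order in which features are visited, which is one reason the exact procedure matters for reproducibility.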

Fig. 3

Construction of the radiomics model. a LASSO coefficient profiles of the 52 radiomics features. b Identification of the optimal penalization coefficient lambda (λ) in the LASSO model using tenfold cross-validation and the minimum criterion; a λ value of 0.044 was selected. c ROC curves used to assess the discriminative performance of the radiomics signature for survival status. The AUC in the training cohort was 0.834 (95% CI: 0.767–0.900); the AUC in the test cohort was 0.733 (95% CI: 0.574–0.892). Kaplan–Meier curves of risk grouping in the training cohort (d) and test cohort (e)

The patients were separated into high-risk and low-risk groups based on the median cutoff value of the Rad-score. Kaplan–Meier analysis revealed that the Rad-score was significantly associated with the prognosis of patients with ESCC, and the log-rank test showed that there were significant differences in survival between the high-risk group and the low-risk group (Fig. 3d and e). We found that the greater the Rad-score was, the worse the patient's prognosis.
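The risk grouping can be sketched as follows, assuming the Rad-score is the linear predictor of the Cox model, that is, the sum of each selected feature multiplied by its regression coefficient. The text does not give the formula or the coefficients, so the feature names, coefficient values, and patient IDs below are hypothetical.

```python
from statistics import median


def rad_score(feature_values, coefficients):
    """Linear predictor: sum of coefficient * feature value (assumed form)."""
    return sum(coefficients[name] * feature_values[name] for name in coefficients)


def split_by_median(scores):
    """Label each patient high or low risk relative to the median score."""
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return groups, cutoff


if __name__ == "__main__":
    # Hypothetical coefficients and patients (not from the study)
    coefficients = {"f1": 0.5, "f2": -1.0}
    patients = {
        "p1": {"f1": 2.0, "f2": 1.0},
        "p2": {"f1": 4.0, "f2": 0.0},
        "p3": {"f1": 0.0, "f2": 1.0},
        "p4": {"f1": 6.0, "f2": 1.0},
    }
    scores = {pid: rad_score(v, coefficients) for pid, v in patients.items()}
    groups, cutoff = split_by_median(scores)
    print(cutoff, groups)
```

In this sketch, patients scoring exactly at the median fall into the low-risk group; the study does not say how such ties were handled.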

We independently assessed the impact of each radiomics feature on the prognosis of patients with ESCC and found that all of the features had prognostic significance (Fig. 4). We found that the greater the GLCM cluster shade, GLSZM large area high gray level emphasis, and GLSZM gray level variance, the worse the patient's prognosis was, and the opposite was true for other radiomics features.

Fig. 4

Kaplan–Meier curve of the 8 radiomics features

Mutation signatures for prognosis

To better understand the contribution of these mutations to ESCC etiology, we investigated mutational signatures. Using a modified nonnegative matrix factorization (NMF) algorithm, we identified 7 mutational signatures (S1–S7) in the 205-WGS cohort (Fig. S2a and 5a). Except for S6 and S7, all signatures corresponded to mutational signatures in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Fig. 5b). S1 and S2 were related to APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) activity, S3 was related to spontaneous deamination of 5-methylcytosine, S4 was related to damage by reactive oxygen species, and S5 was related to aristolochic acid exposure (Table 2). We then quantified the relative contributions of the seven mutational signatures in each patient; all but S5 contributed strongly (Fig. S2b).
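Matching extracted signatures to COSMIC is done by cosine similarity (Fig. 5b). A minimal sketch of that comparison is given below. Real signature profiles have 96 trinucleotide channels; the 4-channel vectors and the reference names `REF_A` and `REF_B` are toy placeholders, not COSMIC entries.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero profile matches nothing
    return dot / (norm_u * norm_v)


def best_match(signature, references):
    """Return (name, similarity) of the closest reference profile."""
    return max(
        ((name, cosine_similarity(signature, profile))
         for name, profile in references.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Toy profiles (not real COSMIC data)
    references = {
        "REF_A": [0.5, 0.3, 0.1, 0.1],
        "REF_B": [0.1, 0.1, 0.3, 0.5],
    }
    extracted = [0.45, 0.35, 0.1, 0.1]
    print(best_match(extracted, references))
```

A signature whose best similarity stays low against every reference, as appears to be the case for S6 and S7, would be treated as having no curated counterpart; the similarity cutoff used in the study is not stated here.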

Fig. 5

Mutational processes in ESCC. a Seven mutational signatures detected in ESCC (S1–S7). b Cosine similarity between the 79 COSMIC signatures (horizontal axis) and the 7 signatures of the ESCC cohort. Kaplan–Meier curves of S3 (c) and S6 (d). e Volcano plot indicating mutational rate differences (x-axis) for each gene (represented as dots), and significance (y-axis, negative-log scale). f Mutation status of 8 differentially mutated genes

Table 2 Relationships between the identified signature and COSMIC curated signatures

We examined the association between the proportions of the different mutational signatures and OS. Kaplan–Meier analysis revealed that S3 and S6 were significantly associated with the prognosis of patients with ESCC (Fig. 5c and d, log-rank test, p < 0.05). The greater the proportion of S3 was, the worse the patient's prognosis was, and the opposite was true for S6.

We compared the genomic landscape between patients with high and low proportions of S6 and identified eight differentially mutated genes: TP53, MUC16, FAT1, LRP1B, SI, USH2A, DMD and MDN1 (Fig. 5e and f). We separately analyzed the proportion of S6 in patients with or without mutations in each of these eight genes (Fig. S3). Patients with MDN1 mutations had a greater proportion of S6, whereas the direction of the difference was reversed for the other genes, which was consistent with the results of the differential analysis.

Construction and evaluation of the nomogram

The RM nomogram constructed by combining the Rad-score and mutation signature (S3 and S6) is shown in Fig. 6a. The AUC of the RM nomogram model in the training cohort was 0.830 (Fig. 6b, 95% CI, 0.761–0.899), and that in the test cohort was 0.793 (Fig. 6b, 95% CI, 0.653–0.934).
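The AUC values reported for the nomograms can be read as the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes that quantity directly by pairwise comparison; it is an illustration on toy labels and scores, not the study's analysis code, and it ignores censoring and confidence intervals.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels : 1 for patients with the event, 0 otherwise
    scores : predicted risk; higher means more likely to have the event
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data (not from the study)
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(round(auc(toy_labels, toy_scores), 3))
```

With only 52 patients in the test cohort, relatively few pairs enter this calculation, which is consistent with the wider confidence intervals reported for the test cohort than for the training cohort.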

Fig. 6

a The RM nomogram combining the Rad-score and mutation signature (S3 and S6). b ROCs of the training cohort and test cohort. The AUC of the RM nomogram model in the training cohort was 0.830 (95% CI, 0.761–0.899), and that in the test cohort was 0.793 (95% CI, 0.653–0.934)

The RMC nomogram constructed by combining the Rad-score, mutation signature (S3 and S6), and clinical factors (TNM stage and lymph node metastasis) is shown in Fig. 7a. The calibration curve of the nomogram showed a good agreement between the prediction and observation results (Fig. 7b and c). The DCA results showed that the predictive model combining the Rad-score, mutation signature, and clinical factors had a greater net benefit than the single-factor predictive model (Fig. 7d). The AUC of the RMC nomogram model in the training cohort was 0.862 (Fig. 7e, 95% CI, 0.795–0.928), and that in the test cohort was 0.837 (Fig. 7e, 95% CI, 0.705–0.969).
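Decision curve analysis compares models by their net benefit at a chosen risk threshold. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The labels and predicted probabilities are toy values, and the function names are introduced only for this example.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probabilities)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probabilities)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy data (not from the study)
    toy_labels = [1, 1, 0, 0, 1, 0, 0, 0]
    toy_probs = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05]
    for pt in (0.2, 0.5, 0.8):
        print(pt, net_benefit(toy_labels, toy_probs, pt),
              net_benefit_treat_all(toy_labels, pt))
```

Plotting net benefit against the threshold for each model, together with the treat-all and treat-none (net benefit 0) references, yields a decision curve of the kind shown in Fig. 7d.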

Fig. 7

a The RMC nomogram combining the Rad-score, mutation signature (S3 and S6), and clinical factors (TNM stage and lymph node metastasis). b Calibration curve of the RMC nomogram for predicting 2-year OS. c Calibration curve of the RMC nomogram for predicting 3-year OS. d DCA of the RMC nomogram. The integrated nomogram model had better net benefits than did the traditional prediction model. e ROCs of the training cohort and test cohort. The AUC of the RMC nomogram model in the training cohort was 0.862 (95% CI, 0.795–0.928), and that in the test cohort was 0.837 (95% CI, 0.705–0.969)
