We performed unsupervised cluster analysis using the TCGA-BRCA dataset to classify BRCA patients into two distinct subtypes based on Detoxification enzymes related-genes expression (Fig. 2A). Using PCA to visualize the data and it showed significant differences between the two distinct subtypes (Fig. 2B). We also performed a K-M survival analysis and found that BRCA patients in Cluster 1 had a higher survival rate than those in Cluster 2 (Fig. 2C). In addition, in order to explore the influence of detoxification enzymes on the mutational spectrum of breast cancer, we conducted mutation analysis for two subtypes. The mutation waterfall map showed that PIK3 CA, TP53, TTN and MUC16 were the most common mutated genes (mutation rate > 10%) in the two clusters (Fig. 2D,E), and the mutation rates of these genes were not significantly different in the two different subtypes. We then performed subtype differential analysis and found 25 differentially expressed genes in the two clusters (Fig. 2F). To further explore the biological functions of these differentially expressed genes (DEGs), we used KEGG, GO, and DO enrichment analyses (Fig. 2G-I). In the biological process, DEGs is mainly involved in the regulation of inflammatory response, leukocyte proliferation and immune response-activation signaling pathway. In terms of cell composition, DEGs was mainly concentrated in the outer part of the plasma membrane. In terms of molecular function, DEGs mainly affects peptidase regulator activity, heparin binding and sulfide binding (Fig. 2G). In terms of action pathway, these genes were mainly concentrated in the cell adhesion molecules and NF-κB signaling pathway (Fig. 2H). In addition, DO enrichment analysis showed that these genes were mainly enriched in primary immunodeficiency diseases (Fig. 2I).
Fig. 2Subtypes of BRCA and their characteristics. A Based on the expression of detoxification enzymes-related genes, the BRCA subtypes was constructed by NMF algorithm. B Data were visualized using PCA. C K-M survival analysis of OS for the two subtypes. D, E The mutation waterfall plots of two subtypes. F Differential expression profiling between two subtypes. G GO enrichment analysis of DEGs. H KEGG enrichment analysis of DEGs. I DO enrichment analysis of DEGs
3.2 Construction and verification of prediction modelIn previous studies, we have successfully identified two breast cancer subtypes associated with detoxification enzymes-related genes, and the next goal is to model breast cancer prognosis associated with detoxification enzymes-related genes. We obtained BRCA patient data from the TCGA database and randomly divided 1069 BRCA patient data into a training cohort and an internal validation cohort at a ratio of 7:3. We first used univariate Cox regression to analyze the train cohort (N = 749) and identified 39 detoxification enzymes-related genes associated with breast cancer prognosis. Through Lasso regression and multivariate Cox regression analysis, we further screened 14 genes for the construction of prognostic models and used these genes to establish risk score formula (Fig. 3A–C). The calculation formula is: Risk Score = CYP27 A1* (− 0.153773043733361) + IFNG* (− 1.13029577293788) + SERPINA1* (− 0.130967963213151) + ALDH1 A1* (− 0.160501405610823) + CPT1 A* (0.25451256806553) + PLAT* (− 0.229065565958707) + GSTM2* (0.290013288958423) + ATP6 AP1* (0.387236315911989) + SLC35 A2* (0.595324950705634) + GSTM4* (− 0.398966850228594) + NFKBIA* (− 0.37157076126092) + POMGNT2* (− 0.406833583296618) + NT5E* (0.347167101015933) + SQLE* (− 0.274411899158172). Each patient's risk score was calculated according to the formula, and they were divided into low-risk and high-risk groups according to the median risk score value. As can be seen from the risk score distribution map, the survival scatter plot and the K-M curve both survival probability and survival time of the low-risk group were higher than those of the high-risk group in the four cohorts (Fig. 3D-F). Notably, in the training cohort (N = 749), overall survival was significantly higher in the low-risk group (N = 374) than in the high-risk group (N = 375). To evaluate the predictive performance of the prognostic model, we performed a time-dependent ROC curve analysis and obtained encouraging results in the TCGA train cohort with auc of 0.695 (1 year), 0.777 (3 years), and 0.752 (5 years), respectively (Fig. 3G). Good results were also observed in the TCGA test cohort, with auc of 0.632 (1 year), 0.637 (3 years), and 0.605 (5 years), respectively. In addition, these good results were confirmed in the TCGA all cohort and in the GEO-GSE20685 cohort, indicating that the detoxification enzymes-related genes-based breast cancer prognosis model has a good predictive performance.
Fig. 3Construction and validation of prognostic models. A, B Lasso regression screened the genes of the prognostic model. C Prognostic genes screened by multivariate COX regression analysis. D The risk score distribution plot of four cohorts. E The survival status diagram of four cohorts. F K-M survival analysis. G Time-related ROC curve analysis
3.3 DNA promoter methylation of core genesPromoter DNA methylation affects transcriptional inhibition and tumorigenesis [36], so we explored the methylation values of core genes in this model in both normal and BRCA tissues. The results showed that promoter methylation levels of ALDH1 A1, ATP6 AP1, IFNG, NFKBIA, SERPINA1 and SLC35 A2 were significantly reduced in tumor tissues. The promoter methylation levels of CPT1 A, SQLE, GSTM2, GSTM4, CYP27 A1, PLAT and NT5E were significantly increased in tumor tissues (Fig. 4).
Fig. 4Promoter methylation of core genes. Boxplots visualized the methylation levels of ALDH1 A1 (A), ATP6 AP1 (B), CPT1 A (C), SQLE (D), GSTM2 (E), GSTM4 (F), IFNG (G), NFKBIA (H), NT5E (I), PLAT (J), CYP27 A1 (K), SERPINA1 (L) and SLC35 A2 (M) in normal and BRCA tissues
3.4 GSVA, GSEA analysis of high and low risk groupsTo investigate the biological differences between the high-risk and low-risk groups identified by the model, we conducted Gene Set Variation Analysis (GSVA) and Gene Set Enrichment Analysis (GSEA) using all datasets. GSVA results showed significant differences in biological processes between high-risk and low-risk groups. Signaling pathways were abundant in the low-risk group, including cytosolic DNA sensing pathway, apoptosis, B cell receptor signaling pathway, T cell receptor signaling pathway and cytokine-cytokine receptor interaction (Fig. 5A). GSEA results showed that cell cycle, homologous recombination and oocyte meiosis pathway were enriched in the high-risk group (Fig. 5B), while chemokine signaling pathway, cytokine-cytokine receptor interaction, and T cell receptor signaling pathways were significantly enriched in the low-risk group (Fig. 5C). These results suggest that the link with immunity may be stronger in the low-risk group.
Fig. 5GSEA and GSVA analysis of two groups. A GSVA heatmap of two groups. B GSEA enrichment analysis in high risk group. C GSEA enrichment analysis in low risk group
3.5 Immune landscape of high and low risk groupsTo further investigate the relationship between high and low risk groups and immunity, we looked at features such as tumor microenvironment (TME) and immune infiltration associated with the immune landscape. The results showed that compared with the high-risk group, the ESTIMATEScore, ImmuneScore and StromalScore of BRCA patients in the low-risk group were significantly higher than the high-risk group, while the TumorPurity result was opposite (Fig. 6A,B), indicating that the content of stromal cells and immune cells in the TME of the low-risk group was higher than that of tumor cells. In addition, we investigated the relationship between different risk groups and immune cell infiltration. We found that the infiltration level of naive B cells, plasma cells, CD8 T cells, activated NK cells, resting dendritic cells and activated dendritic cells was high in the low-risk group, while the infiltration level of M0 macrophages and M2 macrophages was completely opposite. We also found that the infiltration levels of naive B cells, memory B cells, resting dendritic cells, plasma cells, activated NK cells, CD4 activated memory T cells, CD8 T cells, and γδ T cells were negatively correlated with the risk score (Fig. 6C). These results suggest that the risk score of the prognostic model is closely related to immune cells, and the lower the risk score, the higher the infiltration of stromal cells and immune cells in TME.
Fig. 6Immune landscape analysis of TME and immune infiltration. A Heat map showing the overall immune landscape in the two risk group. B Differential analysis of TME between two risk groups. C Differential analysis of immune infiltration cells between two risk groups and correlation analysis between risk score and immune infiltration cells
3.6 Relationship between prognostic model and immunotherapyTo further evaluate the potential of a prognostic model for predicting immunotherapy response in BRCA patients, we analyzed the expression of immune checkpoint genes between high-risk and low-risk groups. We found that immune checkpoint gene expression was significantly higher in the low-risk group (Fig. 7A), suggesting that patients in the low-risk group may be more sensitive to immune checkpoint inhibitors. We then used IPS as indicators to evaluate the effectiveness of immune checkpoint inhibitors, and further evaluated the potential clinical efficacy of immune checkpoint inhibitors in different risk populations. The violin plot results showed that the IPS score of the low-risk group was higher than that of the high-risk group (Fig. 7B). These results showed that patients in the low-risk group responded better to immunotherapy than those in the high-risk group. In addition, we also analyzed the relationship between the prognostic model and the commonly used chemotherapy drugs for breast cancer treatment, using the IC50 value of chemotherapy drugs as the evaluation index. The study results showed that the IC50 values of vinorelbine, cisplatin, doxorubicin, paclitaxel, gemcitabine and embelin in the low-risk group were lower than the high-risk group (Fig. 7C). This suggests that the low-risk group is more sensitive to these commonly used chemotherapy agents, and using these agents may be better for patients in the low-risk group, with a lower likelihood of resistance. Our findings suggest that the low-risk group may respond better to both immunotherapy and chemotherapy, which could have important clinical implications.
Fig. 7Drug-therapy Prediction. A Expression of immune checkpoint genes in high and low risk groups. B IPS in high and low risk groups. C Drug sensitivity tests of vinorelbine, cisplatin, doxorubicin, paclitaxel, gemcitabine and embelin
3.7 Establishment and verification of nomogramTo investigate the independent predictive power of riskscore as prognostic factors, univariate and multifactor Cox regression analyses were performed. The results showed that riskscore, age, gender, and TNM stage of breast cancer patients were significantly correlated with prognosis, and riskscore and age could be used as independent prognostic factors of breast cancer patients (Fig. 8A,B). We then combined clinicopathological features (including gender, TNM stage, and age) and riskscore to establish nomogram to predict survival in BRCA patients. The riskscore and clinical variables were combined to calculate the total score for each patient, and the higher the score, the lower the survival rate (Fig. 8C). In addition, we evaluated the accuracy of the nomogram by using the calibration curve and the area under the ROC curve. The calibration curves for 1-year, 3-year, and 5-year OS prediction showed that the predictive nomogram performed well (Fig. 8D). The 1-year, 3-year, and 5-year nomogram auc of the TCGA cohort were 0.847, 0.776 and 0.773, respectively which were significantly higher than the other parameters (Fig. 8E). This result suggests that the nomogram had better predictive accuracy than other clinical features and the original risk score.
Fig. 8Nomogram for predicting OS. A Univariate Cox regression analysis was used to analyze the predictive ability of age, TNM stage and riskscore. B Multivariate Cox regression analysis was used to analyze the predictive ability of age, TNM stage and riskscore. C The predictive nomogram combining clinicopathological features and risk score. When the total point was 241, the 1-year, 3-year and 5-year survival rates were 0.993, 0.961 and 0.923, respectively. D Calibration curve of the predictive nomogram. E ROC curves for 1, 3 and 5 years of the nomogram
3.8 Web toolWe have successfully integrated the prognostic model into a web-based application featuring customized algorithms designed to predict overall survival (OS) and drug therapy responses in BRCA patients based on personalized characteristics (Fig. 9) (http://wys.helyly.top/zjd/cox.html). Initially, the expression levels of detoxification enzymes-related genes are input into the application to generate a risk prediction for BRCA patients. This prediction is then combined with clinical variables, including the patient's age, gender, and TNM stage, to forecast the patient's drug therapy response and OS at 1, 3, and 5 years, which are displayed on the web interface. The tool is designed to be intuitive and easy to use, offering valuable support for individualized prognosis prediction in BRCA patients.
Fig.9Webpage interface of the prediction tool. Allow users to input patient-specific data, including gene expression related to detoxification enzymes, as well as clinical variables such as age, gender, and TNM staging, through the left panel. In turn, the system generates predicted outcomes on the right panel, providing estimates for the patient's response to chemotherapy and immunotherapy, along with the 1-year, 3-year, and 5-year overall survival (OS) probabilities
3.9 Cellular constitution of breast cancer and developmental trajectories of epithelial cells (cancer cells)In this study, we analyzed eight single-cell samples from breast tumors in the GSE161529 dataset and grouped them into 23 distinct clusters (Fig. 10A,B). According to the different marker genes expressed by each cluster (Fig. 10C,D), we manually annotated these clusters into seven different cell populations, including B cells, endothelial cells, epithelial cells (cancer cells), fibroblasts, macrophages, monocytes and T cells (Fig. 10E). Subsequently, we analyzed the expression of 14 prognostic genes in breast cancer epithelial cells (cancer cells) at the single-cell level (supplementary Fig. 1) and found that SQLE was specifically highly expressed in the epithelial cells (cancer cells) of breast cancer (Fig. 10F,G). This observation highlights the important role of SQLE in tumor cells, suggesting that it may be involved in specific biological processes within the epithelial cells or the tumor microenvironment. In addition, we mapped the developmental trajectories of breast cancer epithelial cells through single-cell sequencing analysis and examined the expression patterns of SQLE in these trajectories (Fig. 10H,I). The result showed that the expression of SQLE in breast cancer varied with the stage of epithelial cell development (Fig. 10J). Immunohistochemical staining images showed that SQLE protein levels were higher in breast cancer tumor tissue than in normal tissue (Fig. 10K,L). This suggests that significant upregulation of SQLE in breast cancer may influence tumorigenesis and cancer progression.
Fig.10Single-cell atlas of breast cancer and immunohistochemical images. A t-SNE plot of eight sample sets. B t-SNE plot of 23 clusters. C The bubble map showed the expression of marker genes in 23 different cell clusters. D The heat map shows the 10 most highly expressed genes in each cell cluster. E t-SNE plot of annotating 7 distinct cell types in single-cell RNA sequencing. F t-SNE plot of SQLE expression in different cell populations. G Violin plot shows the expression of SQLE in different cell populations. H Different developmental stages of breast cancer epithelial cells. I Pseudo-temporal trajectory analysis of breast cancer epithelial cells. J Dynamic expression of SQLE along pseudo-time. K Expression of SQLE in normal tissues. L Expression of SQLE in breast cancer, Staining: High; Intensity: strong; Quantity: > 75%
3.9.1 Interaction between SQLE + epithelial cells (cancer cells) and exhausted CD8 + T cellsIn this study, we analyzed the interactions between different cells in breast cancer and found that both the number and intensity of the interaction between epithelial cells (cancer cells) and T cells were stronger (Fig. 11A,B). Subsequently, We divided epithelial cells (cancer cells) into SQLE + and SQLE − epithelial cells and analyzed their interactions with different types of T cells (Fig. 11C,D). We found that compared with SQLE-epithelial cells, SQLE + epithelial cells had a stronger effect on different types of T cells, especially exhausted CD8 + T cells (Fig. 11E,F). Subsequently, we investigated the pathways by which the two interact and found that in breast cancer, SQLE + epithelial cells affect exhausted CD8 + T cells mainly through the MIF signaling pathway (Fig. 11G,H). The result suggests that SQLE may affect T cells primarily by affecting exhausted CD8 + T cells, further influencing cancer progression.
Fig. 11Analysis of intercellular communication. A The number of interactions in intercellular communication networks between 7 different cell type. B The interaction weights in intercellular communication networks between 7 different cell types. C The number of interactions in intercellular communication networks between SQLE +, SQLE-epithelial cells and different cell subtypes. D The interaction weights in intercellular communication networks between SQLE +, SQLE-epithelial cells and different cell subtypes. E The number of interactions in intercellular communication networks between SQLE +, SQLE-epithelial cells and different T-cell subtypes. F The interaction weights in intercellular communication networks between SQLE +, SQLE − epithelial cells and different T-cell subtypes. G MIF signaling pathway network of SQLE +, SQLE − epithelial cells and different T-cell subtypes. H Heatmap depicting ligand-receptor interaction patterns in the MIF signaling pathway between SQLE +, SQLE − epithelial cells and different T-cell subtypes
Comments (0)