GC is the fifth most common cancer and the third leading cause of cancer-related deaths. The complexity of gastric cancer lies in its heterogeneity and multifactorial origins, which requires us to comprehensively explore its underlying mechanisms to improve treatment outcomes.1 Infection with Helicobacter pylori is the most significant risk factor for the development of GC and has been classified as a Group I carcinogen by the World Health Organization.2 The prognosis for GC remains poor, with a 5-year survival rate of <30% in advanced stages, primarily due to late diagnosis and the lack of effective early screening methods. Current treatment options primarily include surgical resection, chemotherapy, and radiotherapy; however, these approaches often encounter limitations, such as resistance to chemotherapy and the inability to effectively target metastatic disease.3 Therefore, identifying novel prognostic genes is crucial for elucidating the underlying mechanisms of GC, improving patient management strategies, optimising personalised treatment, and developing targeted therapeutic agents. In this severe clinical context, migrasomes have attracted significant attention due to their unique functions.4
Table 1 The 11 Pathways Co-Enriched by Prognostic Genes
Migrasomes are a type of novel organelle formed during cell migration, structurally similar to extracellular vesicles, and closely related to the migration process. These structures can be generated in various cell types, including immune cells, metastasising tumour cells, specialised cells such as podocytes, and cells at developmental stages.5 Migrasomes are newly discovered extracellular vesicle-like organelles, with functions closely linked to tumour immunity and the tumour microenvironment. Although their specific roles in GC remain unclear, migrasome-related genes (MRGs) are abnormally expressed in several tumour types and are associated with GC prognosis. Extracellular vesicles are key mediators of local and distant material exchange and signaling between tumor cells and mesenchymal cells in the tumor microenvironment (TME).6 Many aspects of tumorigenesis and tumor-related pathology, including immune regulation, vascular permeability, and matrix remodeling, are influenced by extracellular vesicles.7 Due to their involvement in many disease processes, extracellular vesicles can serve as disease targets, and they are important for shaping the local immune status before and during tumor metastasis. As organelles with extracellular vesicle-like structures that have been newly discovered in recent years.8 Circulating tumor cells are expected to be used as biomarkers for predicting tumor progression, recurrence, and metastasis. Specific drugs developed to regulate the production and modulation of circulating tumor cells may also provide new research ideas for tumor treatment.
In multiple cancers, including GC, high MRG expression is significantly correlated with poor prognosis, suggesting that these genes may contribute to tumour progression and adversely affect patient survival. Research has also revealed a positive correlation between migrasomes and macrophage abundance in the tumour microenvironment, indicating their potential involvement in tumour immune evasion mechanisms. Furthermore, immune checkpoint (IC) genes have been significantly associated with migrasome scores, suggesting that migrasomes may influence the response to immunotherapy in patients with GC.6 Migrasomes also function as intercellular messengers, carrying signalling molecules that regulate the tumour microenvironment. They play key roles in tumour immune evasion, enabling cancer cells to escape immune recognition and destruction. Additionally, migrasomes promote the growth, migration, and infiltration of cancer cells by transferring tumour-related proteins and RNAs, thereby enhancing tumour development and contributing to poor clinical outcomes.9 Therefore, migrasomes may serve as potential biomarkers for predicting the progression, recurrence, and metastasis of GC.
Currently, there are many gaps in the research on migratory bodies, with most studies focusing on the basic formation mechanisms, such as TSPAN-dependent biogenesis, but there is insufficient research on specific cancers like gastric cancer.5 The prognostic role of migration-related genes in gastric cancer has not been fully studied.6 In addition, clinical application research is relatively lagging, lacking reliable diagnostic markers and therapeutic targets. This study systematically identified the prognostic value of metastasis-related genes in gastric cancer through the integration of multi-omics analysis, revealing their molecular mechanisms in promoting tumor progression by regulating epithelial-mesenchymal transition (EMT) and the immunosuppressive microenvironment. It provides new tools for precise diagnosis and treatment of gastric cancer, filling the gap in existing research regarding clinical translation. This study aimed to identify migrasome-related prognostic genes in GC via transcriptome data from public databases and bioinformatics methods. A prognostic risk model was constructed to predict survival in patients with GC, and further analyses were conducted to investigate the potential biological functions and molecular mechanisms of the selected prognostic genes, providing new insights for clinical prediction of prognosis and the development of personalised immunotherapy for patients with GC.
Materials and Methods Data SourceThe Cancer Genome Atlas - Stomach Adenocarcinoma (TCGA-STAD) dataset, comprising 350 GC tissue samples with survival and clinical data (GC group) and 31 paraneoplastic tissue samples (control group), was obtained from the University of California, Santa Cruz Xena database (https://xenabrowser.net) (accessed on 1 September 2024). Additionally, the GSE62254 (GPL570) dataset, which includes 300 GC tissue samples with survival and clinical information, was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). A set of 35 MRGs was also obtained from the literature.10
Functional Pathway and Protein–Protein Interaction (PPI) Analyses of Candidate GenesTo identify differentially expressed genes (DEGs) in the TCGA-STAD dataset, the DESeq2 package (v1.42.0)11 was used to perform differential expression analysis between the GC and control groups using the threshold |log2fold change (FC)| > 0.5 and P < 0.05.12 Twenty DEGs (10 upregulated and 10 downregulated) with the highest |log2FC| values were highlighted in a volcano plot generated using the ggplot2 package (v3.5.1).13 Their expression patterns were visualised in a heatmap created with the ComplexHeatmap (v2.10.8).14 Differential MRGs were obtained using the ggvenn package (v0.1.0)15 by intersecting DEGs with MRGs, and the resulting genes were designated as candidate genes.
The biological roles of these candidate genes were explored using the clusterProfiler package (v4.10.1).16 Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted, applying a significance threshold of P < 0.05. For GO analysis, the top five terms from each of the three categories, biological process (BP), cellular component (CC), and molecular function (MF), with the highest gene counts were visualised. Similarly, the top five KEGG pathways by gene count were illustrated.
To investigate protein-level interactions among the candidate genes, a PPI network was constructed using the STRING database (https://cn.string-db.org/) with a confidence score > 0.4. The resulting network was visualised using Cytoscape software (v3.9.1).17
Expression and Correlation Analyses of Prognostic GenesTo assess the potential value of candidate genes in GC, univariate Cox regression analysis was performed within the TCGA-STAD dataset using the survival package (v3.7.0)18 Candidate genes were analysed with thresholds of hazard ratio (HR) ≠ 1 and P < 0.2 and tested for compliance with the proportional hazards (PH) assumption (P > 0.05). Genes meeting these criteria were visualised using the forestplot package (v2.0.1)19 and defined as candidate prognostic genes. These genes were then subjected to machine learning-based selection. Specifically, the glmnet package (v4.1–4)19 was used to perform least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation, resulting in the identification of prognostic genes. To verify their expression levels in the TCGA-STAD dataset, differences across groups were explored via the Wilcoxon rank-sum test (P < 0.05). Furthermore, among the GC samples, Spearman correlation coefficients were calculated for the prognostic genes using the psych package (v2.4.3)20 (|cor| > 0.3, P < 0.05), and the corrplot package (v0.92)21 was used to illustrate their relationships.
Risk Model Construction and ValidationA risk model was constructed using prognostic genes within the TCGA-STAD dataset using the following formula:
where coef and expr represent the LASSO regression coefficient and expression of each gene, respectively.
GC samples were stratified into high-risk group (HRG) and low-risk group (LRG) based on the optimal cut-off value of the risk scores. The ggplot2 package (v3.5.1) was used to visualise the distribution of risk scores and survival status. Kaplan‒Meier (KM) survival curves for the HRG and LRG groups were generated using the survival package (v3.7.0), with group differences assessed via the Log rank test (P < 0.05). Moreover, after setting 2-, 3-, and 5-year time points, the pROC package (v1.18.5)22 was employed to plot receiver operating characteristic (ROC) curves. A model performance threshold was set at area under the curve (AUC) > 0.6. For external validation, the model was further tested on the GSE62254 dataset to evaluate its generalisability and robustness.
Stratified Analysis of Clinical CharacteristicsWithin the TCGA-STAD dataset, the association between clinical characteristics and the risk score was assessed by categorising GC samples into subgroups based on age and T, N, and M stages. The survival package (v3.7.0) was used to generate KM survival curves for the HRG and LRG within each clinical subgroup. In addition, the Log rank test (P < 0.05) was applied to evaluate intergroup differences.
Nomogram Establishment and ValidationTo identify independent prognostic factors within the TCGA-STAD dataset, clinical characteristics and risk scores were subjected to a sequential analysis process: univariate Cox regression (HR ≠ 1, P < 0.05), the PH assumption test (P > 0.05), and multivariate Cox regression (HR ≠ 1, P < 0.05). Factors that passed all three analyses were defined as independent prognostic variables. A nomogram model was then established via the rms package (v6.5.0)23 to predict 2-, 3-, and 5-years survival probabilities. In addition, calibration and decision curve analysis (DCA) curves were generated to evaluate the model’s performance. Specifically, the regplot (v1.1)24 and ggDCA (v1.2)25 packages were used to plot the calibration and DCA curves, respectively, to assess the model’s predictive accuracy and clinical utility.
Functional Pathway Analysis of Prognostic GenesTo understand the biological functions and signalling pathways associated with prognostic genes, gene set enrichment analysis (GSEA) was performed using data from the TCGA-STAD dataset. The gene set “h.all.v2023.2.Hs.symbols” was retrieved from the Molecular Signatures Database (https://www.gsea-msigdb.org/gsea/msigdb). Spearman correlation coefficients between each prognostic gene and other genes were calculated using the psych package (v2.4.3). The resulting correlations were ranked in descending order for each gene. GSEA was then conducted individually for each prognostic gene using the clusterProfiler package (v4.10.1), with the following thresholds: |normalised enrichment score| > 1, false discovery rate < 0.05, and P < 0.05. The 10 most significantly enriched pathways were identified via the enrichplot package (v1.22.0).21
Immune Microenvironment, Immunotherapy Response, and Drug Sensitivity AnalysesWithin the TCGA-STAD dataset, the immune microenvironments of the HRG and LRG were analysed. The estimate package (v1.0.13)26 was used to calculate and analyse the immune microenvironment score (stromal score, immune score, estimation of stromal and immune cells in malignant tumour tissues via expression data (ESTIMATE) score, and tumour purity). Additionally, their association with the risk score in the HRG and LRG was evaluated via Spearman correlation analysis using the psych package (v2.4.3) (|cor| > 0.3, P < 0.05). Intergroup differences were assessed using the Wilcoxon rank-sum test (P < 0.05).
The infiltration of 22 immune cell types27 in the HRG and LRG was analysed using the cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT) algorithm in the immuno-oncology biological research (IOBR) package (v0.99.8).28 Differential immune cell types were then identified via the Wilcoxon rank-sum test (P < 0.05). Thereafter, Spearman correlation analysis was performed using the psych package (v2.4.3) (|cor| > 0.3, P < 0.05). Specifically, we focused on the relationships between differential immune cell types and the risk score, as well as between differential immune cell types and prognostic genes.
Furthermore, the tumor immune dysfunction and exclusion (TIDE) algorithm on the official TIDE website (http://tide.dfci.harvard.edu/) was employed to evaluate differences in potential responses to immunotherapy between the two risk groups. The results were illustrated using the ggpie package (v0.2.5) (https://github.com/showteeth/ggpie). Differences in the expression of 38 IC genes29 between the HRG and LRG were subsequently assessed using the Wilcoxon rank-sum test (P < 0.05). Similarly, the psych package (v2.4.3) was applied to examine associations between the expression of IC genes and the risk score, as well as between the expression of IC genes and prognostic genes, by calculating the Spearman correlation coefficients (|cor| > 0.3, P < 0.05).
Eventually, based on 198 clinically tested and Food and Drug Administration-approved chemotherapy drugs identified from the CellMiner database (https://discover.nci.nih.gov/cellminer/home.do), the oncoPredict package (v1.2)30 was applied to calculate 50% inhibitory concentrations (IC50) for GC samples in the risk groups to evaluate drug sensitivity. In addition, differences in drug IC50 between the groups were assessed using the Wilcoxon rank-sum test (P < 0.05). The psych package (v2.4.3) was used to calculate the Spearman correlation coefficient (|cor| > 0.3, P < 0.05) between prognostic genes and the 10 drugs with the most significant IC50 differences between the risk groups.
Somatic Mutation and Copy Number Variation (CNV) AnalysesAfter obtaining the somatic mutation data of the GC samples (123 samples in the HRG dataset and 225 samples in the LRG dataset) from the TCGA-STAD dataset, the TCGAmutations package (v0.4.0)31 was used to analyse and visualise the somatic mutation profiles of the two risk groups. In addition, the 20 tumour mutational burden (TMB) genes with the highest mutation frequencies in the HRG and LRG were visualised.
Furthermore, the copy number alteration profiles of each prognostic gene in the GC, control, and both risk group samples were analysed using the Genomic Identification of Significant Targets in Cancer software (v2.0),32 based on CNV mutation data from the TCGA-STAD dataset.
Molecular Regulatory Network ConstructionTo predict upstream regulatory factors targeting prognostic genes, the DIANA-microT database (http://diana.imis.athena-innovation.gr/DianaTools/index.php) and MicroRNA Target Prediction Database (https://www.mirdb.org/) were used to predict microRNAs (miRNAs) that target these genes (ie, mRNAs). The miRNAs predicted by both databases were defined as key miRNAs. These key miRNAs were subsequently used to identify long non-coding RNAs (lncRNAs) via the miRNet (https://www.mirnet.ca/miRNet/home.xhtml) and StarBase (https://rnasysu.com/encori/) databases. Finally, Cytoscape software (v3.9.1) was used to visualise the lncRNA–miRNA–mRNA regulatory network.
Reverse Transcription-Quantitative Polymerase Chain Reaction (RT-qPCR)We randomly selected five patients who underwent gastric cancer resection in the Gastrointestinal Surgery Department of the Affiliated Suqian Hospital of Xuzhou Medical University, and collected tumor and peritumoral tissue from each patient (Table S1). The tissue samples were collected at RNAs from the samples were isolated using TRIzol reagent (Ambion, USA). After electrophoresis and purity analysis with a NanoPhotometer N50, the RNA was reverse transcribed into complementary DNA (cDNA) using the SureScript First-Strand cDNA Synthesis Kit (Servicebio, China). RT-qPCR was subsequently performed using Universal Blue SYBR Green qPCR Master Mix (Servicebio, China) on a CFX Connect RT-qPCR detection system (Bio-Rad, USA). The primers for the prognostic genes (BMP1, LEFTY1, TGFB2, PDGFD, and TSPAN5) and the internal reference gene (GAPDH) are listed in Table S2. Details of the reaction system and programmes used for reverse transcription and RT-qPCR are provided in Tables S3–S6. Relative gene expression was calculated using the 2−ΔΔCt method. A t-test (P < 0.05) was used to evaluate intergroup differences. GraphPad Prism 5 software (v8.0)33 was used for visualisation.
Statistical AnalysisBioinformatic analyses were conducted using R software (v4.3.1). The Wilcoxon rank-sum test and t-test were employed to assess differences between groups, with statistical significance defined as P < 0.05. The relevant information table of the software packages used in this study can be found in Table S7.
Results Related Functional Pathways and Complex PPI Network of Candidate GenesIn the TCGA-STAD dataset, 19,427 DEGs were identified between the GC and control groups, including 12,757 upregulated and 6670 downregulated genes (P < 0.05) (Figure 1A and B). Seventeen candidate genes associated with migrasome function in GC were subsequently obtained from the intersection of MRGs and DEGs (Figure 1C).
Figure 1 Related functional pathways and complex PPI network of candidate genes. (A) Volcano plot of significant DEGs between the GC and control groups, the red dots represent up-regulated genes and the green dots represent down-regulated genes. (B) A heatmap of the DEGs, the heat map consists of two parts: the upper part is the heat map of the distribution density of the expression levels of differentially expressed genes, and the lower part is the heat map of the expression levels of differentially expressed genes. Color represents the magnitude of gene expression. The darker the color, the higher the expression level of the DEGs (red indicates high expression, and cyan indicates low expression). (C) Venn diagram of candidate genes. The red circle represents differentially expressed genes, the green circle represents migration-related genes, and the overlapping part is the intersection of the two. (D)The results of GO Enrichment analysis. In the figure, different colors represent different categories of pathways. The abscissa represents the number of genes enriched in the pathway. The text on the columns is the name of the pathway, and the text below the columns is the genes enriched in the pathway. (E) The results of KEGG Enrichment analysis.The size of the dots and the abscissa represent the number of genes enriched in the pathway. (F) The PPI network of candidate genes. The color of the nodes ranges from light to dark, and the area ranges from small to large, representing that the degree of the node increases from small to large.
The candidate genes were significantly enriched in 369 GO terms: 345 BP terms (eg, myeloid leukocyte migration), six CC terms (eg, collagen-containing extracellular matrix), and 18 MF terms (eg, cytokine activity) (P < 0.05) (Figure 1D). In addition, the genes were enriched in 24 KEGG pathways, including the Hippo signalling pathway (P < 0.05) (Figure 1E). Overall, these functional annotations supported the involvement of the candidate genes in key GC-related pathways, highlighting the potential biological roles of MRGs in GC.
Furthermore, 11 proteins and 10 protein interaction pairs were identified in the PPI network, which revealed the intricate interactions among proteins encoded by the candidate genes and underscored the possible importance of MRGs in GC cellular functions and tumour progression (Figure 1F).
Expression Trends and Positive Correlations of Eight Prognostic GenesWithin the TCGA-STAD dataset, 12 candidate genes remained following univariate Cox analysis (P < 0.2) (Figure 2A). Among them, LEFTY1 was associated with a better prognosis (HR < 1), suggesting that it might inhibit GC progression. Conversely, the other 11 candidate genes, such as BMP1, were associated with adverse prognosis (HR > 1), suggesting they might promote GC progression. In addition, the PH assumption test revealed that these genes satisfied the assumption (P > 0.05) (Figure 2B). Therefore, these genes were defined as candidate prognostic genes. Notably, eight prognostic genes were identified following LASSO regression analysis, including BMP1, CPQ, PDGFD, TSPAN5, TSPAN7, TGFB2, WNT11, and LEFTY1 (log(lambda.min) = −4.2257) (Figure 2C and D). Of these, BMP1, LEFTY1, and TGFB2 were markedly upregulated in the GC group. In contrast, PDGFD, TSPAN5, and TSPAN7 were notably significantly upregulated in the control group (P < 0.05) (Figure 2E). Moreover, significantly positive correlations were observed between prognostic genes (P < 0.05) (Figure 2F). The correlations between CPQ and PDGFD (r = 0.55, P < 0.0001), as well as between CPQ and TGFB2 (r = 0.52, P < 0.0001), were markedly strong. These prognostic genes might exert synergistic effects in GC through shared biological mechanisms, thereby influencing the progression or prognosis of GC.
Figure 2 Expression trends and positive correlations of the eight prognostic genes. (A) Results of univariate Cox regression analysis. Each row represents a gene, showing its P-value and the hazard ratio (HR) within the 95% confidence interval. Red dots denote the estimated HR, while blue lines indicate the confidence intervals. An HR greater than 1 signifies an increased risk, whereas an HR less than 1 implies a reduced risk. (B) Results of the PH hypothesis test. Each subplot shows the Schoenfeld residuals of a gene over time. The red dots represent the residuals, the solid line is the fitted curve, and the dashed line is the 95% confidence interval. If the fitted curve is close to 0 and within the confidence interval, it indicates that the variable satisfies the proportional hazards assumption. The p-values shown in the figure are used for the Schoenfeld residual tests of individual genes and the overall model. A larger p-value (typically > 0.05) suggests compliance with the proportional hazards assumption. (C) LASSO regression coefficient path diagram. The horizontal axis represents Log(Lambda), and the vertical axis represents the coefficients. Curves in different colors represent different genes. The dashed line marks the optimal Lambda value (0.0146), and the corresponding model is the simplest model with non-zero gene coefficients. (D) LASSO cross-validation results. The horizontal axis represents Log(Lambda), and the vertical axis represents the partial likelihood deviance. Dots and error bars show the deviance ranges of different models, with colors indicating the corresponding number of non-zero genes. The dashed line marks the optimal log.lambda value (log.lambda.min = −4.2257). (E) Differences in expressions of the prognostic genes between the control and GC groups. The blue represents the control group, and red represents the gastric cancer group. The white box lines indicate the median and its quartiles. Asterisks denote the significance differences between the two groups: **** for p < 0.0001, *** for p < 0.001, * for p < 0.05, and ns for no significant difference. (F) Heatmap of the prognostic genes in GC samples. The blank space in the figure represents non-significant correlation.
Construction of a Risk Model with Notable Predictive AbilityAfter the prognostic genes were obtained, a risk model was constructed as follows: Risk score = (0.089332494 × BMP1 expression level) + (0.0003596 × CPQ expression level) + (0.074722162 × PDGFD expression level) + (0.026623357 × TSPAN5 expression level) + (0.047167114 × TSPAN7 expression level) + (0.023629132 × TGFB2 expression level) + (0.010867593 × WNT11 expression level) + (−0.03409899 × LEFTY1 expression level). The patients with GC in the TCGA-STAD dataset were subsequently divided into HRG and LRG (124:226) groups based on the optimal cut-off value (2.5818). The distributions of the risk score and survival state within the risk groups revealed that mortality increased with increasing risk score (Figure 3A and B). Moreover, KM survival curves showed that the patients with GC in the LRG had a markedly greater survival probability (P < 0.001) (Figure 3C). Additionally, the AUCs of the ROC curves at 2, 3, and 5 years all exceeded 0.6, reflecting the notable effectiveness of the risk model (Figure 3D).
Figure 3 Construction of a risk model with predictive performance. (A) The distribution of risk scores for patients in the TCGA-STAD dataset. The abscissa represents patients sorted from low to high risk scores, while the ordinate denotes the risk scores. In the figure, red indicates the high-risk group, and blue indicates the low-risk group. The dashed line represents the optimal cutoff point, which classifies patients into high-risk and low-risk groups. (B) Distribution of survival status of TCGA-STAD patients. The abscissa shows patients sorted by risk score, and the ordinate indicates survival time (in years). Blue dots represent surviving patients, while red dots represent deceased patients. (C) Survival curves of patients in the high-risk and low-risk groups. The red curve represents the high-risk group, and the blue curve represents the low-risk group. The abscissa is the survival time (days), and the ordinate is the survival probability. (D) ROC curve of the risk score. (E) Distribution of survival status of Validation set (GSE62254) patients. (F) Survival curves of patients in the high-risk and low-risk groups. (G) Survival curves of patients in the high-risk and low-risk groups (Validation set GSE62254). (H) ROC curve of the risk score (Validation set GSE62254).
Furthermore, the risk model was validated using the GSE62254 dataset. Specifically, the patients with GC in this dataset were classified into HRG and LRG (195:105) according to the optimal cut-off value (0.44613). The distributions of the risk score and survival state (Figure 3E and F), KM survival curves (P < 0.001) (Figure 3G), and ROC curves (all AUC values exceeded 0.6) (Figure 3H) were largely consistent with those in the TCGA-STAD dataset. These findings indicated the strong generalisability of the constructed prognostic risk model and demonstrated that the identified prognostic genes could effectively stratify patients with GC by risk and predict survival outcomes. This risk model may serve as a valuable tool for personalised prognostic assessment in clinical practice for patients with GC.
Associations Between Clinical Characteristics and the Risk ScoreIn the TCGA-STAD dataset, KM survival curves revealed significant survival differences across the two risk groups based on age (<60 and ≥60 years) and T/N/M stages (T1–2, T3–4, N0, N1–3, and M0) (P < 0.05). However, no survival differences were observed between the HRG and LRG within the M1 clinical subgroup (Figure 4A–H). In summary, these clinical characteristics were strongly associated with the risk scores of patients with GC. Personalised treatment plans developed based on these clinical features might represent a promising therapeutic direction.
Figure 4 Survival differences across clinical subgroups. (A) K-M survival curves of high- and low-risk groups in patients < 60 years old. The red curve represents the high-risk group, and the blue curve represents the low-risk group. The abscissa is the survival time (days), and the ordinate is the survival probability. A P-value < 0.05 in the figure indicates that the difference in survival rates between the high- and low-risk groups is statistically significant. (B) K-M survival curves of high- and low-risk groups in patients ≥60 years old. (C) K-M survival curves of high- and low-risk groups in T1-2 subgroup. (D) K-M survival curves of high- and low-risk groups in T3-4 subgroup. (E) K-M survival curves of high- and low-risk groups in N0 subgroup. (F) K-M survival curves of high- and low-risk groups in N1-32 subgroup. (G) K-M survival curves of high- and low-risk groups in M0 subgroup. (H) K-M survival curves of high- and low-risk groups in M1 subgroup.
Establishment of a Nomogram Demonstrating Robust Predictive Ability and Clinical UtilityIn the TCGA-STAD dataset, four factors that remained after univariate Cox analysis (P < 0.05) satisfied the proportional hazards assumption (P > 0.05) (Figure 5A and B). These factors were then subjected to multivariate Cox analysis, in which risk score, age, and N stage were identified as independent prognostic factors (P < 0.05) (Figure 5C). Specifically, these factors were associated with adverse outcomes (HR > 1). These results were subsequently used to construct a nomogram (Figure 5D). Higher total points indicated higher survival probabilities at 2, 3, and 5 years for patients with GC. Furthermore, the slope of the corresponding calibration curve closer to 1 demonstrated excellent predictive accuracy of the nomogram (Figure 5E). Additionally, the nomogram yielded the greatest net benefit in decision curve analysis, indicating its superior clinical utility (Figure 5F). In summary, this nomogram can be used to assist physicians in rapidly assessing risks based on patients’ clinical parameters and guiding the selection of individualized treatment plans.
Figure 5 Construction and Evaluation of an Independent Prognostic Model. (A) Univariate Cox regression analysis. Each row represents a factor, showing its P-value and hazard ratio (HR) within the 95% confidence interval. Red dots denote the estimated HR, and blue lines indicate the confidence intervals. An HR greater than 1 implies increased risk, while an HR less than 1 suggests reduced risk. (B) Proportional hazards (PH) assumption test. (C) Multivariate Cox regression analysis. (D) Prediction model based on clinical factors and risk scores. (E) Calibration curve comparing predicted and actual survival. (F) DCA curve illustrating clinical usefulness of the model.
Crucial Functional Pathways of Prognostic GenesBiological pathways associated with prognostic genes in the TCGA-STAD dataset were explored through GSEA. Specifically, BMP1, CPQ, PDGFD, TSPAN5, TSPAN7, TGFB2, WNT11, and LEFTY1 were significantly enriched in 48, 43, 35, 40, 30, 37, 27, and 39 pathways, respectively. The 10 most significant pathways for each gene were visualised (Figure 6A–H). Mitotic spindle and epithelial–mesenchymal transition were enriched across most prognostic genes. The eight prognostic genes were jointly enriched in 11 pathways, including TGF-beta signalling, adipogenesis, and bile acid metabolism (Table 1). These prognostic genes may play pivotal roles in GC progression by influencing these pathways, and therapeutic approaches targeting these genes or pathways could represent promising strategies for the treatment of GC.
Figure 6 GSEA enrichment analysis of prognostic genes. The result figure can be divided into three parts: The first part is the gene Enrichment Score. The horizontal axis shows the gene set to be tested sorted by gene correlation, and the vertical axis shows the corresponding Running ES. The peak of the line chart is the enrichment score of the enrichment pathway, and the genes before the peak are the core genes located in the enrichment pathway within the gene set to be tested. The second part is the hit, which uses lines to mark the genes under the gene set to be tested. The third part is the distribution map of the rank values of all genes, which by default uses the Signal2Noise algorithm. (A–H) GSEA enrichment score plots for BMP1, CPQ, PDGFD, TSPAN5, TSPAN7, TGFB2, WNT11, and LEFTY1.
Differential Immune Microenvironment, Immunotherapy Response, and Drug SensitivityWithin the TCGA-STAD dataset, the stromal score (r = 0.49) and ESTIMATE score (r = 0.3) were significantly positively correlated with the risk score (P < 0.0001). In addition, both scores were markedly higher in the HRG (P < 0.0001). In contrast, tumour purity was significantly higher in the LRG and showed a marked inverse correlation with the risk score (r = –0.3) (P < 0.0001). A significant difference in the immune score between the HRG and LRG was also observed along with a strong correlation between the immune score and risk score (Figure 7A). Moreover, the infiltration of seven immune cell types, including monocytes and neutrophils, differed significantly between the HRG and LRG (P < 0.05) (Figure 7B). Among these, monocytes and resting mast cells were significantly positively correlated with the risk score (r > 0.1, P < 0.05), whereas T follicular helper cells and activated CD4 memory T cells exhibited significant negative correlations with the risk score (|r| > 0.2, P < 0.001) (Figure 7C). In addition, notable associations were observed between specific immune cell types and prognostic genes (Figure 7D). For example, the strongest positive correlation was found between TSPAN7 and monocytes (r = 0.27), whereas the strongest negative correlation was observed between TSPAN7 and activated CD4 memory T cells (r = −0.3) (P < 0.05). Overall, the immune microenvironment differed between the two groups, highlighting the potential of modulating it as a strategy to influence GC progression. Moreover, the complex associations between prognostic genes and the immune microenvironment might play a crucial role in the risk stratification of patients with GC, offering new targets for personalised therapeutic approaches. Furthermore, the response to immunotherapy varied between risk groups, with lower effectiveness observed in the HRG (21%) compared to the LRG (30%) (Figure 7E). Among the 38 IC genes, 20 showed significant differences in expression between the HRG and LRG (P < 0.05) (Figure 7F).
Figure 7 Continued.
Figure 7 Conitnued.
Figure 7 Immune microenvironment and immunotherapy response analyses. (A) Differences in immune microenvironment scores between high and low-risk groups. Pink represents the high-risk group, and turquoise represents the low-risk group. The black line in the scatter plot represents the regression line, and the shaded area indicates the 95% confidence interval. (B) Box plot of immune infiltration differences. The horizontal axis represents the correlation coefficient, and the vertical axis represents the immune cell types. The color and line thickness in the figure represent different p-values and correlation strengths: the color indicates the magnitude of the p-value, where purple represents p < 0.001, blue represents p < 0.05, and light blue represents p > 0.05. The size of the dots represents the absolute value of the correlation coefficient, with dots smaller than 0.3 being smaller in size. (C) Correlation between risk scores and differentially infiltrating immune cells. Asterisks indicate significant differences between the two groups, and “ns” indicates no significant difference. The symbols in the figure represent the significance levels: * p < 0.05, ** p < 0.01, **** p < 0.0001. (D) Correlation between prognostic genes and differentially infiltrating immune cells. The color from blue to red represents the correlation coefficient from −1 to +1, and non-significant correlations are left blank in the figure. (E) TIDE algorithm predictions for immunotherapy response. The red part represents the probability of immunotherapy failure, and the gray part represents the probability of immunotherapy success. (F) Differential expression of immune checkpoint genes. Yellow represents the high-risk group, and blue represents the low-risk group. The abscissa is the name of the immune checkpoint, and the ordinate is the expression level. “ns” indicates that there is no significant difference in the expression of the immune checkpoint between the high-risk and low-risk groups, while “*” represents a significant difference. (G) Correlation between immune checkpoint genes and prognostic genes. Non-significant correlation coefficients are left blank in the figure. (H) Correlation between risk scores and immune checkpoint expression levels. Each point represents an immune checkpoint. The horizontal axis shows the correlation coefficient, and the vertical axis shows the name of the immune checkpoint. The color of the points indicates different intervals of p-values: dark purple represents p < 0.001, and light blue represents p > 0.05. The size of the points represents the absolute value of the correlation coefficient, where a larger size indicates a stronger correlation. (I) Differences in IC50 values of drugs between high- and low-risk groups. (J) Correlation between prognostic genes and chemotherapy drug responses. (K) Scatter plot of the correlation between TSPAN7 and CNDAC drug, and box plot of high and low expression groups. (L) Scatter plot of the correlation between TGFB2 and CNDAC drug, and box plot of high and low expression groups. (M) Scatter plot of the correlation between PDGFD and CNDAC drug, and box plot of high and low expression groups.
These 38 IC genes were all notably correlated with specific prognostic genes (P < 0.05) (Figure 7G). Particularly, CD276 and BMP1 exhibited the strongest positive correlation (r = 0.65), whereas IDO2 and WNT11 exhibited the strongest negative correlation (r = −0.22) (P < 0.05). Additionally, among the 27 IC genes positively correlated with the risk score, NRP1 and CD200 showed stronger correlations (r > 0.6, P < 0.001), whereas only TNFSF9 showed a notable negative correlation (r = 0.11, P < 0.001) (Figure 7H). These results suggest that prognostic genes might influence the immunotherapy response in GC by modulating specific IC gene expression, representing potential targets for personalised treatment. Further research into these interactions could enhance our understanding of immune regulatory mechanisms within the GC microenvironment.
Regarding therapeutic drugs, the IC50 values of 119 agents differed significantly between the two risk groups (P < 0.05). The 10 drugs with the most significant changes in IC50 values were visualised. Among them, IC50 values of erlotinib_1168 and SCH772984_1564 were higher in the HRG, suggesting that lower doses may be more effective in low-risk patients. These two drugs exhibited strong positive correlations with most prognostic genes, specifically CPQ (r > 0.5, P < 0.0001). In contrast, IC50 values of eight drugs, including BMS-754807_2171 and JQ1_2172, were lower in the HRG, indicating that lower doses might be sufficient for therapeutic effects in these patients (Figure 7I). These eight drugs exhibited significant negative correlations with most prognostic genes, with the strongest negative correlation observed between JQ1_2172 and CPQ (r = −0.61, P < 0.0001) (Figure 7J). These differences in drug sensitivity may result from drug metabolism mechanisms influenced by specific prognostic genes across risk groups in GC.
The results of drug sensitivity analysis showed that among the 7 genes studied, TSPAN7 exhibited the most significant drug association, being significantly correlated with the sensitivity to 8 anticancer drugs (p<0.05) (Figure 7K). TGFB2 showed significant associations with 3 drugs, and PDGFD was only significantly associated with 1 drug (Figure 7L and M). However, no significant correlations were found between BMP1, CPQ, TSPAN5, WNT11, and LEFTY1 and any of the tested drugs. It is worth noting that although the box plots of some genes (such as PDGFD) showed a trend of differences between groups, these differences did not reach statistical significance (p>0.05), possibly due to the limitation of sample size (only 29 samples in each group) and the high variability (standard deviation >50%) of IC50 values for some drugs. These results suggest that TSPAN7 may be a potential regulator of drug sensitivity and warrants further investigation.
Differential Somatic Mutation and CNV Profiles and the Elaborate Molecular Regulatory Network of Prognostic GenesThe somatic mutation profile was elucidated in the TCGA-STAD dataset. Specifically, TP53 (50%) and TTN (58%) were identified as the genes with the highest mutation rates in the HRG and LRG, respectively. Additionally, the most common mutation types were missense mutations and multiple hits (Figure 8A and B). Notably, prognostic genes likely influence the tumour mutation profile by regulating the expression of specific TMB genes. These findings may help elucidate the potential effects of prognostic genes in patients with tumour mutations and aid in the development of personalised treatments, guiding prognostic assessments in GC.In addition, the CNV profile of each prognostic gene was visualised. Only BMP1 and CPQ exhibited low-level copy number amplification in a small subset of control samples, whereas no CNVs were observed for the remaining prognostic genes (Figure 8C). In GC samples and two risk groups, all eight prognostic genes exhibited CNVs to varying extents, with low-level copy number amplification being the most frequent, followed by single-copy deletions (Figure 8D–F). These findings suggest that prognostic genes may influence risk stratification and the progression of GC by altering specific CNVs.
Figure 8 Somatic mutations, CNVs analysis, and molecular regulatory network. (A and B) Waterfall plots showing genomic mutations in high- and low-risk groups. The abscissa represents genes, and the ordinate represents samples. The color of each square represents different mutation types. The bar chart at the top shows the mutation frequency of each gene across all samples, and the bar chart on the right indicates the number of mutated genes in each sample. (C–F) Copy number variation profiles of prognostic genes in control, GC, high-risk, and low-risk group. Copy number variations are classified into five categories: homozygous deletion, single-copy deletion, diploid normal copy, low-level copy number amplification, or high copy number amplification, which are represented by five colors respectively. The figure shows the proportion of samples with variations in each prognostic gene to the total number of samples. (G) Molecular regulatory network involving lncRNAs, miRNAs, and prognostic genes. Orange nodes are prognostic genes, green nodes are miRNAs, and blue nodes are lncRNAs.
A regulatory network was constructed for prognostic genes (Figure 8G). Specifically, the lncRNA‒miRNA–mRNA regulatory network comprised two prognostic genes (TGFB2 and WNT11), 13 miRNAs, and 27 lncRNAs, demonstrating the complex regulatory interactions among them. These two genes are regulated by specific miRNAs (eg, hsa-miR-148a-3p) and lncRNAs (eg, NEAT1), suggesting possible upstream regulatory mechanisms. These findings may guide the development of therapeutic strategies targeting GC.
Validation of the Expression of Prognostic GenesCompared with the control group, BMP1, LEFTY1, and TGFB2 were significantly upregulated in the GC group (P < 0.05), whereas TSPAN5 was significantly downregulated (P < 0.001) (Figure 9A–D). The expression trends of these genes were consistent with those observed in the TCGA-STAD dataset, reinforcing the reliability of the findings. Furthermore, the significant differential expression of these prognostic genes between the control and GC groups supports their prognostic value in GC. Although PDGFD expression tended to decrease in the GC group, the difference was not statistically significant, which may be attributed to the limited sample size (Figure 9E).
Figure 9 Verification of prognostic gene expression, with the samples being 5 GC tissue samples and 5 control tissue samples. (A) The expression level of BMP1 showed that BMP1 was significantly upregulated in the gastric cancer group (*P < 0.05). (B) The LEFTY1 expression level: LEFTY1 was significantly upregulated in the gastric cancer group (* indicates P < 0.05). (C) The TGFB2 expression level: TGFB2 was significantly upregulated in the gastric cancer group (**P < 0.01). (D) The expression level of TSPAN5 showed that TSPAN5 was significantly downregulated in the gastric cancer group (***P < 0.001). (E) The PDGFD expression level: There was no significant difference in PDGFD in the gastric cancer group (ns = not significant).
DiscussionGC is a malignant tumor with diverse etiologies and a high mortality rate.1 Current research indicates the important role of migrasomes in cell migration and communication within the tumor microenvironment.34 Although there has not yet been research on migrasomes in gastric cancer, the role of migrasomes in tumors may play a significant role in the carcinogenesis of gastric cancer. This study identified prognostic genes associated with migrasomes (BMP1 and TSPAN7) using transcriptome sequencing data and bioinformatics methods, and constructed a new prognostic model for gastric cancer. The study confirmed that MRGs have prognostic and immunological value in gastric cancer, providing a scientific basis for the prognostic diagnosis and immunotherapy of gastric cancer.
In this study, we highlighted the significance of MRGs in GC, elucidating their prognostic relevance. A total of eight prognostic genes were identified—BMP1, PDGFD, TSPAN5, TSPAN7, TGFB2, WNT11, CPQ, and LEFTY1—underscoring the complex interactions between these biomolecules and migrasome-related mechanisms.
BMP1, an extracellular metalloproteinase of the astacin M12A family, regulates TGF-β and BMP signalling pathways and plays dual roles in gastrointestinal tumours by facilitating tumour progression and invasion. Elevated BMP1 levels in gastric cancer tissues suggest its potential as a biomarker for tumour aggressiveness and a therapeutic target, thereby deepening our understanding of GC progression.35 BMP1 activates the TGF-β and BMP signaling pathways through proteolytic cleavage, which plays a dual role in the occurrence and development of gastrointestinal tumors. With the help of BMP1, TGF-β promotes the invasion and metastasis of GC, so the upregulation of BMP1 may increase the aggressiveness of cancer in GC.36
Research has demonstrated a significant association between PDGFD and GC, with higher PDGFD expression correlating with poorer prognosis and survival outcomes.37,38 PDGFD likely contributes to GC progression through the circ-0007707/miR-429/PDGFD axis, which influences immune-related gene characteristics and tumour development.39 Specifically, circ-0007707 may function as a competing endogenous RNA, upregulating PDGFD and promoting tumour cell proliferation and survival, thereby increasing GC aggressiveness.40 This axis also implicates PDGFD in immune evasion and tumour growth.41,42 Collectively, the circ-0007707/miR-429/PDGFD axis highlights key molecular mechanisms in GC and may offer promising therapeutic targets. Research has shown that PDGFD is highly expressed in gastric cancer tissues and promotes the proliferation, migration, and invasion of gastric cancer cells by activating the PI3K/Akt signaling pathway, exerting a pro-cancer effect.39 However, this study found through RT-qPCR detection that the expression level of PDGFD in gastric cancer tissues is low compared to adjacent normal tissues. This difference may stem from the following reasons: first, technical factors such as primer design failing to cover all splice variants of PDGFD or detection bias caused by tumor heterogeneity;43 second, PDGFD may have stage-specific functions, playing different roles at various stages of tumorigenesis;44 furthermore, stromal cells in the tumor microenvironment may exert effects through paracrine mechanisms.45 These findings suggest that PDGFD may have a more complex regulatory network in gastric cancer, and its exact mechanisms of action still need to be further elucidated through methods such as protein level detection and spatial transcriptomics analysis.
TSPAN5, a member of the tetraspanin superfamily, plays a significant role in GC biology by regulating cell signalling and modulating the tumour microenvironment.46,47 Recent studies have emphasised the importance of tetraspanins in tumour biology through their impact on cellular communication and microenvironmental interactions.48,49 The research results indicate that the expression of TSPAN5 is negatively correlated with the overall survival rate of patients, acting as a tumor suppressor in the stomach to control tumor growth in GC.50
TSPAN7, another member of the tetraspanin superfamily, displays variable expression in cancers such as gastric, liver, and colorectal cancer.51–53 Although tetraspanins influence cell adhesion and signalling, the specific role of TSPAN7 in metastasis remains underexplored.54,55 Interestingly, although TSPAN7 is typically downregulated in these cancers, suggesting a potential tumour suppressor function, its expression is elevated in patients with GC and bone metastases, indicating a possible role in promoting metastatic spread.56,57 These findings highlight TSPAN7 as a potential biomarker for metastatic progression in GC, warranting further mechanistic studies to uncover its therapeutic relevance.
Our findings also underscore the pivotal role of TGFB2 in GC progression and metastasis. Elevated TGFB2 expression is associated with poor prognosis, reinforcing its value as a prognostic biomarker. The TGFβ signalling pathway, particularly the TGFβ/EPB41L5/p120-catenin axis, promotes metastasis through TGFB2 release and receptor-mediated intracellular signalling.58–60 Notably, TGFB2 expression levels vary significantly among GC subtypes, suggesting that patients with high TGFB2 expression could benefit from targeted therapy.61–63 Additionally, downregulation of TGFB2 by miR-618 inhibits GC metastasis, consistent with prior studies on the regulatory role of miR-61 in TGFB2 modulation.64–66 Additionally, transcriptional regulation of TGFB2 by HOXA10 adds another level of complexity to GC metastasis. Therefore, targeting the TGFB2/Smad/METTL3 axis may interfere with epithelial–mesenchymal transition, a key process in cancer metastasis.67,68 Altogether, our results highlight the critical role of TGFB2 in GC and suggest promising avenues for targeted therapy aimed at improving patient outcomes.
WNT11 is increasingly expressed in GC, indicating its key role in tumour development through the WNT signalling pathway, particularly via the RSPO2–LGR5 complex, which regulates stem cell traits and tumour progression.69–71 Our variance analysis revealed significantly elevated WNT11 expression in GC tissues compared with adjacent nontumour tissues, s
Comments (0)