The global burden and trends of esophageal cancer caused by smoking among men from 1990 to 2021 and projections to 2040: An analysis of the Global Burden of Disease 2021

Data resources

The Global Burden of Disease (GBD) 2021 project is an international effort to quantify the burden of disease in 204 countries and territories over the period 1990–2021 [15, 16]. The data for this study were extracted from the GBD 2021 database (Global Health Data Exchange: https://ghdx.healthdata.org/gbd-2021), focusing on the following indicators of smoking-attributable esophageal cancer (SCEC): the number of deaths and the age-standardised mortality rate (ASMR), and the number of disability-adjusted life years (DALYs) and the age-standardised DALY rate (ASDR). All rates are reported per 100,000 population.

GBD 2021 uses the Comparative Risk Assessment (CRA) framework to calculate the burden attributable to smoking [15]. The process has three steps: first, global smoking exposure data (prevalence, intensity) are integrated with evidence on the relative risk (RR) of esophageal cancer (EC); second, the Population Attributable Fraction (PAF) is calculated; third, the PAF is applied to the total EC burden to derive smoking-attributable deaths and DALYs. A comprehensive overview of the methodology can be found in the GBD core literature [15].

The Socio-Demographic Index (SDI) is a composite indicator of a nation's developmental status that combines per capita income, educational attainment, and total fertility rate. The SDI ranges from 0 (lowest level of development) to 1 (highest level of development). The 204 countries and territories are categorised into five groups based on their SDI values: high (0.805–1), medium–high (0.690–0.805), medium (0.608–0.690), medium–low (0.455–0.608), and low (0–0.455) [15].

A known limitation of the GBD framework for EC is that the estimates of smoking-attributable burden are not disaggregated by histological subtype (squamous cell carcinoma vs. adenocarcinoma). The results presented in this study therefore represent a composite of both subtypes, with the understanding that the vast majority of the smoking-attributable burden, particularly among males in high-burden regions such as East Asia, is likely driven by esophageal squamous cell carcinoma (ESCC).
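The three CRA steps described above can be illustrated with a deliberately simplified sketch. It assumes a single dichotomous exposure (smoker vs. non-smoker) and uses Levin's formula for the PAF; GBD itself works with more detailed exposure distributions and risk curves, so the functions `levin_paf` and `attributable_burden` below are hypothetical names for illustration and not the GBD implementation.

```python
def levin_paf(prevalence: float, relative_risk: float) -> float:
    """Levin's PAF for a dichotomous exposure: p(RR - 1) / (p(RR - 1) + 1).

    Assumption: one exposure category only; this is a simplification
    of the CRA approach, not the GBD calculation.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (excess + 1.0)


def attributable_burden(total_burden: float, paf: float) -> float:
    """Step three: apply the PAF to the total burden (deaths or DALYs)."""
    return total_burden * paf


if __name__ == "__main__":
    # Made-up inputs: 50% smoking prevalence, RR = 3, 1,000 EC deaths.
    paf = levin_paf(prevalence=0.5, relative_risk=3.0)
    print(paf, attributable_burden(1000.0, paf))
```

The numbers in the usage example are invented for illustration and do not come from the GBD database.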

The estimates provided by the GBD study are based on annual data. However, to enhance the stability of estimates and mitigate the impact of year-to-year volatility, particularly for causes of death and diseases with sparse data, the GBD modeling framework employs spatial–temporal Gaussian process regression and other smoothing algorithms. This approach borrows strength across time and from geographically or demographically similar locations to produce more robust estimates [15, 16].

To handle missing and unreliable data, which are more prevalent in low-SDI regions, the GBD study employs a comprehensive and standardized modeling strategy. This strategy does not simply exclude areas with poor data quality. Instead, it uses all available data (vital registration, verbal autopsy, cancer registry, and published study data) and processes them through statistical models, namely the Cause of Death Ensemble model (CODEm) for mortality and the Bayesian meta-regression tool DisMod-MR 2.1 for non-fatal outcomes, to generate complete and comparable estimates for all locations. These models explicitly account for data bias and non-sampling error, thereby producing estimates that are corrected for incompleteness and misclassification [15, 16]. Consequently, the estimates for low-SDI regions presented in our analysis are not raw reported data but model-based, cross-country comparable estimates that reflect the GBD study's best effort to address data gaps.

Ethics statement

The Chongqing Medical University Institutional Review Board (IRB) approved a waiver of the informed consent requirement on the grounds that only publicly available, aggregated data containing no personally identifiable information were used. The University of Washington IRB authorised the use of the de-identified GBD 2021 dataset and waived the requirement for informed consent. The study was conducted in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement.

Descriptive and trend analyses

All statistical analyses and data visualizations were performed using R software (version 4.3.3; R Foundation for Statistical Computing, Vienna, Austria). The global burden of smoking-related EC from 1990 to 2021 was described at the global, regional, and country levels and by SDI quintile. The core indicators were the number of DALYs, the ASDR, the number of deaths, and the ASMR attributable to smoking-related EC. Trends over time were assessed using the Estimated Annual Percentage Change (EAPC) and its 95% confidence interval (CI). These were derived from a linear regression model, y = α + βx + ε, where y denotes the natural logarithm of the age-standardised rate (ASR), x the calendar year [17], and ε the error term. The EAPC was calculated as 100 × (exp(β) − 1), and its 95% CI was obtained from the fitted model [18]. Trends were classified as follows: a downward trend was identified when both the EAPC estimate and the upper limit of its 95% CI were less than 0; an upward trend was identified when both the EAPC estimate and the lower limit of its 95% CI were greater than 0; and a stable trend was identified when the 95% CI of the EAPC contained zero.
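As an illustration of the EAPC calculation described above, the following minimal Python sketch fits the log-linear regression by ordinary least squares and converts the slope to a percentage change. The analysis in this study was run in R; the function names `eapc` and `classify_trend`, and the default critical value `t_crit = 2.042` (the t quantile for 30 degrees of freedom, i.e. 32 annual points), are assumptions made for this sketch.

```python
import math


def eapc(years, rates, t_crit=2.042):
    """Return (EAPC, lower, upper) in percent from y = alpha + beta * x.

    y is ln(rate); t_crit is an assumed critical value for the 95% CI
    (2.042 corresponds to 30 degrees of freedom).
    """
    n = len(years)
    if n < 3 or n != len(rates):
        raise ValueError("need at least 3 paired observations")
    y = [math.log(r) for r in rates]
    x_mean = sum(years) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(years, y))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    # Residual sum of squares and standard error of the slope.
    sse = sum((v - (alpha + beta * x)) ** 2 for x, v in zip(years, y))
    se_beta = math.sqrt(sse / (n - 2) / sxx)

    def to_percent(b):
        return 100.0 * (math.exp(b) - 1.0)

    return (to_percent(beta),
            to_percent(beta - t_crit * se_beta),
            to_percent(beta + t_crit * se_beta))


def classify_trend(lower, upper):
    """Apply the decision rule: the CI must exclude zero for a trend."""
    if upper < 0:
        return "decreasing"
    if lower > 0:
        return "increasing"
    return "stable"


if __name__ == "__main__":
    # Synthetic series declining by exactly 2% per year (not GBD data).
    yrs = list(range(1990, 2022))
    asr = [10.0 * 0.98 ** (yr - 1990) for yr in yrs]
    estimate, low, high = eapc(yrs, asr)
    print(round(estimate, 3), classify_trend(low, high))
```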

Joinpoint analysis and AAPC

To identify the key turning points (joinpoints) in the burden of smoking-associated EC during 1990–2021 and to characterise the trend within each stage, Joinpoint regression analysis was performed using the Joinpoint Regression Program (version 5.1.0.0, April 2024; National Cancer Institute, Division of Statistical Research and Applications, Bethesda, MD, USA) [19]. This method divides the study period into consecutive segments separated by joinpoints identified in the time-series data [20]. It then calculates the annual percentage change (APC) and its 95% confidence interval (95% CI) within each segment to quantify the pattern of change in disease burden over time. The average annual percentage change (AAPC) and its 95% CI were calculated to summarise the overall trend. An upward trend was determined if both the lower and upper limits of the 95% CI of the AAPC were greater than 0; a downward trend was determined if both were less than 0; and the trend was considered stable if the 95% CI included 0.
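The relation between segment-level APCs and the AAPC can be sketched as follows. The sketch assumes the commonly used definition in which the AAPC is the back-transformed, length-weighted average of the segment slopes on the log scale; it does not reproduce the joinpoint selection or the confidence interval calculation of the Joinpoint Regression Program. The function name `aapc` and the segment values are hypothetical.

```python
import math


def aapc(segments):
    """Length-weighted AAPC (percent) from joinpoint segments.

    segments: list of (start_year, end_year, apc_percent).
    Each APC is converted to a log-scale slope ln(1 + APC/100),
    averaged with weights equal to segment length, then back-transformed.
    """
    total_years = 0.0
    weighted_slope = 0.0
    for start, end, apc in segments:
        width = end - start
        if width <= 0:
            raise ValueError("each segment must have positive length")
        weighted_slope += width * math.log(1.0 + apc / 100.0)
        total_years += width
    return 100.0 * (math.exp(weighted_slope / total_years) - 1.0)


if __name__ == "__main__":
    # Illustrative segments only; not results from this study.
    print(aapc([(1990, 2000, 21.0), (2000, 2010, 0.0)]))
```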

Age-period-cohort analysis

The age-period-cohort (APC) model describes time trends in disease incidence or mortality by age, period, and birth cohort. Because the three variables are exactly linearly dependent (cohort = period − age), their effects cannot be estimated independently without additional constraints [23]. The age-period-cohort analysis was conducted using the web-based tool provided by the US National Cancer Institute (https://analysistools.nci.nih.gov/apc/) [21, 22]. In this study, the APC model was applied to differentiate and quantify the effects of age, period, and birth cohort on the ASMR and ASDR for smoking-related EC. The analysis was performed separately for each SDI quintile. The three effects defined in the model are: age effects, reflecting the inherent changes in disease risk due to the physiological aging process at a given age; period effects, capturing external environmental changes, policy interventions, or improvements in diagnostic technology that affect all age groups in a given year; and cohort effects, reflecting the long-term health consequences of cumulative exposure to a specific health risk factor (e.g., smoking) throughout the life course of people from the same birth generation. The APC model uses a log-linear Poisson regression framework with the basic form ln E(r_ij) = ln(θ_ij / N_ij) = μ + α_i + β_j + γ_k, where E(r_ij) denotes the expected disease rate for the ith age group and jth period group, θ_ij the expected number of events, and N_ij the population at risk in that group. μ is the intercept term (or base level) of the model; α_i represents the fixed effect for the ith age group; β_j represents the fixed effect for the jth period group; and γ_k represents the fixed effect for the kth birth cohort group [18].
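The identifiability problem mentioned above can be made concrete with a small sketch. It assumes a regular age-by-period table in which the cohort index is k = j − i + (number of age groups − 1), and shows that adding a linear trend to the age and cohort effects while subtracting it from the period effects leaves every fitted log rate unchanged. The function names and effect values are hypothetical and unrelated to the NCI tool's internals.

```python
def linear_predictor(mu, age_fx, period_fx, cohort_fx):
    """Log rates mu + alpha_i + beta_j + gamma_k on an age-by-period grid.

    Assumes equal-width age and period groups, so the cohort index is
    k = j - i + (n_age - 1) with zero-based i and j.
    """
    n_age, n_period = len(age_fx), len(period_fx)
    if len(cohort_fx) != n_age + n_period - 1:
        raise ValueError("need n_age + n_period - 1 cohort effects")
    return [[mu + age_fx[i] + period_fx[j] + cohort_fx[j - i + n_age - 1]
             for j in range(n_period)]
            for i in range(n_age)]


def shift_effects(mu, age_fx, period_fx, cohort_fx, c):
    """Re-allocate a linear trend of size c between the three effects."""
    n_age = len(age_fx)
    return (mu - c * (n_age - 1),
            [a + c * i for i, a in enumerate(age_fx)],
            [p - c * j for j, p in enumerate(period_fx)],
            [g + c * k for k, g in enumerate(cohort_fx)])


if __name__ == "__main__":
    params = (-8.0, [0.0, 0.5, 1.1], [0.0, -0.1], [0.3, 0.2, 0.0, -0.2])
    original = linear_predictor(*params)
    shifted = linear_predictor(*shift_effects(*params, c=0.7))
    # Both parameter sets give the same fitted log rates.
    print(original)
    print(shifted)
```

Because two different sets of effects reproduce the same rates, APC tools report estimable functions or impose constraints rather than raw effects.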

Relationship between SDI and ASR

Spearman's rank correlation coefficient (r) was used to assess the strength and direction of the monotonic association between SDI and the ASMR and ASDR of smoking-related EC across regions and countries [18].
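For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on ranks. The following standard-library sketch makes that explicit, assuming tied values receive average ranks; the study itself used R, and the function names and data here are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # mean of the tied positions
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Made-up SDI values and rates: monotonic but clearly non-linear.
    sdi = [0.2, 0.4, 0.6, 0.8]
    rate = [1.0, 4.0, 9.0, 100.0]
    print(spearman_rho(sdi, rate))
```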

Cross-country inequality analysis

Monitoring health inequalities in the burden of smoking-associated EC is essential for developing evidence-based policies to mitigate observable disparities. In this study, a cross-country inequality analysis was conducted using two standardised metrics, the slope index of inequality (SII) and the concentration index (CI), to assess absolute and relative gradients of inequality, respectively. The objective was to quantify the association between smoking exposure across countries and EC ASMR and ASDR, together with the corresponding socioeconomic gradient [24, 25]. The SII, as a measure of absolute inequality, quantifies the absolute difference in burden between the populations with the highest and lowest levels of exposure. It is calculated by weighted least squares regression of countries' EC ASMRs on a relative positional scale related to smoking exposure (defined as the mid-point of the cumulative population range when countries are ranked by level of smoking exposure). The concentration index, as a measure of relative inequality, was obtained by numerical integration of the area under the Lorenz concentration curve, which plots the cumulative share of EC ASR against the cumulative proportion of the population ranked by smoking exposure level. In this analysis, negative SII or concentration index values were interpreted as indicating that higher levels of smoking exposure correspond to higher ASRs for EC, and larger absolute values indicate greater inequality.
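The two metrics can be sketched in a few lines under stated assumptions: countries are ranked in ascending order of a ranking variable, the SII is the population-weighted least squares slope of the rate on the cumulative-population midpoint, and the concentration index is one minus twice the trapezoidal area under the concentration curve. The function name `inequality_metrics`, the tuple layout, and the sign convention are assumptions of this sketch and may differ from the software used in the study.

```python
def inequality_metrics(countries):
    """Return (SII, concentration index).

    countries: list of (rank_value, population, rate) tuples, where
    rank_value is the variable used to order countries (assumed ascending).
    """
    ordered = sorted(countries, key=lambda c: c[0])
    total_pop = sum(pop for _, pop, _ in ordered)
    total_burden = sum(pop * rate for _, pop, rate in ordered)

    midpoints, cum_pop, cum_burden, area = [], 0.0, 0.0, 0.0
    for _, pop, rate in ordered:
        share = pop / total_pop
        midpoints.append(cum_pop + share / 2.0)  # relative position
        next_burden = cum_burden + pop * rate / total_burden
        # Trapezoid under the concentration curve for this country.
        area += share * (cum_burden + next_burden) / 2.0
        cum_pop += share
        cum_burden = next_burden

    weights = [pop for _, pop, _ in ordered]
    rates = [rate for _, _, rate in ordered]
    x_bar = sum(w * x for w, x in zip(weights, midpoints)) / total_pop
    y_bar = sum(w * y for w, y in zip(weights, rates)) / total_pop
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, midpoints, rates))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, midpoints))
    return sxy / sxx, 1.0 - 2.0 * area


if __name__ == "__main__":
    # Invented data: three equally sized countries with rising rates.
    print(inequality_metrics([(0.1, 100, 1.0), (0.2, 100, 2.0), (0.3, 100, 3.0)]))
```

With this convention, a burden concentrated among low-ranked countries yields negative values of both metrics.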

Decomposition analysis

In this study, the Das Gupta decomposition method was used to analyse changes in the global burden of smoking-related EC deaths and DALYs over 1990–2021, with a focus on men in the five SDI regions. The analysis considered three components: the age structure of the population, population growth, and epidemiological change [15, 26]. These are defined as follows: (i) epidemiological change: the role of health policies, medical innovations, and interventions in shaping age-specific patterns of morbidity and mortality; (ii) the population growth effect: the multiplier effect of an expanding total population on the absolute burden of disease when age-specific rates are held stable; and (iii) the aging effect: the cumulative effect of the continued rise in the proportion of older people in the population on the burden of chronic and degenerative diseases, whose risk increases steeply with age. The contribution of each factor to the change in the number of deaths and DALYs from 1990 to 2021 was determined by varying that factor while holding the others constant.
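A minimal sketch of a three-factor decomposition in the spirit of Das Gupta is given below. It assumes the burden can be written as total population × Σ (age share × age-specific rate) and averages each factor's contribution over the possible states of the other two factors, so that the three effects sum exactly to the total change. The function names and the input values are hypothetical; the study's own implementation may differ in detail.

```python
def expected_count(population, age_shares, age_rates):
    """Burden = population * sum over ages of (age share * age-specific rate)."""
    return population * sum(s * r for s, r in zip(age_shares, age_rates))


def das_gupta_three_factor(base, end):
    """Decompose the change in burden between two time points.

    base, end: tuples (population, age_shares, age_rates).
    Each effect averages the change from switching one factor while the
    other two are held at base/end values (weights 1/3 and 1/6).
    """
    def f(choice):
        args = [end[i] if choice[i] else base[i] for i in range(3)]
        return expected_count(*args)

    effects = []
    for t in range(3):
        u, v = [i for i in range(3) if i != t]

        def delta(cu, cv):
            hi, lo = [0, 0, 0], [0, 0, 0]
            hi[t] = 1  # factor t switched to its end value
            hi[u] = lo[u] = cu
            hi[v] = lo[v] = cv
            return f(hi) - f(lo)

        effects.append((delta(0, 0) + delta(1, 1)) / 3.0
                       + (delta(0, 1) + delta(1, 0)) / 6.0)
    names = ("population_growth", "aging", "epidemiological_change")
    return dict(zip(names, effects))


if __name__ == "__main__":
    # Invented two-age-group example, not GBD data.
    start = (1000.0, [0.7, 0.3], [0.001, 0.010])
    finish = (1500.0, [0.6, 0.4], [0.0008, 0.009])
    print(das_gupta_three_factor(start, finish))
```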

Predictive analyses

The preceding analyses describe the burden of smoking-related EC over the past three decades. To support the development of more effective public health and tobacco control policies worldwide, the burden of smoking-related EC was further projected to 2040 using the Bayesian age-period-cohort (BAPC) model [27]. The model was implemented with the following key assumptions: (1) smoothness over the age, period, and cohort dimensions was enforced through random walk priors; (2) future trends were projected based on the continuation of the most recent period and cohort effects; and (3) the model accounted for overdispersion in the count data. Convergence of the Markov Chain Monte Carlo (MCMC) algorithms was assessed using the Gelman-Rubin diagnostic, requiring the potential scale reduction factor (R-hat) to be below 1.1 for all key parameters. The influence of key model settings (e.g., choice of prior distribution, MCMC convergence threshold) on the projections was assessed through sensitivity analysis, and projections of future burden are reported with 95% uncertainty intervals (UIs). The projection results were visualised in R (version 4.3.3).

To assess the robustness of our projection results and the influence of key model parameters, we conducted an extensive sensitivity analysis. This involved running the BAPC model under different prior distributions for the random walk components of the age, period, and cohort effects (e.g., varying the precision/smoothness parameters). We also tested the impact of the forecast function’s settings on the long-term stability of the projections.

The convergence of the Markov Chain Monte Carlo (MCMC) sampling algorithms was rigorously monitored. For each model run, we used a minimum of 50,000 iterations with a burn-in period of 10,000 iterations. Convergence was confirmed by ensuring that the Gelman-Rubin diagnostic (potential scale reduction factor, R-hat) was below 1.1 for all key parameters and by visually inspecting the trace plots for stable mixing and the absence of trends. The final model parameters and priors were selected based on the convergence diagnostics and the model's goodness-of-fit to the historical data, as evaluated by the Deviance Information Criterion (DIC). The predictions are presented with 95% uncertainty intervals (UIs) derived from the posterior distributions of the MCMC simulations.
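The convergence criterion above can be illustrated with a basic version of the Gelman-Rubin statistic for a single parameter. This sketch assumes equal-length chains and uses the classic between-chain and within-chain variance formula without split chains or rank normalisation; the function name `gelman_rubin` and the toy chains are illustrative only.

```python
import math


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    chains: list of equal-length lists of posterior draws.
    W is the mean within-chain variance; B is the between-chain variance
    scaled by chain length; R-hat = sqrt(((n-1)/n * W + B/n) / W).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    # Toy chains: identical chains agree; a shifted chain does not.
    chain = [0.0, 1.0] * 50
    print(gelman_rubin([chain, chain]) < 1.1)
    print(gelman_rubin([chain, [x + 10.0 for x in chain]]) < 1.1)
```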

Use of R packages

The following R packages were utilized for specific analyses: the ‘bayesAPC’ package for Bayesian age-period-cohort modeling; the ‘ggplot2’ package (version 3.5.0) for generating figures; and the ‘stats’ package (version 4.3.3) for core statistical computations including Spearman's correlation and linear regression for EAPC calculation.
