This cross-sectional study used the China Migrants Dynamic Survey (CMDS) 2017 database, administered by the Migrant Population Service Centre, National Health Commission, P. R. China. The database covers 31 provinces in mainland China and the 12 months between March 2016 and March 2017 [17]. Because the survey focuses on rural-to-urban migrants, participants are limited to migrants over 15 years old who had lived in a city outside their family registration region for over one month. In total, 169,989 participants were sampled using the multi-stage probability-proportionate-to-size method, based on population sizes at five administrative levels: provinces, cities, districts, townships, and villages.
Eligible samples in this study were selected based on the following requirements: (1) migrant workers aged between 18 and 60 who had lived in their current city for more than one year; and (2) migrant workers with available demographic, diarrhea, and accommodation data. Figure 1 shows the flow chart of the selection process against these criteria.
Fig. 1 Flow chart of the sample selection process
Approximately 19% of the respondents were excluded because their migration duration was less than one year at the time of the survey. This criterion was chosen to ensure that the study population had been settled in urban accommodations for more than one year and did not consist of seasonal workers. It also ensures that the diarrhea episodes reported in the survey can be linked to migrant workers’ urban accommodations. The age criterion was chosen to exclude potentially vulnerable age groups (3.8% excluded), and the missing data criterion was chosen to ensure data quality (1.9% excluded). The final sample comprised 127,906 of the 169,989 observations, located in 351 cities that include 1290 lower-level administrative districts.
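The reported exclusion shares can be cross-checked against the sample counts with simple arithmetic. The sketch below uses only the figures stated above and assumes that each share is expressed relative to the full sample of 169,989; the variable names are illustrative.

```python
# Figures reported in the text.
total_participants = 169_989
final_sample = 127_906

# Approximate exclusion shares reported for each criterion (percent).
# Assumption: each share is relative to the full sample.
excluded_pct = {"migration duration": 19.0, "age": 3.8, "missing data": 1.9}

retained_share = final_sample / total_participants * 100
excluded_share = 100 - retained_share
reported_excluded = sum(excluded_pct.values())

print(f"Retained share of sample: {retained_share:.1f}%")
print(f"Excluded share from counts: {excluded_share:.1f}%")
print(f"Sum of reported exclusion shares: {reported_excluded:.1f}%")
```

The two excluded shares should agree to within rounding, since the first reported share is only approximate.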
Outcome and exposure variables
The outcome variable is self-reported diarrhea (the binary answers of yes and no were collected and coded as 1 and 0). Self-reported diarrhea is widely used in developing countries to reflect foodborne diseases in social epidemiological research [18]. In the CMDS questionnaire, participants were asked if they had experienced diarrhea disease at least once during the last 12 months. The survey defines diarrhea disease as having at least three episodes of diarrhea in a day. This definition is relatively clear since it is quantified, easy to understand, and less likely to be affected by subjective understandings.
We considered demographic variables, including age, sex, marital status, family income per person, education level, and migration type (cross-province or inner-province), as independent factors at the individual level.
The accommodation types were classified by the CMDS team in their national survey questionnaire. The classification considered migrant workers’ homeownership through the private real estate market or government subsidies, accommodation-related employment benefits, and other informal living arrangements [19] (see Table 1).
Table 1 Chinese migrant workers’ accommodation types
Statistical analysis
We examined whether diarrhea prevalence is associated with accommodation types among Chinese migrant workers across different cities in China. We clustered the residential areas into three levels. The first level categorized all individuals into China's eastern, western, and central regions. These three regions divide all cities in Mainland China based on geographic location and socioeconomic status [21]. The eastern region is the most developed, consisting of 11 coastal provinces. The central region, including nine inland provinces, is less developed. The western region is China’s most underdeveloped area, covering ten provinces. The second level clusters the 1290 urban districts, which were categorized into two types: higher-income and lower-income districts. We defined a higher-income district based on the percentage of residing migrants with higher incomes. Migrants with an average family income per person exceeding 1870 Chinese Yuan per month were classified as having higher incomes, whereas those below this threshold were categorized as having lower incomes [22]. A district is designated as a higher-income district if more than 50% of its migrant population falls into the higher-income category. The third level is the individual level, which contains the types of accommodations and other individual characteristics.
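The district classification rule described above can be illustrated with a minimal sketch. The district identifiers and incomes below are hypothetical toy values, and the function name `classify_districts` is illustrative. Treating an income of exactly 1870 Yuan as lower income is an assumption, since the text only specifies incomes above and below the threshold.

```python
from collections import defaultdict

INCOME_THRESHOLD = 1870  # Chinese Yuan per person per month, as stated in the text


def classify_districts(records):
    """Label each district from (district_id, family_income_per_person) pairs.

    A district is "higher-income" when more than 50% of its migrants
    have an income above the threshold, otherwise "lower-income".
    """
    counts = defaultdict(lambda: [0, 0])  # district -> [higher-income migrants, all migrants]
    for district, income in records:
        # Assumption: an income exactly at the threshold counts as lower income.
        counts[district][0] += int(income > INCOME_THRESHOLD)
        counts[district][1] += 1
    return {
        district: "higher-income" if high / total > 0.5 else "lower-income"
        for district, (high, total) in counts.items()
    }


# Hypothetical toy data, not taken from the CMDS.
records = [
    ("D1", 2500), ("D1", 3000), ("D1", 1500),
    ("D2", 1000), ("D2", 1900), ("D2", 1200), ("D2", 1870),
]
district_types = classify_districts(records)
print(district_types)
```

In this toy data, two of the three migrants in D1 exceed the threshold, while only one of the four in D2 does.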
We applied the multilevel logistic regression method following the three-step procedure to inspect the impact of different areas and broader sociodemographic factors [23]. We analyzed the relationship between accommodation types and diarrhea outcomes using three logistic regression models, with accommodation types as the independent variables and diarrhea outcomes as the dependent variable. The dataset included 127,906 observations clustered across three regions and 1290 urban districts. Private entire-rental accommodation, the most prevalent housing type among migrants in urban China, was used as the reference category. All other accommodation types were compared to private entire-rental housing to assess their impact on reported diarrhea outcomes.
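Comparing every accommodation type against a reference category is usually implemented with indicator (dummy) coding. The sketch below shows the idea. Only the reference label comes from the text. The other category labels are hypothetical placeholders rather than the Table 1 categories, and `dummy_code` is an illustrative helper, not part of the authors' analysis.

```python
REFERENCE = "private entire-rental"  # reference category named in the text


def dummy_code(accommodation, categories, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference category.

    The reference category is represented by all indicators being 0, so each
    regression coefficient compares one category with the reference.
    """
    if accommodation not in categories:
        raise ValueError(f"unknown accommodation type: {accommodation!r}")
    return {c: int(accommodation == c) for c in categories if c != reference}


# Hypothetical category labels; only the reference label comes from the text.
categories = ["private entire-rental", "hypothetical type B", "hypothetical type C"]

print(dummy_code("private entire-rental", categories))
print(dummy_code("hypothetical type B", categories))
```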
The base Model 1 estimated the variation of diarrhea outcomes between different regions and urban districts without individual factors. We tested the regional differences, as well as the different districts nested in these three regions. Model 1 established the base of area differences for this analysis:
$$\mathrm{Logit}\left(P_i\right)=\log\left(\frac{P_i}{1-P_i}\right)=M+E_A,$$
where \(P_i\) is the probability that individual \(i\) reports diarrhea, \(M\) represents the overall mean probability on the logistic scale, and \(E_A\) represents the area-level residuals on the logistic scale.
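To make the logistic scale concrete, the sketch below converts between probabilities and log-odds and shows how an area-level residual shifts the predicted probability around the overall mean. The intercept and residual values are hypothetical and are not estimates from this study.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Probability corresponding to a value x on the logistic scale."""
    return 1 / (1 + math.exp(-x))


# Hypothetical values for illustration only.
M = -2.0  # overall mean on the logistic scale
area_residuals = {"area A": -0.3, "area B": 0.0, "area C": 0.3}

# Predicted probability of reported diarrhea in each area: inv_logit(M + residual).
probabilities = {area: inv_logit(M + e) for area, e in area_residuals.items()}
for area, p in probabilities.items():
    print(f"{area}: {p:.3f}")
```

An area with a positive residual has a higher predicted probability than the overall mean, and an area with a negative residual has a lower one.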
Continuing from the regional analysis in Model 1, Model 2 introduced individual-level characteristics, including age, sex, income, education, and accommodation types. Model 2 aims to identify the most influential socio-demographic factors affecting diarrhea prevalence among migrants:
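$$\mathrm{Logit}\left(P_i\right)=M+\beta_1\mathrm{Sex}_i+\beta_2\mathrm{Age}_i+\beta_3\mathrm{Marriage}_i+\beta_4\mathrm{Migration}_i+\beta_5\mathrm{Income}_i+\beta_6\mathrm{Education}_i+\beta_7\mathrm{Accommodation}_i+E_A,$$
where \(\beta_1, \beta_2, \ldots, \beta_7\) are the individual covariate regression coefficients.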
Model 3 is the multilevel logistic regression model that combines the individual-level variables with the area-level variable, to reveal the impact of area differences and individual characteristics across areas:
$$\mathrm{Logit}\left(P_i\right)=M+\beta_1\mathrm{Sex}_i+\beta_2\mathrm{Age}_i+\beta_3\mathrm{Marriage}_i+\beta_4\mathrm{Migration}_i+\beta_5\mathrm{Income}_i+\beta_6\mathrm{Education}_i+\beta_7\mathrm{Accommodation}_i+\beta_8\mathrm{Income}_{\mathrm{District}}+E_A,$$
where the additional \(\beta_8\) is the regression coefficient for the district income level variable. For Model 3, we generated the intraclass correlation coefficient (ICC), demonstrating the proportion of the total variance in the outcome attributed to the district level:
$$\mathrm{ICC}=V_A/\left(V_A+V_I\right),$$
where \(V_A\) is the district-level variance, and \(V_I\) is the individual-level variance. We also generated the marginal \(R^2\) and the conditional \(R^2\) to assess the goodness of fit of Model 3. Model 2 and Model 3 are constructed to evaluate how individual and area-level variables interplay to affect health outcomes and depict how each variable contributes to the model, with coefficients representing the strength and direction of these relationships.
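The ICC defined above can be computed directly from the two variance components. The text does not state how the individual-level variance is obtained. A common convention for multilevel logistic models fixes it at π²/3 (about 3.29), the variance of the standard logistic distribution, and the sketch below assumes that convention. The district-level variance used in the example is hypothetical.

```python
import math

# Assumption: the individual-level variance is fixed at pi^2 / 3, the variance of
# the standard logistic distribution. The source does not state which value was used.
LATENT_INDIVIDUAL_VARIANCE = math.pi ** 2 / 3


def icc(district_variance, individual_variance=LATENT_INDIVIDUAL_VARIANCE):
    """Share of total variance attributed to the district level."""
    return district_variance / (district_variance + individual_variance)


# Hypothetical district-level variance on the logistic scale.
example_icc = icc(0.5)
print(f"ICC = {example_icc:.3f}")
```

A larger district-level variance yields a larger ICC, meaning that more of the variation in reported diarrhea lies between districts rather than between individuals within a district.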
The multilevel logistic regression models were fitted using R (4.3.3) and RStudio (2023.12.1+402) [24, 25].