Quality of Cancer-Related Information on New Media (2014-2023): Systematic Review and Meta-Analysis


Introduction

Background

Digital new media have become indispensable tools for health information and network support, notably among individuals living with chronic and terminal health conditions, including many forms of cancer [,]. New media platforms, represented by social media and artificial intelligence (AI) chatbots, offer collaborative and participatory environments that serve as valuable resources for patients with cancer, survivors of cancer, and their caregivers [,]. These platforms facilitate the exchange of information [], provide opportunities to seek external support [], and allow users to share personal experiences of cancer []. The rise of AI chatbots has attracted considerable interest in their use for health information [], with studies indicating that they can provide higher-quality information and more empathetic responses than physicians []. Further, among US adults who report being on the internet almost constantly, particularly on social media [], 72% report using their preferred platform, and increasingly generative AI, to learn about health conditions diagnosed in themselves or in their friends and family [].

Research shows that turning to web-based outlets can improve psychosocial outcomes across various acute and chronic conditions []. Access to high-quality information further enhances health literacy, decision-making, health behaviors, outcomes, and health care experiences [-]. Conversely, low-quality information can promote trust in unproven, ineffective, and harmful treatments, as well as fake news, conspiracy theories, and misleading testimonials about “natural” or alternative cures []. Kington et al [] described “high-quality information” as science-based and aligned with the best available evidence. Huang et al [] further identified 15 key attributes of information quality, including accuracy, objectivity, accessibility, completeness, and clarity.

As cancer remains a leading cause of death globally, with rising incidence rates [], related discussions are widespread on social media and AI chatbots [,], making the topic a target for commercial and political interests [,]. Substantial financial incentives drive the promotion of treatments [], and patients, often desperate for hope, are particularly susceptible to misleading information circulated in digital spaces []. Limitations in models, databases, and algorithms pose challenges to the accuracy and credibility of the information provided by AI chatbots [,]. Survivors also face cognitive challenges such as fatigue, memory deficits, and reduced executive function [,], which may impair their ability to critically evaluate the quality of the information they receive []. Consequently, low-quality information may lure patients into submitting to costly and harmful treatments, undermining the potential benefits of digital support networks [].

To date, systematic, scoping, and other literature reviews generally report inconclusive and nondefinitive findings on the quality of cancer-related information on social media [,,]. Most cross-sectional studies indicate considerable variability in information quality across platforms, content formats (ie, text-based or video-based), and cancer types []. This inconsistency may stem from differences in the cancer types studied, the platforms analyzed, and the quality assessment approaches used. A critical gap remains in the literature on this topic: few studies directly account for (1) a wide variety of cancer types, (2) a multitude of platforms, and (3) differences in findings over time. Social media, in particular, has evolved considerably in the past two decades, giving rise to AI chatbots that now represent a new source of health information. Accounting for factors such as users’ shift in medium preference (text-based vs video-based) and platform changes (Twitter to X and, at the time of writing, the potential closure of TikTok) may reveal temporal trends indicating improvement or worsening of information quality over time.

Purpose

This paper presents a systematic review of empirical studies published between 2014 and 2023 that address cancer-related information on new media. It explores the growing reliance on these platforms for cancer health information, trends in platform use, and the myriad tools used in the included studies to determine information quality. Specifically, we addressed the following research questions (RQs): (RQ1) What are the key characteristics of studies evaluating the quality of cancer-related information on social media and AI chatbots, and how have these characteristics evolved? (RQ2) What factors influence the conclusions of quality assessments, and how do they vary across platforms, assessment tools, and cancer types? (RQ3) What patterns emerge in the assessment findings of new media cancer-related information quality?

Findings from this review further our understanding of new media’s role as a source of cancer-related health information. By adopting a longitudinal perspective, the study identifies trends in quality assessment studies that use data from social media and AI chatbots and offers evidence-based recommendations for future research on this vital topic.


Methods

Guidelines and Ethical Considerations

This review was reported according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (). Our protocol was retrospectively registered on PROSPERO (CRD420251058032) on May 23, 2025.

As a review study of published studies, our research is considered nonhuman participant research and is exempt from review by an Institutional Review Board.

Search Strategy

Between January and September 2024, we conducted a systematic literature search of four electronic databases (PubMed, Web of Science, Scopus, and Medline). Full search strategies for the databases are available in Table S1 in . We only included studies that underwent peer review, were published in English between 2014 and 2023, and presented empirical qualitative, quantitative, or computational findings. We also reviewed the reference lists of included studies to identify additional publications that met the inclusion criteria but were not retrieved in the original search. Three researchers independently examined the selection criteria, search terms, and the records returned by the search. Disagreements were resolved by discussion among the three researchers until a consensus was reached.

Selection Criteria

We initially included studies if they met three broad inclusion criteria: (1) they contained a synthesis of cancer-related new media data, (2) they used an information quality assessment to rate the quality of data along varying criteria, and (3) they were published in English between 2014 and 2023. Cancer types were defined according to the International Classification of Diseases for Oncology. Likewise, we considered three broad categories of new media for inclusion: (1) text- and graph-based media, such as Facebook, Instagram, Pinterest, Reddit, Twitter (now X), WeChat, and Weibo; (2) video-based media, such as YouTube, TikTok, and Xigua; and (3) generative AI-based media, such as ChatGPT, Chatsonic, Microsoft Bing AI, and Perplexity.

We excluded studies that did not present original findings, unpublished manuscripts from preprint repositories such as arXiv, conference abstracts, narratives, literature reviews, and editorials. Studies were also excluded if they discussed the quality of information but did not elaborate on how it was assessed, for example, by not conducting comprehensive and systematic data collection and analysis or by not referring to any quality criteria.

Assessments of Risk of Bias, Quality, and Certainty

Two researchers independently assessed the quality and risk of bias of the included studies using the Joanna Briggs Institute (JBI) Critical Appraisal Checklist for Analytical Cross-Sectional Studies [] and the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement checklist []. Using the JBI checklist, studies were rated as having a risk of bias (yes or no) or as raising some concerns in study design, conduct, and analysis. The STROBE checklist was used to assess and illustrate the quality of reporting in key sections of studies. Additionally, we recorded whether each study reported an ethical approval process, including compliance with human and animal rights standards, informed consent procedures, and data privacy protections.

Two researchers independently rated the certainty of evidence for each outcome using GRADE (Grading of Recommendations Assessment, Development and Evaluation) via the GRADEpro Guideline Development Tool online platform, considering the domains of risk of bias, inconsistency, indirectness, imprecision, and publication bias.

Disagreements between the two raters were discussed until a consensus was reached.

Data Extraction

We extracted the following data from the included studies: (1) authors, (2) year of publication, (3) journal name, (4) study design (ie, quantitative, qualitative, or computational), (5) sample size, (6) sample genders, (7) study location, (8) authors’ identities (individual or institute, nonmedical professional or medical professional), (9) social media platform, (10) search strategies, (11) cancer types, (12) language of the original data, (13) profession of the raters, (14) contents of the posts, and (15) study results.

We extracted the information assessment tools used in the included studies, such as the DISCERN [] tool, along with the outcomes measured by each. Despite the variation in tools, most studies assessed at least one of the following: (1) overall quality: a holistic assessment of the posts; (2) technical criteria: an assessment of the layout and functionality of the data; (3) readability and understandability: an assessment of reading and comprehension level; (4) accuracy and misinformation: an evaluation of the scientific validity of the content; (5) completeness and coverage: the scope of topics; (6) actionability and usefulness: an assessment of the practical nature of the content; (7) harmfulness: whether the content can cause or promote harm; and (8) commercial bias: evidence of the direct influence of commercial interests. We categorized the 75 studies based on the tools and metrics used to evaluate information quality and drew comparisons across studies. Data were extracted by one researcher and independently verified by a second researcher. All analyses were performed using Stata/SE (version 18.0; StataCorp LLC) and RStudio (version 4.3; Posit PBC).

Analysis

Descriptive Analysis

First, we summarized granular details of the search, rating, and reporting processes for all studies. Second, we documented the frequencies and percentages of all mentioned media platforms, cancer types, and assessment tools for information quality, as well as changes in those patterns over the study period.

Statistical Analysis

For a single study reporting information quality for stratified groups, such as different posters (eg, individual vs institutional, or with vs without a medical professional background), content types (eg, prevention or treatment), or other specified categories, we calculated the pooled mean and pooled standard deviation. For a study that reported a median instead of a mean for a quality indicator, we estimated the mean and SD from the median [].
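A minimal R sketch of this pooling step is shown below, assuming hypothetical stratum-level summaries (group sizes, means, and SDs). The median-based approximation is one common quartile-based estimator; the exact estimator depends on the method cited in the original protocol.

```r
# Pool stratum-level means and SDs into one study-level estimate
# (standard formulas for combining groups; n = group sizes, m = means, s = SDs)
pool_groups <- function(n, m, s) {
  N <- sum(n)
  m_pool <- sum(n * m) / N
  # within-group plus between-group components of the pooled variance
  s_pool <- sqrt((sum((n - 1) * s^2) + sum(n * (m - m_pool)^2)) / (N - 1))
  c(mean = m_pool, sd = s_pool)
}

# Hypothetical example: DISCERN scores reported separately for professional vs lay posters
pool_groups(n = c(40, 60), m = c(3.8, 2.9), s = c(0.6, 0.8))

# Rough mean/SD approximation when only the median and IQR are reported
# (one common quartile-based approximation; assumes roughly symmetric data)
approx_from_median <- function(q1, med, q3) {
  c(mean = (q1 + med + q3) / 3, sd = (q3 - q1) / 1.35)
}
```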

We standardized quality indicators across different scales by mapping them onto a proportional rating system. For example, a score of 2 on a 1-5 scale would be standardized as 40% in the proportional rating system. These standardized scores were then classified as follows: “low quality” for scores of 0%-30%, “medium quality” for scores of 30%-70%, and “high quality” for scores of 70%-100%. For negative indicators (misinformation, harmfulness, and commercial bias), the coding was reversed. To categorize the conclusion of each study based on the quality indicators used, we adopted the following criteria: the overall conclusion of a study was deemed “positive” if it reported more “high-quality” than “low-quality” results; it was considered “negative” if it reported more “low-quality” than “high-quality” results; and studies reporting exclusively “medium-quality” results, or equal numbers of “high-quality” and “low-quality” results, were classified as “neutral.”
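As an illustration of this mapping, a small R helper is sketched below. The scale bounds are hypothetical; the 30% and 70% cut points and the reverse coding of negative indicators follow the text, and the score-to-percentage mapping mirrors the worked example (2 on a 1-5 scale maps to 40%).

```r
# Map a raw quality score onto the proportional (0%-100%) scale and classify it.
standardize_score <- function(score, max_score, negative = FALSE) {
  pct <- 100 * score / max_score
  if (negative) pct <- 100 - pct  # reverse-code misinformation, harmfulness, commercial bias
  pct
}

classify_quality <- function(pct) {
  cut(pct, breaks = c(0, 30, 70, 100), include.lowest = TRUE,
      labels = c("low quality", "medium quality", "high quality"))
}

classify_quality(standardize_score(2, 5))    # 40%  -> "medium quality"
classify_quality(standardize_score(4.2, 5))  # 84%  -> "high quality"
```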

We then used ordinal logistic regression to estimate the associations between study conclusions and media type, cancer type, and study characteristics related to the search, rating, and reporting processes []. Regression models were weighted by the natural log of each study’s sample size [], since studies of some platforms, such as Twitter and Reddit, included larger samples than studies of others, such as YouTube. This analysis reported odds ratios (ORs), where estimates greater than 1 indicate a positive association between the predictor variable and higher-quality conclusions.
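A sketch of such a weighted proportional-odds model in R using MASS::polr is shown below. The data frame, variable names, and values are hypothetical (simulated for runnability); the log sample-size weighting follows the text.

```r
library(MASS)

# Hypothetical analytic data set: one row per included study
set.seed(1)
studies <- data.frame(
  conclusion  = factor(sample(c("negative", "neutral", "positive"), 75, replace = TRUE),
                       levels = c("negative", "neutral", "positive"), ordered = TRUE),
  media_type  = factor(sample(c("text", "video", "ai"), 75, replace = TRUE),
                       levels = c("text", "video", "ai")),
  cancer_type = factor(sample(c("common", "rare", "combined"), 75, replace = TRUE),
                       levels = c("common", "rare", "combined")),
  sample_size = sample(30:5000, 75)
)

# Proportional-odds model, weighted by the natural log of each study's sample size
fit <- polr(conclusion ~ media_type + cancer_type,
            data = studies, weights = log(sample_size), Hess = TRUE)

# Odds ratios and 95% CIs; OR > 1 indicates higher odds of a more positive conclusion
exp(cbind(OR = coef(fit), confint(fit)))
```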

Meta-Analysis

We used a meta-analysis of proportions to test the homogeneity of results obtained using the same quality assessment tool. Only results reported by three or more studies using the same tool were included in the analysis. Statistical heterogeneity was assessed via I² and Cochran Q test values, where I² values of 25%, 50%, and 75% represent low, moderate, and high heterogeneity, respectively. Proportions were pooled with a random-effects model with DerSimonian-Laird intervals when heterogeneity was moderate or high []. For studies reporting indicators with 0% or 100% estimates, we used the Freeman-Tukey double arcsine transformation to stabilize the variance []. We calculated 95% CIs with an α level of .05 to estimate statistical significance.
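A sketch of this pooling step with the metafor package is shown below, using hypothetical per-study counts (number of posts meeting an indicator out of the number assessed) for studies sharing one tool. The Freeman-Tukey transformation and DerSimonian-Laird estimator follow the text.

```r
library(metafor)

# Hypothetical per-study counts for one indicator assessed with the same tool
dat <- data.frame(
  study  = paste0("Study ", 1:5),
  events = c(12, 0, 45, 30, 18),   # posts/videos meeting the indicator
  n      = c(60, 40, 120, 90, 50)  # posts/videos assessed
)

# Freeman-Tukey double arcsine transformation stabilizes variance at 0% and 100%
dat <- escalc(measure = "PFT", xi = events, ni = n, data = dat)

# DerSimonian-Laird random-effects model; I^2 and Cochran Q appear in the summary
res <- rma(yi, vi, data = dat, method = "DL", slab = dat$study)
summary(res)

# Back-transform the pooled estimate to a proportion (harmonic mean of the n's)
predict(res, transf = transf.ipft.hm, targs = list(ni = dat$n))
```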

Publication bias was assessed for each indicator via an Egger test and visual inspection of funnel plot asymmetry. Because publication bias was suspected, a leave-one-out sensitivity analysis was conducted to evaluate the influence of each study on the overall estimate.
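Continuing the hypothetical res object from the sketch above, these checks can be run in metafor; this is a sketch, with the Egger-type regression test implemented via regtest.

```r
# Egger-type regression test for funnel plot asymmetry
regtest(res, model = "lm")

# Visual inspection of funnel plot asymmetry
funnel(res)

# Leave-one-out sensitivity analysis: re-estimate the pooled effect
# omitting one study at a time to gauge each study's influence
leave1out(res)
```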


Results

Overall Characteristics of Included Studies

This review summarizes the body of literature on the quality of cancer-related information on new media published from 2014 to 2023. Our initial search identified 7841 records: (1) 3001 from PubMed, (2) 2032 from Web of Science, (3) 1293 from Scopus, and (4) 1515 from Medline. After screening titles and abstracts, 140 potentially eligible papers remained for full-text review. During full-text review, we excluded 65 additional studies because they lacked analytic rigor or did not include quality of information as a component of their results. Our final sample comprised 75 individual studies. Figure 1 depicts our filtering process according to the PRISMA guidelines.

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart.

We present our results below in relation to each of our RQs.

(RQ1) What are the key characteristics of studies evaluating the quality of cancer-related information on social media and AI chatbots, and how have these characteristics evolved?

Table 1 summarizes the characteristics of included studies, including binary classifications of whether certain features were present. More than 80% of studies followed similar practices in the search process, including documenting their search period (70/75, 93%), search tool (69/75, 92%), and query terms (73/75, 97%), and relying on a single search tool (62/75, 83%) and a single platform (67/75, 89%). A total of 20 studies assessed content in languages other than English, including Chinese (n=6), German (n=5), Arabic (n=3), Japanese (n=2), French (n=2), and Spanish (n=2); several other languages, such as Korean, Italian, Turkish, Swedish, and Danish, were examined in single studies. During the rating process, 73 (97%) studies used a nonblinded rating process, 66 (88%) studies documented the number of raters, and 69 (92%) studies relied on more than one rater. However, only 14 (19%) studies defined their rating criteria a priori, and 42 (56%) studies included a rater with a medical background. In the reporting process, more than 60% of studies reported “view,” “like” or “dislike,” and “comment” behaviors, while few tracked post-sharing behavior (11/75, 15%). About 70% of studies reported the posters’ identity, but few provided information on their demographics (17/75, 23%). Among the included studies, 67 (89%) reported the contents or topics discussed in the posts.

Table 1. Summary of characteristics across search, rating, and reporting processes of included studies (n=75).

Study characteristics | Yes, n (%) | No, n (%) | Other, n (%)

Search process
Date or period mentioned | 70 (93) | 5 (7) | N/A^a
Search tools mentioned | 69 (92) | 6 (8) | N/A
More than one search tool used | 13 (17) | 62 (83) | N/A
Search terms mentioned | 73 (97) | 2 (3) | N/A
Initial hits reported | 41 (55) | 34 (45) | N/A
Assessed language other than English | 20 (27) | 55 (73) | N/A
More than one social media platform examined | 8 (11) | 67 (89) | N/A
More than one cancer type | 30 (40) | 45 (60) | N/A

Rating process
Raters blinded for the source | 2 (3) | 73 (97) | N/A
Number of raters reported | 66 (88) | 9 (12) | N/A
More than one rater | 69 (92) | 6 (8) | N/A
Raters worked independently | 38 (51) | 37 (49) | N/A
Interrater reliability figures for evaluation determined | 34 (45) | 41 (55) | N/A
Process graph contained | 34 (45) | 41 (55) | N/A
Medical professional background for rater | 42 (56) | 17 (23) | 16 (21)^b
A priori criteria defined for quality | 14 (19) | 52 (69) | 9 (12)^c

Reporting process
Engagement: view | 46 (61) | 29 (39) | N/A
Engagement: like or dislike | 56 (75) | 19 (25) | N/A
Engagement: forward or share | 11 (15) | 64 (85) | N/A
Engagement: comment | 47 (63) | 28 (37) | N/A
Poster characteristics reported (gender, age, ethnicity, or country) | 17 (23) | 58 (77) | N/A
Poster identity reported (personal, institute, or medical professional) | 53 (71) | 22 (29) | N/A
Contents or topics mentioned | 67 (89) | 8 (11) | N/A

^a Not applicable.

^b Raters were authors of the study, and it was unclear if they were medical professionals.

^c Generally mentioned using literature or publications as criteria.

Results of Risk of Bias, Quality, and Certainty Assessments

A majority of studies demonstrated strong methodological quality, with 74 (98%) appropriately describing subjects and setting, 74 (98%) using standard condition measurement, and all using valid outcome measurements and appropriate statistical analyses. However, we identified a considerable risk of bias in the handling of confounding factors; only 43 (57%) studies identified potential confounders and implemented strategies to address them (Figure 2A). Most studies reported key elements, although 22 (29%) did not include other analyses (eg, subgroup and interaction analyses and sensitivity analyses), and 23 (31%) did not list funding sources (Figure 2B). Of the included studies, 34 (45%) reported ethical approval considerations, whereas 41 (55%) did not mention ethical approval. Among those that did, most stated that approval was not required because the research involved only publicly available data, excluded human or animal participants, and removed identifiable information (eg, user IDs, links, and contact details), as is common in social media research. Only 3 studies obtained formal approval from research ethics committees [-]. Tables S2-S152 in [,-] present details of the risk of bias and quality evaluation for individual studies.

Figure 2. Distribution of risk of bias ratings and quality evaluation outcomes. (A) Risk of bias was assessed by the JBI Checklist. (B) Quality evaluation was assessed by the STROBE checklist. JBI: Joanna Briggs Institute; STROBE: Strengthening the Reporting of Observational Studies in Epidemiology.

The funnel plot and Egger test revealed significant asymmetry (bias coefficient –5.67, 95% CI –9.63 to –1.71; P=.006) only for studies evaluating misinformation, suggesting that studies reporting null or negative misinformation findings may be missing from the literature or that small-study effects are present. No bias was detected for other indicators (Figures S1-S11 in ).

Since all included studies were observational in design, they were initially rated as low certainty according to the GRADE framework. The certainty of evidence for all evaluated outcomes was subsequently downgraded to very low due to one or more of the following reasons: high inconsistency (I²>75% for most outcomes), risk of indirectness (due to heterogeneous platforms, measurement definitions, and tools), and, in some cases, imprecision (wide CIs and small sample sizes). These limitations substantially reduce confidence in the pooled estimates. Details of each assessment and the reasons for downgrading are in Table S153 in .

Individual Study Characteristics

Media Platforms

Figure 3A illustrates the frequency of the media platforms considered and changes in included platforms over time. A total of 20 studies investigated content from text- and graph-based platforms, including Twitter (n=11) [,,-], Facebook (n=6) [,-,,], Pinterest (n=5) [-,,], Reddit (n=4) [-,], and Instagram (n=2) [,], with 1 study each on WeChat [] and Weibo []. A total of 51 records studied video-based media; of these, 44 examined YouTube [,-], 6 focused on TikTok [-], and 1 each examined Bilibili [] and Xigua []. Overall, 4 records studied data from generative AI chatbot media, all of which included ChatGPT [-]; 2 studies also included Perplexity, Chatsonic, and Bing AI [,].

We observed changes in the studied platforms over time. From 2014 to 2016, the number of included social media platforms was limited, with only 1 study investigating content from YouTube and 2 on Twitter. The scope expanded in 2018 and 2019, with increased research on other text- and graphic-based media, including 3 Facebook studies, 1 Pinterest study, 1 Reddit study, and 1 Weibo study. From 2020 to 2022, the number of studies increased, with 26 YouTube studies, while each of the other text- and graphic-based media platforms maintained approximately 3-4 publications. In 2023, this trend persisted, with video-based media, especially YouTube, continuing to be the primary focus of cancer information quality research, as evidenced by 14 studies. This period also marked the emergence of research on generative AI chatbot media, which was the subject of 4 studies.

Figure 3. Frequencies and percentages of studies evaluating information quality across (A) media platforms, (B) cancer types, and (C) assessment tools (2014-2023). GQS: Global Quality Score; HONcode: Health on the Net Foundation Code of Conduct; JAMA-BC: Journal of the American Medical Association Benchmark Criteria; PEMAT-A: Patient Education Materials Assessment Tool for Actionability; PEMAT-U: Patient Education Materials Assessment Tool for Understandability.

Cancer Types

We captured 17 unique types of cancer within our data, with varying levels of emphasis (Figure 3B). A total of 15 studies examined information about prostate cancer [,,,,,-,,,,,,,]; 13 about breast cancer; 9 about skin cancer [,,,,,,,,]; 9 about colorectal cancer [,,,,,,,,]; 7 about bladder cancer [,,,,,,]; 5 about testicular cancer [,,,,]; 4 each about lung cancer [,,,], thyroid cancer [,,,], and kidney cancer [,,,]; 2 each about liver cancer [,], leukemia [,], and brain cancer [,]; and 1 each about cervical [], gastric [], larynx [], pancreatic [], and spine [] cancers. Another 15 studies addressed cancer information in general or did not specify the cancer type [,,,,,,,,,,,,,,]. Several studies were not limited to a single type of cancer but examined multiple types; those studies were counted more than once.

Quality Assessment Tools

Figure 3C highlights the formal and informal information quality tools used to evaluate data in each study. A majority of studies used DISCERN (44/75, 59%) or evaluated misinformation (44/75, 59%), followed by the Global Quality Score (GQS; n=26) and the Patient Education Materials Assessment Tool (PEMAT; n=16). Tool use evolved over time. In 2014-2019, misinformation assessment was the most common approach, used in 10 studies. From 2020 to 2022, both misinformation assessment and DISCERN saw increased use, each featuring in 24 studies. This trend persisted in 2023, with DISCERN remaining prominent in 20 studies published that year.

Content Distributors

Across 54 studies that reported content distributor information, medical professionals, such as surgeons, physicians, endocrinologists, sonographers, radiologists, and dietitians, were frequently identified as key contributors. Video-based studies consistently reported that the most-watched or -engaged videos were published by such medical professionals. Notably, they produced 80% of top TikTok videos on thyroid cancer [,] and 80% of leading YouTube posts on spine tumors [] and hepatocellular carcinoma []. They also contributed over half of the top YouTube posts on rectal cancer surgery [], nutrition [], the mental health of patients with prostate cancer [], radioactive iodine therapy [], and larynx cancer [], as well as on liver cancer [] and gastric cancer [] in Chinese on TikTok.

Institutional contributors, such as hospitals, clinics, academic centers, and universities, also played a role. For instance, they produced 92% of the top YouTube videos on pediatric cancer clinical trials [] and 54% on Merkel cell carcinoma [], though their contributions were lower across other topics.

In contrast, two studies of TikTok revealed that nonmedical individuals, including patients and their families or friends, accounted for over 93% of top videos on gastric cancer in English and Japanese [] and 83% of prostate cancer videos []. Other nonmedical contributors, such as media agencies, for-profit companies, herbalists, health websites, and lifestyle vloggers, accounted for less than 61.1% of content across all topics.

Engagement Metrics

Studies analyzing video-based platforms frequently reported engagement metrics such as views, likes or dislikes, shares, and comments. Among the 51 such studies, view counts ranged from as low as 3 [] to as high as 11 million [], the latter for a thyroid cancer video posted by a medical professional on TikTok. YouTube videos on breast cancer and leukemia also reached high viewership, each exceeding 7 million views [,,]. In contrast, videos on prostate cancer (TikTok) and pediatric cancer clinical trials (YouTube) averaged around 2000 views [,].

Likes and comments were lower than view counts, as has been noted previously. The most-liked video was a TikTok post on thyroid cancer, which garnered 308,000 likes [,], followed by a breast cancer video on YouTube with approximately 226,000 likes []. The most “disliked” video addressed herbal cancer treatments in Arabic, receiving an average of 994 dislikes on YouTube []. Comment activity varied widely. Thyroid cancer videos on TikTok averaged 1252 comments, with the most-commented video amassing over 73,000 comments [,]. Nutrition and lung cancer videos each had over 450 comments [,].

Among nonvideo, text-based platforms, Facebook demonstrated the highest overall engagement, particularly in posts related to common cancers, including breast, prostate, colorectal, and lung cancers, as well as dermatological and genitourinary cancers [-].

Factors Impacting Quality Conclusion

(RQ2) What factors influence the conclusion of quality assessments, and how do they vary across platforms, assessment tools, and cancer types?

The ordinal logistic regression analysis identified several factors associated with reporting higher-quality conclusions (Table 2). Video-based media (OR 0.02, 95% CI 0.01-0.12), studies on rare cancers (OR 0.32, 95% CI 0.16-0.65), and studies on combined cancer types (OR 0.04, 95% CI 0.01-0.14) were less likely to yield high-quality conclusions than text-based media and studies on common cancers.

During the search process, studies using multiple search tools (OR 0.30, 95% CI 0.13-0.73), mentioning search terms (OR 0.13, 95% CI 0.02-0.81), reporting initial hits (OR 0.14, 95% CI 0.07-0.28), sourcing content in languages other than English (OR 0.35, 95% CI 0.16-0.76), and analyzing multiple media platforms (OR 0.06, 95% CI 0.02-0.27) were less likely to report higher quality conclusions.

During the rating process, studies disclosing the number of raters (OR 0.02, 95% CI 0.00-0.14) were less likely to report high-quality conclusions. In contrast, studies including process graphs (OR 3.06, 95% CI 1.61-5.79) and using literature-based assessments (OR 2.93, 95% CI 1.04-8.20) were more likely to report higher quality conclusions.

During the report process, studies reporting engagement metrics such as likes or dislikes (OR 3.35, 95% CI 1.20-9.38), forwards or shares (OR 6.17, 95% CI 1.61-23.65), and mentioning content or topics (OR 3.77, 95% CI 1.43-9.94), were associated with higher odds of reporting high-quality conclusions.

Table 2. Associations between study characteristics and quality of conclusions.

Characteristics of the study | OR^a (95% CI) | P value

Media type (text-based media as reference)
Video-based | 0.02 (0.01-0.12) | <.001
AI^b-based | 0.70 (0.10-5.01) | .72

Cancer type^c (common cancer as reference)
Rare cancer | 0.32 (0.16-0.65) | .002
Combined cancer | 0.04 (0.01-0.14) | <.001

Search process
Date or period mentioned | 1.75 (0.32-9.47) | .52
Search tools mentioned | 2.01 (0.46-8.89) | .36
More than one search tool used | 0.30 (0.13-0.73) | .008
Search terms mentioned | 0.13 (0.02-0.81) | .03
Initial hits reported | 0.14 (0.07-0.28) | <.001
Assessed language other than English | 0.35 (0.16-0.76) | .009
More than one social media platform examined | 0.06 (0.02-0.27) | <.001
More than one cancer type | 1.41 (0.55-3.63) | .47

Rating process
Raters blinded for the source | 11.16 (0.60-207.53) | .11
Number of raters reported | 0.02 (0.00-0.14) | <.001
More than one rater | 5.21 (0.76-35.75) | .09
Rater working independently | 0.88 (0.46-1.67) | .69
Interrater reliability figures for evaluation determined | 1.22 (0.62-2.43) | .56
Process graph contained | 3.06 (1.61-5.79) | <.001

Medical professional background for rater (no professional as reference)
Author as rater | 0.43 (0.14-1.30) | .14
Professional | 0.52 (0.17-1.60) | .26

A priori criteria defined for quality (no criteria mentioned as reference)
Based on literature | 2.93 (1.04-8.20) | .04
Specific criteria mentioned | 0.94 (0.41-2.14) | .88

Reporting process
Engagement: view | 1.64 (0.40-6.70) | .49
Engagement: like or dislike | 3.35 (1.20-9.38) | .02
Engagement: forward or share | 6.17 (1.61-23.65) | .008
Engagement: comment | 1.97 (0.91-4.29) | .09
Poster characteristics reported (gender, age, ethnicity, or country) | 1.51 (0.63-3.64) | .36
Poster identity reported (personal, institute, or medical professional) | 1.81 (0.77-4.28) | .18
Contents or topics mentioned | 3.77 (1.43-9.94) | .007

^a OR: odds ratio.

^b AI: artificial intelligence.

^c Using GLOBOCAN 2020 statistics, we coded the top 10 most common cancers as “common cancer” and others as “rare cancer” [].

Information Quality Criteria

(RQ3) What patterns emerge in the assessment findings of new media cancer-related information quality?

Overall Quality

Overall quality refers to a holistic assessment of content, considering its alignment with current scientific standards and whether content achieves educational aims. The most commonly used tools were DISCERN (44/75, 59%) and GQS (26/75, 35%). DISCERN is a 16-item tool evaluating publication reliability, quality of information on treatment choices, and a singular overall quality rating. GQS uses a 5-point Likert scale from 1=poor to 5=excellent.

Overall, content provided by medical individuals and institutions, such as hospitals, physicians, and dietitians, received higher DISCERN scores than that from nonprofessional sources [,,,,,,,,,,,-]. Among these medical groups, hospitals provided higher-quality information than health organizations [], and doctors specializing in modern medicine consistently scored higher than those in traditional medicine []. However, an exception was noted in one study on bladder cancer information on YouTube, where content from medical professionals scored lower []. Interestingly, TikTok videos by news agencies sometimes outperformed medical providers in quality, attributed to the absence of confusing jargon []. Comparisons between for-profit and nonprofit sources yielded mixed results: some studies reported higher DISCERN scores for for-profit sources [,], whereas others, particularly on colorectal cancer on YouTube, found no significant differences [,].

Studies using the DISCERN tool identified varying scores along different criteria. A total of 8 studies reported the highest scores for “explicit aims” [,,,,,,,], 6 for “aims achieved” [,,,,,], and 4 for “benefits of treatments” [,,,]. The most common reason for score deductions was the lack of “additional sources of information,” reported in 7 studies [,,,,,,]. In total, 4 studies identified the lowest scores for failing to “describe what would happen if any treatment is not used” [,,,], and 3 studies noted deficiencies in “providing information source” [,,]. Additionally, 2 studies each noted the lowest scores for reporting “currency of information” [,], “reference to areas of uncertainty” [,], “risks of treatment” [,], and “quality of life” [,].

DISCERN-based assessments also revealed regional and linguistic variations in content quality. A study of gastric cancer TikTok videos found that Chinese-language content was of higher quality than English and Japanese videos []. However, no significant quality differences were observed across prostate and thyroid cancer videos in English, French, German, Italian, and Turkish on YouTube [,]. Geographic comparisons of English-language videos showed that content from the United States consistently ranked higher in quality than that from other locations [,].

Three studies comparing AI-chatbot media on a range of platforms found that they generally provided moderate to high-quality information [,,]. These chatbots frequently cited reputable sources such as the American Cancer Society and the Mayo Clinic.

Studies using GQS also found that quality was typically higher for videos produced by medical professionals [,,,,,,] and for-profit medical providers [,]. For instance, YouTube content posted by medical providers on pediatric cancer clinical trials [], liver cancer [], and skin cancer [] received GQS ratings of 4 or higher, indicating good quality. In contrast, YouTube videos on Merkel cell carcinoma [], breast cancer videos originating from Australia [], and breast cancer videos uploaded by medical advertisers on Xigua [] received GQS scores below 2, indicating poor quality.

Technical Quality

Technical quality refers to the evaluation of disclosure and transparency practices. The commonly used Journal of the American Medical Association Benchmark Criteria (JAMA-BC) assesses web content against four key “transparency criteria”: authorship, attribution, disclosure, and currency. This tool, with a maximum score of 4.0, was applied in five studies of YouTube content. The highest JAMA-BC score, 2.6, was reported for spine tumor videos [], while the lowest score, 1.0, was for nutrition videos posted by independent users [], indicating minimal adherence to reliability standards. Across included studies, content was rated high for “authorship” when clear information about contributors and their credentials was provided; the “disclosure” and “currency” indicators were rated lowest, reflecting a lack of transparency regarding sponsorship, commercial funding, potential conflicts of interest, and the dates when content was posted and updated [,,].

Transparency and disclosure were also measured using other tools. Three studies used the Health on the Net Foundation Code of Conduct [], which comprises eight constructs: (1) authority, (2) complementarity, (3) privacy, (4) attribution, (5) justification, (6) contact details, (7) financial disclosure, and (8) advertising policy. These studies found that most cancer content videos disclosed authority, but few disclosed source information, conflicts of interest, financial sources, or advertising policy [,,]. The Quality Evaluation Scoring Tool, used in one study, measures six aspects of web-based health information: (1) authorship, (2) attribution, (3) conflicts of interest, (4) currency, (5) complementarity, and (6) tone []. The study using this tool examined TikTok videos on gastric cancer and found that Chinese-language videos scored higher than Japanese- and English-language videos []. Additionally, the Audiovisual Quality Score, which assesses the viewability, precision, and editing of audiovisual materials, revealed that larynx cancer videos from university sources showed clearer and more professional editing [].

Readability and Understandability

Readability and understandability are metrics used to determine how effectively audiences can process information. PEMAT-Understandability (PEMAT-U), the first part of the PEMAT toolkit, assesses the understandability of print and audiovisual materials and consists of 13 questions measuring the understandability of content’s language, organization, and visual design []. A total of 18 studies adapted PEMAT-U scores for digital content, with a majority reporting high understandability (above 70%). In general, higher scores were attributed to content with a clear purpose and accessible language [,], while lower scores were attributed to content that lacked summaries or educational visual aids to help people understand the content [,,,,]. The highest PEMAT-U score, 88%, was awarded in research on imaging information about prostate cancer on Instagram [], followed by thyroid cancer videos from TikTok []. The lowest PEMAT-U scores (below 30%) were found in Arabic-language YouTube videos on herbal cancer treatments [] and immunotherapy for renal cell and prostate cancers []. Two studies using PEMAT-U found that content generated by AI chatbots often included medical jargon and overly concise terminology, making it difficult for lay audiences to understand [,].

Three studies measured the readability of content using the Flesch-Kincaid scale, which estimates the average reading level needed to comprehend a written document on a continuum from 5 (a fifth-grade reading level) to 16 (a postgraduate reading level). One of these studies examined prostate cancer information on YouTube, reporting a 12th-grade readability level overall []. The other two focused on AI chatbots’ responses to cancer-related inquiries and found that the content was written at a college readability level [,].
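For reference, the standard Flesch-Kincaid grade-level formula underlying this scale can be computed from word, sentence, and syllable counts; the R sketch below uses a crude vowel-group heuristic for syllables, so it only approximates scores produced by dedicated readability software.

```r
# Flesch-Kincaid grade level:
#   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
count_syllables <- function(word) {
  # crude heuristic: count groups of consecutive vowels
  max(1, length(gregexpr("[aeiouy]+", tolower(word))[[1]]))
}

fk_grade <- function(text) {
  sentences <- max(1, length(unlist(strsplit(text, "[.!?]+"))))
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nchar(words) > 0]
  syllables <- sum(sapply(words, count_syllables))
  0.39 * (length(words) / sentences) + 11.8 * (syllables / length(words)) - 15.59
}

fk_grade("Prostate cancer screening decisions should be discussed with your physician.")
```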

Accuracy and Misinformation

Accuracy refers to the extent to which information aligns with established scientific or medical evidence. Terms such as misinformation, misleading content, false claims, and nonevidence-based claims are sometimes used to describe a lack of accuracy. Most studies did not use a standardized tool to evaluate accuracy. Rather, misinformation was typically assessed by identifying the proportion of content that deviated from scientific standards. Some studies applied predefined criteria, while others relied on subject-matter experts to assess and classify content.

One study analyzing the top 10 most-viewed YouTube videos on tetrahydrocannabinol oil and skin cancer concluded that all contained misinformation []. Similarly, a study of YouTube videos on prostate cancer flagged 76.25% as containing misinformation, with radiotherapy videos demonstrating less misinformation than surgery videos []. Over half of the videos on prostate cancer in Arabic [], and in English on skin cancer [], breast cancer [], immunotherapy for urological tumors [], and postsurgical exercise for breast cancer [] were also identified to contain misinformation.

The misuse or discrediting of health services was the most common type of misinformation flagged by 17 studies, encompassing inappropriate use or dismissal of treatments [,,,,,,,], vaccines [,,,], screenings [,,,,], and diagnostic tests [,,,]. For example, 25% of Facebook posts on acute lymphoblastic leukemia included disapproved treatment protocols and health services [], and 16.5% of Pinterest pins undermined the accuracy and safety of mammograms, advocating instead for alternatives such as ultrasound or thermography, and spreading false claims about bioidentical hormones and breast tumors []. One study found that 12.5% of AI chatbot-generated cancer treatment responses contained hallucinated therapies, such as immunotherapy, which was not clinically recommended []. Additionally, 42% of Arabic-language videos on breast cancer called for inadequate screening and treatment protocols [].

Another common type of misinformation, mentioned in 15 studies, involved unproven prevention and treatment modalities. For example, a study on Facebook pages about acute lymphoblastic leukemia found that all references to alternative and complementary therapies were related to unproven treatment modalities []. Similarly, 74.2% of alternative medicine content in dermatology-related posts was found to be misleading []. On Twitter, mentions of “alternative treatments” were often linked to external sources, such as hyperlinks, books, videos, or movies, without assessment of the credibility of those materials []. Cannabis was frequently portrayed on Facebook and Twitter as an alternative cancer treatment, with 43.8% of such posts relying on anecdotal patient stories and over half using invalid scientific reasoning to support these claims [,,]. One TikTok study found that about 5% of videos featured supernatural or heroic powers as potential cancer treatments []. Other content touted laetrile and colloidal silver on Twitter, despite their potential toxicity and lack of cancer-fighting benefits [,], as well as spiritual healing [], acupuncture, chiropractic care, yoga [], and escharotic black salve [] as cancer treatments on social media.

Eight studies reported misinformation that overstated the effectiveness of certain foods and supplements in preventing or curing cancer. Information on social media falsely promoted certain diets [,,,], various supplements [,,] (eg, flaxseed, turmeric, IGF-1, vitamin D, slippery elm, probiotics, and coconut oil), specific fruits [,] (eg, pomegranates and mushrooms), herbs [,,] (eg, dandelions, curcumin, slippery elm, blood root, and gumby gumby), and drinks [,,] (eg, green tea, miracle beverages, and apple cider vinegar) as “natural remedies,” “cancer pills,” or “cures” for cancer.

Scope and Completeness

Scope refers to the range of topics covered. Figure 4 illustrates the topics identified across the included studies. The most frequently covered topic was treatment (50 studies), encompassing content related to surgery, medication, techniques, side effects, and alternative therapies. Background knowledge was the focus of 36 studies and included cancer definitions, pathology, etiology, anatomy, epidemiology, and prognosis. Prevention, mentioned in 35 studies, focused on strategies to reduce cancer risk, such as screening, lifestyle changes, dietary modifications, vaccination, and raising awareness. Diagnosis, addressed in 30 studies, refers to identifying cancer through symptoms, tests, staging, and clinical manifestations. Personal experiences and other topics were the focus of 25 studies, featuring patient stories, psychological support, relationships, news coverage, and emotional responses such as fear, anxiety, and depression.

Figure 4. Overview of cancer-related topics examined across media platforms. Larger circles denote topics that were more frequently examined. The topics are not mutually exclusive and may overlap.

Completeness refers to how thoroughly health-related content is presented on new media platforms. Nine studies assessed completeness using criteria predefined within each study. One YouTube study used the Sahin critical appraisal tool, which consists of 12 questions covering primary, secondary, and tertiary prevention levels for pancreatic cancer []. Another study used hexagonal radar charts to illustrate content balance and found that TikTok videos on genitourinary cancer adequately covered symptoms and examinations but lacked information on definitions and outcomes []. Several studies suggested the need to include adverse outcomes to ensure completeness. An evaluation of ChatGPT responses that applied an informed consent measure [] found frequent omissions regarding treatments, risks, complications, quality-of-life impacts, and consequences of forgoing treatment for urological cancer []. Similarly, a study of YouTube videos on surgical treatments for spine tumors reported a greater emphasis on benefits than on complications or posttreatment sequelae, potentially biasing patients’ perceptions [].

A study of completeness based on creators found that academic YouTube channels provide more complete information in their videos about colorectal cancer screening than other publisher types []. Completeness also varied by language. In a study of gastric cancer videos on TikTok, Chinese-language content from educational and health professionals was more comprehensive, while English-language videos by individual creators were more complete than their Chinese or Japanese counterparts [].

Actionability and Usefulness

Actionability measures how well information enables individuals to take informed action, and usefulness indicates the extent to which the information benefits personal decision-making. The PEMAT-Actionability (PEMAT-A) tool, part of the PEMAT assessment, includes four items for evaluating this aspect, with higher scores indicating greater actionability []. PEMAT-A was used in 16 studies, 11 of which reported poor actionability (scores below 50%). Common reasons for low scores included missing figure interpretations and unclear or overly complex instructions [,]. There was some variation by cancer type. Studies of YouTube videos on urological and breast cancer [,] and TikTok videos on prostate cancer [] found no actionable content. In contrast, a study of YouTube videos on testicular cancer reported 100% actionability []. Further, prostate cancer videos on YouTube scored 75% actionability [], outperforming their TikTok counterparts, which showed little or no actionability [].

Further, 14 studies assessed the usefulness of new media information, with 4 studies reporting that over 80% of videos on hepatocellular carcinoma [], Merkel cell carcinoma [], Arabic-language prostate cancer [], and Chinese-language gastric cancer [] were deemed ‘useful’. In contrast, only 20.51% of English and 17.46% of Japanese TikTok videos on gastric cancer were rated as ‘useful’ [], with similarly low rates (around 20%) for English and Japanese videos on skin, larynx, and gastric cancers [,,]. Usefulness also varied by content source. Among YouTube videos on rectal cancer surgery and colorectal cancer screening, approximately 15% of videos by nonprofit posters and only 5.39% of videos by for-profit posters were rated as ‘useful’, according to the criteria [,].

Harmfulness

Studies evaluating the harmfulness of media content usually consider four harm-related constructs: (1) harmful actions, (2) harmful inaction, (3) harmful interactions, and (4) economic harm. Five studies used an “informative harm” assessment, calculating the ratio of harmful to positive messages. Harmful inaction was the most commonly identified issue; in one study, it accounted for 73% of harmful content in Japanese tweets []. Another study found that 31% of harmful messages in widely shared social media posts promoted rejecting conventional cancer treatments in favor of unproven alternatives []. The same study reported that economic harm (eg, out-of-pocket costs for unproven treatments or travel) comprised 27.7% of harmful content, while harmful actions (eg, suggesting potentially toxic tests or treatments) accounted for 17% []. In a study of YouTube videos on basal cell carcinoma, all harmful messages originated from laypersons [].

Commercial Bias

Eight studies used commercial bias as a quality criterion by measuring the proportion of content originating from commercial, for-profit, or agenda-driven sources. On Pinterest, commercial bias was most prevalent in prostate cancer posts (14%), followed by bladder (7%) and kidney cancer (1%) []. On video-based platforms, commercial bias ranged from 10% to 27.33% [,,,]. Two studies specifically reported commercial bias in 13.2% and 17% of popular YouTube
