Unmet Needs in Acute Hepatic Porphyria Diagnosis: A Comparative Big Data Analysis of an AI-based Human-in-the-Loop Screening Versus Standard of Care

Summary

Background Acute Hepatic Porphyria (AHP) is a rare genetic disease characterized by unpredictable, life-threatening attacks. There is no reliable biochemical screening test for patients outside of an attack, and diagnosis is delayed by 15 years on average. AI screening systems can assist in detecting AHP patients, but validating such systems is challenging because of the limited number of suspected candidates and/or the limited success in recalling such candidates for testing. At the same time, no study to date has highlighted human oversight of AI screening tools, even though governing bodies and medical device regulations call for it before such tools are allowed for clinical use. Our primary goal was to demonstrate the feasibility of an AI-based Human-in-the-Loop screening (HAI) approach and to quantify its added value by comparing the rate and number of clinically plausible cases it found with those found under the current Standard of Care (SOC).

Methods This retrospective cohort study included data from 899,862 electronic health records (EHRs) of patients who were treated at the University Hospital Salzburg (SALK) between December 2007 and December 2021. For our HAI approach we used an AI tool for disease screening (Dx EHRs v2022.11) provided by Symptoma GmbH, which was validated for Pompe Disease in a previous study. All historically suspected and diagnosed AHP cases retrieved from the collected data served as the reference standard representing the SOC. All suspected AHP cases were first triaged by generalist physicians (GP) without a specialization in AHP, who represented the “Humans in the Loop”. Specialized physicians (SP) then determined the clinical plausibility of the triaged cases by reviewing their complete EHRs. The primary outcome was the rate of clinically plausible cases (precision) in the HAI and SOC cohorts and their sub-cohorts. Additionally, we investigated the differences in phenotypes between those cohorts. Historically diagnosed AHP cases were reviewed by SP to assess the reliability of their diagnosis.

Findings Of a total of 899,862 EHRs, 191 EHRs were triaged into the HAI cohort and 107 filtered into the SOC cohort. 74 (38.74%) and 28 (27.72%) cases were deemed clinically plausible for HAI and SOC, respectively. Of the 74 clinically plausible cases in HAI, 46 were de-novo cases missed by SOC. The sub-analysis of phenotypical features indicated that psychological and psychosomatic symptoms (Restlessness, Confusion, Anxiety, Depression, Mood swings, Palpitations) are significantly underrepresented among historically suspected AHP cases, as are some common and subtle symptoms (Pain, Nausea, Vomiting, Fatigue). Among 16 historically diagnosed cases, four were reclassified as misdiagnosed, and seven lacked conclusive evaluation by current diagnostic standards. Notably, two new AHP cases were identified “incidentally” during the study. The Poisson probability of this event was 8.34%, suggesting that the occurrence was unlikely to be random.
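The rates quoted in the Findings can be re-derived from the reported counts. The following is a minimal sketch rather than the authors' analysis code: the function names are hypothetical, and the Poisson rate is an illustrative placeholder because the expected rate behind the reported 8.34% is not given in this summary, nor is it stated whether that figure is P(X = 2) or P(X ≥ 2). Applying the precision formula to 28 of 107 gives 26.17% rather than the reported 27.72%, so the reported SOC rate presumably rests on a denominator that this summary does not state.

```python
import math


def precision(plausible: int, flagged: int) -> float:
    """Share of flagged cases judged clinically plausible, in percent."""
    return 100.0 * plausible / flagged


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


# Counts taken from the Findings paragraph.
hai_precision = precision(74, 191)      # matches the reported 38.74%
soc_precision_107 = precision(28, 107)  # 26.17%, not the reported 27.72%
de_novo_share = precision(46, 74)       # share of plausible HAI cases missed by SOC

# ASSUMPTION: hypothetical expected number of incidental diagnoses over the
# study period; the summary does not report the rate the authors used.
HYPOTHETICAL_RATE = 0.5
p_two_or_more = poisson_tail(2, HYPOTHETICAL_RATE)

print(f"HAI precision: {hai_precision:.2f}%")
print(f"SOC precision (28/107): {soc_precision_107:.2f}%")
print(f"De-novo share of plausible HAI cases: {de_novo_share:.2f}%")
print(f"P(X >= 2) at the hypothetical rate: {p_two_or_more:.4f}")
```

Under these assumptions, the tail probability only illustrates how such a figure is obtained; it is not expected to reproduce the reported 8.34%.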

Interpretation AHP is extremely hard to diagnose, and even existing AHP diagnoses proved unreliable. Additionally, certain phenotypes are especially challenging to identify under the current standard of care. HAI reached a higher precision than the SOC and found an additional 46 clinically plausible de-novo cases. Both results demonstrate the feasibility and added value of HAI. The “incidentally” diagnosed new AHP patients strongly suggest that the AI screening project increased awareness. All our findings suggest that HAI is a viable approach to the challenge of early diagnosis of AHP and its attendant issues. Prospective studies of HAI as real-time decision support at the point of care are warranted as a next step towards implementing it as part of the new standard of care.

Funding Alnylam Pharmaceuticals

Evidence before this study We systematically searched PubMed for articles published from database inception up to November 11, 2024, using the terms: ("Artificial Intelligence" OR "AI" OR "Machine Learning" OR "Deep Learning" OR "Neural Networks") AND ("Screening" OR "Diagnosis" OR "Detection") AND ("Rare Disease" OR "Orphan Disease" OR "Uncommon Condition" OR "Low Prevalence Disease" OR "Acute Hepatic Porphyria" OR "AHP" OR "Porphyria") AND ("Human-in-the-loop" OR "Human-assisted" OR "Human-centered" OR "Augmented intelligence" OR "Human-machine collaboration" OR "Human-computer interaction" OR "Hybrid intelligence" OR "Human-supervised" OR "Human oversight" OR "Collaborative AI" OR "Human-guided"). This search yielded no matches, suggesting that no dedicated studies investigating human-in-the-loop AI approaches have been performed to date, either for AHP or for rare diseases as a whole.
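The search string above is a conjunction of four OR-groups, which makes it easy to assemble programmatically and to vary one group at a time. The sketch below is only an illustration of that structure: the helper names or_group and build_query are hypothetical, the term lists are copied from the search string, and no request to PubMed is made.

```python
# Term groups copied from the search string; variable and helper names are
# hypothetical and exist only for this illustration.
AI_TERMS = ["Artificial Intelligence", "AI", "Machine Learning",
            "Deep Learning", "Neural Networks"]
TASK_TERMS = ["Screening", "Diagnosis", "Detection"]
DISEASE_TERMS = ["Rare Disease", "Orphan Disease", "Uncommon Condition",
                 "Low Prevalence Disease", "Acute Hepatic Porphyria",
                 "AHP", "Porphyria"]
HITL_TERMS = ["Human-in-the-loop", "Human-assisted", "Human-centered",
              "Augmented intelligence", "Human-machine collaboration",
              "Human-computer interaction", "Hybrid intelligence",
              "Human-supervised", "Human oversight", "Collaborative AI",
              "Human-guided"]


def or_group(terms: list[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups: list[str]) -> str:
    """Join the OR-groups with AND, mirroring the structure of the search."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(AI_TERMS, TASK_TERMS, DISEASE_TERMS, HITL_TERMS)
print(query)
```

Keeping each concept in its own list makes it straightforward to drop or narrow a group and rerun the search.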

We then adapted the search to look for AI screening approaches for Acute Hepatic Porphyria (AHP) specifically, by eliminating the search terms for rare diseases and human-in-the-loop. This search yielded 23 articles, of which only two were actually related to AHP. Those two studies aimed at testing AI screening systems for AHP patients but focused on detecting AHP patients using AI alone rather than on comparing performance with the standard of care or highlighting human oversight. In those studies, no new cases could be found. This appeared to be mainly due to the limited number of suspected candidates and/or the limited success in recalling such candidates for testing, which, combined with the ultra-rare prevalence of AHP, created unfavorable odds of finding de-novo cases. This underlines how challenging it is to diagnose AHP and, by extension, to validate new AI systems that support such diagnosis. More validation in general, and above all more direct comparison with the current state of the art, is therefore needed to ensure the effectiveness and safety of AI-driven diagnostic support systems for AHP. Hence, we did a retrospective cohort study to evaluate the effectiveness of an AI-based Human-in-the-Loop screening (HAI) approach compared with the standard of care (SOC) in diagnosing AHP cases. Our study design addresses the challenges in validation and simultaneously puts a spotlight on human oversight, both of which should help accelerate the adoption of AI screening to assist in rare disease diagnosis.

Added value of this study To the best of our knowledge, this is the first study to investigate the added value of a Human-in-the-Loop AI screening concept compared with the SOC for AHP or rare diseases. Explainable AI and particularly human oversight have been urgently demanded by governing bodies and regulators. The EU Artificial Intelligence Act even prescribes human oversight for any AI system designed to be used as a medical device in the European Union, reflecting the desire to make AI applications as safe and reliable as possible. We present the results of our study to highlight the feasibility and added value of the HAI concept as well as its potential for prospective real-time screening at the point of care. We further present findings of additional in-depth analyses delineating shortcomings in the diagnosis of AHP under the current SOC. Such evidence should serve as the bedrock to justify the considerable efforts necessary for implementation and, most of all, should help shorten the time to diagnosis for patients suffering from rare diseases like AHP.

Implications of all the available evidence Real-time decision support at the point of care has been identified as a pivotal lever to improve early diagnosis of AHP. At the same time, eHealth infrastructure is being reformed and many initiatives around the world strive for higher levels of maturity. This lays the foundation for centrally orchestrated digital and AI-driven tools. Scalable support systems exploiting this infrastructure are therefore required and need to be validated. In our study, the proposed HAI screening approach showed a higher plausibility rate in suspected cases than the SOC, although statistical testing did not find a statistically significant difference between the screening methods. Auxiliary findings in our study suggested that AI screening might create additional awareness at the point of care, which is an effective means of improving the diagnosis of AHP. Further, we found that certain phenotypes might be harder to identify as AHP cases and that existing AHP diagnoses are unreliable. All our findings suggest that HAI is a viable approach to the challenge of early diagnosis and its attendant issues. They warrant further prospective studies of HAI as real-time decision support at the point of care and strongly support its implementation in clinical routine.

Competing Interest Statement

SL, MK, and KB are current or former employees of Symptoma GmbH, the provider of the technology used in this study. JN and TL hold shares in Symptoma GmbH. GS has received support for attending meetings and travel from Alnylam Pharmaceuticals. VP declares no commercial or financial relationships that could be construed as a potential conflict of interest.

Funding Statement

This study was funded by Alnylam Pharmaceuticals. Alnylam Pharmaceuticals was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Institutional Review Board of the Federal State of Salzburg ("Ethikkommission Land Salzburg") gave ethical approval for this work (EK Nr. 1006/2023).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The source data used for this study comprise electronic health record (EHR) data containing protected health information under European GDPR from patients under care at University Hospital Salzburg (SALK). Due to legal agreements with the providing institution (SALK), the anonymized datasets analyzed during the study are not publicly available. Aggregated data of anonymized datasets in tabular format can be made available upon reasonable request to the corresponding author, subject to institutional approval. Requests to access these datasets should be directed at: sciencesymptoma.com.

Full details regarding the implementation and setup of the AI tool (Dx EHRs v2022.11) are available from Symptoma GmbH upon request. Investigators with expertise in the field should be able to reproduce the described methods on their own data to validate the results of this study.
