Application of Generative Artificial Intelligence to Utilise Unstructured Clinical Data for Acceleration of Inflammatory Bowel Disease Research

Abstract

Background: Inflammatory bowel disease (IBD) research is a dynamic field, but the growing volume of electronic health records (EHRs) and research data presents significant challenges. Traditional methods for structuring unstructured medical records are labour-intensive and do not scale. Large language models (LLMs) may offer a solution, yet their usefulness for data standardisation in IBD remains unknown.

Objective: To evaluate the use of LLMs in structuring free-text histology and radiology reports from IBD patients, to compare their performance with manual clinician curation, and to assess the usefulness of fine-tuning and retrieval-augmented generation (RAG).

Design: We developed an IBD-specialised, LLM-based framework using structured prompt engineering and fine-tuning. Reports were manually curated and processed using several LLMs. Performance was assessed against the manual curation, and RAG was used to enhance model responses with clinical guidelines from the European Crohn's and Colitis Organisation (ECCO) and the European Society for Paediatric Gastroenterology, Hepatology and Nutrition (ESPGHAN).

Results: Overall, Llama 3.3 achieved the highest F1 score for both histology and imaging reports (1 ± 0 and 0.85 ± 0.29, respectively) in extracting findings and anatomical regions, surpassing the other models in structured data generation. Fine-tuning improved the performance of the smaller Llama 3.1 8B model on imaging reports (0.7 ± 0.46 before vs 0.82 ± 0.35 after fine-tuning), enabling better extraction with reduced computational requirements.

Conclusion: Our findings demonstrate the feasibility of LLM-based automated structuring of IBD-related medical records. Unstructured data from free-text reports can be reliably converted to standardised ontologies capturing location, severity, and qualifiers. These advances enable scalable, privacy-compliant, AI-driven solutions for data standardisation.
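The results above are reported as F1 (mean ± SD) against manual clinician curation. As a minimal sketch, assuming each report is reduced to a set of structured findings and scored by exact match, per-report F1 and its summary could be computed as below. The `Finding` schema, the `report_f1` and `summarise` helpers, the exact-match criterion, the handling of empty reports, and the use of population SD are all hypothetical illustrations, not the authors' published method.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """One structured finding. Hypothetical schema: field names are assumptions."""
    finding: str
    location: str
    severity: Optional[str] = None


def report_f1(predicted: set[Finding], reference: set[Finding]) -> float:
    """Exact-match F1 between model output and clinician curation for one report."""
    if not predicted and not reference:
        return 1.0  # assumption: two empty sets count as perfect agreement
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def summarise(pairs: list[tuple[set[Finding], set[Finding]]]) -> tuple[float, float]:
    """Mean and population SD of per-report F1 (SD convention is an assumption)."""
    scores = [report_f1(pred, ref) for pred, ref in pairs]
    return mean(scores), pstdev(scores)


# Hypothetical example: the model misses one of two curated findings.
reference = {
    Finding("active colitis", "rectum", "moderate"),
    Finding("granuloma", "terminal ileum"),
}
predicted = {Finding("active colitis", "rectum", "moderate")}

print(report_f1(predicted, reference))  # precision 1.0, recall 0.5
print(summarise([(predicted, reference), (reference, reference)]))
```

Because `Finding` is a frozen dataclass it is hashable, so set intersection gives the true positives directly; a real evaluation would likely need fuzzier matching (for example, synonym or ontology-level equivalence) rather than exact string equality.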

Competing Interest Statement

JJA is a SAB member for Orchard Therapeutics.

Funding Statement

This study was supported by the Institute for Life Sciences, University of Southampton; the NIHR Southampton Biomedical Research Centre; and EPSRC (EP/Y01720X/1). JJA is funded by an NIHR Advanced Fellowship (NIHR302478). ZG is funded by a CICRA research training fellowship.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethics committee/IRB of University Hospital Southampton gave ethical approval for this work (REC 09/H0504/125)

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
