Evaluating large language models on hospital health data for automated emergency triage

Huang F, Sun X, Mei A, Wang Y, Ding H, Zhu T (2024) LLM plus machine learning outperform expert rating to predict life satisfaction from self-statement text. In: IEEE Transactions on Computational Social Systems. pp 1–8. https://ieeexplore.ieee.org/document/10729240

Singh G, Bali KK (2024) Enhancing Decision-Making in Optimization through LLM-Assisted Inference: A Neural Networks Perspective. In: 2024 International Joint Conference on Neural Networks (IJCNN). IEEE, Yokohama, pp 1–7

Costello TH, Pennycook G, Rand DG (2024) Durably reducing conspiracy beliefs through dialogues with AI. Science 385:eadq1814. https://doi.org/10.1126/science.adq1814

Johnson A, Bulgarelli L, Pollard T, Horng S, Celi LA, Mark R (2023) MIMIC-IV (version 2.2). https://physionet.org/content/mimiciv/2.2/

Goldberger AL, Amaral L, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101:e215–e220

Ma MD, Ye C, Yan Y, Wang X, Ping P, Chang TS, Wang W (2024) CliBench: Multifaceted evaluation of large language models in clinical decisions on diagnoses, procedures, lab tests orders and prescriptions. http://arxiv.org/abs/2406.09923. ArXiv:2406.09923 [cs]

Wu C, Qiu P, Liu J, Gu H, Li N, Zhang Y, Wang Y, Xie W (2024) Towards evaluating and building versatile large language models for medicine. http://arxiv.org/abs/2408.12547. ArXiv:2408.12547

Aali A, Van Veen D, Arefeen YI, Hom J, Bluethgen C, Reis EP, Gatidis S, Clifford N, Daws J, Tehrani AS, Kim J, Chaudhari AS (2024) A dataset and benchmark for hospital course summarization with adapted large language models. J Am Med Inf Assoc 32:ocae312. https://doi.org/10.1093/jamia/ocae312

Savage T, Wang J, Shieh L (2023) A large language model screening tool to target patients for best practice alerts: development and validation. JMIR Med Inform 11:e49886

Bi B, Liu L, Perez-Concha O (2024) Adapting large language models for automated summarisation of electronic medical records in clinical coding. Stud Health Technol Inf 318:24–29

Masanneck L, Schmidt L, Seifert A, Kölsche T, Huntemann N, Jansen R, Mehsin M, Bernhard M, Meuth SG, Böhm L, Pawlitzki M (2023) Triage performance across large language models, ChatGPT, and untrained doctors in emergency medicine: comparative study (Preprint). http://preprints.jmir.org/preprint/53297

Williams C, Zack T, Miao BY, Sushil M, Wang M, Kornblith AE, Butte AJ (2024) Use of a large language model to assess clinical acuity of adults in the emergency department. JAMA Netw Open 7:e248895. https://doi.org/10.1001/jamanetworkopen.2024.8895

Haim GB, Saban M, Barash Y, Cirulnik D, Shaham A, Eisenman BZ, Burshtein L, Mymon O, Klang E (2024) Evaluating large language model-assisted emergency triage: a comparison of acuity assessments by GPT-4 and medical experts. J Clin Nurs. https://doi.org/10.1111/jocn.17490

Gaber F, Shaik M, Allega F, Bilecz AJ, Busch F, Goon K, Franke V, Akalin A (2025) Evaluating large language model workflows in clinical decision support for triage and referral and diagnosis. NPJ Digit Med 8:263. https://doi.org/10.1038/s41746-025-01684-1

Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S (2023) MIMIC-IV-ED. https://physionet.org/content/mimic-iv-ed/2.2/

Kim MJ, Grinsztajn L, Varoquaux G (2024) CARTE: pretraining and transfer for tabular learning. In: Proceedings of the 41st International Conference on Machine Learning (ICML’24). JMLR.org, Vienna, Austria

Hollmann N, Müller S, Purucker L, Krishnakumar A, Körfer M, Hoo SB, Schirrmeister RT, Hutter F (2025) Accurate predictions on small data with a tabular foundation model. Nature 637:319–326. https://doi.org/10.1038/s41586-024-08328-6

Radwan A, Amarneh M, Alawneh H, Ashqar HI, AlSobeh A, Magableh A (2024) Predictive analytics in mental health leveraging LLM embeddings and machine learning models for social media analysis. Int J Web Serv Res 21:1–22. https://doi.org/10.4018/IJWSR.338222

Mahdisoltani F, Biega J, Suchanek FM (2015) YAGO3: A knowledge base from multilingual wikipedias. CIDR. https://asiabiega.github.io/papers/yago3_cidr2015.pdf
