Using machine learning to identify key subject categories predicting the pre-clerkship and clerkship performance: 8-year cohort study

1. INTRODUCTION

To become physicians, medical students must undergo a long and complicated maturation process, and the path demands considerable grit.1,2 A broad knowledge base in pre-med, basic medical science, and clinical medicine is necessary for the ideal doctor to provide effective patient care.3–5 To build a solid foundation, medical students should sequentially complete courses in medical humanities and basic medical science before entering their clerkships.6,7 In addition, the practice of clinical medicine requires a strong understanding of basic medical science, such as the pathophysiology of disease, and this understanding distinguishes physicians from other healthcare professionals.8 Because clerkship is often considered the first transition point in medical education, clerkship performance is essential for students’ development.9 Medical training curricula should therefore aim to facilitate students’ learning and equip future physicians for a smooth entry into clerkship.10

However, a previous survey showed that medical clerks were unclear about which pre-med and basic medical science subjects students needed to master to achieve high clerkship performance.11 Similarly, medical educators have found it challenging to determine which subjects should be included in pre-clerkship training to ensure a smooth transition into clerkship.12 Identifying the subjects that predict clerkship performance could therefore help medical students learn more efficiently and relieve their workload during the academic year.13

For medical students, clerkship performance appears to be related to academic performance at different stages.14 While previous studies have identified predictive factors based on summative academic performance scores, few have explored the implications of individual subjects across medical programs.14,15 Thus, there is insufficient evidence on whether and how clerkship performance can be predicted from the subjects taught during the pre-med and basic medical science stages.15 As mentioned above, the relationships among individual subjects, student performance, and the roles of those subjects in medical students’ maturation may have important implications for achieving high clerkship performance. In this study, we set the following aims:

We aimed to identify whether individual subjects affect medical students’ pre-clerkship performance and clerkship performance. We also aimed to explore how well combinations of different sets of significant subjects predict pre-clerkship performance and clerkship performance.

2. METHODS

2.1. Study design and data collection

This cohort study was conducted at National Yang Ming Chiao Tung University and its affiliated teaching hospital, Taipei Veterans General Hospital. Data were collected from the institutional database. The dataset included records of 1162 medical students who graduated from 2011 to 2019; each record contained the student’s background characteristics, including age at graduation, gender, admission by examination, experience of failing a subject, experience of a suspension, and experience of failing the entrance exam to the medical school. Each record also included the student’s academic performance from the pre-med stage to the clinical clerkship. As features for machine learning, we used the students’ background characteristics and their individual scores in all subjects of the pre-med stage (21 subjects, grouped into social science and basic science categories) and all subjects of the basic medical science stage (10 subjects, all belonging to the basic medical science category). The machine learning workflow consisted of three phases: data collection, data preprocessing before machine learning, and machine learning techniques and performance evaluation (Fig. 1).

Fig. 1:

Workflow of the ML process. AUC = area under the ROC curve; ML = machine learning; SMOTE = synthetic minority over-sampling technique.

2.2. Data preprocessing and content of the raw dataset

We used these features to predict two labels with models based on different machine learning techniques: the average score of subjects in the basic medical science stage (pre-clerkship performance) and the average score in the clinical clerkship stage (clerkship performance). Pre-clerkship performance was predicted from selected subjects in the pre-med stage, whereas clerkship performance was predicted from different groups of selected subjects (subjects in the pre-med stage, in the basic medical science stage, and in both stages combined). Each label was dichotomized at its 50th percentile: students whose average score fell in the upper half belonged to the high-performance group, and the remaining students belonged to the low-performance group. A previous study applied resampling to improve prediction model performance on imbalanced data from medical students.16 Because our data might have an imbalanced distribution, with most medical students in the high-performance range, data preprocessing focused on resampling and feature selection to improve the performance of the prediction models. The synthetic minority over-sampling technique (SMOTE) and Tomek links were used to address inaccuracy caused by potentially imbalanced data.17,18
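The labeling and resampling steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the study’s code: all function names are hypothetical, students exactly at the median are assumed to fall into the low-performance group, SMOTE interpolates toward one of k = 5 nearest minority-class neighbours, and the Tomek-link step removes both members of each link. Maintained implementations of both techniques exist in the imbalanced-learn package (for example its `SMOTETomek` class).

```python
import numpy as np


def median_split_labels(avg_scores):
    """Label 1 = high-performance group (above the median), 0 = low-performance group.
    Assumption: students exactly at the median fall into the low group."""
    avg_scores = np.asarray(avg_scores, dtype=float)
    return (avg_scores > np.median(avg_scores)).astype(int)


def _sq_dists(A):
    """Pairwise squared Euclidean distances; the diagonal is set to +inf so a
    sample is never treated as its own neighbour."""
    sq = (A ** 2).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    np.fill_diagonal(d, np.inf)
    return d


def smote(X, y, k=5, seed=0):
    """Oversample the minority class until both classes have the same size.
    Each synthetic sample lies on the segment between a minority sample and
    one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X.copy(), y.copy()
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(_sq_dists(X_min), axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)           # seed samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour each
    gap = rng.random((n_new, 1))                             # position along the segment
    synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


def remove_tomek_links(X, y):
    """Drop both samples of every Tomek link, i.e. pairs of mutual nearest
    neighbours that carry different labels."""
    nn = np.argmin(_sq_dists(X), axis=1)
    idx = np.arange(len(X))
    in_link = (nn[nn] == idx) & (y[nn] != y)
    return X[~in_link], y[~in_link]
```

In a cross-validated setting, resampling of this kind is usually applied to the training folds only, so that synthetic students never appear in the fold used to compute the AUC.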

2.3. Features selection based on the average AUC by 10-fold cross-validation

Each feature was used to predict the two labels with prediction models based on different machine learning techniques. Because the average area under the ROC curve (AUC) is commonly used to assess model efficacy, we applied a filter method that reduced the number of features by scoring each feature with a single-variable classifier and ranking it by AUC.19 Features with an AUC >0.70 for either pre-clerkship performance or clerkship performance were selected. The selected features from the pre-med and basic medical science stages were then combined and evaluated with 10-fold cross-validation to generate new AUC values and enhance the performance of the prediction models. Ten-fold cross-validation was used to obtain a more reliable estimate of the performance of the machine learning models: in each of the 10 runs, the data were divided into a 90% training set and a 10% testing set, the classifier was trained on the training set, and the AUC was calculated on the testing set. The average AUC for each feature was the mean over these 10 runs.
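A minimal NumPy sketch of this filter step follows. The paper does not specify the single-variable classifier, so the sketch assumes one that only learns, on the training folds, whether higher or lower scores indicate the high-performance group. For a single feature, a classifier whose output is monotone in that feature (one-variable logistic regression, for example) ranks students exactly as the raw score does up to direction, so its AUC depends only on that direction. All function names are hypothetical; in scikit-learn the same pieces correspond to `StratifiedKFold` and `roc_auc_score`.

```python
import numpy as np


def auc_score(y_true, scores):
    """AUC as the Mann-Whitney probability that a randomly chosen positive
    (high-performance) student scores higher than a randomly chosen negative
    one; ties count one half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


def stratified_folds(y, n_folds, rng):
    """Assign a fold id to every sample so that each class is spread evenly over the folds."""
    fold = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == cls))
        fold[members] = np.arange(len(members)) % n_folds
    return fold


def cv_feature_auc(x, y, n_folds=10, n_repeats=1, seed=0):
    """Mean test-fold AUC of a one-feature classifier under k-fold CV
    (optionally repeated). The classifier only learns, on the 90% training
    split, whether higher or lower values point to the high-performance group."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    aucs = []
    for _ in range(n_repeats):
        fold = stratified_folds(y, n_folds, rng)
        for f in range(n_folds):
            train, test = fold != f, fold == f
            xt, yt = x[train], y[train]
            sign = 1.0 if xt[yt == 1].mean() >= xt[yt == 0].mean() else -1.0
            aucs.append(auc_score(y[test], sign * x[test]))
    return float(np.mean(aucs))


def filter_features(X, y, names, threshold=0.70, **cv_kwargs):
    """Keep the features whose cross-validated single-feature AUC exceeds the threshold."""
    mean_auc = {name: cv_feature_auc(X[:, j], y, **cv_kwargs) for j, name in enumerate(names)}
    return [name for name, auc in mean_auc.items() if auc > threshold], mean_auc
```

Because the threshold is applied to each feature separately, a feature is kept if it reaches AUC >0.70 for either label simply by running the filter once per label and taking the union of the kept names.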

2.4. Techniques and performance of machine learning for prediction models and statistical analysis

In this phase, we applied multiple machine learning techniques: support vector machine (SVM), Gaussian Naïve Bayes (GNB), logistic regression (LR), random forest (RF), decision tree (DT), and eXtreme Gradient Boosting (XGBoost). Five metrics (accuracy, precision, recall, F1-measure, and AUC) were calculated with 10-fold cross-validation to measure the performance of the prediction models based on each technique. We examined prediction models that used sets of selected subjects in the pre-med stage to predict pre-clerkship performance, and sets of selected subjects in the pre-med stage, the basic medical science stage, or both stages to predict clerkship performance. The machine learning models were developed with scikit-learn version 1.0.2 and Python 3.9.1 in Visual Studio Code version 1.66.0 for Windows.
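The evaluation loop can be sketched compactly. This is a hypothetical NumPy illustration, not the study’s scikit-learn pipeline: `TinyGaussianNB` stands in for any of the six techniques (scikit-learn’s `GaussianNB` would be the usual choice), label 1 is assumed to be the positive (high-performance) class for precision, recall, and F1, predictions are thresholded at a probability of 0.5, and the folds are not stratified.

```python
import numpy as np


def binary_metrics(y_true, y_pred, y_score):
    """Accuracy, precision, recall, F1 and AUC, treating label 1 (high-performance
    group) as positive. y_score is a continuous score such as a predicted
    probability; the other four metrics use the hard labels y_pred."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    y_score = np.asarray(y_score, dtype=float)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    auc = ((pos[:, None] > neg).sum() + 0.5 * (pos[:, None] == neg).sum()) / (len(pos) * len(neg))
    return {"accuracy": float(np.mean(y_pred == y_true)), "precision": float(precision),
            "recall": float(recall), "f1": float(f1), "auc": float(auc)}


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: independent normal likelihood per feature and class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # small variance floor, in the spirit of scikit-learn's var_smoothing
        self.var_ = (np.array([X[y == c].var(axis=0) for c in self.classes_])
                     + 1e-9 * X.var(axis=0).max() + 1e-12)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict_proba(self, X):
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None]).sum(axis=2)
        joint = log_lik + self.log_prior_
        joint -= joint.max(axis=1, keepdims=True)  # numerical stability before exp
        proba = np.exp(joint)
        return proba / proba.sum(axis=1, keepdims=True)


def cross_validated_metrics(make_model, X, y, n_folds=10, seed=0):
    """Average the five metrics over k folds (folds are not stratified in this sketch)."""
    fold = np.random.default_rng(seed).permutation(len(y)) % n_folds
    results = []
    for f in range(n_folds):
        train, test = fold != f, fold == f
        model = make_model().fit(X[train], y[train])
        score = model.predict_proba(X[test])[:, 1]  # column 1 = class 1 when labels are {0, 1}
        results.append(binary_metrics(y[test], (score >= 0.5).astype(int), score))
    return {name: float(np.mean([r[name] for r in results])) for name in results[0]}
```

Any model object exposing `fit` and `predict_proba` can be passed as `make_model`, so the same loop would compare several techniques on one feature set.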

3. RESULTS

3.1. Background characteristics of medical students

The study included 1162 medical students who graduated from 2011 to 2019. Their mean age at graduation was 26 years, and 63% of them were men (Table 1). Approximately one-fifth of the students had failed a subject (21%), and a similar proportion had failed the entrance exam to the medical school (23%). Few students had experienced a suspension (3%).

Table 1 - Background characteristics of medical students

Background characteristics                  All students (N = 1162)
Age at graduation, mean ± SD                25.93 ± 2.42
Gender, No. (%)
  Male                                      734 (63)
  Female                                    428 (37)
Admission by examinationᵃ, No. (%)
  Yes                                       586 (50)
Experience of failing a subjectᵇ, No. (%)
  Yes                                       245 (21)
Experience of a suspension, No. (%)
  Yes                                       37 (3)
Experience of failing the entrance exam to the medical schoolᶜ, No. (%)
  Yes                                       270 (23)

ᵃIn Taiwan, most medical schools admit students either through the national entrance examination or through an application and interview process.

ᵇThe experience of failing a subject was defined as failing at least one course before the clinical clerkship.

ᶜThe experience of failing the entrance exam to the medical school was defined as having either re-taken the examination or re-applied to the medical school.

3.2. Selected significant subjects with AUC >0.7 in the pre-med and basic medical science stages

After data preprocessing, features with an AUC >0.7 were selected from the initial dataset to construct the prediction models. None of the background characteristics had an AUC >0.7. For academic performance, 13 of the initial 21 subjects in the pre-med stage and all 10 subjects in the basic medical science stage were selected with an AUC >0.7 (Table 2). Among the selected subjects, those whose AUC values fell in the top tertile of their subject category were identified as selected significant subjects. For pre-clerkship performance, five selected significant subjects were identified among the 13 selected pre-med subjects: medical humanities (0.748), language (0.737), and economics (0.736) in the social science category, and chemistry (0.885) and physician scientist-related training (0.884) in the basic science category. For clerkship performance, seven selected significant subjects were identified among the 23 subjects selected from the initial 31 subjects in the pre-med and basic medical science stages: medical humanities (0.719) and medical sociology (0.711) in the social science category and physician scientist-related training (0.865) and chemistry (0.793) in the basic science category of the pre-med stage, as well as pharmacology (0.922), immunology-microbiology (0.915), and histology (0.904) in the basic medical science category.

Table 2 - Predictive ability of selected subjects for pre-clerkship performance and clerkship performance

Selected subjects by stage and category     AUC for predicting           AUC for predicting
                                            pre-clerkship performance    clerkship performance
Pre-med (N = 13)
  Social science category
    Medical humanities                      0.748                        0.719
    Medical sociology                       0.735                        0.711
    Music-art-drama                         0.715                        0.669
    Cultural anthropology                   0.707                        0.672
    Language                                0.737                        0.686
    Economics                               0.736                        0.686
    Law course                              0.690                        0.678
    Literature                              0.731                        0.682
  Basic science category
    Chemistry                               0.885                        0.793
    Physics                                 0.774                        0.736
    Calculus                                0.762                        0.702
    Mathematics                             0.735                        0.702
    Physician scientist-related training    0.884                        0.865
Basic medical science (N = 10)
  Basic medical science category
    Pharmacology                            NA                           0.922
    Immunology-microbiology                 NA                           0.915
    Histology                               NA                           0.904
    Pathology                               NA                           0.891
    Embryology                              NA                           0.880
    Physiology                              NA                           0.876
    Anatomy                                 NA                           0.863
    Neuroanatomy                            NA                           0.853
    Parasitology                            NA                           0.852
    Biochemistry                            NA                           0.841

AUC = area under the ROC curve; NA = not applicable.
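The two-stage selection (AUC >0.7, then the top tertile within each category) can be checked against Table 2 with a short script. The paper does not state how the tertile boundary was computed; this sketch assumes the linearly interpolated 2/3 quantile with a strict inequality, and the names `CLERKSHIP_AUC`, `upper_tertile_cut`, and `top_tertile_subjects` are hypothetical. Exact fractions are used so that tied AUCs and values sitting exactly on the boundary are not disturbed by floating-point rounding.

```python
from fractions import Fraction

# AUCs for predicting clerkship performance, transcribed from Table 2
CLERKSHIP_AUC = {
    "social science": {
        "Medical humanities": 0.719, "Medical sociology": 0.711, "Music-art-drama": 0.669,
        "Cultural anthropology": 0.672, "Language": 0.686, "Economics": 0.686,
        "Law course": 0.678, "Literature": 0.682,
    },
    "basic science": {
        "Chemistry": 0.793, "Physics": 0.736, "Calculus": 0.702, "Mathematics": 0.702,
        "Physician scientist-related training": 0.865,
    },
    "basic medical science": {
        "Pharmacology": 0.922, "Immunology-microbiology": 0.915, "Histology": 0.904,
        "Pathology": 0.891, "Embryology": 0.880, "Physiology": 0.876, "Anatomy": 0.863,
        "Neuroanatomy": 0.853, "Parasitology": 0.852, "Biochemistry": 0.841,
    },
}


def upper_tertile_cut(values):
    """2/3 quantile with linear interpolation between order statistics
    (the rule np.percentile uses by default), computed with exact fractions."""
    s = sorted(Fraction(v) for v in values)
    pos = Fraction(2, 3) * (len(s) - 1)
    lo = pos.numerator // pos.denominator  # floor, since pos >= 0
    if lo + 1 == len(s):
        return s[lo]
    return s[lo] + (pos - lo) * (s[lo + 1] - s[lo])


def top_tertile_subjects(auc_by_category):
    """Within each category, keep subjects whose AUC lies strictly above the tertile cut."""
    selected = {}
    for category, aucs in auc_by_category.items():
        cut = upper_tertile_cut(aucs.values())
        selected[category] = sorted(s for s, a in aucs.items() if Fraction(a) > cut)
    return selected
```

Under this assumption, `top_tertile_subjects(CLERKSHIP_AUC)` returns medical humanities and medical sociology, chemistry and physician scientist-related training, and histology, immunology-microbiology, and pharmacology, i.e. the seven subjects reported for clerkship performance; fed the pre-clerkship column, it returns the five subjects reported for pre-clerkship performance. The strict inequality matters for the social science category, where two subjects tie at the boundary value of 0.686.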

3.3. Prediction models using grouped selected significant subjects to predict academic performance

To predict students’ pre-clerkship performance and clerkship performance, we built prediction models with different machine learning techniques by combining the selected subjects (Table 3). Each model was evaluated with five metrics, and its predictive ability was represented by AUC and accuracy.

Table 3 - Performance of the machine learning techniques using selected grouped subjects in the pre-med or basic medical science stage to predict pre-clerkship performance or clerkship performance

                                             Predictive ability
Technique   Precision  Recall     F1-score   AUC        Accuracy
Grouped selected subjects in the pre-med stage (subjects = 13) to predict pre-clerkship performance
SVM         0.759      0.804      0.748      0.883      0.785
GNB         0.842      0.825      0.795      0.833      0.840
LR          0.542      0.538      0.489      0.433      0.565
RF          0.788      0.817      0.762      0.775      0.780
DT          0.658      0.692      0.635      0.642      0.670
XGBOOST     0.750      0.792      0.716      0.782      0.730
Grouped selected subjects in the pre-med stage (subjects = 13) to predict clerkship performance
SVM         0.739      0.746      0.726      0.803      0.747
GNB         0.774      0.752      0.735      0.838      0.765
LR          0.782      0.784      0.764      0.829      0.785
RF          0.764      0.755      0.741      0.825      0.765
DT          0.718      0.704      0.693      0.704      0.705
XGBOOST     0.778      0.772      0.753      0.815      0.773
Grouped selected subjects in the basic medical science stage (subjects = 10) to predict clerkship performance
SVM         0.848      0.867      0.853      0.923      0.871
GNB         0.824      0.847      0.828      0.931      0.846
LR          0.767      0.773      0.763      0.851      0.787
RF          0.835      0.854      0.841      0.935      0.860
DT          0.769      0.789      0.774      0.789      0.800
XGBOOST     0.823      0.850      0.831      0.925      0.849
Grouped selected subjects in the pre-med stage (subjects = 13) and the basic medical science stage (subjects = 10) to predict clerkship performance
SVM         0.764      0.755      0.746      0.711      0.888
GNB         0.852      0.868      0.852      0.848      0.929
LR          0.772      0.787      0.780      0.765      0.838
RF          0.882      0.888      0.887      0.881      0.950
DT          0.872      0.892      0.880      0.863      0.880
XGBOOST     0.872      0.879      0.876      0.865      0.938

Bold values indicate p < 0.05.

AUC = area under the ROC curve; DT = decision tree; GNB = Gaussian Naïve Bayes; LR = logistic regression; RF = random forest; SVM = support vector machine; XGBOOST = eXtreme Gradient Boosting.

In predicting pre-clerkship performance from the 13 combined selected subjects in the pre-med stage, GNB achieved the highest accuracy (0.840), with an AUC of 0.833. GNB also had the highest precision, recall, and F1-measure among the techniques.

In predicting clerkship performance from the 13 combined selected subjects in the pre-med stage, LR had the highest accuracy (0.785), with an AUC of 0.829, whereas among models using the 10 combined selected subjects in the basic medical science stage, SVM had the highest accuracy (0.871), with an AUC of 0.923. Notably, the RF model combining the 13 selected subjects from the pre-med stage and the 10 selected subjects from the basic medical science stage achieved the highest accuracy of all models (0.950) and the highest AUC in its comparison (0.881). These techniques also had the highest precision and F1-measure in their respective comparisons, and the highest recall in all but the combined-stage comparison, in which DT had slightly higher recall than RF (0.892 vs. 0.888).
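Which technique ranks first on each metric within each comparison can be read off Table 3 mechanically. The script below is a hypothetical helper, not part of the study’s pipeline: it stores the Table 3 values (transcribed in the table’s column order) and reports, for every comparison, the technique with the largest value on each metric, with ties going to the technique listed first.

```python
METRICS = ("precision", "recall", "F1-score", "AUC", "accuracy")

# Table 3 values per technique, in METRICS order
TABLE3 = {
    "pre-med (13) -> pre-clerkship": {
        "SVM": (0.759, 0.804, 0.748, 0.883, 0.785), "GNB": (0.842, 0.825, 0.795, 0.833, 0.840),
        "LR": (0.542, 0.538, 0.489, 0.433, 0.565), "RF": (0.788, 0.817, 0.762, 0.775, 0.780),
        "DT": (0.658, 0.692, 0.635, 0.642, 0.670), "XGBOOST": (0.750, 0.792, 0.716, 0.782, 0.730),
    },
    "pre-med (13) -> clerkship": {
        "SVM": (0.739, 0.746, 0.726, 0.803, 0.747), "GNB": (0.774, 0.752, 0.735, 0.838, 0.765),
        "LR": (0.782, 0.784, 0.764, 0.829, 0.785), "RF": (0.764, 0.755, 0.741, 0.825, 0.765),
        "DT": (0.718, 0.704, 0.693, 0.704, 0.705), "XGBOOST": (0.778, 0.772, 0.753, 0.815, 0.773),
    },
    "basic medical science (10) -> clerkship": {
        "SVM": (0.848, 0.867, 0.853, 0.923, 0.871), "GNB": (0.824, 0.847, 0.828, 0.931, 0.846),
        "LR": (0.767, 0.773, 0.763, 0.851, 0.787), "RF": (0.835, 0.854, 0.841, 0.935, 0.860),
        "DT": (0.769, 0.789, 0.774, 0.789, 0.800), "XGBOOST": (0.823, 0.850, 0.831, 0.925, 0.849),
    },
    "pre-med + basic medical science (23) -> clerkship": {
        "SVM": (0.764, 0.755, 0.746, 0.711, 0.888), "GNB": (0.852, 0.868, 0.852, 0.848, 0.929),
        "LR": (0.772, 0.787, 0.780, 0.765, 0.838), "RF": (0.882, 0.888, 0.887, 0.881, 0.950),
        "DT": (0.872, 0.892, 0.880, 0.863, 0.880), "XGBOOST": (0.872, 0.879, 0.876, 0.865, 0.938),
    },
}


def best_technique_per_metric(table):
    """For every comparison, map each metric to the technique with the largest value
    (ties go to the technique listed first, because max keeps the first maximum)."""
    return {
        comparison: {metric: max(rows, key=lambda name, i=i: rows[name][i])
                     for i, metric in enumerate(METRICS)}
        for comparison, rows in table.items()
    }


for comparison, winners in best_technique_per_metric(TABLE3).items():
    print(comparison, winners)
```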

4. DISCUSSION

As shown in Fig. 2, we identified the subjects with an AUC >0.7 for either pre-clerkship performance or clerkship performance. Using RF, the prediction model predicted clerkship performance with 95% accuracy and an AUC of 0.88 after combining 13 selected subjects from the pre-med stage and 10 selected subjects from the basic medical science stage. Within their subject categories, medical humanities and medical sociology in social science, chemistry and physician scientist-related training in basic science, and pharmacology, immunology-microbiology, and histology in basic medical science showed predictive ability for clerkship performance in the top tertile.

Fig. 2:

Summary of the process of screening grouped subjects for performance prediction. AUC = area under the ROC curve.

Several studies have investigated predictors of medical students’ clerkship performance, including summative scores such as the grade point average during pre-med and pre-clerkship training,14 scores on the medical school entrance exam,20,21 and scores on the United States Medical Licensing Examination.20,22 Our findings demonstrated that many individual subjects in the pre-med and basic medical science stages could also be significant predictors of clerkship performance. Furthermore, combining different sets of subjects from either stage or from both stages achieved better predictive ability. We take the individual subjects with the highest predictive ability in different categories as examples. Medical humanities, a selected significant subject in the social science part of the pre-med stage, helps medical students develop humanistic habits, gain familiarity with patient care, and build resilience in dealing with uncertainty during clerkship.23,24 In addition, physician scientist-related training in our medical program mainly comprised epidemiology and biostatistics, which previous research considered important for enhancing the practice and appraisal of evidence-based medicine (EBM) in clinical medicine.25 Pharmacology, a selected significant subject in the basic medical sciences, has been deemed essential before beginning clerkship because most specialties deal with common drugs and drug classes.26 Regarding background characteristics, our findings were in line with previous studies, which found that nonacademic factors did not significantly predict academic performance in medical school.21,27

Prediction of academic performance has recently drawn substantial attention in medical education, especially given the potential of machine learning techniques that use advanced learning analytics.15,28,29 Among machine learning techniques, natural language processing has been the most commonly used to predict medical students’ academic performance, mostly with data extracted from clinical notes, with 75% positive prediction.15,30 A previous study predicted performance in national objective structured clinical examinations (OSCEs) during clerkship with DTs, reaching a prediction accuracy of 76.7%.31 Our study applied multiple machine learning techniques, such as RF, SVM, and LR, to the scores of individual subjects. Regarding the predictive ability of different techniques, a previous study of veterinary students identified key subjects predicting their final-semester performance and reported 92% prediction accuracy using SVM.32 Consistent with that study, our RF model predicted clerkship performance with comparable accuracy (95%). To the best of our knowledge, no other study has examined whether clerkship performance can be predicted from scores in single or grouped significant subjects during the pre-med and basic medical science stages. A previous study reported that medical students were unclear about the knowledge necessary for clerkship, which added to their concern about the adequacy of their preparation.9 Therefore, our findings can help medical students understand how subjects or categories relate to clerkship performance and may increase their preparedness for clerkship.

4.1. Future

With the help of machine learning, we evaluated the predictive ability of single or grouped significant subjects in the pre-med and basic medical science stages for clerkship performance. Demonstrating the predictive ability of subjects or categories in the medical program may enhance students’ preparedness for medical clerkships by helping them understand how these subjects or categories relate to their clerkship performance.

The data used in this study were obtained from a single medical school and therefore may not represent students at other medical schools in Taiwan. However, given the large sample size (N = 1162) analyzed with machine learning techniques, the results may still be generalizable. A further limitation is that the predicted labels did not include clinical performance after graduation. To the best of our knowledge, few studies have demonstrated the predictive ability of individual or grouped subjects for the academic performance of medical students. Because this approach quantifies the importance of different subjects by their predictive ability, it may have a substantial impact on the curricular design of medical programs.

In conclusion, the most significant subjects in the pre-med and basic medical science stages were first selected by their predictive ability, measured as AUC, and then identified by whether their AUC values fell in the top tertile within each category. Within their subject categories, medical humanities and medical sociology in social science, chemistry and physician scientist-related training in basic science, and pharmacology, immunology-microbiology, and histology in basic medical science showed predictive ability for clerkship performance in the top tertile. Clerkship performance could also be predicted by combining different subject categories from the pre-med and basic medical science stages. Based on our findings, demonstrating the predictive ability of subjects or categories in the medical program may enhance students’ preparedness for medical clerkships by helping them understand how these subjects or categories relate to their clerkship performance.

ACKNOWLEDGMENTS

The reported research was funded by grants from Taipei Veterans General Hospital (Grant numbers: V113C-024, V113D701-001-MY2-1, VTA113-A-4-2, and V113EA-005) and the Ministry of Science and Technology (Taiwan) (Grant numbers: NSTC 112-2314-B-A049-043-MY3, MOST-110-2511-H-A49A-504-MY3, MOST 110-2314-B-001-003, and MOST 111-2314-B-001-009).

We wish to express our gratitude to our diligent staff in the Department of Medical Education, Taipei Veterans General Hospital.

REFERENCES

1. Stern DT, Papadakis M. The developing physician—becoming a professional. N Engl J Med. 2006;355:1794–9.
2. Brennan N, Corrigan O, Allard J, Archer J, Barnes R, Bleakley A, et al. The transition from medical student to junior doctor: today’s experiences of Tomorrow’s Doctors. Med Educ. 2010;44:449–58.
3. O’Donnabhain R, Friedman ND. What makes a good doctor? Intern Med J. 2018;48:879–82.
4. Miles S, Leinster SJ. Identifying professional characteristics of the ideal medical doctor: the laddering technique. Med Teach. 2010;32:136–40.
5. Dickinson BL, Gibson K, VanDerKolk K, Greene J, Rosu CA, Navedo DD, et al. “It is this very knowledge that makes us doctors”: an applied thematic analysis of how medical students perceive the relevance of biomedical science knowledge to clinical medicine. BMC Med Educ. 2020;20:356.
6. Finnerty EP, Chauvin S, Bonaminio G, Andrews M, Carroll RG, Pangaro LN. Flexner revisited: the role and value of the basic sciences in medical education. Acad Med. 2010;85:349–55.
7. Ousager J, Johannessen H. Humanities in undergraduate medical education: a literature review. Acad Med. 2010;85:988–98.
8. Buja LM. Medical education today: all that glitters is not gold. BMC Med Educ. 2019;19:110.
9. Surmon L, Bialocerkowski A, Hu W. Perceptions of preparedness for the first medical clerkship: a systematic review and synthesis. BMC Med Educ. 2016;16:89.
10. Shacklady J, Holmes E, Mason G, Davies I, Dornan T. Maturity and medical students’ ease of transition into the clinical environment. Med Teach. 2009;31:621–6.
11. Cohen R. Preparation for clinical practice: a survey of medical students’ and graduates’ perception of the effectiveness of their medical school curriculum. Med Teach. 2006;28:e162–70.
12. Sharma M, Murphy R, Doody GA. Do we need a core curriculum for medical students? A scoping review. BMJ Open. 2019;9:e027369.
13. Picton A. Work-life balance in medical students: self-care in a culture of self-sacrifice. BMC Med Educ. 2021;21:8.
14. Salem RO, Al-Mously N, AlFadil S, Baalash A. Pre-admission criteria and pre-clinical achievement: can they predict medical students performance in the clinical phase? Med Teach. 2016;38:S26–30.
15. Dias RD, Gupta A, Yule SJ. Using machine learning to assess physician competence: a systematic review. Acad Med. 2019;94:427–39.
16. Mohseni Z, Martins RM, Milrad M, Masiello I. Improving classification in imbalanced educational datasets using over-sampling. In: Proceedings of the 28th International Conference on Computers in Education. Asia-Pacific Society for Computers in Education; 2020;1:278–83.
17. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.
18. Swana EF, Doorsamy W, Bokoro P. Tomek Link and SMOTE approaches for machine fault classification with an imbalanced dataset. Sensors. 2022;22:3246.
19. Zaffar M, Hashmani MA, Savita K, Rizvi SSH. A study of feature selection algorithms for predicting students academic performance. Int J Adv Comput Sci Appl. 2018;9:541–9.
20. Casey PM, Palmer BA, Thompson GB, Laack TA, Thomas MR, Hartz MF, et al. Predictors of medical school clerkship performance: a multispecialty longitudinal analysis of standardized examination scores and clinical assessments. BMC Med Educ. 2016;16:128.
21. Žuljević MF, Buljan I. Academic and non-academic predictors of academic performance in medical school: an exploratory cohort study. BMC Med Educ. 2022;22:366.
22. Cortez AR, Winer LK, Kim Y, Hanseman DJ, Athota KP, Quillin III RC. Predictors of medical student success on the surgery clerkship. Am J Surg. 2019;217:169–74.
23. Cohen LG, Sherif YA. Twelve tips on teaching and learning humanism in medical education. Med Teach. 2014;36:680–4.
24. Wald HS, McFarland J, Markovina I. Medical humanities in medical education and practice. Med Teach. 2019;41:492–6.
25. Abou Dargham N, Sultan Y, Mourad O, Baidoun M, Hosn OA, Abou El Naga A, et al. Perception of biostatistics by Lebanese medical students: a cross-sectional study. Alex J Med. 2021;57:103–9.
26. Norris ME, Cachia MA, Johnson MI, Rogers KA, Martin CM. Expectations and perceptions of students’ basic science knowledge: through the lens of clerkship directors. Med Sci Educ. 2020;30:355–65.
27. Urlings-Strop LC, Stegers-Jager KM, Stijnen T, Themmen AP. Academic and non-academic selection criteria in predicting medical school performance. Med Teach. 2013;35:497–502.
28. Albreiki B, Zaki N, Alashwal H. A systematic literature review of student’ performance prediction using machine learning techniques. Educ Sci. 2021;11:552.
29. Chan T, Sebok-Syer S, Thoma B, Wise A, Sherbino J, Pusic M. Learning analytics in medical education assessment: the past, the present, and the future. AEM Educ Train. 2018;2:178–87.
30. Spickard III A, Ridinger H, Wrenn J, O’brien N, Shpigel A, Wolf M, et al. Automatic scoring of medical students’ clinical notes


JOURNAL OF THE CHINESE MEDICAL ASSOCIATION
