Objective
Assessing medical student performance in Objective Structured Clinical Examinations (OSCEs) is labor intensive, requiring trained evaluators to review 15-minute videos. The physical examination period constitutes only a small portion of these videos. Automated segmentation of OSCE videos could significantly streamline evaluation by detecting the physical exam portion for targeted review. Current video analysis approaches struggle with these long recordings due to computational constraints and the difficulty of maintaining temporal context. This study tests whether multimodal large language models (MMLLMs) can segment physical examination periods in OSCE videos without prior training, potentially easing the burden on both human graders and automated systems.

Methods
We analyzed 500 videos from five OSCE stations at the UT Southwestern Simulation Center, each 15 minutes long, using hand-labeled physical examination periods as ground truth. MMLLMs processed video frames at one frame per second, classifying each frame into discrete activity states. A hidden Markov model with Viterbi decoding enforced temporal consistency across segments, addressing the inherent noisiness of frame-by-frame classification.

Results
Using an HMM and Viterbi decoding, with parameters fit on just 50 hand-labeled videos (10 from each station), zero-shot GPT-4o achieved 99.8% recall and 78.3% intersection over union (IoU), effectively capturing physical examinations with an average duration of 175 seconds from 900-second videos, an 81% reduction in frames requiring review.

Conclusions
Integrating multimodal large language models with temporal modeling effectively segments physical examination periods in OSCE videos without requiring extensive training data. This approach significantly reduces review time while maintaining clinical assessment integrity, demonstrating that zero-shot AI methods can be adapted to the specific requirements of medical education. The technique establishes a foundation for more efficient and scalable clinical skills assessment across diverse medical education settings.
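To illustrate the temporal-smoothing step described in the Methods, the short Python sketch below shows how per-second frame labels from an MMLLM could be smoothed with Viterbi decoding over a two-state hidden Markov model. This is not the study's code: the two-state simplification ("physical exam" vs. "other"), the probability values, and all variable names are illustrative assumptions.

# Minimal sketch (not the authors' implementation) of Viterbi smoothing
# over noisy per-frame classifications. All probabilities are assumed values.
import numpy as np

def viterbi(emission_logprob, trans_logprob, init_logprob):
    """Return the most probable state sequence given per-frame scores.

    emission_logprob: (T, S) log P(observation_t | state)
    trans_logprob:    (S, S) log P(state_t | state_{t-1})
    init_logprob:     (S,)   log P(state_0)
    """
    T, S = emission_logprob.shape
    score = np.full((T, S), -np.inf)
    backptr = np.zeros((T, S), dtype=int)
    score[0] = init_logprob + emission_logprob[0]
    for t in range(1, T):
        # candidate[i, j] = score of being in state i at t-1, then moving to j
        candidate = score[t - 1][:, None] + trans_logprob
        backptr[t] = candidate.argmax(axis=0)
        score[t] = candidate.max(axis=0) + emission_logprob[t]
    # Backtrack from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Example: two states (0 = "other", 1 = "physical exam"), one label per second.
# Emission log-probabilities would come from the MMLLM's per-frame classification;
# transition probabilities could be estimated from the 50 hand-labeled videos.
noisy = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1, 0])               # noisy frame labels
emissions = np.where(noisy[:, None] == np.arange(2), np.log(0.8), np.log(0.2))
transitions = np.log(np.array([[0.95, 0.05], [0.05, 0.95]]))   # "sticky" states
init = np.log(np.array([0.9, 0.1]))
print(viterbi(emissions, transitions, init))                    # smoothed sequence

The "sticky" transition matrix penalizes rapid switching between states, which is how an HMM of this kind suppresses isolated misclassified frames and yields contiguous physical examination segments.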
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Azure compute credits were provided to Dr. Jamieson by Microsoft as part of the Accelerating Foundation Models Research initiative.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of UT Southwestern Medical Center gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data will not be available.