How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study

dc.contributor.authorAzizoglu, Mustafa
dc.contributor.authorAydogdu, Bahattin
dc.date.accessioned2024-04-24T17:20:46Z
dc.date.available2024-04-24T17:20:46Z
dc.date.issued2024
dc.departmentDicle Üniversitesien_US
dc.description.abstractPurpose: The purpose of this study was to conduct a detailed comparison of the accuracy and responsiveness of GPT-3.5 and GPT-4 in the realm of pediatric surgery. Specifically, we sought to assess their ability to correctly answer a series of sample questions from the European Board of Pediatric Surgery (EBPS) examination. Methods: This study, conducted between 20 May 2023 and 30 May 2023, undertook a comparative analysis of two AI language models, GPT-3.5 and GPT-4, in the field of pediatric surgery, particularly in the context of EBPS exam sample questions. Two sets of 105 sample questions each (210 in total), derived from the EBPS sample questions, were collated. Results: In General Pediatric Surgery, GPT-3.5 answered 7 questions correctly (46.7%), whereas GPT-4 achieved higher accuracy with 13 correct responses (86.7%) (p=0.020). In Newborn Surgery and Pediatric Urology, GPT-3.5 correctly answered 6 questions (40.0%), whereas GPT-4 correctly answered 12 (80.0%) (p=0.025). In total, GPT-3.5 correctly answered 46 of 105 questions (43.8%), while GPT-4 showed significantly better performance, correctly answering 80 (76.2%) (p<0.001). Across all responses, the odds ratio for GPT-4 versus GPT-3.5 was 4.1, indicating that GPT-4 was 4.1 times more likely to provide a correct answer to the pediatric surgery questions than GPT-3.5. Conclusion: This comparative study concludes that GPT-4 significantly outperforms GPT-3.5 in responding to EBPS exam questions.en_US
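The abstract's odds ratio of 4.1 can be checked directly from the reported counts (80/105 correct for GPT-4 vs. 46/105 for GPT-3.5). A minimal sketch of that arithmetic; the function name `odds_ratio` is illustrative and not taken from the paper:

```python
# Verifying the abstract's odds ratio from the reported counts.
# (Illustrative recomputation, not the authors' analysis code.)

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table: odds of event in group 1 (a/b)
    divided by odds of event in group 2 (c/d)."""
    return (a / b) / (c / d)

# GPT-4: 80 correct, 25 incorrect; GPT-3.5: 46 correct, 59 incorrect.
or_gpt4_vs_gpt35 = odds_ratio(80, 105 - 80, 46, 105 - 46)
print(round(or_gpt4_vs_gpt35, 1))  # → 4.1
```

The value matches the 4.1 reported in the abstract.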
dc.identifier.doi10.3306/AJHS.2024.39.01.23
dc.identifier.endpage26en_US
dc.identifier.issn1579-5853
dc.identifier.issn2255-0569
dc.identifier.issue1en_US
dc.identifier.startpage23en_US
dc.identifier.urihttps://doi.org/10.3306/AJHS.2024.39.01.23
dc.identifier.urihttps://hdl.handle.net/11468/19240
dc.identifier.volume39en_US
dc.identifier.wosWOS:001157947600003
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.language.isoenen_US
dc.publisherReial Acad Medicina Illes Balearsen_US
dc.relation.ispartofMedicina Balear
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectChatGPTen_US
dc.subjectPediatric Surgeryen_US
dc.subjectExamen_US
dc.subjectQuestionsen_US
dc.subjectArtificial Intelligenceen_US
dc.titleHow does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative studyen_US
dc.typeArticleen_US

Files