How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study

dc.contributor.authorAzizoglu, Mustafa
dc.contributor.authorAydogdu, Bahattin
dc.date.accessioned2024-04-24T17:20:46Z
dc.date.available2024-04-24T17:20:46Z
dc.date.issued2024
dc.departmentDicle Üniversitesien_US
dc.description.abstractPurpose: The purpose of this study was to conduct a detailed comparison of the accuracy and responsiveness of GPT-3.5 and GPT-4 in the realm of pediatric surgery. Specifically, we sought to assess their ability to correctly answer a series of sample questions from the European Board of Pediatric Surgery (EBPS) examination. Methods: This study, conducted between 20 May 2023 and 30 May 2023, undertook a comparative analysis of two AI language models, GPT-3.5 and GPT-4, in the field of pediatric surgery, particularly in the context of EBPS exam sample questions. Two sets of 105 sample questions each (210 in total), derived from the EBPS sample questions, were collated. Results: In General Pediatric Surgery, GPT-3.5 answered 7 questions correctly (46.7%), whereas GPT-4 achieved higher accuracy with 13 correct responses (86.7%) (p=0.020). In Newborn Surgery and Pediatric Urology, GPT-3.5 correctly answered 6 questions (40.0%), whereas GPT-4 correctly answered 12 (80.0%) (p=0.025). In total, GPT-3.5 correctly answered 46 of 105 questions (43.8%), while GPT-4 showed significantly better performance, correctly answering 80 (76.2%) (p<0.001). Across all responses, the odds ratio for GPT-4 versus GPT-3.5 was 4.1, indicating that GPT-4 was 4.1 times more likely to provide a correct answer to the pediatric surgery questions than GPT-3.5. Conclusion: This comparative study concludes that GPT-4 significantly outperforms GPT-3.5 in responding to EBPS exam questions.en_US
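The abstract's odds ratio of 4.1 can be checked directly from the reported counts (80/105 correct for GPT-4 vs. 46/105 for GPT-3.5). A minimal sketch of that arithmetic; the function name `odds_ratio` is illustrative and not taken from the paper:

```python
# Verifying the abstract's odds ratio from the reported counts.
# (Illustrative recomputation, not the authors' analysis code.)

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table: odds of event in group 1 (a/b)
    divided by odds of event in group 2 (c/d)."""
    return (a / b) / (c / d)

# GPT-4: 80 correct, 25 incorrect; GPT-3.5: 46 correct, 59 incorrect.
or_gpt4_vs_gpt35 = odds_ratio(80, 105 - 80, 46, 105 - 46)
print(round(or_gpt4_vs_gpt35, 1))  # → 4.1
```

The value matches the 4.1 reported in the abstract.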
dc.identifier.doi10.3306/AJHS.2024.39.01.23
dc.identifier.endpage26en_US
dc.identifier.issn1579-5853
dc.identifier.issn2255-0569
dc.identifier.issue1en_US
dc.identifier.startpage23en_US
dc.identifier.urihttps://doi.org/10.3306/AJHS.2024.39.01.23
dc.identifier.urihttps://hdl.handle.net/11468/19240
dc.identifier.volume39en_US
dc.identifier.wosWOS:001157947600003
dc.identifier.wosqualityN/A
dc.indekslendigikaynakWeb of Science
dc.language.isoenen_US
dc.publisherReial Acad Medicina Illes Balearsen_US
dc.relation.ispartofMedicina Balear
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectChatGPTen_US
dc.subjectPediatric Surgeryen_US
dc.subjectExamen_US
dc.subjectQuestionsen_US
dc.subjectArtificial Intelligenceen_US
dc.titleHow does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative studyen_US
dc.typeArticleen_US

Files