Comparative Evaluation of Multiplatform AI Performance on Practical Ophthalmology Exam Questions: Insights from the Brazilian Council of Ophthalmology Exam
Déborah Silva Nunes
State University of Pará/UEPA, Brazil and Bettina Ferro de Souza University Hospital/UFPA, Brazil.
Joacy Pedro Franco David
Federal University of the State of Pará/UFPA, Brazil and Ophthalmology at the Bettina Ferro de Souza Hospital/UFPA, Brazil.
José Jesu Sisnando D'Araujo Filho
Federal University of Pará (UFPA), Brazil, Federal University of the State of Rio de Janeiro, Brazil and Bettina Ferro de Souza Hospital, UFPA, Brazil.
Kelly Cristina Costa Guedes Nascimento
Federal University of the State of Pará/UFPA, Brazil and Bettina Ferro de Souza Hospital/UFPA, Brazil.
Igor Jordan Barbosa Coutinho
Federal University of the State of Pará/UFPA, Brazil.
Rebeca Andrade Ferraz
State University of Pará/UEPA, Brazil and Bettina Ferro de Souza University Hospital/UFPA, Brazil.
Maria Isabel Muniz Zemero
State University of Pará/UEPA, Brazil and Bettina Ferro de Souza University Hospital/UFPA, Brazil.
Syenne Pimentel Fayal
State University of Pará/UEPA, Brazil and Bettina Ferro de Souza University Hospital/UFPA, Brazil.
Ana Caroline Coelho dos Passos
Pará State University Center, Bettina Ferro de Souza University Hospital / UFPA, Brazil.
Luis Eduardo de Carvalho Barros
Federal University of Pará / UFPA, Brazil and Bettina Ferro de Souza University Hospital / UFPA, Brazil.
Rodrigo Rodrigues Virgolino
Federal University of Pará / UFPA, Brazil and Institute of Biological Sciences of the Federal University of Pará / UFPA, Brazil.
George de Almeida Marques
Estácio de Sá University, Brazil and Medical Skills at the AFYA Faculty of Medical Sciences (AFYA), Brazil.
Vitor Hugo Auzier Lima *
Federal University of Pará / UFPA, Brazil and Faculty of Education and Technology of Pará / FAETE, Brazil.
*Author to whom correspondence should be addressed.
Abstract
In recent years, advances in artificial intelligence (AI), especially the emergence of natural language models and deep neural networks, have been transforming medical practice, offering tools that may assist both in diagnosis and in specialised medical training. The main objective of this study was to evaluate the accuracy and agreement of different AI models in answering practical questions from the Brazilian Council of Ophthalmology (CBO) Exam. To this end, the performance of five AI models (ChatGPT, Gemini, DeepSeek, Google AI Studio, and GROK) was analysed on a set of 560 questions distributed across eight thematic blocks of ophthalmology (Cornea, Cataract, Retina, Glaucoma, Neuro-ophthalmology, Optics and Refraction, Strabismus, and Plastic Surgery/Lacrimal Duct/Orbit). The answers were compared with the official answer key by calculating the percentage of correct answers. Cohen's kappa coefficient was used to measure agreement between each model's responses and the official answer key, and Fleiss' kappa to measure overall agreement among the different models. The most evident finding was that Gemini achieved the highest accuracy (77.6%) and the highest overall agreement with the official answer key. Significant variation in performance between blocks was also observed, with greater accuracy in the Retina and Glaucoma blocks and lower accuracy in the Strabismus and Plastic Surgery blocks. This thematic analysis identified the pattern of correct answers by subspecialty and revealed weaknesses of the models in areas that depend more heavily on visual assessment and clinical subjectivity. The results also indicate that these models are viable as a complementary tool in medical training, especially when used under supervision and with defined pedagogical objectives.
We therefore conclude that, despite their limitations, the most up-to-date models, trained on specific clinical data, were able to faithfully reproduce diagnostic reasoning in several areas of ophthalmology, evidencing their potential for integration into specialised education, provided they are used according to technical and ethical criteria. AI can thus serve as a supplementary tool in ophthalmic education, although caution is warranted in the more subjective subspecialties.
Keywords: Artificial intelligence, ophthalmology, medical education, board exam, computer-assisted diagnosis
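The abstract names three summary statistics: percentage accuracy against the answer key, Cohen's kappa between each model and the key, and Fleiss' kappa across all models. It does not say which software computed them, so the following is only a minimal Python sketch of the standard formulas (Cohen: κ = (p_o − p_e)/(1 − p_e), with p_e taken from each rater's marginal answer frequencies; Fleiss: the same form with per-question pairwise agreement and pooled category proportions). It assumes single-letter answer options and exactly one answer per model per question. The function names, the model labels and the toy answers are hypothetical and are not the study's data.

```python
from collections import Counter
from typing import Sequence


def accuracy(pred: Sequence[str], key: Sequence[str]) -> float:
    """Percentage of answers that match the official answer key."""
    if not key or len(pred) != len(key):
        raise ValueError("pred and key must be non-empty and of equal length")
    return 100.0 * sum(p == k for p, k in zip(pred, key)) / len(key)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two raters (e.g. one model vs. the answer key)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both raters used one identical category
    return (p_o - p_e) / (1.0 - p_e)


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds every model's answer to question i."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    n_items, m = len(ratings), len(ratings[0])
    if m < 2 or any(len(row) != m for row in ratings):
        raise ValueError("each question needs the same number (>= 2) of raters")
    totals: Counter = Counter()
    p_bar = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)  # pool category counts over all questions
        # Proportion of agreeing rater pairs on this question.
        p_bar += (sum(c * c for c in counts.values()) - m) / (m * (m - 1))
    p_bar /= n_items
    # Expected agreement from the pooled category proportions.
    p_e = sum((c / (n_items * m)) ** 2 for c in totals.values())
    if p_e == 1.0:
        return float("nan")  # undefined: every answer is the same category
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data for illustration only; NOT the study's CBO questions.
    key = ["A", "B", "C", "D", "A", "B", "C", "D"]
    answers = {
        "model_1": ["A", "B", "C", "D", "A", "B", "C", "A"],
        "model_2": ["A", "B", "C", "A", "A", "C", "C", "D"],
        "model_3": ["B", "B", "C", "D", "A", "B", "D", "D"],
    }
    for name, pred in answers.items():
        print(f"{name}: accuracy={accuracy(pred, key):.1f}%  "
              f"kappa_vs_key={cohen_kappa(pred, key):.3f}")
    # One row per question, one column per model.
    per_question = [list(row) for row in zip(*answers.values())]
    print(f"Fleiss' kappa across models: {fleiss_kappa(per_question):.3f}")
```

Passing the answer key as the second rater of `cohen_kappa` mirrors the model-versus-key comparison described in the abstract, whereas `fleiss_kappa` only measures how consistently the models agree with one another, regardless of whether their shared answer is correct.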