GPT-4 performed close to the level of expert doctors in eye assessments

As large language models (LLMs) continue to advance, so do questions about how they can benefit society in areas such as the medical field. A recent study from the University of Cambridge's School of Clinical Medicine found that OpenAI's GPT-4 performed nearly as well in an ophthalmology assessment as experts in the field, the Financial Times first reported.

In the study, published in PLOS Digital Health, researchers tested the LLM, its predecessor GPT-3.5, Google's PaLM 2 and Meta's LLaMA with 87 multiple choice questions. Five expert ophthalmologists, three trainee ophthalmologists and two unspecialized junior doctors received the same mock exam. The questions came from a textbook used to test trainees on everything from light sensitivity to lesions. The contents aren't publicly available, so the researchers believe the LLMs couldn't have been trained on them previously. ChatGPT, equipped with GPT-4 or GPT-3.5, was given three chances to answer definitively; otherwise, its response was marked as null.

GPT-4 scored higher than the trainees and junior doctors, getting 60 of the 87 questions right. That was significantly higher than the junior doctors' average of 37 correct answers, but it only just beat the three trainees' average of 59.7. Although one expert ophthalmologist answered only 56 questions accurately, the five experts averaged 66.4 right answers, beating the machine. PaLM 2 scored 49, and GPT-3.5 scored 42. LLaMA scored the lowest at 28, falling below the junior doctors. Notably, these trials occurred in mid-2023.

While these results have potential benefits, there are also quite a few risks and concerns. Researchers noted that the study offered a limited number of questions, especially in certain categories, meaning the actual results might vary. LLMs also have a tendency to "hallucinate," or make things up. That's one thing if it's an irrelevant fact, but claiming there's a cataract or cancer is another story. As is the case in many instances of LLM use, the systems also lack nuance, creating further opportunities for inaccuracy.

This article originally appeared on Engadget at https://www.engadget.com/gpt-4-performed-close-to-the-level-of-expert-doctors-in-eye-assessments-131517436.html?src=rss
Posted Apr. 18, 2024, 13:30:13

