A proof-of-concept study found that ChatGPT 4.0, the most recent iteration of OpenAI’s large language model (LLM), achieved a passing score of 85% on a clinical neurology exam.
“These findings suggest that with further refinements, large language models could have significant applications in clinical neurology,” the study’s authors wrote. The team of researchers from the German Cancer Research Center in Heidelberg and University Hospital Heidelberg published the results on December 7.
The researchers tested two LLMs, ChatGPT 3.5 and ChatGPT 4.0, on the exam on May 31. The exam drew on a neurology question bank approved by the American Board of Psychiatry and Neurology and validated with a small subset of questions from the European Board of Neurology.
ChatGPT 4.0 scored 85%, answering 1662 of 1956 questions correctly, while its predecessor, ChatGPT 3.5, scored 66.8%. Human users of the question bank achieved a mean score of 73.8%. On questions about cognitive, psychological, and behavioral topics, ChatGPT 4.0 outperformed the human users.
Because academic institutions typically set the passing score at 70%, ChatGPT 4.0 “passed” the neurology exam, while ChatGPT 3.5 fell short. However, both models performed worse on questions that demanded “higher-order thinking” than on questions that tested only “lower-order thinking.”
The research team still has some reservations, however, and says the LLMs will need further work before they can be used in clinical neurology.
“We see our study more as a proof of concept for the capabilities of LLMs,” the researchers said. “There is still development needed and probably even specific fine-tuning of LLMs to make them properly applicable for clinical neurology.”
Although LLMs could benefit documentation and decision-support systems, neurologists should be cautious about using them in practice because of their current limitations on higher-order cognitive tasks.
AI is already being used for significant healthcare tasks, such as helping AstraZeneca in its search for a cure for cancer and combating the overprescription of antibiotics in Hong Kong, according to Dr. Varun Venkataramani, one of the study’s authors.