Optimized prompts and GPT-4 enable ChatGPT to achieve passable scores on Japan's National Medical Examination! - January 12, 2010, Kanazawa University

A team led by Associate Professor Akihiro Nomura of the Faculty of Transdisciplinary Sciences for Innovation, Institute of Transdisciplinary Sciences for Innovation, Kanazawa University, working with MICIN Corporation, has developed a prompt (※1) optimized for having ChatGPT solve Japan's National Medical Examination. Using this prompt with GPT-4 (※2), the team exceeded the minimum passing score rate.

In early 2023, after a journal paper reported using ChatGPT to solve the United States Medical Licensing Examination (USMLE), the potential of ChatGPT in medicine and healthcare attracted significant attention worldwide. Research on national medical examinations in non-English-speaking countries, however, was still in its infancy.

In this study, we first used GPT-3.5 and GPT-4 to identify the prompt that yielded the highest percentage of correct answers on 290 questions without image data from the 116th National Medical Examination (administered in February 2022). Next, we tested ChatGPT running the GPT-4 model on the 117th National Medical Examination (administered in February 2023) using the optimized prompt. It scored 82.7% on the required questions and 77.2% on the basic and clinical questions, exceeding the minimum passing score rate for each.
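The two-stage design (choose the best prompt on one year's exam, then test it on the next year's exam) can be sketched in Python. This is a minimal illustration under stated assumptions, not the research team's code: `ask` is a hypothetical stand-in for a call to ChatGPT that returns an answer label, the function names `accuracy`, `select_best_prompt`, and `passes` are hypothetical, and passing thresholds are left as parameters because the official cutoffs are not given in this release.

```python
from typing import Callable, Sequence

# A question is (question text, correct answer label), e.g. ("...", "c").
Question = tuple[str, str]
# Hypothetical model interface: takes a full prompt string, returns an answer label.
AskFn = Callable[[str], str]


def accuracy(template: str, questions: Sequence[Question], ask: AskFn) -> float:
    """Fraction of questions answered correctly when each is inserted into `template`."""
    if not questions:
        return 0.0
    correct = sum(
        ask(template.format(question=q)).strip().lower() == answer.strip().lower()
        for q, answer in questions
    )
    return correct / len(questions)


def select_best_prompt(templates: Sequence[str], dev_set: Sequence[Question], ask: AskFn) -> str:
    """Stage 1: pick the template with the highest accuracy on the earlier exam."""
    return max(templates, key=lambda t: accuracy(t, dev_set, ask))


def passes(score: float, threshold: float) -> bool:
    """Stage 2 check: True if a section score meets or exceeds its minimum passing rate."""
    return score >= threshold
```

In use, the template returned by `select_best_prompt` on the earlier exam is frozen and then scored once per section of the later exam with `accuracy`, and each section score is compared with its own threshold via `passes`. Keeping the selection set and the test set separate is what makes the final scores a fair estimate rather than a result tuned to the test questions.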

We also analyzed in detail why ChatGPT produced incorrect answers. The main causes were insufficient medical knowledge, a lack of information about Japan's unique healthcare system, and errors on calculation questions.

These results show that ChatGPT can exceed the minimum passing score of Japan's National Medical Examination, although issues remain to be solved before it can be used in actual medical practice. Large language models are also expected to become one of the foundation models for medical AI used in Japanese clinical settings in the near future.

The results of this research were published online in the international journal PLOS Digital Health on January 23, 2024.

Figure 1: Study Design

【Glossary】

※1: Prompt
A prompt is a set of instructions that a human gives to an interactive generative AI. By carefully designing the prompt, it may be possible to have the model perform the intended task and improve its performance without modifying the large language model itself.

※2: GPT (Generative Pre-trained Transformer)
GPT (Generative Pre-trained Transformer) is a language model developed by OpenAI, based on a machine learning architecture called the Transformer.


Journal: PLOS Digital Health

Researcher Information: Akihiro NOMURA