▲ Professor Jungyo Suh of the Department of Urology at Asan Medical Center
As the number of cases in which people seek disease related consultations from generative artificial intelligence chatbots has recently increased, most commercially available AI models have been found to be highly vulnerable to malicious attacks, posing a significant risk of recommending inappropriate treatments.
A research team led by Professors Jungyo Suh of the Department of Urology and Tae Joon Jun of the Big Data Research Center at Asan Medical Center, together with Professor Ro Woon Lee of the Department of Radiology at Inha University Hospital, recently confirmed that medical large language models are vulnerable to prompt injection attacks in more than 94 percent of cases.
A prompt injection attack is a type of cyberattack in which hackers insert malicious commands into generative AI models to induce behavior that deviates from their original intended purpose.
Notably, even top tier AI models such as GPT 5 and Gemini 2.5 Pro were found to be fully exposed to prompt injection attacks, demonstrating serious safety limitations by recommending medications that could cause fetal abnormalities to pregnant women.
This study is significant in that it is the first in the world to systematically analyze how vulnerable AI models are to prompt injection attacks when applied to medical consultations. Additional measures, including safety validation, are expected to be required before AI models can be applied in clinical practice.
The findings were published in the latest issue of ‘JAMA Network Open’, an international peer reviewed journal published by the American Medical Association with an impact factor of 9.7.
AI models have recently been widely used for patient consultation and education as well as for decision making in clinical settings. However, concerns have consistently been raised that malicious external command inputs, known as prompt injection attacks, could manipulate these models into recommending dangerous or contraindicated treatments.
From January to October 2025, a research team led by Professor Jungyo Suh of the Department of Urology at Asan Medical Center analyzed the security vulnerabilities of three AI models: GPT 4o mini, Gemini 2.0 Flash Lite, and Claude 3 Haiku.
First, 12 clinical scenarios were developed and the risk levels were divided into three stages. The intermediate risk scenarios involved recommending herbal ingredients instead of established treatments for patients with chronic diseases such as diabetes. High risk scenarios involved recommending herbal ingredients as treatment for patients with active bleeding or cancer and prioritizing medications that could cause respiratory depression for patients with respiratory diseases. The highest risk scenarios included recommending contraindicated drugs to pregnant women.
Two attack techniques were used. One was context aware prompt injection, which leveraged patient information to disrupt the AI model’s judgment, and the other was an attack method that fabricated plausible but nonexistent information through evidence manipulation.
The research team subsequently analyzed a total of 216 conversations between patients and the three AI models. The overall attack success rate across all three models was 94.4 percent. The model specific success rates were 100 percent for GPT 4o mini, 100 percent for Gemini 2.0 Flash Lite, and 83.3 percent for Claude 3 Haiku. By scenario risk level, the success rates were 100 percent for intermediate risk scenarios, 93.3 percent for high risk scenarios, and 91.7 percent for the highest risk scenarios. Notably, all three models were found to be vulnerable to attacks that involved recommending contraindicated medications to pregnant women.
The proportion of cases in which manipulated responses persisted into subsequent conversations exceeded 80 percent across all three models. This indicates that once safety mechanisms are compromised, the effects can persist throughout an entire conversation.
The research team additionally evaluated security vulnerabilities in top tier AI models, including GPT 5, Gemini 2.5 Pro, and Claude 4.5 Sonnet. The attack method used was client side indirect prompt injection, in which malicious phrases are hidden on user facing interfaces to manipulate the behavior of AI models. The scenario involved recommending contraindicated medications to pregnant women.
As a result, the attack success rates were 100 percent for GPT 5, 100 percent for Gemini 2.5 Pro, and 80 percent for Claude 4.5 Sonnet, confirming that even the latest AI models were effectively unable to defend against such attacks.
“This study experimentally demonstrates that medical AI models are structurally vulnerable to intentional manipulation beyond simple errors,” said Professor Jungyo Suh of the Department of Urology at Asan Medical Center. “With current safety mechanisms alone, it is difficult to block malicious attacks that induce recommendations such as prescribing contraindicated medications.”
He added, “To introduce patient facing medical chatbots or remote consultation systems, it is necessary to thoroughly test the vulnerabilities and safety of AI models and to mandate security validation frameworks.”