AI Excels in Patient Interaction and Diagnosis in Pilot Study

Michael van den Heuvel

The Articulate Medical Intelligence Explorer (AMIE), an artificial intelligence algorithm, has been trained to conduct and evaluate medical conversations. In dialogues with "patients" who were trained actors, it matched or even surpassed the performance of human doctors.

Based on a large language model (LLM) developed by Google, the chatbot diagnosed respiratory issues, cardiovascular conditions, and other ailments more accurately than general practitioners did. Furthermore, the algorithm extracted a similar amount of information from the dialogues as human doctors while displaying a higher level of empathy, as reported in a preprint study by Tao Tu and colleagues from Google Research and Google DeepMind.

Developing the Algorithm

Doctor-patient conversations (anamnesis, or history-taking) are among the most important tools for diagnosing and treating illness. Artificial intelligence (AI) systems capable of conducting and interpreting such dialogues could broaden access to medical care and improve its quality.

The developers faced a challenge: a shortage of medical conversation data suitable for training. Consequently, the scientists adopted a two-phase strategy. In the first phase, they fine-tuned the LLM base model on real datasets, such as electronic health records and transcribed medical conversations.

In the second phase, to further train the model, the researchers had the LLM play the role of a patient with a specific ailment and that of an empathetic doctor. The algorithm also assumed the role of a critical colleague, evaluating the doctor's interaction with the patient and providing feedback for improvement.
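For readers curious how such a self-play setup might look in practice, the sketch below illustrates the general idea in Python. It is a conceptual illustration based only on the description above, not the actual AMIE training code; the `llm` function, the prompt wording, and the loop structure are assumptions.

```python
# Conceptual sketch only: a minimal self-play loop of the kind described
# above, NOT the actual AMIE pipeline. `llm` is a hypothetical placeholder
# for a call to a large language model.

def llm(prompt: str) -> str:
    """Placeholder for a language model call; swap in a real API here."""
    raise NotImplementedError

def simulate_consultation(ailment: str, turns: int = 6) -> list[str]:
    """Generate a synthetic doctor-patient dialogue for one ailment."""
    dialogue: list[str] = []
    for _ in range(turns):
        # The model plays the patient with the given ailment.
        patient_msg = llm(
            f"You are a patient with {ailment}. Continue this consultation:\n"
            + "\n".join(dialogue)
        )
        dialogue.append(f"Patient: {patient_msg}")
        # The same model plays an empathetic doctor taking a history.
        doctor_msg = llm(
            "You are an empathetic doctor taking a history. Continue:\n"
            + "\n".join(dialogue)
        )
        dialogue.append(f"Doctor: {doctor_msg}")
    return dialogue

def critique(dialogue: list[str]) -> str:
    """A third role reviews the doctor's conduct and suggests improvements."""
    return llm(
        "You are a critical senior colleague. Evaluate the doctor's "
        "questioning, empathy, and diagnostic reasoning, and give feedback:\n"
        + "\n".join(dialogue)
    )

# Each (dialogue, feedback) pair could then be folded back into the
# fine-tuning data, so the model improves over successive rounds.
```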

Successful Pilot Study 

Moving into the testing phase, researchers enrolled 20 participants in a study — not actual patients but actors trained to simulate specific symptoms. These individuals engaged in online text-based consultations with AMIE and 20 general practitioners. Participants were unaware of whether they were chatting with a human or a bot and simulated 149 clinical scenarios as instructed by the scientists. Subsequently, they recorded their experiences, and experts evaluated the performance of AMIE and the doctors.

The AI system matched or surpassed the accuracy of doctors' diagnoses across all six examined medical specialties. In 24 out of 26 criteria for conversation quality, including politeness, symptom explanation, treatment, honesty, thoroughness, and engagement, AMIE outperformed human doctors.

Alan Karthikesalingam, clinical researcher at Google Health in London and coauthor of the study, acknowledged that participating general practitioners might not be accustomed to interacting with patients through text-based chat, potentially affecting their performance. He also noted that doctors may tire more quickly than a bot when providing lengthy, structured responses.

Challenges to Implementation

Following the successful pilot study, researchers plan more detailed studies to identify potential biases and ensure consistent results across different patient subgroups. The Google team is also starting to address ethical requirements for tests involving real patients.

The privacy of chatbot users is a crucial question that needs consideration. Daniel Ting from Duke-NUS Medical School in Singapore emphasized the need for transparency regarding data storage and analysis in commercial language model platforms.

Ensuring Quality Care 

While acknowledging that the chatbot is far from being deployed in clinical care, the authors argued that it eventually could play a role in democratizing healthcare. Adam Rodman, MD, instructor of medicine at Harvard Medical School in Boston, Massachusetts, emphasized that despite its utility, the tool should not replace interaction with doctors as medicine encompasses more than just information gathering — it revolves around human relationships. 

This article was translated from the Medscape German edition
