Can AI Replace Physicians?

Comparing Accuracy of AI Engines in Differential Diagnoses

by W Sullivan



Note: This is a long article because it includes output from several AI engines when asked to diagnose diseases based on symptoms provided. If you want to skip the AI output, scroll down to the section titled “Thoughts on Using AI in Healthcare” for a shorter read.


Background

In December 2023, Members of the European Parliament agreed upon the terms of the Artificial Intelligence Act. This Act bans certain applications of artificial intelligence, including social scoring, certain biometric categorization, internet scraping of facial images for facial recognition databases, and systems that manipulate human behavior or exploit vulnerabilities of certain classes of people.
At the same time, SAG-AFTRA voted to end a six-month strike by signing a new contract with studios that limited AI use in creating movie content. One major issue in negotiations was the studios’ use of AI to replace voice actors and to “populate screens with ‘synthetic fakes.’”

Artificial Intelligence is Threatening Jobs in Other Industries, but is Use of AI a Threat to the Medical Field?

Could AI pose a similar threat to the jobs of healthcare workers? AI is proving to be a game-changer in medical diagnostics and in improving some patient outcomes by quickly analyzing vast amounts of data in medical records and providing insights to health care professionals. In some cases, AI may even be able to aid in differential diagnosis, distinguishing between similar conditions based on subtle differences in symptoms, physical findings, or lab results.

A recent study in JAMA Pediatrics suggests that ChatGPT 4 is not up to the task, reporting that it rendered a correct diagnosis in only 39% of pediatric case challenges. By contrast, an Annals of Emergency Medicine study suggested that “ChatGPT Equals Doctors in Diagnosing Emergency Department Patients,” according to a US News and World Report article. Another study in the Journal of Medical Internet Research showed that ChatGPT had only a 60% accuracy rate in creating an initial differential diagnosis, while it achieved a 77% success rate in making a final diagnosis. These studies depended upon input of an accurate medical history, review of systems (ROS), and physical exam to create a differential diagnosis. I suspect that typing in just a few symptoms, or providing a rambling, confusing history and looking for a final diagnosis, would likely yield much lower accuracy.

“Dr. Google” has often been the go-to consultant for medical issues used by both patients and human physicians. However, Google is limited by its function. Google uses an algorithm to display web pages containing search terms based upon many factors including the authoritativeness of a web site, user engagement, links from other sites, keywords in title tags, in addition to multiple other variables. Search engines were never intended to provide comprehensive answers to complex medical questions. Patients regularly seek medical care in the emergency department when Google search results suggest that their symptoms could be due to serious underlying diseases.

As AI creeps into the search engine realm, companies are creating chatbots to augment and refine search engine queries. Elon Musk’s X has created a chatbot named “Grok” (https://grok.x.ai/) and even Google has created its own AI tool named Gemini (https://gemini.google.com/). Some AI solutions were created specifically to address medical diagnostic dilemmas. With these new tools, both physicians and patients can obtain a comprehensive differential diagnosis by typing in a few symptoms. Are the results good enough that machine learning can replace doctors?

Testing AI’s Ability to Provide a Differential Diagnosis Based on Symptoms

To better determine how far AI has come in the medical diagnostic process, I compared Dr. Google with four AI programs: ChatGPT 4o (lower-case letter, not the number “0”) (http://chatpt.com/?model=gpt-4o), Claude 3.5 (https://claude.ai), Isabel Healthcare (https://www.isabelhealthcare.com/), and Glass Health (https://glass.health/). ChatGPT and Claude are general AI chatbots while Isabel Healthcare and Glass Health provide healthcare-focused information. The terms of use for both Isabel Healthcare and Glass Health limit the platforms to use by a clinician or clinician-in-training and make it clear that the tools are not intended for use by the general public. The terms of service for both Isabel Healthcare and Glass Health also contain indemnification clauses, but that’s a topic for another article.

Testing AI’s Ability to Diagnose Endocarditis


As a first test, I used a case presentation from an old copy of the Internal Medicine Casebook I used many years ago in my internal medicine residency. I picked a case presentation for endocarditis. The input I entered into the various platforms was as follows:

A 27-year-old white man presents to the emergency room with a chief complaint of fevers, shaking chills, cough, and headache of 2 days’ duration. He denies nausea, vomiting, diarrhea, or dysuria. He smokes one pack of cigarettes per day, drinks 6 beers per day, and has recently started injecting cocaine into his skin. He has had no previous hospitalizations and has not undergone any surgical procedures. Physical examination shows temperature 39.0°C, BP 120/80, pulse 114, and respiratory rate of 18 per minute. He does have a global headache with no meningismus. No sinus tenderness. Eyes and conjunctiva are normal. Oral mucosa are moist. Normal dentition. Lungs have coarse rhonchi bilaterally. Heart has a soft systolic murmur at the left sternal border. Abdominal exam normal. Extremity findings are normal. Neurologic examination is normal. Laboratory values include WBC 18,000/mm3 (85% polymorphonuclear cells, 10% bands, and 5% lymphocytes); hematocrit 38%; and platelets 170,000/mm3. A chest x-ray study reveals bilateral nodular infiltrates. What is your differential diagnosis?

ChatGPT 4o’s Differential Diagnosis for Symptoms of Endocarditis

ChatGPT 4o came up with the following list of possibilities:

  1. Bacterial Pneumonia: The patient’s fever, chills, and cough are consistent with pneumonia. The bilateral nodular infiltrates seen on the chest x-ray could suggest a more atypical or severe form, potentially due to aspiration or hospital-acquired organisms.
  2. Septic Pulmonary Embolism: This is a consideration given the patient’s recent history of injecting cocaine into his skin. This practice can introduce bacteria into the bloodstream, leading to septic emboli that lodge in the lungs, causing nodular infiltrates and systemic symptoms like fever and chills.
  3. Infective Endocarditis: The presence of a systolic murmur raises the possibility of infective endocarditis, especially in an IV drug user. Endocarditis can cause septic emboli to the lungs, resulting in nodular infiltrates and systemic infection signs.
  4. Pulmonary Tuberculosis: Although less likely given the acute presentation, tuberculosis should be considered, especially in patients with risk factors such as substance abuse and potential immunocompromised status.
  5. Viral Pneumonia: Less likely given the elevated white blood cell count with a predominance of neutrophils, but still possible, particularly with certain viruses that can cause more severe pulmonary involvement.
  6. Fungal Infections: Particularly in the context of IV drug use, the patient could be at risk for fungal infections like aspergillosis, which can present with pulmonary nodules.
  7. Non-Infectious Causes: Less likely given the clinical presentation, but conditions such as vasculitis (e.g., granulomatosis with polyangiitis) or malignancy (e.g., lymphoma) could also present with similar symptoms and radiographic findings.

Given the patient’s history of injecting drugs, bacterial causes (pneumonia, septic pulmonary embolism, infective endocarditis) are high on the differential list. Blood cultures, sputum cultures, and possibly echocardiography should be part of the initial workup.

Claude 3.5’s Differential Diagnosis for Symptoms of Endocarditis

Claude 3.5 gave the following output:

  1. Septic Pulmonary Emboli (SPE). This is a top consideration given the patient’s history of intravenous drug use (IVDU), fever, respiratory symptoms, and bilateral nodular infiltrates on chest X-ray. SPE can occur due to infected thrombi from contaminated needles or endocarditis.
  2. Infective Endocarditis. The patient’s IVDU history, fever, and soft systolic murmur raise suspicion for endocarditis. This could be the source of septic emboli if present.
  3. Community-Acquired Pneumonia. The fever, cough, and bilateral infiltrates could indicate pneumonia. The patient’s smoking history increases his risk.
  4. HIV-related opportunistic infection. Given the patient’s high-risk behaviors (IVDU), HIV should be considered. Opportunistic infections like Pneumocystis jirovecii pneumonia could present similarly.
  5. Tuberculosis. Although less likely, TB should be considered given the patient’s risk factors and bilateral infiltrates.
  6. Fungal pneumonia. Such as histoplasmosis or coccidioidomycosis, depending on geographic location.
  7. Non-infectious causes including Vasculitis, Sarcoidosis, Malignancy (less likely given the acute onset).
  8. Influenza or other viral pneumonia.
  9. Septicemia from another source.
  10. Toxic effects of cocaine use. Cocaine can cause various pulmonary complications, including pulmonary hemorrhage or “crack lung.”

Further diagnostic steps would include blood cultures, sputum cultures, echocardiogram, and possibly CT scan of the chest to better characterize the nodular infiltrates. HIV testing and screening for other sexually transmitted infections would also be appropriate given the patient’s risk factors.

Glass Health’s Differential Diagnosis for Symptoms of Endocarditis

Most Likely Dx: The most likely diagnosis for this patient is Infective Endocarditis (IE). The patient’s history of intravenous drug use, presence of fever, and new murmur are suggestive of IE. The presence of bilateral nodular infiltrates on chest x-ray could be indicative of septic emboli, a common complication of IE. The leukocytosis could be a response to the infection. The presence of positive blood cultures and vegetations on echocardiography could further suggest this diagnosis.
Expanded DDx:

  1. Pneumonia: The patient’s symptoms of fever, chills, and cough along with the finding of bilateral nodular infiltrates on chest x-ray could be indicative of pneumonia. The patient’s history of smoking and recent drug use could have predisposed him to a respiratory infection. The presence of a productive cough with purulent sputum and a positive sputum culture could further suggest this diagnosis.
  2. Pulmonary Tuberculosis (TB): Given the patient’s history of substance abuse and the finding of bilateral nodular infiltrates on chest x-ray, TB could be a possibility. The presence of night sweats, weight loss, and a positive sputum smear for acid-fast bacilli could further suggest this diagnosis.
    Alternative DDx:
  3. Disseminated Gonococcal Infection: This could explain the patient’s symptoms of fever, chills, and headache. The presence of skin lesions, arthralgia, and a positive culture for Neisseria gonorrhoeae could further suggest this diagnosis.
  4. HIV Infection: Given the patient’s history of intravenous drug use, HIV infection could be a possibility. The presence of oral thrush, lymphadenopathy, and a positive HIV test could further suggest this diagnosis.
  5. Sepsis: The patient’s symptoms and leukocytosis could be indicative of sepsis, possibly secondary to a skin infection from injecting drugs. The presence of hypotension, altered mental status, and positive blood cultures could further suggest this diagnosis.
  6. Meningitis: The patient’s symptoms of fever, headache, and the absence of meningismus do not rule out meningitis. The presence of neck stiffness, photophobia, and a positive lumbar puncture could further suggest this diagnosis.
  7. Lung Abscess: Given the patient’s history of substance abuse and the finding of bilateral nodular infiltrates on chest x-ray, a lung abscess could be a possibility. The presence of foul-smelling sputum and a cavitary lesion on chest imaging could further suggest this diagnosis.
  8. Drug-induced Lung Disease: The patient’s recent history of cocaine injection could have led to a drug-induced lung disease. The presence of dyspnea, hypoxia, and diffuse infiltrates on chest imaging could further suggest this diagnosis.
  9. Bacterial Meningitis: The patient’s symptoms of fever, headache, and the absence of meningismus do not rule out bacterial meningitis. The presence of neck stiffness, photophobia, and a positive lumbar puncture could further suggest this diagnosis.
  10. Septic Emboli: The patient’s symptoms and the finding of bilateral nodular infiltrates on chest x-ray could be indicative of septic emboli, possibly secondary to infective endocarditis. The presence of positive blood cultures and vegetations on echocardiography could further suggest this diagnosis.

Glass Health also allowed users the option to generate a separate diagnostic and treatment plan based upon the clinical findings.

Isabel Healthcare’s Differential Diagnosis for Symptoms of Endocarditis

Isabel Healthcare didn’t allow long form input, so I had to manually enter each positive symptom. For the entry “fevers, shaking chills, cough, injection drug use, fever, tachycardia, rhonchi, systolic heart murmur, headache” Isabel gave me a total of 39 potential diagnoses (in order of likelihood). Endocarditis was 29th on the list:
COVID-19, Swine Influenza, Pneumococcal Pneumonia, Relapsing Fever, Sepsis and Shock, Common Cold/Nasopharyngitis, Asthma, Coccidioidomycosis, Bronchogenic Carcinoma, Non-Hodgkin Lymphoma, Viral Hepatitis, Hepatitis A, HIV/AIDS, Pulmonary Embolism and Infarction, Mycoplasma Pneumonia, Hypersensitivity Pneumonitis, Viral Pneumonia, Aspergillosis, Aspiration Pneumonia, Nitrous Oxide Toxicity, Meningococcal Disease, Otogenic Brain Abscess, Human Granulocytic Anaplasmosis, Leptospirosis, Necrotizing Fasciitis, Brucellosis, Relapsing Polychondritis, Eosinophilic Lung Disease, Babesiosis, Endocarditis, Lyme Disease, Viral Meningoencephalitis, Pulmonary Tuberculosis, MRSA, Spinal Epidural Abscess, Melioidosis, Sarcoidosis, Typhoid, Acute Mountain Sickness, Cardiac Myxoma

Google’s Differential Diagnosis for Symptoms of Endocarditis

Dr. Google was confused by the long form question. Although it did perform a search for the terms, it noted that search queries are limited to 32 words, so nothing after the patient’s “6 beers per day” was included in the query. The resulting search came up with 37,100 results. The first result was a case study about a 32-year-old with diverticulitis. Ironically, the first page of search results included a digital copy of the Internal Medicine Casebook from which I originally obtained the patient scenario, which obviously contained the correct diagnosis.
When I entered the symptoms I provided to Isabel, Google came up with 11,800 results. These included multiple test questions, clinical vignettes, treatment guidelines, and case discussions, but did not provide a clear answer. Determining whether the listed pages were pertinent to the question required clicking each link and reading the page content. The search page also had a “Related health conditions” box including information on substance intoxication, embolism, high blood pressure, intoxication, and rheumatic fever.
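
One way to put the endocarditis results on a common footing is to look at where the correct diagnosis landed in each tool's list. The sketch below is only an illustration, not a validated scoring method: the positions are the ones reported in this article, the function names `reciprocal_rank` and `in_top_k` are hypothetical, and treating Google as "no ranked answer" is an assumption based on the fact that it returned links rather than a differential.

```python
from typing import Optional

# Position of endocarditis in each tool's output for the case above,
# as reported in this article. None means no ranked diagnosis was returned
# (an assumption for Google, which produced links rather than a list).
endocarditis_rank: dict[str, Optional[int]] = {
    "Glass Health": 1,        # named as "Most Likely Dx"
    "Claude 3.5": 2,
    "ChatGPT 4o": 3,
    "Isabel Healthcare": 29,
    "Google": None,
}


def reciprocal_rank(rank: Optional[int]) -> float:
    """Return 1/rank, or 0.0 when the diagnosis was not listed at all."""
    return 0.0 if rank is None else 1.0 / rank


def in_top_k(rank: Optional[int], k: int = 5) -> bool:
    """True when the diagnosis appears within the first k suggestions."""
    return rank is not None and rank <= k


for tool, rank in endocarditis_rank.items():
    print(f"{tool:18s} RR={reciprocal_rank(rank):.3f} top5={in_top_k(rank)}")
```

A single case is far too little to rank the tools, but the same bookkeeping could be repeated over many case presentations to get a rough sense of how often each tool puts the right answer near the top.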

Testing AI’s Ability to Diagnose COVID

When I provided the AI programs and Google with a second query using a list of more generalized symptoms without any physical exam findings, the differential diagnosis was quite broad. For the second query, I entered typical symptoms of COVID: “fever, chills, cough, pharyngitis, headache, myalgias, nausea.”

Google’s Differential Diagnosis for Symptoms of COVID

Google directed me toward influenza and highlighted other “related health conditions” such as the common cold, strep throat, meningitis, and tonsillitis. Clicking on articles that listed the most common causes of “sore throat and headache” gave a differential diagnosis of COVID, influenza, strep throat, allergies, bacterial meningitis, infectious mononucleosis, peritonsillar abscess, throat cancer, Lemierre’s syndrome and early HIV.

Isabel Healthcare’s Differential Diagnosis for Symptoms of COVID

Isabel Healthcare came up with a long list of differential diagnoses including Infectious Mononucleosis, Coronavirus, Babesiosis, Influenza, Relapsing Fever, Lyme Disease, Human Granulocytic Anaplasmosis, Rocky Mountain Spotted Fever, Viral Meningoencephalitis, Dengue Fever, Viral Hepatitis and other infectious causes such as meningococcal disease, toxoplasmosis, leptospirosis, Lemierre’s Syndrome, intracranial abscess, HIV, and Kikuchi Disease. Clicking on each disease would open up a corresponding entry in Wolters Kluwer’s 5MinuteConsult, the MSD Manual, or other online resources. When I clicked on Kikuchi Disease, I was brought to a MSD Manual page for “Lymphadenitis.” Ironically, I had to do a Google Search to learn more about Kikuchi Disease.

Glass Health’s Differential Diagnosis for Symptoms of COVID

Glass Health was more finicky, requiring patient age and gender before proceeding. When I added that the patient was a 35-year-old male, the differential diagnosis module initially froze, then created a paragraph stating that there was no information provided regarding patient demographics, past medical history, medications, onset and tempo of illness, associated signs and symptoms, laboratory data, imaging data, etc. It then froze again for a couple of minutes before generating a differential diagnosis including infectious mononucleosis, streptococcal pharyngitis, acute HIV, bacterial meningitis, viral hepatitis, bacterial sinusitis, Lyme disease, Dengue fever, leptospirosis, Q fever, typhoid fever, and Rocky Mountain spotted fever. Glass Health also recommended testing to help confirm each of the listed differentials.

ChatGPT 4o’s Differential Diagnosis for Symptoms of COVID

Infectious Causes: Influenza (classic presentation with sudden onset of fever, chills, myalgias, headache, cough, and sore throat), COVID-19 (symptoms overlap significantly with influenza, including fever, cough, headache, sore throat, myalgias, and nausea), viral URI, streptococcal pharyngitis, mononucleosis, bacterial pneumonia, mycoplasma pneumonia, acute HIV Infection, sinusitis.

Non-Infectious Causes: Seasonal Allergies (while less likely to cause fever, they can cause sore throat, headache, cough, and myalgias), autoimmune conditions (e.g., lupus, rheumatoid arthritis), medication side effects.

When I asked what drugs could cause these symptoms, ChatGPT gave me a list of 20 medications, although the likelihood of many of the listed medications (fluoroquinolones, ibuprofen, acyclovir, SSRIs, etc.) causing the symptoms was quite low. On the other hand, I learned that hydralazine can cause fever in up to 8% of patients.

ChatGPT also suggested physical findings to look for, potential laboratory testing, and other testing to perform, stating that “Given the overlapping symptoms, it is important to use the clinical context, patient history, and targeted diagnostic tests to narrow down the diagnosis.”

Claude 3.5’s Differential Diagnosis for Symptoms of COVID

Claude 3.5 gave a “concise differential diagnosis” including influenza (flu), COVID-19, streptococcal pharyngitis (strep throat), pneumonia, infectious mononucleosis (mono), common cold (less likely due to severity of symptoms), acute bronchitis, mycoplasma pneumonia, Legionnaires’ disease, and early stages of meningitis.

Thoughts on Using AI in Healthcare

Google and each of the AI programs have their pros and cons. “Dr. Google” is free, but was clearly outperformed by the other programs. Using Google required clicking on links and reading through random web pages in an attempt to gain necessary information. Often the articles were less than helpful or contained only a cursory mention or differential diagnosis of symptoms. However, Google is intended as a search engine, not as a differential diagnosis program. Top Google pages for a given search are not necessarily the most likely diagnoses, but instead simply have Google’s optimum density of keywords contained on the linked pages.

Each of the other programs compiled pertinent results in an easy-to-read format. Isabel Healthcare was geared more towards providing a differential diagnosis with input of symptoms, but getting information to help narrow down the differential diagnosis was cumbersome and often amounted to a glorified Google search. Also, Isabel seemed to create too wide a differential diagnosis. That wide net may be useful when “zebra hunting,” but I think it makes things more difficult for the average doctor trying to find a quick differential diagnosis in the everyday practice of medicine.

Glass Health provided a comprehensive differential diagnosis, potential alternative diagnoses, recommended diagnostic workup, and a treatment plan based upon the most likely diagnosis. It also included a bibliography with links to medical articles relating to the diagnosis, plan, and treatment. However, in each case the articles were only tangentially related to the suggested diagnoses. For example, when I did another query using symptoms of Addison’s disease, none of the linked articles actually discussed Addison’s disease itself and most had broad titles such as “What is anemia,” “Acute diarrhea,” and “Practical selection of antiemetics” (with a publication date of 2004).

The general AI chatbots provided comprehensive results in an easy-to-read format. Both Claude and ChatGPT provided excellent differential diagnoses based on the symptoms provided. Both programs also explained why they included certain diagnoses in the differential, although I preferred Claude’s reasoning over ChatGPT’s. Both platforms also suggested studies that could be performed to narrow down the differential diagnoses, but I preferred Claude’s output over ChatGPT’s in this regard as well. Unfortunately, one of the downsides of generative AI is that it is known to “hallucinate,” creating false statements and even making up false studies by nonexistent authors to support its false statements. Recall the attorney who had ChatGPT write a legal brief that he submitted to the court, only to later learn that six of the cases cited in the brief didn’t exist.

Ability to Use Longform Questions

A benefit of ChatGPT 4o, Claude 3.5, and Glass Health is that each accepts longform information input. One could literally cut and paste a patient’s history and physical exam into any of these programs and the programs would create a detailed differential diagnosis and treatment options. Google accepts longform information input, but only up to 32 words. Moreover, such a lengthy “search term” seems to only confuse the search engine as it engages in blind term matching, rather than analyzing the input. Isabel does not accept longform information input at all and requires that users enter single word or short phrase subjective findings, objective findings, and lab data to create a differential diagnosis.
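
To get a feel for how little of a clinical history survives a 32-word cap, here is a minimal sketch. It assumes words are counted by splitting on whitespace, which may not be how Google counts them, so the exact cut-off it produces may differ from what the search engine actually applied. The function name `truncate_query` is hypothetical.

```python
def truncate_query(text: str, max_words: int = 32) -> str:
    """Keep only the first max_words whitespace-separated words."""
    words = text.split()
    return " ".join(words[:max_words])


# Opening sentences of the endocarditis case used earlier in this article.
case = (
    "A 27-year-old white man presents to the emergency room with a chief "
    "complaint of fevers, shaking chills, cough, and headache of 2 days' "
    "duration. He denies nausea, vomiting, diarrhea, or dysuria. He smokes "
    "one pack of cigarettes per day, drinks 6 beers per day, and has recently "
    "started injecting cocaine into his skin."
)

query = truncate_query(case)
print(len(query.split()))  # number of words that survive the cap
print(query)               # the portion a capped search would see
```

However the words are counted, the physical exam, lab values, and chest x-ray findings all fall well past the limit, so a capped search never sees the details that point toward endocarditis.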

Using AI for Medical Image Analysis

By uploading medical images, such as X-rays, MRIs, or CT scans, AI algorithms can analyze the images and provide insights to aid physicians in their decision-making. For example, AI can help identify specific abnormalities or patterns that are indicative of certain conditions or diseases. This can save time for physicians and ensure a more accurate diagnosis.

Claude and ChatGPT both allow upload of images for analysis. I didn’t include image analysis in the comparison since not all programs allow image upload. AI-powered image analysis may be able to assist in detecting early signs of diseases such as lung cancer or breast cancers that may not be easily visible to the human eye, leading to more effective treatments and improved patient outcomes.

Unfortunately, based on my few uploads, AI image classification and interpretation doesn’t appear trustworthy enough to assist medical professionals in making clinical decisions. I couldn’t get the upload function to work on ChatGPT, but Claude 3.5 was able to review a limited number of images. I gave Claude deidentified pictures from my teaching files (never published online before) of disseminated varicella in an HIV patient and of erythema nodosum. It nailed both diagnoses, including suggesting that the patient with varicella may have HIV.

However, when I uploaded a CT image of a large femoral artery aneurysm, Claude said that there were lesions in the liver (there weren’t), that there was splenomegaly (the spleen wasn’t visible on the scan), that there was ascites around the liver and the spleen (there wasn’t) and that the left kidney had an abnormal texture/density (the left kidney wasn’t visible on the scan either). I assume that AI was trying to interpret coronal views of the abdomen using its training on axial images. Unfortunately, rather than acknowledging that it could not interpret the image, AI created multiple abnormalities that were not present. AI’s hallucinations definitely won’t replace experienced radiologists any time in the near future.


My Rankings of Programs for Providing Differential Diagnosis of Symptoms

  1. Glass Health is my pick for creating the best differential diagnoses and the most accurate treatment plans.
  2. Claude 3.5. I favored Claude’s output just a little more than ChatGPT’s. I was also able to get the image upload button to work and was impressed with its ability to analyze skin disease images.
  3. ChatGPT 4o is just behind Claude, but the differences between the two are minor. ChatGPT 4o also allows the upload of images for analysis, but I couldn’t get it to work.
  4. Isabel Healthcare was in fourth place, but readers should note that Isabel was designed to create differential diagnoses, not to suggest testing or create treatment plans. I think that the list of potential diagnoses created by Isabel was inclusive, but sometimes too long.
  5. “Dr. Google.” Although Google excels at its intended purpose of finding information on the Internet, searches often include much extraneous information and Google lacks the ability to analyze and compile the information into a succinct format. Keep in mind, though, that even Google has developed an AI engine named “Gemini” (https://gemini.google.com/).

Conclusions

While AI has the potential to significantly enhance healthcare delivery, it won’t replace healthcare providers. Medicine involves far more than data analysis. Medical professionals must also be adept at human interaction, ask the right questions when taking a history, maintain situational awareness, consider socioeconomic issues, and demonstrate hands-on skills that even the most advanced healthcare AI system would have difficulty adequately addressing. AI lacks human touch and the human connection required to practice medicine. AI can’t show empathy. AI has limited contextual understanding. It cannot interpret nuances in a patient history or non-verbal cues.

Instead of being replaced by AI, medical providers will likely learn to rely on AI as a supportive tool for patient care to help digest large amounts of data into a quickly accessible format that will benefit both providers and patients.

Can AI replace physicians? I think our jobs are pretty safe … for now.
