Many leaders in healthcare and technology view artificial intelligence (AI) as a potential cure for a system plagued by administrative burden, extreme cost, and limited access to high-quality care. Hospital systems have run with this idea, using AI to anticipate epidemics, flag patients at high risk for certain health outcomes, and more. It is no surprise, then, that AI's next target is overuse. Could it hold the key to this age-old problem?
The issue of overuse
Overdiagnosis and overtreatment have troubled the healthcare system for years. Providing care that is ineffective, or even harmful, leads to physical injury, wasteful spending, and emotional distress for patients.
For example, certain types of cancer screening have the potential to prevent death from cancer (although most lack a proven impact on all-cause mortality). But because these tests are not perfect, they produce false positives at a high rate, which can lead to overtreatment and other harmful cascades of follow-up care. According to a UC Davis Health study, 50% of women receiving annual breast cancer screenings for ten years experience at least one false positive mammogram.
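As a rough illustration of how a figure like that can arise, assume each exam carries an independent per-exam false-positive rate; the chance of at least one false positive then compounds across rounds. The ~7% rate below is an assumption chosen to roughly reproduce the UC Davis figure, not a number taken from the study.

```python
# Back-of-the-envelope sketch: cumulative false-positive risk over repeated screens.
# Assumptions (not from the study): independent exams, ~7% false-positive rate per exam.
def cumulative_false_positive_prob(per_exam_rate: float, n_exams: int) -> float:
    """Probability of at least one false positive across n independent exams."""
    return 1 - (1 - per_exam_rate) ** n_exams

print(f"{cumulative_false_positive_prob(0.07, 10):.0%}")  # ~52%, close to the reported 50%
```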
But what if we could better target screening and reduce the rate of false positive results, ultimately improving the patient experience? That’s where AI comes in. The speed and accuracy of machine learning models trained on diverse data, coupled with the ability to recognize nuances in mammogram results, hold great promise.
How AI could reduce false positives
The use of AI in breast cancer care is expansive: it has been used to read and interpret mammograms, conduct cancer risk assessments, and identify nuanced features of breast tissue such as density. These applications are primarily achieved through an approach called radiomics. Based on the idea that a disease process has physical characteristics that cannot be captured by the human eye, radiomics relies on mathematical formulas to quantify the intensity, shape, and texture of an image.
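To make that concrete, here is a minimal sketch of the kind of quantities radiomics formalizes: a few first-order intensity statistics and a crude texture measure computed from a 2D image array. The feature names are illustrative, not a standardized implementation; real pipelines (for example, the open-source PyRadiomics package) compute hundreds of standardized intensity, shape, and texture features, and shape features additionally require a lesion segmentation mask.

```python
import numpy as np

def toy_radiomic_features(image: np.ndarray) -> dict:
    """Illustrative first-order intensity and texture features for a 2D grayscale image.

    A minimal sketch, not a standardized radiomics implementation;
    shape features would additionally require a segmentation mask.
    """
    pixels = image.astype(float).ravel()
    return {
        # Intensity: summary statistics of pixel brightness
        "mean_intensity": pixels.mean(),
        "intensity_std": pixels.std(),
        # Texture (crude): mean absolute difference between horizontal neighbors
        "neighbor_contrast": np.abs(np.diff(image.astype(float), axis=1)).mean(),
    }

# Example on a synthetic 64x64 "image"
rng = np.random.default_rng(0)
print(toy_radiomic_features(rng.integers(0, 256, size=(64, 64))))
```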
Here’s a glimpse into what some researchers have found on the potential for AI to target screening and reduce false positives:
- 250 mammograms conducted in 2013 were re-evaluated using AI-based computer-aided detection (CAD) software, reviewing marked regions of interest for sensitivity and specificity as well as false-positive marks per image (FPPI); a sketch of how these metrics are computed follows this list. The findings revealed an overall 69% reduction in FPPI using the AI-based CAD versus the traditional CAD.
- An AI system applied to 9,581 mammograms assigned each exam a cancer risk score on a scale of one to ten, with higher scores indicating a greater likelihood of malignancy. For exams scored between three and ten, the AI system showed the potential to eliminate nearly 2,000 exams and ten false positives.
- Houston Methodist Hospital created iBRISK, an AI-supported breast cancer risk assessment tool. iBRISK takes into account a patient's demographics and medical history before recommending future diagnostic testing. Targeting screening toward the highest-risk patients improves the likelihood that screening will benefit them.
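To make the metrics in the first study concrete, the sketch below shows how sensitivity and FPPI are conventionally computed from reader marks. The mark counts are invented for illustration; only the formulas and the 250-image figure come from the description above.

```python
# Hypothetical illustration of the metrics in the CAD study above.
# The mark counts below are invented; only the formulas reflect standard definitions.

def sensitivity(true_positive_marks: int, total_cancers: int) -> float:
    """Fraction of actual cancers that received at least one correct mark."""
    return true_positive_marks / total_cancers

def fppi(false_positive_marks: int, total_images: int) -> float:
    """False-positive marks per image: spurious marks divided by images read."""
    return false_positive_marks / total_images

images = 250                      # mammograms re-read in the study
traditional_fp, ai_fp = 230, 71   # invented mark counts for illustration
reduction = 1 - fppi(ai_fp, images) / fppi(traditional_fp, images)
print(f"FPPI reduction: {reduction:.0%}")  # ~69%, matching the reported figure
```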
How AI could make overuse worse
However, even if AI were to reduce false positives in breast cancer screening, it could still exacerbate overdiagnosis. Screening tools are often judged by how successfully they identify cancers, but it's hard to determine whether a detected cancer is one that would actually cause harm. In fact, pathologists often disagree among themselves about whether a small abnormality of cells constitutes a cancer diagnosis.
As Dr. Adewole Adamson and Dr. Gil Welch wrote in a 2019 NEJM Perspective, “Diagnoses of early-stage cancer made using machine-learning algorithms will undoubtedly be more consistent and more replicable than those based on human interpretation. But they won’t necessarily be closer to the truth — that is, algorithms may not be any better than humans at determining which tumors are destined to cause symptoms or death.” Instead, these AI tools may just be able to overdiagnose at a faster rate.
If AI detects more low-grade, non-invasive “cancers” like ductal carcinoma in situ (DCIS), that could lead to overtreatment. New research shows that outcome is likely. A 2023 study in The Lancet Digital Health examined how AI could be used to assist screening in a national breast cancer screening program to reduce workload for radiologists. In a randomized controlled trial, researchers found that screening supported by AI had a cancer detection rate similar to a standard reading by radiologists. However, AI-supported screening also detected nearly twice as many DCIS cases as standard screening did.
But it’s not all bad news! There are ways AI could help reduce overdiagnosis. Adamson and Welch suggest that AI algorithms could be trained to flag “gray area” cases where pathologists disagree about whether an abnormality is cancer, which could increase attention to diagnostic uncertainty, improve efficiency, and help researchers understand more about how cancers develop.
Researchers from the Australian Centre for Health Engagement, Evidence and Values suggest that autopsy studies could provide data to train AI to distinguish between fatal and non-fatal cancers, given that many older people die “with” cancer but not “because” of cancer. Data from trials of “watchful waiting” could also potentially be used to build AI algorithms that detect overdiagnosis.
Moving towards an equitable future
Putting AI to the Test, a virtual event hosted by Stat News and the Gordon and Betty Moore Foundation, featured AI experts Dr. Vincent Liu (Kaiser Permanente), Dr. Melissa McCradden (Hospital for Sick Children), and Dr. Ziad Obermeyer (UC Berkeley), who spoke about what's next for an ethical and equitable future of AI.
Ultimately, the panelists reminded us that AI is merely a tool to make doctors better at what they do. AI is not a one-size-fits-all technology; it must be tailored to the settings in which it is used. For breast cancer screening, this means prioritizing AI systems that perform not only well but also equitably, addressing the known racial and ethnic disparities in breast cancer outcomes. To avoid the common problem of perpetuating health disparities through AI, Dr. Obermeyer said, we should incorporate patient experiences and outcomes into algorithm development.
“Rather than training the AI to replicate human judgment, train the algorithm to replicate patient outcomes and longitudinal things that happen to that patient that are sources of truth.”
Dr. Ziad Obermeyer, “Putting AI to the Test”
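A minimal sketch of that idea: the same model can be trained against two different label sources, one replicating radiologists' calls and one anchored to longitudinal patient outcomes. Everything below is synthetic and the variables are illustrative placeholders, not a real pipeline.

```python
# Sketch of Obermeyer's point: identical features, two choices of training label.
# All data is synthetic; variable names are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))            # imaging/clinical features (synthetic)
y_radiologist = rng.integers(0, 2, 1000)  # labels replicating human judgment
y_outcome = rng.integers(0, 2, 1000)      # labels from longitudinal patient outcomes

# Training on human labels teaches the model to reproduce human calls,
# including human overdiagnosis; training on outcome labels anchors it
# to what actually happened to the patient.
model_human = LogisticRegression().fit(X, y_radiologist)
model_outcome = LogisticRegression().fit(X, y_outcome)
```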
Pew Research's finding that three-quarters of Americans worry physicians will adopt AI too quickly, without fully assessing its risks, underscores the importance of putting AI to the test before adopting it at scale. Grants – such as the one awarded to Rensselaer Polytechnic Institute (RPI) and Albany Medical College to study the use of AI in breast cancer treatment – and ongoing evaluation of screening and diagnostic tools ahead of their widespread adoption will be key to AI's success. Moving forward, we must address the issues at the core of overdiagnosis and overtreatment, including improper risk assessment, ill-informed screening referrals, and breakdowns in communication between patients and physicians. Applying AI to a broken system will only exacerbate the problem, not solve it.