Curbside Consult with Dr. Jayne 2/26/24
In the US, our love of technology often overtakes our trust in people’s knowledge and expertise. I encountered this on a regular basis in the urgent care setting, where patients demanded testing for conditions that were well suited to the use of clinical decision support rules. In other countries, clinical decision support rules are accepted – and even expected – as a way of helping patients avoid unnecessary testing and healthcare costs. Some of the most useful and well-validated CDS rules are those that estimate the probability of strep throat, the need for imaging after ankle injuries, and the risk of serious pediatric head injuries. However, testing has become a proxy for caring, and if physicians don’t order tests for patients with applicable conditions, those physicians are likely to wind up on the receiving end of low patient satisfaction scores or even hostile online reviews.
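For readers who haven’t reached for one of these rules lately, here is a minimal sketch of what a strep throat rule looks like when written down as logic. The scoring follows the commonly taught Centor/McIsaac criteria, but the thresholds and suggested actions are purely illustrative and not a substitute for the validated rule or local guidance.

```python
# Illustrative sketch of a strep throat decision rule (Centor/McIsaac-style scoring).
# For illustration only; not a clinical tool.

def mcisaac_score(age: int, tonsillar_exudate: bool, tender_anterior_nodes: bool,
                  fever_history: bool, cough_absent: bool) -> int:
    """Return a modified Centor (McIsaac) score for pharyngitis."""
    score = sum([tonsillar_exudate, tender_anterior_nodes, fever_history, cough_absent])
    if age < 15:
        score += 1
    elif age >= 45:
        score -= 1
    return score

def suggested_action(score: int) -> str:
    if score <= 1:
        return "No testing or antibiotics indicated"
    if score <= 3:
        return "Consider rapid strep testing"
    return "Consider testing and/or empiric treatment per local guidance"

print(suggested_action(mcisaac_score(age=30, tonsillar_exudate=True,
                                     tender_anterior_nodes=True,
                                     fever_history=False, cough_absent=True)))
```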
I had been thinking about this when I stumbled across a recent article in the Journal of the American Medical Informatics Association that looked at whether explainable artificial intelligence (XAI) could be used to optimize CDS. The authors looked at alerts generated in the EHR at Vanderbilt University Medical Center from January 2019 to December 2020. The goal was to develop machine learning models that could be applied to predict user behavior when those alerts surfaced. AI was used to generate both global and local explanations, and the authors compared those explanations to historical data for alert management. When suggestions were aligned with clinically correct responses, they were marked as helpful. Ultimately, they found that 9% of the alerts could have been eliminated.
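The paper’s pipeline is far more involved than anything that fits in a blog post, but the core idea can be sketched simply: train a model to predict whether a user will accept a given alert firing, then apply an explainability method to see which features push predictions toward non-acceptance. The sketch below uses SHAP and a gradient boosting classifier purely as stand-ins; the feature names and data are invented, and the study’s actual models and tooling may differ.

```python
# Hypothetical sketch: predict alert acceptance, then explain the predictions.
# Column names and data are invented; the study's actual features and methods differ.
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Toy alert-firing log: one row per firing, with a label for whether it was accepted.
firings = pd.DataFrame({
    "patient_age": [82, 45, 67, 90, 33, 71],
    "on_hospice": [1, 0, 0, 1, 0, 0],
    "days_since_last_test": [400, 30, 800, 900, 10, 365],
    "accepted": [0, 1, 1, 0, 1, 1],
})

X = firings.drop(columns="accepted")
y = firings["accepted"]

model = GradientBoostingClassifier().fit(X, y)

# Local explanations per firing (which can be aggregated into global ones):
# which features drive predictions toward "not accepted"?
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(pd.DataFrame(shap_values, columns=X.columns).round(2))
```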
In this case, the results of using XAI to generate suggestions to improve alert criteria were twofold. The process could be used to identify improvements that might be missed in a manual review, or that might take too long to find that way. The study also showed that using AI could improve quality by identifying situations where CDS was not accepted due to issues with workflow, training, and staffing. Digging deeper into the paper, the authors make some very important points. First, despite the focus of federal requirements on CDS, the alerts that are live in the field have low acceptance rates (in the neighborhood of 10%), which causes so-called “alert fatigue” and makes users more likely to ignore alerts even when they are highly important. Alerts also often fire at the wrong place on the care continuum – they cite the examples of a weight-loss alert firing during a resuscitation event and a cholesterol screening alert on a hospice patient.
They note that alerts are often built on limited facts – such as screening patients of a certain age who haven’t had a given test in a certain amount of time. While helpful in some situations, these alerts need to incorporate additional facts to be truly useful; for example, excluding hospice patients from cholesterol screening reminders. I’d personally note that expanding the criteria that underlie alerts would not only make them more useful, but would also avoid hurtful ones, such as boilerplate mammogram reminders sent to patients who have had mastectomies. I’ve written about this before, having personally received reminders that were not only unhelpful, but led to additional work on my part to ensure that my scheduled screenings had not been lost somewhere in the registration system. There’s also the element of emotional distress when patients receive unhelpful (and possibly hurtful) care reminders. Can you imagine how the family of a hospice patient feels when they receive a cholesterol screening message? They feel like their care team has no idea what is going on and that its members aren’t communicating with each other.
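As a concrete (and entirely hypothetical) illustration of what adding “additional facts” looks like in practice, an alert criterion is really just a predicate over the chart, and the refinement is an extra exclusion clause. The field names below are invented for illustration.

```python
# Hypothetical cholesterol screening alert criterion, before and after refinement.

def cholesterol_alert_naive(patient: dict) -> bool:
    # Limited facts: an age band plus time since the last test.
    return patient["age"] >= 40 and patient["days_since_lipid_panel"] > 5 * 365

def cholesterol_alert_refined(patient: dict) -> bool:
    # Same rule, with exclusions that keep it from firing where it can only do harm.
    if patient.get("on_hospice") or patient.get("comfort_care_only"):
        return False
    return cholesterol_alert_naive(patient)

hospice_patient = {"age": 78, "days_since_lipid_panel": 2000, "on_hospice": True}
print(cholesterol_alert_naive(hospice_patient))    # True  (fires inappropriately)
print(cholesterol_alert_refined(hospice_patient))  # False (suppressed)
```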
The authors also summarized previous research about how users respond to alerts, which can differ based on users’ training, experience, role, the complexity of the work they’re doing, and the presence of repetitive alerts. Bringing AI into play to help process the vast trove of EHR data around alerts and user behavior should theoretically be helpful, if it can successfully create recommendations for which alerts should be targeted. The authors prescreened alerts by excluding those that fired fewer than 100 times, as well as those that were accepted fewer than 10 times during the study period. They then categorized the remaining alerts by whether or not they were accepted, and went further to look at the features of the alerts that were not accepted, including patient age, diagnoses, lab results, and more, before beginning the XAI magic.
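That prescreening step translates almost directly into code. The thresholds below come from the paper as summarized above; the shape of the alert log is invented for illustration.

```python
# Sketch of the prescreening step: keep only alerts that fired at least 100 times
# and were accepted at least 10 times during the study period.
from collections import Counter

MIN_FIRINGS = 100
MIN_ACCEPTANCES = 10

def prescreen(alert_log):
    """alert_log: iterable of (alert_id, accepted) tuples, one per firing."""
    firings, acceptances = Counter(), Counter()
    for alert_id, accepted in alert_log:
        firings[alert_id] += 1
        if accepted:
            acceptances[alert_id] += 1
    return {a for a in firings
            if firings[a] >= MIN_FIRINGS and acceptances[a] >= MIN_ACCEPTANCES}
```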
Once suggestions were generated, they were evaluated against change logs that showed whether the alerts in question had been modified during the study period. They also interviewed stakeholders to understand whether proposed alert changes were helpful. The authors found that 76 of the suggestions matched (at least to some degree) changes that had already been made to the system, which is great for showing that the suggestions were valid. The stakeholder process yielded an additional 20 helpful suggestions. Together, those 96 suggestions were tied to 18 alerts; doing the math revealed that 9% could have been eliminated by incorporating the suggestions. For those interested in the specific alerts and suggestions made, they’re included in a table within the article.
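In code terms, that evaluation amounts to checking each model-generated suggestion against the alert change logs and routing the rest to stakeholder review: 76 matched the logs, 20 more were judged helpful by stakeholders, for 96 suggestions across 18 alerts. A hypothetical sketch, with invented identifiers and matching logic:

```python
# Hypothetical sketch of the evaluation step: does each suggestion correspond to a
# change already recorded in the alert change logs? Data structures are invented.

suggestions = [
    {"alert_id": "LIPID_SCREEN", "proposed_exclusion": "hospice"},
    {"alert_id": "WEIGHT_LOSS", "proposed_exclusion": "resuscitation_event"},
]

change_log = {
    ("LIPID_SCREEN", "hospice"): "2020-06-15",  # change already made during the study
}

for s in suggestions:
    key = (s["alert_id"], s["proposed_exclusion"])
    if key in change_log:
        print(f"{key}: matches change made {change_log[key]}")
    else:
        print(f"{key}: candidate for stakeholder review")
```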
In the Discussion section of the article, the authors address whether their work can be applied at other institutions. From a clinical standpoint, they address conditions and findings that are seen across the board. However, if an organization hasn’t yet built an alert around a given condition, there might not be anything to refine. They do note that the institution where the study was performed has had a robust alert review process in place for a number of years – a factor that might actually lead to underestimating the effectiveness of the XAI approach. For institutions that aren’t looking closely at alerts, there might be many more alerts that could be eliminated. The institution also has strong governance of its CDS technology, which isn’t the case everywhere. The authors also note that due to the nature of the study, its impact on patient outcomes and user behavior isn’t defined.
As with most studies, the authors conclude that more research is needed. In particular, the findings need to be explored at a number of organizations or by using a multi-center design. It would also be helpful for those responsible for maintaining CDS to have a user-friendly way to visualize the suggestions as the model generates them. It will be interesting to see whether the EHR vendors that already have alert management tools will embrace the idea of incorporating AI to make those tools better, or whether they’ll choose to leverage AI in other, more predictable ways.
Is your organization looking closely at alerts, and trying to minimize fatigue? Have users noticed a difference in their daily work? Leave a comment or email me.
Email Dr. Jayne.