
Curbside Consult with Dr. Jayne 3/23/26

March 23, 2026 Dr. Jayne

I’ve been playing catch-up this weekend on journal articles, continuing medical education requirements, and maintenance of certification activities. It’s not exactly what I would describe as a good time, but it seemed like the thing to do since I’m approaching deadlines on some of it.

From the journal stack, I was most taken with this article from the Journal of the American Medical Informatics Association that summarized a randomized crossover clinical trial that looked at the impact of two ambient scribe solutions on physician burnout.

The authors are from Duke University, its medical school, and affiliated practices. It’s a safe bet that the research was performed there, although the study identifies the site only as a tertiary academic medical center in the southeastern US and describes the design as an open-label randomized crossover trial that involved 160 ambulatory clinicians.

The clinicians were randomized to two groups across two crossover periods. They were assessed on workflow satisfaction and efficiency measures, such as work outside of office hours and documentation time. Some participants were excluded, leaving the team to analyze survey results from 136 respondents.

They found notable improvements in satisfaction and note time for one of the products compared to the other. However, differences between the tools were not meaningful with respect to burnout scores or after-hours documentation.

Each phase of the crossover lasted about a month, separated by a 10-day period during which users trained on the next tool while still using the current one.
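For readers who don’t see crossover designs often, here’s a minimal sketch of what that assignment logic might look like. The tool names and the 1:1 split are my own illustration; the paper doesn’t publish its randomization code.

```python
import random

# Placeholder tool names; the two commercial products aren't named here.
TOOL_A, TOOL_B = "ScribeA", "ScribeB"

def randomize_crossover(clinicians: list[str], seed: int = 2026) -> dict[str, tuple[str, str]]:
    """Randomly assign each clinician a two-period sequence: A-then-B or B-then-A."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = clinicians[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    sequences = {}
    for name in shuffled[:half]:
        sequences[name] = (TOOL_A, TOOL_B)  # period 1, period 2
    for name in shuffled[half:]:
        sequences[name] = (TOOL_B, TOOL_A)
    return sequences

# Example: the study's 160 ambulatory clinicians.
assignments = randomize_crossover([f"clinician_{i:03d}" for i in range(160)])
```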

Users received a baseline survey prior to the trial and a follow-up survey after each of the interventions. They were asked to use the ambient documentation solutions as much as possible. Those who showed low adoption were offered additional training or were asked if they wanted to withdraw from the trial.

The team based the sample size on the number of software licenses that were available. I wonder whether the vendors were aware that their products were part of this project, whether they would have provided additional licenses to enlarge the participant pool, and whether they were concerned about the trial at all.

Participants were selected on a roughly first-come, first-served basis, with the first 160 users who submitted the baseline survey being chosen. That may have biased the sample toward those who kept up with whatever method of communication the researchers used. It also would have favored those who were interested in adopting new technologies.

Participants were characterized by clinic time, gender, and prior experience with ambient documentation tools. Because the trial was open label, participants knew which tool they were using, which potentially introduced bias.

Five participants reported moderate safety concerns, such as challenges with speaker attribution, over-summarization, and omissions in the assessment and plan sections of the note. Concerns were more common in subspecialty notes, although the authors acknowledge that sample sizes in some specialties were small, which makes it less likely that the findings are representative of those specialties as a whole.

The authors also noted that the study period included holidays, which may have impacted documentation patterns. They suggest that a longer observation period with a larger user pool would be beneficial for future research.

The authors also wondered whether future studies would find a greater improvement in users who have a longer baseline documentation time. The early adopters who were selected for the study might already have been using efficiency strategies that the documentation tools wouldn’t have influenced. They also note that the lack of a true washout period, in which users would have gone without an AI-powered scribe between study periods, may have affected the results.

I would be interested to hear from readers who may have participated in the study as users, IT support team members, or authors. I’m happy to keep your comments anonymous.

I am also interested in which tools were used for the study. A quick search found that Duke is using Abridge in a number of locations, so I assume it was one of the players. I also found a couple of articles that describe how Duke researchers created a framework to evaluate AI-powered scribe tools. I didn’t find anything published after last summer, when researchers found that using such a framework could be challenging since human reviewers didn’t always agree on how to score the AI tool’s output. That led them to use LLMs to score the output of other LLMs, which is an interesting detail.
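The LLM-as-judge approach is simple to picture in code. Here’s a minimal sketch, assuming an OpenAI-style chat completions client; the model name, rubric, and scoring scale are illustrative and aren’t from the Duke framework.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative rubric; the Duke framework's actual criteria aren't reproduced here.
JUDGE_PROMPT = """You are grading an AI-generated clinical note against the visit transcript.
For each criterion (accuracy, completeness, no fabricated content), return a score
from 1 (poor) to 5 (excellent) with a one-sentence justification.

TRANSCRIPT:
{transcript}

NOTE:
{note}
"""

def judge_note(transcript: str, note: str, model: str = "gpt-4o") -> str:
    """Use one LLM to score a note produced by another LLM."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep grading as deterministic as the API allows
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(transcript=transcript, note=note)}],
    )
    return response.choices[0].message.content
```

Whether two runs of the judge agree with each other is, of course, the same inter-rater question that tripped up the human reviewers.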

One write-up of that work used a scribe tool that was developed in-house. It noted that the evaluation tool was able to find problems with AI scribes. The AI tools failed to detect nonsensical information that was included in the conversation 60% of the time. Sometimes the tools changed the nonsensical values to make sense, but failed to notify the user. The documentation tool identified nonsensical values only 4% of the time. Results like that illustrate the value of evaluating the performance of AI-powered scribes.
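An evaluation like that can be approximated with a simple injection test: seed the transcript with a value that can’t be true, then check whether the note repeats it, silently “fixes” it, or flags it. A rough sketch, with the nonsensical vital sign and the scribe call both hypothetical:

```python
import re

# A deliberately impossible vital sign seeded into the transcript.
INJECTED = "blood pressure of 900 over 600"

def classify_handling(note: str) -> str:
    """Classify how the generated note handled the injected nonsensical value."""
    if re.search(r"900\s*(/|over)\s*600", note, re.IGNORECASE):
        # The impossible value survived; did the note warn the clinician?
        flagged = re.search(r"implausible|verify|recheck|error", note, re.IGNORECASE)
        return "repeated and flagged" if flagged else "repeated without comment"
    return "silently changed or omitted"  # the value vanished or was 'corrected'

transcript = f"Patient mentions a home {INJECTED} last week."
# note = run_ambient_scribe(transcript)  # hypothetical call to the tool under test
note = "Home BP reported as 180/120; advised to verify cuff technique."
print(classify_handling(note))  # -> silently changed or omitted
```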

I worked with human scribes for years, and the quality varied. Most of our scribes were premedical students who were committed to doing a great job to earn positive letters of reference, and their work was excellent. However, others were not similarly motivated, such as scribes who hadn’t been admitted to medical school and stayed on the job while they figured out what they wanted to do with the rest of their lives.

The clinician who signs the chart is responsible for ensuring the accuracy of the scribe’s output, whether human or AI. I still see too many people who obviously aren’t proofreading their charts, although I have no way of knowing whether that phenomenon is worse with AI scribes than it was with human scribes, or even back in the days of dictation and transcription. Most of my physician colleagues agree that it’s only a matter of time before a significant legal judgment is entered against someone who failed to properly read or edit a note, regardless of how it was created.

If you’ve used multiple ambient documentation tools, what are your thoughts on the differences? Is one a clear standout? Leave a comment or email me.

Email Dr. Jayne.


