EPtalk by Dr. Jayne 9/19/24
The Journal of the American Medical Informatics Association recently published an article that looked at whether generative AI can create discharge summaries and appropriately assign diagnosis codes for the conditions that are addressed during a hospital stay.
For readers who might not be close to direct patient care in the inpatient setting, the discharge summary is a document that should be created at the time the patient leaves the hospital. It should contain information about why the patient was admitted, what happened during their hospital course, what treatments were administered, and their outcomes. It should also include plans for follow-up care. It helps the post-hospital team understand what has been going on and what they need to do next.
Some clinicians are incredibly diligent about creating these in a timely fashion, and the outpatient world appreciates their efforts. Others do it in a haphazard manner, ranging from versions that are timely but missing information to those that don’t get created until the medical staff office threatens to revoke someone’s hospital privileges if they don’t complete their overdue charts.
For patients with shorter and more straightforward hospital stays, such as uncomplicated orthopedic surgery or obstetric admissions, discharge summaries can be created quickly using templates, dictation, or virtual scribe services. For patients who have long and/or complicated hospital stays, creating a discharge summary can be challenging since it often involves digging through scads of daily notes from everyone involved in care – the admitting physician, consultants, nurses, social workers, therapists, and pharmacists. Especially when notes have had a lot of cut and paste, it can be mind-numbing to try to pull together a coherent summary that explains what actually happened during the hospital stay.
The JAMIA article looks at whether GPT-3.5 could be used to generate discharge summaries and assign ICD-10 diagnosis codes. Researchers used standardized patient data that included descriptions of patient conditions and procedures as well as history elements such as social and family history. The prompt limited the discharge summary to 4,000 words, which could be considered either long or short depending on the complexity of the hospital stay. Outputs were assessed for correctness, informativeness, authenticity of the hospital course, and acceptability of the document for clinical use.
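For readers who want a concrete sense of what that kind of setup might look like mechanically, here is a rough sketch of prompting an OpenAI chat model to draft a discharge summary with ICD-10 codes from structured patient data. This is not the authors' actual code; the model name, prompt wording, and the shape of the patient record are all illustrative assumptions on my part.

```python
# Illustrative sketch only -- not the JAMIA authors' code.
# Assumes the openai Python SDK (>=1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical standardized patient data, similar in spirit to what the
# study describes (conditions, procedures, social and family history).
patient_record = {
    "conditions": ["syncope", "facial fracture"],
    "procedures": ["CT head without contrast", "facial fracture repair"],
    "social_history": "Lives alone, retired teacher, nonsmoker.",
    "family_history": "Father with coronary artery disease.",
}

prompt = (
    "Using the structured data below, write a discharge summary of no more "
    "than 4,000 words covering reason for admission, hospital course, "
    "treatments and outcomes, and follow-up plans. Then list the ICD-10 "
    "codes for each condition addressed during the stay.\n\n"
    f"{patient_record}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the study used GPT-3.5; the exact model version is an assumption
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```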
Clinical evaluators who reviewed the generated documents found several problem areas. The tool struggled to eliminate unimportant information, such as noting a normal body mass index. It phrased diagnoses in an unnatural style and included vague phrases without supporting detail. It omitted details, such as the nature of a traumatic event when mentioning that something occurred following it. It introduced “spurious supporting information,” such as focusing on a patient’s anxiety diagnosis when they had a facial fracture following a fainting episode. Lastly, it failed to recognize the interconnected nature of diagnoses and to draw attention to critical diagnoses.
Having been on the receiving end of thousands of discharge summaries in my career, I’ve come to rely on them to present the highlight reel and to help me quickly get up to speed on a patient who might be coming to see me the same day or very soon. A good one reduces the need to go digging in the electronic health record to figure out what happened, but a bad one will make you want to tear your hair out.
The authors conclude that the GPT-created documents “showed correctness in individual codes, yet lacked naturalness and coherence compared to real data, resulting in lower informativeness, authenticity, and acceptability scores. Synthetic summaries failed to represent holistic patient narratives or prioritize critical diagnoses.” The take-home message is that it’s an interesting concept that is not ready for prime time.
I have to admit that some of the discussion in the article is beyond my expertise in the area of large language models. It sounds like the standardized data used might have been of reasonable quality. It would be interesting to see what kinds of summaries would be created from the more monstrous examples of patient documentation that I’ve seen over the years.
Clinicians are often in a hurry, managing multiple interruptions while trying to document, and may also be struggling with computer systems and stressed-out care teams. Notes may be dictated but not reviewed or edited, adding a level of junkiness to the garbage in/garbage out flow that we’ve all experienced. It would be interesting to see what is created when using real-world data rather than standardized examples. The authors mention this as a way to add in-context support for the generation process as well. They also note the possibility that asking the system to organize diagnoses chronologically may help add context.
I would be interested to hear what others who are deeper into the LLM world than I am might think about the article, or what other promising work might be on the horizon. If you’re doing that kind of work, and are interested in sharing your impressions, let me know.
This year, the medical school I attended encouraged alumni to contribute a “white coat note” to be placed in the pocket of an incoming first-year medical student. During their orientation phase, new students write a class oath, receive monogrammed white coats with the school’s crest, receive their stethoscopes, and experience significantly more pomp and circumstance than we did when I started medical school. We had to buy our own stethoscopes when we got to second year, buy our own plain white coats when we got to third year – no monograms allowed and definitely no institutional logo – and were basically thrown straight into hours and hours of lectures each day with no hope of any patient interaction in sight.
I have to say I’m a little jealous of some of the experiences that today’s students have compared to what we did (advanced clinical simulators, anyone?). I wonder if there’s a way to quantify how these changes impact student education.
I asked Google Gemini to give me a picture of a white coat ceremony for reference, which it declined to do because I asked for people. However, it was happy to give me some cute animals in white coats instead.
I like the idea of giving people encouraging notes, even if they are generic. Maybe a few weeks or months down the line, one will help a student hang in there when they might otherwise be ready to give up. Maybe we should consider a similar approach in the workplace with inspiring welcome notes.
What would you write to a new person joining your company? Would you paint a rosy picture or offer specific advice? Leave a comment or email me.
Email Dr. Jayne.