Curbside Consult with Dr. Jayne 6/26/23
The clinical informatics community is buzzing with the news that ChatGPT was used to “pass” a simulated clinical informatics board exam. A recent article in the Journal of the American Medical Informatics Association describes the process used to evaluate the tool and goes further to question whether or not the general availability of AI tools is signaling the end of the “open book” maintenance of certification programs that many board certified physicians, including clinical informaticists, have come to enjoy.
Many of the medical specialty boards have moved to an ongoing maintenance of certification process, shifting away from the high-stakes exams that they used to require diplomates to take every seven to 10 years. My primary specialty board, the American Board of Family Medicine, began to pilot its maintenance of certification process in 2019. Since it had been a while since I practiced full-scope family medicine (which includes obstetrics), I was eager to try the new format, which delivered questions every quarter that could be answered using available resources such as textbooks, journal articles, or online references. This approach is a lot closer to how we actually practice medicine, which often means looking up answers when we can’t pull the information from memory. High-stakes exams such as the ones we used to take aren’t reflective of our ability to deliver good care, and such exams have been shown to negatively impact a variety of demographic groups.
The authors of the article tested ChatGPT 3.5 with more than 250 multiple-choice questions drawn from a well-known clinical informatics board review book. ChatGPT correctly answered 74% of the questions, which raises questions about whether it might be misused in the certification process. It was noted that ChatGPT performed differently across the various areas of the clinical informatics curriculum, doing the best on fundamental knowledge, leadership and professionalism, and data governance. It did the worst on improving care delivery and outcomes, although the differences across categories were not statistically significant. The authors hypothesize that ChatGPT does better in areas where the questions are recall-based as opposed to those that emphasize application and reasoning.
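For readers who like to see the mechanics, that kind of evaluation boils down to tallying right and wrong answers per curriculum area and then asking whether the spread across areas is bigger than chance alone would explain. A minimal sketch of that tally (this is not the authors’ code, and the handful of records and category labels below are made-up placeholders) might look like this in Python:

```python
from collections import defaultdict
from scipy.stats import chi2_contingency

# Each record: (exam category, model's answer, correct answer).
# These few rows are made-up placeholders, not data from the study.
results = [
    ("Fundamental knowledge", "B", "B"),
    ("Improving care delivery and outcomes", "C", "A"),
    ("Data governance", "D", "D"),
]

correct = defaultdict(int)
total = defaultdict(int)
for category, given, answer_key in results:
    total[category] += 1
    if given == answer_key:
        correct[category] += 1

# Per-category accuracy.
for category in total:
    pct = 100 * correct[category] / total[category]
    print(f"{category}: {correct[category]}/{total[category]} ({pct:.0f}%)")

# Chi-square test of independence: is the spread in accuracy across
# categories bigger than chance alone would explain?
observed = [[correct[c], total[c] - correct[c]] for c in total]
chi2, p_value, dof, _ = chi2_contingency(observed)
print(f"chi-square p = {p_value:.3f}")
```

On the real question set, the same tally would simply run over a few hundred records, one per exam question.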
They go on to propose that “since ChatGPT is able to answer multiple-choice questions accurately, permitting candidates to use artificial intelligence (AI) systems for exams will compromise the credibility and validity of at-home assessments and undermine public trust.” Based on some of the conversations I’ve had with patients over the last three years, I’m not sure patients are too impressed with the idea of board certification in the first place. It feels like some patients put more trust in what they see on TikTok and from various health influencers than in what I’ve learned over the last 25 years in family medicine. The phenomenon has definitely gotten worse since the COVID-19 pandemic turned healthcare delivery systems upside down.
The initial certification exams for specialties are still of the high-stakes format, and some specialties also require an oral examination. Those exams are proctored to ensure the integrity of the testing process. When I sat for the initial certification exam in Clinical Informatics nearly a decade ago, it was administered at a corporate testing center, and I took it alongside people taking the real estate licensing exam and other standardized tests. At least at the facility where I took it, I found the process to be nerve-wracking, since there was a lot of waiting around and dealing with proctors who were trying to apply different standards to the different types of test takers. For example, my particular exam protocol required me to turn out my pockets and prove that there was nothing in them, but others didn’t have to go through the same steps. It created a feeling of overall uncertainty, which got even worse when I needed a tissue for a runny nose during the exam and was treated like I was trying to cheat somehow. Needless to say, I was happy when the maintenance of certification approach was brought to both of my specialty certifications.
One of my colleagues asked why the use of ChatGPT would be a problem, since the process already allows the use of external resources to answer the questions. (Examinees are prohibited from speaking with other people, however.) The authors addressed this in the article, noting that the current process requires examinees “to process and assimilate the information found online to determine the correct answer to the exam questions” whereas “when using LLMs like ChatGPT, exam takers can simply manually enter or automatically scrape the question into the freely available web interface and be given an instantaneous result. This transaction requires no prior knowledge of theory or application and eliminates the need for reflection, reasoning, and understanding but can still result in a passing score.”
The authors do note some limitations of their study, including the fact that they drew all the questions used from a single board review book. That approach may not be representative of the full range of questions used or content delivered on the actual board certification exam. Additionally, ChatGPT couldn’t be used to address questions that contained images. They go on to say that given the situation, certification bodies need “to explore new approaches to evaluating and measuring mastery.” They suggest that testing may need to include more complicated or novel question types, or may need to include images or graphics that can’t be easily interpreted by current AI technologies. They do suggest that “in some situations, there may be a need to consider reverting to proctored, in-person exams,” although I think there would be a general revolt of diplomates if the board actually considered this approach.
It should be noted that the maintenance of certification process currently includes an honor code attestation, where diplomates certify that they’re following the rules on the use of reference materials and that they aren’t consulting other people for help with the questions. It would be easy enough to broaden that statement and ask diplomates to agree to avoid using AI assistants or other similar technologies when completing their maintenance of certification processes. Personally, I’m glad to be at a point in my career where I might only have to recertify each of my specialty boards one more time. I don’t envy those in earlier phases of their careers who will have to tiptoe through the veritable minefields that new technologies are creating.
What do you think about ongoing proficiency exams, whether for physicians or other industry professionals? Are they useful for demonstrating competency and ability or just a way for certification bodies to generate cash? Leave a comment or email me.
Email Dr. Jayne.
Maintenance of certification is definitely a money-making scheme. I sincerely doubt that it adds much, if anything, to keeping physicians up to date in a way that influences their actual practice. In my primary board, I have lifetime certification. I did have a subspecialty certification with my primary board, but gave that up because it required recertification at significant cost and with no value. For the clinical informatics certification, I’m keeping it up for now, but the questions for the longitudinal maintenance of certification do nothing to improve my skills or knowledge. I’m fine with taking a certain number of CME courses, but the rest should be discontinued along with other “innovations” in medical education like the ACGME “competencies”, which are another time sink of fine-grained obsessionality.
Retired (thankfully) FP after 40 years here, and I know all too well the sense of “only once more.” Board certification is indeed at least as much about cash as it is about proficiency. And in the primary care specialties, where research is more challenging due to funding and the range of topics amenable to study, the academic rewards of serving on the test committees probably play a supporting role in keeping things going. That’s before considering the need for hospital systems and third-party payers to have some sort of quality measure for providers.