John Birkmeyer, MD is chief clinical officer of Sound Physicians of Tacoma, WA.
Tell me about yourself and the company.
I’m a general surgeon and a health services researcher by training. I spent most of my scholarly life focusing on the phenomenon of variation in surgical performance and outcomes.
I am chief clinical officer of Sound Physicians, which is a national physician practice focusing on hospital-based physician practices. I also serve on the advisory board for Caresyntax, which is a technology company that specializes in big data integration and offers a variety of tools for helping improve the performance of operating surgeons.
What causes surgical variation and how much does it affect outcomes?
If you think about it, there’s no reason to be surprised that surgeons would vary in their performance, skill, and ultimately outcomes any more than tennis players, golfers, or musicians. It’s a pretty fine skill. Surgeons just vary in the degree to which they ultimately master it.
If you look at the scientific literature, depending on what procedure and what specialty you’re talking about, there is, give or take, a three- to five-fold spread in surgeon outcomes and costs. At the end of the day, that has enormous implications for both public health and healthcare costs, particularly as you consider that 40 or 50 million surgical procedures get done in the US alone every year. There’s a very deep and complex body of research that aims to understand what drives observed variation in surgeon outcomes.
Part of it, depending on the procedure, is driven by environmental factors and attributes of the hospital at which a surgeon is practicing. Certainly there are aspects of the team, such as the skill and competence of anesthesia and critical care, that ultimately drive how well a surgeon’s patients do. However, my own work, as well as that of others, has shown that a lot of that variation is driven by the intrinsic ability of the operating surgeon. While technical skill and proficiency aren’t the only surgeon attributes that vary, they’re the most important and the most obvious.
My hospital experience is that surgeons are fiercely autonomous and aren’t all that interested in having others get involved in their work. How much of the issue of variation is based on surgeon psychology?
There’s no doubt that there’s a stereotype associated with surgeons, which is partly true and partly reinforced by how important surgeons are to the economics and to the smooth running of any hospital. I think part of what you’re describing about surgeons is something that is not specific to surgeons, but it’s a paradigm that applies to all physicians. There’s this general assumption that if you’re smart and if you do four, five, or up to seven years of post-medical school training, then you’re good to go. You’re at the flat part of the curve with regard to your abilities and your mastery of the craft.
Given how complex surgery is, and even given the scientific literature, it’s clear that surgeons continue on the learning curve for many, many years after they finish their training. My belief is that surgeons could be so much better than they are if they adopted a philosophy of deliberate practice and continuous learning and if they increasingly started to harness some of the empirical tools that are being brought to bear in many other disciplines.
Your video study of procedures found that some surgeons have easily observed poor technique, yet no surgeon thinks they are a less-than-average performer. How much of the surgical process is based on defensible, concrete standards?
Perhaps it’s not a surprise, given the stereotype associated with surgeons, that most surgeons think they’re above average. There’s no doubt that part of what made my own research feasible was the willingness of surgeons to supply videos of themselves operating, probably under the assumption that their peers could learn from watching them. We all know that in any sample, half of all the members will be average or below average.
The things that surprised me about that particular study in The New England Journal of Medicine were, number one, just how stark the differences were in both technique and skill. Number two, it was amazing to me just how immediately obvious those variations in skill were. Not just to professional observers — surgeons watching each other operate — but if you show those 20 videos to lay observers who don’t know anything about surgery, they can almost just as easily segregate the best from the worst. In fact, there’s great research that’s recently been published showing that crowdsourcing by lay observers gets you basically to the same ratings as professional ratings by surgeon peers. Finally, I was really shocked by just how powerfully related surgeon skill was to various outcomes that are relevant either to patient outcomes or to cost.
As I watched all of those videos, as somebody who’s himself a practicing bariatric surgeon, there was not a single surgeon whose technique was outside of the standard of care. Nobody was violating accepted professional standards for how to do that procedure. It just speaks to the fact that our standards are fairly loosey-goosey, to the extent that we have a very imprecise estimate of what’s optimal technique and what’s not. It also speaks to the fact that it’s not so much the technique that a surgeon deploys as it is the fidelity or the precision in the skill by which that technique is deployed.
The surgeons who contributed their videos were self-selected, which probably means that you were not seeing the worst surgeons in the US. Beyond observing voluntarily donated videos, what data elements or analysis would allow assessment of all surgeons?
You’re absolutely right that my study involved a self-selected group of surgeons. But it was also a group of surgeons that had the luxury of being able to choose their best case. Nobody sent me videotapes of cases gone sour. They basically sent me what they thought was typical and sometimes their best work. Imagine what it would look like if it was just a random sample of everybody in all cases.
I’m sure that, for many procedures, if you really did have the universe and the entire library of all of their cases, there would be a significant minority of surgeons about whom half of their peers would say, “This person should not be operating or should not be doing procedures as complex as this.”
The second part of your question was about what’s a scalable strategy for vetting and providing feedback to all surgeons, not just this highly selected group of volunteers. That’s what’s attractive to me about technology approaches. Such a high percentage of surgical procedures these days, particularly those that are most complex and are the highest stakes from the perspective of patients, are done videoscopically, which means that there’s a real-time video recording of what’s going on in the surgical field and at the tips of the surgeon’s instruments.
What’s really exciting to me is to leverage all of that rich data infrastructure and convert the real-time video information to digital, empirical information that gives surgeons real-time feedback about how they’re doing relative to techniques and maneuvers that ultimately lead to the best outcomes. Google and Uber may ultimately get us to a self-driving car (with all of the externalities and all of the craziness that has to be accounted for) whose technology can help the car or the driver make better decisions.
I don’t think it’s a huge stretch, given how reproducible certain types of procedures are, that machine learning based on digital video information could do the same thing for surgery: not only providing digital analysis and giving a surgeon a report card about how well he or she did with the case that just ended, but also giving real-time information that could help those procedures go better in the first place. That information could include the angle of attack, how much random motion there is, and the amount of force being applied either to the instrument or to the tissue. All of these things that we measured holistically and by human judgment in my study could, in my belief, very readily be replicated in a much more powerful way using data technology.
Every surgeon wants to do a good job, but nobody likes to judge or be judged by peers. Doctors are competitive enough to want their numbers to look good. Will the procedure data be acted on through self-policing or will hospitals need to get involved?
I think the answer is both. At the end of the day, there need to be more rigorous procedures for doing two things. Number one, identifying and policing that small subset of surgeons that really should not be operating, or at least should be operating with a less-complex scope of practice. Number two, finding ways to make all surgeons better. In other words, not just worrying about the bad apples on one tail of the distribution, but finding a way to shift that whole performance curve to the right and make everybody better via data-informed practice.
With regard to self-policing, there’s a whole bunch of discussion underway about the role of the American Board of Surgery and similar boards for using that as a part of board certification. Hospitals are increasingly insisting that new surgeons submit videotapes of themselves operating as part of their hospital credentialing process. Those are all fairly important but low-tech approaches to identifying that small number of surgeons who just are not ready for prime time.
What’s most exciting to me is how you make everybody better. Certainly there are practical and sociological barriers to making everybody better purely via a paradigm of person-to-person coaching. Not just because that’s expensive, because surgeon time is expensive, but also because a lot of surgeons just are reluctant to be taught or coached by their peers. They think they’re done and it’s an admission of inferiority to accept that kind of coaching when you’re well-established in your practice.
That’s what’s so appealing to me about the more anonymous, confidential, data-driven performance feedback that I believe is eminently feasible now with both robotic surgery and other types of videoscopic surgery. There still is a lot of work to be done in terms of exactly what that feedback would look like and how to get that feedback in real time to surgeons as they’re operating in a way that does not distract them from what they’re doing, but improves what they’re doing. I think it’s really exciting. I don’t think that it’s 15 years from now. I think we’re getting very close.
As an informaticist, could the expanded information about how a patient’s surgery was performed be connected to other existing data to look at whether the surgical technique contributed to patient outcomes?
If I were chunking this up into three informatics needs, all of which need to be present to some degree to get to the outcome that I was describing earlier, I’d say that number one is there need to be continued advances in how we collate, curate, and link very heterogeneous, very complicated sources of data that ultimately allow us to link empirical information from the procedure itself to the late outcomes of surgery. Most of those outcomes don’t occur in the operating room. They occur the next day or the next week or the next month. If you can’t link measurable aspects of skill in the procedure itself to outcomes later, you just simply don’t have all the data that you’d need for that system to learn.
Once that data platform is in place, there need to be both statistical and probably machine learning-based tools that allow you to identify a subset of high-leverage maneuvers or skills that the surgeon is deploying and to be able to measure them and link them to outcomes in the most parsimonious way.
Obviously there are a thousand potential micro processes that a sophisticated algorithm could pick up during the course of an operation. Machine learning could help us identify the most important four, five, or six levers and avoid information saturation with the surgeon by focusing on just a small number of levers to get better. It’s much the same way when you take a golf lesson. It’s generally a bad idea for the pro to tell you 14 different things that you should be doing differently on your golf swing. You typically make one or two changes at a time. I think there are some aspects of that muscle memory in operative surgery as well.
Finally, there is a technology need to not only identify what optimal practices are, but ultimately to get them in the hands of the surgeon in real time, allowing them to modify the course of the procedure as it is being performed. As I think about it, there are really two ways that could happen. One way is simply a dashboard in the corner that blinks red when something is sub-optimal and allows the surgeon to self-correct. The second option would be something akin to autopilot, whereby for certain parts of the procedure, you’re letting the technology take over and letting the surgeon guide it and override it exactly as if you’re flying a plane or you’re driving a self-driving car of the future.
What is the prevalence of robotically-assisted devices in the OR and how is that field progressing?
That field is progressing really, really fast. The vast majority of community hospitals, at least those with 100 or more beds, have at least one robot. At the hospital that I was most recently associated with before I joined Sound Physicians, there were four robots that were used virtually around the clock in thoracic surgery, general surgery, urology, and OB-Gyn. It’s really been staggering to see how quickly robotic surgery has started to take over many of the biggest surgical disciplines.
There’s lots of reasons why that is. While we’re collectively on this big learning curve, it also creates this huge opportunity for digital technology to not only make it feasible to conduct more operations through minimally invasive techniques, but also to create this new opportunity for us to do those procedures better than we had in the past.
What steps would you take if you were personally facing a significant surgery?
Unfortunately, surgical patients have very limited publicly available information on which to choose a surgeon. I’m hoping that that may change sometime in the future as a corollary to what we’ve been talking about.
Right now, if I needed some procedure, I would stick with the tried and true techniques for identifying best surgeons. The first is that for whatever type of procedure I need — particularly if it’s one that is complex and/or high-risk — I would learn which surgeon had the highest volumes and specialized in those types of procedures. Both volume and specialization are hugely correlated with better outcomes with most procedures.
Second, I would ask my primary care physician about the reputations of surgeons for the sub-specialties that attach to the procedure I needed. There’s scientific evidence showing that traditional things like the surgeon’s pedigree, in terms of medical school and training, are very poorly correlated with outcomes. Hospitals are small enough places that knowing a physician’s reputation is usually much better than not having that information at all. Even though it’s imperfect, it certainly will help you surface and avoid that small number of surgeons that are known to have poor skill or poor outcomes.