Joe Petro, MSME, is CTO and EVP of research and development at Nuance of Burlington, MA.
Tell me about yourself and the company.
I got into health information technology about 15 years ago. I started with Eclipsys, on the executive team running R&D. After about three years at Eclipsys, I got a call from Nuance. They had a smallish healthcare division and they were looking to go much deeper. I joined as senior vice president of healthcare R&D. We’ve grown this business over time to about a billion dollars.
Nuance has two divisions. Two-thirds of the company is basically healthcare, while the remaining one-third is enterprise. About 18 months ago, I took over as a chief technology officer for everything. I do all of the products, the technology, and the research as well.
What progress has been made on ambient clinical intelligence and the exam room of the future as conceptually demonstrated at HIMSS19?
Ambient clinical intelligence is super exciting. Five or six years ago, Carl Dvorak at Epic was having a conversation with us and floated the notion of a room being able to listen. At the time, we didn’t necessarily have a tangible idea of how we were actually going to accomplish that. As conversational AI and other technologies developed, we started to get a firmer notion of what the exam room of the future could look like.
A lot has happened over the last 12 to 18 months. We have a number of clients now in beta, so we are learning from real feedback from real physicians. We have made a number of advances in terms of the state of the art and in terms of that summarized document that is produced by the conversation. From a tech point of view, that’s an intergalactic space travel problem in terms of how hard that problem actually is. We are jumping from a broad, basic, human-to-human interaction to a finely-tuned clinical document. From a tech point of view, we have advanced the state of the art.
We have also come up with the second generation of the ambient listening device that sits in the room. That second generation is being rolled out soon.
We definitely do not have a demand problem. Just about everybody in the industry has reached out to us, either as a potential partner or as a client. It’s a super exciting time.
A research article addressed the difficulty of turning an exam room conversation, especially in primary care where it might include social elements and cover multiple diagnoses, into clinical documentation. What are the technology challenges?
In the basic, inside-the-tech, Russian doll part of it (getting inside and inside and inside), you are layering together accuracy levels on the entire problem. The first thing you have to do is diarize the speech, separating the multiple speech streams in the room. It might not just be the physician and the patient speaking; it might be the physician, nurse, patient, and the patient’s family. There’s a signal processing and a signal enhancement problem associated with that, and that in and of itself has its own accuracy challenges. Then you have to turn that into text, and casual conversation is different from the more controlled clinical conversation.
We have 500,000 physicians on our Dragon Medical One product. That formal conversation has accuracy rates of something like 95, 96, or 97%. When it becomes more casual and conversational, it’s a different kind of a challenge because the text and the concepts aren’t necessarily well formed.
The next step is to extract facts and evidence, so you apply something like natural language processing, AI, and neural nets. You extract things like diagnoses and the active medication list. You try to associate things with the patient’s history versus the current issues that are going on with the patient.
Finally, you jump to the summarized document. That’s a big jump, because if a patient is talking about the fact that they hurt their back changing a tire, that may or may not end up in the clinical documentation at all. Based on the data we collect, we decide which things to include in the documentation and which shouldn’t be there.
The flow I just went through involves, from a Nuance point of view, the last 20 years of technology that we’ve developed. Each one of the problems alone is hard, but all the problems together are even harder.
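The flow described above (diarize, transcribe, extract facts, summarize) can be sketched as a toy pipeline. This is purely illustrative: the keyword table and function names are hypothetical stand-ins for the NLP and neural-net stages, not Nuance's actual system.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str  # e.g. "physician" or "patient", produced by diarization
    text: str     # produced by the speech-to-text stage

# Toy "fact extraction": keyword spotting stands in for the real
# NLP / neural-net stage that pulls out diagnoses and medications.
CLINICAL_TERMS = {"hypertension": "diagnosis", "lisinopril": "medication"}

def extract_facts(utterances):
    facts = []
    for u in utterances:
        for term, kind in CLINICAL_TERMS.items():
            if term in u.text.lower():
                facts.append((kind, term))
    return facts

def summarize(facts):
    # Only clinically relevant facts reach the note; small talk never
    # produces facts, so it is dropped automatically.
    return "\n".join(f"{kind}: {value}" for kind, value in sorted(set(facts)))

conversation = [
    Utterance("patient", "I hurt my back changing a tire last week."),  # small talk
    Utterance("physician", "Your hypertension looks stable on lisinopril."),
]
note = summarize(extract_facts(conversation))
print(note)
```

The point of the sketch is the filtering step he describes: the tire story never becomes a fact, so it never reaches the summarized document.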
Is the technical challenge of multiple voices and accents less of an issue than when systems needed to be trained on individual voices and users had to speak closely into a microphone?
With the introduction of artificial intelligence, a lot of things have yielded. But it’s not just the AI on the software side of things. Inside that device that hangs on the wall is a linear microphone array. There is something on the order of 17 microphones in there, lined up and separated by a small distance. When you think about the capability of each one of those microphones, think about a cone that is emitting into space from each one. The software and the signal enhancement technology behind the scenes, which is AI based as well, figures out who is in the room and who is actually talking. Then with voice biometrics, we can identify that person and keep a lock on them even if they’re moving around inside the room.
That’s one of the breakthroughs that we’ve brought to this space. We have been in multiple industries for a long time. This has been going on in the automobile industry, as an example, for quite a long time. We actually just spun out our auto business and that has had speaker diarization in it for quite some time, where you’re identifying the person in the driver’s seat versus the passenger versus the variety of children and family members who might be in the back seat. That problem was cracked some time ago and we brought that battle-hardened technology over to the healthcare space.
Wouldn’t there be easily harvested clinical value in simply capturing the full room conversation and storing it as text to support searching, either within a specific patient or across all patients?
Yes, for sure. When I’m talking to the executives here, the executives at EMR companies, or even physician or hospital execs, one of the things I always try to explain is that as we get deeper into this problem, opportunities are going to reveal themselves for augmenting present-day solutions with things that we learn during the ambient clinical intelligence process. We have already had discussions about making the transcript available.
There are pluses and minuses to this. You always have compliance issues and whether physicians and hospitals want this thing hanging around as part of the record. But I think we’ll get through that and figure that out with everybody. But for sure there are things that we’re going to introduce, such as making that conversation available, making the diarized speech available, making the facts and evidence that are intermediate results available. We are having these conversations in an ongoing way with all the electronic medical record vendors, just to figure out what intermediate artifacts we might be able to produce along the way that have high value.
It’s one of the things that makes this exciting because it’s almost like gold mining. You are constantly discovering these things that have tangible value and you can introduce them as part of the product offering.
The excitement over extraction of concepts and discrete data from voice in the room overshadowed the ability to control systems hands free. Is it widely accepted that voice-powered software commands could improve usability?
It’s a little lumpy, to be honest with you. From a Dragon point of view we’ve had what we call Command and Control, Select and Say, voice macros, and these types of things for quite some time. Now we’re evolving this to what we call conversational AI, which allows you to do what you just described in a more conversational way. You can say something like, “Dragon, show me the abnormal lab values,” or “Dragon, let’s pull up the latest imaging study,” or “Dragon, let’s send something to the nursing pool.” It’s more conversational and it could potentially be interactive.
Whereas in the old days, and actually in the present day for the most part, with Command and Control, you’re using a voice command to trigger some kind of a keyboard accelerator that might be available through one of the EMRs. You’re trying to execute a rigid macro that checks off a bunch of boxes. The rigidity of all that, and the brittleness of that, is evolving to something that’s quite flexible.
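The contrast between a rigid macro and a conversational command can be sketched as follows. The macro table, regex patterns, and intent names here are hypothetical illustrations, not Dragon's actual Command and Control implementation.

```python
import re

# Old style: a rigid macro table. The spoken phrase must match exactly,
# and the payload is just a keyboard accelerator sent to the EMR.
MACROS = {"show abnormal labs": "CTRL+ALT+L"}

def run_macro(utterance):
    return MACROS.get(utterance.lower())  # brittle: any rewording fails

# Conversational style: many phrasings map onto one intent.
INTENT_PATTERNS = [
    (re.compile(r"\b(show|pull up|display)\b.*\blab", re.I), "SHOW_LABS"),
    (re.compile(r"\b(show|pull up|display)\b.*\bimaging", re.I), "SHOW_IMAGING"),
]

def parse_intent(utterance):
    for pattern, intent in INTENT_PATTERNS:
        if pattern.search(utterance):
            return intent
    return None
```

The natural phrasing "Dragon, show me the abnormal lab values" fails the exact-match macro lookup but resolves cleanly to an intent, which is the flexibility being described.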
We’re at a tipping point now on the question you raise: is there wide recognition that this could be really good? A certain segment of the population, like the advanced users of Dragon, have always been using this and think of it as rote. They’ve been using it, see the power in it, and realize how it can affect their lives.
I think what’s going to happen now is that we’re going to get past that early adopter phase that we’ve been stuck in for quite some time. There will be broader acceptance the more natural that experience becomes. It’s breathtaking how natural conversational AI can be now. Again, we’re bringing over technology from our auto business and our enterprise business that has been doing this for huge companies for a very long time. All that conversational AI expertise is coming over.
You’re going to see some really big advances here. I’m personally super excited about what’s going to happen over the next couple of years in terms of what we call virtual agents. That’s a very exciting territory.
Will consumer acceptance of voice assistants make it easier to get EHR users to use something similar?
It lowers that barrier, where someone might feel awkward interacting with artificial intelligence and doing it on a day-to-day basis in a natural way. The more that speech becomes ubiquitous as a primary modality that folks interact with, either artificial intelligence or some kind of behind-the-scenes systems, the more the barrier is lowered for us.
I was at a physician’s office the other day and someone had their phone turned up to their mouth and was dictating. The insertion of punctuation into dictation is so unnatural and awkward, but it’s amazing that the person was just sitting there doing the dictation. That type of thing creates relief on our side because it doesn’t feel so awkward for the physician to do it. It also doesn’t feel awkward for the patient to observe the physician doing it. It lowers those artificial barriers that used to be there. I think you’re right — that does create a certain luxury for us.
How do you see speech recognition and synthesized speech being used for population health management?
It can come from both sides. Voice-enabled systems allow folks who are interacting with those systems to pull information out of them by telling the system what they want. You have a knowledge worker on one end, and then there's the patient side, the reporting, and the things that we could capture from a social point of view that could end up in systems like this. You’re going to see a lot of territory covered in terms of what is actually available to patients.
We’re going to have to address PHI and all of that stuff in terms of what ends up in these systems, how it ends up, and how the patient opts in. But once we get through that phase of it, you will see a lot more entry points that are voice controlled. They will be on both sides of it. You’re going to get the speech side, which is pushing things in in a natural way or trying to extract something with a natural expression of a query. Then you’re going to have the interactions from the patient’s side, which are also voice enabled, but it’s all going to be conversational AI based. You’re going to be talking to a system that asks you questions.
An example of that might be if you ask the system to query something and it’s an incomplete thought, the system can ask you using voice synthesis, what we call text-to-speech, for whatever it needs to complete the thought so that it can get the appropriate level of information. You’re going to see that all over the place. There's also a bunch of tech sitting around the periphery that will be involved.
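That incomplete-thought interaction is essentially slot filling. A minimal sketch, with a hypothetical intent and slot table; a real system would render the follow-up question through text-to-speech:

```python
# Hypothetical slot requirements for one intent. If the spoken query is
# missing a required slot, the system asks a follow-up question instead
# of running an incomplete query.
REQUIRED_SLOTS = {"lab_lookup": ["patient", "test"]}

def next_prompt(intent, slots):
    """Return the follow-up question to speak, or None if the query is
    complete and can be executed."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if slot not in slots:
            return f"Which {slot} would you like?"
    return None
```

So "pull up labs for Mrs. Smith" would trigger the prompt for the missing test name, and only the completed query reaches the back-end system.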
What impact do you see with EHR vendors signing deals with cloud-based services from Amazon, Google, and Microsoft that give them access to development tools, and I’m thinking specifically of Amazon Transcribe Medical?
It’s another entrant. We keep track of everybody that’s out there. Google has their version of that, Microsoft has their version of that. It’s a good thing to see all the cloud players getting involved. It allows us to create clear differentiation between what we do and how we do it, the accuracy and the fidelity of the experience.
We think about the speech problem as being much bigger than just providing speech to text, and that’s what a lot of these SDKs do. In Dragon, there are literally hundreds of features that sit above the speech dial tone.
The more entrants, the better. That competition is a good thing, but it’s just another competitor type of a response from us.
What opportunities does AI create in going beyond transcription and voice commands to extracting information?
The Comprehend piece, the natural language processing piece — the ability to reach into a stream or a blob of text or documentation or whatever and extract facts and evidence — has been around for a long time. It’s not a new concept. But it allows you to make intelligence part of that natural interaction, which is so important.
For example, we’ve been generating queries to physicians in what we call Computer-Assisted Physician Documentation. That’s based on AI. It’s based on natural language processing and it’s also based on speech. It allows us to put intelligence into what we call the speech dial tone, so that as you are speaking, we are aware of the context of what is going on with the patient because we have access to that information through our EMR partners.
But we also know what you are saying. If you’re doing a progress note and you make a statement about some condition, we can connect the dots. If there is specificity missing, if a hierarchical condition category got triggered in the ambulatory setting, if there’s some piece of information missing that could lead to a different diagnosis, we can present that information to the physician in real time. This is making the experience both natural and very, very rich, because the more data we bring into it, the more it takes the burden off the physician.
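The missing-specificity check he describes can be sketched with a toy rule table. The conditions and qualifiers below are hypothetical examples; real systems of this kind use NLP over the note rather than keyword rules.

```python
# Hypothetical rules: each condition needs a qualifier before it can be
# documented with full specificity (e.g. heart failure should be noted
# as systolic/diastolic and acute/chronic).
SPECIFICITY_RULES = {
    "heart failure": ["systolic", "diastolic", "acute", "chronic"],
    "diabetes": ["type 1", "type 2"],
}

def specificity_queries(note_text):
    """Return real-time queries to surface to the physician when a
    condition is mentioned without any of its specificity qualifiers."""
    text = note_text.lower()
    queries = []
    for condition, qualifiers in SPECIFICITY_RULES.items():
        if condition in text and not any(q in text for q in qualifiers):
            queries.append(
                f"Can you specify the {condition}? e.g. {', '.join(qualifiers)}"
            )
    return queries
```

A note reading "patient with heart failure, stable" would trigger a query; "chronic diastolic heart failure" would not, because the specificity is already there.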
Physicians are under massive cognitive overload every single day. If we can relieve that a little bit through these mechanisms, it will be a really good thing. Things like Comprehend Medical, the stuff that Microsoft has, Google, the stuff that we have — I think it will all move things in that direction.
Do you have any final thoughts?
We are really excited about the future. I’ve been doing this for a long time now, and I’ve never been more excited about what we’re doing. Ambient clinical intelligence definitely provides an opportunity for Nuance, working with EMR partners, to advance the state of the art in terms of the patient and the physician experience. We are all about the healthcare mission and we are all about relieving burden. What we’re doing here will improve life for all of us as patients, and the partnership with Microsoft and so forth definitely advances that. It will definitely accelerate our mission to get there as quickly as we possibly can. We are jazzed about it and we are really excited about the next few years.