Charles Corfield is president and CEO of nVoq of Boulder, CO.
Tell me about yourself and the company.
I am CEO of nVoq, which is a Boulder, Colorado-based technology company. We are active in the HIT space. We provide end customers and technology companies with voice recognition services to help them in their workflows, or in the case of software developers, to incorporate speech recognition into their products and enhance the experience of their own users and customers.
What is left to accomplish with speech recognition now that it has become ubiquitous and of high accuracy?
There is still quite a lot to be done in terms of accuracy. You need the speech recognition to respect the domain in which somebody is talking and how they want the results to come back. That may vary from somebody who is getting the results directly back in front of them to somebody who may be a transcriptionist incorporating this into some other work. Or indeed, in the case of software developers, what they would like extracted from the recognized speech for their own programs. There is still a lot of post-processing work to be done.
I should put in the caveat that it’s easy to confuse speech recognition with mind-reading. Remember that a computer CPU sitting there has not had the social immersion that a human has had. It doesn’t watch TV, it doesn’t go to the pub, it doesn’t get into arguments. It is very easy for us to project onto a computer CPU all sorts of human attributes which it does not have. Part of our skill as a technology company is to set the appropriate expectations amongst consumers of recognized speech as to what’s really going on and how to leverage it best for their own purposes.
Do the sellers of consumer-grade voice assistants try to make them seem smarter than they really are?
That depends on what your end user experience is. Many people can have the experience that on the one hand it nails something, but then it appears as dumb as bricks at the next point. It’s no fault of the technology. It’s simply that there are limits to what it can do. Because it is missing social context, the recognition mistakes are still there. I don’t think humans are in any danger of being replaced by these digital assistants anytime soon.
Has technology advanced to the point that computers can mimic human interaction?
You have to be careful that you get what I might call the clever dog trick syndrome. You can train a dog to do all sorts of interesting acrobatic tricks, but it’s extremely narrow what the dog can master. You are still left with that question at the end of the day, having been very impressed by what the dog can do — how does it pay the rent? There’s a versatility that humans have. Most humans who have walked the planet, and we’re talking adults here, have got a couple of decades of social immersion under their belts.
We should be cautious that we don’t over-hype what the machines can do. If we keep the machine’s focus on fairly narrow tasks, there’s plenty of opportunity for rote tasks to be automated and, shall we say, narrow social interactions to be automated. But if I might be somewhat tongue in cheek, artificial intelligence has a ways to go before it catches up with natural stupidity. [laughs]
Will ambient clinical intelligence, like that being developed by Nuance, be able to extract data from an encounter and allow the physician to work hands free?
Again, it’s a question of the focus. In doing speech recognition in different environments, the audio environment is not really the issue. It is, what are you trying to extract from what you have recognized? In fact, if you go back to a paper that Google published a little while back, they noticed in their own tiptoeing into this arena of ambient recognition that the problem was much, much harder than they initially thought. It comes down to all those other environmental cues about what is going on.
The computer is like the proverbial story of the blind man encountering an elephant. The computer sees the trunk, or the computer sees the tail. It’s hard for the computer to get the whole picture. Whereas the human, who is apparently so much slower and less able than the computer chip, actually readily digests the social cues as to what is expected and makes very good predictions in that environment.
Computers will get better at that, but I think we should be cautious that we don’t over-hype what they can do today. Even the biggest GPUs out there that are used for artificial intelligence have quite limited memory and processing power at their disposal. I can stack up your common garden bee against one of those GPUs and note that the bee, with a brain the size of a pinhead, is able to communicate to other bees the location of food sources, navigate to those food sources without killing too many pedestrians on the way, and land upside down in reverse on that moving parking spot. You know, not bad. [laughs] Let’s not get carried away here.
That gives you my philosophy. I grew up in a culture where it was drilled into our heads that we should focus on meat and potato problems. [laughs] In other words, don’t get carried away. Go after the real meat and don’t be too proud to tackle what may, on the surface, seem too simple a problem.
You’ve said you don’t use speech recognition yourself even though some, including me, would say you are the father of it.
In my private life, I’m pretty low tech. About as high tech as I get is when I’m in my workshop making up a new pair of running shoes. [laughs]
As a vendor in health IT, how do you view that marketplace and the recent changes in it?
It’s a marketplace which is going to evolve enormously. The acute space is undergoing a lot of changes. Then there is a growth in post-acute and ambulatory going on. What you are seeing is the issue of data privacy, where consumers are now perhaps getting a little concerned about things they find scary about state-sponsored surveillance when they read stories about what is going on in China.
What about private sector surveillance, which, if anything, in America is even more intrusive than government sector surveillance? Then combine that with the hackers who are busy ransacking every healthcare clinic or HIT provider to those healthcare clinics. They’re busy trying to loot them for as much ransom as they can get their hands on.
The issue of data privacy and security has become enormously more important. Without naming any names, some notable HIT vendors themselves have run into trouble on that score. For me as a vendor, it’s certainly one of the things that I spend a lot of time thinking about – how to put as many obstacles in the way of the black hats, and when and if your day comes up, how to limit the amount of damage that they can inflict.
Health IT vendors are partnering with big tech companies like Google, Amazon, and Microsoft. Should they be worried about getting too close to technology companies that are a lot bigger and smarter than they are?
There’s always that concern that, “Are you going to be road kill on the information superhighway?” as it was once put. The concern you’ve got is where Google is trying to open up partnerships with some of the larger healthcare providers to get access to their records and Cerner went through some sort of heartache over whom to partner with on that because of this issue. Will they just get disintermediated? I can’t say that I can look in the crystal ball and give you an answer to that, but it’s a concern.
For us as a vendor, the old mantra is that you want to be outside the kill zone. In other words, do not do something which is right in the target of what the major platforms are going to be doing. Do something which is differentiated, which is not worth their while to do, but which they would like to have as part of their ecosystem. That’s our approach. You’ll get to see over the next few years whether we’re right or wrong.
What has happened in the last couple of years with nVoq and where do you see the company going in the next few years?
It’s a bit like that old BASF tagline. “We don’t make the products you buy. We make the products you buy better.” There is IP know-how and what have you that we can bring to the table for the people who wish to incorporate voice into their own product offerings, where we can save them an awful lot of ramp-up time and we can save them from a lot of missteps. So for us, the next few years is going to be a story of the partnerships that we build out there and the value that we can add to other people’s stories.
You developed the technical document processor FrameMaker over 30 years ago and sold it to Adobe 25 or so years ago. How does it feel to know that software you developed generations ago is still being used and sold today?
I had the privilege of being connected to the engineers who now maintain it a couple of years back. As I talked to them, we eventually ended up talking about some of the algorithms in it. What I found interesting was here we were all these years later and I said, “Well, haven’t you just rewritten all that stuff for something better?” And they said, “No, actually your original algorithm is still state of the art.” [laughs]
And then I had this thought. I wonder if there’s going to be a point, should I make it into my eighties, that I’m going to be speaking to some other engineers, finding out which of the things I’m worrying about today are either stupid or still state of the art? [laughs]
Maybe well-designed algorithms don’t have a shelf life.
You have no idea at the time. You’re simply just trying to crack a problem. There’s no textbook you get to look it up in. You do the best you can and then you have to move on to the next thing. It was kind of an interesting calibration experience, getting decades-later feedback like that. [laughs]
I saw in the infinite font of knowledge of Wikipedia that you have a species of lizard named after you. Who makes that call to let you know?
Goodness knows. [laughs] I think somebody with a wicked sense of humor. I will take it as a compliment and hope there’s nothing too horrible about the lizard or whichever reptilian species it is. [laughs]
Do you have any final thoughts?
In the world of HIT, it’s going to be a very exciting time, technology-wise. The impact of the cloud is going to come to bear. Even within an area like speech recognition, which has been around for a very long time, we will see a lot more application of it in different workflows. It’s the proverbial, “You ain’t seen nothing yet.” If you think about it, the number of people who are actually using voice recognition within healthcare is quite limited compared to the total number of people — consumers, clinicians, therapists, or what have you — who are out there in the field.
I think we will see some very interesting value propositions emerge, and from a diversity of players as well. Keep your eye on some of the emerging startups, who may be just incorporating it into whatever newfangled clinical offering that they’re doing. It’s going to be an exciting time.