Dr. Carlo Tornatore, director of the Multiple Sclerosis Center at Georgetown University Hospital, uses a special medical version of Dragon voice recognition software to enter notes on a patient encounter. Joshua Brockman/NPR.

August 11, 2009

By JOSHUA BROCKMAN

It’s talk-back time.

Who hasn’t spoken to their computer on occasion? I’ve heard some choice words exchanged with many a laptop, PC and even the occasional PDA. Most of the time all you get in response is silence.

If you’re tired of having a one-way conversation with your screen, relief is in sight. It’s been more than a decade since consumer versions of voice recognition software came on the scene, but there were many stumbling blocks, including limited vocabulary and the need to spend an excessive amount of time training the software.

But the technology has advanced to a new level and is changing how we interact with computers, cell phones and cars. And the integration of voice features could have a dramatic impact on making technology more accessible and ergonomically sound by changing the way consumer electronics are designed.

Advances And Obstacles

“We’re right on the edge of a new era of conversational computing, where in certain circumstances your primary mode of interaction with a machine will be talking to it and having it talk back,” says Paul Saffo, a technology forecaster based in Silicon Valley.

He says the building blocks of voice recognition — computing power and algorithms — are steadily improving. So is interface design.

Paul Saffo, a technology forecaster based in Silicon Valley, says computer and consumer electronics companies are spending “serious money” on developing a voice recognition breakthrough. Courtesy of Mikkel Aaland.

“No matter how good these systems are, they’re not like talking to another human being,” he says. So the design challenge for engineers and software companies is to guide people to ask the right questions and give the right answers.

The creator of much of the voice recognition software that’s in use in devices is Nuance, a Burlington, Mass.-based technology company. There aren’t many other companies with Nuance’s reach in this sector. Yankee Group senior analyst Berge Ayvazian says Nuance became the dominant player in this arena by acquiring or partnering with “most of their former competitors,” including ScanSoft, Dictaphone and Philips Speech Recognition Systems.

Nuance’s speech recognition software for PCs is called Dragon NaturallySpeaking (the Mac version is called MacSpeech Dictate and is sold through MacSpeech, which licenses Nuance’s software). The company offers a variety of versions of the software, including ones tailored to the legal community for use with court transcriptions and to medical professionals who use them to dictate notes.

Dr. Carlo Tornatore, director of the Multiple Sclerosis Center at Georgetown University Hospital, uses it to dictate electronic medical records. As a result, these records are available immediately, and there’s no delay in sharing them with other doctors.

So what’s it like talking to a computer?

“It takes a little time to get used to the idea that you’re talking to a screen,” he says.

Talking to a machine is a concept that’s rooted in the popular imagination. Think of Star Trek, or of Knight Rider, which starred David Hasselhoff as Michael Knight, whose sidekick was a talking car named KITT.

Michael Knight, played by David Hasselhoff, helped foil many schemes in concert with his sidekick, KITT, a talking car, as part of the TV series Knight Rider. Dialogue between man and machine is on the horizon. NBCU Photo Bank.

Some professionals are using dictation software for longer projects. Dave Farber, distinguished career professor of computer science and public policy at Carnegie Mellon University, uses MacSpeech Dictate to speed things along as he writes an oral history of his work. Even though he’s a two-fingered typist, he says he wasn’t always a fan of this kind of software.

“Up until very recently, I gave up on them,” he says. “The error rates were too high. It doesn’t do any good if you dictate and you have to correct most of it.”

Farber says he made the switch because he was able to start using this software without an extensive amount of training and because he can work without generating a lot of errors.

Dragon works quickly, in part, because it uses predictive language modeling akin to T9, the Nuance software used on billions of cell phones to predict the word you’re trying to type when you send e-mail or text messages.
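For readers curious about what keypad-style word prediction looks like, here is a minimal sketch in Python. It is a toy illustration, not Nuance's actual T9 or Dragon code, neither of which is described in this article. The word list and usage counts are made up, and the function names are hypothetical. The idea is that several words share the same key presses, so the software ranks the candidates by how often each word is used.

```python
from collections import defaultdict

# Letter groups printed on a standard phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def word_to_keys(word):
    """Return the digit sequence a user would press to type `word`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(word_counts):
    """Map each digit sequence to its candidate words, most frequent first."""
    index = defaultdict(list)
    for word, count in word_counts.items():
        index[word_to_keys(word)].append((count, word))
    # Sort by descending count, breaking ties alphabetically.
    return {
        keys: [word for _, word in sorted(cands, key=lambda c: (-c[0], c[1]))]
        for keys, cands in index.items()
    }


def predict(index, keys):
    """Return ranked word guesses for the digits pressed so far."""
    return index.get(keys, [])


# Hypothetical toy vocabulary with invented usage counts (an assumption).
TOY_COUNTS = {"home": 50, "good": 40, "gone": 20, "hood": 5, "talk": 30}
INDEX = build_index(TOY_COUNTS)

print(predict(INDEX, "4663"))  # four different words share these key presses
```

In this toy vocabulary, "home," "good," "gone" and "hood" all map to the keys 4-6-6-3, so the sketch offers the most frequently used word first. Dictation software faces a similar ambiguity with sounds rather than key presses, which is where a language model helps.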

Mobile Talk

Voice recognition is already integrated into a lot of things that we do with phones. Think about whom you talk to when you call directory service, when you book a flight or when you call your bank or credit card company.

Demand for more voice features has been growing, especially within the cell phone industry. Voice dialing, which many people use to make hands-free calls on cell phones, is one area where the iPhone was behind the curve until June, when Apple released its latest model, the iPhone 3GS.

Speech recognition capabilities on many of these phones, like the Samsung Instinct, are powered by Nuance’s voice control software, which enables users to press a button to begin translating their words into text for everything from sending text messages to finding a song or searching the Web for a nearby business.

“If you can get decent voice recognition into phones, then you can start treating them as personal assistants, and that’s going to change things,” Farber says.

There’s plenty of room for this market to expand: Nuance estimates that so far in 2009, more than 840 million phones have shipped with text messaging capabilities, compared with about 200 million that have shipped with voice capabilities.

As cell phones and voice recognition software become more advanced, people are able to use voice commands without having to train the mobile phone first, says Peter Mahoney, a senior vice president at Nuance.

Read To Me

When devices read to you, they’re utilizing text-to-speech functions. Amazon’s Kindle uses Nuance’s software to enable it to read aloud a book, magazine, newspaper or even a blog. You won’t hear the voice of James Earl Jones — it’s a synthesized computer voice.

Nuance says Dragon has been used by people with some kinds of paralysis and with multiple sclerosis to open up their communication possibilities by facilitating Web searches, e-mail and word processing.

“Dragon can even read back your words to you, so if you have difficulty reading because of dyslexia or some other kind of learning disability, it really enables that capability, too,” Mahoney says.

Accessible And Ergonomic Features

The ability to use one’s voice to guide a device also makes it potentially more accessible for the blind or visually impaired, provided that the buttons and on-screen menus are also navigable.

The blind community has a lot of concerns about the prevalence of touch screen interfaces for mobile phones and other consumer electronics and appliances because many devices effectively shut out those with impaired vision.

“Voice recognition technology has really enhanced or increased awareness of accessibility,” says Anne Taylor, director of access technology for the National Federation of the Blind.

It also puts blind and sighted users on “a level playing field,” she says, because there is little or no training needed to start using voice recognition features.

“We can’t fully rely currently on voice recognition technology just yet,” Taylor says. “I do hope that one day we can, but at this point we still advocate for keyboarding.”

Keyboards and mice, however, can create problems for your fingers and arms.

Alan Hedge, the director of the Human Factors and Ergonomics laboratory at Cornell University, says voice recognition technology can help reduce these kinds of workplace injuries.

“It plays an important role in reducing the load on other parts of the body so that you can work for a longer period of time on a computer system without running the risk of injury,” he says.

Many companies now pay closer attention to creating ergonomically sound workstations, and Hedge says that has contributed to injuries going “way down.” And while voice recognition technology can be part of the solution, Hedge cautions that overusing one’s voice can also lead to injury.

Let Your Lips Do The Talking

Speaking comes naturally. As a result, one common thread with many of the products that now have a voice interface is the feeling of simplicity.

“We should make it the responsibility of the computer to understand us, versus making it the responsibility of us to understand the way the computer wants to speak,” says Mahoney, the Nuance executive.

As speech recognition becomes more integrated into the devices we use on a daily basis, we may start to inch away from the keyboard and mouse. And that may foster a more collegial relationship with computers.

Now that’s something to talk about.



© NPR