April 5, 2018

Computer system transcribes words users 'speak silently'

by Larry Hardesty, Massachusetts Institute of Technology

MIT researchers have developed a computer interface that can transcribe words that the user verbalizes internally but does not actually speak aloud.

The system consists of a wearable device and an associated computing system. Electrodes in the device pick up neuromuscular signals in the jaw and face that are triggered by internal verbalizations—saying words "in your head"—but are undetectable to the human eye. The signals are fed to a machine-learning system that has been trained to correlate particular signals with particular words.

The device also includes a pair of bone-conduction headphones, which transmit vibrations through the bones of the face to the inner ear. Because they don't obstruct the ear canal, the headphones enable the system to convey information to the user without interrupting conversation or otherwise interfering with the user's auditory experience.

The device is thus part of a complete silent-computing system that lets the user undetectably pose and receive answers to difficult computational problems. In one of the researchers' experiments, for instance, subjects used the system to silently report opponents' moves in a chess game and just as silently receive computer-recommended responses.

"The motivation for this was to build an IA device—an intelligence-augmentation device," says Arnav Kapur, a graduate student at the MIT Media Lab, who led the development of the new system. "Our idea was: Could we have a computing platform that's more internal, that melds human and machine in some ways and that feels like an internal extension of our own cognition?"

"We basically can't live without our cellphones, our digital devices," says Pattie Maes, a professor of media arts and sciences and Kapur's thesis advisor. "But at the moment, the use of those devices is very disruptive. If I want to look something up that's relevant to a conversation I'm having, I have to find my phone and type in the passcode and open an app and type in some search keyword, and the whole thing requires that I completely shift attention from my environment and the people that I'm with to the phone itself. So, my students and I have for a very long time been experimenting with new form factors and new types of experience that enable people to still benefit from all the wonderful knowledge and services that these devices give us, but do it in a way that lets them remain in the present."

The researchers describe their device in a paper they presented at the Association for Computing Machinery's ACM Intelligent User Interface conference. Kapur is first author on the paper, Maes is the senior author, and they're joined by Shreyas Kapur, an undergraduate major in electrical engineering and computer science.

Subtle signals

The idea that internal verbalizations have physical correlates has been around since the 19th century, and it was seriously investigated in the 1950s. One of the goals of the speed-reading movement of the 1960s was to eliminate internal verbalization, or "subvocalization," as it's known.

But subvocalization as a computer interface is largely unexplored. The researchers' first step was to determine which locations on the face are the sources of the most reliable neuromuscular signals. So they conducted experiments in which the same subjects were asked to subvocalize the same series of words four times, with an array of 16 electrodes at different facial locations each time.


The researchers wrote code to analyze the resulting data and found that signals from seven particular electrode locations were consistently able to distinguish subvocalized words. In the conference paper, the researchers report a prototype of a wearable silent-speech interface, which wraps around the back of the neck like a telephone headset and has tentacle-like curved appendages that touch the face at seven locations on either side of the mouth and along the jaws.

But in current experiments, the researchers are getting comparable results using only four electrodes along one jaw, which should lead to a less obtrusive wearable device.
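The article does not say how the researchers' analysis code chose among the 16 candidate locations. As a rough illustration only, one common way to rank sensor locations is to score how well each one separates the words: the variance between the per-word average signals, divided by the variance within each word's repetitions. The sketch below applies that score to synthetic data. The function names, the word list, the single-number "feature" per recording and the data itself are all assumptions made for the example, not details from the paper.

```python
import random
from statistics import mean, pvariance

def separability(values_by_word):
    """Variance between per-word means divided by the mean within-word variance."""
    word_means = [mean(values) for values in values_by_word.values()]
    within = mean([pvariance(values) for values in values_by_word.values()])
    return pvariance(word_means) / within

def rank_locations(recordings):
    """recordings[location][word] is a list with one feature value per repetition."""
    scores = {loc: separability(by_word) for loc, by_word in recordings.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Synthetic stand-in data (hypothetical): 16 locations, of which only the
# first 7 produce a signal that depends on the subvocalized word.
rng = random.Random(1)
WORDS = ["one", "two", "plus"]
recordings = {}
for loc in range(16):
    informative = loc < 7
    recordings[loc] = {
        word: [(k if informative else 0.0) + rng.gauss(0.0, 0.2) for _ in range(20)]
        for k, word in enumerate(WORDS)
    }

ranking = rank_locations(recordings)
print("highest-scoring locations:", sorted(ranking[:7]))
```

On real recordings the gap between useful and useless locations would be far less clean, and a classifier-based comparison might be preferred over a simple variance ratio.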

Once they had selected the electrode locations, the researchers began collecting data on a few computational tasks with limited vocabularies—about 20 words each. One was arithmetic, in which the user would subvocalize large addition or multiplication problems; another was the chess application, in which the user would report moves using the standard chess numbering system.

Then, for each application, they used a neural network to find correlations between particular neuromuscular signals and particular words. Like most neural networks, the one the researchers used is arranged into layers of simple processing nodes, each of which is connected to several nodes in the layers above and below. Data are fed into the bottom layer, whose nodes process them and pass the results to the next layer, whose nodes do the same, and so on. The output of the final layer is the result of some classification task.
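The layer-by-layer flow described above can be sketched in a few lines of Python. This is a generic feed-forward classifier, not the researchers' network: the layer sizes, the random (untrained) weights, the five-word vocabulary and the choice of one input number per electrode are all assumptions made for illustration.

```python
import math
import random

def dense(inputs, weights, biases):
    """One layer: every node computes a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn the final layer's raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def classify(features, layers, vocabulary):
    """Pass the data up through each layer, then pick the most probable word."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    probabilities = softmax(dense(activations, weights, biases))
    best = max(range(len(vocabulary)), key=lambda k: probabilities[k])
    return vocabulary[best], probabilities

rng = random.Random(0)
VOCABULARY = ["zero", "one", "two", "plus", "times"]  # hypothetical word list
N_FEATURES = 7  # assumption: one number per electrode location
layers = [
    random_layer(rng, N_FEATURES, 8),       # bottom layer
    random_layer(rng, 8, 8),                # middle layer
    random_layer(rng, 8, len(VOCABULARY)),  # final layer, one node per word
]
signal = [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]
word, probabilities = classify(signal, layers, VOCABULARY)
print(word, [round(p, 3) for p in probabilities])
```

Because the weights here are random, the predicted word is meaningless; training is what adjusts the weights so that the final layer's highest probability lands on the right word.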

The basic configuration of the researchers' system includes a neural network trained to identify subvocalized words from neuromuscular signals, but it can be customized to a particular user through a process that retrains just the last two layers.
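Retraining only the last layers of a network is a standard fine-tuning technique, and a minimal sketch of it follows. Nothing here is taken from the researchers' code: the three-layer network, the tanh activations, the learning rate, the three-word vocabulary and the synthetic "calibration" data are assumptions, and the randomly initialized first layer merely stands in for one that would have been trained beforehand.

```python
import math
import random

def dense(x, weights, biases):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z):
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(rng, n_in, n_out):
    return {"W": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out}

def forward(x, net):
    h0 = [math.tanh(v) for v in dense(x, net[0]["W"], net[0]["b"])]   # frozen layer
    h1 = [math.tanh(v) for v in dense(h0, net[1]["W"], net[1]["b"])]  # retrained
    probs = softmax(dense(h1, net[2]["W"], net[2]["b"]))              # retrained
    return h0, h1, probs

def mean_loss(data, net):
    """Average cross-entropy of the correct word over the data set."""
    return sum(-math.log(forward(x, net)[2][y]) for x, y in data) / len(data)

def fine_tune(data, net, epochs=200, lr=0.1):
    """Gradient descent on the last two layers only; net[0] is never modified."""
    for _ in range(epochs):
        for x, y in data:
            h0, h1, probs = forward(x, net)
            # Gradient of the cross-entropy loss at the output layer.
            d_out = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
            # Backpropagate to the hidden layer before the output weights change.
            d_h1 = [sum(net[2]["W"][k][i] * d_out[k] for k in range(len(d_out)))
                    * (1.0 - h1[i] ** 2) for i in range(len(h1))]
            for layer, delta, inputs in ((net[2], d_out, h1), (net[1], d_h1, h0)):
                for j, d in enumerate(delta):
                    layer["b"][j] -= lr * d
                    for i, v in enumerate(inputs):
                        layer["W"][j][i] -= lr * d * v

rng = random.Random(0)
WORDS = ["two", "plus", "three"]  # hypothetical vocabulary
# Hypothetical per-user calibration data: noisy copies of one pattern per word.
PATTERNS = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
user_data = [([p + rng.gauss(0.0, 0.1) for p in PATTERNS[y]], y)
             for y in range(len(WORDS)) for _ in range(10)]

net = [init_layer(rng, 4, 6), init_layer(rng, 6, 6), init_layer(rng, 6, len(WORDS))]
frozen_before = [row[:] for row in net[0]["W"]]
loss_before = mean_loss(user_data, net)
fine_tune(user_data, net)
loss_after = mean_loss(user_data, net)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print("first layer unchanged:", net[0]["W"] == frozen_before)
```

Leaving the early layers untouched means a new user needs to supply only a small amount of data, which fits the short calibration sessions the article goes on to describe.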

Practical matters

Using the prototype wearable interface, the researchers conducted a usability study in which 10 subjects spent about 15 minutes each customizing the arithmetic application to their own neurophysiology, then spent another 90 minutes using it to execute computations. In that study, the system had an average transcription accuracy of about 92 percent.

But, Kapur says, the system's performance should improve with more training data, which could be collected during its ordinary use. Although he hasn't crunched the numbers, he estimates that the better-trained system he uses for demonstrations has an accuracy rate higher than that reported in the usability study.

In ongoing work, the researchers are collecting a wealth of data on more elaborate conversations, in the hope of building applications with much more expansive vocabularies. "We're in the middle of collecting data, and the results look nice," Kapur says. "I think we'll achieve full conversation some day."

"I think that they're a little underselling what I think is a real potential for the work," says Thad Starner, a professor in Georgia Tech's College of Computing. "Like, say, controlling the airplanes on the tarmac at Hartsfield Airport here in Atlanta. You've got jet noise all around you, you're wearing these big ear-protection things—wouldn't it be great to communicate with voice in an environment where you normally wouldn't be able to? You can imagine all these situations where you have a high-noise environment, like the flight deck of an aircraft carrier, or even places with a lot of machinery, like a power plant or a printing press. This is a system that would make sense, especially because oftentimes in these types of or situations people are already wearing protective gear. For instance, if you're a fighter pilot, or if you're a firefighter, you're already wearing these masks."

"The other thing where this is extremely useful is special ops," Starner adds. "There's a lot of places where it's not a noisy environment but a silent environment. A lot of time, special-ops folks have hand gestures, but you can't always see those. Wouldn't it be great to have silent-speech for communication between these folks? The last one is people who have disabilities where they can't vocalize normally. For example, Roger Ebert did not have the ability to speak anymore because lost his jaw to cancer. Could he do this sort of silent speech and then have a synthesizer that would speak the words?"

More information: Arnav Kapur et al. AlterEgo, Proceedings of the 2018 Conference on Human Information Interaction & Retrieval - IUI '18 (2018). DOI: 10.1145/3172944.3172977

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Computer system transcribes words users 'speak silently' (2018, April 5) retrieved 16 April 2024 from https://techxplore.com/news/2018-04-words-users-silently.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
