Synthetic speech generated from brain signals.
Stroke, traumatic brain injury, and neurodegenerative diseases can all cause the loss of the ability to speak. Some people with severe speech disabilities learn to spell out their thoughts letter by letter using assistive devices that track very small eye or facial muscle movements. However, producing text or synthesized speech with such devices is laborious, error-prone, and painfully slow. Now, a study from researchers at UCSF demonstrates that it is possible to create a synthesized version of a person’s voice controlled by the activity of their brain’s speech centers. The team state their technology could restore fluent communication to individuals with severe speech disability, and reproduce some of the musicality of the human voice, which conveys the speaker’s emotions and personality. The study is published in the journal Nature.
Recent studies from the lab showed how the human brain’s speech centers choreograph the movements of the lips, jaw, tongue, and other vocal tract components to produce fluent speech. This means vocal tract movements, and not just linguistic features such as phonemes, must be taken into account when studying speech production. This led the team to the sensorimotor cortex, which encodes vocal tract movements, and to the theory that this cortical activity can be decoded and translated via a speech prosthetic, giving a voice to people who cannot speak but have intact neural function. The current study uses cortical activity recorded from epilepsy patients to program a computer to mimic natural speech.
In the current study, five patients with intact speech, who had electrodes temporarily implanted in their brains to map the source of their seizures in preparation for neurosurgery, were asked to read several hundred sentences aloud while the researchers recorded activity from a brain region known to be involved in language production. Results show the group were able to build maps of how the brain directs the vocal tract, including the lips, tongue, jaw, and vocal cords, to make different sounds; these maps were then applied to a computer program that produces synthetic speech.
The team explain that this detailed mapping of sound to anatomy allowed the creation of a realistic virtual vocal tract for each participant that could be controlled by their brain activity. The system comprises two neural network machine learning algorithms: a decoder that transforms brain activity patterns produced during speech into movements of the virtual vocal tract, and a synthesizer that converts these vocal tract movements into a synthetic approximation of the participant’s voice. Volunteers were then asked to listen to the synthesized sentences and to transcribe what they heard; more than half the time, the listeners were able to correctly identify the sentences being spoken by the computer.
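The two-stage pipeline described above can be sketched in Python with NumPy. This is a minimal illustration only: the linear maps stand in for the study’s actual neural networks, and all dimensions, weights, and function names here are hypothetical placeholders rather than the researchers’ model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, hypothetical dimensions -- not the study's actual sizes.
N_ELECTRODES = 64   # neural features recorded per time step
N_KINEMATIC = 33    # virtual vocal tract articulator features
N_ACOUSTIC = 32     # acoustic features handed to the voice synthesizer

# Stage 1: decoder -- brain activity -> vocal tract movements.
# A single random linear map with a tanh squashing stands in for a
# trained neural network decoder.
W_decode = rng.normal(size=(N_ELECTRODES, N_KINEMATIC)) * 0.1

def decode_kinematics(neural: np.ndarray) -> np.ndarray:
    """Map neural activity (T x electrodes) to articulator trajectories."""
    return np.tanh(neural @ W_decode)

# Stage 2: synthesizer -- vocal tract movements -> acoustic features.
W_synth = rng.normal(size=(N_KINEMATIC, N_ACOUSTIC)) * 0.1

def synthesize_acoustics(kinematics: np.ndarray) -> np.ndarray:
    """Map articulator trajectories (T x kinematic) to acoustic features."""
    return kinematics @ W_synth

# Run the two stages in sequence on 100 time steps of simulated activity.
neural_activity = rng.normal(size=(100, N_ELECTRODES))
kinematics = decode_kinematics(neural_activity)
acoustics = synthesize_acoustics(kinematics)
print(kinematics.shape, acoustics.shape)  # -> (100, 33) (100, 32)
```

The key design idea the sketch preserves is the intermediate articulatory representation: rather than mapping brain activity directly to sound, the decoder first recovers vocal tract movements, and only then does a second model convert those movements into audio.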
The team surmise they have developed a neural decoder that leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. For the future, the researchers state they plan to design a clinical trial involving paralyzed, speech-impaired patients to determine how best to gather brain signal data that can then be applied to the previously trained computer algorithm.
Source: UC San Francisco