Synthetic speech generated from brain signals.

Stroke, traumatic brain injury, and neurodegenerative diseases can all cause a loss of the ability to speak. Some people with severe speech disabilities learn to spell out their thoughts letter by letter using assistive devices that track very small eye or facial muscle movements. However, producing text or synthesized speech with such devices is laborious, error-prone, and painfully slow.

Now, researchers at UCSF have developed an artificial intelligence system that can translate activity in the brain’s speech centers into a synthesized version of a person’s voice. The team says the technology could restore fluent communication to individuals with severe speech disabilities. It could also reproduce some of the musicality of the human voice that conveys a speaker’s emotions and personality. The study is published in the journal Nature.

Map of the brain’s speech centers

Recent studies from the lab showed how the human brain’s speech centers choreograph the movements of the lips, jaw, tongue, and other vocal tract components to produce fluent speech. This suggests that studies of speech production need to account for vocal tract movements, not just linguistic features such as phonemes.

This led the team to focus on the sensorimotor cortex, a brain region known to encode vocal tract movements, and to theorize that this cortical activity could be decoded and translated by a speech prosthetic. Such a device could potentially give a voice to people whose neural function is intact but who have lost the ability to speak. In the current study, the team used cortical activity recorded from epilepsy patients to program a computer to mimic natural speech.

The researchers asked five patients with intact speech to read several hundred sentences aloud while recording activity from a brain region known to be involved in language production. All participants had electrodes temporarily implanted in their brains to map the source of their seizures in preparation for neurosurgery.

From these recordings, the group built maps of how the brain directs the vocal tract, including the lips, tongue, jaw, and vocal cords, to make different sounds. These maps were then applied to a computer program that produces synthetic speech.

‘Downloading’ a person’s voice

The team explains that this detailed mapping allows them to create a realistic virtual vocal tract for each participant, controlled by that participant’s brain activity. The virtual vocal tract is made up of two neural network machine-learning algorithms.

The first is a ‘decoder’ that transforms brain activity patterns produced during speech into movements of the virtual vocal tract. The second is a synthesizer that converts these vocal tract movements into a synthetic approximation of the participant’s voice.
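To make the two-stage design concrete, here is a toy sketch in Python. It is not the study’s code: each stage is a hand-picked linear map standing in for a trained neural network, and the feature sizes, weights, and function names are all hypothetical. What it does illustrate is the key idea of the approach: brain activity is not mapped straight to sound, but passes through an intermediate vocal tract representation first.

```python
# Toy sketch of a two-stage pipeline: neural features -> vocal tract movements -> sound features.
# Each stage is a fixed linear map standing in for a trained neural network.
# All names, dimensions, and weights below are hypothetical, for illustration only.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (one row per output) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def decode_articulation(neural_frame: Vector, w_dec: Matrix) -> Vector:
    """Stage 1 (hypothetical 'decoder'): brain activity -> virtual vocal tract movements."""
    return linear(w_dec, neural_frame)


def synthesize_acoustics(articulation: Vector, w_syn: Matrix) -> Vector:
    """Stage 2 (hypothetical 'synthesizer'): vocal tract movements -> acoustic features."""
    return linear(w_syn, articulation)


def brain_to_speech(neural_frames: List[Vector], w_dec: Matrix, w_syn: Matrix) -> List[Vector]:
    """Run every time frame of neural activity through both stages in order."""
    return [synthesize_acoustics(decode_articulation(f, w_dec), w_syn) for f in neural_frames]


# Hypothetical weights: 3 recording channels -> 2 articulators -> 2 acoustic features.
W_DEC: Matrix = [[0.5, 0.0, 0.5], [0.0, 1.0, -1.0]]
W_SYN: Matrix = [[1.0, 2.0], [0.0, 1.0]]

if __name__ == "__main__":
    frames = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
    print(brain_to_speech(frames, W_DEC, W_SYN))  # [[-1.0, -1.0], [2.5, 1.0]]
```

Splitting the problem this way means the second stage only has to learn how vocal tract movements produce sound, which is why the intermediate representation matters in the design described above.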

Volunteers were then asked to listen to the synthesized sentences and transcribe what they heard. More than half the time, the listeners correctly identified the sentences spoken by the computer.

The team concludes that they have developed a neural decoder that leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. Next, the researchers plan to design a clinical trial involving paralyzed, speech-impaired patients to determine how best to gather brain signal data that can be applied to the previously trained algorithm.

Source: UC San Francisco
