Neural network speech coding

The speech signal conveys both linguistic-symbolic and continuous-acoustic information. The former results from underlying cognitive speech processes, whereas the latter results from motor-control speech processes. The gap between cognitive and motor speech processes is narrowing through the convergence of speech engineering with motor control research, psycholinguistics, neuropsychology, speech neuroscience, and recent deep learning approaches. The purpose of this work is to propose a novel speech coding architecture. We characterize the proposed speech coding as cognitive because it compresses the speech signal into a code that can be interpreted at the linguistic level and manipulated by computational models of speech production, such as the Directions Into Velocities of Articulators (DIVA) model and the Hierarchical State Feedback Control model. A linguistically relevant transmission code brings novel functionality to speech transmission systems, enabling tasks such as automatic dialect correction or intelligibility enhancement for speakers with motor speech disorders. The proposed speech coding also facilitates the integration of speech transmission with higher-level sequential speech applications, such as automatic speech recognition, speech synthesis, and machine translation systems.

More details are available in our white paper, Cognitive speech coding.