
Research

Phonological atoms
A constructive, compositional model represents the speech signal as a non-negative linear combination of atomic units ("atoms"), which are themselves non-negative so that the combination involves no subtraction or cancellation (Virtanen et al., 2015, http://dx.doi.org/10.1109/msp.2013.2288990). The power of a sum of uncorrelated atomic signals in any frequency band is the sum of the powers of the individual signals within that band. The central question is how to define the sound atoms used as the compositional models. Following this line of research, we hypothesise that the acoustic representations of phonological features, produced by a phonological vocoder, form a set of speech signal atoms (the phonological sound components) that define the phones. We call these sound components phonological atoms. Atoms can be generated for any phonological system.
Sound examples: [back] phonological atom, [high] phonological atom.
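The non-negative compositional model above can be illustrated with a small sketch. This is not our phonological vocoder; it is a generic non-negative matrix factorisation (NMF) with the standard multiplicative updates, applied to a toy spectrogram built from two known atoms, and all dimensions and variable names here are illustrative. It shows the key property from the text: every frame is reconstructed as a non-negative combination of non-negative atoms, so no component subtracts from another.

```python
import numpy as np

# Toy magnitude spectrogram: F frequency bins x T frames, built from
# K hypothetical atoms so an exact non-negative decomposition exists.
rng = np.random.default_rng(0)
F, T, K = 6, 8, 2                      # bins, frames, number of atoms
W_true = rng.random((F, K))            # atoms: non-negative spectral shapes
H_true = rng.random((K, T))            # non-negative activations per frame
V = W_true @ H_true                    # observed spectrogram

# Multiplicative-update NMF: factorise V ~ W @ H with W, H >= 0.
# The updates multiply by non-negative ratios, so non-negativity is
# preserved and the combination never involves subtraction.
W = rng.random((F, K)) + 1e-3
H = rng.random((K, T)) + 1e-3
for _ in range(1000):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

reconstruction_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(reconstruction_error)  # small: V is recovered from non-negative atoms
```

In this sketch the atoms are random; in the hypothesis above, the atom dictionary would instead be fixed to the acoustic representations of phonological features generated by the vocoder.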
Neural network speech coding
The speech signal conveys both linguistic-symbolic and continuous-acoustic information. The former results from underlying cognitive speech processes, whereas the latter results from motor-control speech processes. The gap between the cognitive and motor speech processes is narrowing as speech engineering converges with motor control research, psycholinguistics, neuropsychology, speech neuroscience, and recent deep learning approaches. The purpose of this work is to propose a novel speech coding architecture. We call the novel speech coding cognitive because it compresses the speech signal into a code that can be interpreted at the linguistic level and manipulated by computational models of speech production, such as the Directions Into Velocities of Articulators model and the Hierarchical State Feedback Control model. A linguistically relevant transmission code brings novel functionality to speech transmission systems, enabling tasks such as automatic dialect correction or intelligibility enhancement for speakers with motor speech disorders. The proposed speech coding also facilitates the integration of speech transmission with higher-level sequential speech applications, such as automatic speech recognition, speech synthesis, and machine translation systems.
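The idea of a code that is "manipulated at the linguistic level" can be sketched as follows. This is a deliberately minimal stand-in, not the proposed architecture: the feature inventory, the `encode` stub, and the `correct_dialect` function are all hypothetical, and a real system would use a trained phonological analyser and synthesiser. The point it illustrates is that when the transmitted code is a sequence of per-frame phonological feature values, an operation such as dialect correction becomes a direct edit of the code rather than a signal-processing operation.

```python
import numpy as np

# Illustrative phonological feature inventory (not the authors' set).
FEATURES = ["vocalic", "consonantal", "high", "back", "round", "nasal"]

def encode(n_frames, rng):
    """Stand-in for a phonological analyser: in a real coder this would be
    a neural network mapping speech frames to feature posteriors."""
    return rng.random((n_frames, len(FEATURES)))

def correct_dialect(code, feature, value):
    """Linguistic-level edit: force one phonological feature to a target
    value in every frame -- a toy stand-in for dialect correction."""
    edited = code.copy()
    edited[:, FEATURES.index(feature)] = value
    return edited

rng = np.random.default_rng(0)
code = encode(10, rng)                        # transmitted code: 10 frames
edited = correct_dialect(code, "round", 0.0)  # edit the code symbolically
```

A decoder (e.g. a phonological vocoder) would then resynthesise speech from `edited`; the acoustic signal itself is never manipulated.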