MIT: The McGovern Institute for Brain Research


The McGovern Institute for Brain Research at MIT is led by a team of world-renowned neuroscientists committed to meeting two great challenges of modern science: understanding how the brain works and discovering new ways to prevent or treat brain disorders.

Project: Unsupervised Learning of Invariant Word Representations: Machine Learning of Speech Like Children Do

Tomaso Poggio, Georgios Evangelopoulos

Objectives: (1) Extend a theory for unsupervised, hierarchical representation learning and study its properties under the variability (transformations and deformations) found in speech signals; (2) Develop machine learning algorithms for deriving representations from the raw acoustic signal and quantitatively demonstrate their invariance and small-sample complexity properties; (3) Design and collect a dataset of speech recordings, with controlled and uncontrolled signal variations, for testing the capacity and invariance of the representation; (4) Empirically evaluate a prototype system for speech sound and isolated word recognition on the collected dataset and on other standard speech corpora.

Innovations: (i) A representation that is invariant to identity-preserving transformations, stable to smooth deformations, and discriminative with respect to different phonetic or vocabulary classes; (ii) An unsupervised framework for learning data representations, requiring only stored templates and their transformations; (iii) A hierarchical, multilayer architecture whose invariance grows to cover larger transformation ranges, and which can account for the invariance of parts (e.g., phonemes, morphemes) as well as the invariance and compositionality of entire objects (e.g., words).
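
As a rough illustration of innovation (ii), the sketch below builds a signature from stored templates and their transformations and checks that it does not change when the input is transformed. It is a toy under stated assumptions, not the project's implementation: the transformation set is taken to be cyclic time shifts, the pooling is a max and a mean over absolute dot products, the signals are random vectors rather than speech, and the names circular_shift and signature are hypothetical.

```python
import random


def circular_shift(signal, k):
    """Return a copy of the signal cyclically shifted right by k samples."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def signature(signal, templates):
    """Pool |<signal, g.t>| over every cyclic shift g of each template t.

    Assumption for this sketch: the transformation set is the full group of
    cyclic shifts. Returns one (max, mean) pair per template.
    """
    n = len(signal)
    pooled = []
    for t in templates:
        responses = [abs(dot(signal, circular_shift(t, k))) for k in range(n)]
        pooled.append((max(responses), sum(responses) / n))
    return pooled


# Toy data: random vectors stand in for acoustic frames and stored templates.
rng = random.Random(0)
n = 16
templates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]
x = [rng.gauss(0, 1) for _ in range(n)]
x_shifted = circular_shift(x, 5)

s1 = signature(x, templates)
s2 = signature(x_shifted, templates)

# Largest difference between the two signatures, up to floating-point rounding.
max_diff = max(abs(a - b) for p, q in zip(s1, s2) for a, b in zip(p, q))
print(max_diff)
```

The invariance holds here because shifting the input only permutes the set of responses to the shifted templates, and the pooled statistics do not depend on their order. No labels are used at any point; only the templates and their transformed copies are stored.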

A hundred years ago, nobody would have imagined that it might make sense to talk to machines. Today, with speech recognition and speech synthesis built into cars, computers, phones and many other devices, it is already normal. But it doesn't stop there.

