Schedule Spring 2018

Place: MIT 46-5165, Time: Mondays, 5pm


2/26 Thomas Schatz (MIT/UMD)

Leveraging automatic speech recognition technology to model cross-linguistic speech perception in humans

Existing theories of cross-linguistic phonetic category perception agree that listeners perceive foreign sounds by mapping them onto their native phonetic categories. Yet none of the available theories specifies a way to compute this mapping. As a result, they cannot provide systematic quantitative predictions and remain mainly descriptive. In this talk, I will present a new approach that leverages Automatic Speech Recognition (ASR) technology to obtain a fully specified mapping between foreign and native sounds. Using the machine ABX evaluation method, we derive quantitative predictions from ASR systems and compare them to empirical observations in human cross-linguistic phonetic category perception. I will present results both where the proposed model successfully predicts empirical effects (for example, on the American English /r/-/l/ distinction) and where it fails (for example, on Japanese vowel length contrasts), and discuss possible interpretations.

3/5 Ishita Dasgupta (Harvard)

Evaluating Compositionality in Sentence Embeddings

An important challenge for human-like AI is compositional semantics. Recent research has attempted to address this by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to other tasks. We present a new dataset for one such task, “natural language inference” (NLI), that cannot be solved using only word-level knowledge and requires some compositionality. We find that the performance of state-of-the-art sentence embeddings (InferSent, Conneau et al. (2017)) on our new dataset is poor. We analyze some of the decision rules learned by InferSent and find that they are largely driven by simple heuristics that are ecologically valid in its training dataset. Further, we find that augmenting training with our dataset improves test performance on our dataset without loss of performance on the original training dataset. This highlights the importance of structured datasets in better understanding and improving NLP systems.

4/2 Idan Blank (MIT BCS)

When we “know the meaning” of a word, what kind of knowledge do we have?

Understanding words seems to require both linguistic knowledge (stored form-meaning pairings and ways to combine them) and world knowledge (object properties, plausibility of events, etc.). In this talk, I will pose some challenges for common distinctions between these knowledge sources. First, I will ask whether rich information about concrete objects could, in principle, be learned from just the co-occurrence statistics of different words, even in the absence of non-linguistic (e.g., perceptual) information. To this end, I will introduce a domain-general approach for leveraging such statistics (as captured by distributional semantic models, DSMs) to recover context-specific human judgments such that, e.g., “dolphin” and “alligator” appear relatively similar when considering size or habitat, but different when considering aggressiveness. Second, I will probe DSMs for “syntactic”, abstract compositional knowledge of verb-argument structure (e.g., “eat”, but not “devour”, can appear without an object). I will demonstrate that these syntactic properties of verbs can often be predicted from distributional information (i.e., without explicit access to “syntax”), indicating that DSMs capture those aspects of verb meaning that correlate with verb syntax. Nevertheless, only a small fraction of distributional information is needed for predicting verb-argument structure; the rest appears to capture semantic properties that are relatively divorced from syntax. In fact, the overall similarity structure across verbs in a DSM is independent of the similarity structure across verbs as determined by their syntax, and both kinds of similarity are needed for explaining human judgments. Together, these two studies attempt to push against the upper bound on the potential complexity of distributional word meanings.

4/23 Hendrik Strobelt (IBM/Harvard)

Visualization for Sequence Models for Debugging and Fun

Visual analysis is a great tool for exploring deep learning models when no strong mathematical hypothesis is yet available. I will present two visual tools for which we used design study methodology: one for exploring patterns in hidden state changes in RNNs/LSTMs (LSTMVis) and one for exploring Sequence2Sequence models (Seq2Seq-Vis). Both model types have shown superior performance on NLP tasks such as language modeling and language translation. Examples of both tasks will be shown on a variety of models.

As a beautiful distraction, we also use data science methods to investigate large data in a more artistic way. Formafluens is such a data experiment, in which we analyze a large collection of doodles made by humans in the Google Quickdraw tool.