Place: MIT 46-5165, Time: Thursdays, 5pm
2/14 Reading Group
Structures, not strings: linguistics as part of the cognitive sciences by Everaert et al. (2015)
2/21 Sherry Yong Chen (MIT Linguistics)
The linguistics of one, two, three
The interpretation of number words in context is of interest not only to linguists but also to logicians, psychologists, and computer scientists. In this talk, we will discuss some theoretical and developmental questions related to the linguistics of English numerals.
Bare numerals (e.g., two, three) pose an interesting puzzle for semantic and pragmatic theories, as they seem to vary among several interpretations: ‘at least n’, ‘exactly n’, and sometimes even ‘at most n’. We will examine how the availability of a particular interpretation appears to depend on the interaction between linguistic structure and contextual factors, and discuss three approaches that try to capture the relationship among these interpretations.
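As a rough illustration (a hypothetical sketch, not one of the three approaches the talk discusses), the three readings can be written as truth conditions that compare the numeral n with the actual count. The function names below are made up for this example; the sketch only shows which readings make a sentence like "Three students passed" true in a given situation, not how context selects among them.

```python
# Hypothetical illustration: each candidate reading of a bare numeral n
# is modeled as a predicate over the actual count of items.

def at_least(n: int, count: int) -> bool:
    return count >= n

def exactly(n: int, count: int) -> bool:
    return count == n

def at_most(n: int, count: int) -> bool:
    return count <= n

READINGS = {"at least": at_least, "exactly": exactly, "at most": at_most}

def true_readings(n: int, count: int) -> list:
    """Return the readings under which the numeral n truthfully describes count."""
    return [name for name, pred in READINGS.items() if pred(n, count)]

# "Three students passed" in three different situations:
print(true_readings(3, 4))  # ['at least']
print(true_readings(3, 3))  # ['at least', 'exactly', 'at most']
print(true_readings(3, 2))  # ['at most']
```

The puzzle, in these terms, is that speakers do not simply pick one predicate: which one is in play seems to shift with structure and context.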
Turning to the acquisition of bare numerals, developmental research suggests that by age 5, preschoolers, like adult speakers, are able to access ‘non-exactly’ interpretations of a bare numeral in contexts that license them. A natural hypothesis is that knowledge of the full range of interpretations comes from a prior understanding of explicit expressions such as ‘at least’ and ‘at most’ in English. This turns out to be questionable, however, since it has also been shown that 5-year-olds have not yet acquired the meanings of ‘at least’ and ‘at most’. Time permitting, we will end with a discussion of what all this means for the development of numerical concepts and/or for language development in general.
3/14 Reading Group: Danfeng Wu (MIT Linguistics), Syntactic Theory: A Formal Introduction, Chapters 9.3–9.9
What is the role of psycholinguistic evidence (specifically, evidence from language processing) in the study of language? What is the relation between knowledge of language and use of language? We hope to explore these questions through a discussion of an HPSG textbook chapter. HPSG (Head-driven Phrase Structure Grammar) is a syntactic framework distinct from transformational generative grammar: it is surface-oriented, constraint-based, and strongly lexicalist. The chapter argues that HPSG is more compatible than transformational grammar with observed facts about language processing. For instance, language processing is incremental and rapid (e.g., Tanenhaus et al. 1995, 1996; Arnold et al. 2002); the order in which words are presented largely determines the order of the listener’s mental operations in comprehending them; and lexical choices have a substantial influence on processing (MacDonald et al. 1994). The chapter concludes that such psycholinguistic evidence supports an HPSG-style grammar and poses difficulties for transformational grammar.
3/21 Paola Merlo (University of Geneva)
In the computational study of intelligent behaviour, the domain of language is distinguished by the complexity of its representations and the sophistication of the available domain theory. A large amount of observational data is also available for many languages. The main scientific challenge for computational approaches to language is to create theories and methods that fruitfully combine large-scale, corpus-based approaches with the linguistic depth of more theoretical methods. I report here on some recent and current work on word order universals and argument structure that exemplifies the quantitative computational syntax approach. First, we demonstrate that the typological frequencies of noun phrase orderings (Universal 20) are systematically correlated with abstract syntactic principles at work in structure building and movement. Then, we investigate higher-level structural principles of efficiency and complexity: in a large-scale computational study, we confirm a trend towards minimizing the distance between words, both in time and across languages. In the third case study, cross-lingual corpus investigations, much like the comparative method in linguistics, take advantage of any corresponding annotation or linguistic knowledge across languages. We show that corpus data and typological data on the causative alternation exhibit interesting correlations explained by the notion of spontaneity of an event. Finally, time permitting, I will discuss current work investigating whether the notion of similarity in the intervention theory of locality is related to current notions of similarity in word embedding space.
4/4 Candace Ross (MIT CSAIL)
Grounding Language Acquisition by Training Semantic Parsers using Captioned Videos
We develop a semantic parser that is trained in a grounded setting using videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, who observe their environment while listening to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences, and despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations, or actions taken by any agent in a video. For this task, we collected a new dataset for grounded language acquisition. Learning a grounded semantic parser (one that turns sentences into logical forms using captioned videos) can significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.
4/11 Reading Group
Integration of visual and linguistic information in spoken language comprehension by Tanenhaus et al. (1995)
4/18 Tal Linzen (JHU Cognitive Science) (co-hosted with MIT CBMM)
Linguistics in the age of deep learning
Deep learning systems with minimal or no explicit linguistic structure have recently proved to be surprisingly successful in language technologies. What, then, is the role of linguistics in language technology in the age of deep learning? I will argue that the widespread use of these "black box" models provides an opportunity for a new type of contribution: characterizing the desired behavior of the system along interpretable axes of generalization from the training set, and identifying the areas in which the system falls short of that standard.
I will illustrate this approach with word prediction (language models) and natural language inference. I will show that recurrent neural network language models are able to process many syntactic dependencies in typical sentences with considerable success, but when evaluated on carefully controlled materials, their error rate increases sharply. Perhaps more strikingly, neural inference systems (including ones based on the widely popular BERT model), which appear to be quite accurate according to the standard evaluation criteria used in the NLP community, perform very poorly in controlled experiments; for example, they universally infer from "the judge chastised the lawyer" that "the lawyer chastised the judge". Finally, if time permits, I will show how neural network models can be used to address classic questions in linguistics, in particular by providing a platform for testing the necessity and sufficiency of explicit structural biases in the acquisition of syntactic transformations.
5/2 Rachel Ryskin (MIT BCS)
Lifelong learning of linguistic representations
Like much of the perceptual input that humans experience, language is highly variable. Two speakers may produce the same phoneme with different acoustics, and a sentence can have multiple syntactic parses. How do listeners (typically) choose the correct interpretation when the same sentence can map onto several potential meanings? In this talk, I will provide evidence that comprehenders navigate this variability by tracking distributional information in the input and using it to constrain the possible interpretations. For example, listeners can rapidly learn, from exposure to co-occurrence statistics, that a verb is much more likely to be followed by one syntactic structure than by another (though both are grammatical). Thus, language representations are continuously shaped by experience, even in adulthood. However, the underlying learning mechanisms and their neural underpinnings remain open questions. I will discuss recent efforts aimed at testing an error-based learning account, including work with patients with hippocampal amnesia, as well as evidence that the relevant linguistic representations may change over the lifespan.
5/9 Joshua Hartshorne (Boston College)
In popular culture, robots or other theory-of-mind-impaired fictional characters have difficulties with metaphors or indirect speech because they lack "common sense". In fact, common sense plays an even more central role in language processing than these depictions suggest. Compare:
(1) The city council denied the protesters a permit because they feared violence.
(2) The city council denied the protesters a permit because they advocated violence.
Most people interpret "they" as referring to the city council in (1) and to the protesters in (2). It is hard to explain this without invoking common sense. Similarly, compare:
(3) The hat fit in the box because it was big.
(4) The hat didn't fit in the box because it was big.
Such examples are ubiquitous in natural language. In this talk, I will describe recent computational and experimental work that tries to make sense of this commonplace phenomenon.
5/16 Ethan Wilcox (Harvard)
Recurrent Neural Networks (RNNs) are one type of neural model that has achieved state-of-the-art scores on a variety of natural language tasks, including translation and language modeling (which is used in, for example, text prediction). However, the nature of the representations that these 'black boxes' learn is poorly understood, raising issues of accountability and controllability for NLP systems. In this talk, I will argue that one way to assess what these networks are learning is to treat them like subjects in a psycholinguistic experiment. By feeding them hand-crafted sentences that reveal the models' underlying knowledge of language, I will demonstrate that they are able to learn the filler-gap dependency and are even sensitive to the hierarchical constraints implicated in it. Next, I turn to "island effects", structural configurations that block the filler-gap dependency and that have been theorized to be unlearnable. I demonstrate that RNNs are able to learn some of the island constraints and even recover some of their pre-island gap expectation. These experiments demonstrate that linear statistical models are able to learn some fine-grained syntactic rules; however, their behavior remains unlike that of humans in many cases.