I am an associate professor at UCLA in the Department of Linguistics, and advise the UCLA Language Processing Lab. My research investigates how language users develop a sufficiently rich linguistic meaning during online comprehension. Recent topics include the processing of ellipsis and the assignment of focus, as well as the role of other semantic, pragmatic, and prosodic defaults in sentence interpretation.
I am committed to using experimental methods in my research, including Internet-based questionnaires, corpora, and online methods such as self-paced reading and eye tracking. See this page for a description of the various methods and data collection tools used in the lab.
I am an organizer for the California Meeting on Psycholinguistics (CAMP), and hosted the inaugural meeting at UCLA in 2017. CAMP 2018 was held at the University of Southern California. CAMP 2019 was held at UC Santa Cruz. CAMP 2021 was held virtually at UC Irvine. In 2023, CAMP5 returned to UCLA.
As a person who stutters, I’m proud to serve on the Board of Directors of Myspeech, a non-profit dedicated to facilitating access to high-quality, client-centered speech therapy for people in underserved communities, as well as providing resources on disability rights for students who stutter and education for speech-language pathologists in training.
Before UCLA, I was an assistant professor at Pomona College, in the Department of Linguistics & Cognitive Science.
Finally, I regularly participate in the Psycholinguistics / Neurolinguistics Seminar; the current schedule may be found here.
PhD in Linguistics, 2012
UMass Amherst
MSc in Logic, 2007
University of Amsterdam
MA in Linguistics, 2003
University of Chicago
BA in Linguistics, 2003
University of Chicago
How does the language processing system make efficient use of multiple sources of information to produce a sufficiently rich representation? What information may go underspecified? How does grammatical knowledge constrain representations considered during online sentence processing?
For more details, please refer to this overview of my research agenda or my cv. Ongoing research is also described on the UCLA Language Processing Lab page.
*Recent and upcoming
Psycholinguistics is a relatively young, but rapidly growing, discipline that addresses how language might be realized as a component within the general cognitive system, and how language is comprehended, produced, and represented in memory. It is an interdisciplinary effort, drawing on research and techniques from linguistics, psychology, neuroscience, and computer science, and utilizes a variety of methods to investigate the underlying representations and mechanisms that are involved in linguistic computations.
This course concentrates on (i) uncovering and characterizing the subsystems that account for linguistic performance, (ii) exploring how such subsystems interact, and whether they interact within a fixed order, and (iii) investigating how the major linguistic subsystems relate to more general cognitive mechanisms.
Pragmatics explores the systematic relation between what was intended and what was literally said by examining what inferences can be made from a sentence meaning in a particular context of utterance, given what is known about the speaker and the participants in the discourse. Once treated as a virtually unstructured wasteland of non-linguistic information, pragmatics is reaching a new maturity as it more closely interfaces with linguistic subsystems. Pragmatic research addresses a notoriously broad domain. The course design emphasizes the theoretical components of pragmatics research, focusing on topics that highlight the internal structure of pragmatic mechanisms or the ways in which pragmatic information is embedded within the architecture of the language faculty. It also introduces methods and ongoing developments in experimental pragmatics, an area that has become a driving force in shaping research interests in the field.
This course starts by reviewing the classic cooperative foundations of pragmatic theory initiated by Grice, and then highlights recent advances in the field, concentrating on major topics of current interest, including (a) Pragmatic theory and implicatures since Grice 1975, (b) Projection and not-at-issue content, (c) Speech acts, common ground, and speaker commitments, and (d) Choice points in formalizing context, discourse, and speaker, among other topics.
The proseminar course addresses several major themes in the current literature on the use of prosody in sentence processing, including (i) the influence of intonational phrasing on syntactic parsing, ambiguity resolution, and predictive processing, (ii) evidence for prosodic and metrical structure in silent reading, (iii) evidence for individual and cross-linguistic differences, and (iv) the calculation and incorporation of contextually salient focus alternatives.
Our general aim for this course is not to present an entirely crystalized view of how prosodic information has been addressed in sentence processing research, but instead to pose and evaluate conceptual choice points that have been instrumental in situating how prosodic and syntactic representations could be integrated together, given current proposals regarding how language processing routines are organized within a more general cognitive architecture. We will invite open-ended discussion on how complex linguistic mappings and the resulting representations might be integrated into existing models of sentence processing. Participants will be encouraged to consider what kinds of information the comprehender may have to prioritize during processing and the extent to which the interpretation of such information may be affected by broader contextual factors, such as speaker intention, as well as how prosodic grouping may interface with other cognitive representations, such as memory. Finally, we will introduce and evaluate various paradigms used in prosodic processing research, including offline questionnaires, priming, silent reading, visual world, and electroencephalography methods.
The Los Angeles Reading Corpus of Individual Differences (LARCID) is a corpus of natural reading and individual differences measures. The corpus is currently a feasibility pilot of eye tracking data collected from 15 readers. Five texts from public domain sources were included. In addition to the eye tracking measures, a battery of individual difference measures, along with basic demographic information, was collected in a separate session. Individual difference measures included the Rapid Automatized Naming, Reading Span, N-Back, and Raven’s Progressive Matrices tasks.
Pilot data, a write-up, and R Markdown files can be found on this Open Science Framework page. Comments welcome!
Robodoc is a Python program that automatically cleans eye tracking data of blinks and track losses. This new version improves usability and command line options. Learn more about this handy code here.
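Robodoc itself is documented at the link above, but the core idea behind blink and track-loss cleaning can be sketched in a few lines of Python. The function below is illustrative only (the data layout and function name are assumptions, not Robodoc's actual interface): it drops samples with missing coordinates, plus a small margin of neighboring samples, which are often distorted just before and after a blink.

```python
def clean_blinks(samples, pad=2):
    """Remove blink/track-loss samples plus `pad` neighbors on each side.

    `samples` is a list of (timestamp, gaze) pairs, where gaze is an
    (x, y) tuple or None when the tracker lost the eye. This is a sketch
    of the general technique, not Robodoc's actual data format.
    """
    # Indices where the tracker reported no gaze position
    bad = {i for i, (t, gaze) in enumerate(samples) if gaze is None}
    # Widen each missing run by `pad` samples on both sides
    padded = set()
    for i in bad:
        padded.update(range(max(0, i - pad), min(len(samples), i + pad + 1)))
    return [s for i, s in enumerate(samples) if i not in padded]
```

In practice a cleaner like this would operate on the tracker's native export format and log what it removed, but the padding logic is the heart of the task.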
This tutorial shows how to use R to access the US Census to visualize language families spoken in the United States. The interactive Shiny app below illustrates how various languages are distributed in California according to the 2012 American Community Survey.
A fully executable R Markdown tutorial is hosted on GitHub. To clone with git, run this command from the terminal:
git clone https://github.com/jaharris/Linguistic_Diversity_CA.git
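The tutorial itself uses R, but the first step — building a query against the Census Bureau's ACS API — can be sketched in Python as well. The endpoint layout and table ID below are assumptions for illustration (B16001 is a language-spoken-at-home table; consult the Census API documentation for the exact variables the tutorial uses):

```python
from urllib.parse import urlencode

def acs_language_url(year=2012, table="B16001", state_fips="06"):
    """Build a Census ACS 5-year API query URL for a language table.

    Defaults request the table total (variable _001E) for California
    (FIPS 06). Endpoint path and variable naming are assumptions here.
    """
    base = f"https://api.census.gov/data/{year}/acs5"
    params = urlencode({"get": f"NAME,{table}_001E",
                        "for": f"state:{state_fips}"})
    return f"{base}?{params}"
```

Fetching the resulting URL (e.g., with `urllib.request` or `requests`) returns JSON rows that can be reshaped for mapping, much as the Shiny app does.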
The Embedded Appositives Corpus is an annotated collection of 278 sentences containing appositives embedded syntactically in the complement of propositional attitude predicates and verbs of saying, drawn from 177 million words of novels, newspaper articles, and TV transcripts. Intended to inform work on appositives, conventional implicatures, and textual entailment. Includes a JavaScript interface, an XML corpus, and a short write-up describing the data and their theoretical relevance.
The NPR Corpus scraper is a collection of Python programs built to crawl NPR and download transcripts in XML format, saving links to the audio files of radio interviews into a directory. It can be tweaked to crawl other news sites. Note: this tool requires a working knowledge of Python. To be posted with instructions soon!
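Until the scraper is posted, the transcript-to-XML step can be sketched with the Python standard library alone. The snippet below assumes transcript text sits in ordinary `<p>` tags (an assumption about the page markup, not the scraper's actual logic) and converts it to a minimal XML document:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class TranscriptParser(HTMLParser):
    """Collect the text inside <p> tags from a transcript page.

    Assumes transcript paragraphs are plain <p> elements; real NPR
    pages may require more targeted selectors.
    """
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

def transcript_to_xml(html):
    """Convert transcript HTML to a simple <transcript><para>... document."""
    parser = TranscriptParser()
    parser.feed(html)
    root = ET.Element("transcript")
    for text in parser.paragraphs:
        ET.SubElement(root, "para").text = text.strip()
    return ET.tostring(root, encoding="unicode")
```

A full crawler would add the page-fetching and audio-link extraction around this core; libraries like BeautifulSoup make the parsing step more robust.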
The script downloads the Linguist List job posting archives for the years specified below. After some reformatting, it removes all but tenure track job postings and categorizes the jobs according to keywords listed in the posting. The method for categorization largely follows previous efforts; see the Language Log postings on the 2008 data, 2009 data, and 2009-2012 data.
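The categorization step amounts to keyword matching over the posting text. A minimal sketch of that idea in Python (the keyword map below is illustrative; the script's actual categories follow the Language Log posts cited above):

```python
def categorize_posting(text, keyword_map=None):
    """Assign a job posting to subfield categories by keyword matching.

    A posting may match several categories; the returned list is sorted.
    The default keyword map is a small illustrative sample, not the
    script's actual keyword list.
    """
    if keyword_map is None:
        keyword_map = {
            "syntax": ["syntax", "syntactic"],
            "phonology": ["phonology", "phonological"],
            "psycholinguistics": ["psycholinguistic", "sentence processing"],
        }
    text = text.lower()
    return sorted(category for category, keywords in keyword_map.items()
                  if any(kw in text for kw in keywords))
```

Substring matching like this is deliberately forgiving (e.g., "syntactician" matches "syntact..."), which suits noisy posting text, at the cost of occasional false positives.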
A fully executable R Markdown tutorial is hosted on GitHub. To clone with git, run this command from the terminal:
git clone https://github.com/jaharris/linglist-scrape.git
Simple to the point of trivial, this Ruby program writes results from Linger’s .dat files to a single file, with the experiment name and the number of subjects run automatically appended to the filename. Primarily for those wary of the command line. If Ruby is installed on Windows, simply place the script in the same folder as your .dat files, then double-click its icon to run. Also works on Mac and Linux.
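The same behavior can be sketched in Python for anyone without Ruby installed. This is a re-implementation of the idea, not the original script; the output naming scheme below is illustrative:

```python
import glob
import os

def combine_dat_files(folder, experiment="expt"):
    """Concatenate all Linger .dat files in `folder` into one file.

    The output filename records the experiment name and the number of
    subject files found (a naming convention assumed for illustration).
    Returns the path of the combined file.
    """
    paths = sorted(glob.glob(os.path.join(folder, "*.dat")))
    out_name = os.path.join(folder, f"{experiment}_{len(paths)}subj.txt")
    with open(out_name, "w") as out:
        for path in paths:
            with open(path) as f:
                out.write(f.read())
    return out_name
```

Run it on the folder containing your .dat files, e.g. `combine_dat_files("results", "expt1")`, and a single `expt1_Nsubj.txt` file appears alongside them.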