Peer-reviewed journal articles
Click the name of a paper to see a summary of what it's about, what's interesting about it, and/or my personal experience with it. Click the "full text" link to read the paper itself. (A * indicates that the paper is a registered report.)
Bernard Jap & Stephen Politzer-Ahles (in press). Visual mismatch negativity indexes automatic lexicality detection. Language, Cognition and Neuroscience. [full text]
In a study not long before this (Politzer-Ahles & Jap, 2024), Bernard and I had found that a certain component of brain activity—the Mismatch Negativity—is capable of automatically detecting abstract linguistic differences, such as the difference between past-tense and present-tense verbs, even when you're not consciously paying attention. In this experiment we tried to see how far this effect generalizes, by trying it again with a few new twists. First, we used the visual instead of the auditory mismatch negativity. Second, we did it in Indonesian instead of English. And third, instead of a past-present distinction, we tried two new distinctions: one between real words and nonwords, and one between nouns and verbs. We found that the visual mismatch negativity was sensitive to the difference between words and nonwords (sort of), but, surprisingly, not to the difference between nouns and verbs. This study raises questions about the limits of what sorts of abstract differences the mismatch negativity is sensitive to, and in what experimental paradigms.
*Bernard Jap, Yu-Yin Hsu, & Stephen Politzer-Ahles (in press). Neural correlates of thematic role assignment for passives in Standard Indonesian. PLoS ONE.
Several decades' worth of studies have shown that sentences in the passive voice elicit a different pattern of brain activity than sentences in the active voice. It has never been clear, though, whether this happens because passive-voice sentences are structurally more complex than active-voice sentences, or because they're less common. In a language like English, those things both happen to be true, so how can we disentangle them? Well, it turns out that in Indonesian, passive voice is used much more often than in English, such that its frequency is much closer to that of active voice. In this registered report, Bernard took advantage of that fact to try to resolve that longstanding conundrum. It turned out that Indonesian passive-voice sentences still elicit different brain activity than active-voice sentences, which suggests that this difference is triggered by the syntactic structure of passive voice, not by how common or uncommon it is.
Jennifer Lewendon, James Britton, & Stephen Politzer-Ahles (2024). The Phonological Mapping Negativity (PMN) as a language-specific component: exploring responses to linguistic vs musical mismatch. PLoS ONE, 19, e0315537. [full text]
When people are expecting one word and then hear a word that differs by one sound (e.g., hearing lap when they expected map), they show a particular pattern of brain activity, called the Phonological Mapping Negativity, or PMN for short. This pattern has long been argued to be specific to language-related mismatches, but there was actually little empirical test of that assumption. In this study, Jen systematically tested that assumption, by comparing people's brain responses to linguistic mismatches versus to musical mismatches. We found that the PMN effect indeed only occurred for linguistic mismatches, not for musical mismatches.
Bernard Jap, Stephen Politzer-Ahles, & Yu-Yin Hsu (2024). Are cleft sentence structures more difficult to process? Neuroscience Letters, 843, 138029. [full text]
"Cleft sentences" (like "It was the mouse that ate the cheese") are syntactically more complex than "simple" sentences (like just "The mouse ate the cheese") with the same literal meaning, and previous neurolinguistic research has shown that they elicit a different pattern of brain activity than simple sentences. But, in addition to having extra syntactic structure, in many languages they also differ in other ways—for example, they just have more words. This makes it hard to tell if the different brain activity they elicit is really because of their additional syntactic structure, or because of those other surface-level differences. In this study, Bernard solved that problem by taking advantage of sentence structure in Indonesian, where cleft sentences and simple sentences have the same surface word order and number of words (they're pretty much just distinguished by a prefix). We found that they still elicit that extra brain activity, even in Indonesian. This suggests that that brain activity is really due to abstract syntactic structure.
*Stephen Politzer-Ahles & Bernard Jap (2024). Can the mismatch negativity really be elicited by abstract linguistic contrasts? Neurobiology of Language, 5, 818-843. [full text]
The mismatch negativity is a brain response that occurs when a person subconsciously detects a difference between two categories of stimuli (as in, for instance, ba ba ba ba ba ba pa). It has long been thought to be an index of the brain's detection of any changes, including differences between abstract categories. That explanation had not been strongly put to the test, however, because almost all previous studies involved situations in which the difference between categories could be recognized based on some kind of physical cue. This study tested for abstract change detection by presenting listeners with purely abstract linguistic contrasts with no reliable physical cue: the difference between past-tense and present-tense verbs (as in, e.g., chose sang bled swore clung pave). Amazingly, the brain still detected this sort of abstract contrast—even with no physical correlate, and even without listeners actively paying attention to these streams of stimuli. This provides the strongest evidence to date that the mismatch negativity really is an index of the brain's automatic detection of differences between abstract categories of stimuli.
Roman Sarrazin-Gendron, Parham Ghasemloo Gheidari, Alexander Butyaev, Timothy Keding, Eddie Cai, Jiayue Zheng, ..., Borderlands Science players, ..., & Jérôme Waldispühl (2024). Improving microbial phylogeny with citizen science within a mass-market video game. Nature Biotechnology.
Ok, now this one is just a little bit of fun. I'm not really an author on it in any proper sense; I'm more like a participant.
What the paper describes is a "citizen science" exercise. The authors had a DNA data problem that was intractable for computers, so instead they "gamified" it into a puzzle game, "Borderlands Science", which they embedded within the popular looter-shooter video game Borderlands 3. Over the next few years, about 4 million people played "Borderlands Science", contributing a ton of puzzle solutions which led to actual breakthroughs in the processing of those DNA data. This paper describes that process, and how the approach can be used for other "citizen science" research endeavors. The authors generously listed "Borderlands Science players" as coauthors, and I am one of those 4 million people who played the game, so I'm sort of a coauthor on this paper. But I didn't make any contribution other than playing a bunch of Borderlands Science (and enjoying some of that sweet sweet XP), so I'm really more of a participant than an author. But it's a super cool project and I'm so happy that I was a small (like 1/4,000,000th) part of it.
Wenting Xue, Meichun Liu, Stephen Politzer-Ahles, & Ovid Jyh-Lang Tzeng (2024). Verbal effect on the processing of complement coercion: distinguishing between aspectual verbs and psych verbs. Lingua, 306, 103754. [full text]
Many verb-object combinations require you to re-interpret the object as an event in order to understand them. For example, if I say "She finished the book", you need to re-interpret what "the book" means, because a book is a thing which you can't actually start or finish. You can start or finish doing something; accordingly, we normally interpret "She finished the book" as meaning something like (depending on the context) she finished WRITING the book, she finished READING the book, etc. A lot of previous research has examined the process of reading structures like that, but much of that research has mixed together two different types of verbs: aspectual verbs (as in "finished the book") and psych verbs (as in "enjoyed the book"), which have different linguistic properties. In this study, Wenting systematically examined sentences with these two types of verbs separately, and found that only the aspectual type of verb elicited the abovementioned reinterpretation-of-object sort of processing costs.
Jennifer Lewendon, Stephen Politzer-Ahles, & James Britton (2023). The MMN by another name? Exploring the autonomy of the Phonological Mapping (Mismatch) Negativity. Language, Cognition and Neuroscience, 38, 1098-1114. [full text]
When people are expecting one word and then hear a word that differs by one sound (e.g., hearing lap when they expected map), they show a particular pattern of brain activity, called the Phonological Mapping Negativity, or PMN for short. This pattern has long been argued to depend on attention (i.e., it was thought to only occur when listeners are consciously paying attention to what sound they expect), but there was actually little empirical test of that assumption. In this study, Jen systematically tested that assumption, by using a clever design in which we could present listeners with unexpected sounds in a stream they are paying attention to or in a stream they are not paying attention to. She found that the PMN brain response indeed only occurred for unexpected sounds when people are paying attention. This study thus provided one of the first direct tests of the claim that the PMN might be different than similar brain responses that can occur regardless of attention.
Stephen Politzer-Ahles, Lei Pan, Jueyao Lin, & Ka Keung Lee (2023). Long-lag identity priming in the absence of long-lag morphological priming: evidence from Mandarin tone alternation. Glossa: Psycholinguistics, 2, 2. [full text]
Decades of previous research have shown that people can recognize a word faster if they have recently seen or heard that same word—e.g., you can respond to the word cat a bit faster if you saw the same word a few minutes ago, compared to if you had not. Importantly, this also happens if you recently saw a different "version" of that word, or another word that includes that same word; for example, you still get that speed-up for recognizing cat if the word you saw a few minutes ago was cats, or catwalk. This phenomenon is known as long-lag priming. In this study, we tried doing that with different pronunciation variants of Mandarin words. For example, the word written 打 is usually pronounced daL (with a Low tone), but in some contexts it's pronounced daR (with a Rising tone). Surprisingly, we found—across three experiments with a total of 458 participants, which is quite massive for this sort of research—that hearing one of these did not speed up people's responses to hearing the other one later, even though they are just different versions of the same word! This study, therefore, revealed some previously unknown limitations on the conditions in which long-lag priming can occur. (Old versions: poster 1, poster 2, poster 3, slides)
Yao Yao, Katrina Connell, & Stephen Politzer-Ahles (2023). Hearing emotion in two languages: a pupillometry study of Cantonese-Mandarin bilinguals' perception of affective cognates in L1 and L2. Bilingualism: Language and Cognition, 26, 795-808. [full text]
This study (which is mainly Yao and Katrina's work; I just provided some feedback throughout the process) examined how emotionally aroused people get when seeing taboo words (like swear words) and other emotional words in their first language vs. their second language. Yao and Katrina did this with a pretty cool methodology called pupillometry: specifically, they measured how wide people's pupils got when they were listening to these words. When you hear something that really grabs your attention or riles up your emotions, your pupils tend to dilate a little. Using this method, Yao and Katrina found that taboo words elicited a higher emotional response (bigger pupils) than any other type of word they tested, but only in people's first language; people hearing taboo words in their second language didn't show this extra emotional response.
Xiaocong Chen, Caicai Zhang, Yiya Chen, Stephen Politzer-Ahles, Yuyu Zeng, & Jie Zhang (2022). Encoding category-level and context-specific phonological information at different stages: an EEG study of Mandarin third-tone sandhi word production. Neuropsychologia, 175, 108367. [full text]
For this study, Xiaocong devised a "cleaner" version of the speech production paradigm that Jie, Caicai and I had used in our previous (2022) experiment. Whereas for that one we created a new experiment paradigm out of whole cloth—which made it quite difficult to interpret what the brainwaves elicited in that experiment mean—for this one Xiaocong was able to take an existing and much better understood paradigm, production priming, and deploy it to test the question of when different forms of a word are activated during speech. We found converging evidence for what our previous paper had suggested: that people think of a word's underlying form and then compute the other way(s) it should be pronounced in the context in which it occurs.
Jie Zhang, Caicai Zhang, Stephen Politzer-Ahles, Ziyi Pan, Xunan Huang, Chang Wang, Gang Peng, & Yuyu Zeng (2022). The neural encoding of productive phonological alternation in speech production: Evidence from Mandarin Tone 3 sandhi. Journal of Neurolinguistics, 62, 101060. [full text]
This study, on which Jie and Caicai did the lion's share of work, demonstrates that different patterns of brain activity are elicited when people say words which include parts that must undergo a pronunciation change, versus words that don't. This was an extremely complicated experiment to run because we basically had to design a new kind of experiment in order to test this issue in a controlled way, and it took lots of trial and error to get it to actually work. The results ultimately suggest that speech production doesn't just use stored forms; rather, during the process of saying words, we retrieve words' underlying forms but then do some mental computation to figure out how they're supposed to be pronounced in their particular context.
Teresa Girolamo, Stephen Politzer-Ahles, Samantha Ghali, & Brittany Williams (2022). Preliminary evaluation of applicants to master's programs in speech-language pathology using vignettes and criteria from a holistic review process. American Journal of Speech-Language Pathology, 31, 552-577. [full text]
In this paper which was mainly Tree Girolamo's brainchild, we examined whether "holistic review" of applications to master's degree programs can really increase the variation in what kinds of students (from what kinds of backgrounds) get accepted. We developed fake profiles for applicants to speech-language pathology programs, controlled for certain characteristics, and had members of those programs rate them using holistic review criteria. We found that, even with holistic review, applicants from a particular overrepresented profile still received the highest ratings.
Stephen Politzer-Ahles, Jueyao Lin, Lei Pan, & Ka Keung Lee (2022). N400 evidence that the early stages of lexical access ignore knowledge of phonological alternations. Language and Speech, 65, 354-376. [full text]
The same word can often be pronounced in different ways in different contexts. This means that to "expect" a certain word might not necessarily mean you expect a certain pronunciation of that word. In this study, we used brainwaves to examine what happens when you are expecting a certain word and what you end up hearing is a different pronunciation of that same word. Surprisingly, we found that hearing a different pronunciation of the word you expected is just as surprising (in terms of the brain activity it elicits) as hearing a totally different word. This suggests that there is some stage of language comprehension during which listeners are making predictions specific to certain forms of words, and not considering other possible forms that the same word could take. (Old versions: slides 1, slides 2)
Mehdi Bakhtiar, Maryam Mokhlesin, Chotiga Pattamadilok, Stephen Politzer-Ahles, & Caicai Zhang (2021). The effect of orthographic transparency on auditory word recognition across the development of reading proficiency. Frontiers in Psychology, 12, 3129. [full text]
This study examined word reading in children learning Persian, a language whose writing system has pretty interesting properties. It's based on Arabic; accordingly, short vowels aren't written in Persian writing. When young children are first learning, however, these vowels are written. By third grade, they get phased out. This provides a unique natural opportunity to study reading of transparent orthographies (writing systems in which all sounds correspond to letters and vice versa) versus opaque orthographies (writing systems in which sounds don't always correspond to letters), within the same population and the same language. Mehdi's experiment found that, indeed, the speed with which children read words with short vowels goes through changes (consistent with the processing differences thought to occur when reading opaque as opposed to transparent orthographies) at third grade, right around when they are being removed from the writing.
Wenting Xue, Meichun Liu, & Stephen Politzer-Ahles (2021). Processing of complement coercion with aspectual verbs in Mandarin Chinese: evidence from a self-paced reading study. Frontiers in Psychology, 12, 1973. [full text]
The experiment described in this paper showed that, in Mandarin Chinese, verb-object combinations which require re-interpreting the object as an event (as in "finished the book", which has to be interpreted as something like "finished WRITING the book" or "finished READING the book") elicit extra processing costs in reading. This pattern had been widely shown in English, but this study extended that finding to Mandarin, demonstrating that this language processing phenomenon holds across languages with very different typological properties.
Stephen Politzer-Ahles & Suyeon Im (2020). Mismatch negativity is not always modulated by lexicality. Frontiers in Human Neuroscience, 14, 459. [full text]
Since the early 2000s, a particular component of brainwave activity was thought to be particularly sensitive to real words. This component is a part of the brainwave called the mismatch negativity, and some studies from that time had seemed to show that real words elicit a bigger mismatch negativity than fake words. In this paper, however, we report two high-powered studies showing that this is actually not the case; in our two studies, with a far larger sample size than any previous one, real words did not elicit larger mismatch negativities than fake words. In this paper we also provided a more comprehensive review of the literature, showing that there were actually multiple other previous experiments failing to find this difference. (Old versions: slides)
Stephen Politzer-Ahles (2020). What can electrophysiology tell us about the cognitive processing of scalar implicatures? Language and Linguistics Compass, 14, 1-22. [full text]
This is not an empirical paper (i.e., not a paper presenting new results from some experiment), but rather is a literature review: a paper that attempts to tie together a bunch of different research to hopefully synthesize some new insights or conclusions. In particular, this paper takes up the growing body of research that has been using brainwave measurements to study how we "read between the lines" to determine what certain expressions are intended to mean. This sort of "between-the-lines" meaning is called pragmatic meaning in linguistics. While much of that research had taken the perspective of trying to find the "neural correlates" of pragmatics, this paper argues that much of the previous research (including my own) can't actually identify neural correlates of pragmatics. The paper wraps up by suggesting some different experiment design approaches that can be used to more effectively tackle that question, and highlights a few model examples of that kind of research. (Old versions: slides)
Stephen Politzer-Ahles, Teresa Girolamo, & Samantha Ghali (2020). Preliminary evidence of linguistic bias in academic reviewing. Journal of English for Academic Purposes, 47, 100895. [full text]
For a few years before this paper came out, a debate had been raging over whether there's such a thing as "linguistic injustice" in academic publishing—specifically, whether writers of scholarly work have a more difficult time of it if their first language is not English. (I should also acknowledge here that I bear some of the responsibility for starting this debate to begin with, specifically because of Politzer-Ahles et al. 2016 in Journal of Second Language Writing.) The debate was kind of intractable because arguments in favor of the existence of linguistic injustice were based on speculation (like "here are reasons that it must be hard, right?") or on self-report (like "I had a hard time of it") or other methods that left a lot of confounds. What we did in this study was the first-ever blinded, randomized-controlled-trial sort of study to show that scholarly work written in "non-native"-sounding English gets judged as being less "scientific", even when its scientific content is identical to work written in "native"-sounding English. This makes it the first study to provide hard evidence that [one particular source of] linguistic injustice really exists.
Since the advent of generative AI, there have been more and more papers saying something along the lines of "generative AI can help reduce linguistic injustice by helping people proofread their papers" and then citing this paper. I do not endorse that and I don't think our paper ever expressed such a position; while we didn't say much in the way of recommendations, what little we did say was squarely focused on how linguistic injustice should be reduced by reducing reviewers' biases (since reviewer bias is what this paper was actually about). We never recommended that it should be scholar-writers' own responsibility to use any tools to make their writing better conform to a certain variety of English, and I regret that we didn't make that clearer.
Stephen Politzer-Ahles, Ka Keung Lee, & Lue Shen (2020). Ganong effects for frequency may not be robust. JASA Express Letters, 147, EL37-EL42. [full text]
When people hear an ambiguous sound, the way they ultimately interpret it is affected by context. For example, if you take a sound that's halfway between a [d] and a [t], people will be more likely to interpret it as a [t] if you put it in the context _ask (where [t] makes a real word "task") than if you put that exact same sound in the context _esk (where [t] makes a fake word *"tesk"). This paper reports an experiment in which we tried to see if the same effect would happen with contexts that make a common vs. an uncommon word; for example, would people be more likely to interpret this sound as [t] in _ime (where "time" is more common than "dime") and less likely to interpret it as [t] in _or (where "tore" is less common than "door")? It turns out that they're not: people's judgments of ambiguous sounds were influenced by whether the context makes a real word or a fake word, but not by whether the context makes a common or uncommon word. (Old versions: poster 1, poster 2)
Seán Roberts, Christine Cuskley, Stephen Politzer-Ahles, & Tessa Verhoef (2020). Double-blind reviewing and gender biases at EvoLang conferences: an update. Journal of Language Evolution, 5, 92-99. [full text]
This is a neat project that I only contributed a little bit to but am very fortunate to have had the opportunity to be a part of. Seán Roberts had previously done an analysis of submissions to a major conference on language evolution, and had found evidence that there was potential gender bias before (with abstracts written by women getting lower ratings than they perhaps should have gotten) but that this effect was mitigated when the conference switched to double-blind reviewing. In this new paper, Seán added data from the [then] latest iteration of the conference, to see how the situation had changed since his previous paper had made the field aware of this issue. The pattern is quite interesting. The fascinating details are in the paper, but the short version is: the pattern of data suggests (or at least is not inconsistent with the possibility) that the experience of having double-blind reviewing at the previous year's conference led authors to change how they wrote their abstracts in the following year.
*Stephen Politzer-Ahles & Lei Pan (2019). Skilled musicians are indeed subject to the McGurk effect. Royal Society Open Science, 6, 181868. [full text]
The McGurk effect is a classic psycho-perceptual phenomenon: splice together a video of someone saying "ba" with audio of them saying "ga", and people will think the person is saying "da". A couple years before publishing this paper, Lei and I noticed a paper claiming that people with music experience are not susceptible to this perceptual illusion. We had some concerns with the data and statistical analysis being used to make that claim, and at the time Lei had just finished a psycholinguistics class and wanted more, so we decided to do our own replication of that study. We found that, contrary to the original study, musicians do still get the McGurk effect, and they do so just as much as everyone else.
This paper also happens to be my first ever registered report. Registered reports are a special new way of publishing academic research. In the traditional way, you spend a long time doing some research and writing up a paper, then it goes out to reviewers, and it might get accepted for publication or the reviewers might be like "actually you were doing X Y and Z wrong all along, go redo it before it's publishable". In a registered report, on the other hand, you first send out your experiment plan to reviewers before actually doing the experiment, and they give you all that constructive criticism before you spend time actually doing the experiment. That way, once you actually start doing the experiment, you're doing something that's been carefully vetted and is guaranteed to be published, regardless of how the results come out (as long as you follow the plan that was vetted). This both saves researchers the time they would have otherwise spent doing experiments that might have had problems, and also removes the pressure to massage your data to make it look "publishable".
Mante Nieuwland, Dale Barr, Federica Bartolozzi, Simon Busch-Moreno, Emily Darley, David Donaldson, Heather Ferguson, Xiao Fu, Evelien Heyselaar, Falk Huettig, E. Matthew Husband, Aine Ito, Nina Kazanina, Vita Kogan, Zdenko Kohút, Eugenia Kulakova, Diane Mézière, Stephen Politzer-Ahles, Guillaume Rousselet, Shirley-Ann Rueschemeyer, Katrien Segaert, Jyrki Tuomainen, Sarah Von Grebmer Zu Wolfsthurn (2019). Dissociable effects of prediction and integration during language comprehension: evidence from a large-scale study using brain potentials. Philosophical Transactions B, 375, 20180522. [full text]
This paper is sort of a companion piece to Mante Nieuwland's multi-lab study (Nieuwland et al., 2018) discussed a few entries below. We used the data from that same experiment (it was such a massive undertaking, it would be a shame not to do lots with those data) and did some additional analysis of the data to answer new questions. Specifically, we found that the N400, which is often talked about as if it's one brainwave, actually represents multiple sub-components. Using fancy schmancy statistical analysis on this huge and well-controlled dataset, Mante was able to show that how predictable a word is in the context of a sentence and how plausibly the word fits in the sentence actually have two different, albeit overlapping, effects on the brain signal; it just takes fancy stats and tons of data to disentangle these effects. (preprint)
I-Hsuan Chen, Stephen Politzer-Ahles, & Chu-Ren Huang (2018). Determining the types of contrasts: the influence of prosody on pragmatic inferences. Frontiers in Psychology, 9, 2110. [full text]
If someone says, "I didn't even see one cat", that means they didn't see one cat as opposed to... what, exactly? I didn't see one cat, let alone ten cats? Or I didn't see one cat, let alone one lion? One of these interpretations may make more or less sense depending on the context, of course. But where you place the stress also matters; "I didn't even see ONE cat" encourages a different interpretation than "I didn't even see one CAT". In the series of experiments described in this paper, I-Hsuan and I demonstrated quantitatively that changing the stress does indeed change which interpretation people say they got out of these kinds of sentences.
Stephen Politzer-Ahles & Page Piccinini (2018). On visualizing phonetic data from repeated measures experiments with multiple random effects. Journal of Phonetics, 70, 56-69. [full text]
This is not an empirical study (i.e., it's not reporting new findings from some experiment). Rather, it's a primer about how to create data visualizations and what sort of considerations to take into account when deciding how to visualize data. The most important part, I think, is one of the early figures, which demonstrates that a pair of condition means with error bars on them actually provides no information about whether two conditions are significantly different or not, if the conditions were manipulated in a repeated measures design. That sounds nitpicky if you're not a statistics person. But it was (and unfortunately still is) very common practice for people to assume two conditions are not significantly different because "their error bars cross", and that conclusion is not necessarily right—as our figure in this paper shows, two conditions can have almost completely overlapping error bars but still be statistically super-significantly different. This paper was our little attempt to get people to switch to making graphs that actually illustrate the statistical comparisons they intend to make.
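To make that point concrete, here's a tiny simulation sketch (in Python, with made-up numbers; this isn't code from the paper, which makes the point with figures instead): two conditions whose conventional error bars overlap almost completely, yet which differ massively in the paired comparison, because every participant shows the same small shift.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30
baseline = rng.normal(500, 100, n)            # large differences between participants
cond_a = baseline + rng.normal(0, 5, n)       # condition A
cond_b = baseline + 10 + rng.normal(0, 5, n)  # condition B: a consistent +10 shift

# Conventional 95% confidence intervals: these overlap almost completely,
# because they are dominated by the between-participant variability.
for name, x in (("A", cond_a), ("B", cond_b)):
    halfwidth = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
    print(f"condition {name}: mean {x.mean():.1f}, 95% CI half-width {halfwidth:.1f}")

# The repeated measures (paired) test on the very same data is hugely
# significant, because the +10 shift is consistent within each participant.
t, p = stats.ttest_rel(cond_b, cond_a)
print(f"paired t({n - 1}) = {t:.2f}, p = {p:.2g}")
```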
Mante Nieuwland, Stephen Politzer-Ahles, Katrien Segaert, Emily Darley, Nina Kazanina, Sarah Von Grebmer Zu Wolfsthurn, Federica Bartolozzi, Vita Kogan, Aine Ito, Diane Mézière, Dale J. Barr, Guillaume Rousselet, Heather J. Ferguson, Simon Busch-Moreno, Xiao Fu, Jyrki Tuomainen, Eugenia Kulakova, E. Matthew Husband, David I. Donaldson, Zdenko Kohút, Shirley-Ann Rueschemeyer, Falk Huettig (2018). Large-scale replication study reveals a limit on probabilistic prediction in language comprehension. eLife, 7, e33468. [full text]
This paper represents a huge project, which was Mante Nieuwland's brainchild. He noticed that one of the classic and influential studies in neurolinguistics, on which much of our understanding of how prediction works in language is based, had actually almost never been replicated—which is not ideal, since replication is the gold standard of reliability in science. Given how important it was to replicate that study, he assembled a massive team of neurolinguists across nine different laboratories to run simultaneous high-powered replications of that study, all while conforming to the highest standards of scientific replicability (e.g., pre-registration). The publication of this study sparked a huge surge of interest in neurolinguistic studies on prediction, and brought the issue of replicability to greater attention within the neurolinguistics community. (This study often gets described as failing to replicate the study on which it is based, but that's a bit of a mischaracterization. We actually did find a small effect consistent with what they found—albeit one that we only detected in exploratory analysis. But we also found that that effect was so small that it could only be meaningfully measured/studied in truly massive studies, bigger than anyone was or is doing, and so all the research done before couldn't tell the field much about the nature of this effect.) My role, other than running one of those nine experiments, was to set up some of the experiment software (used in about half of the labs) and help with running and reporting the statistical analysis. (Old versions: preprint)
Stephen Politzer-Ahles & E. Matthew Husband (2018). Eye movement evidence for context-sensitive derivation of scalar inferences. Collabra: Psychology, 4, 3. [full text]
One of my older studies (Politzer-Ahles & Fiorentino, 2013) had shown that, while interpreting some as meaning "not all" doesn't require any extra cognitive effort, it does depend on context: in some contexts we interpret some as "not all" and in some contexts we don't. Specifically, the old study demonstrated this by showing that how we interpret some impacts how fast we recognize certain other words later on (because interpreting some as "not all" will make us expect those words more). This study dug deeper into that finding by using eye-tracking, which gives much more fine-grained measures of how people read. It allowed us to pinpoint what stage of processing is impacted by people's interpretation of some; specifically, it's the early, predictive sort of processing. That provides further evidence that people really are computing different interpretations of some on the fly, and using those to predict upcoming words. (Old versions: poster)
Kevin Schluter, Stephen Politzer-Ahles, Meera Al-Kaabi, & Diogo Almeida (2017). Laryngeal features are phonetically abstract: mismatch negativity evidence from Arabic, English, and Russian. Frontiers in Psychology - Language Sciences, 8, 746. [full text]
Across a series of brainwave experiments in English, Arabic, and Russian, we found the same pattern of asymmetrical brain responses to sound contrasts. Specifically, the brains of listeners from all three of these language groups appear to treat voiced sounds like [z] as less "marked" than voiceless sounds like [s]—in other words, [s] is more different from [z] than [z] is from [s], in terms of how the brain reacts to the contrast between the two sounds. This happened regardless of how the sounds are physically realized (e.g., the difference between voiceless [s] and voiced [z] is physically quite different than the difference between voiceless [t] and voiced [d]), and regardless of how the voiced-voiceless contrast is realized in the particular language (English is a language in which voiceless sounds are acoustically more "marked" than voiced ones, whereas Arabic is the other way around, for example). This study, therefore, provided pretty compelling evidence that the way the brain responds to these contrasts is not purely based on their acoustics (how they physically sound), but on the abstract mental organization/structure of sounds.
Jiayu Zhan, Xiaoming Jiang, Stephen Politzer-Ahles, & Xiaolin Zhou (2017). Neural correlates of fine-grained meaning distinctions: an fMRI investigation of scalar quantifiers. Human Brain Mapping, 38, 3848-3864. [full text]
This study used neuroimaging to examine what brain areas are involved both in interpreting SOME as meaning "not all", and also in interpreting SOME as meaning "not most". I only had a very small role in this study (mainly just providing some feedback on the idea and helping frame the arguments in the paper), but what I really like about it is that Jiayu managed to experimentally look at these two different interpretations of SOME. In the field of experimental pragmatics (research on how we "read between the lines" to figure out what things are intended to mean), everybody and their grandmother has been looking at SOME-as-not-all for decades, but hardly anyone did experiments on the other things that it means. (I had tried something similar some years before, but with less success than Jiayu's study here.)
Stephen Politzer-Ahles, Ming Xiang, & Diogo Almeida (2017). "Before" and "after": investigating the relationship between temporal connectives and chronological ordering using event-related potentials. PLoS ONE, 12, e0175199. [full text]
Before this, a lot of studies had shown that sentences like "Before this happened, that happened" take a bit more cognitive effort to read than sentences like "After that happened, this happened". But it was never actually clear if this difference occurs because "Before X, Y" presents the events out of order (this is the explanation that pretty much all the previous research assumes), or because before is actually special and different than after (there are actually linguistically-motivated reasons to assume this; the semantics and pragmatics of before really are kind of different). Our study disentangled these, by trying out a different sentence structure: "This happened before that" vs. "This happened after that". We found that it's always the out-of-order sentences—regardless of whether they use "before"—that trigger extra processing costs. So in a way this study confirmed what the field had always assumed, but nobody had actually demonstrated that yet. (Old versions: poster)
Stephen Politzer-Ahles (2017). An extension of within-subject confidence intervals to models with crossed random effects. The Quantitative Methods for Psychology, 13, 75-94. [full text]
This paper proposes a new way to draw error bars on graphs so that they more accurately reflect which conditions are significantly different from which. Error bars representing traditional confidence intervals don't do that for data with repeated measures (e.g., comparing different conditions within the same individuals). Solutions for that situation have been around since the '90s (although they are still almost never used). But those solutions don't address the situation of having multiple kinds of repeated measures at once. This situation is very common in many research areas now—e.g., psycholinguistic research involves comparing different conditions within the same people and the same words. The method for calculating error bars described in this paper can handle this situation.
For updated, better code, as well as an easier-to-follow worked example, see LMEM-based intervals for error bars in graphs.
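If you just want the flavor of the general idea (this is not the paper's actual method; for that, see the full text and the code linked above), here's a rough Python sketch of the classic '90s-style fix for a single repeated measure: subtract each participant's overall mean before computing the error bars, so the bars reflect only the within-participant variability that the statistical comparison actually uses. The data and numbers here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_cond = 20, 3

# Fake repeated measures data: rows are participants, columns are conditions.
data = rng.normal(500, 80, (n_subj, 1)) + rng.normal(0, 10, (n_subj, n_cond))

# Cousineau-style normalization: remove each participant's own mean and add
# back the grand mean, leaving only within-participant variability.
normalized = data - data.mean(axis=1, keepdims=True) + data.mean()

naive_se = data.std(axis=0, ddof=1) / np.sqrt(n_subj)         # conventional SEs
within_se = normalized.std(axis=0, ddof=1) / np.sqrt(n_subj)  # within-participant SEs
print("conventional SEs:      ", np.round(naive_se, 1))
print("within-participant SEs:", np.round(within_se, 1))

# (A Morey-style correction would multiply these by sqrt(n_cond / (n_cond - 1)).
# The paper's method extends the same logic to crossed random effects
# (e.g., participants AND items at once) using mixed-effects models.)
```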
Stephen Politzer-Ahles, Kevin Schluter, Kefei Wu, & Diogo Almeida (2016). Asymmetries in the perception of Mandarin tones: evidence from mismatch negativity. Journal of Experimental Psychology: Human Perception and Performance, 42, 1547-1570. [full text]
Across three experiments, each comparing Mandarin-speaking vs. non-Mandarin-speaking volunteers, we found evidence that Mandarin "Tone 3" is processed differently in the brain than the other three tones in the language. More importantly, in the last experiment we found that some part of this difference is not due to physical aspects of how it sounds, but due to Mandarin speakers' abstract linguistic knowledge of the phonological properties of this tone.
This was one of my favourite experiments to have ever been a part of doing, and one that I am the most proud of. One reason is because it was really a process of pursuing scientific discovery. The whole experiment came from a "failed" experiment (which I later followed up in Politzer-Ahles & Im, 2020), and we just kept doing experiment after experiment to figure out what had gone wrong in that "failed" experiment, ultimately leading to these new, important, and totally unexpected discoveries. Another fun thing about it is that Luck (2014:137), essentially the brainwave bible, describes a hypothetical experiment like this (specifically, one testing the same manipulation on one group of participants who knows the language and one who does not) as being the best kind of experiment design but one that is extremely "difficult and time-consuming". I hadn't been aware of that tidbit until some years after doing the experiment, but knowing that we unwittingly did an experiment that even Steve Luck would approve of makes me feel warm inside. (Old versions: poster 1, poster 2, slides, proceedings paper)
Kevin Schluter, Stephen Politzer-Ahles, & Diogo Almeida (2016). No place for /h/: ERP investigation of English fricative place features. Language, Cognition and Neuroscience, 31, 728-740. [full text]
The first in a series of papers that Kevin, Diogo and I did on MMN asymmetries. This topic is ridiculously complicated but the short version is, the brain seems to register certain sound contrasts as being more "different" if you hear one then the other, but less different if you hear them the other way around. In this case, it was that [f]-[s] gets registered as a bigger "difference" than [s]-[f], but [h]-[s] vs. [s]-[h] show no such asymmetry. The reason this weird thing is surprisingly important is because it actually provides brain-level evidence for a particular theory of how the features of sounds are abstractly represented in the mind. (Old versions: poster)
Robert Fiorentino, Stephen Politzer-Ahles, Natalie Pak, María Teresa Martínez-García, & Caitlin Coughlin (2015). Dissociating morphological and form priming with novel complex word primes: evidence from masked priming, overt priming, and event-related potentials. The Mental Lexicon, 10, 413-434. [full text]
The experiments in this paper examined how people break down made-up compound words (like drugrack—not a real word, but made by combining two real words) as opposed to made-up words that aren't even compounds (like slegrack). For reasons that the paper gets into, these are a really good testing ground for the longstanding question of whether and how the mind breaks words down into their parts when we read or hear them. The experiment provided new (at the time) evidence that our minds really do attempt to break words down into all their possible parts when we see them.
Stephen Politzer-Ahles & Laura Gwilliams (2015). Involvement of prefrontal cortex in scalar implicatures: evidence from magnetoencephalography. Language, Cognition and Neuroscience, 30, 853-866. [full text]
The experiment described in this paper demonstrated that a certain portion of lateral prefrontal cortex is more active when it's hard to interpret some as meaning "not all" (i.e., when there is little contextual support for it) than when it's easy to. This was actually quite striking, because traditional theories of this sort of meaning processing had assumed that if there is any computational effort involved in realizing that "not all" meaning then it would be all-or-nothing (i.e., when you interpret some as meaning "not all" you work hard for it, and when you don't you don't). This study showed instead that that same computation can be hard or easy depending on how much contextual support there is. That finding is most consistent with a theory of processing that Judith Degen had recently proposed (which we discuss in more detail in the paper). Another thing I find important about this paper is a few paragraphs in the introduction which talk about experimental issues and confounds that need to be ruled out before you can conclude that an experiment really shows that implicatures (e.g., interpretations of some as meaning "not all") are hard to derive. Unfortunately, many (but not all!) studies since then have still made such claims without addressing those confounds. (Old versions: poster)
Stephen Politzer-Ahles & Jie Zhang (in press). Evidence for the role of tone sandhi in Mandarin speech production. Journal of Chinese Linguistics Monograph Series. [full text]
The experiments in this paper used a funky 1970s psychological paradigm called "implicit priming" to demonstrate that, when people go to articulate a word, they use mental representations of not only the word's underlying form (how it's actually stored in the mind) but also its surface form (how it's pronounced in a certain context). This shows that pronunciation-changes-in-context aren't just something that happens in the mouth, but something that actually is programmed for in the mind before the initiation of articulation.
But to be honest, the most interesting thing about this paper is not its content, but its publication history. I presented this study at a conference in 2012, and the conference attendees were invited to later write up papers for a journal special issue. I did that, and after a few rounds of revision it was accepted for publication in 2014. Since then it seems like the editor was unable to wrangle revisions from all the other authors, one thing led to another, and the special issue never got published. So this paper has been "accepted for publication" since 2014 (so, for 11 years, as of my writing this in 2025) and probably will never actually be published. Several of my colleagues with other papers accepted for this issue withdrew them and published them elsewhere, but I never did with this one; since it's been cited a little, and is findable here and on other resources like the Internet Archive, hopefully it's "out there" enough to have done some good for the field, but I'm not really prepared to do any more with it. (Old versions: proceedings paper)
Stephen Politzer-Ahles & Robert Fiorentino (2013). The realization of scalar inferences: context sensitivity without processing cost. PLoS ONE, 8, e63943. [full text]
The experiment in this paper used measurements of people's reading times to demonstrate that understanding some as meaning "not all" doesn't actually take extra time or effort, compared to understanding it as meaning "more than zero". This actually goes against a lot of previous research, which had argued that the "not all" meaning takes extra cognitive effort to realize, but which we argue had un-addressed confounds. The finding of this paper unfortunately remains pretty ignored; lots of current research still repeats the claim that pragmatic meanings (like the interpretation of some as meaning "not all") take extra cognitive effort, as if this claim is a done deal, even though this and several other studies since have contradicted that. From time to time (fortunately very rarely) this paper even gets cited together with a pile of other papers at the end of a sentence saying scalar implicatures are cognitively effortful, which is the opposite of what this paper claims. (Old versions: poster)
Lamar Hunt III, Stephen Politzer-Ahles, Linzi Gibson, Utako Minai, & Robert Fiorentino (2013). Pragmatic inferences modulate N400 during sentence comprehension: evidence from picture-sentence verification. Neuroscience Letters, 534, 246-251. [full text]
The experiment in this paper used EEG to demonstrate that, when people are in the process of reading a sentence, their predictions about upcoming words aren't just driven by sentence structure and literal meaning, but are also driven by the implied meaning(s) of the unfolding sentence. Previous studies had been argued to have shown that, but they all had various confounds which made it impossible to tell if they were really showing that or showing something else. Lamar's study here controlled that issue better than any study had before, by presenting sentences in well-controlled picture contexts—a method which has since been used in lots of other research building off this technique to investigate even more nuanced and fine-grained kinds of distinctions.
Hyunjung Lee, Stephen Politzer-Ahles, & Allard Jongman (2013). Speakers of tonal and non-tonal Korean dialects use different cue weightings in the perception of the three-way laryngeal stop contrast. Journal of Phonetics, 41, 117-132. [full text]
The experiment in this paper demonstrated that, when Korean speakers from Seoul and Kyungsang perceive the differences between Korean stop consonants like ㄱ/ㅋ/ㄲ, they rely on different cues to different extents—Seoul listeners rely on both aspiration and fundamental frequency in a complex trading relationship, whereas Kyungsang listeners rely less on fundamental frequency. It's actually a really important and impactful study, and the role I had in it was very small and is the weakest part of the study. (Like honeydew or Jared Leto, I was the worst part of this thing that I was in.) Specifically, I helped with the statistics, and this was back before I knew how to do multilevel models (a.k.a. mixed-effects models), so I used a 1990s analysis. We managed to get it published, but a reviewer called it a "poor man's multilevel model", and that comment is what drove me to learn how to do multilevel models. (Old versions: poster)
Stephen Politzer-Ahles, Robert Fiorentino, Xiaoming Jiang, & Xiaolin Zhou (2013). Distinct neural correlates for pragmatic and semantic meaning processing: an event-related potential investigation of scalar implicature processing using picture-sentence verification. Brain Research, 1490, 134-152. [full text]
The experiments in this paper demonstrated that sentences which imply something false elicit different patterns of brain activity than sentences which literally mean something false. Note: since this came out, it has frequently been cited as evidence that the brain processes pragmatics (i.e., what things imply) differently from semantics (i.e., what things literally mean), and I kind of claimed that in the paper itself at the time. But I no longer believe this paper really supports that claim; see Politzer-Ahles (2020) for my own takedown of this paper. (Old versions: poster 1, poster 2, poster 3, master's thesis)
Chapters
Stephen Politzer-Ahles, Julie S. Chen, & I-Hsuan Chen (2024). Is self-paced listening sensitive to downstream consequences of focus? In Ivanova, O., Nandi, A., & Prasannanshu [eds.], Psycholinguistic Approaches to the Study of Linguistic Structures: Language in the Mind, Cambridge Scholars. [full text]
In an earlier study, I-Hsuan and I had shown that stress influences what people infer about sentences with "even". For example, "I didn't even see ONE cat" suggests something like "...let alone ten cats!", whereas "I didn't even see one CAT" suggests something like "...let alone one dog". In this study, we used a technique called self-paced listening to examine how quickly people understand words as they are in the process of listening to sentences. We had expected that people might be slow to understand continuations that don't fit the stress (e.g., "I didn't even see ONE cat, let alone one dog"). Unexpectedly, though, we found that people didn't slow down in this situation. The results suggested that the impact of stress on the interpretation of "even" sentences might not be enough to influence the predictions people are making in real-time as the sentence is unfolding—or, alternatively, that the self-paced listening method is simply not sensitive to this sort of thing.
Stephen Politzer-Ahles & Si Chen (2019). Significance. The SAGE Encyclopedia of Human Communication Sciences and Disorders. [full text]
This is a [very!] brief introduction to the concept of statistical significance and how it is used in language sciences.
Letters/commentaries/errata
Stephen Politzer-Ahles, Seán Roberts, Christine Cuskley, & Tessa Verhoef (2019). Errata for Roberts & Verhoef (2016). Journal of Language Evolution, 4, 140-141. [full text]
This is a minor correction to a very interesting and important paper by Roberts and colleagues. They had examined abstracts submitted to a conference on language evolution, and had found that (1) abstracts written by women in previous years had gotten lower evaluations, possibly because of gender bias; and (2) when they switched the abstract reviewing to a double-blind format, this bias went away. While this finding has major implications for the field, one [out of many] of their statistical analyses had an error. We worked together to put out this erratum correcting that statistical error.
Stephen Politzer-Ahles, Jeffrey J. Holliday, Teresa Girolamo, Maria Spychalska, & Kelly Harper Berkson (2016). Is linguistic injustice a myth? A response to Hyland (2016). Journal of Second Language Writing, 34, 3-8. [full text]
Probably the most impactful thing I've ever written—or at least the only one to kick off a major debate—and it's not even a proper paper, it's just a "letter"! Anyway, the context for this is that Ken Hyland published a paper (also in 2016) talking about the concept of "linguistic injustice", i.e., the idea that scholars whose first language is not English are unfairly disadvantaged in scholarly publishing. Hyland argued that this idea is a myth which lacks actual evidence. We wrote this paper in response, arguing that linguistic injustice may be real—more specifically, Hyland had raised some arguments claiming that people who learned English as a first language have it just as hard as people who didn't, and we argued that those arguments were incorrect. Responses to this response ensued, and ultimately spurred a whole cottage industry of papers attempting to resolve this debate which Hyland's paper and ours sort of created. (Or, to be more specific: inklings of that debate were already present in the literature, but I think this pair of papers is what turned the debate into a "thing".)
Manuscripts
Stephen Politzer-Ahles, Katrina Connell, Lei Pan, & Yu-Yin Hsu (under revision). Mandarin third tone sandhi may be incompletely neutralizing in perception as well as production. [full text]
In Mandarin, Tone 3 syllables are sometimes pronounced as Tone 2. A lot of previous research has suggested that people listening to Mandarin speech cannot hear the difference between a "real" Tone 2 vs. a Tone 2 derived from what was originally Tone 3. But that research was all based on explicit judgments (e.g., playing a sound to someone and asking "which tone was that?"). In this study, we tested whether people can hear the difference between these tones using a less explicit / more unconscious method, visual world eye-tracking (in which people hear a sound while looking at a screen, and we measure where their eyes look). We found a very slight sensitivity to the difference: people looked a tiny bit more at the word corresponding to the tone they actually heard (e.g., when hearing a Tone 2 derived from a Tone 3, they looked a bit more at the word that matches that than at the word with a "real" Tone 2 in it). One of the reasons this isn't published yet is that we need to do a follow-up/replication experiment, and just have not managed to put together the resources to do it yet.
Stephen Politzer-Ahles, Wing Ki Ng, & Li Chong Shih (under revision). No significant loanword priming advantage in Cantonese-English bilinguals. Lingua Sinica. [full text]
Previous research had found that the translation priming effect (e.g., people who know both English and Spanish can respond to the word "CAT" faster if they have just seen its Spanish translation "gato", as opposed to some unrelated word) is bigger when the translations used are cognates or loanwords, which are pronounced similarly across the two languages (like bicicleta-BICYCLE, as opposed to, e.g., araña-SPIDER). We tested this pattern in Cantonese-English bilinguals, which is different from previous research because Cantonese uses a logographic writing system (in which characters represent words or meaningful parts of words), whereas previous research had been on pairs of languages with sound-based writing systems (in which characters represent sounds). We found, unlike previous research, no difference in the size of the priming effect between loanword pairs and non-loanword pairs. One of the reasons this isn't published yet is that it needs a follow-up/replication experiment, and I haven't managed to do one yet; Kate and Rachel, my co-authors and the ones who did the lion's share of the leg work for this project, graduated and started working, and I haven't yet managed to sucker someone new into working on this.
Selected working papers and proceedings papers
Candice Chi-Hang Cheung, Stephen Politzer-Ahles, Heeju Hwang, Ronald Lung Yat Chui, Man Tak Leung, & Tempo Po Yi Tang. (2017). Comprehension of presuppositions in school-age Cantonese-speaking children with and without autism spectrum disorders. Clinical Linguistics and Phonetics, 31 (conference proceedings special issue), 557-572. [full text]
This paper is about language use in children with autism. My approach to this sort of research has changed a lot in the time since this research was done. First of all, this paper takes a deficit sort of approach to talking about autism, which I now know is not correct or helpful. It also was done without collaboration with members of the community. I don't do research this way anymore.
This paper compared children with and without autism in terms of how they understand certain types of meanings in language (specifically, those known as "presuppositions" to semanticists and pragmaticists). Previous research had suggested that children with autism don't understand presuppositions as often as children without autism do, but it wasn't clear whether this is specifically because they have autism or an epiphenomenon of other differences between kids with and without autism. The present paper used a statistical analysis that allowed us to compare kids with and without autism while controlling for some of the many other factors that differ between these groups, and showed that the two groups understand presuppositions differently even when those other factors are controlled for.
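The kind of analysis described here can be sketched as a regression with covariates. The following is only an illustration under assumed variable names (the paper's actual model and covariates differ): a logistic regression predicting whether a presupposition item was understood correctly, with group membership and other factors as predictors:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("comprehension.csv")  # hypothetical trial-level data
# "correct" is 0/1; "group", "age", and "vocabulary" are illustrative predictors.
model = smf.logit("correct ~ group + age + vocabulary", data=df).fit()
print(model.summary())

The point of this setup is that a reliable coefficient for group, with the covariates in the model, supports a group difference over and above those other factors.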
Stephen Politzer-Ahles (2015). "Maybe" not all scalar implicatures are created equal. LSA Extended Abstracts. [full text]
The field of experimental pragmatics is largely built upon research examining how people interpret "some" as meaning "not all"; it's, like, the most disproportionately overstudied thing (not that I haven't contributed to that). Fortunately, in the past 15ish years there has been a real proliferation of excellent new research looking at pragmatic phenomena beyond just the <all, some> scale. Around the same time as the excellent research was starting to happen, this experiment also happened (nice implicature, right?). This was my attempt to break free from <all, some> world, in a very tiny and incremental way: by looking at <definitely, maybe>.
Specifically, when people say that something might happen, they implicate that it's not certain to happen. In this study I recorded people's brain activity while they saw scenarios where something was definitely going to happen but someone said it maybe will happen. For example, a pitcher with just a teeny tiny drop of soda in it, next to a small cup, and a caption saying "Will all the soda in this pitcher fit into the cup?"; and then a pragmatically wrong answer, "maybe", pops up on the screen. (This is a pragmatically inappropriate answer because the soda will definitely fit in the cup.) This was intended to be a design similar to the one in my 2013 experiment on some with Rob Fiorentino and Xiaoming Jiang. I found some slim evidence of a time window in which brain activity elicited by the pragmatically inappropriate word was different from that elicited by logically wrong words, but it was statistically pretty inconclusive. (Old versions: slides)
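For concreteness, the usual way such a time-window comparison works can be sketched like this (hypothetical arrays and window, not the study's actual pipeline): average each participant's ERP voltage over the window in each condition, then test the difference across participants:

import numpy as np
from scipy import stats

# Hypothetical participants-by-timepoints arrays of condition-average ERPs,
# sampled at 500 Hz, with the critical word onset at sample 0.
pragmatic = np.load("erp_pragmatic.npy")  # "maybe" when "definitely" is true
logical = np.load("erp_logical.npy")      # a flat-out wrong answer

fs = 500
start, stop = int(0.3 * fs), int(0.5 * fs)  # e.g., a 300-500 ms window

diff = pragmatic[:, start:stop].mean(axis=1) - logical[:, start:stop].mean(axis=1)
print(stats.ttest_1samp(diff, 0.0))

In our data, tests along these lines did not come out convincingly, hence "statistically pretty inconclusive".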
Stephen Politzer-Ahles (2012). Are intermediate levels of the scale used during online comprehension of scalar implicatures? Kansas Working Papers in Linguistics, 33, 1-15. [full text]
Same backstory as the "maybe" paper just above: experimental pragmatics is disproportionately built on "some"-meaning-"not all", and this was another tiny, incremental attempt of mine to break free from <all, some> world, this time by looking at <most, some>.
Specifically, when people say some, they don't just often mean "not all"; they also often mean "not most" (i.e., less than half). For example, if you think most of the students this year are excellent, you wouldn't normally say "Some of the students this year are excellent". So in this li'l study, I tried to find experimental evidence that people's processing is driven by this implicature. I ended up not finding it—I found no evidence that people respond faster or more accurately when some refers to a less-than-half situation than when it refers to a more-than-half situation.
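In standard neo-Gricean notation (my gloss, not a formalism from the paper), the prediction being tested is that asserting the weaker term of a scale implicates the negation of each stronger alternative:

\[ \text{some}(A,B) \rightsquigarrow \neg\,\text{most}(A,B) \wedge \neg\,\text{all}(A,B) \]

so that "Some of the students are excellent" conveys that fewer than half of them are.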
I revisited this issue in a now file-drawered experiment using self-paced reading, and similarly found no difference. But later Jiayu Zhan tested something similar in Mandarin (in Zhan et al. 2017, which I discuss some more up above) and did find the difference we would have expected. So I really don't know what's going on any more. Science is hard, guys!
Stephen Politzer-Ahles & Jie Zhang (2012b). The role of phonological alternation in speech production: Evidence from Mandarin tone sandhi. Proceedings of Meetings on Acoustics, 18, 060001. [full text]
This is a slightly methodologically different twist on Politzer-Ahles & Zhang (in press), which I talked about a bit more above. Same phenomenon, same type of experiment, pretty much the same conclusions, just a slightly different set of conditions and comparisons.
Stephen Politzer-Ahles (2011). A Minimalist account of Uyghur genitives. Kansas Working Papers in Linguistics, 32, 106-119. [full text]
This is a lovely paper from my past life as a syntactician! I can barely even understand it anymore!
What I really remember about this work on Uyghur genitives (possessives) is less the syntax side than the semantics/pragmatics side that I thought about a little later (but never wrote up for anything bigger than a class project). Here's what's interesting about them. Most of the time in most languages, possessive structures like "Kasim's kids" presuppose that the thing they are talking about exists: in other words, if I say "Kasim's kids such-and-suched" then I am presupposing that I believe Kasim has kids. So it would sound quite weird to say "Kasim's kids such-and-such" in a situation where I don't think such kids exist (i.e., where I don't think Kasim has kids). In Uyghur, on the other hand, "Kasim's kids don't exist" is precisely how you could say Kasim doesn't have kids. Uyghur doesn't really have any verb that you could translate as English "have". Rather, the way you say someone has or doesn't have something is to say that the thing exists or doesn't exist. So to say Kasim has kids or Kasim has a pencil, you literally say "Kasim's kids exist" or "At Kasim there exists a pencil". And on the flip side, to say Kasim doesn't have kids or Kasim doesn't have a pencil, you say "Kasim's kids don't exist" or "At Kasim there does not exist any pencil".

I find the "Kasim's kids don't exist" sort of sentence an interesting challenge to traditional semantic theories of presupposition, which propose that definite descriptions like "Kasim's kids" automatically trigger presuppositions; it seems they don't automatically trigger them in this case! (Although, to be fair, it would be a bit like bullying, or beating a dead horse, for me to use this example to beat up on traditional semantic theories of presupposition, because those theories have already taken a pretty serious beating over the last 50 years.)
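To make the challenge explicit, here is the textbook-style formulation (my notation, roughly in the Frege/Strawson tradition, not anything from the paper): a definite description is taken to denote only if its existence presupposition holds,

\[ [\![\text{Kasim's kids}]\!] \text{ is defined only if } \exists x.\,\text{kid-of}(\text{Kasim})(x) \]

whereas the Uyghur sentence felicitously asserts \( \neg\exists x.\,\text{kid-of}(\text{Kasim})(x) \), i.e., it denies the very proposition the definite description is supposed to presuppose.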
Dissertation
Stephen Politzer-Ahles (2013). Psycholinguistic and neurolinguistic investigations of scalar implicature. Doctoral dissertation, University of Kansas.