In everyday language, you can sometimes anticipate a word so clearly that you can almost hear it before it has been said – for instance, “it’s raining cats and (…)”. In such cases, your mind seems to automatically fill in the blank. Intuitively, these mental predictions might seem important to help us understand language. But, in the language sciences, the role of prediction in language has always been controversial. To see why, we first need to understand a peculiar property of language itself.

Every language consists of two ingredients: words and rules. And it’s this combination that makes languages both highly predictable and highly unpredictable. A rather strange situation, which we could call the paradox of prediction.

Roughly, the paradox starts as follows. On the one hand, languages are predictable because they follow regularities. Such regularities can be described with statistics. This means that you could, in principle, describe an entire language – including all its subtleties – as a giant pile of probabilities. These probabilities can be estimated from how often words occur together. Simply by counting, you can for instance conclude that “she walks” is a lot more common – and therefore more likely – than “she walk” or “she trombone”.
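
To make the counting idea concrete, here is a minimal sketch in Python. The five-sentence corpus is invented purely for illustration, and the helper name `next_word_probability` is my own. A real estimate would need vastly more text.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [
    "she walks to work",
    "she walks home",
    "she walks the dog",
    "she plays the trombone",
    "they walk home",
]

bigram_counts = Counter()   # how often each word pair occurs
context_counts = Counter()  # how often each word is followed by anything
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[(prev, nxt)] += 1
        context_counts[prev] += 1


def next_word_probability(prev, nxt):
    """Relative frequency of `nxt` among all words seen directly after `prev`."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, nxt)] / context_counts[prev]


for candidate in ["walks", "walk", "trombone"]:
    p = next_word_probability("she", candidate)
    print(f"P({candidate!r} after 'she') = {p:.2f}")
```

In this naive form, any word pair that never occurs in the corpus gets a probability of exactly zero, however plausible or implausible it may be.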

Motivated by this insight, scientists in the 1950s tried to capture language in terms of statistics. The linguist Zelig Harris, for instance, outlined the connections between grammar and probability. Around the same time, the mathematician Claude Shannon estimated the information density of English, based on the predictability of words.

The linguist Noam Chomsky formulated an influential criticism of this research programme. Chomsky argued that, precisely due to the regular combination of words by rules, languages are fundamentally unpredictable. So unpredictable, even, that concepts like probability are effectively useless for understanding language.

Chomsky illustrated his point with the following (now-famous) pair of sentences:

  1. Colourless green ideas sleep furiously
  2. Furiously sleep ideas green colourless

“It is fair to assume,” Chomsky wrote, “that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse.” As such, he argued, any statistical model of language would deem both equally improbable. And yet, we immediately see that only sentence (1) is grammatical, and that only for sentence (1) can we derive a meaning, no matter how nonsensical it is. Chomsky therefore concluded that understanding language can never be a matter of mere probability.

Now this point is not limited to Chomsky’s rather bizarre example sentences. Even in everyday language, we constantly utter (and perfectly understand) sentences that no one in history has ever heard before: world-firsts, in other words, again and again. In the early 19th century, Wilhelm von Humboldt had already concluded that “Language makes infinite use of finite means”. It’s this infinity that makes natural language so creative – and so deeply unpredictable.

Chomsky’s argument had a profound influence on the field of linguistics. For a long time, any statistical approach to language (and especially to formal grammar) was met with suspicion. In psycholinguistics, meanwhile, the idea took hold that the role of prediction in language comprehension would be marginal at most. After all, why would the brain try to predict something that’s fundamentally unpredictable?

Over the last few decades, this idea has been turned on its head. With the development of language technologies like automatic speech recognisers in the 1990s, statistical language models turned out to be much more useful than you would expect based on Chomsky’s conjecture. A key insight was the importance of relative probability, especially for sentences or phrases which seem highly improbable (or unique) at first glance.

Consider Chomsky’s sentences. As a matter of statistical fact, after a noun (like ‘ideas’) a verb (like ‘sleep’) is much more likely than an adjective (like ‘green’). As such, although sentence (1) is unlikely, it is still many orders of magnitude more likely than sentence (2). Modern probabilistic models of language can discover such ‘deeper’ regularities purely based on natural language statistics – and without ever having to be taught explicitly about abstract syntactic categories like ‘nouns’ or ‘verbs’.
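
The comparison can be illustrated with a small sketch. Note the simplifying assumptions: the training data is a handful of invented sentences, reduced by hand to their word categories, and the `lexicon` that maps Chomsky’s words to categories is also supplied by hand. Modern models are not given such labels; the sketch only shows how counting category transitions, with add-one smoothing so that unseen transitions are rare rather than impossible, already separates the two sentences.

```python
import math
from collections import Counter

# Hypothetical, hand-tagged toy data: each training sentence is reduced to its
# sequence of word categories. Real models are not given these labels.
tagged_corpus = [
    ["ADJ", "NOUN", "VERB", "ADV"],          # e.g. "old dogs sleep soundly"
    ["ADJ", "ADJ", "NOUN", "VERB"],          # e.g. "big red buses arrive"
    ["NOUN", "VERB", "ADV"],                 # e.g. "children laugh loudly"
    ["ADJ", "NOUN", "VERB", "ADJ", "NOUN"],  # e.g. "young children like sweet apples"
]

# Hand-written category for each word in Chomsky's sentences (an assumption).
lexicon = {
    "colourless": "ADJ",
    "green": "ADJ",
    "ideas": "NOUN",
    "sleep": "VERB",
    "furiously": "ADV",
}

START, END = "<s>", "</s>"
OUTCOMES = ["ADJ", "NOUN", "VERB", "ADV", END]  # everything that can come next

transition_counts = Counter()
context_counts = Counter()
for tags in tagged_corpus:
    padded = [START] + tags + [END]
    for prev, nxt in zip(padded, padded[1:]):
        transition_counts[(prev, nxt)] += 1
        context_counts[prev] += 1


def sentence_log_prob(sentence):
    """Log-probability of a sentence's category sequence, add-one smoothed."""
    tags = [lexicon[word] for word in sentence.lower().split()]
    padded = [START] + tags + [END]
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        p = (transition_counts[(prev, nxt)] + 1) / (context_counts[prev] + len(OUTCOMES))
        total += math.log(p)
    return total


s1 = sentence_log_prob("Colourless green ideas sleep furiously")
s2 = sentence_log_prob("Furiously sleep ideas green colourless")
print(f"log P(sentence 1) = {s1:.2f}")
print(f"log P(sentence 2) = {s2:.2f}")
print("Sentence (1) is more probable:", s1 > s2)
```

Neither word sequence occurs in the toy data, yet the model does not treat them as equally improbable: sentence (1) follows familiar category transitions, while sentence (2) relies almost entirely on transitions that were never observed.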

Technologies like Google Translate or Apple’s Siri all work probabilistically and are therefore fundamentally predictive. This means, for instance, that a speech recogniser identifies a word not just using the sound-waves detected by the microphone, but also based on its own prediction of what word could come next, informed by prior context.

More and more evidence suggests that our brain employs a similar ‘predictive processing’ strategy. Using measures of brain activity, for instance, psycholinguists have long demonstrated that the brain responds very differently (and much more vigorously) to a word when it is unexpected based on context. Psycholinguists have also shown that, during normal reading, the length of time we spend looking at each word seems closely linked to a word’s probability. It seems, in other words, as if the brain weighs each word against its prior expectations.
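
The speech-recogniser idea described above, combining what was heard with what was expected, can be sketched with Bayes’ rule. This is not how Siri or any particular product is implemented; the function name `recognise` and all the numbers are invented for illustration.

```python
def recognise(acoustic_likelihood, context_prior):
    """Combine P(sound | word) with P(word | context) via Bayes' rule.

    Returns a normalised posterior probability for each candidate word.
    """
    unnormalised = {
        word: acoustic_likelihood[word] * context_prior.get(word, 0.0)
        for word in acoustic_likelihood
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("no candidate is compatible with both sound and context")
    return {word: score / total for word, score in unnormalised.items()}


# Invented numbers: after "it's raining cats and ...", noisy audio on its own
# slightly favours "docs", but the context strongly predicts "dogs".
acoustic_likelihood = {"dogs": 0.40, "docs": 0.45, "logs": 0.15}
context_prior = {"dogs": 0.90, "docs": 0.02, "logs": 0.08}

posterior = recognise(acoustic_likelihood, context_prior)
best_guess = max(posterior, key=posterior.get)
print("Best guess from sound alone:", max(acoustic_likelihood, key=acoustic_likelihood.get))
print("Best guess with context:", best_guess)
```

With these made-up values, the sound alone points to the wrong word, and the prediction from context tips the balance towards the right one.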

Today, the question is therefore not if but how the brain makes linguistic predictions. Is our brain a bit like Siri and does it probabilistically predict every upcoming word? Or do we only predict the most predictable words? And when we predict, do we predict entire words – including their expected sound and intonation – or do we only predict a much more abstract syntactic or semantic category – like ‘NOUN’ or ‘ANIMAL’?

By better understanding the role of prediction in language understanding, we ultimately hope to better understand how the mind turns linguistic stimuli – vibrations in the ear or patterns of light on the retina – into meaningful abstractions. This could be useful when helping people whose language processing has gone awry due to brain injury, or when trying to equip computers with a better grasp of natural language.

It’s these kinds of questions that I’m pursuing in my PhD. I study linguistic prediction in the most general sense: from the prediction of individual letters when recognising a written word to the prediction of speech sounds and syntax when listening to a story. As soon as the first results are published you will be able to read about them on this blog.