Language & Literacy


Sharing a fun paper to geek out on with my fellow language nerds: “How children learn to communicate discriminatively” by Michael Ramscar. In it, the author argues that the contrasting forces of “discriminability” and “regularity” both work to make language something we pick up pretty much naturally, even if we don’t know all the words in the language.

“…the existence of regular and irregular forms represents a trade-off that balances the opposing communicative pressures of discriminability and learnability in the evolution of communicative codes. From this perspective, the existence of frequent, well-discriminated irregular forms serves to make important communicative contrasts more discriminable and thus also more learnable. By contrast, because regularity entails less discriminability, learners’ representations of lexico-morphological neighbourhoods will tend to be more generic, which causes the forms of large numbers of less frequent items to be learned implicitly, compensating for the incompleteness of individual experience.”

The language of this paper is, as you can see, a bit opaque, so some of it went over my head, but I found the arguments fascinating given the debates about how to teach the “irregular” spellings of so many English words. Here, the author seems to suggest (I may be over-extrapolating, as I often tend to do, but this is what got me geeking out) that there is a constructive tension between the language forms that show up again and again and the forms that are less frequent but, for that very reason, grab more of our attention. This connects to the theory of “statistical learning,” through which we not only pick up a language but also map it to its written form.

The author later provides what I thought was a very concrete thought experiment demonstrating this principle when he moves from morphology to names:

Imagine that 33% of males are called John, and only 1% Cornelius. In this scenario, learning someone is named Cornelius is more informative than learning their name is John (Corneliuses are better discriminated by their names than Johns). On the other hand, Johns will be easier to remember (guessing ‘John’ will be correct 1/3 of the time). Further, although the memory advantage of John relies on its frequency, the memorability of Cornelius also benefits from this: Cornelius is easier to remember if the system contains fewer names (also, as discussed earlier, if John is easier to say than Cornelius, this will reduce the average effort of name articulation).
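To make the information-theory bit concrete for myself, I worked out the numbers with the standard surprisal formula, -log2(p). This little sketch is mine, not the paper’s, and the 33%/1% frequencies are just the thought experiment’s made-up values:

```python
import math

# Hypothetical name frequencies from the paper's thought experiment.
p_john = 0.33       # 33% of males are called John
p_cornelius = 0.01  # 1% are called Cornelius

# Surprisal (self-information) in bits: how much you learn from hearing the name.
surprisal_john = -math.log2(p_john)            # ~1.6 bits
surprisal_cornelius = -math.log2(p_cornelius)  # ~6.6 bits

print(f"'John' tells you      {surprisal_john:.2f} bits")
print(f"'Cornelius' tells you {surprisal_cornelius:.2f} bits")

# The memorability side of the trade-off: blindly guessing 'John'
# is right about a third of the time; guessing 'Cornelius' almost never.
print(f"Guessing 'John':      right {p_john:.0%} of the time")
print(f"Guessing 'Cornelius': right {p_cornelius:.0%} of the time")
```

So Cornelius carries roughly four times as many bits as John, but you’d almost never guess it cold. That’s the discriminability/learnability trade-off in miniature.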

What is also interesting about the way the paper connects information theory to language learning is that its assertions are empirically testable:

“Whether these mathematical points about sampling and learning actually apply to human learners are empirical questions. This account makes clear predictions in regard to them: if learners are exposed to sets of geometrically distributed forms, they should acquire models of their probabilities that better approximate one another than when learning from other distributions. Conversely, if learning from geometric distributions does not produce convergence, it would suggest the probabilistic account of communication described here (indeed, any probabilistic account of communication) is false.”
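Out of curiosity, I sketched what a toy version of that test might look like. This simulation is entirely mine, not the paper’s, and the vocabulary size, sample size, and the total-variation measure of agreement are all just assumptions: simulated learners each see a finite sample of forms, estimate the forms’ probabilities from their own sample, and we check how closely the learners’ estimates approximate one another when the forms are geometrically distributed versus uniformly distributed.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_estimates(true_probs, n_learners=20, n_samples=200):
    """Each simulated learner sees n_samples forms drawn from true_probs
    and estimates the forms' probabilities from their own sample."""
    k = len(true_probs)
    estimates = []
    for _ in range(n_learners):
        sample = rng.choice(k, size=n_samples, p=true_probs)
        counts = np.bincount(sample, minlength=k)
        estimates.append(counts / n_samples)
    return np.array(estimates)

def mean_pairwise_tv(estimates):
    """Average total-variation distance between every pair of learners'
    estimated distributions: lower = the learners agree more closely."""
    n = len(estimates)
    dists = [0.5 * np.abs(estimates[i] - estimates[j]).sum()
             for i in range(n) for j in range(i + 1, n)]
    return np.mean(dists)

K = 50  # hypothetical vocabulary of 50 competing forms

# Truncated geometric distribution over the K forms, vs. a uniform baseline.
geom = 0.5 ** np.arange(1, K + 1)
geom /= geom.sum()
uniform = np.full(K, 1.0 / K)

print("geometric :", mean_pairwise_tv(empirical_estimates(geom)))
print("uniform   :", mean_pairwise_tv(empirical_estimates(uniform)))
```

Lower numbers mean the simulated learners’ probability models sit closer to one another, which is the kind of convergence the paper’s prediction is about; obviously a toy like this says nothing about human learners.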

There’s a lot more in the paper to nerd out on. I found the section on verbs especially interesting, for example, given that it connects to some other tidbits on the power and challenge of verbs I’ve come across before.

I’ll leave the rest to you!

#verbs #regularity #irregularity #learning #language #statisticallearning #probability #discriminability #informationtheory #form