Language & Literacy


Natural digital

Regularity and irregularity. Decodable and tricky words. Learnability and surprisal. Predictability and randomness. Low entropy and high entropy.

Why do such tensions exist in human language? And in AI tools developed both to write code and to use natural language, how can the precision required for computation coexist with the necessary complexity and messiness of human language?


A statistical tapestry

“. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – that meaning can emerge from pure form – is undoubtedly one of the most stimulating ideas of our time.”

The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory

In our last post, we began exploring what Large Language Models (LLMs) and their uncanny abilities might tell us about language itself. I posited that the power of LLMs stems from the statistical nature of language.

But what is that statistical nature of language?


“Semantic gradients” are a tool teachers use to broaden and deepen students' understanding of related words by plotting them in relation to one another. They often begin with antonyms at each end of the continuum. Here are two basic examples:

Semantic gradient examples

Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Now imagine adding more axes to this graph, so that words are plotted in relation to one another in a three-dimensional space. Then add another dimension, and another . . . heck, make it tens of thousands more dimensions, relating all the words available in your lexicon across a high-dimensional space. . .

. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).
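To make that picture a little more concrete, here is a minimal sketch of words plotted as points in a space and compared by the angle between them (cosine similarity). The coordinates in `toy_vectors` are invented for illustration and the helper names are my own; real LLM embeddings are learned from text and have far more dimensions.

```python
import math

# Hypothetical 3-dimensional coordinates, made up for this example only.
# The first axis loosely plays the role of a cold-to-hot semantic gradient.
toy_vectors = {
    "freezing": [-0.9, 0.1, 0.0],
    "cold": [-0.6, 0.2, 0.1],
    "warm": [0.5, 0.2, 0.1],
    "hot": [0.9, 0.1, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors):
    """Return the other word whose vector points most nearly the same way."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    for word in toy_vectors:
        print(word, "->", nearest(word, toy_vectors))
```

With these made-up numbers, "hot" lands nearest "warm" and "freezing" nearest "cold," which is the semantic gradient idea expressed as geometry.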


an organized classroom

Thanks to an episode of the podcast Emerging Research in Educational Psychology, in which professor Jeff Greene speaks with professor Erika Patall about a meta-analysis she led, I learned about her paper, which synthesizes findings from a large number of studies on the impact of classroom structure. I thought some of the high-level takeaways were well worth highlighting for you in the 4th research highlight in this series!

  • Citation: Patall, E. A., Yates, N., Lee, J., Chen, M., Bhat, B. H., Lee, K., Beretvas, S. N., Lin, S., Man Yang, S., Jacobson, N. G., Harris, E., & Hanson, D. J. (2024). A meta-analysis of teachers’ provision of structure in the classroom and students’ academic competence beliefs, engagement, and achievement. Educational Psychologist, 59(1), 42–70.

I think it’s no surprise to most educators that providing structure for kids, both in terms of the classroom environment and culture, and in terms of the design of instructional tasks, is critical to improving student learning. Part of this work is what we often term “classroom management,” but as the paper describes, the work is far more encompassing than that:

“In sum, creating structure is a multifaceted endeavor that involves a diverse assortment of teacher practices that can be used independently or in various combinations, as well as to various extents, and are all intended to organize and guide students’ school-relevant behavior in the process of learning in the classroom.”


I wrote a little while ago about Andrew Watson’s excellent book, “The Goldilocks Map.” I had an opportunity to attend a Learning and the Brain conference, which was what sparked Andrew’s own journey into brain research and learning to balance openness to new practice with a healthy dose of skepticism. In fact, Andrew was one of the keynote presenters at this conference – and I think his trenchant advice provided an important grounding for consideration of many of the other presentations.

I think there’s something in the nature of presenting to a general audience of educators that compels researchers to attempt to derive generalized implications of their research that can all too easily overstep the confines of their very specialized and specific domains.


Ontogenesis model

A recent paper caught my eye, Ontogenesis Model of the L2 Lexical Representation, and despite the immediate mind-glazing effect of the word “ontogenesis,” I found the model well worth digging into and sharing here, and it may bear relevance to conversations on orthographic mapping.

How we learn words and all their phonological, morphological, orthographic, and semantic characteristics is a fascinating topic of research—most especially in the areas of written word recognition and in the learning of a new language.


In our last post in a series exploring the question, “What is (un)natural about learning to read and write?,” we looked at a 1980 paper by Philip Gough and Michael Hillinger, Learning to Read: An Unnatural Act. The paper countered Ken and Yetta Goodman’s argument that learning to read is natural, and gave us a useful analogy: learning to read an alphabetic writing system is a form of cryptanalysis. Using this analogy, Gough and Hillinger drew a fine-grained distinction between a code and a cipher, which allowed them to make some precise observations about the difficulty of breaking the alphabetic cipher that have held up quite well over the years.


Sharing a fun paper to geek out on with my fellow language nerds, How children learn to communicate discriminatively by Michael Ramscar. In this paper, the author makes an argument that the contrasting forces of “discriminability” and “regularity” both serve to make language something we pick up pretty much naturally, even if we don’t know all the words in the language.

“…the existence of regular and irregular forms represents a trade-off that balances the opposing communicative pressures of discriminability and learnability in the evolution of communicative codes. From this perspective, the existence of frequent, well-discriminated irregular forms serves to make important communicative contrasts more discriminable and thus also more learnable. By contrast, because regularity entails less discriminability, learners’ representations of lexico-morphological neighbourhoods will tend to be more generic, which causes the forms of large numbers of less frequent items to be learned implicitly, compensating for the incompleteness of individual experience.”

The language of this paper is, as you can see, a bit opaque, and much of it went a bit over my head, but I found the arguments fascinating given the debates about how to teach the “irregular” spelling of so many words in the English language. Here, the author seems to suggest (I may be over-extrapolating as I often tend to do, but this is what got me geeking out on it) that there is in fact some level of constructive tension between the language forms that show up again and again and the forms that are more infrequent but thus inherently gain more of our attention. This relates to the theory of “statistical learning,” by which we not only learn language but also map a language to its written form.

The author later provides what I thought was a very concrete thought experiment that demonstrates this principle when he moves from morphology to names:

Imagine that 33% of males are called John, and only 1% Cornelius. In this scenario, learning someone is named Cornelius is more informative than learning their name is John (Corneliuses are better discriminated by their names than Johns). On the other hand, Johns will be easier to remember (guessing ‘John’ will be correct 1/3 of the time). Further, although the memory advantage of John relies on its frequency, the memorability of Cornelius also benefits from this: Cornelius is easier to remember if the system contains fewer names (also, as discussed earlier, if John is easier to say than Cornelius, this will reduce the average effort of name articulation).
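To put numbers on “more informative,” here is a minimal sketch using the standard information-theory measure of surprisal, the negative base-2 logarithm of an outcome's probability. The 33% and 1% figures come from the thought experiment above; the function and variable names are my own, and this is an illustration of the general idea, not the author's own calculation.

```python
import math


def surprisal_bits(probability):
    """Information content, in bits, of an outcome with the given probability."""
    return -math.log2(probability)


# Probabilities taken from the thought experiment in the quoted passage.
name_probabilities = {"John": 0.33, "Cornelius": 0.01}

if __name__ == "__main__":
    for name, p in name_probabilities.items():
        print(f"{name}: p = {p:.2f}, surprisal = {surprisal_bits(p):.2f} bits")
```

Run as a script, this prints roughly 1.60 bits for John and 6.64 bits for Cornelius: hearing the rare name tells you about four times as much, while the common name is the safer guess.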

What is also interesting about the author’s argument in this paper connecting information theory to language learning is that these assertions are empirically testable:

“Whether these mathematical points about sampling and learning actually apply to human learners are empirical questions. This account makes clear predictions in regard to them: if learners are exposed to sets of geometrically distributed forms, they should acquire models of their probabilities that better approximate one another than when learning from other distributions. Conversely, if learning from geometric distributions does not produce convergence, it would suggest the probabilistic account of communication described here (indeed, any probabilistic account of communication) is false.”

There’s a lot more in the paper to nerd out on. I found the section on verbs especially interesting, for example, given that it connects to some other tidbits on the power and challenge of verbs I’ve come across before.

I’ll leave the rest to you!

#verbs #regularity #irregularity #learning #language #statisticallearning #probability #discriminability #informationtheory #form

A drawing of a brain

As I began my great awakening to the relatively extensive body of research on reading, one of the claims of reading research proponents that I’ve picked up on and carried with me is the idea that reading is unnatural and our brains were not born to read. And this makes sense from an evolutionary perspective, given that oral language has been around for a very long time (though we don’t know, of course, exactly when it showed up), while writing systems only showed up roughly 5,000 years ago.


In the attempt to close the chapter on my Schools as Ecosystems blog and move into more thinking and writing on language and literacy, I posted two very long posts, on the influence of acoustics and greenery on learning, respectively, which once were slated to be part of a book that I just couldn’t scrounge the time together to complete. One of the chapters-to-be was on the importance of air quality in learning — and damn, how timely it would have been if I could have pulled that all together pre-COVID-19?!

While I most likely won’t ever write that book, I’d still like to highlight the critical importance of air quality in schools and learning, which has become all the more apparent during a time of a respiratory virus, but which is important at all times. And since I don’t have the time to write it all up in full, I’ll post links to the threads that I had lying about in a document instead, and let you, dear reader, complete the thoughts:

The Health Impacts of Air Pollution

Roth and his team looked at students taking exams on different days – and also measured how much pollution was in the air on those given days. All other variables remained the same: The exams were taken by students of similar levels of education, in the same place, but over multiple days.

He found that the variation in average results were staggeringly different. The most polluted days correlated with the worst test scores. On days where the air quality was cleanest, students performed better.

To determine the long-term effects, Roth followed up to see what impact this had eight to 10 years later. Those who performed worst on the most polluted days were more likely to end up in a lower-ranked university and were also earning less, because the exam in question was so important for future education. —“HOW AIR POLLUTION IS DOING MORE THAN KILLING US” BY MELISSA HOGENBOOM IN BBC FUTURE

The Impact of Indoor Air Quality on Learning

When the level of fresh air in the classrooms was increased, the students performed up to seven per cent better than when they were working on the tests in their usual indoor climates. The study also revealed that the students did not themselves notice that they were not quite as astute in the poorer climate. —“BAD AIR QUALITY MAKES CHILDREN PERFORM WORSE IN SCHOOLS” BY JONAS SALOMONSEN IN SCIENCENORDIC

Southern California’s air agency, the South Coast Air Quality Management District, earmarked settlements from polluting companies and other funds to cover the cost of such filtration at about 80 schools near freeways or other pollution sources. Nothing’s preventing other states from following the same model. “The technology is well established, the installation is straightforward and the maintenance is simple,” said district spokesman Sam Atwood, who doesn’t recall officials from other states getting in touch to learn from his agency’s experience. —“THE INVISIBLE HAZARD AFFLICTING THOUSANDS OF SCHOOLS” BY JAMIE SMITH HOPKINS FOR THE CENTER FOR PUBLIC INTEGRITY

The Relationship of Air Pollution to COVID-19

#ecosystems #schools #learning #airquality #pollution #environment #health