Language, Cognition, and LLMs

“Semantic gradients” are a tool teachers use to broaden and deepen students' understanding of related words by plotting them in relation to one another. They often begin with antonyms at each end of the continuum. Here are two basic examples:

[Image: two semantic gradient examples]

Now imagine taking this approach and quantifying the relationships between the words by adding numbers along the line. Then imagine adding another axis, so that words are plotted in three-dimensional space according to their relationships. Then add another dimension, and another . . . heck, add tens of thousands more dimensions, relating every word in your lexicon across a high-dimensional space . . .

. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).
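To make the analogy concrete, here is a minimal sketch in Python with made-up coordinates: a handful of temperature words placed in a toy three-dimensional space, with cosine similarity standing in for "closeness on the gradient." Real models learn thousands of dimensions from data rather than using hand-picked ones like these.

```python
import numpy as np

# Hypothetical, hand-picked 3-D coordinates for words on a temperature
# gradient. Real LLM embeddings are learned, with thousands of dimensions.
embeddings = {
    "freezing":  np.array([-1.0, 0.2, 0.1]),
    "cold":      np.array([-0.6, 0.1, 0.0]),
    "cool":      np.array([-0.3, 0.0, 0.1]),
    "warm":      np.array([ 0.4, 0.1, 0.0]),
    "hot":       np.array([ 0.7, 0.2, 0.1]),
    "scorching": np.array([ 1.0, 0.3, 0.2]),
}

def cosine_similarity(a, b):
    """Closeness of direction: 1.0 = identical, -1.0 = opposite."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Neighbors on the gradient score high; the antonym ends score low.
print(cosine_similarity(embeddings["cold"], embeddings["cool"]))           # ~0.94
print(cosine_similarity(embeddings["freezing"], embeddings["scorching"]))  # ~-0.85
```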

LLMs Are Powered by Language: Or, Words as a Vast Sea of Interrelated Statistical Arrays of Tokens

At root, the most powerful current forms of AI derive their capacities from decomposing human language into vast arrays of numbers based on its high-dimensional statistical relationships, and then predicting probabilistically which tokens are most likely to come next.
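As a toy illustration of that last step, here is a sketch of next-token prediction using a simple bigram model over an invented mini-corpus. An LLM's machinery is vastly more sophisticated (transformer layers over high-dimensional embeddings rather than raw counts), but the output is the same kind of object: a probability distribution over possible next tokens.

```python
from collections import Counter, defaultdict

# An invented mini-corpus; any text works the same way.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count which token follows which (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_token_distribution(token):
    """Turn raw follow-counts into a probability distribution."""
    counts = following[token]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_token_distribution("the"))
# {'cat': 0.5, 'mat': 0.5} -> in this corpus, "the" is followed by
# "cat" or "mat" with equal probability
```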

There’s a kind of alchemical transformation at work that seems to preserve meaning in the generative pronouncements of the frontier LLMs, all the more amazing because, so far, the very engineers who designed the structures in which these operations occur do not fully understand what the models are doing to arrive at their seemingly oracular destinations.

In other words – the power of LLMs seemingly derives from the statistical power of language. There is something in the nature of language itself that provides these computations over vast arrays of numbers with a lattice of our world, enabling LLMs to gain uncanny abilities from superpowered next-word prediction. That LLMs possess the generative powers they do, and possess them without any consciousness or social interaction whatsoever, bolsters the argument that there is something powerful about language itself, not just our brains.
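A classic, concrete glimpse of that "lattice": in learned word-vector spaces (word2vec being the famous example), directions encode relations, so that vector arithmetic like king − man + woman lands near queen. Here is a hand-made two-dimensional sketch of the idea; the coordinates are invented for illustration, whereas real embeddings learn such directions purely from co-occurrence statistics.

```python
import numpy as np

# Invented 2-D vectors: dimension 0 roughly encodes "royalty",
# dimension 1 roughly encodes "gender".
vectors = {
    "king":  np.array([0.9, -0.8]),
    "queen": np.array([0.9,  0.8]),
    "man":   np.array([0.1, -0.8]),
    "woman": np.array([0.1,  0.8]),
}

# The famous analogy: king - man + woman should land near queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]

# Find the vocabulary word closest to the analogy result.
nearest = min(vectors, key=lambda w: np.linalg.norm(vectors[w] - target))
print(nearest)  # queen
```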

An Aside on Power Law Scaling

One of the interesting features of human language is that it exhibits power-law scaling, as do other complex adaptive systems such as animals, cities, and businesses, as I recently examined in this post about Geoffrey West's fascinating book, Scale. The frequency of word usage (Zipf's law), the lengths of sentences and texts, and the number of words in a language all follow power-law distributions: a small number of words are used very frequently while most words are used rarely, and long sentences and texts are less common than short ones. As an interesting parallel, power-law scaling shows up not only in language itself and in its generative manifestations in LLMs, but also in the data, data centers, and energy required to train and run them. Thus far, the only apparent ceiling on LLM capability is the ceiling on the scalability of chips, data centers, and training data.
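If you want to see Zipf's law for yourself, the following sketch counts word frequencies in any plain-text file (the filename is a placeholder) and fits the power-law exponent in log-log space; for natural language the slope typically comes out near −1.

```python
from collections import Counter
import math

# Any plain-text file will do; "sample_corpus.txt" is just a placeholder.
with open("sample_corpus.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

# Word frequencies, sorted from most to least common (rank 1, 2, 3, ...).
counts = sorted(Counter(words).values(), reverse=True)

# Least-squares fit of log(frequency) against log(rank) over the top
# 1,000 ranks; Zipf's law predicts a slope near -1.
ranks = range(1, min(1000, len(counts)) + 1)
xs = [math.log(r) for r in ranks]
ys = [math.log(counts[r - 1]) for r in ranks]
n = len(xs)
slope = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / (
    n * sum(x * x for x in xs) - sum(xs) ** 2
)
print(f"fitted exponent: {slope:.2f} (Zipf predicts about -1)")
```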

Innate vs. Developed Language: A Review of Our Path Traversed Thus Far

In our series “Innate vs. Developed”, we have explored the nature of language, challenging the widely held view that language is completely and innately hardwired into the human brain. Drawing on “The Language Game” and “Rethinking Innateness” for inspiration, we have considered the notion that language is an emergent, culturally evolved phenomenon that builds atop an “inner scaffold” within our brains, further refining and specializing our neural networks through simple, repeated social interactions over time.

We also considered how developing proficiency in reading and writing further extends and reinforces these channels across our brains – and how developing proficiency in multiple languages and literacies makes those networks more robust still.

We went further afield and investigated Cormac McCarthy’s ponderings on a seeming division between language and the ancient parts of our brain that exist before and beyond language. We also explored the paradoxical nature of language: it can both enhance and potentially occlude our connection to our unconscious selves and to the natural world.

I promised at the end of the first post in this series that I would “maybe dig into the relation of cognition and language and literacy a little, and riff on the implications for AI, ANNs, and LLMs.” It’s taken me some time to let all of this ripen, especially given the rapid pace at which LLMs are developing. I think I’m finally starting to gain some perspective on LLMs that may allow me to indulge in a little riffing.

Sources for Spelunking

Before said indulgence in my next post, I’ll first outline a few of the sources I will draw upon, so you can go off and explore them on your own before being further biased by my rambling.

First, if you are interested in learning more about that analogy of a high-dimensional semantic gradient and gaining insight into how LLMs kinda work, I recommend three sources shared by Ethan Mollick (himself an excellent source):

Second, if you want to explore some interesting aspects of language itself that are related to LLMs, check out the following:

An Anticipation of Where We May Go From Here

From these and other sources, including dabbling with Copilot, Claude, and Gemini, I will ponder some of the following points on what computational neural networks may be able to tell us about language and what language may be able to tell us about LLMs – and, ultimately, perhaps, what all of this may be able to tell us about teaching and learning:

#language #literacy #LLMs #computation #statistical #learning #ai