When I begin a series of blogs to conduct nerdy inquiry into an abstract topic, I typically don't know where I'm going to end up. This series on LLMs was unusual in that, in our first post, I outlined pretty much the exact topics I would go on to cover.
Here's where I had spitballed we might go:
The surprisingly inseparable interconnection between form and meaning
Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness
The human (and now, machine) capacity for learning and using language may simply be a matter of scale
Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said?
Implicit vs. explicit learning of language and literacy
Indeed, we then went on to explore each of these areas, in that order. Cool!
Regularity and irregularity. Decodable and tricky words. Learnability and surprisal. Predictability and randomness. Low entropy and high entropy.
Why do such tensions exist in human language? And in the AI tools we've developed both to create code and to use natural language, how can the precision required for computation coexist with the necessary complexity and messiness of human language?
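To make those last two pairs concrete: in information theory, a word's surprisal is how unexpected it is given what came before, and entropy is the average surprisal across all the possibilities. Here's a minimal sketch in Python (the little next-word distribution is invented for illustration, not drawn from any real model):

```python
import math

# A toy next-word distribution after a prompt like "the cat sat on the..."
# (probabilities invented for illustration)
next_word_probs = {
    "mat": 0.70,        # highly predictable: low surprisal
    "rug": 0.15,
    "sofa": 0.10,
    "xylophone": 0.05,  # unpredictable: high surprisal
}

def surprisal(p: float) -> float:
    """Surprisal in bits: -log2(p). Rarer continuations carry more information."""
    return -math.log2(p)

# Entropy is the expected (probability-weighted) surprisal of the distribution.
entropy = sum(p * surprisal(p) for p in next_word_probs.values())

for word, p in next_word_probs.items():
    print(f"{word}: p={p:.2f}, surprisal={surprisal(p):.2f} bits")
print(f"entropy of this distribution: {entropy:.2f} bits")
```

A language in which every next word was fully predictable would have zero entropy; one in which every word was equally likely would maximize it. Human language, and the text LLMs learn from, lives in between.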
“Semantic gradients” are a tool teachers use to broaden and deepen students' understanding of related words by plotting them in relation to one another, often with antonyms anchoring each end of the continuum. Here are two basic examples:

freezing → cold → cool → warm → hot → scorching

whisper → murmur → talk → shout → scream
Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Then imagine adding another axis to this graph, so that words are plotted in a three-dimensional space according to their relationships. Then add another dimension, and another . . . heck, make it tens of thousands more dimensions, relating all the words in your lexicon across a high-dimensional space . . .

. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).
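Under the hood, this is roughly what word embeddings do: each word becomes a list of numbers (a vector), and geometric closeness stands in for semantic closeness. Here's a minimal sketch, with hand-picked three-dimensional vectors standing in for the hundreds or thousands of learned dimensions in a real model:

```python
import math

# Hand-picked 3-D "embeddings" along invented axes
# (say: temperature, intensity, formality).
# Real models learn these coordinates from data, in far more dimensions.
embeddings = {
    "freezing": [-1.0, 0.9, 0.0],
    "cold":     [-0.6, 0.4, 0.0],
    "warm":     [ 0.5, 0.3, 0.0],
    "hot":      [ 0.9, 0.8, 0.0],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Neighbors on the gradient are geometrically close;
# the antonym pair points in roughly opposite directions.
print(cosine_similarity(embeddings["hot"], embeddings["warm"]))      # ~0.98
print(cosine_similarity(embeddings["hot"], embeddings["freezing"]))  # ~ -0.11
```

It's the semantic gradient idea, scaled up: instead of a teacher placing a handful of words on one line, the model places every word in its vocabulary in a shared high-dimensional space, where relationships it was never explicitly taught fall out of the geometry.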