Reviewing Claims I’ve Made on LLMs

[Image: Novice bunny and expert bunny on bikes]

When I begin a series of blog posts to conduct a nerdy inquiry into an abstract topic, I typically don't know where I'm going to end up. This series on LLMs was unusual in that in our first post, I outlined pretty much the exact topics I would go on to cover.

Here's where I had spitballed we might go:

Indeed, we then went on to explore each of these areas, in that order. Cool!

Some Hypotheses from This Series

What theories have we raised through this exploration?

1) LLMs gain their uncanny powers from the statistical nature of language itself.

2) The meaning and experiences of our world are more deeply entwined with the form and structure of our language than we previously imagined.

3) LLMs may offer us an opportunity to further the convergence between human and machine language.

4) AI can potentially extend our cognitive abilities, enabling us to process and understand far more information.

5) Both human and machine learning progress from fuzzy, imprecise representations to higher precision, and the greater the precision, the greater the effort and practice (or “compute”) required.

6) LLMs challenge Chomskyan notions of innateness and suggest that implicit, statistical learning alone can lead to acquiring the grammatical structure and meaning of a language.

While I’ve been mostly positive and excited about the potential of AI (aside from pointing out how it is accelerating the looming ecological catastrophe that seems to be our trajectory), I should probably pause here to acknowledge that there may be important counterpoints to many of these (perhaps somewhat starry-eyed) hypotheses.

On to the Counterclaims

Let's take a more critical look at some of my claims:

1) I claim that language is fundamental to the generative powers of LLMs.

Yet Andrej Karpathy, who is no stranger to LLM development, tweeted:

It's a bit sad and confusing that LLMs (“Large Language Models”) have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can “throw an LLM at it.”

I agree that LLMs are performing “statistical modeling of token streams,” and that “for any arbitrary vocabulary of some set of discrete tokens, you can ‘throw an LLM at it.’”
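To make Karpathy’s point concrete, here is a minimal sketch – a toy of my own, not anything from Karpathy – of autoregressive modeling over token streams. A simple bigram counter stands in for the transformer, and the thing to notice is that the model only ever sees integer token IDs, so the exact same code handles text bytes or a stand-in for quantized audio:

```python
# A toy autoregressive "model": count next-token frequencies, then sample.
# The model never knows (or cares) what the tokens represent.
import random
from collections import Counter, defaultdict

def train_bigram(stream):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(stream, stream[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length):
    """Sample a continuation, one token at a time, from the learned statistics."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        tokens, weights = zip(*followers.items())
        out.append(random.choices(tokens, weights=weights)[0])
    return out

# Tokens as text bytes...
text_stream = list(b"the cat sat on the mat. the cat sat on the hat. ")
sample = generate(train_bigram(text_stream), text_stream[0], 40)
print(bytes(sample).decode(errors="replace"))  # text-like output

# ...or tokens as (stand-in) quantized audio: same machinery, different vocabulary.
audio_stream = [t % 16 for t in range(200)]
print(generate(train_bigram(audio_stream), audio_stream[0], 10))
```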

We now have multimodal LLMs modeling token streams of audio, images, and text, and we will no doubt have models feeding on additional streams of sensory data as they are increasingly paired with cameras on humans, objects, and robots.

Yet I also think Karpathy undersells something: when LLMs suddenly exploded into general public awareness and fascination, it was not merely a “historical” accident that they had been trained upon vast amounts of human-generated text and could reproduce and generate human language. As we’ve explored in this and a previous series, there is something about human language itself that is uniquely adapted to our brain circuitry and to the propagation of our culture through social interaction in our world. Being able to communicate with a powerful computational model through the medium of conversational human language has been a revolutionary development, and we are just in the beginning stages of grokking it.

As I tweeted in response to Karpathy, token streams may be applied to anything, but human language seems uniquely suited to advancing combined human and machine learning – not only because we rely on it for communication, but also because of the algebraic and statistical nature of our language.
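To illustrate what I mean by “algebraic”: word embeddings famously expose relations as vector offsets. Here is a toy demonstration with hand-made, hypothetical vectors (real embeddings are learned and have hundreds of dimensions, but the arithmetic is the same in spirit):

```python
# Toy word vectors (hypothetical, hand-picked numbers) where one dimension
# loosely encodes royalty and another loosely encodes gender.
import math

emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.2, 0.1],
    "man":   [0.1, 0.8, 0.3],
    "woman": [0.1, 0.2, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(v, exclude):
    """The word whose vector is closest to v, ignoring the query words."""
    return max((w for w in emb if w not in exclude),
               key=lambda w: cosine(emb[w], v))

# king - man + woman lands nearest to queen:
v = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
print(nearest(v, exclude={"king", "man", "woman"}))  # -> queen
```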

A recent case in point: the viral attention on NotebookLM’s Audio Overview. Listening to a conversation, however artificial, resonates with us, because conversation is in our social nature. And, surprisingly, it does a fairly good job of surfacing information from across multiple multimodal sources (and soon, across languages) that we find interesting, relevant, and meaningful.

Speaking of NotebookLM Audio Overview... here’s one derived from all the blog posts (except this one) from this series, as well as the sources–outlined in post 1–that inspired them all: https://notebooklm.google.com/notebook/a4f35399-e288-4293-b2d2-0489e6b1f037/audio

4) I claim there is great potential for AI to extend our cognitive capabilities

Yet there is a strong case that there is a commensurate danger: use of LLMs can also reduce our cognitive capabilities.

Learning more formal content and skills, like what we learn in school or on the job, requires deliberate effort until we develop an unconscious fluency. If students offload the practice of new learning (such as writing or math) to an LLM, then they will not–ironically–develop the automatized internal knowledge and capacity they need to wield powerful tools like AI effectively.

When “experts” use tools like AI, they know where the gaps are in the output and are able to use it strategically to enhance their own work. A few examples of this:

All the people above are highly skilled at what they do – so when they explore and then figure out how to use AI to support their work, they do so in a way that does not diminish their own hard-earned ability, but rather enhances and extends their capabilities.

On the other hand, for students–who are by definition novices in the skills and knowledge they are learning–an over-reliance on AI tools may limit their ability to develop skills such as literacy, critical thinking, problem-solving, and creativity.

Recent reports on AI in education, such as from Cognitive Resonance, Center for American Progress, and Bellwether, have rightfully raised this concern.

And all educators, whether in K-12 or higher ed, are seeing increasing use of AI by students to complete homework assignments, so this danger of truncating the development of internal capacity is real.

I think the steps we can take to address this are two-fold:

In a post on the ethical use of AI, Jacob Kaplan-Moss argues that fully automated AI is unethical in the public sector due to its inherent biases and potential for unfairness in high-stakes situations. In contrast, the assistive use of AI can enhance human decision-making.

This assistive vs. automated use of AI may be a useful frame for thinking about how AI can be used most ethically and effectively in education. We want AI to assist the learning process, rather than simply automate the solving of math problems or the writing of essays. This view aligns with Ethan Mollick’s idea of “co-intelligence” as well.

So far, I find the most powerful and interesting assistive applications for AI are focused more on educators (“the experts”) than on students (“the novices”). Teachers can leverage AI to support administrative tasks, analyze student data, and consider enhancements to their instruction based on that data.

That said, I don’t think the assistive use cases of AI are limited only to “experts” in a domain. AI can also help equip those without knowledge and expertise in a specific area with the language they need to navigate learning or real-world communications more effectively. And there are some really interesting use cases of AI for feedback on student thinking and writing, when structured with specific guidelines and criteria and with the teacher in the loop – something like the sketch below.
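Here is a hypothetical sketch of what “assistive, teacher-in-the-loop” feedback might look like in code. The names and structure are entirely my own invention, not drawn from any of the tools or reports above; the point is simply that the model drafts comments against explicit rubric criteria, and nothing reaches the student until a teacher approves it:

```python
# Hypothetical teacher-in-the-loop feedback pipeline (illustrative only).
from dataclasses import dataclass

# Explicit rubric criteria the model must comment against -- no grades, no scores.
RUBRIC = [
    "Does the essay state a clear claim?",
    "Is each claim supported with evidence from the text?",
    "Does the conclusion follow from the argument?",
]

@dataclass
class DraftComment:
    criterion: str
    comment: str
    teacher_approved: bool = False  # nothing is released until a teacher flips this

def draft_feedback(essay, ask_model):
    """Ask the model (any callable: prompt -> text) for one comment per criterion."""
    return [
        DraftComment(c, ask_model(
            f"Against this criterion: '{c}', give one specific, "
            f"encouraging comment on the following essay:\n{essay}"))
        for c in RUBRIC
    ]

def release_to_student(drafts):
    """Only teacher-approved comments are ever shown to the student."""
    return [d.comment for d in drafts if d.teacher_approved]
```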

But in the context of classroom learning, such uses must be strategically designed and cautiously incorporated. For example, see this explanation from professor Michael Brenner on how he has begun incorporating AI into his pedagogy. Note, though, that his example is from a graduate-level math class, so the novice vs. expert dynamic is quite different from what we would need to consider at a preK-8 level. And even at the graduate level, you can see how much complexity the instructor had to think through to design his course to leverage LLMs so strategically.

There’s a lot more to unpack here on all sides of the equation. I’ll leave this one here for now, accepting non-closure, and I hope to dig further into these tensions and opportunities in both this space and in my professional work.

6) I claim that LLMs have shown that language can be learned without any innate programming or structure – therefore demonstrating the power of statistical, implicit learning

I’ve been in the “Chomsky is wrong” camp for a while now, but I happened to listen recently to an interview of Jean-Rémi King, a scientist at Meta AI, by Stephen Wilson on The Language Neuroscience podcast (did I tell you I’m a nerd?). Towards the end of the conversation, King warns against writing off Chomsky too readily, suggesting there is something intrinsic to the human brain in its readiness for language.

I uploaded the relevant portion of the interview transcript and asked Claude AI for a concise summary of King’s main claims, a request it willingly obliged (while, I’m sure, drawing upon an unconscionable amount of energy):

King argues that human brains likely don't use the same “next word prediction” principle as large language models for language acquisition, primarily because humans are exposed to far less linguistic data than these models.

He contends that while language models have shown impressive capabilities, they are extremely inefficient compared to human language learning, suggesting that we're missing some fundamental principles of how humans acquire language so efficiently.

While I’ve tried to temper most of my pronouncements throughout this series, it’s important to acknowledge that the fact that LLMs can learn language from statistical associations of word tokens alone does not mean that this is exactly how we humans must also learn language.

It is rather a proof of concept that language can be learned this way (without any innate grammar or explicit teaching of rules). But as King points out, this proof comes via a scale of input that is orders of magnitude larger than that of any child.
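A rough back-of-envelope makes that scale gap vivid. The figures below are commonly cited ballpark estimates – my assumptions, not numbers from the interview – but even if each is off by an order of magnitude, the conclusion holds:

```python
# Back-of-envelope comparison of linguistic input (all figures are rough,
# assumed ballpark estimates, not measurements from any cited source).
child_words_per_year = 7_000_000                    # order-of-magnitude estimate
child_words_by_age_10 = 10 * child_words_per_year   # ~7e7 words

llm_training_tokens = 15_000_000_000_000  # ~15T tokens, roughly a recent frontier model

ratio = llm_training_tokens / child_words_by_age_10
print(f"The LLM sees roughly {ratio:,.0f}x more input than the child")  # ~200,000x
```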

That said, there are other Artificial Neural Networks (ANNs), such as those in the research of Gašper Beguš, that learn from raw speech in an unsupervised manner, more closely mimicking human language acquisition. His lab has found interesting similarities between these ANNs and the human brain in processing language sounds – a parallel to King’s own research, which has found that LLMs can generate brain-like representations when predicting words from context.

And there will continue to be research into tinier models trained on sparser, and potentially richer, data.

But as King points out, there’s just so much more we need to learn. And this is exactly where I find all of this most exciting.

Where I may most rightfully be critiqued, in my last post and perhaps in others, is in extrapolating from the theoretical demonstration of LLMs to implications for classrooms.

So let me state my position a bit more clearly, in case there was any confusion that I am falling onto the side of the Goodmans or something. Children need consistency, stability, clarity, and coherence in their learning experiences, and teaching what is most important to know for a given subject directly and explicitly is critical. For children at the earliest stages of learning abstract skills and content, such as learning to read, explicit and well-structured teaching is essential. At the same time, we need to ensure that students have abundant structured opportunities to apply and practice what they are learning – and this is where ensuring they spend more time reading, writing, and talking–connected to the content of what we are teaching–is essential.

If you have more critiques that I am missing in any of the above, please do share!

Egads, I think I may actually have ANOTHER post left in me after all of this. Who knew LLMs would be such an interesting topic?!

#language #literacy #AI #LLMs #cognition #research #computation #models