Language & Literacy

LLMs

In the typical Hollywood action movie, a hero acquires master-level skill in a specialized art, such as Kung Fu, in a few power-ballad-backed minutes of a training montage.

In real life, it may seem self-evident that gaining mastery takes years of intense, deliberate, and guided work. Yet the perennial optimism of students cramming the night before an exam tells us that the pursuit of a cognitive shortcut may be an enduring human impulse.

It is unsurprising, then, that students—and many adults—increasingly use the swiftly advancing tools of AI and Large Language Models (LLMs) as a shortcut around deeper, more effortful cognitive work.

The Irreducible Nature of Effort and Mastery

In a previous post in my series on LLMs, we briefly explored Stephen Wolfram's concept of “computational irreducibility”—the idea that certain processes cannot be shortcut: you have to run the entire process to get the result.
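
Wolfram's go-to illustration is an elementary cellular automaton such as Rule 30: the update rule is trivial, yet no known formula lets you jump straight to the pattern at step n; you simply have to run it. A minimal Python sketch (my own illustration, not from the earlier post):

```python
# Rule 30: each new cell = left XOR (center OR right).
# No known shortcut predicts the pattern at step n; you must run all n steps.

def rule30_step(cells):
    """Advance one generation of the automaton (wrapping at the edges)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

# Start from a single live cell and run the full, irreducible process.
cells = [0] * 31
cells[15] = 1
for _ in range(16):
    print("".join("#" if c else "." for c in cells))
    cells = rule30_step(cells)
```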

One of the provocations of LLMs has been the revelation that human language (and maybe animal language, too?) is far more computationally reducible than we assumed. As AI advances, it demonstrates that other tasks and abilities previously thought to be exclusively human may also be more computationally tractable than we believed.

Actual learning by any human being—which we could operationally define as the internalization of a discrete body of knowledge and skills to the point of automaticity—inevitably requires practice and effort. A student must work through the essential steps of learning to genuinely own such knowledge. There is no shortcut to mastery.

That said, the great enterprise of education is to break down complex and difficult concepts and skills until they are pitched at the Goldilocks level of difficulty that accelerates a learner toward mastery. This is the work, as I've explored elsewhere, of scaffolding and differentiation.

Scaffolding and Differentiation
In a conversation on the Dwarkesh Podcast, Andrej Karpathy praises the “diagnostic acumen” of a human tutor who helped him learn Korean. She could “instantly... understand where I am as a student” and “probe... my world model” to serve content precisely at his “current sliver of capability.”

This is differentiation: aligning instruction to the individual's trajectory. It requires knowing exactly where a student stands and providing instruction in the manner, and over the time, they need to progress.

His tutor was then able to scaffold his learning, providing the content-aligned steps that lead to mastery, just as recruits learn the parachute landing fall over three weeks at the Army jump school at Fort Benning, as described in Make It Stick.
Mastering the parachute landing fall at the Army jump school.

“In my mind, education is the very difficult technical process of building ramps to knowledge. . . you have a tangle of understanding and you’re trying to lay it out in a way that creates a ramp where everything only depends on the thing before it.” — Andrej Karpathy

Crucially, neither differentiation nor scaffolding is about making learning easier in the sense of removing effort. They are both about ensuring the learner encounters the “desirable difficulty” necessary to move towards mastery.

Karpathy views a high-quality human tutor as a “high bar” to set for any AI tutor, but he seems to believe that, though achieving such a tutor will take longer than expected, it is ultimately a tractable (i.e., “computationally reducible”) task. He notes that “we have machines for heavy lifting, but people still go to the gym. Education will be the same.” Just as computers can play chess better than humans, yet humans still enjoy playing chess, he imagines a future where we learn for the intrinsic joy of it, even if AI can do the thinking for us.

The Algorithmic Turn and Frictionless Design

As Carl Hendrick explored recently on “The Learning Dispatch,” there's a possibility that teaching and learning themselves are more computationally tractable than we had assumed:

“If teaching becomes demonstrably algorithmic, if learning is shown to be a process that machines can master . . . what does it mean for human expertise when the thing we most value about ourselves... turns out to be computable after all?”

The problem lies in the design of most AI tools: they are built for user-friendly efficiency and task completion. Yet such efficiency works against the friction needed for learning. The Harvard study on AI tutoring showed promise precisely because the system was engineered to resist the natural tendency of LLMs to be maximally helpful. It was constrained to scaffold rather than solve.
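
The study's actual prompt isn't reproduced here, but the general technique is to constrain the model through its system prompt so that it scaffolds rather than solves. A rough sketch using the OpenAI chat API (the prompt wording and model choice are my own placeholders, not the Harvard system's):

```python
# Sketch of constraining an LLM to scaffold rather than solve.
# The prompt wording below is hypothetical, not the Harvard system's.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TUTOR_PROMPT = """You are a tutor. Never give the final answer.
Instead: (1) ask what the student has tried, (2) offer one hint at a time,
and (3) ask a follow-up question that checks understanding before moving on."""

def tutor_reply(student_message: str) -> str:
    """Return a scaffolding response rather than a completed solution."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": TUTOR_PROMPT},
            {"role": "user", "content": student_message},
        ],
    )
    return response.choices[0].message.content

print(tutor_reply("What's the derivative of x**2 * sin(x)?"))
```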

As Hendrick notes, human pedagogical excellence does not scale well, while AI improvements can scale exponentially. If teaching is indeed computationally tractable, then a breakthrough in AI tutoring is a real possibility. But even with better design for learning, unless both teachers and students wield such powerful tools effectively, we could end up in a paradoxical situation: perfect tools for learning, but no learners capable of using them.

Brain Rot & the Trap of the Novice

The danger of AI, then, is that rather than leading us to the promised land of more learning, it may instead impair our ability—both individually and generationally—to learn over time. Rather than going to a gym to work out “for fun” or for perceived social status, many may elect to opt out of the rat race altogether. The power of AI, thus misdirected, becomes an avoidance strategy, deflecting as much thought, effort, and care from our lives as conceivably possible.

The term “brain rot” describes a measurable cognitive decline when people only passively process information.

A study on essay writing with and without ChatGPT found that “The ChatGPT users showed the lowest brain activity” and “The vast majority of ChatGPT users (83 percent) could not recall a single sentence” of the AI-generated text submitted in their name. By automating the difficult cognitive steps, the students lost ownership of the knowledge.

Such risk is highest for novices. A novice could be defined as someone who has yet to develop automatized internal knowledge in a domain. Whereas an expert can wield AI as a cognitive enhancement, extending their own expertise, a novice tends to use it as a cognitive shortcut, bypassing the very process of learning needed to stand on their own judgment.

If we could plug a Matrix-style algorithm into our brains to master Kung Fu instantly, we all surely would. As consumers, we have been conditioned to expect the highest quality we can gain with minimal effort. So is it any surprise that our students are eager to take full advantage of a tool designed for the most frictionless task completion? Why think, when a free chatbot can produce output that plausibly looks like you thought about it?

Simas Kicinskas, in University education as we know it is over, details how “take-home assignments are dead . . . [because] AI now solves university assignments perfectly in minutes,” and that students use AI as a “crutch rather than as a tutor,” getting perfect answers without understanding because “AI makes thinking optional.”

But really, why should we place all the burden of doing better on the shoulders of our students, when they are simply defaulting to what is clearly human nature?

The Barbell Approach

Kicinskas suggests that despite the pervasive current use of AI to shortcut thinking, “Universities are uniquely positioned to become a cognitive gym, a place to train deep thinking in the age of AI.”

He proposes “a barbell strategy: pure fundamentals (no AI) on one end, full-on AI projects on the other, with no mushy middle. . . [because] you need cognitive friction to train your mental muscles.”

Barbell strategy

The NY Times article highlighted a similar dynamic in the MIT study cited earlier: students who initially used only their brains to write drafts recorded the highest brain activity once they were allowed to use ChatGPT later. Students who started with ChatGPT never reached parity with that first group.

“The students who had originally relied only on their brains recorded the highest brain activity once they were allowed to use ChatGPT. The students who had initially used ChatGPT, on the other hand, were never on a par with the former group when they were restricted to using their brains, Dr. Kosmyna said.”

In other words, AI can enhance our abilities, but only after we have already put in the cognitive effort and work for a first draft.

So Kicinskas is onto something with the barbell strategy. We start with real learning: learning that requires desirable difficulty, friction, and effort, pitched at the right level for where the learner is at that moment, in order to build fluency with that concept or skill.

Once some level of ability and knowledge has been acquired (as determined by the success criteria set for that particular task, course, subject, and domain), adding AI can accelerate and enhance the exploration of that problem space.

Using AI as a Cognitive Lift, Rather than a Cognitive Crutch

We must therefore design and use AI in closer alignment with the “barbell” strategy.

At the beginning of a student's journey, or at the beginning of developing our own individual projects, we need to double down on the fundamentals. We must carve out space for independent thought, as well as for the analog and social interaction we require to gain new insights. This is how we build the inner scaffold required for true expertise.

On the other side of the barbell, we can more enthusiastically embrace the capacity of AI to scale our ability for processing and communicating information. Once we have done the heavy lifting to clarify our thinking, we can use these tools to extend our reach and traverse vast landscapes of data.

The danger lies in that “mushy middle,” wherein we can all too easily follow the path of least resistance and allow others, including AI, to do all our thinking for us, letting our attention drift away from our own goals. We must choose to think for ourselves not because we have to for survival, but because the friction of generating our own thought is what gives us our agency.

In a previous post, I explored how both language and learning are a movement from fuzziness to greater precision. It is possible that AI can greatly accelerate that journey, even as it is possible that it could greatly stymie our growth. The key is that we must first subject our fuzzy, half-formed intuitions to greater resistance until they crystallize into more precise and communicable thought. If we bypass this struggle, we doom ourselves to perpetual fuzziness, unable to distinguish between AI-automated slop and AI-assisted insight.

AI in Education infographic

Postscript: How I Used AI for This Post

I use AI extensively in both my personal and professional life, and writing this post was no exception. I thought it might be helpful to illustrate some of the arguments I made above by detailing exactly how AI both posed a risk to my own agency and served to enhance it during the creation of this essay.

I began by collecting sources. I had come across several articles and a podcast that felt connected, sensing emerging themes that related to my previous posts on LLMs. I started sketching out some initial thoughts by hand, then uploaded my sources into Google's NotebookLM.

My first impulse was to pull on the thread of “computational irreducibility.” I knew there was an interesting tension in language between regularity and irregularity, so I used Deep Research to find more sources on the topic. This led me down a rabbit hole. Flooding my notebook with technical papers shifted the focus to abstractions like Kolmogorov complexity and NP-completeness—fascinating, but a distraction from the pedagogical argument I wanted to make. Realizing this, I had the AI summarize the concept of irreducibility and then deleted the technical source files to clear the noise.

I then used the notebook to explore patterns between my remaining sources. Key themes began coalescing. It was here that I made a classic mistake: I asked Google Gemini to draft a blog post based on those themes.

The result wasn't bad, but it wasn't mine. It completely missed the actual ideas I was trying to unravel. I realized I was trying to shortcut the “irreducible” work of synthesis. To be fair to my intent at the time, I was really just interested in seeing whether the AI would surface any ideas I hadn't thought of, from a brainstorming stance. It wasn't very useful, however, so I discarded that approach, went back to my sources, and spent time thinking through the connections as I began drafting something new.

I then began drafting the post in Joplin, which is what I now use for notes and blog drafts. I landed on the analogy of the Hollywood training montage as the way to begin, then pulled up Google Gemini in a split screen and began wordsmithing some of what I wanted to say. As I continued drafting, I used Gemini as editorial support. It suggested syntactical revisions and fixed a number of misspellings. I then used it to help me expand a half-formed conclusion, as well as to cut an extended navel-gazing section that was completely unnecessary.

Gemini tends to oversimplify in its recommendations, however, and I didn't take all of its suggestions. I generated some images in NotebookLM based on all the sources, and also used Gemini to enhance an image I had made previously. Finally, I ran a few additional rounds of feedback: I asked NotebookLM to reconsider my draft in relation to all the sources in my notebook, then brought that feedback back into Gemini and went through the draft again on a split screen. This process yielded some good suggestions for reorganizing and enhancing the content.

In the end, I almost misled myself by trying to automate the thinking process too early. It was only when I returned to the “gym”—drafting the core ideas myself—that the AI became useful. My experience writing this confirms the barbell strategy: draft what you want to say first to build the conceptual structure, then use AI to draw that out further, and to polish and enhance it. Be very cautious in the mushy middle.

#AI #LLMs #cognition #mastery #learning #education #tutoring #scaffolding #differentiation #barbell

Novice bunny and expert bunny on bikes

When I typically begin a series of blogs to conduct nerdy inquiry into an abstract topic, I don't generally know where I'm going to end up. This series on LLMs was unusual in that, in our first post, I outlined pretty much the exact topics I would go on to cover.

Here's where I had spitballed we might go:

  • The surprisingly inseparable interconnection between form and meaning
  • Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness
  • The human (and now, machine) capacity for learning and using language may simply be a matter of scale
  • Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said?
  • Implicit vs. explicit learning of language and literacy

Indeed, we then went on to explore each of these areas, in that order. Cool!

Read more...

NYC skyline

The Surprising Success of Large Language Models

“The success of large language models is the biggest surprise in my intellectual life. We learned that a lot of what we used to believe may be false and what I used to believe may be false. I used to really accept, to a large degree, the Chomskyan argument that the structures of language are too complex and not manifest in input so that you need to have innate machinery to learn them. You need to have a language module or language instinct, and it’s impossible to learn them simply by observing statistics in the environment.

“If it’s true — and I think it is true — that the LLMs learn language through statistical analysis, this shows the Chomskyan view is wrong. This shows that, at least in theory, it’s possible to learn languages just by observing a billion tokens of language.”

–Paul Bloom, in an interview with Tyler Cowen
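
To make Bloom's point concrete, here is a toy sketch (mine, not Bloom's) of learning from nothing but observed statistics: count which word follows which, then predict by frequency. Real LLMs do something vastly richer, but the raw material is the same.

```python
# Toy illustration of learning language purely from statistics:
# tally which word follows which, then predict the most frequent continuation.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ran".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word: str) -> str:
    """Most frequent word observed after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict("the"))  # -> 'cat' (seen twice, vs. 'mat' once)
```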

Read more...

Through the window

In our series on AI, LLMs, and Language so far, we’ve explored a few implications of LLMs relating to language and literacy development:

1) LLMs gain their uncanny powers from the statistical nature of language itself; 2) the meaning and experiences of our world are more deeply entwined with the form and structure of our language than we previously imagined; 3) LLMs offer an opportunity for further convergence between human and machine language; and 4) LLMs can potentially extend our cognitive abilities, enabling us to process far more information.

In a previous series, “Innate vs. Developed,” we’ve also challenged the idea that language is entirely hardwired in our brains, highlighting the tension between our more recent linguistic innovations and our more ancient brain structures. Cormac McCarthy, the famed author of some of the most powerful literature ever written, did some fascinating pontificating on this very issue.

In this post, we’ll continue picking away at these tensions, considering implications for AI and LLMs.

Read more...

The Octopus

“Over cultural evolution, the human species was so pressured for increased information capacity that they invented writing, a revolutionary leap forward in the development of our species that enables information capacity to be externalized, frees up internal processing and affords the development of more complex concepts. In other words, writing enabled humans to think more abstractly and logically by increasing information capacity. Today, humans have gone to even greater lengths: the Internet, computers and smartphones are testaments to the substantial pressure humans currently face — and probably faced in the past — to increase information capacity.”

Uniquely human intelligence arose from expanded information capacity, Jessica Cantlon & Steven Piantadosi

According to the authors of the paper quoted above, the capacity to process and manage vast quantities of information is a defining characteristic of human intelligence. This ability has been extended over time through tools and techniques for externalizing information, such as language, writing, and digital technology. These advancements have, in turn, allowed for increasingly abstract and complex thought and technologies.

The paper further proposes that the power of scaling lies behind human intelligence, and that this same power of scaling lies behind the remarkable results achieved by artificial neural networks in areas such as speech recognition, LLMs, and computer vision. These accomplishments have come not through specialized representations and domain-specific development, but through simpler techniques combined with increased computational power and data capacity.

Read more...

Natural digital

Regularity and irregularity. Decodable and tricky words. Learnability and surprisal. Predictability and randomness. Low entropy and high entropy.

Why do such tensions exist in human language? And in our AI tools, developed both to create code and to use natural language, how can the precision required for computation co-exist with the necessary complexity and messiness of our human language?
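
One way to get a feel for these tensions is to measure them. Shannon entropy quantifies how predictable a stream of symbols is; here is a quick sketch (the example strings are my own):

```python
# Shannon entropy of a character stream: low entropy = predictable/regular,
# high entropy = surprising/irregular.
import math
from collections import Counter

def entropy(text: str) -> float:
    """Bits of information per character, based on observed frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy("aaaaaaaa"))              # 0.0 bits: perfectly predictable
print(entropy("abababab"))              # 1.0 bit: strict regularity
print(entropy("the cat sat on a mat"))  # in between: natural language mixes both
```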

Read more...

“Semantic gradients” are a tool teachers use to broaden and deepen students' understanding of related words by plotting them in relation to one another. They often begin with antonyms at each end of the continuum. Here are two basic examples:

Semantic gradient examples

Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Now imagine adding another axis to this graph, so that words are plotted in a three-dimensional space according to their relationships. Then add another dimension, and another . . . heck, make it thousands more dimensions, relating all the words available in your lexicon across a high-dimensional space. . .

. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).
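
For a concrete, if drastically simplified, picture: below is a toy sketch with hand-invented three-dimensional vectors (real models learn hundreds or thousands of dimensions from data). Words that sit near each other on the gradient come out with higher cosine similarity:

```python
# Toy "semantic gradient" in vector space. The numbers are invented for
# illustration; real embeddings are learned, with far more dimensions.
import math

vectors = {
    "frigid": [0.9, 0.1, 0.0],
    "cold":   [0.7, 0.2, 0.1],
    "cool":   [0.5, 0.4, 0.2],
    "warm":   [0.2, 0.7, 0.3],
    "hot":    [0.1, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, lower as words diverge."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["cold"], vectors["cool"]))  # ~0.91: close on the gradient
print(cosine(vectors["cold"], vectors["hot"]))   # ~0.40: far apart
```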

Read more...