<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>AI — Language &amp; Literacy</title>
    <link>https://languageandliteracy.blog/tag:AI</link>
    <description>Musings about language and literacy and learning</description>
    <pubDate>Sun, 19 Apr 2026 05:04:15 +0000</pubDate>
    <image>
      <url>https://i.snap.as/LIFR67Bi.png</url>
      <title>AI — Language &amp; Literacy</title>
      <link>https://languageandliteracy.blog/tag:AI</link>
    </image>
    <item>
      <title>AI, Mastery, and the Barbell of Cognitive Enhancement</title>
      <link>https://languageandliteracy.blog/ai-mastery-and-the-barbell-of-cognitive-enhancement?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[In the typical Hollywood action movie, a hero acquires master-level skill in a specialized art, such as Kung Fu, in a few power ballad-backed minutes of a training montage. &#xA;&#xA;In real life, it may seem self-evident that gaining mastery takes years of intense, deliberate, and guided work. Yet the perennial optimism of students cramming the night before an exam tells us that the pursuit of a cognitive shortcut may be an enduring human impulse.&#xA;&#xA;It is unsurprising, then, that students—and many adults—increasingly use the swiftly advancing tools of AI and Large Language Models (LLMs) as a shortcut around deeper, more effortful cognitive work.&#xA;!--more--&#xA;The Irreducible Nature of Effort and Mastery&#xA;&#xA;In a previous post in my series on LLMs, we briefly explored Stephen Wolfram&#39;s concept of &#34;computational irreducibility&#34;—the idea that certain processes cannot be shortcut: you have to run the entire process to get the result.&#xA;&#xA;One of the provocations of LLMs has been the revelation that human language (and maybe, animal language?) is far more computationally reducible than we assumed. As AI advances, it demonstrates that other tasks and abilities previously thought to reside exclusively within the human province may also be more computationally tractable than we believed.&#xA;&#xA;Actual learning by any human being—which we could operationally define as a discrete body of knowledge and skills internalized to automaticity—inevitably requires practice and effort. A student must replicate essential learning steps to genuinely own such knowledge. There is no shortcut to mastery.&#xA;&#xA;That said, the great enterprise of education is to break down complex and difficult concepts and skills until they are pitched at the Goldilocks level of difficulty to accelerate a learner towards mastery. 
This is the work, as I&#39;ve explored elsewhere, of scaffolding and differentiation.&#xA;&#xA;Scaffolding and Differentiation  &#xA;In a conversation on the Dwarkesh Podcast, Andrej Karpathy praises the &#34;diagnostic acumen&#34; of a human tutor who helped him learn Korean. She could &#34;instantly... understand where I am as a student&#34; and &#34;probe... my world model&#34; to serve content precisely at his &#34;current sliver of capability.&#34;&#xA;&#xA;This is differentiation: aligning instruction to the individual&#39;s trajectory. It requires knowing exactly where a student stands and providing the manner and time they need to progress.&#xA;&#xA;His tutor was then able to scaffold his learning, providing the content-aligned steps that lead to mastery, just as recruits learn the parachute landing fall in three weeks at the army jump school in Fort Benning, as described in Make It Stick.  &#xA;Mastering the parachute landing fall at the army jump school.&#xA;&#xA;  &#34;In my mind, education is the very difficult technical process of building ramps to knowledge. . . you have a tangle of understanding and you’re trying to lay it out in a way that creates a ramp where everything only depends on the thing before it.&#34; — Andrej Karpathy&#xA;&#xA;Scaffolding and Differentiation  &#xA;Crucially, neither differentiation nor scaffolding is about making learning easier in the sense of removing effort. They are both about ensuring the learner encounters the &#34;desirable difficulty&#34; necessary to move towards mastery.&#xA;&#xA;Karpathy views a high-quality human tutor as a &#34;high bar&#34; for any AI tutor to meet, and though he expects that building such a tutor will take longer than many predict, he sees it as ultimately a tractable (i.e. &#34;computationally reducible&#34;) task. He notes that &#34;we have machines for heavy lifting, but people still go to the gym. 
Education will be the same.&#34; Just as computers can play chess better than humans, yet humans still enjoy playing chess, he imagines a future where we learn for the intrinsic joy of it, even if AI can do the thinking for us.&#xA;&#xA;The Algorithmic Turn and Frictionless Design &#xA;&#xA;As Carl Hendrick explored recently on &#34;The Learning Dispatch,&#34; there&#39;s a possibility that teaching and learning themselves are more computationally tractable than we had assumed:&#xA;&#xA;  &#34;If teaching becomes demonstrably algorithmic, if learning is shown to be a process that machines can master . . . what does it mean for human expertise when the thing we most value about ourselves... turns out to be computable after all?&#34;&#xA;&#xA;The problem lies in the design of most AI tools -- they are designed for user-friendly efficiency and task completion. Yet such efficiency counters the friction needed for learning. The Harvard study on AI tutoring showed promise precisely because the system was engineered to resist the natural tendency of LLMs to be maximally helpful. It was constrained to scaffold rather than solve.&#xA;&#xA;As Hendrick notes, human pedagogical excellence does not scale well, while AI improvements can scale exponentially. If teaching is indeed computationally tractable, then a breakthrough in AI tutoring could become a reality. But even with better design for learning, unless both teachers and students wield such powerful tools effectively, we could end up in a paradoxical situation in which we have the perfect tools for learning, but no learners capable of using them.&#xA;&#xA;Brain Rot &amp; the Trap of the Novice&#xA;&#xA;The danger of AI, then, is that rather than leading us to the promised land of more learning, it may instead impair our ability—both individually and generationally—to learn over time. 
Rather than going to a gym to work out &#34;for fun&#34; or for perceived social status, many may elect to opt out of the rat race altogether. The power of AI is thus misdirected into an avoidance strategy, deflecting as much thought, effort, and care from our lives as conceivably possible.&#xA;&#xA;The term &#34;brain rot&#34; describes a measurable cognitive decline when people only passively process information. &#xA;&#xA;A study on essay writing with and without ChatGPT found that &#34;The ChatGPT users showed the lowest brain activity&#34; and &#34;The vast majority of ChatGPT users (83 percent) could not recall a single sentence&#34; of the AI-generated text submitted in their name. By automating the difficult cognitive steps, the students lost ownership of the knowledge.&#xA;&#xA;Such risk is highest for novices. A novice is, by definition, someone who has yet to develop automatized internal knowledge in a domain. Whereas an expert can wield AI as a cognitive enhancement, extending their own expertise, a novice tends to use it as a cognitive shortcut, bypassing the process of learning needed to stand on their own judgment.&#xA;&#xA;If we could plug a Matrix-style algorithm into our brains to master Kung Fu instantly, we all surely would. As consumers, we have been conditioned to expect the highest quality we can gain with minimal effort. So is it any surprise that our students are eager to take full advantage of a tool designed for the most frictionless task completion? Why think, when a free chatbot can produce output that plausibly looks like you thought about it?&#xA;&#xA;Simas Kicinskas, in University education as we know it is over, details how &#34;take-home assignments are dead . . 
.\[because\] AI now solves university assignments perfectly in minutes,&#34; and that students use AI as a &#34;crutch rather than as a tutor,&#34; getting perfect answers without understanding because &#34;AI makes thinking optional.&#34;&#xA;&#xA;But really, why should we place all the burden of doing better on the shoulders of our students, when they are defaulting to what is clearly human nature?&#xA;&#xA;The Barbell Approach&#xA;&#xA;Kicinskas suggests that despite the pervasive current use of AI to shortcut thinking, &#34;Universities are uniquely positioned to become a cognitive gym, a place to train deep thinking in the age of AI.&#34;&#xA;&#xA;He proposes &#34;a barbell strategy: pure fundamentals (no AI) on one end, full-on AI projects on the other, with no mushy middle. . . \[because\] you need cognitive friction to train your mental muscles.&#34;&#xA;&#xA;Barbell strategy&#xA;&#xA;The NY Times article highlighted a similar dynamic in that MIT study cited earlier: students who initially used only their brains to write drafts recorded the highest brain activity once they were allowed to use ChatGPT later. Students who started with ChatGPT never reached parity with the former group.&#xA;&#xA;  &#34;The students who had originally relied only on their brains recorded the highest brain activity once they were allowed to use ChatGPT. The students who had initially used ChatGPT, on the other hand, were never on a par with the former group when they were restricted to using their brains, Dr. Kosmyna said.&#34;&#xA;&#xA;In other words, AI can enhance our abilities, but only after we have already put in the cognitive effort and work for a first draft. &#xA;&#xA;So Kicinskas is onto something with the barbell strategy. We start with real learning: learning that requires desirable difficulty, friction, and effort, pitched at the right level for where the learner is at that moment, so that they gain greater fluency with that concept or skill. 
&#xA;&#xA;Once some level of ability and knowledge has been acquired (determined by the success criteria set for that particular task, course, subject, and domain), adding AI can accelerate and enhance the exploration of that problem space.&#xA;&#xA;Using AI for Cognitive Lift, Rather than Cognitive Crutch&#xA;&#xA;We must therefore design and use AI in closer alignment with the &#34;barbell&#34; strategy.&#xA;&#xA;At the beginning of a student&#39;s journey, or at the beginning of the development of our own individual products, we need to double down on the fundamentals. We must carve out that space for independent thought as well as for the analog and social interaction we require to gain new insights. This is how we build the inner scaffold required for true expertise.&#xA;&#xA;On the other side of the barbell, we can more enthusiastically embrace the capacity of AI to scale our ability for processing and communicating information. Once we have done the heavy lifting to clarify our thinking, we can use these tools to extend our reach and traverse vast landscapes of data.&#xA;&#xA;The danger lies in that &#34;mushy middle,&#34; wherein we can all too easily follow the path of least resistance and allow others, including AI, to do all our thinking for us, taking our attention away from our own goals. We must choose to think for ourselves not because we have to for survival, but because the friction of generating our own thought is what gives us our agency.&#xA;&#xA;In a previous post, I explored how both language and learning are movements from fuzziness to greater precision. It is possible that AI can greatly accelerate us on that journey, even as it is possible that it could greatly stymie our growth. The key is that we must first subject our fuzzy, half-formed intuitions to greater resistance until they crystallize into more precise and communicable thought. 
If we bypass this struggle, we doom ourselves to perpetual fuzziness, unable to distinguish between AI-automated slop and AI-assisted insight.&#xA;AI in Education infographic&#xA;&#xA;Postscript: How I used AI for this Post&#xA;&#xA;I use AI extensively in both my personal and professional life, and writing this post was no exception. I thought it might be helpful to illustrate some of the arguments I made above by detailing exactly how AI both posed a risk to my own agency and served to enhance it during the creation of this essay.&#xA;&#xA;I began by collecting sources. I had come across several articles and a podcast that felt connected, sensing emerging themes that related to my previous posts on LLMs. I started sketching out some initial thoughts by hand, then uploaded my sources into Google&#39;s NotebookLM.&#xA;&#xA;My first impulse was to pull on the thread of &#34;computational irreducibility.&#34; I knew there was an interesting tension in language between regularity and irregularity, so I used Deep Research to find more sources on the topic. This led me down a rabbit hole. By flooding my notebook with technical papers, the focus shifted to abstractions like Kolmogorov Complexity and NP-completeness—fascinating, but a distraction from the pedagogical argument I wanted to make. Realizing this, I had the AI summarize the concept of irreducibility and then deleted the technical source files to clear the noise.&#xA;&#xA;I then used the notebook to explore patterns between my remaining sources. Key themes began coalescing. It was here that I made a classic mistake: I asked Google Gemini to draft a blog post based on those themes.&#xA;&#xA;The result wasn&#39;t bad, but it wasn&#39;t mine. It completely missed the actual ideas that I was trying to unravel. I realized I was trying to shortcut the &#34;irreducible&#34; work of synthesis. 
To be fair to my intent at the time, I was really just interested in seeing whether the AI gave me any ideas I hadn&#39;t thought of, from a brainstorming stance. It wasn&#39;t very useful, though, so I discarded that approach, went back to my sources, and spent time thinking through the connections as I began drafting something new.&#xA;&#xA;I then began to draft the post in Joplin, which is what I now use for notes and blog drafts. I landed on the analogy of the Hollywood training montage as the way to begin, and I then pulled up Google Gemini in a split screen and began wordsmithing some of what I wanted to say. As I continued drafting, I used Gemini as an editorial support. It suggested syntactic revisions and fixed a number of misspellings. I then used it to help me expand on a half-formed conclusion, as well as to cut an extended navel-gazing section that was completely unnecessary.&#xA;&#xA;Gemini tends to oversimplify in its recommendations, however, and I didn&#39;t take all of its suggestions. I generated some images in NotebookLM based on all the sources, and also enhanced an image I had already made previously using Gemini. Finally, I did a few additional rounds of feedback with NotebookLM to reconsider my draft in relation to all the sources in my notebook, then returned to Gemini with that feedback and again went through my draft on a split screen. This additional process gave me some good suggestions for reorganizing and enhancing some of the content.&#xA;&#xA;In the end, I almost misled myself by trying to automate the thinking process too early. It was only when I returned to the &#34;gym&#34;—drafting the core ideas myself—that the AI became useful. My experience writing this confirms the barbell strategy: draft what you want to say first to build the conceptual structure, then use AI to draw that out further, and to polish and enhance it. 
Be very cautious in the mushy middle.&#xA;&#xA;#AI #LLMs #cognition #mastery #learning #education #tutoring #scaffolding #differentiation #barbell]]&gt;</description>
      <content:encoded><![CDATA[<p>In the typical Hollywood action movie, a hero acquires master-level skill in a specialized art, such as Kung Fu, in a few power ballad-backed minutes of a training montage. </p>

<p>In real life, it may seem self-evident that gaining mastery takes years of intense, deliberate, and guided work. Yet the perennial optimism of students cramming the night before an exam tells us that the pursuit of a cognitive shortcut may be an enduring human impulse.</p>

<p>It is unsurprising, then, that students—and many adults—increasingly use the swiftly advancing tools of AI and Large Language Models (LLMs) as a shortcut around deeper, more effortful cognitive work.
</p>

<h2 id="the-irreducible-nature-of-effort-and-mastery">The Irreducible Nature of Effort and Mastery</h2>

<p>In a <a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">previous post</a> in my <a href="https://languageandliteracy.blog/ai-llms-and-language">series on LLMs</a>, we briefly explored Stephen Wolfram&#39;s concept of “computational irreducibility”—the idea that certain processes cannot be shortcut: you have to run the entire process to get the result.</p>
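<p>A minimal sketch of my own (an illustration of the idea, not code from Wolfram): Rule 30, an elementary cellular automaton that is a textbook example of computational irreducibility. There is no known closed-form shortcut to its long-run behavior; to know the state at step <em>n</em>, you have to compute every intermediate step.</p>

```python
# Rule 30: each new cell is left XOR (center OR right).
# Knowing generation n requires computing generations 1..n-1 in turn;
# there is no known formula that jumps straight to the answer.

def rule30_step(cells):
    """Advance the automaton one generation (ring topology)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

def run(cells, steps):
    """Run the irreducible process: every step must actually execute."""
    for _ in range(steps):
        cells = rule30_step(cells)
    return cells

# Start from a single live cell in a 31-cell ring and run 15 generations.
row = [0] * 31
row[15] = 1
final = run(row, 15)
```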

<p>One of the provocations of LLMs has been the revelation that human language (and <a href="https://www.projectceti.org">maybe, animal language</a>?) is far more computationally reducible than we assumed. As AI advances, it demonstrates that other tasks and abilities previously thought to reside exclusively within the human province may also be more <em>computationally tractable</em> than we believed.</p>

<p>Actual learning by any human being—which we could operationally define as a discrete body of knowledge and skills internalized to automaticity—inevitably requires practice and effort. A student must replicate essential learning steps to genuinely own such knowledge. There is no shortcut to mastery.</p>

<p>That said, the great enterprise of education is to break down complex and difficult concepts and skills until they are pitched at the Goldilocks level of difficulty to <em>accelerate</em> a learner towards mastery. This is the work, as I&#39;ve <a href="https://schoolecosystem.wordpress.com/2018/03/21/the-symbiosis-between-scaffolding-and-differentiation/">explored elsewhere</a>, of <em>scaffolding</em> and <em>differentiation</em>.</p>

<p><img src="https://i.snap.as/EJz1xB8O.png" alt="Scaffolding and Differentiation"/><br/>
In <a href="https://www.dwarkesh.com/p/andrej-karpathy">a conversation on the Dwarkesh Podcast</a>, Andrej Karpathy praises the “diagnostic acumen” of a human tutor who helped him learn Korean. She could “instantly... understand where I am as a student” and “probe... my world model” to serve content precisely at his “current sliver of capability.”</p>

<p>This is <em>differentiation</em>: aligning instruction to the individual&#39;s trajectory. It requires knowing exactly where a student stands and providing the manner and time they need to progress.</p>

<p>His tutor was then able to <em>scaffold</em> his learning, providing the content-aligned steps that lead to mastery, just as recruits learn the parachute landing fall in three weeks at the army jump school in Fort Benning, <a href="https://schoolecosystem.wordpress.com/2017/06/27/scaffolding-success-criteria/">as described</a> in <em>Make It Stick.</em><br/>
<img src="https://i.snap.as/ic4chWcb.png" alt="Mastering the parachute landing fall at the army jump school."/></p>

<blockquote><p>“In my mind, education is the very difficult technical process of building ramps to knowledge. . . you have a tangle of understanding and you’re trying to lay it out in a way that creates a ramp where everything only depends on the thing before it.” — Andrej Karpathy</p></blockquote>

<p><img src="https://i.snap.as/YpAK0ejd.png" alt="Scaffolding and Differentiation"/><br/>
Crucially, neither differentiation nor scaffolding is about making learning <em>easier</em> in the sense of removing effort. They are both about ensuring the learner encounters the “desirable difficulty” necessary to move towards mastery.</p>

<p>Karpathy views a high-quality human tutor as a “high bar” for any AI tutor to meet, and though he expects that building such a tutor will take longer than many predict, he sees it as ultimately a tractable (i.e. “computationally reducible”) task. He notes that “we have machines for heavy lifting, but people still go to the gym. Education will be the same.” Just as computers can play chess better than humans, yet humans still enjoy playing chess, he imagines a future where we learn for the intrinsic joy of it, even if AI can do the thinking for us.</p>

<h2 id="the-algorithmic-turn-and-frictionless-design">The Algorithmic Turn and Frictionless Design</h2>

<p>As Carl Hendrick explored recently on <a href="https://carlhendrick.substack.com/p/the-algorithmic-turn-the-emerging/">“The Learning Dispatch,”</a> there&#39;s a possibility that teaching and learning themselves are more computationally tractable than we had assumed:</p>

<blockquote><p>“If teaching becomes demonstrably algorithmic, if learning is shown to be a process that machines can master . . . what does it mean for human expertise when the thing we most value about ourselves... turns out to be computable after all?”</p></blockquote>

<p>The problem lies in the design of most AI tools — they are designed for user-friendly efficiency and task completion. Yet such efficiency counters the friction needed for learning. The <a href="https://carlhendrick.substack.com/p/the-algorithmic-turn-the-emerging/">Harvard study</a> on AI tutoring showed promise precisely because the system was engineered to resist the natural tendency of LLMs to be maximally helpful. It was constrained to scaffold rather than solve.</p>

<p>As Hendrick notes, human pedagogical excellence does not scale well, while AI improvements can scale exponentially. If teaching is indeed computationally tractable, then a breakthrough in AI tutoring could become a reality. But even with better design for learning, unless both teachers and students wield such powerful tools effectively, we could end up in a paradoxical situation in which we have the perfect tools for learning, but no learners capable of using them.</p>

<h2 id="brain-rot-the-trap-of-the-novice">Brain Rot &amp; the Trap of the Novice</h2>

<p>The danger of AI, then, is that rather than leading us to the promised land of more learning, it may instead impair our ability—both individually and generationally—to learn over time. Rather than going to a gym to work out “for fun” or for perceived social status, many may elect to opt out of the rat race altogether. The power of AI is thus misdirected into an avoidance strategy, deflecting as much thought, effort, and care from our lives as conceivably possible.</p>

<p>The term “brain rot” describes a measurable cognitive decline when people only passively process information.</p>

<p><a href="https://www.nytimes.com/2025/11/06/technology/personaltech/ai-social-media-brain-rot.html">A study on essay writing</a> with and without ChatGPT found that “The ChatGPT users showed the lowest brain activity” and “The vast majority of ChatGPT users (83 percent) could not recall a single sentence” of the AI-generated text submitted in their name. By automating the difficult cognitive steps, the students lost ownership of the knowledge.</p>

<p>Such risk is <a href="https://write.as/manderson/reviewing-claims-ive-made-on-llms">highest for novices</a>. A novice is, by definition, someone who has yet to develop automatized internal knowledge in a domain. Whereas an expert can wield AI as a cognitive enhancement, extending their own expertise, a novice tends to use it as a cognitive shortcut, bypassing the process of learning needed to stand on their own judgment.</p>

<p>If we could plug a Matrix-style algorithm into our brains to master Kung Fu instantly, we all surely would. As consumers, we have been conditioned to expect the highest quality we can gain with minimal effort. So is it any surprise that our students are eager to take full advantage of a tool designed for the most frictionless task completion? Why think, when a free chatbot can produce output that plausibly looks like you thought about it?</p>

<p>Simas Kicinskas, in <a href="https://inexactscience.substack.com/p/university-education-as-we-know-it">University education as we know it is over</a>, details how “take-home assignments are dead . . .[because] AI now solves university assignments perfectly in minutes,” and that students use AI as a “crutch rather than as a tutor,” getting perfect answers without understanding because “AI makes thinking optional.”</p>

<p>But really, why should we place all the burden of doing better on the shoulders of our students, when they are defaulting to what is clearly human nature?</p>

<h2 id="the-barbell-approach">The Barbell Approach</h2>

<p>Kicinskas suggests that despite the pervasive current use of AI to shortcut thinking, “Universities are uniquely positioned to become a cognitive gym, a place to train deep thinking in the age of AI.”</p>

<p>He proposes “a barbell strategy: pure fundamentals (no AI) on one end, full-on AI projects on the other, with no mushy middle. . . [because] you need cognitive friction to train your mental muscles.”</p>

<p><img src="https://i.snap.as/p5oDnmkS.png" alt="Barbell strategy"/></p>

<p>The NY Times article highlighted a similar dynamic in that MIT study cited earlier: students who initially used only their brains to write drafts recorded the highest brain activity once they were allowed to use ChatGPT later. Students who started with ChatGPT never reached parity with the former group.</p>

<blockquote><p>“The students who had originally relied only on their brains recorded the highest brain activity once they were allowed to use ChatGPT. The students who had initially used ChatGPT, on the other hand, were never on a par with the former group when they were restricted to using their brains, Dr. Kosmyna said.”</p></blockquote>

<p>In other words, AI can <em>enhance</em> our abilities, but only after we have already put in the cognitive effort and work for a first draft.</p>

<p>So Kicinskas is onto something with the barbell strategy. We start with real learning: learning that requires desirable difficulty, friction, and effort, pitched at the right level for where the learner is at that moment, so that they gain greater fluency with that concept or skill.</p>

<p>Once some level of ability and knowledge has been acquired (determined by the <a href="https://schoolecosystem.wordpress.com/2017/06/27/scaffolding-success-criteria/"><em>success criteria</em></a> set for that particular task, course, subject, and domain), adding AI can accelerate and enhance the exploration of that problem space.</p>

<h2 id="using-ai-for-cognitive-lift-rather-than-cognitive-crutch">Using AI for Cognitive Lift, Rather than Cognitive Crutch</h2>

<p>We must therefore design and use AI in closer alignment with the “barbell” strategy.</p>

<p>At the beginning of a student&#39;s journey, or at the beginning of the development of our own individual products, we need to double down on the fundamentals. We must carve out that space for independent thought as well as for the analog and social interaction we require to gain new insights. This is how we build <a href="https://languageandliteracy.blog/the-inner-scaffold-for-language-and-literacy">the inner scaffold</a> required for true expertise.</p>

<p>On the other side of the barbell, we can more enthusiastically embrace the capacity of AI to <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">scale our ability for processing and communicating information</a>. Once we have done the heavy lifting to clarify our thinking, we can use these tools to extend our reach and traverse vast landscapes of data.</p>

<p>The danger lies in that “mushy middle,” wherein we can all too easily follow the path of least resistance and allow others, including AI, to do all our thinking for us, taking our attention away from our own goals. We must choose to think for ourselves not because we have to for survival, but because the friction of generating our own thought is what gives us our agency.</p>

<p>In <a href="https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">a previous post,</a> I explored how both language and learning are movements from fuzziness to greater precision. It is possible that AI can greatly accelerate us on that journey, even as it is possible that it could greatly stymie our growth. The key is that we must first subject our fuzzy, half-formed intuitions to greater resistance until they crystallize into more precise and communicable thought. If we bypass this struggle, we doom ourselves to perpetual fuzziness, unable to distinguish between AI-automated slop and AI-assisted insight.
<img src="https://i.snap.as/ZDvFXq43.png" alt="AI in Education infographic"/></p>

<h3 id="postscript-how-i-used-ai-for-this-post">Postscript: How I used AI for this Post</h3>

<p>I use AI extensively in both my personal and professional life, and writing this post was no exception. I thought it might be helpful to illustrate some of the arguments I made above by detailing exactly how AI both posed a risk to my own agency and served to enhance it during the creation of this essay.</p>

<p>I began by collecting sources. I had come across several articles and a podcast that felt connected, sensing emerging themes that related to my previous posts on LLMs. I started sketching out some initial thoughts by hand, then uploaded my sources into Google&#39;s NotebookLM.</p>

<p>My first impulse was to pull on the thread of “computational irreducibility.” I knew there was an interesting tension in language between regularity and irregularity, so I used Deep Research to find more sources on the topic. This led me down a rabbit hole. By flooding my notebook with technical papers, the focus shifted to abstractions like Kolmogorov Complexity and NP-completeness—fascinating, but a distraction from the pedagogical argument I wanted to make. Realizing this, I had the AI summarize the concept of irreducibility and then deleted the technical source files to clear the noise.</p>

<p>I then used the notebook to explore patterns between my remaining sources. Key themes began coalescing. It was here that I made a classic mistake: I asked Google Gemini to draft a blog post based on those themes.</p>

<p>The result wasn&#39;t bad, but it wasn&#39;t mine. It completely missed the actual ideas that I was trying to unravel. I realized I was trying to shortcut the “irreducible” work of synthesis. To be fair to my intent at the time, I was really just interested in seeing whether the AI gave me any ideas I hadn&#39;t thought of, from a brainstorming stance. It wasn&#39;t very useful, though, so I discarded that approach, went back to my sources, and spent time thinking through the connections as I began drafting something new.</p>

<p>I then began to draft the post in Joplin, which is what I now use for notes and blog drafts. I landed on the analogy of the Hollywood training montage as the way to begin, and I then pulled up Google Gemini in a split screen and began wordsmithing some of what I wanted to say. As I continued drafting, I used Gemini as an editorial support. It suggested syntactic revisions and fixed a number of misspellings. I then used it to help me expand on a half-formed conclusion, as well as to cut an extended navel-gazing section that was completely unnecessary.</p>

<p>Gemini tends to oversimplify in its recommendations, however, and I didn&#39;t take all of its suggestions. I generated some images in NotebookLM based on all the sources, and also used Gemini to enhance an image I had made previously. Finally, I ran a few additional rounds of feedback: I used NotebookLM to reconsider my draft in relation to all the sources in my notebook, then brought that feedback into Gemini and again went through my draft on a split screen. This additional process yielded some good suggestions for reorganizing and enhancing the content.</p>

<p>In the end, I almost misled myself by trying to automate the thinking process too early. It was only when I returned to the “gym”—drafting the core ideas myself—that the AI became useful. My experience writing this confirms the barbell strategy: draft what you want to say first to build the conceptual structure, then use AI to draw that out further, and to polish and enhance it. Be very cautious in the mushy middle.</p>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:mastery" class="hashtag"><span>#</span><span class="p-category">mastery</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:education" class="hashtag"><span>#</span><span class="p-category">education</span></a> <a href="https://languageandliteracy.blog/tag:tutoring" class="hashtag"><span>#</span><span class="p-category">tutoring</span></a> <a href="https://languageandliteracy.blog/tag:scaffolding" class="hashtag"><span>#</span><span class="p-category">scaffolding</span></a> <a href="https://languageandliteracy.blog/tag:differentiation" class="hashtag"><span>#</span><span class="p-category">differentiation</span></a> <a href="https://languageandliteracy.blog/tag:barbell" class="hashtag"><span>#</span><span class="p-category">barbell</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/ai-mastery-and-the-barbell-of-cognitive-enhancement</guid>
      <pubDate>Mon, 15 Dec 2025 04:00:35 +0000</pubDate>
    </item>
    <item>
      <title>Reviewing Claims I’ve Made on LLMs</title>
      <link>https://languageandliteracy.blog/reviewing-claims-ive-made-on-llms?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[Novice bunny and expert bunny on bikes&#xA;When I typically begin a series of blogs to conduct nerdy inquiry into an abstract topic, I don&#39;t generally know where I&#39;m going to end up. This series on LLMs was unusual in that in our first post, I outlined pretty much the exact topics I would go on to cover.&#xA;&#xA;Here&#39;s where I had spitballed we might go:&#xA;&#xA;The surprisingly inseparable interconnection between form and meaning&#xA;Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness&#xA;The human (and now, machine) capacity for learning and using language may simply be a matter of scale&#xA;Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said?&#xA;Implicit vs. explicit learning of language and literacy&#xA;&#xA;Indeed, we then went on to explore each of these areas, in that order. Cool!&#xA;!--more--&#xA;&#xA;Some Hypotheses from This Series&#xA;&#xA;What theories have we raised through this exploration?&#xA;&#xA;1) LLMs gain their uncanny powers from the statistical nature of language itself; &#xA;2) the meaning and experiences of our world are more deeply entwined with the form and structure of our language than we previously imagined; &#xA;3) LLMs may offer us an opportunity to further the convergence between human and machine language; &#xA;4) AI can potentially extend our cognitive abilities, enabling us to process and understand far more information;&#xA;5) Both human and machine learning progresses from fuzzy, imprecise representations to higher precision, and the greater the precision, the greater the effort and practice (or “compute”) that is required; and &#xA;6) LLMs challenge Chomsykan notions of innateness and suggest that implicit, statistical learning alone can lead to gaining the grammatical structure and meaning of a language.&#xA;&#xA;While I’ve been mostly positive and 
excited about the potential of AI (aside from pointing out how it is accelerating the looming ecological catastrophe that seems to be our trajectory) I should probably pause here to acknowledge that there may be important counterpoints to many of these (perhaps somewhat starry-eyed) hypotheses. &#xA;&#xA;Onto the Counterclaims&#xA;&#xA;Let&#39;s take a more critical look at some of my claims:&#xA;&#xA;1) I claim that language is fundamental to the generative powers of LLMs. &#xA;&#xA;Yet Andrej Karpathy, who is no stranger to LLM development, tweeted: &#xA;&#xA;  It&#39;s a bit sad and confusing that LLMs (&#34;Large Language Models&#34;) have little to do with language; It&#39;s just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.&#xA;&#xA;  They don&#39;t care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can &#34;throw an LLM at it.&#34;&#xA;&#xA;I agree that LLMs are performing “statistical modeling of token streams,” and that “for any arbitrary vocabulary of some set of discrete tokens, you can ‘throw an LLM at it.’” &#xA;&#xA;We now have multimodal LLMs that are modeling out of token streams of audio, visual, and text, and will no doubt have ones feeding from additional streams of sensory data as they are increasingly paired with cameras on humans, objects, and robots.&#xA;&#xA;Yet I also think Karpathy undersells that when LLMs suddenly exploded into general public awareness and fascination, it was merely a “historical” fact that they were trained upon vast amounts of human generated text and were able to reproduce and generate human language. 
As we’ve explored in this and a previous series, there is something about human language itself uniquely adapted for our brain circuitry and the propagation of our culture within social interaction in our world. And being able to communicate with a powerful computational model through the medium of conversational human language has been a revolutionary advent. We are just in the beginning stages of grokking it.&#xA;&#xA;As I tweeted in response to Karpathy, token streams may be applied to anything, but human language seems to be uniquely suited to the advancement of combined human and machine learning. Not only because we rely on it for communication – but furthermore due to the algebraic and statistical nature of our language.&#xA;&#xA;Recent case in point: the viral attention currently on NotebookLM’s Audio Overview. Listening to a conversation, however artificial, resonates with us, because that&#39;s what&#39;s in our social nature. And, surprisingly, it does a fairly good job of surfacing information from across multiple multimodal sources (and soon, across languages) that we find interesting, relevant, and meaningful.&#xA;&#xA;Speaking of NotebookLM Audio Overview. . . here’s one derived from all the blog posts (except this one) from this series, as well as the sources–outlined in post 1–that inspired them all: https://notebooklm.google.com/notebook/a4f35399-e288-4293-b2d2-0489e6b1f037/audio&#xA;&#xA;4) I claim there is great potential for AI to extend our cognitive capabilities&#xA;&#xA;Yet there is a strong case of an equal and commensurate danger that use of LLMs can reduce our cognitive capabilities.&#xA;&#xA;Learning more formal content and skills, like what we learn in school or in a job, requires deliberate effort until we develop an unconscious fluency. 
If students learning new concepts and skills externally automate their practice of new learning (such as writing or math) to an LLM, then they will not–ironically–be able to develop the automatized internal knowledge and capacity they need to wield powerful tools like AI more effectively.&#xA;&#xA;When “experts” use tools like AI, they know where the gaps are in the output and are able to use it strategically to enhance their own production and output. A few examples of this:&#xA;&#xA;Simon Willison, a programmer who is also a great communicator, uses different LLMs to support his projects, and writes and speaks about how he does so. Here’s a podcast, for example, where he explains how he uses them.&#xA;&#xA;Nicholas Carlini, a research scientist at Google DeepMind, similarly wrote about how he uses AI to support his work.  &#xA;&#xA;Cal Newport, who writes extensively about how to do “deep work” in a world of distractions, recently wrote in The New Yorker how he has found ChatGPT useful to his writing.&#xA;&#xA;All the people above are highly skilled at what they do – so when they explore and then figure out how to use AI to support their work, they do so in a way that does not diminish their own hard-earned ability, but rather enhances and extends their capabilities.&#xA;&#xA;On the other hand, for students–who are by definition novices in the skills and knowledge they are learning–an over-reliance on AI tools may limit their ability to develop skills such as literacy, critical thinking, problem-solving, and creativity. 
&#xA;&#xA;Recent reports on AI in education, such as from Cognitive Resonance, Center for American Progress, and Bellwether, have rightfully raised this concern.&#xA;&#xA;And all educators, whether K-12 or in higher ed, are seeing an increasing use of AI by students to complete homework assignments, so this danger of truncating the development of internal capacity is real.&#xA;&#xA;I think the steps we can to take to address this are two-fold:&#xA;&#xA;limit the use of digital technology for learners at the earliest stages of learning, whether learners are preK-3 or learners being introduced to a new concept&#xA;&#xA;move practice of essential skills directly into the classroom as much as possible, while considering how AI could be used to extend, rather than diminish, any practice and feedback outside of the classroom&#xA;&#xA;In a post on ethical use of AI, Jacob Kaplan-Moss argues that fully automated AI is unethical in the public sector due to its inherent biases and potential for unfairness in high-stakes situations. In contrast, the assistive use of AI can enhance human decision-making.&#xA;&#xA;This assistive vs. automated use of AI may be a useful frame for thinking of how AI can be used most ethically and effectively in education. We want AI to be used to assist the learning process, rather than simply automating the solving of math problems or writing essays. This view aligns with Ethan Mollick’s idea of “co-intelligence” as well.&#xA;&#xA;So far, I find the most powerful and interesting assistive applications for AI are more focused on educators (“the experts”), rather than on students (“the novices”). Teachers can leverage AI to support administrative tasks, analysis of student data, and consider additional enhancements of their instruction based on student data.&#xA;&#xA;That said, I don’t think the assistive use cases of AI are only limited to “experts” in a domain. 
AI can also help to equip those without knowledge and expertise in a specific area with the language they need to navigate learning or real-world communications more effectively. And there are some really interesting use cases of AI for feedback on student thinking and writing, when structured with specific guidelines and criteria and with the teacher in the loop.&#xA;&#xA;But in the context of classroom learning, such uses must be very strategically designed and cautiously incorporated. For example, see this explanation from professor Michael Brenner on how he has begun incorporating AI into his pedagogy. But note this example is from a graduate level math class, so again, that novice vs. expert dynamic is quite different from what we would need to consider at a preK-8 level. But even at that graduate level, you can see there is quite a bit of complexity the instructor needed to consider and think through to design his course to leverage LLMs so strategically.&#xA;&#xA;There’s a lot more to unpack here on all sides of the equation. I’ll leave this one here for now, accepting non-closure, and I hope to dig further into these tensions and opportunities in both this space and in my professional work.&#xA;&#xA;6) I claim that LLMs have shown that language can be learned without any innate programming or structure – therefore demonstrating the power of statistical, implicit learning&#xA;&#xA;I’d moved into the “Chomsky is wrong” camp for a while now, but I happened to listen to an interview of Jean-Rémi King recently, a scientist at Meta AI, by Stephen Wilson on The Language Neuroscience podcast (did I tell you I’m a nerd?). 
Towards the end of the conversation, King warns against writing off Chomsky too readily, and that there is something intrinsic to the human brain in its readiness for language.&#xA;&#xA;I uploaded the relevant portion of the transcript from the interview, and asked Claude AI for a concise summary of King&#39;s main claims, which it willingly obliged (while I’m sure it drew upon an unconscionable amount of energy):&#xA;&#xA;  King argues that human brains likely don&#39;t use the same &#34;next word prediction&#34; principle as large language models for language acquisition, primarily because humans are exposed to far less linguistic data than these models. &#xA;&#xA;  He contends that while language models have shown impressive capabilities, they are extremely inefficient compared to human language learning, suggesting that we&#39;re missing some fundamental principles of how humans acquire language so efficiently.&#xA;&#xA;While I think I’ve tried to temper most of my pronouncements throughout this series, I think it’s important to acknowledge that the fact that LLMs can learn language from statistical associations of word tokens alone does not mean that is exactly how we humans must also learn language.&#xA;&#xA;It is rather a proof of concept that language can be learned in this way (without any innate grammar or teaching of rules). But as King points out, this is via a scale of input that is ridiculously and exponentially larger than that of any child.&#xA;&#xA;That said, there are other Artificial Neural Networks (ANNs), such as in the research of Gašper Beguš, that learn from raw speech in an unsupervised manner, more closely mimicking human language acquisition. His lab has found interesting similarities between these ANNs and the human brain in processing language sounds – a parallel to King’s own research, which has found that LLM models can generate brain-like representations when predicting words from context. 
&#xA;&#xA;And there will continue to be research into tinier models trained off sparser, and potentially richer, data.&#xA;&#xA;But as King points to, there’s just so much more we need to learn. And this is exactly where I find all of this the most exciting.&#xA;&#xA;Where I may be most rightfully critiqued in my last post, and perhaps in other posts, may be in extrapolating from the theoretical demonstration of LLMs to implications for classrooms. &#xA;&#xA;So let me state my position a bit more clearly in case there was any confusion that I am falling onto the side of the Goodmans or something. Children need consistency, stability, clarity, and coherency in their learning experiences, and teaching what is most important to know for a given subject directly and explicitly is critical. For children at the earliest stages of learning abstract skills and content, such as learning to read, explicit and well-structured teaching is essential. At the same time, however, we need to ensure that students have abundant structured opportunities to apply and practice what they are learning – and this is where ensuring they are spending more time reading, writing, and talking–connected to the content of what we are teaching–is essential.&#xA;&#xA;If you have more critiques that I am missing in any of the above, please do share!&#xA;&#xA;Egads, I think I may actually have ANOTHER post left in me after all of this. Who knew LLMs would be such an interesting topic?!&#xA;&#xA;#language #literacy #AI #LLMs #cognition #research #computation #models&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/2AqWHRP3.jpeg" alt="Novice bunny and expert bunny on bikes"/>
When I begin a series of blogs to conduct nerdy inquiry into an abstract topic, I generally don&#39;t know where I&#39;m going to end up. This series on LLMs was unusual in that in <a href="https://languageandliteracy.blog/language-and-llms">our first post</a>, I outlined pretty much the exact topics I would go on to cover.</p>

<p>Here&#39;s where I had spitballed we might go:</p>
<ul><li>The surprisingly inseparable interconnection between form and meaning</li>
<li>Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness</li>
<li>The human (and now, machine) capacity for learning and using language may simply be a matter of scale</li>
<li>Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said?</li>
<li>Implicit vs. explicit learning of language and literacy</li></ul>

<p>Indeed, we then went on to explore each of these areas, in that order. Cool!
</p>

<h2 id="some-hypotheses-from-this-series">Some Hypotheses from This Series</h2>

<p>What theories have we raised through this exploration?</p>

<p>1) LLMs gain their uncanny powers from <a href="https://languageandliteracy.blog/language-and-llms">the statistical nature of language itself</a>;
2) the meaning and experiences of our world are <a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">more deeply entwined with the form and structure</a> of our language than we previously imagined;
3) LLMs may offer us an opportunity to further the <a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">convergence between human and machine language</a>;
4) AI can potentially <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">extend our cognitive abilities</a>, enabling us to process and understand far more information;
5) Both human and machine learning progress <a href="https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">from fuzzy, imprecise representations to higher precision</a>, and the greater the precision, the greater the effort and practice (or “compute”) that is required; and
6) LLMs challenge Chomskyan notions of innateness and suggest that <a href="https://write.as/manderson/llms-statistical-learning-and-explicit-teaching">implicit, statistical learning</a> alone can lead to gaining the grammatical structure and meaning of a language.</p>

<p>While I’ve been mostly positive and excited about the potential of AI (aside from pointing out how it is <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">accelerating the looming ecological catastrophe</a> that seems to be our trajectory), I should probably pause here to acknowledge that there may be important counterpoints to many of these (perhaps somewhat starry-eyed) hypotheses.</p>

<h2 id="onto-the-counterclaims">Onto the Counterclaims</h2>

<p>Let&#39;s take a more critical look at some of my claims:</p>

<h3 id="1-i-claim-that-language-is-fundamental-to-the-generative-powers-of-llms">1) I claim that language is fundamental to the generative powers of LLMs.</h3>

<p>Yet Andrej Karpathy, who is no stranger to LLM development, <a href="https://x.com/karpathy/status/1835024197506187617">tweeted</a>:</p>

<blockquote><p>It&#39;s a bit sad and confusing that LLMs (“Large Language Models”) have little to do with language; It&#39;s just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.</p>

<p>They don&#39;t care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can “throw an LLM at it.”</p></blockquote>

<p>I agree that LLMs are performing “statistical modeling of token streams,” and that “for any arbitrary vocabulary of some set of discrete tokens, you can ‘throw an LLM at it.’”</p>

<p>We now have multimodal LLMs modeling token streams of audio, images, and text, and we will no doubt have models feeding from additional streams of sensory data as they are increasingly paired with cameras on humans, objects, and robots.</p>
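<p>Karpathy&#39;s point that the model is agnostic to what its tokens represent can be made concrete with a toy sketch (not how production LLMs work, which use learned transformer weights rather than raw counts): even a bare-bones bigram model predicts the next token in any discrete stream, whether the tokens are words, chess moves, or anything else. The data below is invented for illustration.</p>

```python
from collections import Counter, defaultdict

def fit_bigram(stream):
    """Count, for each token, how often each successor token follows it."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(stream, stream[1:]):
        counts[cur][nxt] += 1
    return counts

def predict(counts, token):
    """Return the most frequent successor of `token`, or None if unseen."""
    successors = counts.get(token)
    return successors.most_common(1)[0][0] if successors else None

# The model doesn't care what the tokens "mean":
words = "the cat sat on the mat the cat ran".split()
word_model = fit_bigram(words)
print(predict(word_model, "the"))  # "cat" (follows "the" twice, vs "mat" once)

moves = ["e4", "e5", "Nf3", "Nc6", "e4", "e5", "Nf3"]
print(predict(fit_bigram(moves), "e5"))  # "Nf3"
```

<p>The same few lines model both vocabularies; all that changes is the stream. That is Karpathy&#39;s “throw an LLM at it” in miniature.</p>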

<p>Yet I also think Karpathy undersells the point: when LLMs suddenly exploded into general public awareness and fascination, it was not merely a “historical” accident that they had been trained upon vast amounts of human-generated text and were able to reproduce and generate human language. As we’ve explored in this and <a href="https://languageandliteracy.blog/innate-vs">a previous series</a>, there is something about human language itself uniquely adapted to our brain circuitry and to the propagation of our culture through social interaction in our world. And being able to communicate with a powerful computational model through the medium of conversational human language has been a revolutionary development. We are just in the beginning stages of grokking it.</p>

<p>As I <a href="https://x.com/mandercorn/status/1835279679650971667">tweeted in response</a> to Karpathy, token streams may be applied to anything, but human language seems to be uniquely suited to the advancement of combined human and machine learning – not only because we rely on it for communication, but also because of the <a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">algebraic and statistical nature of our language</a>.</p>

<p>Recent case in point: the viral attention currently on NotebookLM’s Audio Overview. Listening to a conversation, however artificial, resonates with us, because conversation is part of our social nature. And, surprisingly, it does a fairly good job of surfacing information from across multiple multimodal sources (and soon, across languages) that we find interesting, relevant, and meaningful.</p>

<p>Speaking of NotebookLM Audio Overview... here’s one derived from all the blog posts (except this one) from this series, as well as the sources–<a href="https://languageandliteracy.blog/language-and-llms">outlined in post 1</a>–that inspired them all: <a href="https://notebooklm.google.com/notebook/a4f35399-e288-4293-b2d2-0489e6b1f037/audio">https://notebooklm.google.com/notebook/a4f35399-e288-4293-b2d2-0489e6b1f037/audio</a></p>

<h3 id="4-i-claim-there-is-great-potential-for-ai-to-extend-our-cognitive-capabilities">4) I claim there is great potential for AI to extend our cognitive capabilities</h3>

<p>Yet there is a strong case for a commensurate danger: the use of LLMs can reduce our cognitive capabilities.</p>

<p>Learning more formal content and skills, like what we learn in school or on the job, requires deliberate effort until we develop an unconscious fluency. If students outsource the practice of new concepts and skills (such as writing or math) to an LLM, then they will not–ironically–be able to develop the automatized internal knowledge and capacity they need to wield powerful tools like AI effectively.</p>

<p>When “experts” use tools like AI, they know where the gaps are in the output and are able to use it strategically to enhance their own work. A few examples:</p>
<ul><li><p>Simon Willison, a programmer who is <a href="https://simonwillison.net/">also a great communicator</a>, uses different LLMs to support his projects, and writes and speaks about how he does so. <a href="https://newsletter.pragmaticengineer.com/p/ai-tools-for-software-engineers-simon-willison">Here’s a podcast</a>, for example, where he explains how he uses them.</p></li>

<li><p>Nicholas Carlini, a research scientist at Google DeepMind, similarly <a href="https://nicholas.carlini.com/writing/2024/how-i-use-ai.html">wrote about how he uses AI</a> to support his work.</p></li>

<li><p>Cal Newport, who writes extensively about how to do “deep work” in a world of distractions, recently <a href="https://www.newyorker.com/culture/annals-of-inquiry/what-kind-of-writer-is-chatgpt">wrote in The New Yorker</a> how he has found ChatGPT useful to his writing.</p></li></ul>

<p>All the people above are highly skilled at what they do – so when they explore and then figure out how to use AI to support their work, they do so in a way that does not diminish their own hard-earned ability, but rather enhances and extends their capabilities.</p>

<p>On the other hand, for students–who are by definition <strong>novices</strong> in the skills and knowledge they are learning–an over-reliance on AI tools may limit their ability to develop skills such as literacy, critical thinking, problem-solving, and creativity.</p>

<p>Recent reports on AI in education, such as from <a href="https://cognitiveresonance.net/resources.html">Cognitive Resonance</a>, <a href="https://www.americanprogress.org/article/using-learning-science-to-analyze-the-risks-and-benefits-of-ai-in-k-12-education/">Center for American Progress</a>, and <a href="https://bellwether.org/publications/learning-systems/">Bellwether</a>, have rightfully raised this concern.</p>

<p>And all educators, whether K-12 or in higher ed, are seeing an increasing use of AI by students to complete homework assignments, so this danger of truncating the development of internal capacity is real.</p>

<p>I think the steps we can take to address this are twofold:</p>
<ul><li><p>limit the use of digital technology for learners at the earliest stages of learning, whether they are in preK-3 or being introduced to a new concept</p></li>

<li><p>move practice of essential skills directly into the classroom as much as possible, while considering how AI could be used to extend, rather than diminish, any practice and feedback outside of the classroom</p></li></ul>

<p>In <a href="https://jacobian.org/2024/oct/1/ethical-public-sector-ai/">a post on ethical use of AI</a>, Jacob Kaplan-Moss argues that fully automated AI is unethical in the public sector due to its inherent biases and potential for unfairness in high-stakes situations. In contrast, the assistive use of AI can enhance human decision-making.</p>

<p>This assistive vs. automated use of AI may be a useful frame for thinking of how AI can be used most ethically and effectively in education. We want AI to be used to assist the learning process, rather than simply automating the solving of math problems or writing essays. This view aligns with <a href="https://english.elpais.com/technology/2024-10-03/ethan-mollick-analyst-students-who-use-ai-as-a-crutch-dont-learn-anything.html">Ethan Mollick’s idea of “co-intelligence”</a> as well.</p>

<p>So far, I find the most powerful and interesting assistive applications of AI are focused more on educators (“the experts”) than on students (“the novices”). Teachers can leverage AI to support administrative tasks, analyze student data, and consider additional enhancements to their instruction based on that data.</p>

<p>That said, I don’t think the assistive use cases of AI are limited only to “experts” in a domain. AI can also help equip those without knowledge and expertise in a specific area with the language they need to navigate learning or real-world communications more effectively. And there are some really interesting use cases of AI for feedback on student thinking and writing, when structured with specific guidelines and criteria and with the teacher in the loop.</p>

<p>But in the context of classroom learning, such uses must be very strategically designed and cautiously incorporated. For example, <a href="https://x.com/SebastienBubeck/status/1829701643925151757">see this explanation from professor Michael Brenner</a> on how he has begun incorporating AI into his pedagogy. Note, though, that this example is from a graduate-level math class, so the novice vs. expert dynamic is quite different from what we would need to consider at a preK-8 level. Even at the graduate level, you can see how much complexity the instructor needed to think through to design his course to leverage LLMs so strategically.</p>

<p>There’s a lot more to unpack here on all sides of the equation. I’ll leave this one here for now, accepting non-closure, and I hope to dig further into these tensions and opportunities in both this space and in my professional work.</p>

<h3 id="6-i-claim-that-llms-have-shown-that-language-can-be-learned-without-any-innate-programming-or-structure-therefore-demonstrating-the-power-of-statistical-implicit-learning">6) I claim that LLMs have shown that language can be learned without any innate programming or structure – therefore demonstrating the power of statistical, implicit learning</h3>

<p>I’d been in the “Chomsky is wrong” camp for a while now, but I happened to listen to a recent interview of Jean-Rémi King, a scientist at Meta AI, by Stephen Wilson on <a href="https://langneurosci.org/podcast/#ep27">The Language Neuroscience podcast</a> (did I tell you I’m a nerd?). Towards the end of the conversation, King warns against writing off Chomsky too readily, suggesting there is something intrinsic to the human brain in its readiness for language.</p>

<p>I uploaded the relevant portion of the transcript from the interview and asked Claude AI for a concise summary of King&#39;s main claims, which it willingly provided (while, I’m sure, drawing upon an unconscionable amount of energy):</p>

<blockquote><p>King argues that human brains likely don&#39;t use the same “next word prediction” principle as large language models for language acquisition, primarily because humans are exposed to far less linguistic data than these models.</p>

<p>He contends that while language models have shown impressive capabilities, they are extremely inefficient compared to human language learning, suggesting that we&#39;re missing some fundamental principles of how humans acquire language so efficiently.</p></blockquote>

<p>While I’ve tried to temper most of my pronouncements throughout this series, it’s important to acknowledge that the fact that LLMs can learn language from statistical associations of word tokens alone does not mean that is exactly how we humans must also learn language.</p>

<p>It is rather a proof of concept that language <em>can</em> be learned in this way (without any innate grammar or teaching of rules). But as King points out, this is achieved via a scale of input that is orders of magnitude larger than that of any child.</p>
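<p>To make that scale gap concrete, here is a back-of-envelope comparison. The figures are rough, illustrative assumptions (how many words a child might hear per day, the token count of a frontier training set), not measurements:</p>

```python
# Back-of-envelope comparison (illustrative, assumed figures): how much
# language a child hears versus what a frontier LLM is trained on.
words_per_day = 15_000                    # assumed words heard by a child per day
child_words = words_per_day * 365 * 13    # by age 13: roughly 70 million words

llm_tokens = 10 ** 13                     # assumed frontier-scale training set (~10 trillion tokens)

ratio = llm_tokens / child_words
print(f"Child input by age 13: ~{child_words / 1e6:.0f}M words")
print(f"LLM training data is roughly {ratio:,.0f}x larger")
```

<p>Even if every figure here is off by a factor of a few, the gap remains several orders of magnitude – which is King’s point about efficiency.</p>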

<p>That said, there are other Artificial Neural Networks (ANNs), such as in <a href="https://medium.com/@begus.gasper/artificial-and-biological-intelligence-humans-animals-and-machines-142bc3c4b304">the research of Gašper Beguš</a>, that learn from raw speech in an unsupervised manner, more closely mimicking human language acquisition. His lab has found interesting similarities between these ANNs and the human brain in processing language sounds – a parallel to King’s own research, which has found that LLMs can generate brain-like representations when predicting words from context.</p>

<p>And there will continue to be research into tinier models trained on sparser, and potentially richer, data.</p>

<p>But as King points out, there’s just so much more we need to learn. And this is exactly where I find all of this the most exciting.</p>

<p>Where I may be most rightly critiqued <a href="https://write.as/manderson/llms-statistical-learning-and-explicit-teaching">in my last post</a>, and perhaps in other posts, is in extrapolating from the theoretical demonstration of LLMs to implications for classrooms.</p>

<p>So let me state my position a bit more clearly, in case there was any confusion that I am falling onto the side of <a href="https://write.as/manderson/learning-to-read-an-unnatural-act">the Goodmans</a> or something. Children need consistency, stability, clarity, and coherence in their learning experiences, and directly and explicitly teaching what is most important to know for a given subject is critical. For children at the earliest stages of learning abstract skills and content, such as learning to read, explicit and well-structured teaching is essential. At the same time, however, we need to ensure that students have abundant structured opportunities to apply and practice what they are learning – and this is where ensuring they spend more time reading, writing, and talking – connected to the content of what we are teaching – is essential.</p>

<p>If you have more critiques that I am missing in any of the above, please do share!</p>

<p>Egads, I think I may actually have ANOTHER post left in me after all of this. Who knew LLMs would be such an interesting topic?!</p>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:research" class="hashtag"><span>#</span><span class="p-category">research</span></a> <a href="https://languageandliteracy.blog/tag:computation" class="hashtag"><span>#</span><span class="p-category">computation</span></a> <a href="https://languageandliteracy.blog/tag:models" class="hashtag"><span>#</span><span class="p-category">models</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/reviewing-claims-ive-made-on-llms</guid>
      <pubDate>Mon, 07 Oct 2024 00:10:15 +0000</pubDate>
    </item>
    <item>
      <title>LLMs, Statistical Learning, and Explicit Teaching</title>
      <link>https://languageandliteracy.blog/llms-statistical-learning-and-explicit-teaching?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[NYC skyline&#xA;&#xA;The Surprising Success of Large Language Models&#xA;&#xA;  “The success of large language models is the biggest surprise in my intellectual life. We learned that a lot of what we used to believe may be false and what I used to believe may be false. I used to really accept, to a large degree, the Chomskyan argument that the structures of language are too complex and not manifest in input so that you need to have innate machinery to learn them. You need to have a language module or language instinct, and it’s impossible to learn them simply by observing statistics in the environment.&#xA;&#xA;  If it’s true — and I think it is true — that the LLMs learn language through statistical analysis, this shows the Chomskyan view is wrong. This shows that, at least in theory, it’s possible to learn languages just by observing a billion tokens of language.”&#xA;&#xA;  –Paul Bloom, in an interview with Tyler Cowen&#xA;!--more--&#xA;&#xA;Challenging the Hypothesis of Innateness&#xA;&#xA;For decades, the Chomskyan view has dominated our understanding of language development. This view argues that language structures are too complex to be learned solely from environmental input and therefore must require some kind of innate linguistic machinery in the brain (a “universal grammar”).&#xA;&#xA;Yet as the quote above from Paul Bloom makes explicit, what LLMs have demonstrated–as a proof of concept–is that grammatical structures for language does not need to be innate. That machines can learn language via statistical associations alone(https://languageandliteracy.blog/ai-llms-and-language)), rather than explicitly programmed grammatical rules.&#xA;&#xA;We have explored in a previous series on this blog the idea that language may not be a completely innate property of our brains, but rather more of a cultural phenomenon. This parallels the insight–much more widely accepted now–that learning to read is not innate. 
&#xA;&#xA;The success of LLMs in acquiring language-like abilities through mere statistical analysis of texts demonstrates that it&#39;s possible to learn languages via statistical associations alone.&#xA;&#xA;The Power of Statistical Learning&#xA;&#xA;This revelation–that LLMs can learn language via statistical associations alone, rather than through any explicitly programmed rules–challenges our traditional understanding of language development and points to the power of implicit statistical learning.&#xA;&#xA;However, unlike human children, who can rapidly learn language from relatively sparse input, current frontier LLMs require astronomical amounts of data to be trained. Yet the fact that machines can learn in this way suggests that the structure of language itself lends itself to such implicit learning.&#xA;&#xA;This insight extends beyond language development and into literacy. We have previously examined seminal papers by Philip Gough and co arguing that learning to read words is more akin to learning a cipher than breaking a code. Rather than learning explicit rules, as from a codebook, we internalize patterns of sounds, letters, and meanings in an algorithmic fashion.&#xA;&#xA;There is a fascinating line of research focused on “statistical learning,” and while there remains much to be learned about this domain, there seems to be an interesting convergence between this research as it relates to reading and as it relates to LLMs.&#xA;&#xA;Reading nerds are already well acquainted with Mark Seidenberg, as he is a steady presence in the public sphere of communication and debates about reading instruction. What may be somewhat less known about him is that his oeuvre of research has been into computational, connectionist models of reading that have demonstrated how learning to read is a process of statistical learning between sounds, spelling, and meaning. 
It’s not that he hides this, by the way, but rather that the community of educators that are deep into the “science of reading” stuff don’t seem to be as enticed by abstract stuff like computational models and statistical learning.&#xA;&#xA;But the convergence between connectionist accounts of learning language and learning to read and the advent of LLMs are important to understand. Not just from a nerdy stance, which has been mine throughout all these posts, but rather because LLMs have–again, as a proof of concept–demonstrated that implicit learning of statistical associations are fundamental not only to language and to reading, but to our knowledge and experience of the world.&#xA;&#xA;Connectionist Models: Bridging AI and Human Learning&#xA;&#xA;In fact, Seidenberg himself has repeatedly attempted to communicate the understanding that implicit statistical learning is just as fundamental to learning to read as it is to learning language. &#xA;&#xA;He stirred up some recent controversy on this topic when he suggested that the “SOR” movement has over-corrected in response to previous squishy balanced literacy approaches by focusing too hard on explicit instruction as the cure-all for everything. See his provocative presentation and writing on this topic here: https://seidenbergreading.net/2024/06/24/where-does-the-science-of-reading-go-from-here-2/&#xA;&#xA;To summarize his argument, which dovetails with where we started with LLMs, learning to read can not all be taught explicitly, and there is an opportunity cost to an over-reliance on the explicit teaching of “rules” over providing more opportunity for actual reading and writing to build up the statistical associations needed to become fluent:&#xA;&#xA;  “The purpose of explicit learning is to scaffold implicit learning about print, sound, meaning. Explicit instruction is the tip of the iceberg. 
The larger part under the surface is learned implicitly instead of teaching the whole iceberg.”&#xA;&#xA;  --slides on “Where does the Science of Reading go from here?”&#xA;&#xA;In other words – only provide enough explicit instruction as needed to successfully spend more time engaged in an increasing volume of reading, writing, and talking.&#xA;&#xA;Balancing Explicit and Implicit Learning in Language and Reading Instruction&#xA;&#xA;In a paper, “The Impact of Language Experience on Language and Reading,” Seidenberg and Maryellen MacDonald also point to the fact that learning to read is easier for children with more advanced spoken language skills, while those with less exposure (due to greater variability of linguistic input) face greater challenges. This is because children exposed to multiple dialects or languages are learning to navigate multiple language systems, each with its own set of statistical linguistic patterns.&#xA;&#xA;For multilingual and multidialectal learners, it is therefore especially critical to find the right combination of statistical learning and explicit teaching. According to the paper, consistent and increased exposure to the language of instruction is important. This exposure should be complemented by explicit teaching of both oral and written language patterns. And by explicitly comparing and contrasting  home languages and dialects with the language used at school–both orally and in writing–students can develop metalinguistic awareness and a deeper understanding of varying language structures. This approach, implemented strategically within a welcoming and supportive classroom, allows students to leverage their existing linguistic knowledge while acquiring new language skills.&#xA;&#xA;Another way of thinking about this, as we’ve explored in another post, is the movement from fuzziness to precision. By seeing, hearing, speaking, and writing an increasing volume of language, students can rapidly begin to make statistical associations. 
However, especially in the initial stages of learning a new language or learning to read, more effort will be required to gain greater precision, and thus, more mistakes will be a part of the learning process, and thus more feedback is needed to course correct at the very beginning.&#xA;&#xA;I’ve written elsewhere about the importance of striking a balance between close reading of shared grade-level texts that are worth reading, while ensuring that each and every student reads a steady volume of texts that are more accessible. I’ve also written here about the need for “daily textual feasts” to increase the volume of rich language, knowledge, and critical thinking, as per Dr. Alfred Tatum.&#xA;&#xA;Rethinking Language and Literacy Instruction&#xA;&#xA;In sum, the surprising and awesome ability of LLMs, derived from mere statistical associations, has challenged traditional assumptions about the innate nature of language and, potentially, the role of explicit and implicit instruction in language and literacy learning. &#xA;&#xA;This underscores the need for a comprehensive approach to teaching of reading and language, in which explicit teaching is strategically counterbalanced alongside implicit learning opportunities.&#xA;&#xA;#AI #learning #language #LLMs #reading #explicit #implicit&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/5DZOgOcF.jpg" alt="NYC skyline"/></p>

<h2 id="the-surprising-success-of-large-language-models">The Surprising Success of Large Language Models</h2>

<blockquote><p>“The success of large language models is the biggest surprise in my intellectual life. We learned that a lot of what we used to believe may be false and what I used to believe may be false. I used to really accept, to a large degree, the Chomskyan argument that the structures of language are too complex and not manifest in input so that you need to have innate machinery to learn them. You need to have a language module or language instinct, and it’s impossible to learn them simply by observing statistics in the environment.</p>

<p>If it’s true — and I think it is true — that the LLMs learn language through statistical analysis, this shows the Chomskyan view is wrong. This shows that, at least in theory, it’s possible to learn languages just by observing a billion tokens of language.”</p>

<p>–Paul Bloom, <a href="https://conversationswithtyler.com/episodes/paul-bloom/">in an interview with Tyler Cowen</a>
</p></blockquote>

<h2 id="challenging-the-hypothesis-of-innateness">Challenging the Hypothesis of Innateness</h2>

<p>For decades, the Chomskyan view has dominated our understanding of language development. This view argues that language structures are too complex to be learned solely from environmental input and therefore must require some kind of innate linguistic machinery in the brain (a “universal grammar”).</p>

<p>Yet as the quote above from Paul Bloom makes explicit, what LLMs have demonstrated–as a proof of concept–is that the grammatical structures of language do not <em>need</em> to be innate: machines can learn language via statistical associations alone, rather than via explicitly programmed grammatical rules.</p>

<p>We have explored in <a href="https://languageandliteracy.blog/innate-vs">a previous series on this blog</a> the idea that language may not be a completely innate property of our brains, but rather more of a cultural phenomenon. This parallels the insight–much more widely accepted now–that <a href="https://languageandliteracy.blog/natural-vs">learning to read is not innate</a>.</p>

<p>The success of LLMs in acquiring language-like abilities through mere statistical analysis of texts demonstrates that it&#39;s possible to learn languages via statistical associations alone.</p>

<h2 id="the-power-of-statistical-learning">The Power of Statistical Learning</h2>

<p>This revelation–that LLMs learn language without any explicitly programmed rules–challenges our traditional understanding of language development and points to the power of implicit statistical learning.</p>

<p>However, unlike human children, who can rapidly learn language from relatively sparse input, current frontier LLMs require astronomical amounts of data to be trained. Yet the fact that machines can learn in this way suggests that the structure of language itself lends itself to such implicit learning.</p>
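<p>To see what learning “statistical associations” means in miniature, here is a toy sketch. This is emphatically <em>not</em> how LLMs work internally–real models train neural networks over billions of tokens–but it shows how usable structure can be extracted from nothing but co-occurrence counts:</p>

```python
from collections import Counter, defaultdict

# Toy "statistical learning" of language: count which words follow which
# (bigrams) in a tiny made-up corpus, then predict continuations from
# those counts alone -- no grammar rules are ever programmed in.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# For each word, tally how often every other word follows it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the statistically most likely next word."""
    return following[word].most_common(1)[0][0]

print(predict("sat"))  # -> "on", learned purely from co-occurrence counts
```

<p>Scale this idea up from word-pair counts to neural networks over trillions of tokens and you have, in very crude outline, the statistical bet that LLMs have made good on.</p>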

<p>This insight extends beyond language development and into literacy. We have previously examined <a href="https://languageandliteracy.blog/what-does-it-take-to-internalize-the-cipher">seminal papers by Philip Gough and colleagues</a> arguing that learning to read words is more akin to learning a cipher than breaking a code. Rather than learning explicit rules, as from a codebook, we internalize patterns of sounds, letters, and meanings in an algorithmic fashion.</p>

<p>There is <a href="https://www.tandfonline.com/toc/hssr20/23/1">a fascinating line of research</a> focused on “statistical learning,” and while there remains much to be learned about this domain, there seems to be an interesting convergence between this research as it relates to reading and as it relates to LLMs.</p>

<p>Reading nerds are already well acquainted with Mark Seidenberg, as he is a steady presence in the public sphere of communication and debates about reading instruction. What may be somewhat less known about him is that <a href="https://www.researchgate.net/profile/Mark-Seidenberg">his oeuvre of research</a> has been into computational, connectionist models of reading, which have demonstrated how learning to read is a process of statistical learning between sounds, spelling, and meaning. It’s not that he hides this, by the way, but rather that the community of educators who are deep into the “science of reading” don’t seem to be as enticed by abstractions like computational models and statistical learning.</p>

<p>But the convergence between <a href="https://languageandliteracy.blog/language-like-reading-may-not-be-innate">connectionist accounts</a> of learning language and learning to read and the advent of LLMs is important to understand. Not just from a nerdy stance, which has been mine throughout all these posts, but because LLMs have–again, as a proof of concept–demonstrated that implicit learning of statistical associations is fundamental not only to language and to reading, but to our knowledge and experience of the world.</p>

<h2 id="connectionist-models-bridging-ai-and-human-learning">Connectionist Models: Bridging AI and Human Learning</h2>

<p>In fact, Seidenberg himself has repeatedly attempted to communicate the understanding that implicit statistical learning is just as fundamental to learning to read as it is to learning language.</p>

<p>He stirred up some recent controversy on this topic when he suggested that the “SOR” movement has over-corrected in response to previous squishy balanced literacy approaches by focusing too hard on explicit instruction as the cure-all for everything. See his provocative presentation and writing on this topic here: <a href="https://seidenbergreading.net/2024/06/24/where-does-the-science-of-reading-go-from-here-2/">https://seidenbergreading.net/2024/06/24/where-does-the-science-of-reading-go-from-here-2/</a></p>

<p>To summarize his argument, which dovetails with where we started with LLMs: learning to read cannot all be taught explicitly, and there is an opportunity cost to an over-reliance on the explicit teaching of “rules” over providing more opportunity for actual reading and writing to build up the statistical associations needed to become fluent:</p>

<blockquote><p>“The purpose of explicit learning is to scaffold implicit learning about print, sound, meaning. Explicit instruction is the tip of the iceberg. The larger part under the surface is learned implicitly instead of teaching the whole iceberg.”</p>

<p>—<a href="https://seidenbergreading.net/wp-content/uploads/2024/06/SSL.pdf">slides on “Where does the Science of Reading go from here?”</a></p></blockquote>

<p>In other words – provide only as much explicit instruction as is needed so that students can spend more time engaged in an increasing volume of reading, writing, and talking.</p>

<h2 id="balancing-explicit-and-implicit-learning-in-language-and-reading-instruction">Balancing Explicit and Implicit Learning in Language and Reading Instruction</h2>

<p>In a paper, <a href="https://seidenbergreading.net/wp-content/uploads/2024/06/seidenberg-macdonald-2018.pdf">“The Impact of Language Experience on Language and Reading,”</a> Seidenberg and Maryellen MacDonald also point to the fact that learning to read is easier for children with more advanced spoken language skills, while those with less exposure (due to greater variability of linguistic input) face greater challenges. This is because children exposed to multiple dialects or languages are learning to navigate multiple language systems, each with its own set of statistical linguistic patterns.</p>

<p>For multilingual and multidialectal learners, it is therefore especially critical to find the right combination of statistical learning and explicit teaching. According to the paper, consistent and increased exposure to the language of instruction is important. This exposure should be complemented by explicit teaching of both oral and written language patterns. And by explicitly comparing and contrasting home languages and dialects with the language used at school–both orally and in writing–students can develop metalinguistic awareness and a deeper understanding of varying language structures. This approach, implemented strategically within a welcoming and supportive classroom, allows students to leverage their existing linguistic knowledge while acquiring new language skills.</p>

<p>Another way of thinking about this, as we’ve explored <a href="https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language">in another post</a>, is the movement from fuzziness to precision. By seeing, hearing, speaking, and writing an increasing volume of language, students can rapidly begin to make statistical associations. However, especially in the initial stages of learning a new language or learning to read, more effort is required to gain greater precision; more mistakes will be part of the learning process, and more feedback is needed to course correct at the very beginning.</p>

<p>I’ve <a href="https://schoolecosystem.wordpress.com/2019/12/30/when-everyone-pulls-together-the-secrets-of-success-academy/">written elsewhere</a> about the importance of striking a balance between close reading of shared grade-level texts that are worth reading and ensuring that each and every student reads a steady volume of texts that are more accessible. I’ve also <a href="https://languageandliteracy.blog/provide-our-students-with-textual-feasts">written here</a> about the need for “daily textual feasts” to increase the volume of rich language, knowledge, and critical thinking, as per Dr. Alfred Tatum.</p>

<h2 id="rethinking-language-and-literacy-instruction">Rethinking Language and Literacy Instruction</h2>

<p>In sum, the surprising and awesome ability of LLMs, derived from mere statistical associations, has challenged traditional assumptions about the innate nature of language and, potentially, the role of explicit and implicit instruction in language and literacy learning.</p>

<p>This underscores the need for a comprehensive approach to the teaching of reading and language, in which explicit teaching is strategically balanced with implicit learning opportunities.</p>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:reading" class="hashtag"><span>#</span><span class="p-category">reading</span></a> <a href="https://languageandliteracy.blog/tag:explicit" class="hashtag"><span>#</span><span class="p-category">explicit</span></a> <a href="https://languageandliteracy.blog/tag:implicit" class="hashtag"><span>#</span><span class="p-category">implicit</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/llms-statistical-learning-and-explicit-teaching</guid>
      <pubDate>Wed, 18 Sep 2024 01:51:31 +0000</pubDate>
    </item>
    <item>
      <title>The Interplay of Language, Cognition, and LLMs: Where Fuzziness Meets Precision</title>
      <link>https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[Through the window&#xA;In our series on AI, LLMs, and Language so far we’ve explored a few implications of LLMs relating to language and literacy development: &#xA;&#xA;1) LLMs gain their uncanny powers from the statistical nature of language itself; &#xA;2) the meaning and experiences of our world are more deeply entwined with the form and structure of our language than we previously imagined; &#xA;3) LLMs offer an opportunity for further convergence between human and machine language; and &#xA;4) LLMs can potentially extend our cognitive abilities, enabling us to process far more information.&#xA;&#xA;In a previous series, “Innate vs. Developed,” we’ve also challenged the idea that language is entirely hardwired in our brains, highlighting the tension between our more recent linguistic innovations and our more ancient brain structures. Cormac McCarthy, the famed author of some of the most powerful literature ever written, did some fascinating pontificating on this very issue.&#xA;&#xA;In this post, we’ll continue picking away at these tensions, considering implications for AI and LLMs.&#xA;!--more--&#xA;Fuzziness and Precision in Language Development and Use&#xA;&#xA;To start us off, I want to ground our exploration in two concepts we’ve covered previously in “An Ontogenesis Model of Word Learning in a Second Language”:&#xA;&#xA;Fuzziness: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. 
These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”&#xA;&#xA;Optimum: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”&#xA;&#xA;I think these concepts are useful not only for thinking of learning new words in a language, but also for how we interact with LLMs and the language they are trained upon.&#xA;&#xA;From Fuzziness → Optimum &#xA;&#xA;When we first learn a language, whether while in the womb, in school, or after moving to a new community, what we hear and understand is fuzzy. The first thing we attune to is the prosody of the language: its tones, volume, and duration. We can’t yet fully distinguish words and sentences within a stream of speech, nor syllables from phonemes, nor vowels from consonants. Let alone connect those sounds (or signs) to meaning and communicate with them to others.&#xA;&#xA;Yet as we gain greater discernment across hearing, vision, movement, and speaking, our representations of a language becomes more flexible and more precise. As I’ve written about elsewhere, connecting speech directly to its form in writing can enhance language and reading and writing development simultaneously. Oral and written language – and reading and writing – can develop reciprocally. Developing one supports refining the other. &#xA;&#xA;Why would that be, given we didn’t invent the technology of writing until far down the timescale of human evolution?&#xA;&#xA;Precision in Language and Cognition&#xA;&#xA;Maybe it’s because the written form of a language requires greater precision in the representation in our minds. 
When greater precision is required, it takes more time and effort, at least initially, to produce.&#xA;&#xA;As an example, you may have heard of the term “receptive bilinguals.” These are individuals who can understand the gist of an everyday conversation in another language, but may struggle to speak or produce it fluently. This is because they may have had fairly significant exposure to the language, especially in childhood, but their mental representations remain “fuzzy” because they rarely produce the language either orally or in written form.&#xA;&#xA;The more that we hear and read AND produce a word – and particularly when we produce it both orally and in writing – the more likely and quickly we are to reach optimum.&#xA;&#xA;We see this process play out in real time with babies. They listen to our sounds and watch our faces, then begin to babble, mimicking us. They begin connecting those sounds to things and ideas. And then they begin to gain a more precise understanding and use of a word, from there stringing multiple words together into sentences, again starting haphazardly and working towards greater flexibility and precision.&#xA;&#xA;Fuzziness, Precision, and Specialization in Language, Cognition, Computation, and Literacy&#xA;&#xA;LLMs have demonstrated that there is far more knowledge, meaning, and comprehension of the world embedded within the statistical relationships of the words and phrases we use than we previously suspected.  &#xA;&#xA;As we’ve also explored, there are fuzzier and more precise terms and concepts in a language. The more abstract and “decontextualized” an event or idea (meaning that the event or idea is not readily available in the context of that environment or moment) the more precise, vivid, or specialized our language becomes in the effort to describe it. 
This can lead us all the way to the extreme of computational language, which is highly precise, much harder for humans to learn, and quite alien in comparison to the general fuzziness of our everyday language used to communicate about everyday things.&#xA;&#xA;The reason read-alouds are so very powerful in the beginning of childhood (and arguably, through adolescence, perhaps even beyond) is because they provide children with exposure to and immersion in this more decontextualized type of language and more abstract and broad understandings of the world. This helps prepare them for when they later engage with written forms of language and increasingly discipline-specific forms of discourse.&#xA;&#xA;As language learning develops towards greater precision, networks in the brain are forged and strengthened. One of the reasons why early childhood is so incredibly important to language and literacy and motor development is because the brain supercharges the neural connections it is forming in all directions. Dendrites spring up like fungus after a rain. But learning new things requires a bit more effort as we age because we work far more on pruning our existing connections for efficiency. &#xA;&#xA;Yet no matter our age, developing these increasingly robust cross-brain connections, and then increasingly specializing and refining them for specific domains and uses, can increase our mental resilience.&#xA;&#xA;We can see this process of specialization play out in real time with young children as they learn to read and write. As they gain greater precision with representations of language through spelling, writing, and volume of reading, their brains increasingly forge further connections between the architecture used with executive function, speech, vision, and motor control, while then specializing and refining them.&#xA;&#xA;Developing language and literacy in multiple languages – to the point of optimum – even further connects, specializes, and refines those networks. 
And when one is bi- or multi-literate on disciplinary topics – with the specialized and precise language required for communicating flexibly about those topics – then those networks are yet further refined.&#xA;&#xA;This is arguably similar to the development of cognition. Cognition—a fancy way of saying “awareness, knowledge, and understanding”—includes the facets of executive function and memory that are also tapped into when developing language, yet, in terms of the processes identified through brain scans, are surprisingly separable from language in the brain.&#xA;&#xA;I think a useful way to think of this distinction may be the difference between the unconsciousness or the lack of awareness we may have about something PRIOR to learning it, and the unconsciousness and lack of awareness we have AFTER learning it to optimum. When we have attained fluency with a skill or pushed our knowledge into long-term memory, we no longer need to apply much effort – nor thought – to drawing upon it. It is the degree of effort required to learn or use something that determines the level of cognition we initially need to draw upon. 
And while we can certainly expand our cognitive ability and other aspects of our learning potential, there are also hard upper limits – such as the bottlenecks of our working memory and our attention.&#xA;&#xA;We overcome those bottlenecks by committing important information to long-term memory through regular use and communication, automatizing regularly used skills through practice, and leveraging the institutionalization of knowledge-based communities and the technologies of writing (texts) and digitization to process and communicate and further refine larger volumes of information.&#xA;&#xA;The Limitations and Potential of LLMs&#xA;&#xA;While human children rapidly develop language and literacy from comparatively minimal amounts of input and interaction in their world, LLMs are trained on vast bodies of text, the majority in written form (thus far). Their training refines and makes more precise their ability to predict which tokens and words come next in a sequence, based on what we have fed them.&#xA;&#xA;Similar to human brains, LLMs move along a fuzzy-to-precise spectrum as they refine the “weights” they assign to linguistic tokens across their many layers. Early or small models of LLMs, akin to our “receptive bilingual” example earlier, demonstrate some receptive capabilities, but their generated outputs are highly fuzzy, as they lack sufficient neural layers, training, and feedback (i.e. sufficient input and production) to achieve something close to optimum in their generation of human-like language.&#xA;&#xA;But to state the obvious, LLMs do not experience the world as we do. They have no bodies, no sensory input, no social interactions (unless you count the part of their training that requires humans to provide them with corrective feedback). As a reminder, the fact that they have the capabilities they do–derived merely from the accumulated statistical relationships of parts of words–is remarkable. 
They do not “think,” at least, not in the manner in which our own cognition functions, and they do not continuously build and further refine their knowledge–yet–from ongoing interactions and input, whether with other AIs or with us.&#xA;&#xA;LLMs are what we would get if we took away all the other parts of our brain—those more ancient parts that continue solving problems, help us steer our way home, and keep our hearts beating—and left only the parts dedicated to language. That they are able to do all they can from mere statistical relationships forged from language alone is–again–remarkable, but it also shows us their limitations.&#xA;&#xA;To be frank, that the dialogue has been so singularly focused on the “intelligence” of LLMs, with the goal of forming “artificial general intelligence” (AGI), seems remarkably off base to me. What I am far more interested in is the potential of these models to teach us something about our own development of language and literacy–and thus, how we can better teach those abilities–and to extend our own cognitive abilities.&#xA;&#xA;Enhancing Cognition with AI&#xA;&#xA;Towards this end, I want to suggest some implications for education that take us away from fears about AI making kids dumber or taking away jobs from teachers.&#xA;&#xA;AI and LLMs can enhance our cognitive abilities by helping us to:&#xA;&#xA;Process Large Amounts of Information to Gain Knowledge: AI and LLMs are getting better and better (seemingly every week) at sifting through vast amounts of information, such as databases, research, transcripts, and other documents, to help us summarize, answer questions, paraphrase, and understand the relevant knowledge contained in them. Furthermore, they are getting better and better at translating across multiple languages and at reading multiple modalities. 
You can feed an LLM an image with text in another language and it can read it.&#xA;&#xA;Augment Our Own Thinking and Writing: LLMs work really well in helping us spitball ideas or redraft our own writing. The fear that they will stop kids from being taught to write is misplaced – the writing produced by LLMs is only as good as what they are given. Yes, they are great at boilerplate forms of writing! But that’s the exact kind of writing that we do want to automate and reduce our own time and thinking on. When it comes to deeper writing and thinking like this series and post, it ain’t writing it for me. But I do find it really helpful when I get stuck or when I want to get suggestions for revision.&#xA;&#xA;In Sum&#xA;&#xA;The effectiveness of our use of AI and LLMs hinges on the quality of our input.&#xA;&#xA;As with previous tools like Google Search, the more precise and informed our prompts, the more powerful and accurate their responses.&#xA;&#xA;Another way of framing this idea: LLMs can help us further widen or refine our own ideas and language. They are far less useful in just handing them to us. They mirror and leverage what we provide to them.&#xA;&#xA;There is a lot of talk about the “hallucinations” of LLMs, but perhaps a better way to frame it is as “pixelation,” or grain size. There are larger and smaller grain sizes of pixels. The coarser the grain, the less clear it is. The finer the grain, the sharper it becomes. The more vague and broad the grain size we feed them, the more BS they will spit. The more precise and narrow grain sizes we provide, the more accurate and useful their responses will be. They can then help us move into different grain sizes from there (either widen our lens, or narrow our lens).&#xA;&#xA;This means that we need to keep teaching our kids stuff. 
The more knowledge they have, the more precise and flexible their ability to wield language, the better they can use powerful tools like AI.&#xA;&#xA;We can help kids to use AI in this way, and we can create tech-free spaces in our schools where they put in the cognitive effort and time they need to build their fluency with language and literacy and read texts that build their knowledge. And then when we engage them with the tech, we teach them how to use it to extend, rather than diminish, their own potential.&#xA;&#xA;There are implications here for teachers too – in fact, I think the most exciting potential for AI is actually freeing teachers up to spend more time teaching, and less time marking up papers and analyzing data. But that’s for another post.&#xA;&#xA;#AI #LLMs #cognition #language #literacy #learning #education&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/c3M1fAo5.jpg" alt="Through the window"/>
In <a href="LLMs,">our series on AI, LLMs, and Language</a> so far we’ve explored a few implications of LLMs relating to language and literacy development:</p>

<p>1) LLMs gain their uncanny powers from <a href="https://languageandliteracy.blog/language-and-llms">the statistical nature of language itself</a>;
2) the meaning and experiences of our world are <a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">more deeply entwined with the form and structure</a> of our language than we previously imagined;
3) LLMs offer an opportunity for further <a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">convergence between human and machine language</a>; and
4) LLMs can potentially <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">extend our cognitive abilities</a>, enabling us to process far more information.</p>

<p>In a previous series, “<a href="https://languageandliteracy.blog/innate-vs">Innate vs. Developed</a>,” we’ve also challenged the idea that language is entirely hardwired in our brains, highlighting the tension between our more recent linguistic innovations and our more ancient brain structures. Cormac McCarthy, the famed author of some of the most powerful literature ever written, did some <a href="https://languageandliteracy.blog/thinking-inside-and-outside-of-language">fascinating pontificating</a> on this very issue.</p>

<p>In this post, we’ll continue picking away at these tensions, considering implications for AI and LLMs.</p>

<h2 id="fuzziness-and-precision-in-language-development-and-use">Fuzziness and Precision in Language Development and Use</h2>

<p>To start us off, I want to ground our exploration in two concepts we’ve covered previously in “<a href="https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language">An Ontogenesis Model of Word Learning in a Second Language</a>”:</p>
<ul><li><p>Fuzziness: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”</p></li>

<li><p>Optimum: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”</p></li></ul>

<p>I think these concepts are useful not only for thinking of learning new words in a language, but also for how we interact with LLMs and the language they are trained upon.</p>

<h3 id="from-fuzziness-optimum">From Fuzziness → Optimum</h3>

<p>When we first learn a language, whether <a href="https://aeon.co/essays/how-fetuses-learn-to-talk-while-theyre-still-in-the-womb">while in the womb</a>, in school, or after moving to a new community, what we hear and understand is <em>fuzzy</em>. The first thing we attune to is the prosody of the language: its tones, volume, and duration. We can’t yet fully distinguish words and sentences within a stream of speech, nor syllables from phonemes, nor vowels from consonants – let alone connect those sounds (or signs) to meaning and use them to communicate with others.</p>

<p>Yet as we gain greater discernment across hearing, vision, movement, and speaking, our representations of a language become more flexible and more precise. As I’ve <a href="https://www.nomanis.com.au/blog/single-post/i-think-i-was-wrong-about-phonemic-awareness">written about elsewhere</a>, connecting speech directly to its form in writing can enhance language and reading and writing development simultaneously. Oral and written language – and reading and writing – can develop reciprocally. Developing one supports refining the other.</p>

<p>Why would that be, given we didn’t invent the technology of writing until far down the timescale of human evolution?</p>

<h4 id="precision-in-language-and-cognition">Precision in Language and Cognition</h4>

<p>Maybe it’s because the written form of a language requires greater precision in its representation in our minds. When greater precision is required, it takes more time and effort, at least initially, to produce.</p>

<p>As an example, you may have heard of the term “receptive bilinguals.” These are individuals who can understand the gist of an everyday conversation in another language, but may struggle to speak or produce it fluently. This is because they may have had fairly significant exposure to the language, especially in childhood, but their mental representations remain “fuzzy” because they rarely produce the language either orally or in written form.</p>

<p>The more that we hear and read AND <strong>produce</strong> a word – and particularly when we produce it both orally and in writing – the more likely and quickly we are to reach <em>optimum</em>.</p>

<p>We see this process play out in real time with babies. They listen to our sounds and watch our faces, then begin to babble, mimicking us. They begin connecting those sounds to things and ideas. And then they begin to gain a more precise understanding and use of a word, from there stringing multiple words together into sentences, again starting haphazardly and working towards greater flexibility and precision.</p>

<h2 id="fuzziness-precision-and-specialization-in-language-cognition-computation-and-literacy">Fuzziness, Precision, and Specialization in Language, Cognition, Computation, and Literacy</h2>

<p>LLMs have demonstrated that there is far more knowledge, meaning, and comprehension of the world embedded within the statistical relationships of the words and phrases we use than we previously suspected.</p>
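<p>To make concrete the idea that knowledge can live in the statistical relationships among words, here is a minimal sketch. This is a toy illustration only, not how LLMs are actually trained; the corpus and counts are invented for the example. Even crude co-occurrence counts over a handful of sentences begin to encode which words “belong together.”</p>

```python
from collections import Counter
from itertools import combinations

# A toy corpus: a handful of sentences about everyday things.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "the dog chased the cat",
]

# Count how often pairs of words appear in the same sentence.
pair_counts = Counter()
for sentence in corpus:
    words = set(sentence.split())
    for pair in combinations(sorted(words), 2):
        pair_counts[pair] += 1

# Words that co-occur often are statistically "close" -- a crude
# shadow of the relationships LLMs learn at vastly greater scale.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

<p>Scale this up from four sentences to trillions of tokens, and from pair counts to learned weights over many layers, and you get a sense of how so much of the world’s structure ends up encoded in language statistics.</p>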

<p>As we’ve also explored, there are fuzzier and more precise terms and concepts in a language. The more abstract and “decontextualized” an event or idea (meaning that the event or idea is not readily available in the context of that environment or moment) the more <a href="https://write.as/manderson/the-pathway-of-human-language-towards-computational-precision-in-llms">precise, vivid, or specialized our language</a> becomes in the effort to describe it. This can lead us all the way to the extreme of computational language, which is highly precise, much harder for humans to learn, and quite alien in comparison to the general fuzziness of our everyday language used to communicate about everyday things.</p>

<p>The reason read-alouds are so very powerful in the beginning of childhood (and arguably, through adolescence, perhaps even beyond) is because they provide children with exposure to and immersion in this more decontextualized type of language and more abstract and broad understandings of the world. This helps prepare them for when they later engage with written forms of language and increasingly discipline-specific forms of discourse.</p>

<p>As language learning develops towards greater precision, networks in the brain are <a href="https://languageandliteracy.blog/the-inner-scaffold-for-language-and-literacy">forged and strengthened</a>. One of the reasons why early childhood is so incredibly important to language and literacy and motor development is because the brain supercharges the neural connections it is forming in all directions. Dendrites spring up like fungus after a rain. But learning new things requires a bit more effort as we age because we work far more on pruning our existing connections for efficiency.</p>

<p>Yet no matter our age, developing these increasingly robust cross-brain connections, and then increasingly specializing and refining them for specific domains and uses, can increase our mental resilience.</p>

<p>We can see this process of specialization play out in real time with young children as they learn to read and write. As they gain greater precision with representations of language through spelling, writing, and volume of reading, their brains increasingly forge further connections between the architecture used with executive function, speech, vision, and motor control, while then specializing and refining them.</p>

<p>Developing language and literacy in <a href="https://languageandliteracy.blog/accelerating-the-inner-scaffold-across-modalities-and-languages">multiple languages</a> – to the point of optimum – even further connects, specializes, and refines those networks. And when one is bi- or multi-literate on disciplinary topics – with the specialized and precise language required for communicating flexibly about those topics – then those networks are yet further refined.</p>

<p>This is arguably similar to the development of cognition. Cognition—a fancy way of saying “awareness, knowledge, and understanding”—includes the facets of executive function and memory that are also tapped into when developing language, yet, in terms of the processes identified through brain scans, are surprisingly <a href="https://write.as/manderson/language-and-cognition">separable from language in the brain</a>.</p>

<p>I think a useful way to think of this distinction may be the difference between the unconsciousness or the lack of awareness we may have about something PRIOR to learning it, and the unconsciousness and lack of awareness we have AFTER learning it to optimum. When we have attained fluency with a skill or pushed our knowledge into long-term memory, we no longer need to apply much effort – nor thought – to drawing upon it. It is the degree of effort required to learn or use something that determines the level of cognition we initially need to draw upon. And while we can certainly expand our cognitive ability and other aspects of our learning potential, there are also hard upper limits – such as the bottlenecks of our working memory and our attention.</p>

<p>We overcome those bottlenecks by committing important information to long-term memory through regular use and communication, automatizing regularly used skills through practice, and leveraging the institutionalization of knowledge-based communities and the technologies of writing (texts) and digitization to process and communicate and further refine larger volumes of information.</p>

<h2 id="the-limitations-and-potential-of-llms">The Limitations and Potential of LLMs</h2>

<p>While human children rapidly develop language and literacy from comparatively minimal amounts of input and interaction in their world, LLMs are trained on vast bodies of text, the majority in written form (thus far). Their training refines and makes more precise their ability to predict which tokens and words come next in a sequence, based on what we have fed them.</p>

<p>Similar to human brains, LLMs move along a fuzzy-to-precise spectrum as they refine the “weights” they assign to linguistic tokens across their many layers. Early or small models of LLMs, akin to our “receptive bilingual” example earlier, demonstrate some receptive capabilities, but their generated outputs are highly fuzzy, as they lack sufficient neural layers, training, and feedback (i.e. sufficient input and production) to achieve something close to optimum in their generation of human-like language.</p>
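<p>A toy sketch of this fuzzy-to-precise movement: the numbers below are invented for illustration and come from no real model. Early, near-uniform scores yield a fuzzy distribution over possible next tokens; sharpened scores concentrate probability on plausible continuations, something closer to optimum.</p>

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate next tokens after "the cat sat on the..."
vocab = ["mat", "rug", "moon", "sat"]

# Early in training: scores are nearly uniform -- the model's
# prediction of the next token is fuzzy, barely above chance.
early = softmax([0.1, 0.0, 0.05, 0.02])

# After training: the weights have sharpened, and probability
# mass concentrates on plausible continuations.
late = softmax([4.0, 2.5, -1.0, -0.5])

print(max(early))
print(max(late))
```

<p>The mechanism inside a real LLM is far richer, but the direction of travel is the same: training nudges a flat, uncertain distribution towards a peaked, confident one.</p>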

<p>But to state the obvious, LLMs do not experience the world as we do. They have no bodies, no sensory input, no social interactions (unless you count the part of their training that requires humans to provide them with corrective feedback). As a reminder, the fact that they have the capabilities they do–derived merely from the accumulated statistical relationships of parts of words–is remarkable. They do not “think,” at least, not in the manner in which our own cognition functions, and they do not continuously build and further refine their knowledge–yet–from ongoing interactions and input, whether with other AIs or with us.</p>

<p>LLMs are what we would get if we took away all the other parts of our brain—those more ancient parts that continue solving problems, help us steer our way home, and keep our hearts beating—and left only the parts dedicated to language. That they are able to do all they can from mere statistical relationships forged from language alone is–again–remarkable, but it also shows us their limitations.</p>

<p>To be frank, that the dialogue has been so singularly focused on the “intelligence” of LLMs, with the goal of forming “artificial general intelligence” (AGI), seems remarkably off base to me. What I am far more interested in is the potential of these models to teach us something about our own development of language and literacy–and thus, how we can better teach those abilities–and to extend our own cognitive abilities.</p>

<h3 id="enhancing-cognition-with-ai">Enhancing Cognition with AI</h3>

<p>Towards this end, I want to suggest some implications for education that take us away from fears about AI making kids dumber or taking away jobs from teachers.</p>

<p>AI and LLMs can enhance our cognitive abilities by helping us to:</p>
<ul><li><p><em>Process Large Amounts of Information to Gain Knowledge</em>: AI and LLMs are getting better and better (seemingly every week) at sifting through vast amounts of information, such as databases, research, transcripts, and other documents, to help us summarize, answer questions, paraphrase, and understand the relevant knowledge contained in them. Furthermore, they are getting better and better at translating across multiple languages and at reading multiple modalities. You can feed an LLM an image with text in another language and it can read it.</p></li>

<li><p><em>Augment Our Own Thinking and Writing</em>: LLMs work really well in helping us spitball ideas or redraft our own writing. The fear that they will stop kids from being taught to write is misplaced – the writing produced by LLMs is only as good as what they are given. Yes, they are great at boilerplate forms of writing! But that’s the exact kind of writing that we do want to automate and reduce our own time and thinking on. When it comes to deeper writing and thinking like this series and post, it ain’t writing it for me. But I do find it really helpful when I get stuck or when I want to get suggestions for revision.</p>

</li></ul>

<h4 id="in-sum">In Sum</h4>

<p>The effectiveness of our use of AI and LLMs hinges on the quality of our input.</p>

<p>As with previous tools like Google Search, the more precise and informed our prompts, the more powerful and accurate their responses.</p>

<p>Another way of framing this idea: LLMs can help us further widen or refine our own ideas and language. They are far less useful in just handing them to us. They mirror and leverage what we provide to them.</p>

<p>There is a lot of talk about the “hallucinations” of LLMs, but perhaps a better way to frame it is as “pixelation,” or grain size. There are larger and smaller grain sizes of pixels. The coarser the grain, the less clear it is. The finer the grain, the sharper it becomes. The more vague and broad the grain size we feed them, the more BS they will spit. The more precise and narrow grain sizes we provide, the more accurate and useful their responses will be. They can then help us move into different grain sizes from there (either widen our lens, or narrow our lens).</p>
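<p>The grain-size analogy can be made concrete with a small sketch. This is a hypothetical illustration, not anything drawn from an actual LLM: averaging a signal over coarser blocks blurs away its detail, much as vaguer prompts give a model less to work with.</p>

```python
# A 1-D "signal" standing in for an idea we want a model to work with.
signal = [3, 5, 2, 8, 7, 1, 4, 6]

def coarsen(values, grain):
    """Average over blocks of size `grain` -- a coarser 'pixel'."""
    return [
        sum(values[i:i + grain]) / grain
        for i in range(0, len(values), grain)
    ]

fine = coarsen(signal, 2)    # finer grain: more of the detail survives
coarse = coarsen(signal, 4)  # coarser grain: the detail is blurred away

print(fine)
print(coarse)
```

<p>At the coarsest grain, the peaks and valleys of the original signal vanish entirely; the finer the grain we supply, the more structure there is to recover and sharpen.</p>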

<p>This means that we need to keep teaching our kids stuff. The more knowledge they have, the more precise and flexible their ability to wield language, the better they can use powerful tools like AI.</p>

<p>We can help kids to use AI in this way, and we can create tech-free spaces in our schools where they put in the cognitive effort and time they need to build their fluency with language and literacy and read texts that build their knowledge. And then when we engage them with the tech, we teach them how to use it to extend, rather than diminish, their own potential.</p>

<p>There are implications here for teachers too – in fact, I think the most exciting potential for AI is actually freeing teachers up to spend more time teaching, and less time marking up papers and analyzing data. But that’s for another post.</p>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:education" class="hashtag"><span>#</span><span class="p-category">education</span></a>
<a href="https://remark.as/p/languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">Discuss...</a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision</guid>
      <pubDate>Sun, 28 Jul 2024 14:00:33 +0000</pubDate>
    </item>
    <item>
      <title>Scaling Our Capacity for Processing Information</title>
      <link>https://languageandliteracy.blog/scaling-our-capacity-for-processing-information?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[The Octopus&#xA;&#xA;  “Over cultural evolution, the human species was so pressured for increased information capacity that they invented writing, a revolutionary leap forward in the development of our species that enables information capacity to be externalized, frees up internal processing and affords the development of more complex concepts. In other words, writing enabled humans to think more abstractly and logically by increasing information capacity. Today, humans have gone to even greater lengths: the Internet, computers and smartphones are testaments to the substantial pressure humans currently face — and probably faced in the past — to increase information capacity.”&#xA;&#xA;  --Uniquely human intelligence arose from expanded information capacity, Jessica Cantlon &amp; Steven Piantadosi&#xA;&#xA;According to the perspectives of the authors in the paper quoted above, the capacity to process and manage vast quantities of information is a defining characteristic of human intelligence. This ability has been extended over time through the development of tools and techniques for externalizing information, such as via language, writing, and digital technology. 
These advancements have, in turn, allowed for increasingly abstract and complex thought and technologies.&#xA;&#xA;The paper by Jessica Cantlon &amp; Steven Piantadosi further proposes that the power of scaling lies behind human intelligence, and that this same power lies behind the remarkable results achieved by artificial neural networks in areas such as speech recognition, LLMs, and computer vision. These accomplishments, the authors argue, have been achieved not through specialized representations and domain-specific development, but rather through the use of simpler techniques combined with increased computational power and data capacity.&#xA;!--more--&#xA;&#xA;I think the authors may be overselling scaling as the main factor behind intelligence, but scale most definitely plays a leading role alongside brain and neural network architecture and specialized data, and it most definitely plays a role in how human language is used and developed.&#xA;&#xA;The Potential of Scale&#xA;&#xA;  “LLMs give us a very effective way of accessing information from other humans.”&#xA;&#xA;  –Alison Gopnik in an interview with Julien Crockett in the Los Angeles Review of Books&#xA;&#xA;In our previous explorations of language, cognition, and Large Language Models (LLMs), the recurring theme of the power of scale has certainly emerged.&#xA;&#xA;We&#39;ve delved into the statistical nature of language, where the vast interconnectedness of word combinations and their contextual relationships drive LLMs&#39; generative abilities. We&#39;ve pondered the inherent imprecision of human language and the journey towards computational precision in LLMs. 
And throughout, the concept of scale has remained central – the scale of data, the scale of computation, and the scale of language itself.&#xA;&#xA;It&#39;s intriguing to consider the possibility, as this paper suggests, that the capacity to process increasing amounts of information may have been a key factor in the development of human intelligence. This idea extends to how, as a species, we have continually sought ways to expand our ability to store and access information, from the invention of writing to the development of computers, the internet, and smartphones.&#xA;&#xA;This suggests that the most exciting potential of artificial neural networks such as LLMs may lie not only in their ability to respond to and generate human language, but furthermore in their ability to help us to process and manage vast quantities of information, and thus further extend our cognitive capabilities. When framed in this manner, it shifts the debate from whether LLMs already demonstrate human intelligence and whether they will soon achieve superhuman intelligence, to whether LLMs will indeed equip us with superhuman abilities. And – as always with advancements in powerful technologies – the question is who among us will gain the most from those abilities and whether the new tools will further increase or diminish disparities between groups (i.e. 
“the future is already here — it&#39;s just not very evenly distributed”).&#xA;&#xA;So we’ve explored a few implications of LLMs relating to language and literacy development so far, then: 1) LLMs gain the base for their uncanny powers from the statistical nature of language itself; 2) LLMs present us with an opportunity for further convergence between human and machine language; and 3) LLMs present us with an opportunity to further extend our cognitive abilities by allowing us to process far more information.&#xA;&#xA;The Dark Side of Scale&#xA;&#xA;All of this said, there is a dark side to scale, as Geoffrey West elucidates in his book, Scale (more on this on my other blog, Schools &amp; Ecosystems), which is that, as we continue to scale our technologies, we consume far more energy and create far more waste beyond our biological needs and functions than any other creature on earth. As West describes it, we humans are energy-guzzling behemoths, using thirty times more energy than nature intended for creatures our size. Our outsized energy footprint makes our 7.3 billion population act as if it were in excess of 200 billion people. And we are hitting the upper limits on ecological constraints of the earth as we do so.&#xA;&#xA;Similarly, as LLMs extend our capabilities, they consume ever more power as they consume and produce ever more data. So at the very same time that our earth is rapidly accelerating towards critical thresholds of environmental change and wreaking havoc on insect, animal, soil, and plant life, we are rapidly accelerating our consumption of energy and production of waste.&#xA;&#xA;It’s hard to see a clear end in sight to this. It’s possible that the greedy demands of continuing to scale AI model training and use end up leading to rapid development of greener technologies and accelerated efficiency in digital computation and compression. 
It’s just as possible that in our short-sighted endeavors we put a half-life on human civilization via no longer containable war, famine, disaster, and disease.&#xA;&#xA;Not to end this post on such a sour note, but it is important to bear a healthy skepticism about a new technology and its attendant powers, even as we seek to gain from it. And from what I see in the discourse, it seems to me that there has been a pretty healthy mix of boosterism and critique and excitement and paranoia about it all, so I’m enjoying the ride, nonetheless.&#xA;&#xA;#cognition #language #AI #LLMs #technology #brains #scale&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/Qfn012Nj.jpg" alt="The Octopus"/></p>

<blockquote><p>“Over cultural evolution, the human species was so pressured for increased information capacity that they invented writing, a revolutionary leap forward in the development of our species that enables information capacity to be externalized, frees up internal processing and affords the development of more complex concepts. In other words, writing enabled humans to think more abstractly and logically by increasing information capacity. Today, humans have gone to even greater lengths: the Internet, computers and smartphones are testaments to the substantial pressure humans currently face — and probably faced in the past — to increase information capacity.”</p>

<p><em>—<a href="https://www.nature.com/articles/s44159-024-00283-3.epdf?sharing_token=dc9WtYt3C_FN2N5q5mmKatRgN0jAjWel9jnR3ZoTv0PIvBIKEnJUrpLA70zYn0mjSaDkgiBUb43hOoUEou9xdgynS0nAWob7QAH5X7gROQMoz5n9acglkBUa_86OzUA1B-Wg9_p5hHRLFUQ95SWsfFXtU8jHuxKnM8_fWZKCoAA%3D">Uniquely human intelligence arose from expanded information capacity</a>, Jessica Cantlon &amp; Steven Piantadosi</em></p></blockquote>

<p>According to the authors of the paper quoted above, the capacity to process and manage vast quantities of information is a defining characteristic of human intelligence. This ability has been extended over time through the development of tools and techniques for externalizing information, such as language, writing, and digital technology. These advancements have, in turn, allowed for increasingly abstract and complex thought and technologies.</p>

<p><a href="https://www.nature.com/articles/s44159-024-00283-3.epdf?sharing_token=dc9WtYt3C_FN2N5q5mmKatRgN0jAjWel9jnR3ZoTv0PIvBIKEnJUrpLA70zYn0mjSaDkgiBUb43hOoUEou9xdgynS0nAWob7QAH5X7gROQMoz5n9acglkBUa_86OzUA1B-Wg9_p5hHRLFUQ95SWsfFXtU8jHuxKnM8_fWZKCoAA%3D">The paper</a> by Jessica Cantlon &amp; Steven Piantadosi further proposes that the power of scaling lies behind human intelligence, and that this same power lies behind the remarkable results achieved by artificial neural networks in areas such as speech recognition, LLMs, and computer vision. These accomplishments, the authors argue, have been achieved not through specialized representations and domain-specific development, but rather through simpler techniques combined with increased computational power and data capacity.
</p>

<p>I think the authors may be overselling scaling as the main factor behind intelligence, but scale most definitely plays a leading role alongside brain and neural network architecture and specialized data, and it most definitely plays a role in how human language is used and developed.</p>

<h2 id="the-potential-of-scale">The Potential of Scale</h2>

<blockquote><p>“LLMs give us a very effective way of accessing information from other humans.”</p>

<p><em>–Alison Gopnik <a href="https://lareviewofbooks.org/article/how-to-raise-your-artificial-intelligence-a-conversation-with-alison-gopnik-and-melanie-mitchell/?s=09">in an interview with Julien Crockett</a> in the Los Angeles Review of Books</em></p></blockquote>

<p>In our previous explorations of <a href="https://languageandliteracy.blog/language-and-llms">language, cognition, and Large Language Models (LLMs)</a>, the power of scale has emerged as a recurring theme.</p>

<p>We&#39;ve delved into <a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">the statistical nature of language</a>, where the vast interconnectedness of word combinations and their contextual relationships drive LLMs&#39; generative abilities. We&#39;ve pondered <a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">the inherent imprecision of human language and the journey towards computational precision in LLMs</a>. And throughout, the concept of scale has remained central – the scale of data, the scale of computation, and the scale of language itself.</p>

<p>It&#39;s intriguing to consider the possibility, as <a href="https://www.nature.com/articles/s44159-024-00283-3.epdf?sharing_token=dc9WtYt3C_FN2N5q5mmKatRgN0jAjWel9jnR3ZoTv0PIvBIKEnJUrpLA70zYn0mjSaDkgiBUb43hOoUEou9xdgynS0nAWob7QAH5X7gROQMoz5n9acglkBUa_86OzUA1B-Wg9_p5hHRLFUQ95SWsfFXtU8jHuxKnM8_fWZKCoAA%3D">this paper</a> suggests, that the capacity to process increasing amounts of information may have been a key factor in the development of human intelligence. This idea extends to how, as a species, we have continually sought ways to expand our ability to store and access information, from the invention of writing to the development of computers, the internet, and smartphones.</p>

<p>This suggests that the most exciting potential of artificial neural networks such as LLMs may lie not only in their ability to respond to and generate human <em>language</em>, but also in their ability to help us process and manage vast quantities of information, and thus further extend our cognitive capabilities. When framed in this manner, the debate shifts from whether LLMs already demonstrate human intelligence, or will soon achieve superhuman intelligence, to whether LLMs will equip <em>us</em> with superhuman abilities. And – as always with advancements in powerful technologies – the question is <em>who</em> among us will gain the most from those abilities, and whether the new tools will increase or diminish disparities between groups (i.e. “the future is already here — it&#39;s just not very evenly distributed”).</p>

<p>So far, then, we’ve explored a few implications of LLMs for language and literacy development: 1) LLMs gain the base for their uncanny powers from the statistical nature of language itself; 2) LLMs present us with an opportunity for further convergence between human and machine language; and 3) LLMs present us with an opportunity to further extend our cognitive abilities by allowing us to process far more information.</p>

<h2 id="the-dark-side-of-scale">The Dark Side of Scale</h2>

<p>All of this said, there is a dark side to scale, as Geoffrey West elucidates in his book, <em>Scale</em> (<a href="https://schoolecosystem.wordpress.com/2024/03/17/power-law-scaling-and-schools/">more on this</a> on my other blog, <em>Schools &amp; Ecosystems</em>): as we continue to scale our technologies, we consume far more energy and create far more waste, beyond our biological needs and functions, than any other creature on earth. As West describes it, we humans are energy-guzzling behemoths, using thirty times more energy than nature intended for creatures our size. Our outsized energy footprint makes our 7.3 billion population act as if it were in excess of 200 billion people. And we are hitting the upper limits of the earth’s ecological constraints as we do so.</p>
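<p>West’s figure checks out with back-of-the-envelope arithmetic (the population and thirtyfold multiplier are his, reflecting the period when <em>Scale</em> was written):</p>

```python
# Back-of-the-envelope check of West's claim; the figures are his, not mine.
population = 7.3e9       # approximate world population when Scale was written
energy_multiplier = 30   # technological energy use vs. our biological baseline

effective_population = population * energy_multiplier
print(f"{effective_population:.2e}")  # 2.19e+11, i.e. over 200 billion "biological" people
```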

<p>Similarly, as LLMs extend our capabilities, they consume ever more power as they consume and produce ever more data. So at the very same time that our earth is rapidly accelerating towards critical thresholds of environmental change and wreaking havoc on insect, animal, soil, and plant life, we are rapidly accelerating our consumption of energy and production of waste.</p>

<p>It’s hard to see a clear end to this. It’s possible that the greedy demands of continuing to scale AI model training and use end up driving rapid development of greener technologies and accelerated efficiency in digital computation and compression. It’s just as possible that in our short-sighted endeavors we put a half-life on human civilization via no longer containable war, famine, disaster, and disease.</p>

<p>Not to end this post on such a sour note, but it is important to maintain a healthy skepticism about a new technology and its attendant powers, even as we seek to gain from it. And from what I see in the discourse, there has been a pretty healthy mix of boosterism and critique, excitement and paranoia, so I’m enjoying the ride nonetheless.</p>

<p><a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:technology" class="hashtag"><span>#</span><span class="p-category">technology</span></a> <a href="https://languageandliteracy.blog/tag:brains" class="hashtag"><span>#</span><span class="p-category">brains</span></a> <a href="https://languageandliteracy.blog/tag:scale" class="hashtag"><span>#</span><span class="p-category">scale</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/scaling-our-capacity-for-processing-information</guid>
      <pubDate>Thu, 04 Jul 2024 03:56:02 +0000</pubDate>
    </item>
    <item>
      <title>The Algebra of Language: Unveiling the Statistical Tapestry of Form and Meaning</title>
      <link>https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[A statistical tapestry&#xA;&#xA;  &#34;. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – that meaning can emerge from pure form – is undoubtedly one of the most stimulating ideas of our time.&#34;&#xA;&#xA;  —The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory&#xA;&#xA;In our last post, we began exploring what Large Language Models (LLMs) and their uncanny abilities might tell us about language itself. I posited that the power of LLMs stems from the statistical nature of language.&#xA;&#xA;But what is that statistical nature of language?&#xA;!--more--&#xA;A couple of years ago, I happened to listen to a podcast conversation between physicist Sean Carroll and mathematician Tai-Danae Bradley that touched on this topic that I found quite fascinating. So it came back to my mind as I was pondering all of this. In the conversation, Bradley describes the algebraic nature of language due to the concatenation of words. She notes that the statistics and probabilities of word co-occurrences can serve as a proxy for grammar rules in modeling language, which is why LLMs can generate coherent text without any explicit grammar rules.&#xA;&#xA;She also shares a theory called the Yoneda Lemma: &#xA;&#xA;  “The Yoneda Lemma says if you want to understand an object, a mathematical object, like a group or a space, or a set, the Yoneda Lemma says that all of the information about that object is contained in the totality of relationships that object has with all other objects in its environment.”&#xA;&#xA;She then links that mathematical concept to linguistics:&#xA;&#xA;  “. . . there’s a linguist, John Firth, I think in a 1957 paper, he says, “You shall know a word by the company it keeps. . . So what’s the meaning of fire truck? 
Well, it’s kind of like all of the contexts in which the word fire truck appears in the English language. . . everything I need to know about this word, the meaning of the word fire truck, is contained in the network of ways that word fits into the language.”&#xA;&#xA;Since this interview, frontier LLMs have demonstrated that there is quite a bit of meaning that can be derived from the context and co-occurrences in which words show up in a body of language.&#xA;&#xA;In a more recent paper, Bradley and co-authors Gastaldi and Terilla make the statement that I began this post with, which I will re-post here again, as it’s worth pondering:&#xA;&#xA;  &#34;. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – that meaning can emerge from pure form – is undoubtedly one of the most stimulating ideas of our time.&#34; [bold added]&#xA;&#xA;They go on to further state:&#xA; &#xA;  &#34;Therefore, the surprising properties exhibited by embeddings are less the consequence of some magical attribute of neural models than the algebraic structure underlying linguistic data found in corpora of text.&#34;&#xA;&#xA;In other words: LLMs (a type of artificial neural network) derive their generative linguistic capabilities from the algebraic and statistical properties of the texts they are trained upon. 
And the fact that they can do so suggests that the form and structure of language is intimately intertwined with its meaning.&#xA;&#xA;In previous post, I referred to a Bloom and Lahey model from 1978, which delineates three components of language, form, meaning, and use:&#xA;&#xA;Bloom and Lahey&#39;s model of language&#xA;&#xA;Over the past few decades of linguistic research and language teaching, there may have been trends in a focus on one of those components over the other -- in the past teachers of English as a second language, for example, may have put a stronger emphasis on the teaching of grammar, while more recent TESOL teachers may put a stronger focus on meaning over form. A more current strand of linguistics research focuses on &#34;usage-based&#34; theories.&#xA;&#xA;There is some parallel in the edu sphere related to reading, in that there have been varying emphases in the research and practice on code-based (form) vs. meaning-based skills (i.e. the Simple View of Reading), with a more recent shift back to code-based emphasis, now seemingly defined by a perpetual tug-of-war between the two.&#xA;&#xA;The Simple View of Reading&#xA;&#xA;Rarely made explicit in any of these shifts in focus has been the assumption that form and meaning can be completely disentangled. After all, a writing system is somewhat arbitrary as a pairing of spoken sounds to symbols. This is, according to a 1980 account by Gough and Hillinger, one of the reasons that learning to decode can be so very difficult–because there isn&#39;t meaning in those symbols in-and-of themselves. It is rather the abstraction of what they represent that we need to learn.&#xA;&#xA;Yet what if form and meaning are much more closely interwoven than we may have assumed? 
What if, in fact, a large quantity of meaning can be derived merely from an accumulated volume of statistical associations of words in sentences?&#xA;&#xA;That LLMs have the abilities they do, given that they have not acquired language in the way that humans have (via social and physical interaction in the world) and without cognition, would seem to suggest that that the “mere” form and structure of a language possesses far more information about our world than we would have assumed – and that meaning is deeply and fundamentally interwoven with form.&#xA;&#xA;More to ponder!&#xA;&#xA;Some additional interesting sources on these topics to further explore (thanks to Copilot for the suggestions):&#xA;&#xA;Understanding the Relationship Between Form, Meaning, and Use of Language: This source will provide insights into how language structure aligns with its diverse meanings and uses&#xA;Statistical Language Learning: Mechanisms and Constraints: A deep dive into how humans, including infants, utilize statistical properties of linguistic input to uncover language structure&#xA;Cultural Evolution and the Statistical Structure of Language: An examination of how cultural evolution shapes the statistical properties of language&#xA;The Language and Grammar of Mathematics: To draw parallels between the precision of mathematical language and the structure of natural language&#xA;Algebraic Structures in Natural Language: A look at how algebraic systems have been used to study natural language in various linguistic fields&#xA;&#xA;#AI #language #learning #statistics #mathematics #cognition #machinelearning]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/7w0H6n3W.jpeg" alt="A statistical tapestry"/></p>

<blockquote><p>“. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – <strong>that meaning can emerge from pure form</strong> – is undoubtedly one of the most stimulating ideas of our time.”</p>

<p>—<a href="http://ams.org/journals/notices/202402/rnoti-p174.pdf">The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory</a></p></blockquote>

<p><a href="https://languageandliteracy.blog/language-and-llms">In our last post</a>, we began exploring what Large Language Models (LLMs) and their uncanny abilities might tell us about language itself. I posited that the power of LLMs stems from the statistical nature of language.</p>

<p>But what <em>is</em> that statistical nature of language?</p>

<p>A couple of years ago, I happened to listen to <a href="https://www.preposterousuniverse.com/podcast/2021/11/22/174-tai-danae-bradley-on-algebra-topology-language-and-entropy/">a podcast conversation</a> between physicist Sean Carroll and mathematician Tai-Danae Bradley that touched on this topic, and I found it quite fascinating. So it came back to mind as I was pondering all of this. In the conversation, Bradley describes the algebraic nature of language arising from the concatenation of words. She notes that the statistics and probabilities of word co-occurrences can serve as a proxy for grammar rules in modeling language, which is why LLMs can generate coherent text without any explicit grammar rules.</p>
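<p>That claim, that co-occurrence statistics can stand in for explicit grammar rules, can be illustrated with a toy bigram model. This is a drastic simplification (real LLMs use learned, high-dimensional representations rather than raw counts, and the corpus here is invented), but the principle is the same:</p>

```python
import random
from collections import defaultdict

# Toy corpus: the only "grammar" this model will ever see.
corpus = "the dog chased the cat . the cat chased the mouse . the mouse ran ."
words = corpus.split()

# Bigram statistics: how often each word follows each other word.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(words, words[1:]):
    counts[prev][nxt] += 1

def next_word(prev):
    """Sample a next word in proportion to its observed co-occurrence count."""
    options = counts[prev]
    return random.choices(list(options), weights=list(options.values()))[0]

# Generate a "sentence" using no grammar rules at all, only statistics.
random.seed(0)
word, output = "the", ["the"]
while word != "." and len(output) < 10:
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```

<p>The generated strings respect the word order of the corpus without the model containing a single rule about nouns or verbs; scale that idea up by many orders of magnitude and you get a rough intuition for next-token prediction.</p>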

<p>She also shares a result from category theory known as the <em>Yoneda Lemma</em>:</p>

<blockquote><p>“The Yoneda Lemma says if you want to understand an object, a mathematical object, like a group or a space, or a set, the Yoneda Lemma says that all of the information about that object is contained in the totality of relationships that object has with all other objects in its environment.”</p></blockquote>

<p>She then links that mathematical concept to linguistics:</p>

<blockquote><p>“. . . there’s a linguist, John Firth, I think in a 1957 paper, he says, “You shall know a word by the company it keeps. . . So what’s the meaning of fire truck? Well, it’s kind of like all of the contexts in which the word fire truck appears in the English language. . . everything I need to know about this word, the meaning of the word fire truck, is contained in the network of ways that word fits into the language.”</p></blockquote>

<p>Since this interview, frontier LLMs have demonstrated that there is quite a bit of meaning that can be derived from the context and co-occurrences in which words show up in a body of language.</p>
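<p>Firth’s dictum can be made concrete with a small sketch (the corpus and counting scheme below are invented for illustration): represent each word by the counts of the words it co-occurs with, and words that keep similar company come out with similar vectors.</p>

```python
from collections import Counter
from math import sqrt

# A tiny invented corpus; each sentence is one "context window".
sentences = [
    "the cat drinks milk", "the dog drinks water",
    "the cat chases mice", "the dog chases cats",
    "milk spilled on the floor", "water spilled on the floor",
]

def context_vector(target):
    """Count the words that co-occur with `target` in the same sentence."""
    ctx = Counter()
    for s in sentences:
        ws = s.split()
        if target in ws:
            ctx.update(w for w in ws if w != target)
    return ctx

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: sqrt(sum(n * n for n in v.values()))
    return dot / (norm(a) * norm(b))

cat, dog, floor = (context_vector(w) for w in ("cat", "dog", "floor"))
# "cat" and "dog" keep similar company, so they score as more alike
# than "cat" and "floor".
print(round(cosine(cat, dog), 3), round(cosine(cat, floor), 3))  # 0.75 0.472
```

<p>Nothing here “knows” what a cat is; the similarity falls out purely from the company the words keep.</p>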

<p><a href="https://www.ams.org/journals/notices/202402/rnoti-p174.pdf">In a more recent paper</a>, Bradley and co-authors Gastaldi and Terilla make the statement with which I began this post, which I will repeat here, as it’s worth pondering:</p>

<blockquote><p>“. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – <strong>that meaning can emerge from pure form</strong> – is undoubtedly one of the most stimulating ideas of our time.” [bold added]</p></blockquote>

<p>They further state:</p>

<blockquote><p>“Therefore, the surprising properties exhibited by embeddings are less the consequence of some magical attribute of neural models than the algebraic structure underlying linguistic data found in corpora of text.”</p></blockquote>

<p>In other words: LLMs (a type of artificial neural network) derive their generative linguistic capabilities from the algebraic and statistical properties of the texts they are trained upon. And the fact that they can do so suggests that the form and structure of language are intimately intertwined with its meaning.</p>

<p><a href="https://languageandliteracy.blog/research-highlight-1-the-importance-of-automatization-in-learning-a-new">In a previous post</a>, I referred to a Bloom and Lahey model from 1978, which delineates the three components of language as form, meaning, and use:</p>

<p><img src="https://i.snap.as/ZZ4O4Kte.png" alt="Bloom and Lahey&#39;s model of language"/></p>

<p>Over the past few decades of linguistic research and language teaching, there have been trends favoring one of these components over the others. In the past, teachers of English as a second language, for example, may have put a stronger emphasis on the teaching of grammar, while more recent TESOL teachers may put a stronger focus on meaning over form. A more current strand of linguistics research focuses on <a href="https://languageandliteracy.blog/language-within-and-beyond-the-brain">“usage-based” theories</a>.</p>

<p>There is a parallel in the edu sphere related to reading: research and practice have placed varying emphases on code-based (form) versus meaning-based skills (as in the Simple View of Reading), with a more recent shift back toward a code-based emphasis, and the field now seems defined by a perpetual tug-of-war between the two.</p>

<p><img src="https://i.snap.as/WCr89E7R.png" alt="The Simple View of Reading"/></p>
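<p>The Simple View pictured above is multiplicative: reading comprehension is modeled as the <em>product</em> of decoding and language comprehension, so a severe weakness in either component collapses the whole. A minimal sketch, with invented scores on a 0-to-1 scale:</p>

```python
def reading_comprehension(decoding, language_comprehension):
    """Simple View of Reading: RC = D x LC, each scored from 0.0 to 1.0.

    Because the relation is a product rather than a sum, a zero on
    either component yields zero reading comprehension overall.
    """
    return decoding * language_comprehension

# Hypothetical learner profiles (the scores are invented for illustration):
strong_reader = reading_comprehension(0.9, 0.9)  # high on both components
cannot_decode = reading_comprehension(0.0, 0.9)  # strong oral language, no decoding
word_caller   = reading_comprehension(0.9, 0.2)  # fluent decoding, weak comprehension
print(strong_reader, cannot_decode, word_caller)
```

<p>The multiplication is the whole point of the model: neither form nor meaning can carry reading on its own.</p>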

<p>Rarely made explicit in any of these shifts in focus has been the assumption that form and meaning can be completely disentangled. After all, a writing system is a somewhat arbitrary pairing of spoken sounds to symbols. This is, according to <a href="https://write.as/manderson/learning-to-read-an-unnatural-act">a 1980 account by Gough and Hillinger</a>, one of the reasons that learning to decode can be so very difficult: there isn&#39;t meaning in those symbols in and of themselves. It is rather the abstraction of what they represent that we need to learn.</p>

<p>Yet what if form and meaning are much more closely interwoven than we may have assumed? What if, in fact, a large quantity of meaning can be derived merely from an accumulated volume of statistical associations of words in sentences?</p>

<p>That LLMs have the abilities they do, given that they have not acquired language the way humans have (via social and physical interaction in the world) and that they operate without cognition, would seem to suggest that the “mere” form and structure of a language possesses far more information about our world than we would have assumed, and that meaning is deeply and fundamentally interwoven with form.</p>

<p>More to ponder!</p>

<p>Some additional interesting sources on these topics to further explore (thanks to Copilot for the suggestions):</p>
<ul><li><a href="https://www.theliteracybug.com/meaning-form/">Understanding the Relationship Between Form, Meaning, and Use of Language: This source will provide insights into how language structure aligns with its diverse meanings and uses</a></li>
<li><a href="https://quote.ucsd.edu/cogs101b/files/2013/01/saffrancurrentdir.pdf">Statistical Language Learning: Mechanisms and Constraints: A deep dive into how humans, including infants, utilize statistical properties of linguistic input to uncover language structure</a></li>
<li><a href="https://www.nature.com/articles/s41598-024-56152-9.pdf">Cultural Evolution and the Statistical Structure of Language: An examination of how cultural evolution shapes the statistical properties of language</a></li>
<li><a href="https://www.dpmms.cam.ac.uk/~wtg10/grammar.pdf">The Language and Grammar of Mathematics: To draw parallels between the precision of mathematical language and the structure of natural language</a></li>
<li><a href="https://books.google.com/books/about/Algebraic_Structures_in_Natural_Language.html?id=3yudEAAAQBAJ">Algebraic Structures in Natural Language: A look at how algebraic systems have been used to study natural language in various linguistic fields</a></li></ul>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:statistics" class="hashtag"><span>#</span><span class="p-category">statistics</span></a> <a href="https://languageandliteracy.blog/tag:mathematics" class="hashtag"><span>#</span><span class="p-category">mathematics</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:machinelearning" class="hashtag"><span>#</span><span class="p-category">machinelearning</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning</guid>
      <pubDate>Sat, 27 Apr 2024 16:44:50 +0000</pubDate>
    </item>
    <item>
      <title>Language, Cognition, and LLMs</title>
      <link>https://languageandliteracy.blog/language-and-llms?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[“Semantic gradients,” are a tool used by teachers to broaden and deepen students&#39; understanding of related words by plotting them in relation to one another. They often begin with antonyms at each end of the continuum. Here are two basic examples:&#xA;&#xA;Semantic gradient examples&#xA;&#xA;Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Now imagine adding another axis to this graph, so that words are plotted in a three dimensional space in their relationships. Then add another dimension, and another . . . heck, make it tens of thousands more dimensions, relating all the words available in your lexicon across a high dimensional space. . .&#xA;&#xA;. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).&#xA;&#xA;!--more--&#xA;&#xA;LLMs Are Powered by Language: Or, Words as a Vast Sea of Interrelated Statistical Arrays of Tokens&#xA;&#xA;At root, the most powerful current forms of AI derive their capacities from decomposing human language into vast arrays of numbers based on their high dimensional statistical relationships and then predicting probabilistically what the next tokens are most likely to be. &#xA;&#xA;There’s a kind of alchemical transformation that occurs that seems to maintain the meaning in the generative pronouncements of the frontier LLMs, all the more amazing because so far the very engineers who have designed the structure for these operations to occur do not fully understand what the models are doing to arrive at their seemingly oracular destinations.&#xA;&#xA;In other words – the power of LLMs seemingly derives from the statistical power of language. There is something in the nature of language itself that seems to provide these computations of vast arrays of numbers with a lattice of our world, enabling LLMs to gain uncanny abilities from superpowered next word prediction. 
That LLMs have the generative powers they have—and that they have them without any consciousness or social interaction whatsoever—bolsters the argument that there is something about language itself, not just our brains, that is powerful.&#xA;&#xA;An Aside on Power Law Scaling&#xA;&#xA;One of the interesting features of human language is that it exhibits power scaling laws, as with other complex adaptive systems such as animals, cities, or businesses, as I recently examined in this post about Geoffrey West&#39;s fascinating book, Scale. The frequency of word usage, the length of sentences and texts, and the number of words in a language all follow power law distributions. This means that a small number of words are used frequently, while most words are used infrequently, and long sentences and texts are less common than shorter ones. As an interesting parallel, power law scaling is exhibited not only by language itself and through its generative manifestations in LLMs, but furthermore through the data--and the data centers and energy--required for training and using LLMs. Thus far, there is no apparent ceiling for LLM advancement in capability beyond that of the ceiling on the scalability of computer chips, data centers, and training data.&#xA;&#xA;Innate vs. Developed Language: A Review of Our Path Traversed Thus Far&#xA;&#xA;In our series “Innate vs. Developed”, we have explored the nature of language, challenging a widely held view that language is completely and innately hardwired in the human brain. Drawing upon “The Language Game” and &#34;Rethinking Innateness” as sources of inspiration, we have considered the notion that language is an emergent, culturally-evolved phenomenon that mounts atop an “inner scaffold” that exists within our brains and further refines and specializes our neural networks through simple repeated social interactions over time. 
&#xA;&#xA;We also considered how developing proficiency in reading and writing yet further extends and reinforces these channels across our brains – and how developing proficiency in multiple languages and literacies makes those networks even yet more robust.&#xA;&#xA;We went further afield and investigated Cormac McCarthy’s ponderings on a seeming division between language and the ancient parts of our brain that exist before and beyond language. We also investigated the paradoxical nature of language, in that it can both enhance and potentially occlude our connection to our unconscious selves and to our natural world.&#xA;&#xA;I promised at the end of the first post in this series that I would “maybe dig into the relation of cognition and language and literacy a little, and riff on the implications for AI, ANNs, and LLMs.” It’s taken me some time to let all of this ripen, especially given the rapid pace at which LLMs are developing. I think I’m finally starting to gain some perspective on LLMs that may allow me to indulge in a little riffing.&#xA;&#xA;Sources for Spelunking&#xA;&#xA;Before said indulgence in my next post, I’ll first outline a few sources I will draw upon at the outset so you can go off and explore on your own before being further biased by my own rambling.&#xA;&#xA;First, if you are interested in learning more about that analogy of a high dimensional semantic gradient and gaining insight into how LLMs kinda work, I recommend three sources shared by Ethan Mollick (he himself is also an excellent source):&#xA;&#xA;But what is a GPT? 
Visual intro to transformers&#xA;Large language models, explained with a minimum of math and jargon&#xA;What Is ChatGPT Doing … and Why Does It Work?&#xA;&#xA;Second, if you want to explore some interesting aspects of language itself that are related to LLMs, check out the following:&#xA;&#xA;Uniquely human intelligence arose from expanded information capacity&#xA;Stream of Search (SoS): Learning to Search in Language&#xA;The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory&#xA;&#xA;An Anticipation of Where We May Go From Here&#xA;&#xA;From these and other sources, including dabbling with Copilot and Claude and Gemini, I will ponder some of the following points on what computational neural networks may be able to tell us about language and what language may be able to tell us about LLMs – and, ultimately, perhaps, what this all may be able to tell us about teaching and learning:&#xA;&#xA;The surprisingly inseparable interconnection between form and meaning&#xA;Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness&#xA;The human (and now, machine) capacity for learning and using language may simply be a matter of scale&#xA;Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said? . . . which actually ended up becoming more about fuzziness and precision in language, but hey!&#xA;Implicit vs. explicit learning of language and literacy&#xA;&#xA;#language #literacy #LLMs #computation #statistical #learning #ai&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><a href="https://www.readingrockets.org/classroom/classroom-strategies/semantic-gradients">“Semantic gradients”</a> are a tool teachers use to broaden and deepen students&#39; understanding of related words by plotting them in relation to one another, often beginning with antonyms at each end of a continuum. Here are two basic examples:</p>

<p><img src="https://i.snap.as/eXGuugvn.png" alt="Semantic gradient examples"/></p>

<p>Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Now imagine adding another axis to this graph, so that words are plotted in three-dimensional space according to their relationships. Then add another dimension, and another . . . heck, make it tens of thousands of dimensions, relating all the words in your lexicon across a high-dimensional space. . .</p>

<p>. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).</p>
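<p>To make that analogy concrete, here is a minimal sketch of words as vectors in a toy semantic space. The words, the three made-up dimensions, and every coordinate are invented for illustration; real embedding models learn hundreds or thousands of dimensions from data rather than having them assigned by hand.</p>

```python
import math

# A toy "semantic gradient" in vector form. Each word gets hypothetical
# coordinates on three invented axes: (temperature, intensity, formality).
embeddings = {
    "frigid":   [-1.0, 0.90, 0.3],
    "cold":     [-0.7, 0.50, 0.1],
    "cool":     [-0.3, 0.20, 0.1],
    "warm":     [ 0.4, 0.30, 0.1],
    "hot":      [ 0.8, 0.70, 0.1],
    "scalding": [ 1.0, 0.95, 0.2],
}

def cosine(a, b):
    """Cosine similarity: how closely two word-vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Words adjacent on the gradient score as more similar than words
# at opposite ends of it.
print(cosine(embeddings["cold"], embeddings["frigid"]))
print(cosine(embeddings["cold"], embeddings["scalding"]))
```

<p>In a real model the same arithmetic runs over vastly more dimensions, which is what lets “nearness in the space” stand in for nearness in meaning.</p>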



<h2 id="llms-are-powered-by-language-or-words-as-a-vast-sea-of-interrelated-statistical-arrays-of-tokens">LLMs Are Powered by Language: Or, Words as a Vast Sea of Interrelated Statistical Arrays of Tokens</h2>

<p>At root, the most powerful current forms of AI derive their capacities from decomposing human language into tokens, representing those tokens as vast arrays of numbers based on their high-dimensional statistical relationships, and then predicting probabilistically which tokens are most likely to come next.</p>
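<p>A hedged sketch of that final prediction step: the four-word vocabulary and the raw scores below are made up for illustration, standing in for the one-score-per-token output a real model computes from the whole preceding context.</p>

```python
import math

# Invented vocabulary and invented raw scores (logits) for the
# context "The cat sat on the ..." -- a real LLM scores tens of
# thousands of tokens at once.
vocab = ["mat", "moon", "piano", "roof"]
logits = [4.0, 1.5, 0.2, 2.8]

def softmax(scores):
    """Turn raw scores into a probability distribution that sums to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
for word, p in ranked:
    print(f"{word}: {p:.3f}")
# Generation repeats this step, appending one chosen token at a time
# and rescoring the now-longer context.
```

<p>Everything uncanny an LLM does is, mechanically, this step applied over and over at scale.</p>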

<p>There’s a kind of alchemical transformation at work that seems to preserve meaning in the generative pronouncements of the frontier LLMs, all the more amazing because, so far, the very engineers who designed the structures in which these operations occur do not fully understand what the models are doing to arrive at their seemingly oracular destinations.</p>

<p>In other words – the power of LLMs seemingly derives from the statistical power of language. There is something in the nature of language itself that seems to provide these computations of vast arrays of numbers with a lattice of our world, enabling LLMs to gain uncanny abilities from superpowered next word prediction. That LLMs have the generative powers they have—and that they have them without any consciousness or social interaction whatsoever—bolsters the argument that there is something about language itself, not just our brains, that is powerful.</p>

<h3 id="an-aside-on-power-law-scaling">An Aside on Power Law Scaling</h3>

<p>One of the interesting features of human language is that it exhibits power-law scaling, as do other complex adaptive systems such as animals, cities, and businesses, as I recently examined in <a href="https://schoolecosystem.wordpress.com/2024/03/17/power-law-scaling-and-schools/">this post</a> about Geoffrey West&#39;s fascinating book, <em>Scale</em>. The frequency of word usage, the length of sentences and texts, and the number of words in a language all follow power-law distributions: a small number of words are used very frequently, while most words are used rarely, and long sentences and texts are less common than short ones. As an interesting parallel, power-law scaling shows up not only in language itself and in its generative manifestations in LLMs, but also in the data – and the data centers and energy – required for training and using LLMs. Thus far, there is no apparent ceiling on LLM advancement in capability beyond the ceiling on the scalability of computer chips, data centers, and training data.</p>
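<p>The best-known of these linguistic power laws is Zipf&#39;s law: a word&#39;s frequency is roughly proportional to one over its frequency rank. Here is a minimal sketch of that relationship; the top-word count and the <code>zipf_frequency</code> helper are invented for illustration, not drawn from any real corpus.</p>

```python
# Zipf's law sketch: frequency(rank) ~ top_frequency / rank.
# The count below is hypothetical, not measured from a corpus.
top_frequency = 60_000

def zipf_frequency(rank, s=1.0):
    """Expected count of the word at a given frequency rank (s is the
    Zipf exponent, close to 1 for natural language)."""
    return top_frequency / rank ** s

for rank in [1, 2, 10, 100, 1000]:
    print(rank, round(zipf_frequency(rank)))
# The 2nd-ranked word appears about half as often as the 1st, the
# 10th about a tenth as often, and so on: a few words dominate while
# a long tail of rare words stretches on.
```

<p>Plotted on log-log axes, this relationship is a straight line, which is the signature of power-law scaling generally.</p>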

<h2 id="innate-vs-developed-language-a-review-of-our-path-traversed-thus-far">Innate vs. Developed Language: A Review of Our Path Traversed Thus Far</h2>

<p>In our series <a href="https://languageandliteracy.blog/innate-vs">“Innate vs. Developed”</a>, we have explored the nature of language, challenging a widely held view that language is completely and innately hardwired in the human brain. <a href="https://languageandliteracy.blog/language-like-reading-may-not-be-innate">Drawing upon “The Language Game” and “Rethinking Innateness”</a> as sources of inspiration, we have considered the notion that language is an emergent, culturally evolved phenomenon that mounts atop <a href="https://languageandliteracy.blog/the-inner-scaffold-for-language-and-literacy">an “inner scaffold”</a> within our brains, and that it further refines and specializes our neural networks through simple, repeated social interactions over time.</p>

<p>We also considered how developing proficiency in reading and writing yet further extends and reinforces these channels across our brains – and how developing proficiency in multiple languages and literacies makes those networks <a href="https://languageandliteracy.blog/accelerating-the-inner-scaffold-across-modalities-and-languages">even yet more robust</a>.</p>

<p>We went further afield and <a href="https://languageandliteracy.blog/thinking-inside-and-outside-of-language">investigated Cormac McCarthy’s ponderings</a> on a seeming division between language and the ancient parts of our brain that exist before and beyond language. We also investigated the paradoxical nature of language, in that it can both enhance and <a href="https://languageandliteracy.blog/speaking-ourselves-into-being-and-others-into-silence-the-power-of-language">potentially occlude</a> our connection to our unconscious selves and to our natural world.</p>

<p>I promised at the end of <a href="https://languageandliteracy.blog/language-like-reading-may-not-be-innate">the first post in this series</a> that I would “maybe dig into the relation of cognition and language and literacy a little, and riff on the implications for AI, ANNs, and LLMs.” It’s taken me some time to let all of this ripen, especially given the rapid pace at which LLMs are developing. I think I’m finally starting to gain some perspective on LLMs that may allow me to indulge in a little riffing.</p>

<h2 id="sources-for-spelunking">Sources for Spelunking</h2>

<p>Before said indulgence in my next post, I’ll first outline a few sources I will draw upon at the outset so you can go off and explore on your own before being further biased by my own rambling.</p>

<p>First, if you are interested in learning more about that analogy of a high dimensional semantic gradient and gaining insight into how LLMs kinda work, I recommend three sources <a href="https://x.com/emollick/status/1775355910761681177">shared by Ethan Mollick</a> (he himself is also <a href="https://www.oneusefulthing.org/">an excellent source</a>):</p>
<ul><li><a href="https://www.youtube.com/watch?v=wjZofJX0v4M">But what is a GPT? Visual intro to transformers</a></li>
<li><a href="https://www.understandingai.org/p/large-language-models-explained-with">Large language models, explained with a minimum of math and jargon</a></li>
<li><a href="https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/">What Is ChatGPT Doing … and Why Does It Work?</a></li></ul>

<p>Second, if you want to explore some interesting aspects of language itself that are related to LLMs, check out the following:</p>
<ul><li><a href="https://www.nature.com/articles/s44159-024-00283-3.epdf?sharing_token=dc9WtYt3C_FN2N5q5mmKatRgN0jAjWel9jnR3ZoTv0PIvBIKEnJUrpLA70zYn0mjSaDkgiBUb43hOoUEou9xdgynS0nAWob7QAH5X7gROQMoz5n9acglkBUa_86OzUA1B-Wg9_p5hHRLFUQ95SWsfFXtU8jHuxKnM8_fWZKCoAA%3D">Uniquely human intelligence arose from expanded information capacity</a></li>
<li><a href="https://arxiv.org/abs/2404.03683">Stream of Search (SoS): Learning to Search in Language</a></li>
<li><a href="http://ams.org/journals/notices/202402/rnoti-p174.pdf">The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory</a></li></ul>

<h2 id="an-anticipation-of-where-we-may-go-from-here">An Anticipation of Where We May Go From Here</h2>

<p>From these and other sources, including dabbling with Copilot and Claude and Gemini, I will ponder some of the following points on what computational neural networks may be able to tell us about language and what language may be able to tell us about LLMs – and, ultimately, perhaps, what this all may be able to tell us about teaching and learning:</p>
<ul><li><a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">The surprisingly inseparable interconnection between form and meaning</a></li>
<li><a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness</a></li>
<li><a href="https://write.as/manderson/scaling-our-capacity-for-processing-information">The human (and now, machine) capacity for learning and using language may simply be a matter of scale</a></li>
<li><a href="https://write.as/manderson/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said?</a> <em>. . . which actually ended up becoming more about fuzziness and precision in language, but hey!</em></li>
<li><a href="https://write.as/manderson/llms-statistical-learning-and-explicit-teaching">Implicit vs. explicit learning of language and literacy</a></li></ul>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:computation" class="hashtag"><span>#</span><span class="p-category">computation</span></a> <a href="https://languageandliteracy.blog/tag:statistical" class="hashtag"><span>#</span><span class="p-category">statistical</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:ai" class="hashtag"><span>#</span><span class="p-category">ai</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/language-and-llms</guid>
      <pubDate>Tue, 23 Apr 2024 14:48:41 +0000</pubDate>
    </item>
  </channel>
</rss>