<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>learning &#8212; Language &amp; Literacy</title>
    <link>https://languageandliteracy.blog/tag:learning</link>
    <description>Musings about language and literacy and learning</description>
    <pubDate>Sun, 26 Apr 2026 16:29:10 +0000</pubDate>
    <image>
      <url>https://i.snap.as/LIFR67Bi.png</url>
      <title>learning &#8212; Language &amp; Literacy</title>
      <link>https://languageandliteracy.blog/tag:learning</link>
    </image>
    <item>
      <title>AI, Mastery, and the Barbell of Cognitive Enhancement</title>
      <link>https://languageandliteracy.blog/ai-mastery-and-the-barbell-of-cognitive-enhancement?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[In the typical Hollywood action movie, a hero acquires master-level skill in a specialized art, such as Kung Fu, in a few power ballad-backed minutes of a training montage.&#xA;&#xA;In real life, it may seem self-evident that gaining mastery takes years of intense, deliberate, and guided work. Yet the perennial optimism of students cramming the night before an exam tells us that the pursuit of a cognitive shortcut may be an enduring human impulse.&#xA;&#xA;It is unsurprising, then, that students—and many adults—increasingly use the swiftly advancing tools of AI and Large Language Models (LLMs) as a shortcut around deeper, more effortful cognitive work.&#xA;&#xA;The Irreducible Nature of Effort and Mastery&#xA;&#xA;In a previous post in my series on LLMs, we briefly explored Stephen Wolfram&#39;s concept of &#34;computational irreducibility&#34;—the idea that certain processes cannot be shortcut: you have to run the entire process to get the result.&#xA;&#xA;One of the provocations of LLMs has been the revelation that human language (and maybe, animal language?) is far more computationally reducible than we assumed. As AI advances, it demonstrates that other tasks and abilities previously thought to reside exclusively within the province of humans may also be more computationally tractable than we believed.&#xA;&#xA;Actual learning by any human being—which we could operationally define as the internalization of a discrete body of knowledge and skills to the point of automaticity—inevitably requires practice and effort. A student must replicate essential learning steps to genuinely own such knowledge. There is no shortcut to mastery.&#xA;&#xA;That said, the great enterprise of education is to break down complex and difficult concepts and skills until they are pitched at the Goldilocks level of difficulty to accelerate a learner towards mastery. 
This is the work, as I&#39;ve explored elsewhere, of scaffolding and differentiation.&#xA;&#xA;Scaffolding and Differentiation  &#xA;In a conversation on the Dwarkesh Podcast, Andrej Karpathy praises the &#34;diagnostic acumen&#34; of a human tutor who helped him learn Korean. She could &#34;instantly... understand where I am as a student&#34; and &#34;probe... my world model&#34; to serve content precisely at his &#34;current sliver of capability.&#34;&#xA;&#xA;This is differentiation: aligning instruction to the individual&#39;s trajectory. It requires knowing exactly where a student stands and providing the manner and time necessary for them to progress.&#xA;&#xA;His tutor was then able to scaffold his learning, providing the content-aligned steps that lead to mastery, just as recruits learn the parachute landing fall in three weeks at the Army jump school at Fort Benning, as described in Make It Stick.  &#xA;Mastering the parachute landing fall at the Army jump school.&#xA;&#xA;  &#34;In my mind, education is the very difficult technical process of building ramps to knowledge. . . you have a tangle of understanding and you’re trying to lay it out in a way that creates a ramp where everything only depends on the thing before it.&#34; — Andrej Karpathy&#xA;&#xA;Scaffolding and Differentiation  &#xA;Crucially, neither differentiation nor scaffolding is about making learning easier in the sense of removing effort. They are both about ensuring the learner encounters the &#34;desirable difficulty&#34; necessary to move towards mastery.&#xA;&#xA;Karpathy views a high-quality human tutor as a &#34;high bar&#34; to set for any AI tutor, but seems to feel that though building such a tutor will take longer than expected, it is ultimately a tractable (i.e. &#34;computationally reducible&#34;) task. He notes that &#34;we have machines for heavy lifting, but people still go to the gym. 
Education will be the same.&#34; Just as computers can play chess better than humans, yet humans still enjoy playing chess, he imagines a future where we learn for the intrinsic joy of it, even if AI can do the thinking for us.&#xA;&#xA;The Algorithmic Turn and Frictionless Design &#xA;&#xA;As Carl Hendrick explored recently on &#34;The Learning Dispatch,&#34; there&#39;s a possibility that teaching and learning themselves are more computationally tractable than we had assumed:&#xA;&#xA;  &#34;If teaching becomes demonstrably algorithmic, if learning is shown to be a process that machines can master . . . what does it mean for human expertise when the thing we most value about ourselves... turns out to be computable after all?&#34;&#xA;&#xA;The problem lies in the design of most AI tools: they are optimized for user-friendly efficiency and task completion. Yet such efficiency counters the friction needed for learning. The Harvard study on AI tutoring showed promise precisely because the system was engineered to resist the natural tendency of LLMs to be maximally helpful. It was constrained to scaffold rather than solve.&#xA;&#xA;As Hendrick notes, human pedagogical excellence does not scale well, while AI improvements can scale exponentially. If teaching is indeed computationally tractable, then a breakthrough in AI tutoring could become a reality. But even with better design for learning, unless both teachers and students wield such powerful tools effectively, we could face a paradoxical situation in which we have the perfect tools for learning, but no learners capable of using them.&#xA;&#xA;Brain Rot &amp; the Trap of the Novice&#xA;&#xA;The danger of AI, then, is that rather than leading us to the promised land of more learning, it may instead impair our ability—both individually and generationally—to learn over time. 
Rather than going to a gym to work out &#34;for fun&#34; or for perceived social status, many may elect to opt out of the rat race altogether. The power of AI would thus be misdirected into an avoidance strategy, deflecting as much thought, effort, and care from our lives as conceivably possible.&#xA;&#xA;The term &#34;brain rot&#34; describes a measurable cognitive decline when people only passively process information. &#xA;&#xA;A study on essay writing with and without ChatGPT found that &#34;The ChatGPT users showed the lowest brain activity&#34; and &#34;The vast majority of ChatGPT users (83 percent) could not recall a single sentence&#34; of the AI-generated text submitted in their name. By automating the difficult cognitive steps, the students lost ownership of the knowledge.&#xA;&#xA;Such risk is highest for novices. A novice could be defined as someone who has yet to develop automatized internal knowledge in a domain. Whereas an expert can wield AI as a cognitive enhancement, extending their own expertise, a novice tends to use it as a cognitive shortcut, bypassing the process of learning needed to stand on their own judgment.&#xA;&#xA;If we could plug a Matrix-style algorithm into our brains to master Kung Fu instantly, we all surely would. As consumers, we have been conditioned to expect the highest quality we can gain with minimal effort. So is it any surprise that our students are eager to take full advantage of a tool designed for the most frictionless task completion? Why think, when a free chatbot can produce output that plausibly looks like you thought about it?&#xA;&#xA;Simas Kicinskas, in University education as we know it is over, details how &#34;take-home assignments are dead . . 
. [because] AI now solves university assignments perfectly in minutes,&#34; and that students use AI as a &#34;crutch rather than as a tutor,&#34; getting perfect answers without understanding because &#34;AI makes thinking optional.&#34;&#xA;&#xA;But really, why should we place all the burden of betterment on the shoulders of our students, when they are defaulting to what is clearly human nature?&#xA;&#xA;The Barbell Approach&#xA;&#xA;Kicinskas suggests that despite the pervasive current use of AI to shortcut thinking, &#34;Universities are uniquely positioned to become a cognitive gym, a place to train deep thinking in the age of AI.&#34;&#xA;&#xA;He proposes &#34;a barbell strategy: pure fundamentals (no AI) on one end, full-on AI projects on the other, with no mushy middle. . . [because] you need cognitive friction to train your mental muscles.&#34;&#xA;&#xA;Barbell strategy&#xA;&#xA;The NY Times article highlighted a similar dynamic in that MIT study cited earlier: students who initially used only their brains to write drafts recorded the highest brain activity once they were allowed to use ChatGPT later. Students who started with ChatGPT never reached parity with the former group.&#xA;&#xA;  &#34;The students who had originally relied only on their brains recorded the highest brain activity once they were allowed to use ChatGPT. The students who had initially used ChatGPT, on the other hand, were never on a par with the former group when they were restricted to using their brains, Dr. Kosmyna said.&#34;&#xA;&#xA;In other words, AI can enhance our abilities, but only after we have already put in the cognitive effort and work for a first draft. &#xA;&#xA;So Kicinskas is onto something with the barbell strategy. We start with real learning: the learning that requires desirable difficulty, friction, and effort, pitched at the right level for where the learner is at that moment, in order to gain greater fluency with that concept or skill. 
&#xA;&#xA;Once some level of ability and knowledge has been acquired (determined by the success criteria set for that particular task, course, subject, and domain), adding AI can accelerate and enhance the exploration of that problem space.&#xA;&#xA;Using AI for Cognitive Lift, Rather than Cognitive Crutch&#xA;&#xA;We must therefore design and use AI in closer alignment with the &#34;barbell&#34; strategy.&#xA;&#xA;At the beginning of a student&#39;s journey, or at the beginning of the development of our own individual products, we need to double down on the fundamentals. We must carve out that space for independent thought as well as for the analog and social interaction we require to gain new insights. This is how we build the inner scaffold required for true expertise.&#xA;&#xA;On the other side of the barbell, we can more enthusiastically embrace the capacity of AI to scale our ability for processing and communicating information. Once we have done the heavy lifting to clarify our thinking, we can use these tools to extend our reach and traverse vast landscapes of data.&#xA;&#xA;The danger lies in that &#34;mushy middle,&#34; wherein we can all too easily follow the path of least resistance and allow others, including AI, to do all our thinking for us, letting our attention drift from our own goals. We must choose to think for ourselves not because we have to for survival, but because the friction of generating our own thought is what gives us our agency.&#xA;&#xA;In a previous post, I explored how both language and learning are each a movement from fuzziness to greater precision. It is possible that AI can greatly accelerate us in that journey, even as it is possible that it could greatly stymie our growth. The key is that we must first subject our fuzzy, half-formed intuitions to greater resistance until they crystallize into more precise and communicable thought. 
If we bypass this struggle, we doom ourselves to perpetual fuzziness, unable to distinguish between AI-automated slop and AI-assisted insight.&#xA;AI in Education infographic&#xA;&#xA;Postscript: How I used AI for this Post&#xA;&#xA;I use AI extensively in both my personal and professional life, and writing this post was no exception. I thought it might be helpful to illustrate some of the arguments I made above by detailing exactly how AI both posed a risk to my own agency and served to enhance it during the creation of this essay.&#xA;&#xA;I began by collecting sources. I had come across several articles and a podcast that felt connected, sensing emerging themes that related to my previous posts on LLMs. I started sketching out some initial thoughts by hand, then uploaded my sources into Google&#39;s NotebookLM.&#xA;&#xA;My first impulse was to pull on the thread of &#34;computational irreducibility.&#34; I knew there was an interesting tension in language between regularity and irregularity, so I used Deep Research to find more sources on the topic. This led me down a rabbit hole. By flooding my notebook with technical papers, the focus shifted to abstractions like Kolmogorov Complexity and NP-completeness—fascinating, but a distraction from the pedagogical argument I wanted to make. Realizing this, I had the AI summarize the concept of irreducibility and then deleted the technical source files to clear the noise.&#xA;&#xA;I then used the notebook to explore patterns between my remaining sources. Key themes began coalescing. It was here that I made a classic mistake: I asked Google Gemini to draft a blog post based on those themes.&#xA;&#xA;The result wasn&#39;t bad, but it wasn&#39;t mine. It completely missed the actual ideas that I was trying to unravel. I realized I was trying to shortcut the &#34;irreducible&#34; work of synthesis. 
To be fair to my intent at the time, I was really just interested in seeing whether the AI gave me any ideas I hadn&#39;t thought of, from a brainstorming stance. It wasn&#39;t very useful, however, so I discarded that approach, went back to my sources, and spent time thinking through the connections as I began drafting something new.&#xA;&#xA;I then began to draft the post in Joplin, which is what I now use for notes and blog drafts. I landed on the analogy of the Hollywood training montage as the way to begin, and I then pulled up Google Gemini in a split screen and began wordsmithing some of what I wanted to say. As I continued drafting, I used Gemini as an editorial support. It suggested syntactical revisions and fixed a number of misspellings. I then used it to help me expand on a half-formed conclusion, as well as to cut an extended navel-gazing section that was completely unnecessary.&#xA;&#xA;Gemini tends to oversimplify in its recommendations, however, and I didn&#39;t take all of its suggestions. I generated some images in NotebookLM based on all the sources, and also enhanced an image I had previously made using Gemini. Finally, I did a few additional rounds of feedback with NotebookLM to reconsider my draft in relation to all the sources in my notebook, then brought that feedback into Gemini and again went through my draft on a split screen. This additional process gave me some good suggestions for reorganizing and enhancing some of the content.&#xA;&#xA;In the end, I almost misled myself by trying to automate the thinking process too early. It was only when I returned to the &#34;gym&#34;—drafting the core ideas myself—that the AI became useful. My experience writing this confirms the barbell strategy: draft what you want to say first to build the conceptual structure, then use AI to draw that out further, and to polish and enhance it. 
Be very cautious in the mushy middle.&#xA;&#xA;#AI #LLMs #cognition #mastery #learning #education #tutoring #scaffolding #differentiation #barbell]]&gt;</description>
      <content:encoded><![CDATA[<p>In the typical Hollywood action movie, a hero acquires master-level skill in a specialized art, such as Kung Fu, in a few power ballad-backed minutes of a training montage. </p>

<p>In real life, it may seem self-evident that gaining mastery takes years of intense, deliberate, and guided work. Yet the perennial optimism of students cramming the night before an exam tells us that the pursuit of a cognitive shortcut may be an enduring human impulse.</p>

<p>It is unsurprising, then, that students—and many adults—increasingly use the swiftly advancing tools of AI and Large Language Models (LLMs) as a shortcut around deeper, more effortful cognitive work.
</p>

<h2 id="the-irreducible-nature-of-effort-and-mastery">The Irreducible Nature of Effort and Mastery</h2>

<p>In a <a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">previous post</a> in my <a href="https://languageandliteracy.blog/ai-llms-and-language">series on LLMs</a>, we briefly explored Stephen Wolfram&#39;s concept of “computational irreducibility”—the idea that certain processes cannot be shortcut: you have to run the entire process to get the result.</p>

<p>One of the provocations of LLMs has been the revelation that human language (and <a href="https://www.projectceti.org">maybe, animal language</a>?) is far more computationally reducible than we assumed. As AI advances, it demonstrates that other tasks and abilities previously thought to reside exclusively within the province of humans may also be more <em>computationally tractable</em> than we believed.</p>

<p>Actual learning by any human being—which we could operationally define as the internalization of a discrete body of knowledge and skills to the point of automaticity—inevitably requires practice and effort. A student must replicate essential learning steps to genuinely own such knowledge. There is no shortcut to mastery.</p>

<p>That said, the great enterprise of education is to break down complex and difficult concepts and skills until they are pitched at the Goldilocks level of difficulty to <em>accelerate</em> a learner towards mastery. This is the work, as I&#39;ve <a href="https://schoolecosystem.wordpress.com/2018/03/21/the-symbiosis-between-scaffolding-and-differentiation/">explored elsewhere</a>, of <em>scaffolding</em> and <em>differentiation</em>.</p>

<p><img src="https://i.snap.as/EJz1xB8O.png" alt="Scaffolding and Differentiation"/><br/>
In <a href="https://www.dwarkesh.com/p/andrej-karpathy">a conversation on the Dwarkesh Podcast</a>, Andrej Karpathy praises the “diagnostic acumen” of a human tutor who helped him learn Korean. She could “instantly... understand where I am as a student” and “probe... my world model” to serve content precisely at his “current sliver of capability.”</p>

<p>This is <em>differentiation</em>: aligning instruction to the individual&#39;s trajectory. It requires knowing exactly where a student stands and providing the manner and time necessary for them to progress.</p>

<p>His tutor was then able to <em>scaffold</em> his learning, providing the content-aligned steps that lead to mastery, just as recruits learn the parachute landing fall in three weeks at the Army jump school at Fort Benning, <a href="https://schoolecosystem.wordpress.com/2017/06/27/scaffolding-success-criteria/">as described</a> in <em>Make It Stick.</em><br/>
<img src="https://i.snap.as/ic4chWcb.png" alt="Mastering the parachute landing fall at the army jump school."/></p>

<blockquote><p>“In my mind, education is the very difficult technical process of building ramps to knowledge. . . you have a tangle of understanding and you’re trying to lay it out in a way that creates a ramp where everything only depends on the thing before it.” — Andrej Karpathy</p></blockquote>

<p><img src="https://i.snap.as/YpAK0ejd.png" alt="Scaffolding and Differentiation"/><br/>
Crucially, neither differentiation nor scaffolding is about making learning <em>easier</em> in the sense of removing effort. They are both about ensuring the learner encounters the “desirable difficulty” necessary to move towards mastery.</p>

<p>Karpathy views a high-quality human tutor as a “high bar” to set for any AI tutor, but seems to feel that though building such a tutor will take longer than expected, it is ultimately a tractable (i.e. “computationally reducible”) task. He notes that “we have machines for heavy lifting, but people still go to the gym. Education will be the same.” Just as computers can play chess better than humans, yet humans still enjoy playing chess, he imagines a future where we learn for the intrinsic joy of it, even if AI can do the thinking for us.</p>

<h2 id="the-algorithmic-turn-and-frictionless-design">The Algorithmic Turn and Frictionless Design</h2>

<p>As Carl Hendrick explored recently on <a href="https://carlhendrick.substack.com/p/the-algorithmic-turn-the-emerging/">“The Learning Dispatch,”</a> there&#39;s a possibility that teaching and learning themselves are more computationally tractable than we had assumed:</p>

<blockquote><p>“If teaching becomes demonstrably algorithmic, if learning is shown to be a process that machines can master . . . what does it mean for human expertise when the thing we most value about ourselves... turns out to be computable after all?”</p></blockquote>

<p>The problem lies in the design of most AI tools: they are optimized for user-friendly efficiency and task completion. Yet such efficiency counters the friction needed for learning. The <a href="https://carlhendrick.substack.com/p/the-algorithmic-turn-the-emerging/">Harvard study</a> on AI tutoring showed promise precisely because the system was engineered to resist the natural tendency of LLMs to be maximally helpful. It was constrained to scaffold rather than solve.</p>

<p>As Hendrick notes, human pedagogical excellence does not scale well, while AI improvements can scale exponentially. If teaching is indeed computationally tractable, then a breakthrough in AI tutoring could become a reality. But even with better design for learning, unless both teachers and students wield such powerful tools effectively, we could face a paradoxical situation in which we have the perfect tools for learning, but no learners capable of using them.</p>

<h2 id="brain-rot-the-trap-of-the-novice">Brain Rot &amp; the Trap of the Novice</h2>

<p>The danger of AI, then, is that rather than leading us to the promised land of more learning, it may instead impair our ability—both individually and generationally—to learn over time. Rather than going to a gym to work out “for fun” or for perceived social status, many may elect to opt out of the rat race altogether. The power of AI would thus be misdirected into an avoidance strategy, deflecting as much thought, effort, and care from our lives as conceivably possible.</p>

<p>The term “brain rot” describes a measurable cognitive decline when people only passively process information.</p>

<p><a href="https://www.nytimes.com/2025/11/06/technology/personaltech/ai-social-media-brain-rot.html">A study on essay writing</a> with and without ChatGPT found that “The ChatGPT users showed the lowest brain activity” and “The vast majority of ChatGPT users (83 percent) could not recall a single sentence” of the AI-generated text submitted in their name. By automating the difficult cognitive steps, the students lost ownership of the knowledge.</p>

<p>Such risk is <a href="https://write.as/manderson/reviewing-claims-ive-made-on-llms">highest for novices</a>. A novice could be defined as someone who has yet to develop automatized internal knowledge in a domain. Whereas an expert can wield AI as a cognitive enhancement, extending their own expertise, a novice tends to use it as a cognitive shortcut, bypassing the process of learning needed to stand on their own judgment.</p>

<p>If we could plug a Matrix-style algorithm into our brains to master Kung Fu instantly, we all surely would. As consumers, we have been conditioned to expect the highest quality we can gain with minimal effort. So is it any surprise that our students are eager to take full advantage of a tool designed for the most frictionless task completion? Why think, when a free chatbot can produce output that plausibly looks like you thought about it?</p>

<p>Simas Kicinskas, in <a href="https://inexactscience.substack.com/p/university-education-as-we-know-it">University education as we know it is over</a>, details how “take-home assignments are dead . . . [because] AI now solves university assignments perfectly in minutes,” and that students use AI as a “crutch rather than as a tutor,” getting perfect answers without understanding because “AI makes thinking optional.”</p>

<p>But really, why should we place all the burden of betterment on the shoulders of our students, when they are defaulting to what is clearly human nature?</p>

<h2 id="the-barbell-approach">The Barbell Approach</h2>

<p>Kicinskas suggests that despite the pervasive current use of AI to shortcut thinking, “Universities are uniquely positioned to become a cognitive gym, a place to train deep thinking in the age of AI.”</p>

<p>He proposes “a barbell strategy: pure fundamentals (no AI) on one end, full-on AI projects on the other, with no mushy middle. . . [because] you need cognitive friction to train your mental muscles.”</p>

<p><img src="https://i.snap.as/p5oDnmkS.png" alt="Barbell strategy"/></p>

<p>The NY Times article highlighted a similar dynamic in that MIT study cited earlier: students who initially used only their brains to write drafts recorded the highest brain activity once they were allowed to use ChatGPT later. Students who started with ChatGPT never reached parity with the former group.</p>

<blockquote><p>“The students who had originally relied only on their brains recorded the highest brain activity once they were allowed to use ChatGPT. The students who had initially used ChatGPT, on the other hand, were never on a par with the former group when they were restricted to using their brains, Dr. Kosmyna said.”</p></blockquote>

<p>In other words, AI can <em>enhance</em> our abilities, but only after we have already put in the cognitive effort and work for a first draft.</p>

<p>So Kicinskas is onto something with the barbell strategy. We start with real learning: the learning that requires desirable difficulty, friction, and effort, pitched at the right level for where the learner is at that moment, in order to gain greater fluency with that concept or skill.</p>

<p>Once some level of ability and knowledge has been acquired (determined by the <a href="https://schoolecosystem.wordpress.com/2017/06/27/scaffolding-success-criteria/"><em>success criteria</em></a> set for that particular task, course, subject, and domain), adding AI can accelerate and enhance the exploration of that problem space.</p>

<h2 id="using-ai-for-cognitive-lift-rather-than-cognitive-crutch">Using AI for Cognitive Lift, Rather than Cognitive Crutch</h2>

<p>We must therefore design and use AI in closer alignment with the “barbell” strategy.</p>

<p>At the beginning of a student&#39;s journey, or at the beginning of the development of our own individual products, we need to double down on the fundamentals. We must carve out that space for independent thought as well as for the analog and social interaction we require to gain new insights. This is how we build <a href="https://languageandliteracy.blog/the-inner-scaffold-for-language-and-literacy">the inner scaffold</a> required for true expertise.</p>

<p>On the other side of the barbell, we can more enthusiastically embrace the capacity of AI to <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">scale our ability for processing and communicating information</a>. Once we have done the heavy lifting to clarify our thinking, we can use these tools to extend our reach and traverse vast landscapes of data.</p>

<p>The danger lies in that “mushy middle,” wherein we can all too easily follow the path of least resistance and allow others, including AI, to do all our thinking for us, letting our attention drift from our own goals. We must choose to think for ourselves not because we have to for survival, but because the friction of generating our own thought is what gives us our agency.</p>

<p>In <a href="https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">a previous post,</a> I explored how both language and learning are each a movement from fuzziness to greater precision. It is possible that AI can greatly accelerate us in that journey, even as it is possible that it could greatly stymie our growth. The key is that we must first subject our fuzzy, half-formed intuitions to greater resistance until they crystallize into more precise and communicable thought. If we bypass this struggle, we doom ourselves to perpetual fuzziness, unable to distinguish between AI-automated slop and AI-assisted insight.
<img src="https://i.snap.as/ZDvFXq43.png" alt="AI in Education infographic"/></p>

<h3 id="postscript-how-i-used-ai-for-this-post">Postscript: How I used AI for this Post</h3>

<p>I use AI extensively in both my personal and professional life, and writing this post was no exception. I thought it might be helpful to illustrate some of the arguments I made above by detailing exactly how AI both posed a risk to my own agency and served to enhance it during the creation of this essay.</p>

<p>I began by collecting sources. I had come across several articles and a podcast that felt connected, sensing emerging themes that related to my previous posts on LLMs. I started sketching out some initial thoughts by hand, then uploaded my sources into Google&#39;s NotebookLM.</p>

<p>My first impulse was to pull on the thread of “computational irreducibility.” I knew there was an interesting tension in language between regularity and irregularity, so I used Deep Research to find more sources on the topic. This led me down a rabbit hole. By flooding my notebook with technical papers, the focus shifted to abstractions like Kolmogorov Complexity and NP-completeness—fascinating, but a distraction from the pedagogical argument I wanted to make. Realizing this, I had the AI summarize the concept of irreducibility and then deleted the technical source files to clear the noise.</p>

<p>I then used the notebook to explore patterns between my remaining sources. Key themes began coalescing. It was here that I made a classic mistake: I asked Google Gemini to draft a blog post based on those themes.</p>

<p>The result wasn&#39;t bad, but it wasn&#39;t mine. It completely missed the actual ideas that I was trying to unravel. I realized I was trying to shortcut the “irreducible” work of synthesis. To be fair to my intent at the time, I was really just interested in seeing whether the AI gave me any ideas I hadn&#39;t thought of, from a brainstorming stance. It wasn&#39;t very useful, however, so I discarded that approach, went back to my sources, and spent time thinking through the connections as I began drafting something new.</p>

<p>I then began to draft the post in Joplin, which is what I now use for notes and blog drafts. I landed on the analogy of the Hollywood training montage as the way to begin, and I then pulled up Google Gemini in a split screen and began wordsmithing some of what I wanted to say. As I continued drafting, I used Gemini as an editorial support. It suggested syntactical revisions and fixed a number of misspellings. I then used it to help me expand on a half-formed conclusion, as well as to cut an extended navel-gazing section that was completely unnecessary.</p>

<p>Gemini tends to oversimplify in its recommendations, however, and I didn&#39;t take all of its suggestions. I generated some images in NotebookLM based on all the sources, and also used Gemini to enhance an image I had made previously. Finally, I ran a few additional rounds of feedback: I asked NotebookLM to reconsider my draft in relation to all the sources in my notebook, then brought that feedback into Gemini and went through my draft on a split screen again. This additional process yielded some good suggestions for reorganizing and enhancing the content.</p>

<p>In the end, I almost misled myself by trying to automate the thinking process too early. It was only when I returned to the “gym”—drafting the core ideas myself—that the AI became useful. My experience writing this confirms the barbell strategy: draft what you want to say first to build the conceptual structure, then use AI to draw that out further, and to polish and enhance it. Be very cautious in the mushy middle.</p>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:mastery" class="hashtag"><span>#</span><span class="p-category">mastery</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:education" class="hashtag"><span>#</span><span class="p-category">education</span></a> <a href="https://languageandliteracy.blog/tag:tutoring" class="hashtag"><span>#</span><span class="p-category">tutoring</span></a> <a href="https://languageandliteracy.blog/tag:scaffolding" class="hashtag"><span>#</span><span class="p-category">scaffolding</span></a> <a href="https://languageandliteracy.blog/tag:differentiation" class="hashtag"><span>#</span><span class="p-category">differentiation</span></a> <a href="https://languageandliteracy.blog/tag:barbell" class="hashtag"><span>#</span><span class="p-category">barbell</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/ai-mastery-and-the-barbell-of-cognitive-enhancement</guid>
      <pubDate>Mon, 15 Dec 2025 04:00:35 +0000</pubDate>
    </item>
    <item>
      <title>More Productive Than an Hour of Instruction?</title>
      <link>https://languageandliteracy.blog/more-productive-than-an-hour-of-instruction?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[The Surprising Cognitive Science of a Walk in the Park&#xA;&#xA;The capacity for intense focus in our students is a finite resource—a cognitive fuel tank that can, and does, run low. We see the results in the classroom: irritability, impatience, and a fraying of impulse control. But what if one of the most powerful tools for refueling that tank wasn&#39;t a new pedagogical strategy, but something far more fundamental?&#xA;&#xA;Five years ago, I wrote about the profound impact that greenery can have on health and learning in The Influence of Greenery on Learning. When I recently listened to Dr. Marc Berman, Director of the Environmental Neuroscience Lab at the University of Chicago, expand on this research on the Many Minds podcast, it prompted me to revisit that post. I was humbled to realize how many of his foundational studies I had completely overlooked. This new understanding reveals that nature is not just an amenity, but a necessity for cognition.&#xA;&#xA;!--more--&#xA;&#xA;At the start of the episode, Berman unpacks one of the theories I had very briefly mentioned on why greenery might be so rejuvenative: Attention Restoration Theory. According to Berman, the theory posits that our capacity for intense focus, or directed attention, is a finite resource—a cognitive fuel tank that can, and does, run low. When it’s depleted, we can see the results at home and in the classroom: irritability, impatience, and a fraying of impulse control.&#xA;&#xA;Natural environments, on the other hand, engage our involuntary attention—the effortless, bottom-up engagement of our senses captured by the gentle rustling of leaves or the movement of light through the clouds, and it allows our depleted resources for directed, intense focus to restore themselves. 
Berman terms this &#34;soft fascination.&#34; This is wholly distinct from the &#34;harsh fascination&#34; of a chaotic urban scene, with its blaring horns and noise, which consumes our mental resources.&#xA;&#xA;The cognitive benefits are significant. One of the studies that kickstarted Berman’s research showed a 20% improvement in cognitive performance after a walk in nature. This boost occurred even when participants didn&#39;t particularly enjoy the walk, demonstrating a powerful, mood-independent effect.&#xA;&#xA;This research has profound implications for educational equity. A follow-up study found that individuals with major depressive disorder (MDD) see even more significant cognitive gains from a nature walk. Conversely, a walk in an urban environment can actually worsen their cognitive performance. This suggests that the lack of green space in many under-resourced communities can be actively harmful to our most vulnerable students. Access to restorative natural environments should therefore not be seen as a luxury, but as a prerequisite for equitable learning.&#xA;&#xA;But what is it about nature that is so restorative? Berman’s explication identifies specific &#34;active ingredients.” It turns out my hunch about fractals was on the right track. His team analyzed what they call low-level visual features to quantify what makes a scene feel &#34;natural.&#34;&#xA;&#xA;Key among these are:&#xA;&#xA;Fractalness and Compressibility: Natural scenes have high &#34;fractalness&#34;—the repetition of similar patterns at different scales. This visual structure means they are also more &#34;compressible,&#34; like a JPEG file. Our brains find this informational efficiency less demanding to process, which frees up cognitive bandwidth.&#xA;Curved Edges: Natural environments have a high density of non-straight, curved edges, whereas our built environments are dominated more by rigid, straight lines. 
These curves are not only easier on the eyes, but as one study found, they are also correlated with a viewer&#39;s tendency to reflect on deeper topics like their life&#39;s journey and spirituality.&#xA;&#xA;Berman furthermore points to additional sensory qualities of nature that go beyond the mere visual:&#xA;&#xA;Auditory Stimuli: Brief exposure to natural sounds like birdsong, wind, or flowing water has been shown to improve cognitive performance when compared to urban noise.&#xA;Olfactory Stimuli: The air itself carries restorative properties. The scent of damp earth after rain or the airborne chemicals (terpenes) emitted by pine trees can impact our well-being through the olfactory pathway.&#xA;&#xA;For restoration to occur, according to Attention Restoration Theory, an environment must provide a sense of “Being Away” from daily pressures, have enough richness to get lost in (“Extent”), and support a person’s intentions (“Compatibility”). When these elements combine, the mind can truly recharge.&#xA;&#xA;Now pivot that to an educational setting. Imagine a school that embodies these principles. Instead of a long, featureless corridor (no “Extent”), picture a hallway that curves and uses natural materials with fractal patterns like wood grain. Imagine the school itself providing a space for “Being Away” from stressors, a place for creativity and inspiration. By incorporating more trees and natural design principles into our schools, we can improve learning.&#xA;&#xA;Thankfully, we don’t need a week-long immersion in a forest; studies confirm that a restorative &#34;dose&#34; of nature can be as short as 20 minutes. In a world of education reform obsessed with short-term metrics, this research demands we look at a more fundamental input: the physical environment itself. It forces us to ask a provocative question: could 6 hours of instruction plus 2 hours in a park be more productive than 8 straight hours behind a brick wall? 
The science increasingly suggests that the answer is yes.&#xA;&#xA;For a full, fascinating dive into the research, I highly recommend listening to the entire podcast episode, and then poking around into some of Berman’s studies!&#xA;&#xA;#greenery #learning #attention #neuroscience #schools #ecosystems #wellbeing #AttentionRestorationTheory #environmentalneuroscience #equity&#xA;&#xA;(Note: this was cross-posted on my other blog, Schools &amp; Ecosystems)]]&gt;</description>
<content:encoded><![CDATA[<h4 id="the-surprising-cognitive-science-of-a-walk-in-the-park">The Surprising Cognitive Science of a Walk in the Park</h4>

<p><img src="https://i.snap.as/nsXvGUgO.png" alt=""/></p>

<p>The capacity for intense focus in our students is a finite resource—a cognitive fuel tank that can, and does, run low. We see the results in the classroom: irritability, impatience, and a fraying of impulse control. But what if one of the most powerful tools for refueling that tank wasn&#39;t a new pedagogical strategy, but something far more fundamental?</p>

<p>Five years ago, I wrote about the profound impact that greenery can have on health and learning in <em><a href="https://schoolecosystem.wordpress.com/2020/09/10/the-influence-of-greenery-on-learning/">The Influence of Greenery on Learning</a></em>. When I recently listened to Dr. Marc Berman, Director of the Environmental Neuroscience Lab at the University of Chicago, expand on this research <a href="https://manyminds.libsyn.com/how-nature-restores-the-mind">on the </a><em><a href="https://manyminds.libsyn.com/how-nature-restores-the-mind">Many Minds</a></em> <a href="https://manyminds.libsyn.com/how-nature-restores-the-mind">podcast</a>, it prompted me to revisit that post. I was humbled to realize how many of his foundational studies I had completely overlooked. This new understanding reveals that nature is not just an amenity, but a necessity for cognition.</p>



<p>At the start of the episode, Berman unpacks one of the theories I had very briefly mentioned on <em>why</em> greenery might be so rejuvenative: <strong>Attention Restoration Theory</strong>. According to Berman, the theory posits that our capacity for intense focus, or <em>directed attention</em>, is a finite resource—a cognitive fuel tank that can, and does, run low. When it’s depleted, we can see the results at home and in the classroom: irritability, impatience, and a fraying of impulse control.</p>

<p>Natural environments, on the other hand, engage our <em>involuntary attention</em>—the effortless, bottom-up engagement of our senses captured by the gentle rustling of leaves or the movement of light through the clouds. This allows our depleted resources for directed, intense focus to restore themselves. Berman terms this “<em>soft fascination</em>.” It is wholly distinct from the “harsh fascination” of a chaotic urban scene, with its blaring horns and noise, which consumes our mental resources.</p>

<p>The cognitive benefits are significant. One of the <a href="https://journals.sagepub.com/doi/abs/10.1111/j.1467-9280.2008.02225.x?casa_token=s-l-Iz4po7cAAAAA%3AjJg9tP4fl6fhO_J9xZI1qXn6P-mjKhNlCp_a49qearl-3xZ3dFAkl7fyJAJIq7gwV3TANZ_5_OvWiA">studies</a> that kickstarted Berman’s research showed a 20% improvement in cognitive performance after a walk in nature. This boost occurred even when participants didn&#39;t particularly enjoy the walk, demonstrating a powerful, mood-independent effect.</p>

<p>This research has profound implications for educational equity. A <a href="https://www.sciencedirect.com/science/article/abs/pii/S0165032712002005?casa_token=lcNnZwr_4HQAAAAA:fMvxlkTPlWFnkX3pPXb-CX7q0xKmJzVxKMfrXzSi766KJG9Yv-uPr4zCaEyx3GR2DVvkzMHtrYA">follow-up</a> study found that individuals with major depressive disorder (MDD) see even more significant cognitive gains from a nature walk. Conversely, a walk in an urban environment can actually worsen their cognitive performance. This suggests that the lack of green space in many under-resourced communities can be actively harmful to our most vulnerable students. Access to restorative natural environments should therefore not be seen as a luxury, but as a prerequisite for equitable learning.</p>

<p>But what is it about nature that is so restorative? Berman’s explication identifies specific “active ingredients.” It turns out my hunch about fractals was on the right track. His team analyzed what they call <em><a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114572">low-level visual features</a></em> to quantify what makes a scene feel “natural.”</p>

<p>Key among these are:</p>
<ul><li><strong>Fractalness and Compressibility:</strong> Natural scenes have high “fractalness”—the repetition of similar patterns at different scales. This visual structure means they are also more “<a href="https://osf.io/xw3ek/download">compressible</a>,” like a JPEG file. Our brains find this informational efficiency less demanding to process, which frees up cognitive bandwidth.</li>
<li><strong>Curved Edges:</strong> Natural environments have a high density of <em>non-straight, curved edges</em>, whereas our built environments are dominated more by rigid, straight lines. These curves are not only easier on the eyes, but <a href="https://www.sciencedirect.com/science/article/pii/S0010027718300192?casa_token=861ARk0JxWAAAAAA:7Oi1V8Q48tXZCfKuxJUjXiEtfWFt6X010HZtyhvsHyT-s5j7KpIn3ltRGitNXzq-Gdoj5bYZQeY">as one study found</a>, they are also correlated with a viewer&#39;s tendency to reflect on deeper topics like their life&#39;s journey and spirituality.</li></ul>
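<p>To make the “compressible, like a JPEG file” point concrete, here is a toy sketch of my own (not from Berman&#39;s studies): repeating a small motif at scale, the way self-similar natural patterns do, makes data dramatically easier to compress than unstructured noise.</p>

```python
# Toy illustration (an assumption of mine, not Berman's data): pattern-rich,
# self-similar data compresses far better than unstructured noise -- the
# informational property the "compressibility" finding points to.
import random
import zlib

random.seed(0)

# A "fractal-like" byte string: the same small motifs repeated at scale
patterned = (b"leaf" * 4 + b"branch" * 2) * 100

# Unstructured noise of the same length
noisy = bytes(random.randrange(256) for _ in range(len(patterned)))

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of original size (lower = more compressible)."""
    return len(zlib.compress(data)) / len(data)

print(f"patterned: {ratio(patterned):.3f}")  # compresses to a small fraction
print(f"noisy:     {ratio(noisy):.3f}")      # barely compresses at all
```

<p>The patterned string shrinks to a few percent of its size while the noise stays near 100%, which is the informational-efficiency gap the research describes our visual systems exploiting.</p>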

<p>Berman furthermore points to additional sensory qualities of nature that go beyond the mere visual:</p>
<ul><li><strong>Auditory Stimuli:</strong> Brief exposure to natural <a href="https://link.springer.com/article/10.3758/s13423-018-1539-1">sounds</a> like birdsong, wind, or flowing water has been shown to improve cognitive performance when compared to urban noise.</li>
<li><strong>Olfactory Stimuli:</strong> The air itself carries restorative properties. The <a href="https://www.science.org/doi/full/10.1126/sciadv.adn3028">scent</a> of damp earth after rain or the airborne chemicals (terpenes) emitted by pine trees can impact our well-being through the olfactory pathway.</li></ul>

<p>For restoration to occur, according to Attention Restoration Theory, an environment must provide a sense of “Being Away” from daily pressures, have enough richness to get lost in (“Extent”), and support a person’s intentions (“Compatibility”). When these elements combine, the mind can truly recharge.</p>

<p>Now pivot that to an educational setting. Imagine a school that embodies these principles. Instead of a long, featureless corridor (no “Extent”), picture a hallway that curves and uses natural materials with fractal patterns like wood grain. Imagine the school itself providing a space for “Being Away” from stressors, a place for creativity and inspiration. By incorporating more trees and natural design principles into our schools, we can improve learning.</p>

<p>Thankfully, we don’t need a week-long immersion in a forest; studies confirm that a restorative “dose” of nature can be as short as 20 minutes. In a world of education reform obsessed with short-term metrics, this research demands we look at a more fundamental input: the physical environment itself. It forces us to ask a provocative question: could 6 hours of instruction plus 2 hours in a park be more productive than 8 straight hours behind a brick wall? The science increasingly suggests that the answer is yes.</p>

<p>For a full, fascinating dive into the research, I highly recommend listening to the entire podcast episode, and then poking around into some of Berman’s studies!</p>

<p><a href="https://languageandliteracy.blog/tag:greenery" class="hashtag"><span>#</span><span class="p-category">greenery</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:attention" class="hashtag"><span>#</span><span class="p-category">attention</span></a> <a href="https://languageandliteracy.blog/tag:neuroscience" class="hashtag"><span>#</span><span class="p-category">neuroscience</span></a> <a href="https://languageandliteracy.blog/tag:schools" class="hashtag"><span>#</span><span class="p-category">schools</span></a> <a href="https://languageandliteracy.blog/tag:ecosystems" class="hashtag"><span>#</span><span class="p-category">ecosystems</span></a> <a href="https://languageandliteracy.blog/tag:wellbeing" class="hashtag"><span>#</span><span class="p-category">wellbeing</span></a> <a href="https://languageandliteracy.blog/tag:AttentionRestorationTheory" class="hashtag"><span>#</span><span class="p-category">AttentionRestorationTheory</span></a> <a href="https://languageandliteracy.blog/tag:environmentalneuroscience" class="hashtag"><span>#</span><span class="p-category">environmentalneuroscience</span></a> <a href="https://languageandliteracy.blog/tag:equity" class="hashtag"><span>#</span><span class="p-category">equity</span></a></p>

<h5 id="note-this-was-cross-posted-https-schoolecosystem-wordpress-com-2025-09-21-beyond-the-brick-wall-using-environmental-neuroscience-to-boost-learning-and-well-being-beyond-the-brick-wall-using-environmental-neuroscience-to-boost-learning-and-well-being-on-my-other-blog-schools-ecosystems">(Note: this was <a href="https://schoolecosystem.wordpress.com/2025/09/21/beyond-the-brick-wall-using-environmental-neuroscience-to-boost-learning-and-well-being/">cross-posted</a> on my other blog, <em>Schools &amp; Ecosystems</em>)</h5>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/more-productive-than-an-hour-of-instruction</guid>
      <pubDate>Tue, 23 Sep 2025 12:02:09 +0000</pubDate>
    </item>
    <item>
      <title>LLMs, Statistical Learning, and Explicit Teaching</title>
      <link>https://languageandliteracy.blog/llms-statistical-learning-and-explicit-teaching?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[NYC skyline&#xA;&#xA;The Surprising Success of Large Language Models&#xA;&#xA;  “The success of large language models is the biggest surprise in my intellectual life. We learned that a lot of what we used to believe may be false and what I used to believe may be false. I used to really accept, to a large degree, the Chomskyan argument that the structures of language are too complex and not manifest in input so that you need to have innate machinery to learn them. You need to have a language module or language instinct, and it’s impossible to learn them simply by observing statistics in the environment.&#xA;&#xA;  If it’s true — and I think it is true — that the LLMs learn language through statistical analysis, this shows the Chomskyan view is wrong. This shows that, at least in theory, it’s possible to learn languages just by observing a billion tokens of language.”&#xA;&#xA;  –Paul Bloom, in an interview with Tyler Cowen&#xA;!--more--&#xA;&#xA;Challenging the Hypothesis of Innateness&#xA;&#xA;For decades, the Chomskyan view has dominated our understanding of language development. This view argues that language structures are too complex to be learned solely from environmental input and therefore must require some kind of innate linguistic machinery in the brain (a “universal grammar”).&#xA;&#xA;Yet as the quote above from Paul Bloom makes explicit, what LLMs have demonstrated–as a proof of concept–is that grammatical structures for language does not need to be innate. That machines can learn language via statistical associations alone(https://languageandliteracy.blog/ai-llms-and-language)), rather than explicitly programmed grammatical rules.&#xA;&#xA;We have explored in a previous series on this blog the idea that language may not be a completely innate property of our brains, but rather more of a cultural phenomenon. This parallels the insight–much more widely accepted now–that learning to read is not innate. 
&#xA;&#xA;The success of LLMs in acquiring language-like abilities through mere statistical analysis of texts demonstrates that it&#39;s possible to learn languages via statistical associations alone.&#xA;&#xA;The Power of Statistical Learning&#xA;&#xA;This revelation–that LLMs can learn language via statistical associations alone, rather than through any explicitly programmed rules–challenges our traditional understanding of language development and points to the power of implicit statistical learning.&#xA;&#xA;However, unlike human children, who can rapidly learn language from relatively sparse input, current frontier LLMs require astronomical amounts of data to be trained. Yet the fact that machines can learn in this way suggests that the structure of language itself lends itself to such implicit learning.&#xA;&#xA;This insight extends beyond language development and into literacy. We have previously examined seminal papers by Philip Gough and co arguing that learning to read words is more akin to learning a cipher than breaking a code. Rather than learning explicit rules, as from a codebook, we internalize patterns of sounds, letters, and meanings in an algorithmic fashion.&#xA;&#xA;There is a fascinating line of research focused on “statistical learning,” and while there remains much to be learned about this domain, there seems to be an interesting convergence between this research as it relates to reading and as it relates to LLMs.&#xA;&#xA;Reading nerds are already well acquainted with Mark Seidenberg, as he is a steady presence in the public sphere of communication and debates about reading instruction. What may be somewhat less known about him is that his oeuvre of research has been into computational, connectionist models of reading that have demonstrated how learning to read is a process of statistical learning between sounds, spelling, and meaning. 
It’s not that he hides this, by the way, but rather that the community of educators that are deep into the “science of reading” stuff don’t seem to be as enticed by abstract stuff like computational models and statistical learning.&#xA;&#xA;But the convergence between connectionist accounts of learning language and learning to read and the advent of LLMs are important to understand. Not just from a nerdy stance, which has been mine throughout all these posts, but rather because LLMs have–again, as a proof of concept–demonstrated that implicit learning of statistical associations are fundamental not only to language and to reading, but to our knowledge and experience of the world.&#xA;&#xA;Connectionist Models: Bridging AI and Human Learning&#xA;&#xA;In fact, Seidenberg himself has repeatedly attempted to communicate the understanding that implicit statistical learning is just as fundamental to learning to read as it is to learning language. &#xA;&#xA;He stirred up some recent controversy on this topic when he suggested that the “SOR” movement has over-corrected in response to previous squishy balanced literacy approaches by focusing too hard on explicit instruction as the cure-all for everything. See his provocative presentation and writing on this topic here: https://seidenbergreading.net/2024/06/24/where-does-the-science-of-reading-go-from-here-2/&#xA;&#xA;To summarize his argument, which dovetails with where we started with LLMs, learning to read can not all be taught explicitly, and there is an opportunity cost to an over-reliance on the explicit teaching of “rules” over providing more opportunity for actual reading and writing to build up the statistical associations needed to become fluent:&#xA;&#xA;  “The purpose of explicit learning is to scaffold implicit learning about print, sound, meaning. Explicit instruction is the tip of the iceberg. 
The larger part under the surface is learned implicitly instead of teaching the whole iceberg.”&#xA;&#xA;  --slides on “Where does the Science of Reading go from here?”&#xA;&#xA;In other words – only provide enough explicit instruction as needed to successfully spend more time engaged in an increasing volume of reading, writing, and talking.&#xA;&#xA;Balancing Explicit and Implicit Learning in Language and Reading Instruction&#xA;&#xA;In a paper, “The Impact of Language Experience on Language and Reading,” Seidenberg and Maryellen MacDonald also point to the fact that learning to read is easier for children with more advanced spoken language skills, while those with less exposure (due to greater variability of linguistic input) face greater challenges. This is because children exposed to multiple dialects or languages are learning to navigate multiple language systems, each with its own set of statistical linguistic patterns.&#xA;&#xA;For multilingual and multidialectal learners, it is therefore especially critical to find the right combination of statistical learning and explicit teaching. According to the paper, consistent and increased exposure to the language of instruction is important. This exposure should be complemented by explicit teaching of both oral and written language patterns. And by explicitly comparing and contrasting  home languages and dialects with the language used at school–both orally and in writing–students can develop metalinguistic awareness and a deeper understanding of varying language structures. This approach, implemented strategically within a welcoming and supportive classroom, allows students to leverage their existing linguistic knowledge while acquiring new language skills.&#xA;&#xA;Another way of thinking about this, as we’ve explored in another post, is the movement from fuzziness to precision. By seeing, hearing, speaking, and writing an increasing volume of language, students can rapidly begin to make statistical associations. 
However, especially in the initial stages of learning a new language or learning to read, more effort will be required to gain greater precision, and thus, more mistakes will be a part of the learning process, and thus more feedback is needed to course correct at the very beginning.&#xA;&#xA;I’ve written elsewhere about the importance of striking a balance between close reading of shared grade-level texts that are worth reading, while ensuring that each and every student reads a steady volume of texts that are more accessible. I’ve also written here about the need for “daily textual feasts” to increase the volume of rich language, knowledge, and critical thinking, as per Dr. Alfred Tatum.&#xA;&#xA;Rethinking Language and Literacy Instruction&#xA;&#xA;In sum, the surprising and awesome ability of LLMs, derived from mere statistical associations, has challenged traditional assumptions about the innate nature of language and, potentially, the role of explicit and implicit instruction in language and literacy learning. &#xA;&#xA;This underscores the need for a comprehensive approach to teaching of reading and language, in which explicit teaching is strategically counterbalanced alongside implicit learning opportunities.&#xA;&#xA;#AI #learning #language #LLMs #reading #explicit #implicit&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/5DZOgOcF.jpg" alt="NYC skyline"/></p>

<h2 id="the-surprising-success-of-large-language-models">The Surprising Success of Large Language Models</h2>

<blockquote><p>“The success of large language models is the biggest surprise in my intellectual life. We learned that a lot of what we used to believe may be false and what I used to believe may be false. I used to really accept, to a large degree, the Chomskyan argument that the structures of language are too complex and not manifest in input so that you need to have innate machinery to learn them. You need to have a language module or language instinct, and it’s impossible to learn them simply by observing statistics in the environment.</p>

<p>If it’s true — and I think it is true — that the LLMs learn language through statistical analysis, this shows the Chomskyan view is wrong. This shows that, at least in theory, it’s possible to learn languages just by observing a billion tokens of language.”</p>

<p>–Paul Bloom, <a href="https://conversationswithtyler.com/episodes/paul-bloom/">in an interview with Tyler Cowen</a>
</p></blockquote>

<h2 id="challenging-the-hypothesis-of-innateness">Challenging the Hypothesis of Innateness</h2>

<p>For decades, the Chomskyan view has dominated our understanding of language development. This view argues that language structures are too complex to be learned solely from environmental input and therefore must require some kind of innate linguistic machinery in the brain (a “universal grammar”).</p>

<p>Yet as the quote above from Paul Bloom makes explicit, what LLMs have demonstrated–as a proof of concept–is that grammatical structures for language do not <em>need</em> to be innate: machines can learn language via statistical associations alone, rather than through explicitly programmed grammatical rules.</p>

<p>We have explored in <a href="https://languageandliteracy.blog/innate-vs">a previous series on this blog</a> the idea that language may not be a completely innate property of our brains, but rather more of a cultural phenomenon. This parallels the insight–much more widely accepted now–that <a href="https://languageandliteracy.blog/natural-vs">learning to read is not innate</a>.</p>

<p>The success of LLMs in acquiring language-like abilities through mere statistical analysis of texts offers a living proof of concept for that view.</p>

<h2 id="the-power-of-statistical-learning">The Power of Statistical Learning</h2>

<p>This revelation–that LLMs can learn language via statistical associations alone, rather than through any explicitly programmed rules–challenges our traditional understanding of language development and points to the power of implicit statistical learning.</p>

<p>However, unlike human children, who can rapidly learn language from relatively sparse input, current frontier LLMs require astronomical amounts of data to be trained. Yet the fact that machines can learn in this way suggests that the structure of language itself lends itself to such implicit learning.</p>
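<p>As a minimal sketch of what “learning from statistical associations alone” means (a toy illustration of mine, nothing like a real LLM), consider a bigram model: it simply counts which word follows which, yet from those counts alone it picks up word-order regularities that no one explicitly programmed.</p>

```python
# Toy bigram "language model": it learns word-order statistics purely by
# counting co-occurrences in text -- no grammar rules are programmed in.
from collections import Counter, defaultdict

# A tiny made-up corpus (a stand-in for the billions of tokens LLMs see)
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word
follows: defaultdict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# The counts alone now encode a grammatical regularity: in this corpus,
# "the" is always followed by a noun, and "sat" is always followed by "on".
print(follows["the"].most_common())
print(follows["sat"].most_common())
```

<p>Scale the same idea up by many orders of magnitude in data and model capacity, and you get the statistical machinery Bloom is marveling at.</p>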

<p>This insight extends beyond language development and into literacy. We have previously examined <a href="https://languageandliteracy.blog/what-does-it-take-to-internalize-the-cipher">seminal papers by Philip Gough and co</a> arguing that learning to read words is more akin to learning a cipher than breaking a code. Rather than learning explicit rules, as from a codebook, we internalize patterns of sounds, letters, and meanings in an algorithmic fashion.</p>

<p>There is <a href="https://www.tandfonline.com/toc/hssr20/23/1">a fascinating line of research</a> focused on “statistical learning,” and while there remains much to be learned about this domain, there seems to be an interesting convergence between this research as it relates to reading and as it relates to LLMs.</p>

<p>Reading nerds are already well acquainted with Mark Seidenberg, as he is a steady presence in the public sphere of communication and debates about reading instruction. What may be somewhat less known about him is that <a href="https://www.researchgate.net/profile/Mark-Seidenberg">his body of research</a> has centered on computational, connectionist models of reading, which have demonstrated how learning to read is a process of statistical learning across sounds, spelling, and meaning. It&#8217;s not that he hides this, by the way, but rather that the community of educators who are deep into the &#8220;science of reading&#8221; doesn&#8217;t seem as enticed by abstract stuff like computational models and statistical learning.</p>

<p>But the convergence between <a href="https://languageandliteracy.blog/language-like-reading-may-not-be-innate">connectionist accounts</a> of learning language and learning to read and the advent of LLMs is important to understand. Not just from a nerdy stance, which has been mine throughout these posts, but because LLMs have&#8211;again, as a proof of concept&#8211;demonstrated that implicit learning of statistical associations is fundamental not only to language and to reading, but to our knowledge and experience of the world.</p>

<h2 id="connectionist-models-bridging-ai-and-human-learning">Connectionist Models: Bridging AI and Human Learning</h2>

<p>In fact, Seidenberg himself has repeatedly argued that implicit statistical learning is just as fundamental to learning to read as it is to learning language.</p>

<p>He stirred up some recent controversy on this topic when he suggested that the “SOR” movement has over-corrected in response to previous squishy balanced literacy approaches by focusing too hard on explicit instruction as the cure-all for everything. See his provocative presentation and writing on this topic here: <a href="https://seidenbergreading.net/2024/06/24/where-does-the-science-of-reading-go-from-here-2/">https://seidenbergreading.net/2024/06/24/where-does-the-science-of-reading-go-from-here-2/</a></p>

<p>To summarize his argument, which dovetails with where we started with LLMs: learning to read cannot all be taught explicitly, and there is an opportunity cost to an over-reliance on the explicit teaching of &#8220;rules&#8221; over providing more opportunity for actual reading and writing to build up the statistical associations needed to become fluent:</p>

<blockquote><p>“The purpose of explicit learning is to scaffold implicit learning about print, sound, meaning. Explicit instruction is the tip of the iceberg. The larger part under the surface is learned implicitly instead of teaching the whole iceberg.”</p>

<p>—<a href="https://seidenbergreading.net/wp-content/uploads/2024/06/SSL.pdf">slides on “Where does the Science of Reading go from here?”</a></p></blockquote>

<p>In other words &#8211; provide only as much explicit instruction as is needed to spend more time engaged in an increasing volume of reading, writing, and talking.</p>

<h2 id="balancing-explicit-and-implicit-learning-in-language-and-reading-instruction">Balancing Explicit and Implicit Learning in Language and Reading Instruction</h2>

<p>In a paper, <a href="https://seidenbergreading.net/wp-content/uploads/2024/06/seidenberg-macdonald-2018.pdf">“The Impact of Language Experience on Language and Reading,”</a> Seidenberg and Maryellen MacDonald also point to the fact that learning to read is easier for children with more advanced spoken language skills, while those with less exposure (due to greater variability of linguistic input) face greater challenges. This is because children exposed to multiple dialects or languages are learning to navigate multiple language systems, each with its own set of statistical linguistic patterns.</p>

<p>For multilingual and multidialectal learners, it is therefore especially critical to find the right combination of statistical learning and explicit teaching. According to the paper, consistent and increased exposure to the language of instruction is important. This exposure should be complemented by explicit teaching of both oral and written language patterns. And by explicitly comparing and contrasting home languages and dialects with the language used at school&#8211;both orally and in writing&#8211;students can develop metalinguistic awareness and a deeper understanding of varying language structures. This approach, implemented strategically within a welcoming and supportive classroom, allows students to leverage their existing linguistic knowledge while acquiring new language skills.</p>

<p>Another way of thinking about this, as we&#8217;ve explored <a href="https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language">in another post</a>, is the movement from fuzziness to precision. By seeing, hearing, speaking, and writing an increasing volume of language, students can rapidly begin to make statistical associations. However, especially in the initial stages of learning a new language or learning to read, more effort is required to gain greater precision; mistakes are a natural part of that process, and more corrective feedback is needed at the very beginning.</p>

<p>I’ve <a href="https://schoolecosystem.wordpress.com/2019/12/30/when-everyone-pulls-together-the-secrets-of-success-academy/">written elsewhere</a> about the importance of striking a balance between close reading of shared grade-level texts that are worth reading, while ensuring that each and every student reads a steady volume of texts that are more accessible. I’ve also <a href="https://languageandliteracy.blog/provide-our-students-with-textual-feasts">written here</a> about the need for “daily textual feasts” to increase the volume of rich language, knowledge, and critical thinking, as per Dr. Alfred Tatum.</p>

<h2 id="rethinking-language-and-literacy-instruction">Rethinking Language and Literacy Instruction</h2>

<p>In sum, the surprising and awesome abilities of LLMs, derived from mere statistical associations, have challenged traditional assumptions about the innate nature of language and, potentially, about the roles of explicit and implicit instruction in language and literacy learning.</p>

<p>This underscores the need for a comprehensive approach to the teaching of reading and language, in which explicit teaching is strategically balanced with implicit learning opportunities.</p>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:reading" class="hashtag"><span>#</span><span class="p-category">reading</span></a> <a href="https://languageandliteracy.blog/tag:explicit" class="hashtag"><span>#</span><span class="p-category">explicit</span></a> <a href="https://languageandliteracy.blog/tag:implicit" class="hashtag"><span>#</span><span class="p-category">implicit</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/llms-statistical-learning-and-explicit-teaching</guid>
      <pubDate>Wed, 18 Sep 2024 01:51:31 +0000</pubDate>
    </item>
    <item>
      <title>The Interplay of Language, Cognition, and LLMs: Where Fuzziness Meets Precision</title>
      <link>https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[Through the window&#xA;In our series on AI, LLMs, and Language so far we’ve explored a few implications of LLMs relating to language and literacy development: &#xA;&#xA;1) LLMs gain their uncanny powers from the statistical nature of language itself; &#xA;2) the meaning and experiences of our world are more deeply entwined with the form and structure of our language than we previously imagined; &#xA;3) LLMs offer an opportunity for further convergence between human and machine language; and &#xA;4) LLMs can potentially extend our cognitive abilities, enabling us to process far more information.&#xA;&#xA;In a previous series, “Innate vs. Developed,” we’ve also challenged the idea that language is entirely hardwired in our brains, highlighting the tension between our more recent linguistic innovations and our more ancient brain structures. Cormac McCarthy, the famed author of some of the most powerful literature ever written, did some fascinating pontificating on this very issue.&#xA;&#xA;In this post, we’ll continue picking away at these tensions, considering implications for AI and LLMs.&#xA;!--more--&#xA;Fuzziness and Precision in Language Development and Use&#xA;&#xA;To start us off, I want to ground our exploration in two concepts we’ve covered previously in “An Ontogenesis Model of Word Learning in a Second Language”:&#xA;&#xA;Fuzziness: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. 
These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”&#xA;&#xA;Optimum: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”&#xA;&#xA;I think these concepts are useful not only for thinking of learning new words in a language, but also for how we interact with LLMs and the language they are trained upon.&#xA;&#xA;From Fuzziness → Optimum &#xA;&#xA;When we first learn a language, whether while in the womb, in school, or after moving to a new community, what we hear and understand is fuzzy. The first thing we attune to is the prosody of the language: its tones, volume, and duration. We can’t yet fully distinguish words and sentences within a stream of speech, nor syllables from phonemes, nor vowels from consonants. Let alone connect those sounds (or signs) to meaning and communicate with them to others.&#xA;&#xA;Yet as we gain greater discernment across hearing, vision, movement, and speaking, our representations of a language becomes more flexible and more precise. As I’ve written about elsewhere, connecting speech directly to its form in writing can enhance language and reading and writing development simultaneously. Oral and written language – and reading and writing – can develop reciprocally. Developing one supports refining the other. &#xA;&#xA;Why would that be, given we didn’t invent the technology of writing until far down the timescale of human evolution?&#xA;&#xA;Precision in Language and Cognition&#xA;&#xA;Maybe it’s because the written form of a language requires greater precision in the representation in our minds. 
When greater precision is required, it takes more time and effort, at least initially, to produce.&#xA;&#xA;As an example, you may have heard of the term “receptive bilinguals.” These are individuals who can understand the gist of an everyday conversation in another language, but may struggle to speak or produce it fluently. This is because they may have had fairly significant exposure to the language, especially in childhood, but their mental representations remain “fuzzy” because they rarely produce the language either orally or in written form.&#xA;&#xA;The more that we hear and read AND produce a word – and particularly when we produce it both orally and in writing – the more likely and quickly we are to reach optimum.&#xA;&#xA;We see this process play out in real time with babies. They listen to our sounds and watch our faces, then begin to babble, mimicking us. They begin connecting those sounds to things and ideas. And then they begin to gain a more precise understanding and use of a word, from there stringing multiple words together into sentences, again starting haphazardly and working towards greater flexibility and precision.&#xA;&#xA;Fuzziness, Precision, and Specialization in Language, Cognition, Computation, and Literacy&#xA;&#xA;LLMs have demonstrated that there is far more knowledge, meaning, and comprehension of the world embedded within the statistical relationships of the words and phrases we use than we previously suspected.  &#xA;&#xA;As we’ve also explored, there are fuzzier and more precise terms and concepts in a language. The more abstract and “decontextualized” an event or idea (meaning that the event or idea is not readily available in the context of that environment or moment) the more precise, vivid, or specialized our language becomes in the effort to describe it. 
This can lead us all the way to the extreme of computational language, which is highly precise, much harder for humans to learn, and quite alien in comparison to the general fuzziness of our everyday language used to communicate about everyday things.&#xA;&#xA;The reason read-alouds are so very powerful in the beginning of childhood (and arguably, through adolescence, perhaps even beyond) is because they provide children with exposure to and immersion in this more decontextualized type of language and more abstract and broad understandings of the world. This helps prepare them for when they later engage with written forms of language and increasingly discipline-specific forms of discourse.&#xA;&#xA;As language learning develops towards greater precision, networks in the brain are forged and strengthened. One of the reasons why early childhood is so incredibly important to language and literacy and motor development is because the brain supercharges the neural connections it is forming in all directions. Dendrites spring up like fungus after a rain. But learning new things requires a bit more effort as we age because we work far more on pruning our existing connections for efficiency. &#xA;&#xA;Yet no matter our age, developing these increasingly robust cross-brain connections, and then increasingly specializing and refining them for specific domains and uses, can increase our mental resilience.&#xA;&#xA;We can see this process of specialization play out in real time with young children as they learn to read and write. As they gain greater precision with representations of language through spelling, writing, and volume of reading, their brains increasingly forge further connections between the architecture used with executive function, speech, vision, and motor control, while then specializing and refining them.&#xA;&#xA;Developing language and literacy in multiple languages – to the point of optimum – even further connects, specializes, and refines those networks. 
And when one is bi- or multi-literate on disciplinary topics – with the specialized and precise language required for communicating flexibly about those topics – then those networks are yet further refined.&#xA;&#xA;This is similar, arguably, as with the development of cognition. Cognition—a fancy way of saying “awareness, knowledge, and understanding”—includes the facets of executive function and memory that are also tapped into when developing language, yet are surprisingly separable from language in the brain, in terms of the processes identified through brain scans, at the same time.&#xA;&#xA;I think a useful way to think of this distinction may be the difference between the unconsciousness or the lack of awareness we may have about something PRIOR to learning it, and the unconsciousness and lack of awareness we have AFTER learning it to optimum. When we have attained fluency with a skill or pushed our knowledge into long-term memory, we no longer need to apply much effort – nor thought – to drawing upon it. It is the degree of effort that is required in order to learn or use something that determines the level of cognition we need to initially draw upon. 
And while we can certainly expand our cognitive ability and other aspects of our learning potential, there are also hard upper limits – such as the bottlenecks of our working memory and our attention.&#xA;&#xA;We overcome those bottlenecks by committing important information to long-term memory through regular use and communication, automatizing regularly used skills through practice, and leveraging the institutionalization of knowledge-based communities and the technologies of writing (texts) and digitization to process and communicate and further refine larger volumes of information.&#xA;&#xA;The Limitations and Potential of LLMs&#xA;&#xA;While human children rapidly develop language and literacy from comparably minimal amounts of input and interaction in their world, LLMs are trained on vast bodies of text, the majority in written form (thus far). Their training is developed to refine and make more precise their abilities to predict the concatenations of continued tokens and words from what we have fed them.&#xA;&#xA;Similar to human brains, LLMs move from a fuzzy-to-precise spectrum as they refine the “weights” they assign to linguistic tokens across their many layers. Early or small models of LLMs, akin to our “receptive bilingual” example earlier, demonstrate some receptive capabilities, but their generated outputs are highly fuzzy, as they did not have sufficient neural layers, training, and feedback (i.e. sufficient input and production) to achieve something close to optimum in their generation of human-like language.&#xA;&#xA;But to state the obvious, LLMs do not experience the world as we do. They have no bodies, no sensory input, no social interactions (unless you count the part of their training that requires humans to provide them with corrective feedback). As a reminder, the fact that they have the capabilities they do–derived merely from the accumulated statistical relationships of parts of words–is remarkable. 
They do not “think,” at least, not in the manner in which our own cognition functions, and they do not continuously build and further refine their knowledge–yet–from ongoing interactions and input from other AI and with us.&#xA;&#xA;LLMs are like if we took away all the other parts of our brain—those more ancient parts that continue solving problems and help us steer our way home and keeps our hearts beating—and only left the parts dedicated to language. That they are able to do all they can from mere statistical relationships forged from language alone is–again–remarkable, but it also shows us their limitations.&#xA;&#xA;To be frank, that the dialogue has been so singularly focused on the “intelligence” of LLMs, with the goal of forming “artificial general intelligence” (AGI) seems remarkably off base to me. What I am far more interested in is the potential of these models to teach us something about our own development of language and literacy–and thus, how we can better teach those abilities–and to extend our own cognitive abilities.&#xA;&#xA;Enhancing Cognition with AI&#xA;&#xA;Towards this end, I want to suggest some implications for education that takes us away from fears about AI making kids dumber or taking away jobs from teachers.&#xA;&#xA;AI and LLMs can enhance our cognitive abilities by helping us to:&#xA;&#xA;Process Large Amounts of Information to Gain Knowledge: AI and LLMs are getting better and better (seemingly every week) in sifting through vast amounts of information, such as databases, research, transcripts, and other documents, to help us summarize, answer questions, paraphrase, and understand the relevant knowledge contained in them. Furthermore, they are getting better and better at translating across multiple languages and in reading multiple modalities. 
You can feed an LLM an image with text in another language and it can read it.&#xA;&#xA;Augment Our Own Thinking and Writing: LLMs work really well in helping us spitball ideas or redraft our own writing. The fear that they will stop kids from being taught to write is misplaced – the writing produced by LLMs is only as good as what they are given. Yes, they are great at boilerplate forms of writing! But that’s the exact kind of writing that we do want to automate and reduce our own time and thinking on. When it comes to deeper writing and thinking like this series and post, it ain’t writing it for me. But I do find it really helpful when I get stuck or when I want to get suggestions for revision.&#xA;In Sum&#xA;&#xA;The effectiveness of our use of AI and LLMs hinges on the quality of our input.&#xA;&#xA;As with previous tools like Google Search, the more precise and informed our prompts, the more powerful and accurate their responses.&#xA;&#xA;Another way of framing this idea: LLMs can help us further widen or refine our own ideas and language. They are far less useful in just handing them to us. They mirror and leverage what we provide to them.&#xA;&#xA;There is a lot of talk about the “hallucinations” of LLMs, but perhaps a better way to frame it is as “pixelation,” or grain size. There are larger and smaller grain sizes of pixels. The coarser the grain, the less clear it is. The finer the grainer, the sharper it becomes. The more vague and broad the grain size we feed them, the more BS they will spit. The more precise and narrow grain sizes we provide, the more accurate and useful their responses will be. They can then help us move into different grain sizes from there (either widen our lens, or narrow our lens).&#xA;&#xA;This means that we need to keep teaching our kids stuff. 
The more knowledge they have, the more precise and flexible their ability to wield language, the better they can use powerful tools like AI.&#xA;&#xA;We can help kids to use AI in this way, and we can create tech-free spaces in our schools where they need to put in the cognitive effort and time they need to build their fluency with language and literacy and read texts that build their knowledge. And then when we engage them with the tech, we teach them how to use it to extend, rather than diminish, their own potential.&#xA;&#xA;There’s implications here for teachers too – in fact, I think the most exciting potential for AI is actually freeing teachers up to spend more time teaching, and less time marking up papers and analyzing data. But that’s for another post.&#xA;&#xA;#AI #LLMs #cognition #language #literacy #learning #education&#xA;a href=&#34;https://remark.as/p/languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision&#34;Discuss.../a&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/c3M1fAo5.jpg" alt="Through the window"/>
In our series on AI, LLMs, and Language so far we’ve explored a few implications of LLMs relating to language and literacy development:</p>

<p>1) LLMs gain their uncanny powers from <a href="https://languageandliteracy.blog/language-and-llms">the statistical nature of language itself</a>;
2) the meaning and experiences of our world are <a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">more deeply entwined with the form and structure</a> of our language than we previously imagined;
3) LLMs offer an opportunity for further <a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">convergence between human and machine language</a>; and
4) LLMs can potentially <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">extend our cognitive abilities</a>, enabling us to process far more information.</p>

<p>In a previous series, “<a href="https://languageandliteracy.blog/innate-vs">Innate vs. Developed</a>,” we’ve also challenged the idea that language is entirely hardwired in our brains, highlighting the tension between our more recent linguistic innovations and our more ancient brain structures. Cormac McCarthy, the famed author of some of the most powerful literature ever written, did some <a href="https://languageandliteracy.blog/thinking-inside-and-outside-of-language">fascinating pontificating</a> on this very issue.</p>

<p>In this post, we’ll continue picking away at these tensions, considering implications for AI and LLMs.
</p>

<h2 id="fuzziness-and-precision-in-language-development-and-use">Fuzziness and Precision in Language Development and Use</h2>

<p>To start us off, I want to ground our exploration in two concepts we’ve covered previously in “<a href="https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language">An Ontogenesis Model of Word Learning in a Second Language</a>”:</p>
<ul><li><p>Fuzziness: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”</p></li>

<li><p>Optimum: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”</p></li></ul>

<p>I think these concepts are useful not only for thinking of learning new words in a language, but also for how we interact with LLMs and the language they are trained upon.</p>

<h3 id="from-fuzziness-optimum">From Fuzziness → Optimum</h3>

<p>When we first learn a language, whether <a href="https://aeon.co/essays/how-fetuses-learn-to-talk-while-theyre-still-in-the-womb">while in the womb</a>, in school, or after moving to a new community, what we hear and understand is <em>fuzzy</em>. The first thing we attune to is the prosody of the language: its tones, volume, and duration. We can’t yet fully distinguish words and sentences within a stream of speech, nor syllables from phonemes, nor vowels from consonants. Let alone connect those sounds (or signs) to meaning and communicate with them to others.</p>

<p>Yet as we gain greater discernment across hearing, vision, movement, and speaking, our representations of a language become more flexible and more precise. As I’ve <a href="https://www.nomanis.com.au/blog/single-post/i-think-i-was-wrong-about-phonemic-awareness">written about elsewhere</a>, connecting speech directly to its form in writing can enhance language and reading and writing development simultaneously. Oral and written language – and reading and writing – can develop reciprocally. Developing one supports refining the other.</p>

<p>Why would that be, given we didn’t invent the technology of writing until far down the timescale of human evolution?</p>

<h4 id="precision-in-language-and-cognition">Precision in Language and Cognition</h4>

<p>Maybe it’s because the written form of a language requires greater precision in the representation in our minds. When greater precision is required, it takes more time and effort, at least initially, to produce.</p>

<p>As an example, you may have heard of the term “receptive bilinguals.” These are individuals who can understand the gist of an everyday conversation in another language, but may struggle to speak or produce it fluently. This is because they may have had fairly significant exposure to the language, especially in childhood, but their mental representations remain “fuzzy” because they rarely produce the language either orally or in written form.</p>

<p>The more that we hear and read AND <strong>produce</strong> a word – and particularly when we produce it both orally and in writing – the more likely and quickly we are to reach <em>optimum</em>.</p>

<p>We see this process play out in real time with babies. They listen to our sounds and watch our faces, then begin to babble, mimicking us. They begin connecting those sounds to things and ideas. And then they begin to gain a more precise understanding and use of a word, from there stringing multiple words together into sentences, again starting haphazardly and working towards greater flexibility and precision.</p>

<h2 id="fuzziness-precision-and-specialization-in-language-cognition-computation-and-literacy">Fuzziness, Precision, and Specialization in Language, Cognition, Computation, and Literacy</h2>

<p>LLMs have demonstrated that there is far more knowledge, meaning, and comprehension of the world embedded within the statistical relationships of the words and phrases we use than we previously suspected.</p>

<p>As we’ve also explored, there are fuzzier and more precise terms and concepts in a language. The more abstract and “decontextualized” an event or idea (meaning that the event or idea is not readily available in the context of that environment or moment) the more <a href="https://write.as/manderson/the-pathway-of-human-language-towards-computational-precision-in-llms">precise, vivid, or specialized our language</a> becomes in the effort to describe it. This can lead us all the way to the extreme of computational language, which is highly precise, much harder for humans to learn, and quite alien in comparison to the general fuzziness of our everyday language used to communicate about everyday things.</p>

<p>The reason read-alouds are so very powerful in the beginning of childhood (and arguably, through adolescence, perhaps even beyond) is because they provide children with exposure to and immersion in this more decontextualized type of language and more abstract and broad understandings of the world. This helps prepare them for when they later engage with written forms of language and increasingly discipline-specific forms of discourse.</p>

<p>As language learning develops towards greater precision, networks in the brain are <a href="https://languageandliteracy.blog/the-inner-scaffold-for-language-and-literacy">forged and strengthened</a>. One of the reasons why early childhood is so incredibly important to language and literacy and motor development is because the brain supercharges the neural connections it is forming in all directions. Dendrites spring up like fungus after a rain. But learning new things requires a bit more effort as we age because we work far more on pruning our existing connections for efficiency.</p>

<p>Yet no matter our age, developing these increasingly robust cross-brain connections, and then increasingly specializing and refining them for specific domains and uses, can increase our mental resilience.</p>

<p>We can see this process of specialization play out in real time with young children as they learn to read and write. As they gain greater precision with representations of language through spelling, writing, and volume of reading, their brains increasingly forge further connections between the architecture used with executive function, speech, vision, and motor control, while then specializing and refining them.</p>

<p>Developing language and literacy in <a href="https://languageandliteracy.blog/accelerating-the-inner-scaffold-across-modalities-and-languages">multiple languages</a> – to the point of optimum – even further connects, specializes, and refines those networks. And when one is bi- or multi-literate on disciplinary topics – with the specialized and precise language required for communicating flexibly about those topics – then those networks are yet further refined.</p>

<p>A similar process arguably plays out in the development of cognition. Cognition—a fancy way of saying “awareness, knowledge, and understanding”—includes the facets of executive function and memory that are also tapped when developing language, yet brain scans suggest these processes are surprisingly <a href="https://write.as/manderson/language-and-cognition">separable from language in the brain</a>.</p>

<p>I think a useful way to think of this distinction is the difference between the lack of awareness we have about something PRIOR to learning it, and the lack of awareness we have AFTER learning it to optimum. When we have attained fluency with a skill or pushed knowledge into long-term memory, we no longer need to apply much effort – nor thought – to drawing upon it. It is the degree of effort required to learn or use something that determines the level of cognition we must initially draw upon. And while we can certainly expand our cognitive ability and other aspects of our learning potential, there are also hard upper limits – such as the bottlenecks of our working memory and our attention.</p>

<p>We overcome those bottlenecks by committing important information to long-term memory through regular use and communication, by automatizing regularly used skills through practice, and by leveraging the institutionalization of knowledge-based communities and the technologies of writing (texts) and digitization to process, communicate, and further refine larger volumes of information.</p>

<h2 id="the-limitations-and-potential-of-llms">The Limitations and Potential of LLMs</h2>

<p>While human children rapidly develop language and literacy from comparatively minimal amounts of input and interaction in their world, LLMs are trained on vast bodies of text, the majority (thus far) in written form. Their training refines and makes more precise their ability to predict which tokens and words should come next, based on the sequences we have fed them.</p>

<p>Similar to human brains, LLMs move along a fuzzy-to-precise spectrum as they refine the “weights” they assign to linguistic tokens across their many layers. Early or small LLMs, akin to our “receptive bilingual” example earlier, demonstrate some receptive capabilities, but their generated outputs are highly fuzzy: they lack sufficient neural layers, training, and feedback (i.e. sufficient input and production) to achieve something close to optimum in their generation of human-like language.</p>
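<p>To make the fuzzy-to-precise idea concrete, here is a toy sketch (in Python) of the statistical core of next-word prediction. A simple bigram counter is vastly simpler than a transformer’s layered weights, but it illustrates the same principle: the model’s predictions are only as sharp as the statistical regularities it has absorbed from its training text.</p>

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it -- a toy stand-in
    for the statistical 'weights' an LLM refines during training."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1
    return counts

def predict_next(counts, word):
    """Return the distribution over likely next words as probabilities."""
    following = counts[word]
    total = sum(following.values())
    return {w: n / total for w, n in following.items()}

model = train_bigram("the cat sat on the mat the cat ate the fish")
print(predict_next(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

<p>Real LLMs, of course, learn these distributions over subword tokens across billions of parameters, conditioned on long contexts rather than a single preceding word.</p>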

<p>But to state the obvious, LLMs do not experience the world as we do. They have no bodies, no sensory input, no social interactions (unless you count the part of their training that requires humans to provide them with corrective feedback). As a reminder, the fact that they have the capabilities they do–derived merely from the accumulated statistical relationships of parts of words–is remarkable. They do not “think,” at least, not in the manner in which our own cognition functions, and they do not continuously build and further refine their knowledge–yet–from ongoing interactions and input from other AI and with us.</p>

<p>LLMs are what we might get if we took away all the other parts of our brain—those more ancient parts that continue solving problems, help us steer our way home, and keep our hearts beating—and left only the parts dedicated to language. That they are able to do all they can from mere statistical relationships forged from language alone is–again–remarkable, but it also shows us their limitations.</p>

<p>To be frank, that the dialogue has been so singularly focused on the “intelligence” of LLMs, with the goal of forming “artificial general intelligence” (AGI), seems remarkably off base to me. What I am far more interested in is the potential of these models to teach us something about our own development of language and literacy–and thus, how we can better teach those abilities–and to extend our own cognitive abilities.</p>

<h3 id="enhancing-cognition-with-ai">Enhancing Cognition with AI</h3>

<p>Towards this end, I want to suggest some implications for education that take us away from fears about AI making kids dumber or taking jobs away from teachers.</p>

<p>AI and LLMs can enhance our cognitive abilities by helping us to:</p>
<ul><li><p><em>Process Large Amounts of Information to Gain Knowledge</em>: AI and LLMs are getting better and better (seemingly every week) at sifting through vast amounts of information, such as databases, research, transcripts, and other documents, to help us summarize, answer questions, paraphrase, and understand the relevant knowledge contained in them. Furthermore, they are getting better at translating across multiple languages and at reading across multiple modalities. You can feed an LLM an image containing text in another language, and it can read it.</p></li>

<li><p><em>Augment Our Own Thinking and Writing</em>: LLMs work really well at helping us spitball ideas or redraft our own writing. The fear that they will stop kids from being taught to write is misplaced – the writing produced by LLMs is only as good as what they are given. Yes, they are great at boilerplate forms of writing! But that’s exactly the kind of writing we want to automate, reducing our own time and thinking spent on it. When it comes to deeper writing and thinking, like this series and post, it ain’t writing it for me. But I do find it really helpful when I get stuck or when I want suggestions for revision.</p></li></ul>

<h4 id="in-sum">In Sum</h4>

<p>The effectiveness of our use of AI and LLMs hinges on the quality of our input.</p>

<p>As with previous tools like Google Search, the more precise and informed our prompts, the more powerful and accurate their responses.</p>

<p>Another way of framing this idea: LLMs can help us further widen or refine our own ideas and language. They are far less useful at simply handing us ideas and language outright. They mirror and leverage what we provide to them.</p>

<p>There is a lot of talk about the “hallucinations” of LLMs, but perhaps a better way to frame it is as “pixelation,” or grain size. There are larger and smaller grain sizes of pixels. The coarser the grain, the less clear the image; the finer the grain, the sharper it becomes. The more vague and broad the grain size we feed them, the more BS they will spit. The more precise and narrow the grain sizes we provide, the more accurate and useful their responses will be. They can then help us move into different grain sizes from there (either widening or narrowing our lens).</p>

<p>This means that we need to keep teaching our kids stuff. The more knowledge they have, and the more precise and flexible their ability to wield language, the better they can use powerful tools like AI.</p>

<p>We can help kids to use AI in this way, and we can create tech-free spaces in our schools where they put in the cognitive effort and time needed to build their fluency with language and literacy and read texts that build their knowledge. And then when we engage them with the tech, we teach them how to use it to extend, rather than diminish, their own potential.</p>

<p>There are implications here for teachers too – in fact, I think the most exciting potential of AI is actually freeing teachers up to spend more time teaching and less time marking up papers and analyzing data. But that’s for another post.</p>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:education" class="hashtag"><span>#</span><span class="p-category">education</span></a>
<a href="https://remark.as/p/languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">Discuss...</a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision</guid>
      <pubDate>Sun, 28 Jul 2024 14:00:33 +0000</pubDate>
    </item>
    <item>
      <title>The Pathway of Human Language Towards Computational Precision in LLMs</title>
      <link>https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[Natural digital&#xA;&#xA;Regularity and irregularity. Decodable and tricky words. Learnability and surprisal. Predictability and randomness. Low entropy and high entropy.&#xA;&#xA;Why do such tensions exist in human language? And in our AI tools developed to both create code and use natural language, how can the precision required for computation co-exist alongside this necessary complexity and messiness of our human language?&#xA;!--more--&#xA;An Algebraic Symphony of Language and Meaning&#xA;&#xA;In our last post, we examined how there is a statistical and algebraic nature to language that drives the power of LLMs, and that the form and meaning of a language may be much more intertwined than we assume, given just how much meaning LLMs are able to approximate via computation of statistical arrays alone.&#xA;&#xA;This interlacement of form and meaning is in relation to where and how words show up repeatedly in sentences and texts, not simply in the form of words themselves. Because all languages contain words that have the same form but different meanings. Some words that share the same form have entirely unrelated meanings (homophony), while other words with the same form have closely related meanings (polysemy). Yet LLMs are able to use them in a more or less “natural” manner due to the high dimensional mappings of word parts in statistical relation to one another – such that word analogies can be calculated mathematically:&#xA;&#xA;  “For example, Google researchers took the vector for biggest, subtracted big, and added small. The word closest to the resulting vector was smallest.”&#xA;  –Large language models, explained with a minimum of math and jargon, Timothy Lee &amp; Sean Trott&#xA;&#xA;That the algebraic and statistical relationship of words in natural language can drive computational models&#39; generative capabilities suggests that language itself may reflect the limitations and potential of AI. 
And the thing with natural, human language is that while it is endlessly generative, it also tends to be imprecise. For our human usage, gestures and the context of our social interaction, who and when we are speaking to, plays a big role. As long as we get our main message across, we’re good.&#xA;&#xA;Human language is fundamentally communicative and social, and there’s feelings involved.&#xA;&#xA;The Imprecision of Human Expression&#xA;&#xA;Imagine yourself in a bustling restaurant in an international airport, surrounded by people from diverse linguistic backgrounds. You&#39;re trying to communicate with a traveller whose language you don&#39;t speak. What do you do?&#xA;&#xA;You resort to body language. You gesture hyperbolically and make exaggerated facial expressions. You point to objects, mime actions, and mouth simple words you hope the other person might use as a basis for basic understanding.&#xA;&#xA;Depth, nuance, and complexity are not possible (beyond each individual’s imagination) in this most elemental of interactions.&#xA;&#xA;So what is required for depth, nuance, and complexity?&#xA;&#xA;A shared language, whether spoken, written, or signed. In which a small set of sounds, letters, or signs are concatenated in a wide assortment of ways, both commonplace and surprising, to convey a wide assortment of ideas and feelings.&#xA;&#xA;Yet a shared language, while providing a platform for greater depth, may still remain imprecise. What is meant to be conveyed is not always exactly what is understood.&#xA;&#xA;There are furthermore gradations of precision in language, beginning with the ephemeral and contextual nature of spoken and signed language, moving into the more ossified form of written language, in which spelling must be exact and word selection must be more intentional. 
There is also a movement from the language we use with our family, with frequent, commonly used words, to the language we use when writing an academic paper, with domain-specific, rarer words. In education, we often refer to this type of language as Tier 2 or 3 vocabulary.&#xA;&#xA;Tiered Vocabulary&#xA;&#xA;If a person is equipped with more of that academic, domain-specific language, then greater precision in communication can be achieved. Yet the challenge of whether the listener hears and interprets what is intended remains. For example, in this article in Scientific American, “People Have Very Different Understandings of Even the Simplest Words”, they discuss how the more abstract a word, the more it can be tied to an emotional valence and someone’s identity and experiences, rather than a precise meaning.&#xA;&#xA;The Computational Imperative&#xA;&#xA;But in some ways, this inherent fuzziness of our language may be a feature, rather than a bug. It gives us a complex adaptive system for navigating, creating, and communicating in a world of complex adaptive systems.&#xA;&#xA;For computers and computations, however, exactness and precision in language is required – either a line of code input runs the correct function as an output or it doesn’t. So it’s quite interesting that one of the most immediately powerful use cases so far of LLMs seems to be as a natural language interface to develop and review code.&#xA;&#xA;Stephen Wolfram, in a long and interesting explainer on how LLMs work, [“What Is ChatGPT Doing … and Why Does It Work?”](&#xA;https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/), explores some of this tension between computational and natural language. &#xA;&#xA;  “Human language is fundamentally imprecise, not least because it isn’t “tethered” to a specific computational implementation, and its meaning is basically defined just by a “social contract” between its users. 
But computational language, by its nature, has a certain fundamental precision—because in the end what it specifies can always be “unambiguously executed on a computer”. Human language can usually get away with a certain vagueness. (When we say “planet” does it include exoplanets or not, etc.?) But in computational language we have to be precise and clear about all the distinctions we’re making.”&#xA;&#xA;Computational Irreducibility and the Limits of Predictability and Learning&#xA;&#xA;One of the limitations Wolfram raises between human and computational language is what he terms “computational irreducibility,” a term he uses to describe the difficulty in making accurate predictions for a highly complex system, such as for weather or climate systems. For such systems, it would require performing step-by-step computation based on an initial state, and thus can’t be swiftly calculated by compressing data.&#xA;&#xA;In some ways, this “compression” of information is what we are doing with language as we use more “Tier 2” and “Tier 3” – or academic – words in our speech or writing. There is a greater density of information provided in academic speech and writing, in which more abstract words are used to convey complex concepts, and our sentences tend to become more compound and complex. The simpler, more frequent words, phrases, and sentences we use in our everyday speech are more regular and thus, more learnable.&#xA;&#xA;  . . . there’s just a fundamental tension between learnability and computational irreducibility. Learning involves in effect compressing data by leveraging regularities. But computational irreducibility implies that ultimately there’s a limit to what regularities there may be.&#xA;&#xA;  . . . there’s an ultimate tradeoff between capability and trainability: the more you want a system to make “true use” of its computational capabilities, the more it’s going to show computational irreducibility, and the less it’s going to be trainable. 
And the more it’s fundamentally trainable, the less it’s going to be able to do sophisticated computation.&#xA;&#xA;Irregularity and Regularity in Language&#xA;&#xA;What’s interesting to note here is that all languages have constructive tensions between regularity and irregularity. This tension may be a process of language being honed over time to be more learnable within our cognitive constraints. We’ve explored some of this before in our post, Irregularity Enhances Learning (Maybe), in which we examined a paper by Michael Ramscar that suggested there is some level of tension between language forms that show up again and again, and the language forms that are more infrequent, but thus inherently gain more of our attention. This relates to the theory of “statistical learning” with which we not only learn language, but also when we map a language to its written form.&#xA;&#xA;For Wolfram, that LLMs are as powerful as they are suggests that human language is actually more statistically regular than we may have thought: &#xA;&#xA;  “my strong suspicion is that the success of ChatGPT implicitly reveals an important “scientific” fact: that there’s actually a lot more structure and simplicity to meaningful human language than we ever knew—and that in the end there may be even fairly simple rules that describe how such language can be put together.”&#xA;&#xA;  And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.&#xA;&#xA;  In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. 
And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.&#xA;&#xA;And so thus far the unrealized potential, for Wolfram, is that with a greater underlying capability in AI for computational language, it may be able to bridge our more “computationally shallow” human language with the precision required for more complex computations:&#xA;&#xA;  ”its very success gives us a reason to think that it’s going to be feasible to construct something more complete in computational language form. And, unlike what we’ve so far figured out about the innards of ChatGPT, we can expect to design the computational language so that it’s readily understandable to humans.”&#xA;&#xA;Decontextualized Language: The Pathway to Precision&#xA;&#xA;On this pathway towards integration of human language and computational language, it’s interesting to consider how in our own language development,  we are able to better “compress information” and develop greater precision in our thinking and communication as we learn and incorporate rarer and more abstract language into our own. We’ve spoken before about “decontextualized language” – the language that takes us beyond the immediate context and moment, and how such language can take us beyond our own delimited feelings and experiences, and into a realm of interpersonal and cultural thought, knowledge, and perspectives. This is the language of storybooks, of science, and – at it’s greatest extreme – of code. We begin teaching this form of language when we engage in storytelling with our children and reading with them and talking to them about books. It becomes increasingly dense and complex as we move into disciplinary study.&#xA;&#xA;There is some evidence that training LLMs on this specific form of language is more powerful – such as this study training a “tiny LLM” on children’s stories. 
And if you think about what LLMs have been getting trained on thus far – it’s a corpus of written language, not training on conversations using everyday language. As we’ve explored in depth on this blog, written language is not synonymous with oral language – by nature of it being written, it is already more “decontextualized,” and requires more inference and perspective-taking. That LLMs are trained on such a corpus may be, in fact, why their algebraic and statistical magic can be so surprisingly powerful. There is a greater density of information in the written forms of our languages.&#xA;&#xA;Implications for Teaching and Learning&#xA;&#xA;What might all of this say about teaching and learning? Well, so far, one of the facets we’ve highlighted from LLMs is that the statistical nature of language alone can take us pretty far, which suggests that alongside of social interaction and peer engagement and communication, we want to increase the volume of that language exposure and use. And in terms of the nature of the language we want to increase: the more that the form of that language combines precision with abstraction, the greater computational power it can provide. Turning up the dial on decontextualized language use and exposure – in other words – providing our children with “textual feasts,” to use Alfred Tatum’s term, may be the key to enhanced learning.&#xA;&#xA;Sources for Further Exploration&#xA;&#xA;If you are interested in further exploring some of the tensions we began this post with – between regularity and irregularity in language, here’s some further interesting reads to geek out on:&#xA;&#xA;“Source codes in human communication” by Michael Ramscar&#xA;“Expectation-based syntactic comprehension” by Roger Levy&#xA;“Cognitive approaches to uniformity and variability in morphology” by Petar Milin, Neil Bermel, and James Blevins&#xA;&#xA;#language #computation #algorithms #learning #LLMs #cognition]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/PmjZi6Hw.png" alt="Natural digital"/></p>

<p>Regularity and irregularity. Decodable and tricky words. Learnability and <a href="https://www.sciencedirect.com/science/article/abs/pii/S0010027707001436?via%3Dihub">surprisal</a>. Predictability and randomness. Low entropy and high entropy.</p>
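<p>These paired tensions can be quantified. In information-theoretic terms, surprisal measures how unexpected a single word is, and entropy is the average surprisal of a distribution – low for predictable (regular) contexts, high for unpredictable ones. A minimal sketch in Python, with invented toy distributions:</p>

```python
import math

def surprisal(p):
    """Surprisal in bits: rare (low-probability) events carry more information."""
    return -math.log2(p)

def entropy(dist):
    """Shannon entropy: the average surprisal of a distribution."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# A predictable (low-entropy) vs. an unpredictable (high-entropy)
# next-word distribution:
predictable = {"dog": 0.9, "cat": 0.05, "fish": 0.05}
unpredictable = {"dog": 0.25, "cat": 0.25, "fish": 0.25, "yak": 0.25}
print(entropy(predictable))    # about 0.57 bits
print(entropy(unpredictable))  # 2.0 bits
```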

<p>Why do such tensions exist in human language? And in our AI tools developed to both create code and use natural language, how can the precision required for computation co-exist alongside this necessary complexity and messiness of our human language?
</p>

<h2 id="an-algebraic-symphony-of-language-and-meaning">An Algebraic Symphony of Language and Meaning</h2>

<p><a href="https://write.as/manderson/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">In our last post</a>, we examined how there is a statistical and algebraic nature to language that drives the power of LLMs, and that the form and meaning of a language may be much more intertwined than we assume, given just how much meaning LLMs are able to approximate via computation of statistical arrays alone.</p>

<p>This interlacement of form and meaning is in relation to where and how words show up repeatedly in sentences and texts, not simply in the form of words themselves, because all languages contain words that have the same form but different meanings. Some words that share the same form have entirely unrelated meanings (homophony), while other words with the same form have closely related meanings (polysemy). Yet LLMs are able to use them in a more or less “natural” manner due to the high-dimensional mappings of word parts in statistical relation to one another – such that word analogies can be calculated mathematically:</p>

<blockquote><p>“For example, Google researchers took the vector for <strong>biggest</strong>, subtracted <strong>big</strong>, and added <strong>small</strong>. The word closest to the resulting vector was <strong>smallest</strong>.”
–<a href="https://www.understandingai.org/p/large-language-models-explained-with">Large language models, explained with a minimum of math and jargon</a>, Timothy Lee &amp; Sean Trott</p></blockquote>
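<p>The analogy arithmetic the quote describes can be sketched with a few toy vectors. The numbers below are invented purely for illustration – real embeddings are learned from data and have hundreds of dimensions – but the mechanics are the same: subtract, add, then find the nearest word vector.</p>

```python
import math

# Hypothetical toy embeddings: invented 3-dimensional coordinates
# chosen to make the analogy arithmetic visible.
vecs = {
    "big":      (1.0, 1.0, 0.0),
    "biggest":  (1.0, 2.0, 0.0),
    "small":    (-1.0, 1.0, 0.0),
    "smallest": (-1.0, 2.0, 0.0),
    "cat":      (0.0, 0.0, 3.0),
}

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))

def nearest(v, vocab):
    """The vocabulary word whose vector is closest (Euclidean) to v."""
    return min(vocab, key=lambda w: math.dist(vocab[w], v))

# biggest - big + small ...
result = add(sub(vecs["biggest"], vecs["big"]), vecs["small"])
print(nearest(result, vecs))  # smallest
```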

<p>That the algebraic and statistical relationship of words in natural language can drive computational models&#39; generative capabilities suggests that language itself may reflect the limitations and potential of AI. And the thing with natural, human language is that while it is endlessly generative, it also tends to be imprecise. For our human usage, gestures and the context of our social interaction – who we are speaking to, and when – play a big role. As long as we get our main message across, we’re good.</p>

<p>Human language is fundamentally communicative and social, and there’s feelings involved.</p>

<h2 id="the-imprecision-of-human-expression">The Imprecision of Human Expression</h2>

<p>Imagine yourself in a bustling restaurant in an international airport, surrounded by people from diverse linguistic backgrounds. You&#39;re trying to communicate with a traveller whose language you don&#39;t speak. What do you do?</p>

<p>You resort to body language. You gesture hyperbolically and make exaggerated facial expressions. You point to objects, mime actions, and mouth simple words you hope the other person might use as a basis for basic understanding.</p>

<p>Depth, nuance, and complexity are not possible (beyond each individual’s imagination) in this most elemental of interactions.</p>

<p>So what <em>is</em> required for depth, nuance, and complexity?</p>

<p>A shared language, whether spoken, written, or signed. In which a small set of sounds, letters, or signs are concatenated in a wide assortment of ways, both commonplace and surprising, to convey a wide assortment of ideas and feelings.</p>

<p>Yet a shared language, while providing a platform for greater depth, may still remain imprecise. What is meant to be conveyed is not always exactly what is understood.</p>

<p>There are furthermore gradations of precision in language, beginning with the ephemeral and contextual nature of spoken and signed language, moving into the more ossified form of written language, in which spelling must be exact and word selection must be more intentional. There is also a movement from the language we use with our family, with frequent, commonly used words, to the language we use when writing an academic paper, with domain-specific, rarer words. In education, we often refer to this type of language as Tier 2 or 3 vocabulary.</p>

<p><img src="https://i.snap.as/I36of2l3.png" alt="Tiered Vocabulary"/></p>

<p>If a person is equipped with more of that academic, domain-specific language, then greater precision in communication can be achieved. Yet the challenge of whether the listener hears and interprets what is intended remains. For example, this article in Scientific American, <a href="https://www.scientificamerican.com/article/people-have-very-different-understandings-of-even-the-simplest-words/">“People Have Very Different Understandings of Even the Simplest Words”</a>, discusses how the more abstract a word, the more it tends to be tied to an emotional valence and to someone’s identity and experiences, rather than to a precise meaning.</p>

<h2 id="the-computational-imperative">The Computational Imperative</h2>

<p>But in some ways, this inherent fuzziness of our language may be a feature, rather than a bug. It gives us a complex adaptive system for navigating, creating, and communicating in a world of complex adaptive systems.</p>

<p>For computers and computations, however, exactness and precision in language is required – either a line of code input runs the correct function as an output or it doesn’t. So it’s quite interesting that one of the most immediately powerful use cases so far of LLMs seems to be as a natural language interface to develop and review code.</p>

<p>Stephen Wolfram, in a long and interesting explainer on how LLMs work, <a href="https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/">“What Is ChatGPT Doing … and Why Does It Work?”</a>, explores some of this tension between computational and natural language.</p>

<blockquote><p>“Human language is fundamentally imprecise, not least because it isn’t “tethered” to a specific computational implementation, and its meaning is basically defined just by a “social contract” between its users. But computational language, by its nature, has a certain fundamental precision—because in the end what it specifies can always be “unambiguously executed on a computer”. Human language can usually get away with a certain vagueness. (When we say “planet” does it include exoplanets or not, etc.?) But in computational language we have to be precise and clear about all the distinctions we’re making.”</p></blockquote>

<h2 id="computational-irreducibility-and-the-limits-of-predictability-and-learning">Computational Irreducibility and the Limits of Predictability and Learning</h2>

<p>One of the limits Wolfram raises for both human and computational language is what he terms “computational irreducibility”: the difficulty of making accurate predictions for a highly complex system, such as a weather or climate system. Predicting such a system requires performing the step-by-step computation from its initial state; the result can’t be swiftly calculated by compressing the data.</p>

<p>In some ways, this “compression” of information is what we are doing with language as we use more “Tier 2” and “Tier 3” – or academic – words in our speech or writing. There is a greater density of information provided in academic speech and writing, in which more abstract words are used to convey complex concepts, and our sentences tend to become more compound and complex. The simpler, more frequent words, phrases, and sentences we use in our everyday speech are more regular and thus, more learnable.</p>

<blockquote><p>. . . there’s just a fundamental tension between learnability and computational irreducibility. Learning involves in effect compressing data by leveraging regularities. But computational irreducibility implies that ultimately there’s a limit to what regularities there may be.</p>

<p>. . . there’s an ultimate tradeoff between capability and trainability: the more you want a system to make “true use” of its computational capabilities, the more it’s going to show computational irreducibility, and the less it’s going to be trainable. And the more it’s fundamentally trainable, the less it’s going to be able to do sophisticated computation.</p></blockquote>
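<p>Compression offers a rough, hands-on proxy for the learnability Wolfram describes: text with more statistical regularity compresses further, while text approaching randomness barely compresses at all. A small sketch using Python’s zlib, comparing repetitive everyday phrasing with random characters:</p>

```python
import random
import string
import zlib

def compression_ratio(text):
    """How much zlib shrinks the text -- a rough proxy for how much
    statistical regularity (learnable structure) it contains."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

regular = "the cat sat on the mat. " * 40  # highly regular phrasing
random.seed(0)
noise = "".join(
    random.choice(string.ascii_lowercase + " ") for _ in range(len(regular))
)

# Regular language compresses far more than near-random text:
print(compression_ratio(regular) < compression_ratio(noise))  # True
```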

<h2 id="irregularity-and-regularity-in-language">Irregularity and Regularity in Language</h2>

<p>What’s interesting to note here is that all languages have constructive tensions between regularity and irregularity. This tension may be a product of language being honed over time to be more learnable within our cognitive constraints. We’ve explored some of this before in our post, <a href="https://languageandliteracy.blog/irregularity-enhances-learning-maybe">Irregularity Enhances Learning (Maybe)</a>, in which we examined a paper by Michael Ramscar suggesting there is some level of tension between language forms that show up again and again, and the language forms that are more infrequent but thus inherently gain more of our attention. This relates to the theory of “statistical learning,” through which we not only learn language but also map a language to its written form.</p>

<p>For Wolfram, that LLMs are as powerful as they are suggests that human language is actually more statistically regular than we may have thought:</p>

<blockquote><p>“my strong suspicion is that the success of ChatGPT implicitly reveals an important “scientific” fact: that there’s actually a lot more structure and simplicity to meaningful human language than we ever knew—and that in the end there may be even fairly simple rules that describe how such language can be put together.”</p>

<p>And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.</p>

<p>In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.</p></blockquote>

<p>For Wolfram, the as-yet unrealized potential is that AI, with a greater underlying capability for computational language, may be able to bridge our more “computationally shallow” human language with the precision required for more complex computations:</p>

<blockquote><p>“its very success gives us a reason to think that it’s going to be feasible to construct something more complete in computational language form. And, unlike what we’ve so far figured out about the innards of ChatGPT, we can expect to design the computational language so that it’s readily understandable to humans.”</p></blockquote>

<h2 id="decontextualized-language-the-pathway-to-precision">Decontextualized Language: The Pathway to Precision</h2>

<p>On this pathway towards integration of human language and computational language, it’s interesting to consider how in our own language development, we are able to better “compress information” and develop greater precision in our thinking and communication as we learn and incorporate rarer and more abstract language into our own. We’ve spoken before about <a href="https://write.as/manderson/research-highlight-2-the-language-teachers-use-influences-the-language">“decontextualized language”</a> – the language that takes us beyond the immediate context and moment, and how such language can take us beyond our own delimited feelings and experiences, and into a realm of interpersonal and cultural thought, knowledge, and perspectives. This is the language of storybooks, of science, and – at its greatest extreme – of code. We begin teaching this form of language when we engage in storytelling with our children, read with them, and talk with them about books. It becomes increasingly dense and complex as we move into disciplinary study.</p>

<p>There is some evidence that training LLMs on this specific form of language is more powerful – <a href="https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/">such as this study</a> training a “tiny LLM” on children’s stories. And if you think about what LLMs have been getting trained on thus far – it’s a corpus of written language, not transcripts of everyday conversation. As we’ve explored in depth on this blog, written language is not synonymous with oral language – by nature of being written, it is already more “decontextualized,” and requires more inference and perspective-taking. That LLMs are trained on such a corpus may be, in fact, why their algebraic and statistical magic can be so surprisingly powerful. There is a greater density of information in the written forms of our languages.</p>

<h2 id="implications-for-teaching-and-learning">Implications for Teaching and Learning</h2>

<p>What might all of this say about teaching and learning? Well, so far, one of the facets we’ve highlighted from LLMs is that the statistical nature of language alone can take us pretty far, which suggests that, alongside social interaction, peer engagement, and communication, we want to increase the volume of language exposure and use. And in terms of the nature of the language we want to increase: the more the form of that language combines precision with abstraction, the greater computational power it can provide. Turning up the dial on decontextualized language use and exposure – in other words, providing our children with <a href="https://languageandliteracy.blog/provide-our-students-with-textual-feasts">“textual feasts,”</a> to use Alfred Tatum’s term – may be the key to enhanced learning.</p>

<h3 id="sources-for-further-exploration">Sources for Further Exploration</h3>

<p>If you are interested in further exploring some of the tensions we began this post with – between regularity and irregularity in language – here are some further interesting reads to geek out on:</p>
<ul><li><a href="https://arxiv.org/pdf/1904.03991">“Source codes in human communication”</a> by Michael Ramscar</li>
<li><a href="https://www.sciencedirect.com/science/article/abs/pii/S0010027707001436?via%3Dihub">“Expectation-based syntactic comprehension”</a> by Roger Levy</li>
<li><a href="https://www.degruyter.com/document/doi/10.1515/cog-2024-0027/html">“Cognitive approaches to uniformity and variability in morphology”</a> by Petar Milin, Neil Bermel, and James Blevins</li></ul>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:computation" class="hashtag"><span>#</span><span class="p-category">computation</span></a> <a href="https://languageandliteracy.blog/tag:algorithms" class="hashtag"><span>#</span><span class="p-category">algorithms</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms</guid>
      <pubDate>Sun, 19 May 2024 15:05:11 +0000</pubDate>
    </item>
    <item>
      <title>The Algebra of Language: Unveiling the Statistical Tapestry of Form and Meaning</title>
      <link>https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[A statistical tapestry&#xA;&#xA;  &#34;. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – that meaning can emerge from pure form – is undoubtedly one of the most stimulating ideas of our time.&#34;&#xA;&#xA;  —The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory&#xA;&#xA;In our last post, we began exploring what Large Language Models (LLMs) and their uncanny abilities might tell us about language itself. I posited that the power of LLMs stems from the statistical nature of language.&#xA;&#xA;But what is that statistical nature of language?&#xA;!--more--&#xA;A couple of years ago, I happened to listen to a podcast conversation between physicist Sean Carroll and mathematician Tai-Danae Bradley that touched on this topic that I found quite fascinating. So it came back to my mind as I was pondering all of this. In the conversation, Bradley describes the algebraic nature of language due to the concatenation of words. She notes that the statistics and probabilities of word co-occurrences can serve as a proxy for grammar rules in modeling language, which is why LLMs can generate coherent text without any explicit grammar rules.&#xA;&#xA;She also shares a theory called the Yoneda Lemma: &#xA;&#xA;  “The Yoneda Lemma says if you want to understand an object, a mathematical object, like a group or a space, or a set, the Yoneda Lemma says that all of the information about that object is contained in the totality of relationships that object has with all other objects in its environment.”&#xA;&#xA;She then links that mathematical concept to linguistics:&#xA;&#xA;  “. . . there’s a linguist, John Firth, I think in a 1957 paper, he says, “You shall know a word by the company it keeps. . . So what’s the meaning of fire truck? 
Well, it’s kind of like all of the contexts in which the word fire truck appears in the English language. . . everything I need to know about this word, the meaning of the word fire truck, is contained in the network of ways that word fits into the language.”&#xA;&#xA;Since this interview, frontier LLMs have demonstrated that there is quite a bit of meaning that can be derived from the context and co-occurrences in which words show up in a body of language.&#xA;&#xA;In a more recent paper, Bradley and co-authors Gastaldi and Terilla make the statement that I began this post with, which I will re-post here again, as it’s worth pondering:&#xA;&#xA;  &#34;. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – that meaning can emerge from pure form – is undoubtedly one of the most stimulating ideas of our time.&#34; [bold added]&#xA;&#xA;They go on to further state:&#xA; &#xA;  &#34;Therefore, the surprising properties exhibited by embeddings are less the consequence of some magical attribute of neural models than the algebraic structure underlying linguistic data found in corpora of text.&#34;&#xA;&#xA;In other words: LLMs (a type of artificial neural network) derive their generative linguistic capabilities from the algebraic and statistical properties of the texts they are trained upon. 
And the fact that they can do so suggests that the form and structure of language is intimately intertwined with its meaning.&#xA;&#xA;In previous post, I referred to a Bloom and Lahey model from 1978, which delineates three components of language, form, meaning, and use:&#xA;&#xA;Bloom and Lahey&#39;s model of language&#xA;&#xA;Over the past few decades of linguistic research and language teaching, there may have been trends in a focus on one of those components over the other -- in the past teachers of English as a second language, for example, may have put a stronger emphasis on the teaching of grammar, while more recent TESOL teachers may put a stronger focus on meaning over form. A more current strand of linguistics research focuses on &#34;usage-based&#34; theories.&#xA;&#xA;There is some parallel in the edu sphere related to reading, in that there have been varying emphases in the research and practice on code-based (form) vs. meaning-based skills (i.e. the Simple View of Reading), with a more recent shift back to code-based emphasis, now seemingly defined by a perpetual tug-of-war between the two.&#xA;&#xA;The Simple View of Reading&#xA;&#xA;Rarely made explicit in any of these shifts in focus has been the assumption that form and meaning can be completely disentangled. After all, a writing system is somewhat arbitrary as a pairing of spoken sounds to symbols. This is, according to a 1980 account by Gough and Hillinger, one of the reasons that learning to decode can be so very difficult–because there isn&#39;t meaning in those symbols in-and-of themselves. It is rather the abstraction of what they represent that we need to learn.&#xA;&#xA;Yet what if form and meaning are much more closely interwoven than we may have assumed? 
What if, in fact, a large quantity of meaning can be derived merely from an accumulated volume of statistical associations of words in sentences?&#xA;&#xA;That LLMs have the abilities they do, given that they have not acquired language in the way that humans have (via social and physical interaction in the world) and without cognition, would seem to suggest that that the “mere” form and structure of a language possesses far more information about our world than we would have assumed – and that meaning is deeply and fundamentally interwoven with form.&#xA;&#xA;More to ponder!&#xA;&#xA;Some additional interesting sources on these topics to further explore (thanks to Copilot for the suggestions):&#xA;&#xA;Understanding the Relationship Between Form, Meaning, and Use of Language: This source will provide insights into how language structure aligns with its diverse meanings and uses&#xA;Statistical Language Learning: Mechanisms and Constraints: A deep dive into how humans, including infants, utilize statistical properties of linguistic input to uncover language structure&#xA;Cultural Evolution and the Statistical Structure of Language: An examination of how cultural evolution shapes the statistical properties of language&#xA;The Language and Grammar of Mathematics: To draw parallels between the precision of mathematical language and the structure of natural language&#xA;Algebraic Structures in Natural Language: A look at how algebraic systems have been used to study natural language in various linguistic fields&#xA;&#xA;#AI #language #learning #statistics #mathematics #cognition #machinelearning]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/7w0H6n3W.jpeg" alt="A statistical tapestry"/></p>

<blockquote><p>“. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – <strong>that meaning can emerge from pure form</strong> – is undoubtedly one of the most stimulating ideas of our time.”</p>

<p>—<a href="http://ams.org/journals/notices/202402/rnoti-p174.pdf">The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory</a></p></blockquote>

<p><a href="https://languageandliteracy.blog/language-and-llms">In our last post</a>, we began exploring what Large Language Models (LLMs) and their uncanny abilities might tell us about language itself. I posited that the power of LLMs stems from the statistical nature of language.</p>

<p>But what <em>is</em> that statistical nature of language?</p>

<p>A couple of years ago, I happened to listen to <a href="https://www.preposterousuniverse.com/podcast/2021/11/22/174-tai-danae-bradley-on-algebra-topology-language-and-entropy/">a podcast conversation</a> between physicist Sean Carroll and mathematician Tai-Danae Bradley that touched on this topic, which I found quite fascinating. So it came back to my mind as I was pondering all of this. In the conversation, Bradley describes the algebraic nature of language due to the concatenation of words. She notes that the statistics and probabilities of word co-occurrences can serve as a proxy for grammar rules in modeling language, which is why LLMs can generate coherent text without any explicit grammar rules.</p>

<p>She also shares a result from category theory called the <em>Yoneda Lemma</em>:</p>

<blockquote><p>“The Yoneda Lemma says if you want to understand an object, a mathematical object, like a group or a space, or a set, the Yoneda Lemma says that all of the information about that object is contained in the totality of relationships that object has with all other objects in its environment.”</p></blockquote>

<p>She then links that mathematical concept to linguistics:</p>

<blockquote><p>“. . . there’s a linguist, John Firth, I think in a 1957 paper, he says, “You shall know a word by the company it keeps. . . So what’s the meaning of fire truck? Well, it’s kind of like all of the contexts in which the word fire truck appears in the English language. . . everything I need to know about this word, the meaning of the word fire truck, is contained in the network of ways that word fits into the language.”</p></blockquote>

<p>Since this interview, frontier LLMs have demonstrated that quite a bit of meaning can be derived from the contexts and co-occurrences of words within a body of language.</p>
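<p>Firth’s “company it keeps” idea can be sketched in a few lines of Python. The corpus and words here are invented for illustration, and real models operate at vastly larger scale, but the principle shows: if we represent each word by counts of the words appearing near it, words that keep similar company end up with similar vectors.</p>

```python
from collections import Counter
from math import sqrt

# An invented toy corpus: "cat" and "dog" occur in similar contexts; "economy" does not.
corpus = (
    "the cat chased the ball . the cat ate the food . "
    "the dog chased the ball . the dog ate the food . "
    "the economy grew last year . the economy shrank last year ."
).split()

def context_vector(word, window=2):
    """Count the words appearing within `window` positions of each occurrence of `word`."""
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(n * n for n in u.values()))
    nv = sqrt(sum(n * n for n in v.values()))
    return dot / (nu * nv)

print(cosine(context_vector("cat"), context_vector("dog")))      # similar company: high
print(cosine(context_vector("cat"), context_vector("economy")))  # different company: lower
```

The meaning of “cat,” in this miniature sense, really is contained in the network of ways the word fits into the corpus.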

<p><a href="https://www.ams.org/journals/notices/202402/rnoti-p174.pdf">In a more recent paper</a>, Bradley and co-authors Gastaldi and Terilla make the statement that I began this post with, which I will re-post here again, as it’s worth pondering:</p>

<blockquote><p>“. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – <strong>that meaning can emerge from pure form</strong> – is undoubtedly one of the most stimulating ideas of our time.” [bold added]</p></blockquote>

<p>They go on to further state:</p>

<blockquote><p>“Therefore, the surprising properties exhibited by embeddings are less the consequence of some magical attribute of neural models than the algebraic structure underlying linguistic data found in corpora of text.”</p></blockquote>

<p>In other words: LLMs (a type of artificial neural network) derive their generative linguistic capabilities from the algebraic and statistical properties of the texts they are trained upon. And the fact that they can do so suggests that the form and structure of language is intimately intertwined with its meaning.</p>

<p><a href="https://languageandliteracy.blog/research-highlight-1-the-importance-of-automatization-in-learning-a-new">In a previous post</a>, I referred to a Bloom and Lahey model from 1978, which delineates three components of language: form, meaning, and use:</p>

<p><img src="https://i.snap.as/ZZ4O4Kte.png" alt="Bloom and Lahey&#39;s model of language"/></p>

<p>Over the past few decades of linguistic research and language teaching, there may have been trends favoring one of those components over the others — in the past, teachers of English as a second language, for example, may have put a stronger emphasis on the teaching of grammar, while more recent TESOL teachers may put a stronger focus on meaning over form. A more current strand of linguistics research focuses on <a href="https://languageandliteracy.blog/language-within-and-beyond-the-brain">“usage-based” theories</a>.</p>

<p>There is some parallel in the edu sphere related to reading, in that there have been varying emphases in the research and practice on code-based (form) vs. meaning-based skills (i.e. the Simple View of Reading), with a more recent shift back to code-based emphasis, now seemingly defined by a perpetual tug-of-war between the two.</p>

<p><img src="https://i.snap.as/WCr89E7R.png" alt="The Simple View of Reading"/></p>

<p>Rarely made explicit in any of these shifts in focus has been the assumption that form and meaning can be completely disentangled. After all, a writing system is somewhat arbitrary as a pairing of spoken sounds to symbols. This is, according to <a href="https://write.as/manderson/learning-to-read-an-unnatural-act">a 1980 account by Gough and Hillinger</a>, one of the reasons that learning to decode can be so very difficult – because there isn&#39;t meaning in those symbols in and of themselves. It is rather the abstraction of what they represent that we need to learn.</p>

<p>Yet what if form and meaning are much more closely interwoven than we may have assumed? What if, in fact, a large quantity of meaning can be derived merely from an accumulated volume of statistical associations of words in sentences?</p>

<p>That LLMs have the abilities they do, given that they have not acquired language the way humans have (via social and physical interaction in the world), and that they operate without cognition, would seem to suggest that the “mere” form and structure of a language possesses far more information about our world than we would have assumed – and that meaning is deeply and fundamentally interwoven with form.</p>

<p>More to ponder!</p>

<p>Some additional interesting sources on these topics to further explore (thanks to Copilot for the suggestions):</p>
<ul><li><a href="https://www.theliteracybug.com/meaning-form/">Understanding the Relationship Between Form, Meaning, and Use of Language: This source will provide insights into how language structure aligns with its diverse meanings and uses</a></li>
<li><a href="https://quote.ucsd.edu/cogs101b/files/2013/01/saffrancurrentdir.pdf">Statistical Language Learning: Mechanisms and Constraints: A deep dive into how humans, including infants, utilize statistical properties of linguistic input to uncover language structure</a></li>
<li><a href="https://www.nature.com/articles/s41598-024-56152-9.pdf">Cultural Evolution and the Statistical Structure of Language: An examination of how cultural evolution shapes the statistical properties of language</a></li>
<li><a href="https://www.dpmms.cam.ac.uk/~wtg10/grammar.pdf">The Language and Grammar of Mathematics: To draw parallels between the precision of mathematical language and the structure of natural language</a></li>
<li><a href="https://books.google.com/books/about/Algebraic_Structures_in_Natural_Language.html?id=3yudEAAAQBAJ">Algebraic Structures in Natural Language: A look at how algebraic systems have been used to study natural language in various linguistic fields</a></li></ul>

<p><a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:statistics" class="hashtag"><span>#</span><span class="p-category">statistics</span></a> <a href="https://languageandliteracy.blog/tag:mathematics" class="hashtag"><span>#</span><span class="p-category">mathematics</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:machinelearning" class="hashtag"><span>#</span><span class="p-category">machinelearning</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning</guid>
      <pubDate>Sat, 27 Apr 2024 16:44:50 +0000</pubDate>
    </item>
    <item>
      <title>Language, Cognition, and LLMs</title>
      <link>https://languageandliteracy.blog/language-and-llms?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[“Semantic gradients,” are a tool used by teachers to broaden and deepen students&#39; understanding of related words by plotting them in relation to one another. They often begin with antonyms at each end of the continuum. Here are two basic examples:&#xA;&#xA;Semantic gradient examples&#xA;&#xA;Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Now imagine adding another axis to this graph, so that words are plotted in a three dimensional space in their relationships. Then add another dimension, and another . . . heck, make it tens of thousands more dimensions, relating all the words available in your lexicon across a high dimensional space. . .&#xA;&#xA;. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).&#xA;&#xA;!--more--&#xA;&#xA;LLMs Are Powered by Language: Or, Words as a Vast Sea of Interrelated Statistical Arrays of Tokens&#xA;&#xA;At root, the most powerful current forms of AI derive their capacities from decomposing human language into vast arrays of numbers based on their high dimensional statistical relationships and then predicting probabilistically what the next tokens are most likely to be. &#xA;&#xA;There’s a kind of alchemical transformation that occurs that seems to maintain the meaning in the generative pronouncements of the frontier LLMs, all the more amazing because so far the very engineers who have designed the structure for these operations to occur do not fully understand what the models are doing to arrive at their seemingly oracular destinations.&#xA;&#xA;In other words – the power of LLMs seemingly derives from the statistical power of language. There is something in the nature of language itself that seems to provide these computations of vast arrays of numbers with a lattice of our world, enabling LLMs to gain uncanny abilities from superpowered next word prediction. 
That LLMs have the generative powers they have—and that they have them without any consciousness or social interaction whatsoever—bolsters the argument that there is something about language itself, not just our brains, that is powerful.&#xA;&#xA;An Aside on Power Law Scaling&#xA;&#xA;One of the interesting features of human language is that it exhibits power scaling laws, as with other complex adaptive systems such as animals, cities, or businesses, as I recently examined in this post about Geoffrey West&#39;s fascinating book, Scale. The frequency of word usage, the length of sentences and texts, and the number of words in a language all follow power law distributions. This means that a small number of words are used frequently, while most words are used infrequently, and long sentences and texts are less common than shorter ones. As an interesting parallel, power law scaling is exhibited not only by language itself and through its generative manifestations in LLMs, but furthermore through the data--and the data centers and energy--required for training and using LLMs. Thus far, there is no apparent ceiling for LLM advancement in capability beyond that of the ceiling on the scalability of computer chips, data centers, and training data.&#xA;&#xA;Innate vs. Developed Language: A Review of Our Path Traversed Thus Far&#xA;&#xA;In our series “Innate vs. Developed”, we have explored the nature of language, challenging a widely held view that language is completely and innately hardwired in the human brain. Drawing upon “The Language Game” and &#34;Rethinking Innateness” as sources of inspiration, we have considered the notion that language is an emergent, culturally-evolved phenomenon that mounts atop an “inner scaffold” that exists within our brains and further refines and specializes our neural networks through simple repeated social interactions over time. 
&#xA;&#xA;We also considered how developing proficiency in reading and writing yet further extends and reinforces these channels across our brains – and how developing proficiency in multiple languages and literacies makes those networks even yet more robust.&#xA;&#xA;We went further afield and investigated Cormac McCarthy’s ponderings on a seeming division between language and the ancient parts of our brain that exist before and beyond language. We also investigated the paradoxical nature of language, in that it can both enhance and potentially occlude our connection to our unconscious selves and to our natural world.&#xA;&#xA;I promised at the end of the first post in this series that I would “maybe dig into the relation of cognition and language and literacy a little, and riff on the implications for AI, ANNs, and LLMs.” It’s taken me some time to let all of this ripen, especially given the rapid pace at which LLMs are developing. I think I’m finally starting to gain some perspective on LLMs that may allow me to indulge in a little riffing.&#xA;&#xA;Sources for Spelunking&#xA;&#xA;Before said indulgence in my next post, I’ll first outline a few sources I will draw upon at the outset so you can go off and explore on your own before being further biased by my own rambling.&#xA;&#xA;First, if you are interested in learning more about that analogy of a high dimensional semantic gradient and gaining insight into how LLMs kinda work, I recommend three sources shared by Ethan Mollick (he himself is also an excellent source):&#xA;&#xA;But what is a GPT? 
Visual intro to transformers&#xA;Large language models, explained with a minimum of math and jargon&#xA;What Is ChatGPT Doing … and Why Does It Work?&#xA;&#xA;Second, if you want to explore some interesting aspects of language itself that are related to LLMs, check out the following:&#xA;&#xA;Uniquely human intelligence arose from expanded information capacity&#xA;Stream of Search (SoS): Learning to Search in Language&#xA;The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory&#xA;&#xA;An Anticipation of Where We May Go From Here&#xA;&#xA;From these and other sources, including dabbling with Copilot and Claude and Gemini, I will ponder some of the following points on what computational neural networks may be able to tell us about language and what language may be able to tell us about LLMs – and, ultimately, perhaps, what this all may be able to tell us about teaching and learning:&#xA;&#xA;The surprisingly inseparable interconnection between form and meaning&#xA;Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness&#xA;The human (and now, machine) capacity for learning and using language may simply be a matter of scale&#xA;Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said? . . . which actually ended up becoming more about fuzziness and precision in language, but hey!&#xA;Implicit vs. explicit learning of language and literacy&#xA;&#xA;#language #literacy #LLMs #computation #statistical #learning #ai&#xA;]]&gt;</description>
<content:encoded><![CDATA[<p><a href="https://www.readingrockets.org/classroom/classroom-strategies/semantic-gradients">“Semantic gradients”</a> are a tool used by teachers to broaden and deepen students&#39; understanding of related words by plotting them in relation to one another. They often begin with antonyms at each end of the continuum. Here are two basic examples:</p>

<p><img src="https://i.snap.as/eXGuugvn.png" alt="Semantic gradient examples"/></p>

<p>Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Now imagine adding another axis to this graph, so that words are plotted in a three-dimensional space in their relationships. Then add another dimension, and another . . . heck, make it tens of thousands more dimensions, relating all the words available in your lexicon across a high-dimensional space. . .</p>

<p>. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).</p>
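<p>Here is a minimal sketch of that geometry in Python, using made-up coordinates on just two hand-picked “gradient” axes. Real LLM embeddings have thousands of learned, uninterpretable axes, but the basic idea, nearness in the space tracking nearness in meaning, is the same:</p>

```python
import math

# Invented coordinates on two hand-picked axes: (temperature, intensity).
words = {
    "freezing":  (-1.0, 0.9),
    "cold":      (-0.6, 0.4),
    "cool":      (-0.3, 0.2),
    "warm":      ( 0.3, 0.2),
    "hot":       ( 0.7, 0.5),
    "scorching": ( 1.0, 0.9),
}

def distance(a, b):
    """Euclidean distance between two words' coordinates."""
    return math.dist(words[a], words[b])

# Rank every word by closeness to "cold": neighbors on the gradient come first.
print(sorted(words, key=lambda w: distance("cold", w)))
# → ['cold', 'cool', 'freezing', 'warm', 'hot', 'scorching']
```

Scale those two toy axes up by a few orders of magnitude, and let the coordinates be learned from text rather than hand-assigned, and you have the rough shape of a word embedding space.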



<h2 id="llms-are-powered-by-language-or-words-as-a-vast-sea-of-interrelated-statistical-arrays-of-tokens">LLMs Are Powered by Language: Or, Words as a Vast Sea of Interrelated Statistical Arrays of Tokens</h2>

<p>At root, the most powerful current forms of AI derive their capacities from decomposing human language into vast arrays of numbers based on their high dimensional statistical relationships and then predicting probabilistically what the next tokens are most likely to be.</p>
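<p>A bigram model is perhaps the simplest possible sketch of this kind of next-token prediction. The corpus below is invented, and LLMs condition on far longer contexts through learned representations rather than raw counts, but the objective, predicting what comes next from the statistics of a corpus, is the same in spirit:</p>

```python
import random
from collections import Counter, defaultdict

# An invented toy corpus for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Count, for each word, which words follow it and how often.
following = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    following[w][nxt] += 1

def next_word(word):
    """Sample a next word in proportion to how often it followed `word`."""
    counts = following[word]
    return random.choices(list(counts), weights=list(counts.values()))[0]

# Generate a short continuation, stopping if a word has no observed successor.
word, out = "the", ["the"]
for _ in range(6):
    if not following[word]:
        break
    word = next_word(word)
    out.append(word)
print(" ".join(out))
```

Every generated transition is one the model has "seen" in its training data; scale the context window and the statistics up enormously, and the continuations start to look uncannily fluent.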

<p>There’s a kind of alchemical transformation that occurs that seems to maintain the meaning in the generative pronouncements of the frontier LLMs, all the more amazing because so far the very engineers who have designed the structure for these operations to occur do not fully understand what the models are doing to arrive at their seemingly oracular destinations.</p>

<p>In other words – the power of LLMs seemingly derives from the statistical power of language. There is something in the nature of language itself that seems to provide these computations of vast arrays of numbers with a lattice of our world, enabling LLMs to gain uncanny abilities from superpowered next word prediction. That LLMs have the generative powers they have—and that they have them without any consciousness or social interaction whatsoever—bolsters the argument that there is something about language itself, not just our brains, that is powerful.</p>

<h3 id="an-aside-on-power-law-scaling" id="an-aside-on-power-law-scaling">An Aside on Power Law Scaling</h3>

<p>One of the interesting features of human language is that it exhibits power-law scaling, as do other complex adaptive systems such as animals, cities, and businesses – a topic I recently examined in <a href="https://schoolecosystem.wordpress.com/2024/03/17/power-law-scaling-and-schools/">this post</a> about Geoffrey West&#39;s fascinating book, <em>Scale</em>. The frequency of word usage, the length of sentences and texts, and the number of words in a language all follow power-law distributions: a small number of words are used very frequently, while most words are used rarely, and long sentences and texts are less common than shorter ones. As an interesting parallel, power-law scaling shows up not only in language itself and in its generative manifestations in LLMs, but also in the data – and the data centers and energy – required for training and using them. Thus far, there is no apparent ceiling on LLM advancement in capability beyond the ceiling on the scalability of computer chips, data centers, and training data.</p>
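<p>Zipf’s law is the classic example of this scaling in language: a word’s frequency is roughly proportional to one over its frequency rank, so the second most common word appears about half as often as the first, the third about a third as often, and so on. A tiny sample can only hint at the pattern (a real corpus is needed for a clean fit), but even a couple of invented sentences show the top-ranked word pulling far ahead of the rest:</p>

```python
from collections import Counter

# Count word frequencies in a tiny invented sample. Under an ideal Zipf
# distribution, rank * frequency would be roughly constant; a sample this
# small only shows the dominant head of the distribution.
text = (
    "the cat sat on the mat and the dog sat by the door "
    "while the cat and the dog watched the quiet street"
)
ranked = Counter(text.split()).most_common()

for rank, (word, freq) in enumerate(ranked[:4], start=1):
    print(rank, word, freq)
# The top-ranked word ("the") appears far more often than any other.
```

<p>Run the same count over a full corpus and the rank-frequency curve flattens into the familiar straight line on a log-log plot – the signature of a power law.</p>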

<h2 id="innate-vs-developed-language-a-review-of-our-path-traversed-thus-far" id="innate-vs-developed-language-a-review-of-our-path-traversed-thus-far">Innate vs. Developed Language: A Review of Our Path Traversed Thus Far</h2>

<p>In our series <a href="https://languageandliteracy.blog/innate-vs">“Innate vs. Developed”</a>, we have explored the nature of language, challenging a widely held view that language is completely and innately hardwired in the human brain. <a href="https://languageandliteracy.blog/language-like-reading-may-not-be-innate">Drawing upon “The Language Game” and “Rethinking Innateness”</a> as sources of inspiration, we have considered the notion that language is an emergent, culturally-evolved phenomenon that mounts atop <a href="https://languageandliteracy.blog/the-inner-scaffold-for-language-and-literacy">an “inner scaffold”</a> that exists within our brains and further refines and specializes our neural networks through simple repeated social interactions over time.</p>

<p>We also considered how developing proficiency in reading and writing yet further extends and reinforces these channels across our brains – and how developing proficiency in multiple languages and literacies makes those networks <a href="https://languageandliteracy.blog/accelerating-the-inner-scaffold-across-modalities-and-languages">even yet more robust</a>.</p>

<p>We went further afield and <a href="https://languageandliteracy.blog/thinking-inside-and-outside-of-language">investigated Cormac McCarthy’s ponderings</a> on a seeming division between language and the ancient parts of our brain that exist before and beyond language. We also investigated the paradoxical nature of language, in that it can both enhance and <a href="https://languageandliteracy.blog/speaking-ourselves-into-being-and-others-into-silence-the-power-of-language">potentially occlude</a> our connection to our unconscious selves and to our natural world.</p>

<p>I promised at the end of <a href="https://languageandliteracy.blog/language-like-reading-may-not-be-innate">the first post in this series</a> that I would “maybe dig into the relation of cognition and language and literacy a little, and riff on the implications for AI, ANNs, and LLMs.” It’s taken me some time to let all of this ripen, especially given the rapid pace at which LLMs are developing. I think I’m finally starting to gain some perspective on LLMs that may allow me to indulge in a little riffing.</p>

<h2 id="sources-for-spelunking" id="sources-for-spelunking">Sources for Spelunking</h2>

<p>Before said indulgence in my next post, I’ll first outline a few sources I will draw upon at the outset so you can go off and explore on your own before being further biased by my own rambling.</p>

<p>First, if you are interested in learning more about that analogy of a high-dimensional semantic gradient and gaining insight into how LLMs kinda work, I recommend three sources <a href="https://x.com/emollick/status/1775355910761681177">shared by Ethan Mollick</a> (he himself is also <a href="https://www.oneusefulthing.org/">an excellent source</a>):</p>
<ul><li><a href="https://www.youtube.com/watch?v=wjZofJX0v4M">But what is a GPT? Visual intro to transformers</a></li>
<li><a href="https://www.understandingai.org/p/large-language-models-explained-with">Large language models, explained with a minimum of math and jargon</a></li>
<li><a href="https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/">What Is ChatGPT Doing … and Why Does It Work?</a></li></ul>

<p>Second, if you want to explore some interesting aspects of language itself that are related to LLMs, check out the following:</p>
<ul><li><a href="https://www.nature.com/articles/s44159-024-00283-3.epdf?sharing_token=dc9WtYt3C_FN2N5q5mmKatRgN0jAjWel9jnR3ZoTv0PIvBIKEnJUrpLA70zYn0mjSaDkgiBUb43hOoUEou9xdgynS0nAWob7QAH5X7gROQMoz5n9acglkBUa_86OzUA1B-Wg9_p5hHRLFUQ95SWsfFXtU8jHuxKnM8_fWZKCoAA%3D">Uniquely human intelligence arose from expanded information capacity</a></li>
<li><a href="https://arxiv.org/abs/2404.03683">Stream of Search (SoS): Learning to Search in Language</a></li>
<li><a href="http://ams.org/journals/notices/202402/rnoti-p174.pdf">The Structure of Meaning in Language: Parallel Narratives in Linear Algebra and Category Theory</a></li></ul>

<h2 id="an-anticipation-of-where-we-may-go-from-here" id="an-anticipation-of-where-we-may-go-from-here">An Anticipation of Where We May Go From Here</h2>

<p>From these and other sources, including dabbling with Copilot and Claude and Gemini, I will ponder some of the following points on what computational neural networks may be able to tell us about language and what language may be able to tell us about LLMs – and, ultimately, perhaps, what this all may be able to tell us about teaching and learning:</p>
<ul><li><a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">The surprisingly inseparable interconnection between form and meaning</a></li>
<li><a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness</a></li>
<li><a href="https://write.as/manderson/scaling-our-capacity-for-processing-information">The human (and now, machine) capacity for learning and using language may simply be a matter of scale</a></li>
<li><a href="https://write.as/manderson/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said?</a> <em>. . . which actually ended up becoming more about fuzziness and precision in language, but hey!</em></li>
<li><a href="https://write.as/manderson/llms-statistical-learning-and-explicit-teaching">Implicit vs. explicit learning of language and literacy</a></li></ul>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:computation" class="hashtag"><span>#</span><span class="p-category">computation</span></a> <a href="https://languageandliteracy.blog/tag:statistical" class="hashtag"><span>#</span><span class="p-category">statistical</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:ai" class="hashtag"><span>#</span><span class="p-category">ai</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/language-and-llms</guid>
      <pubDate>Tue, 23 Apr 2024 14:48:41 +0000</pubDate>
    </item>
    <item>
      <title>Research Highlight 4: Structuring Classroom Learning for Student Success and Agency</title>
      <link>https://languageandliteracy.blog/research-highlight-4-structuring-classroom-learning-for-student-success-and?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[an organized classroom&#xA;&#xA;Thanks to a podcast, Emerging Research in Educational Psychology, from professor Jeff Greene speaking with professor Erika Patall about a meta-analysis she was the lead author on, I learned about her paper that looked across a large number of studies to synthesize findings on the impact of classroom structure. I thought some of the high-level takeaways were well worth highlighting with you for our 4th research highlight in this series!&#xA;&#xA;Citation: Patall, E. A., Yates, N., Lee, J., Chen, M., Bhat, B. H., Lee, K., Beretvas, S. N., Lin, S., Man Yang, S., Jacobson, N. G., Harris, E., &amp; Hanson, D. J. (2024). A meta-analysis of teachers’ provision of structure in the classroom and students’ academic competence beliefs, engagement, and achievement. Educational Psychologist, 59(1), 42–70. https://doi.org/10.1080/00461520.2023.2274104&#xA;&#xA;I think it’s no surprise to most educators that providing structure for kids, both in terms of the classroom environment and culture, and in terms of the design of instructional tasks, is critical to improving student learning. 
Part of this work is what we often term “classroom management,” but as the paper describes, the work is far more encompassing than that:&#xA;&#xA;  “In sum, creating structure is a multifaceted endeavor that involves a diverse assortment of teacher practices that can be used independently or in various combinations, as well as to various extents, and are all intended to organize and guide students’ school-relevant behavior in the process of learning in the classroom.”&#xA;!--more--&#xA;&#xA;So what’s the top line item from this extensive meta-analysis?&#xA;&#xA;Students benefit from predictable environments for learning&#xA;&#xA;  &#34;Students universally need predictable environments that support their attempts to experience and develop competence at all school levels (e.g., Aelterman et al., 2019; Ryan &amp; Deci, 2017; Skinner et al., 2008).&#34;&#xA;&#xA;OK. Makes sense. But where it gets interesting is that a standard assumption could be, as the authors hypothesized initially, that it’s mostly younger kids that need more structure, whereas older students need less. But that’s not what they found! &#xA;&#xA;All students benefit from structure&#xA;&#xA;Instead, they found that all students consistently benefited from structured learning no matter the age or grade-level:&#xA;&#xA;  “There was no instance, regardless of what moderator we considered, in which the relationship between classroom structure and either student engagement or competence beliefs was negative. Moreover, there was no instance in which classroom structure interventions had a negative effect on achievement.”&#xA;&#xA;Classroom and school structure should build student autonomy&#xA;&#xA;The other important pattern that emerged was that structure that focuses more on control of student behavior, rather than on building student autonomy, can be counter-productive:&#xA;&#xA;  “. . . 
moderator analyses with the correlational studies revealed that the relationship between classroom structure and achievement was statistically significantly stronger when structure was delivered within the context of support for autonomy and positive emotion.”&#xA;&#xA;This is important in reframing how we may think about classroom management strategies at large. The ultimate goal of structured learning is to build greater student agency and autonomy. &#xA;&#xA;While this may be visibly evident in kindergarten, for example, when students shift from more open-ended play to sitting for longer periods of time at desks engaged in more overtly academic tasks, this trajectory can be just as important in 6th grade, when students are learning to take on greater loads of homework and reading a higher volume of texts on their own, or in 8th grade, when students are learning to construct and debate viable and complex arguments from multiple perspectives and sources. &#xA;&#xA;How are we building their autonomy with paraphrasing and summarizing evidence, both in oracy and in writing? How are we building their agency and automaticity with precise and fluent reading and spelling of multisyllabic words? These questions also show us a pathway to the type of structure we need to provide in terms of modeling, feedback, and repeated opportunities over time, based on the discipline and the grade-level we teach.&#xA;&#xA;  “These findings provide support for a rarely tested key principle from perspectives on classroom management and motivation, namely, that good classroom structure guides students in planning and self-regulating their own behavior, helping them to know how to act effectively within the classroom environment (e.g., Emmer &amp; Stough, 2001; Skinner &amp; Belmont, 1993).”&#xA;&#xA;This larger goal supports our planning and vision at the school-level as well. How are we consistently teaching our students to plan and set goals to accomplish tasks? 
Do they know how to listen and take notes? Do they know what types of questions they can ask to learn something from an expert or from a text? Do they know how to study, how to manage their time and attention and materials? How to marshal and gather resources when they don’t have them at hand? &#xA;&#xA;What are we teaching across our grade-levels and classrooms so that when students graduate from our school they are equipped for success at the next level of their lives, and we have evidence of their progress towards these expectations along the way?&#xA;&#xA;While rewards and punishment may be a part of teaching students to do these things, the goal must always be towards developing that greater autonomy:&#xA;&#xA;  “Taking a slightly stronger position, self-determination theory has routinely emphasized that well-intended strategies for supporting learning like rewards and surveillance can back-fire because they have potential to be experienced as pressure or attempts to control students, even as they simultaneously provide information about competence (e.g., Reeve, 2009; Deci et al., 2001).”&#xA;&#xA;Ask students about their perceptions of their learning&#xA;&#xA;Our schools are getting much more accustomed to harvesting and using an abundance of data about students, but sometimes what can get missed in all of that data collection and analysis is talking to students themselves about their experiences with learning:&#xA;&#xA;  “Though not without bias, asking students themselves about their perceptions of the environment reveals the strongest associations between structure and engagement.”&#xA;&#xA;In my own experience, I was sometimes surprised to learn when I talked one-on-one with students who were the most challenging – the ones who would be out in the hallways, cursing out their teachers, egging on and fighting with other kids, etc – they were the ones who would express the most interest in having more structure and guidance from adults in their lives. 
They wouldn’t say it directly – I don’t think they always knew how to express it, but it came out in comments like, “why aren’t there more teachers out in the hallways?,” or that their favorite teacher was the gym teacher because “he told us what to do.” &#xA;&#xA;Kids can be challenging, especially teenagers, but they are often screaming out for more structure and guidance from adults in various ways. And the more they are left stranded to make their own decisions, the more they will act out.&#xA;&#xA;That said, we also know that when we provide that structure in a way that the student doesn’t like, it can blow up in our faces. Students may perceive attempts to provide structure as more about control than about supporting their learning:&#xA;&#xA;  “Anecdotally, teachers have often noted that structure and support for autonomy seem at face value to be at odds with one another, with teachers sometimes feeling like they need to prioritize communicating their own expectations, organizing, and guiding student behavior, while limiting students’ choices and opportunities to influence learning activities, particularly when students misbehave or are at risk of poor achievement (Jang et al., 2010; Reeve, 2009). However, rather than being at odds with one another, it is important to recognize that the effects of structure vary to the extent that structure is open to interpretation depending on how it is delivered and in what broader context (e.g., Cheon et al., 2020; Ryan &amp; Deci, 2017).”&#xA;&#xA;In Sum&#xA;&#xA;So what are some key takeaways from this meta-analysis? 
Consider the following points based on developmental expectations for that age and grade:&#xA;&#xA;Key points for teachers:&#xA;&#xA;Structure matters: Provide a predictable and supportive learning environment that fosters students&#39; sense of competence.&#xA;Balance is key: Combine structure with autonomy-supportive practices that encourage student choice and ownership.&#xA;Focus on self-regulation: Guide students in developing their own planning and behavior management skills.&#xA;Seek student feedback: Regularly gauge students&#39; perceptions of the learning environment to ensure it&#39;s meeting their needs.&#xA;&#xA;Key points for school leaders:&#xA;&#xA;Professional development: Equip teachers with strategies for creating structured yet autonomy-supportive classrooms.&#xA;Supportive culture: Foster a school environment that values both structure and student agency.&#xA;Data-driven decisions: Use student feedback on the learning environment to inform instructional practices.&#xA;&#xA;#research #classroom #management #behavior #psychology #environment #learning #agency #autonomy&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/HaL3qEfk.jpeg" alt="an organized classroom"/></p>

<p>Thanks to a podcast, <a href="https://soundcloud.com/user-883650452/erika-patall-2024?utm_source=clipboard&amp;utm_medium=text&amp;utm_campaign=social_sharing">Emerging Research in Educational Psychology</a>, from professor <a href="https://bsky.app/profile/jeffgreene.bsky.social">Jeff Greene</a> speaking with professor Erika Patall about a meta-analysis she was the lead author on, I learned about <a href="https://doi.org/10.1080/00461520.2023.2274104">her paper</a> that looked across a large number of studies to synthesize findings on the impact of classroom structure. I thought some of the high-level takeaways were well worth highlighting with you for our 4th research highlight in this series!</p>
<ul><li>Citation: Patall, E. A., Yates, N., Lee, J., Chen, M., Bhat, B. H., Lee, K., Beretvas, S. N., Lin, S., Man Yang, S., Jacobson, N. G., Harris, E., &amp; Hanson, D. J. (2024). A meta-analysis of teachers’ provision of structure in the classroom and students’ academic competence beliefs, engagement, and achievement. Educational Psychologist, 59(1), 42–70. <a href="https://doi.org/10.1080/00461520.2023.2274104">https://doi.org/10.1080/00461520.2023.2274104</a></li></ul>

<p>I think it’s no surprise to most educators that providing structure for kids, both in terms of the classroom environment and culture, and in terms of the design of instructional tasks, is critical to improving student learning. Part of this work is what we often term “classroom management,” but as the paper describes, the work is far more encompassing than that:</p>

<blockquote><p>“In sum, creating structure is a multifaceted endeavor that involves a diverse assortment of teacher practices that can be used independently or in various combinations, as well as to various extents, and are all intended to organize and guide students’ school-relevant behavior in the process of learning in the classroom.”
</p></blockquote>

<p>So what’s the top line item from this extensive meta-analysis?</p>

<h2 id="students-benefit-from-predictable-environments-for-learning" id="students-benefit-from-predictable-environments-for-learning">Students benefit from predictable environments for learning</h2>

<blockquote><p>“Students universally need predictable environments that support their attempts to experience and develop competence at all school levels (e.g., Aelterman et al., 2019; Ryan &amp; Deci, 2017; Skinner et al., 2008).”</p></blockquote>

<p>OK. Makes sense. But where it gets interesting is that a standard assumption could be, as the authors hypothesized initially, that it’s mostly younger kids who need more structure, whereas older students need less. But that’s not what they found!</p>

<h2 id="all-students-benefit-from-structure" id="all-students-benefit-from-structure">All students benefit from structure</h2>

<p>Instead, they found that all students consistently benefited from structured learning no matter the age or grade-level:</p>

<blockquote><p>“There was no instance, regardless of what moderator we considered, in which the relationship between classroom structure and either student engagement or competence beliefs was negative. Moreover, there was no instance in which classroom structure interventions had a negative effect on achievement.”</p></blockquote>

<h2 id="classroom-and-school-structure-should-build-student-autonomy" id="classroom-and-school-structure-should-build-student-autonomy">Classroom and school structure should build student autonomy</h2>

<p>The other important pattern that emerged was that structure that focuses more on control of student behavior, rather than on building student autonomy, can be counter-productive:</p>

<blockquote><p>“. . . moderator analyses with the correlational studies revealed that the relationship between classroom structure and achievement was statistically significantly stronger when structure was delivered within the context of support for autonomy and positive emotion.”</p></blockquote>

<p>This is important in reframing how we may think about classroom management strategies at large. The ultimate goal of structured learning is to build greater student agency and autonomy.</p>

<p>While this may be visibly evident in kindergarten, for example, when students shift from more open-ended play to sitting for longer periods of time at desks engaged in more overtly academic tasks, this trajectory can be just as important in 6th grade, when students are learning to take on greater loads of homework and reading a higher volume of texts on their own, or in 8th grade, when students are learning to construct and debate viable and complex arguments from multiple perspectives and sources.</p>

<p>How are we building their autonomy with paraphrasing and summarizing evidence, both in oracy and in writing? How are we building their agency and automaticity with precise and fluent reading and spelling of multisyllabic words? These questions also show us a pathway to the type of structure we need to provide in terms of modeling, feedback, and repeated opportunities over time, based on the discipline and the grade-level we teach.</p>

<blockquote><p>“These findings provide support for a rarely tested key principle from perspectives on classroom management and motivation, namely, that good classroom structure guides students in planning and self-regulating their own behavior, helping them to know how to act effectively within the classroom environment (e.g., Emmer &amp; Stough, 2001; Skinner &amp; Belmont, 1993).”</p></blockquote>

<p>This larger goal supports our planning and vision at the school-level as well. How are we consistently teaching our students to plan and set goals to accomplish tasks? Do they know how to listen and take notes? Do they know what types of questions they can ask to learn something from an expert or from a text? Do they know how to study, how to manage their time and attention and materials? How to marshal and gather resources when they don’t have them at hand?</p>

<p>What are we teaching across our grade-levels and classrooms so that when students graduate from our school they are equipped for success at the next level of their lives, and we have evidence of their progress towards these expectations along the way?</p>

<p>While rewards and punishment may be a part of teaching students to do these things, the goal must always be towards developing that greater autonomy:</p>

<blockquote><p>“Taking a slightly stronger position, self-determination theory has routinely emphasized that well-intended strategies for supporting learning like rewards and surveillance can back-fire because they have potential to be experienced as pressure or attempts to control students, even as they simultaneously provide information about competence (e.g., Reeve, 2009; Deci et al., 2001).”</p></blockquote>

<h2 id="ask-students-about-their-perceptions-of-their-learning" id="ask-students-about-their-perceptions-of-their-learning">Ask students about their perceptions of their learning</h2>

<p>Our schools are getting much more accustomed to harvesting and using an abundance of data about students, but sometimes what can get missed in all of that data collection and analysis is <em>talking to students themselves about their experiences with learning</em>:</p>

<blockquote><p>“Though not without bias, asking students themselves about their perceptions of the environment reveals the strongest associations between structure and engagement.”</p></blockquote>

<p>In my own experience, I was sometimes surprised to learn, when I talked one-on-one with the students who were the most challenging – the ones who would be out in the hallways, cursing out their teachers, egging on and fighting with other kids, etc. – that they were the ones who would express the most interest in having more structure and guidance from adults in their lives. They wouldn’t say it directly – I don’t think they always knew how to express it – but it came out in comments like “why aren’t there more teachers out in the hallways?” or that their favorite teacher was the gym teacher because “he told us what to do.”</p>

<p>Kids can be challenging, especially teenagers, but they are often screaming out for more structure and guidance from adults in various ways. And the more they are left stranded to make their own decisions, the more they will act out.</p>

<p>That said, we also know that when we provide that structure in a way that the student doesn’t like, it can blow up in our faces. Students may perceive attempts to provide structure as more about control than about supporting their learning:</p>

<blockquote><p>“Anecdotally, teachers have often noted that structure and support for autonomy seem at face value to be at odds with one another, with teachers sometimes feeling like they need to prioritize communicating their own expectations, organizing, and guiding student behavior, while limiting students’ choices and opportunities to influence learning activities, particularly when students misbehave or are at risk of poor achievement (Jang et al., 2010; Reeve, 2009). However, rather than being at odds with one another, it is important to recognize that the effects of structure vary to the extent that structure is open to interpretation depending on how it is delivered and in what broader context (e.g., Cheon et al., 2020; Ryan &amp; Deci, 2017).”</p></blockquote>

<h2 id="in-sum" id="in-sum">In Sum</h2>

<p>So what are some key takeaways from this meta-analysis? Consider the following points, adapted to developmental expectations for each age and grade:</p>

<p><strong>Key points for teachers:</strong></p>
<ul><li><strong>Structure matters:</strong> Provide a predictable and supportive learning environment that fosters students&#39; sense of competence.</li>
<li><strong>Balance is key:</strong> Combine structure with autonomy-supportive practices that encourage student choice and ownership.</li>
<li><strong>Focus on self-regulation:</strong> Guide students in developing their own planning and behavior management skills.</li>
<li><strong>Seek student feedback:</strong> Regularly gauge students&#39; perceptions of the learning environment to ensure it&#39;s meeting their needs.</li></ul>

<p><strong>Key points for school leaders:</strong></p>
<ul><li><strong>Professional development:</strong> Equip teachers with strategies for creating structured yet autonomy-supportive classrooms.</li>
<li><strong>Supportive culture:</strong> Foster a school environment that values both structure and student agency.</li>
<li><strong>Data-driven decisions:</strong> Use student feedback on the learning environment to inform instructional practices.</li></ul>

<p><a href="https://languageandliteracy.blog/tag:research" class="hashtag"><span>#</span><span class="p-category">research</span></a> <a href="https://languageandliteracy.blog/tag:classroom" class="hashtag"><span>#</span><span class="p-category">classroom</span></a> <a href="https://languageandliteracy.blog/tag:management" class="hashtag"><span>#</span><span class="p-category">management</span></a> <a href="https://languageandliteracy.blog/tag:behavior" class="hashtag"><span>#</span><span class="p-category">behavior</span></a> <a href="https://languageandliteracy.blog/tag:psychology" class="hashtag"><span>#</span><span class="p-category">psychology</span></a> <a href="https://languageandliteracy.blog/tag:environment" class="hashtag"><span>#</span><span class="p-category">environment</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:agency" class="hashtag"><span>#</span><span class="p-category">agency</span></a> <a href="https://languageandliteracy.blog/tag:autonomy" class="hashtag"><span>#</span><span class="p-category">autonomy</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/research-highlight-4-structuring-classroom-learning-for-student-success-and</guid>
      <pubDate>Fri, 23 Feb 2024 18:01:43 +0000</pubDate>
    </item>
    <item>
      <title>Learning and the Brain: Keeping that Goldilocks Balance</title>
      <link>https://languageandliteracy.blog/learning-and-the-brain-keeping-that-goldilocks-balance?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[I wrote a little while ago about Andrew Watson’s excellent book, “The Goldilocks Map.” I had an opportunity to attend a Learning and the Brain conference, which was what sparked Andrew’s own journey into brain research and learning to balance openness to new practice with a healthy dose of skepticism. In fact, Andrew was one of the keynote presenters at this conference – and I think his trenchant advice provided an important grounding for consideration of many of the other presentations.&#xA;&#xA;I think there’s something in the nature of presenting to a general audience of educators that compels researchers to attempt to derive generalized implications of their research that can all too easily overstep the confines of their very specialized and specific domains.&#xA;!--more--&#xA;For example, Mary Helen Immordino-Yang gave a powerful keynote on her ongoing research into emotions and their relation to learning. There were intriguing implications for education from brain scans and surveys of individual children, such as the insight that emotional engagement activates the same part of the brain (the brain stem) that keeps us alive at a subconscious level. This reflects a deeper form of learning that changes consciousness and is only accessed when attention is directed internally, rather than outwardly. Furthermore, and counterintuitively, such emotional engagement is most activated by admiration for others based on the nature of their virtue, rather than merely by a demonstration of their ability.&#xA;&#xA;Her talk was accompanied by useful and trenchant guiding questions for us to consider as implications for education:&#xA;&#xA;What might this mean for emotional well-being? Character development?&#xA;What might it mean for how we use technology?&#xA;How might this change how we think about productivity? 
critical thinking?&#xA;&#xA;Yet there was a moment – a very small moment that was more of an aside – when Immordino-Yang drew out themes around “meaningful learning” (i.e. personal connections to ideas, rather than emotions related to outcomes) to make a critique of our entire system of education. There is plenty to critique in our motley assortment of localized systems in the U.S. – but it was a moment that activated my own skepticism, as it must be remembered that Immordino-Yang’s research involves individual kids at a clinic watching a video and responding to questions, and then receiving an fMRI while rewatching the video. Hardly the conditions of a classroom, and extrapolating from such findings to the education system at large may be overstepping those specific findings.&#xA;&#xA;To be clear, I found Immordino-Yang’s keynote to be the most intriguing and powerful of the entire conference – but thus I found it all the more instructive to attempt to maintain that “Goldilocks” balance of a healthy mix of openness and skepticism when considering how findings from research may apply to schools and classrooms.&#xA;&#xA;Another presentation from neuroscientist Andre Fenton also made me reflect on the lines between specialized research and extrapolations to classroom practice. 
Fenton provided a very detailed overview of his research into cognitive control training with mice in a laboratory, and to his credit, he did not make many general extrapolations beyond a few analogies, such as the quote, “What we think we become,” and some general advice such as considering how labs in science class can give kids an opportunity to “discover what is salient and ignore what isn’t” — to give “kids an opportunity to be judicious in how they process the information given to them.”&#xA;&#xA;Relevant to such findings, Andrew Watson warned during his presentation to “never change your practice from research based on non-human animals.” There are indeed intriguing aspects of executive function and cognitive control training as it relates to mice we can consider from Fenton’s research, but until we have psychological studies with humans related to such findings, there may be little we can yet extrapolate to classroom practice.&#xA;&#xA;As I grappled with this, I realized that this was perfectly OK. We don’t always NEED to immediately overgeneralize specialized findings to classroom practice! We can be intrigued, we can be provoked, we can learn about the specific conditions and findings in relation to the research, and ruminate on what they might mean – but we must resist jumping to overzealous conclusions, and instead maintain our thirst for further research and learning.&#xA;&#xA;Speaking of zealotry, in his keynote, Steven Pinker acknowledged that humanity 
seems to be losing its collective mind, and called for a cool-headed commitment to rationality, arguing that “cognitive tools should be at the fingertips of every kid.” Pinker doesn’t believe people are irrational; he believes we are “logical about content relevant to our own lives and subject-matter knowledge,” but that we “have more trouble with formal rationality,” the “abstract rules and formulas that can be applied to any content.” We therefore need to make the tools of rationality “second nature,” and ensure the norms of rationality are upheld by our organizations and institutions, including educational ones. How we do this, however, wasn’t entirely clear beyond perhaps explicitly teaching concepts such as confirmation bias, systems of logic, and game theory.&#xA;&#xA;There was something tucked into his talk that I found echoed in other talks as well, including by Andrew Watson, Ulrich Boser, and Jonathan Gottschall, which is that rather than seeking confirmatory evidence for our beliefs, we must acknowledge our own fallibility and instead seek evidence that challenges our thinking. We must seek to falsify our own beliefs to rationally interrogate their veracity.&#xA;&#xA;In Jonathan Gottschall’s talk, he presented the paradox of stories for our species, which is that we possess a unique power that we can harness to expand our perspectives, knowledge, and empathy, but that there is a dark side to storytelling in that we are all too easily captivated by them on venues such as social media, and these seemingly innocuous stories often promote in-group and out-group dynamics through the casting of a villain. 
To combat this negative undertow of stories, Jonathan Gottschall urges us to maintain skepticism towards our own narratives, not just “the other side’s.”&#xA;&#xA;With that wise advice in mind, I should note that my narrative account of the conference leaves out some truly compelling insights and information that I gained from talks by Carolyn Strom, Daniel Willingham, Ulrich Boser, and William Stixrud, not to mention some further implications from Immordino-Yang’s research findings.&#xA;&#xA;There’s always more to learn! I hope I get an opportunity to attend another Learning and the Brain conference in the future.&#xA;&#xA;#conference #learning #brain #research #neuroscience #skepticism #empiricism #narratives]]&gt;</description>
      <content:encoded><![CDATA[<p><a href="https://languageandliteracy.blog/a-healthy-diet-of-openness-and-skepticism-towards-education-research">I wrote a little while ago</a> about Andrew Watson’s excellent book, “The Goldilocks Map.” I had an opportunity to attend <a href="https://www.learningandthebrain.com/">a Learning and the Brain conference</a>, which was what sparked Andrew’s own journey into brain research and learning to balance openness to new practice with a healthy dose of skepticism. In fact, Andrew was one of the keynote presenters at this conference – and I think his trenchant advice provided an important grounding for consideration of many of the other presentations.</p>

<p>I think there’s something in the nature of presenting to a general audience of educators that compels researchers to attempt to derive generalized implications of their research that can all too easily overstep the confines of their very specialized and specific domains.</p>

<p>For example, Mary Helen Immordino-Yang gave a powerful keynote on <a href="https://candle.usc.edu/">her ongoing research</a> into emotions and their relation to learning. There were intriguing implications for education from brain scans and surveys of individual children, such as the insight that emotional engagement activates the same part of the brain (the brain stem) that keeps us alive at a subconscious level. This reflects a deeper form of learning that changes consciousness and is only accessed when attention is directed internally, rather than outwardly. Furthermore, and counterintuitively, such emotional engagement is most activated by admiration for others based on the nature of their virtue, rather than merely by a demonstration of their ability.</p>

<p>Her talk was accompanied by useful and trenchant guiding questions for us to consider as implications for education:</p>
<ul><li>What might this mean for emotional well-being? Character development?</li>
<li>What might it mean for how we use technology?</li>
<li>How might this change how we think about productivity? critical thinking?</li></ul>

<p>Yet there was a moment – a very small moment that was more of an aside – when Immordino-Yang drew out themes around “meaningful learning” (i.e. personal connections to ideas, rather than emotions related to outcomes) to make a critique of our entire system of education. There is plenty to critique in our motley assortment of localized systems in the U.S. – but it was a moment that activated my own skepticism, as it must be remembered that Immordino-Yang’s research involves individual kids at a clinic watching a video and responding to questions, and then receiving an fMRI while rewatching the video. Hardly the conditions of a classroom, and extrapolating from such findings to the education system at large may be overstepping those specific findings.</p>

<p>To be clear, I found Immordino-Yang’s keynote to be the most intriguing and powerful of the entire conference – but thus I found it all the more instructive to attempt to maintain that “Goldilocks” balance of a healthy mix of openness and skepticism when considering how findings from research may apply to schools and classrooms.</p>

<p>Another presentation from neuroscientist Andre Fenton also made me reflect on the lines between specialized research and extrapolations to classroom practice. Fenton provided a very detailed overview of his research into cognitive control training with mice in a laboratory, and to his credit, he did not make many general extrapolations beyond a few analogies, such as the quote, “<em>What we think we become</em>,” and some general advice such as considering how labs in science class can give kids an opportunity to “discover what is salient and ignore what isn’t” — to give “kids an opportunity to be judicious in how they process the information given to them.”</p>

<p>Relevant to such findings, Andrew Watson warned during his presentation to “never change your practice from research based on non-human animals.” There are indeed intriguing aspects of executive function and cognitive control training as it relates to mice we can consider from Fenton’s research, but until we have psychological studies with humans related to such findings, there may be little we can yet extrapolate to classroom practice.</p>

<p>As I grappled with this, I realized that <strong>this was perfectly OK</strong>. We don’t always NEED to immediately overgeneralize specialized findings to classroom practice! We can be intrigued, we can be provoked, we can learn about the specific conditions and findings in relation to the research, and ruminate on what they might mean – but we must resist jumping to overzealous conclusions, and instead maintain our thirst for further research and learning.</p>

<p>Speaking of zealotry, in his keynote, Steven Pinker acknowledged that humanity seems to be losing its collective mind, and called for a cool-headed commitment to rationality, arguing that “cognitive tools should be at the fingertips of every kid.” Pinker doesn’t believe people are irrational; he believes we are “logical about content relevant to our own lives and subject-matter knowledge,” but that we “have more trouble with formal rationality,” the “abstract rules and formulas that can be applied to any content.” We therefore need to make the tools of rationality “second nature,” and ensure the norms of rationality are upheld by our organizations and institutions, including educational ones. How we do this, however, wasn’t entirely clear beyond perhaps explicitly teaching concepts such as confirmation bias, systems of logic, and game theory.</p>

<p>There was something tucked into his talk that I found echoed in other talks as well, including by Andrew Watson, Ulrich Boser, and Jonathan Gottschall, which is that rather than seeking confirmatory evidence for our beliefs, we must acknowledge our own fallibility and instead seek evidence that challenges our thinking. We must seek to falsify our own beliefs to rationally interrogate their veracity.</p>

<p>In <a href="https://www.jonathangottschall.com/">Jonathan Gottschall’s</a> talk, he presented the paradox of stories for our species, which is that we possess a unique power that we can harness to expand our perspectives, knowledge, and empathy, but that there is a dark side to storytelling in that we are all too easily captivated by them on venues such as social media, and these seemingly innocuous stories often promote in-group and out-group dynamics through the casting of a villain. To combat this negative undertow of stories, Jonathan Gottschall urges us to maintain skepticism towards our own narratives, not just “the other side’s.”</p>

<p>With that wise advice in mind, I should note that my narrative account of the conference leaves out some truly compelling insights and information that I gained from talks by Carolyn Strom, Daniel Willingham, Ulrich Boser, and William Stixrud, not to mention some further implications from Immordino-Yang’s research findings.</p>

<p>There’s always more to learn! I hope I get an opportunity to attend another <em>Learning and the Brain</em> conference in the future.</p>

<p><a href="https://languageandliteracy.blog/tag:conference" class="hashtag"><span>#</span><span class="p-category">conference</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:brain" class="hashtag"><span>#</span><span class="p-category">brain</span></a> <a href="https://languageandliteracy.blog/tag:research" class="hashtag"><span>#</span><span class="p-category">research</span></a> <a href="https://languageandliteracy.blog/tag:neuroscience" class="hashtag"><span>#</span><span class="p-category">neuroscience</span></a> <a href="https://languageandliteracy.blog/tag:skepticism" class="hashtag"><span>#</span><span class="p-category">skepticism</span></a> <a href="https://languageandliteracy.blog/tag:empiricism" class="hashtag"><span>#</span><span class="p-category">empiricism</span></a> <a href="https://languageandliteracy.blog/tag:narratives" class="hashtag"><span>#</span><span class="p-category">narratives</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/learning-and-the-brain-keeping-that-goldilocks-balance</guid>
      <pubDate>Sat, 14 May 2022 01:34:55 +0000</pubDate>
    </item>
    <item>
      <title>An Ontogenesis Model of Word Learning in a Second Language</title>
      <link>https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[Ontogenesis model&#xA;&#xA;A recent paper caught my eye, Ontogenesis Model of the L2 Lexical Representation, and despite the immediate mind-glazing effect of the word “ontogenesis,” I found the model well worth digging into and sharing here—and it may bear relevance to conversations on orthographic mapping.&#xA;&#xA;Bordag, D., Gor, K., &amp; Opitz, A. (2021). Ontogenesis Model of the L2 Lexical Representation. Bilingualism: Language and Cognition, 1–17. https://doi.org/10.1017/S1366728921000250&#xA;&#xA;How we learn words and all their phonological, morphological, orthographic, and semantic characteristics is a fascinating topic of research—most especially in the areas of written word recognition and in the learning of a new language.&#xA;&#xA;This paper thus struck me as an especially insightful attempt to synthesize much of that research. To be clear: this is a model that has not been directly tested, but it seems well-aligned with other theories like orthographic mapping and the lexical quality hypothesis, as well as explaining some of the tension between regularity and irregularity in word forms and frequency.&#xA;&#xA;  “In intentional word learning from definitions, L2 words with easily encoded orthographic form are better retained. In incidental word learning, words with unusual form are more salient and more easily detected.”&#xA;&#xA;I enjoyed especially the visualizations of phonological, orthographic, and semantic mapping and how they can develop at different rates and trajectories but with interdependence.&#xA;&#xA;A couple of terms that are key to the ontogenesis model (the authors should perhaps come up with a catchier name):&#xA;&#xA;Fuzziness: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. 
These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”&#xA;Optimum: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”&#xA;&#xA;These concepts give us a way of visualizing, as per the graphs above, how different dimensions of a word may develop over time. Our goal, of course, is to reach optimum encoding across the sounds, spelling, and meaning so that it is anchored in our long-term memory (i.e. fluent, automatic access and retrieval).&#xA;&#xA;  “Each lexical entry can comprise representations from the three domains, and each representation is interconnected with other representations of the same type. Each domain representation can thus develop its own, idiosyncratic network of connections to other representations. Together they constitute the phonological, orthographic, and semantic networks in the mental lexicon.&#xA;&#xA;  “The model sees a word’s lexical integration as a gradual process, in which connections to other representations grow in number and strength until the optimum is potentially reached. The optimum in this dimension can be described as an adequately rich network of appropriate connections. 
Fuzziness in this dimension then refers primarily to an inadequate number of connections to other representations (typically too few) and/or to their inadequate strength (typically too weak), as well as inappropriate connections (e.g., an erroneous connection between the phonological forms of through and dough due to the influence of orthography).”&#xA;&#xA;The added complexity of learning words in a new language is that there are variable interactions across phonological, orthographic, and semantic dimensions with our native language.&#xA;&#xA;  “Depending on the grapheme-phoneme relationship between the L1 and L2 and within L2, simultaneous acquisition of orthographic information may thus move the phonological representation closer to or further away from its optimum (and vice versa). Furthermore, the effect of L1 orthography on spoken word recognition in L2 is modulated by L2 proficiency and word familiarity&#xA;&#xA;  …a new L2 form representation is connected not only to other, previously established, L2 form representations, but also to L1 forms. The OM thus differentiates between two subnetworks within the form network: an IntraNetwork and an InterNetwork. The IntraNetwork refers to the connections between a given L2 form and other L2 forms, as discussed above. The InterNetwork refers to cross-language connections, i.e., the connections between a given L2 form and L1 forms.”&#xA;&#xA;An interesting and insightful model! I look forward to seeing further studies drawing upon it.&#xA;&#xA;#language #literacy #models #learning #phonology #secondlanguageacquisition #multilingual]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/rtPMZO0g.png" alt="Ontogenesis model"/></p>

<p>A recent paper caught my eye, <a href="https://doi.org/10.1017/S1366728921000250">Ontogenesis Model of the L2 Lexical Representation</a>, and despite the immediate mind-glazing effect of the word “ontogenesis,” I found the model well worth digging into and sharing here—and it may bear relevance to conversations on orthographic mapping.</p>
<ul><li>Bordag, D., Gor, K., &amp; Opitz, A. (2021). Ontogenesis Model of the L2 Lexical Representation. Bilingualism: Language and Cognition, 1–17. <a href="https://doi.org/10.1017/S1366728921000250">https://doi.org/10.1017/S1366728921000250</a></li></ul>

<p>How we learn words and all their phonological, morphological, orthographic, and semantic characteristics is a fascinating topic of research—most especially in the areas of written word recognition and in the learning of a new language.</p>



<p>This paper thus struck me as an especially insightful attempt to synthesize much of that research. To be clear: this is a model that has not been directly tested, but it seems well-aligned with other theories like orthographic mapping and the lexical quality hypothesis, as well as explaining some of the tension <a href="https://languageandliteracy.blog/irregularity-enhances-learning-maybe">between regularity and irregularity</a> in word forms and frequency.</p>

<blockquote><p>“In intentional word learning from definitions, L2 words with easily encoded orthographic form are better retained. In incidental word learning, words with unusual form are more salient and more easily detected.”</p></blockquote>

<p>I enjoyed especially <a href="https://www.cambridge.org/core/journals/bilingualism-language-and-cognition/article/ontogenesis-model-of-the-l2-lexical-representation/9F2B9EC0D77B23A6EF59C3FBCFCBE02C">the visualizations</a> of phonological, orthographic, and semantic mapping and how they can develop at different rates and trajectories but with interdependence.</p>

<p>A couple of terms that are key to the ontogenesis model (the authors should perhaps come up with a catchier name):</p>
<ul><li><strong>Fuzziness</strong>: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”</li>
<li><strong>Optimum</strong>: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”</li></ul>

<p>These concepts give us a way of visualizing, as per the graphs above, how different dimensions of a word may develop over time. Our goal, of course, is to reach optimum encoding across the sounds, spelling, and meaning so that it is anchored in our long-term memory (i.e. fluent, automatic access and retrieval).</p>

<blockquote><p>“Each lexical entry can comprise representations from the three domains, and each representation is interconnected with other representations of the same type. Each domain representation can thus develop its own, idiosyncratic network of connections to other representations. Together they constitute the phonological, orthographic, and semantic networks in the mental lexicon.</p>

<p>“The model sees a word’s lexical integration as a gradual process, in which connections to other representations grow in number and strength until the optimum is potentially reached. The optimum in this dimension can be described as an adequately rich network of appropriate connections. Fuzziness in this dimension then refers primarily to an inadequate number of connections to other representations (typically too few) and/or to their inadequate strength (typically too weak), as well as inappropriate connections (e.g., an erroneous connection between the phonological forms of through
and dough due to the influence of orthography).”</p></blockquote>

<p>The added complexity of learning words in a new language is that there are variable interactions across phonological, orthographic, and semantic dimensions with our native language.</p>

<blockquote><p>“Depending on the grapheme-phoneme relationship between the L1 and L2 and within L2, simultaneous acquisition of orthographic information may thus move the phonological representation closer to or further away from its optimum (and vice versa). Furthermore, the effect of L1 orthography on spoken word recognition in L2 is modulated by L2 proficiency and word familiarity</p>

<p>…a new L2 form representation is connected not only to other, previously established, L2 form representations, but also to L1 forms. The OM thus differentiates between two subnetworks within the form network: an IntraNetwork and an InterNetwork. The IntraNetwork refers to the connections between a given L2 form and other L2 forms, as discussed above. The InterNetwork refers to cross-language connections, i.e., the connections between a given L2 form and L1 forms.”</p></blockquote>

<p>An interesting and insightful model! I look forward to seeing further studies drawing upon it.</p>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:models" class="hashtag"><span>#</span><span class="p-category">models</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:phonology" class="hashtag"><span>#</span><span class="p-category">phonology</span></a> <a href="https://languageandliteracy.blog/tag:secondlanguageacquisition" class="hashtag"><span>#</span><span class="p-category">secondlanguageacquisition</span></a> <a href="https://languageandliteracy.blog/tag:multilingual" class="hashtag"><span>#</span><span class="p-category">multilingual</span></a></p>

<p><a href="https://remark.as/p/languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language">Discuss...</a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language</guid>
      <pubDate>Mon, 14 Mar 2022 01:18:58 +0000</pubDate>
    </item>
  </channel>
</rss>