<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>models &#8212; Language &amp; Literacy</title>
    <link>https://languageandliteracy.blog/tag:models</link>
    <description>Musings about language and literacy and learning</description>
    <pubDate>Thu, 30 Apr 2026 15:21:34 +0000</pubDate>
    <image>
      <url>https://i.snap.as/LIFR67Bi.png</url>
      <title>models &#8212; Language &amp; Literacy</title>
      <link>https://languageandliteracy.blog/tag:models</link>
    </image>
    <item>
      <title>Reviewing Claims I’ve Made on LLMs</title>
      <link>https://languageandliteracy.blog/reviewing-claims-ive-made-on-llms?pk_campaign=rss-feed</link>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/2AqWHRP3.jpeg" alt="Novice bunny and expert bunny on bikes"/>
When I begin a series of blog posts to conduct nerdy inquiry into an abstract topic, I typically don&#39;t know where I&#39;m going to end up. This series on LLMs was unusual in that in <a href="https://languageandliteracy.blog/language-and-llms">our first post</a>, I outlined pretty much the exact topics I would go on to cover.</p>

<p>Here&#39;s where I had spitballed we might go:</p>
<ul><li>The surprisingly inseparable interconnection between form and meaning</li>
<li>Blundering our way to computational precision through human communication; Or, the generative tension between regularity and randomness</li>
<li>The human (and now, machine) capacity for learning and using language may simply be a matter of scale</li>
<li>Is language as separable from thought (and, for that matter, from the world) as Cormac McCarthy said?</li>
<li>Implicit vs. explicit learning of language and literacy</li></ul>

<p>Indeed, we then went on to explore each of these areas, in that order. Cool!
</p>

<h2 id="some-hypotheses-from-this-series" id="some-hypotheses-from-this-series">Some Hypotheses from This Series</h2>

<p>What theories have we raised through this exploration?</p>

<p>1) LLMs gain their uncanny powers from <a href="https://languageandliteracy.blog/language-and-llms">the statistical nature of language itself</a>;
2) the meaning and experiences of our world are <a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">more deeply entwined with the form and structure</a> of our language than we previously imagined;
3) LLMs may offer us an opportunity to further the <a href="https://languageandliteracy.blog/the-pathway-of-human-language-towards-computational-precision-in-llms">convergence between human and machine language</a>;
4) AI can potentially <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">extend our cognitive abilities</a>, enabling us to process and understand far more information;
5) both human and machine learning progress <a href="https://languageandliteracy.blog/the-interplay-of-language-cognition-and-llms-where-fuzziness-meets-precision">from fuzzy, imprecise representations to higher precision</a>, and the greater the precision, the greater the effort and practice (or “compute”) that is required; and
6) LLMs challenge Chomskyan notions of innateness and suggest that <a href="https://write.as/manderson/llms-statistical-learning-and-explicit-teaching">implicit, statistical learning</a> alone can lead to gaining the grammatical structure and meaning of a language.</p>

<p>While I’ve been mostly positive and excited about the potential of AI (aside from pointing out how it is <a href="https://languageandliteracy.blog/scaling-our-capacity-for-processing-information">accelerating the looming ecological catastrophe</a> that seems to be our trajectory), I should probably pause here to acknowledge that there may be important counterpoints to many of these (perhaps somewhat starry-eyed) hypotheses.</p>

<h2 id="onto-the-counterclaims" id="onto-the-counterclaims">Onto the Counterclaims</h2>

<p>Let&#39;s take a more critical look at some of my claims:</p>

<h3 id="1-i-claim-that-language-is-fundamental-to-the-generative-powers-of-llms" id="1-i-claim-that-language-is-fundamental-to-the-generative-powers-of-llms">1) I claim that language is fundamental to the generative powers of LLMs.</h3>

<p>Yet Andrej Karpathy, who is no stranger to LLM development, <a href="https://x.com/karpathy/status/1835024197506187617">tweeted</a>:</p>

<blockquote><p>It&#39;s a bit sad and confusing that LLMs (“Large Language Models”) have little to do with language; It&#39;s just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.</p>

<p>They don&#39;t care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can “throw an LLM at it.”</p></blockquote>

<p>I agree that LLMs are performing “statistical modeling of token streams,” and that “for any arbitrary vocabulary of some set of discrete tokens, you can ‘throw an LLM at it.’”</p>
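
<p>To make the “statistical modeling of token streams” point concrete, here is a minimal, purely illustrative sketch (my own toy example in Python, not anything from Karpathy or from an actual LLM): a count-based bigram model that learns from a made-up stream of discrete tokens and then generates autoregressively. Real LLMs replace the counting with a transformer and billions of parameters, but the contract is the same: discrete tokens in, a probability distribution over the next token out.</p>

<pre><code>import random
from collections import Counter, defaultdict

# A hypothetical "token stream": any sequence of discrete symbols will do.
# These could just as well be text chunks, image patches, or action choices.
stream = ["sun", "rise", "sun", "set", "moon", "rise", "moon", "set", "sun", "rise"]

# Count how often each token follows each other token: the simplest possible
# statistical model of a token stream (a bigram model).
counts = defaultdict(Counter)
for prev, nxt in zip(stream, stream[1:]):
    counts[prev][nxt] += 1

def predict_next(prev):
    """Sample the next token in proportion to how often it followed prev."""
    options = counts[prev]
    tokens = list(options.keys())
    weights = list(options.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Autoregressive generation: feed each prediction back in as the new context.
token = "sun"
generated = [token]
for _ in range(5):
    token = predict_next(token)
    generated.append(token)

print(generated)  # e.g. ['sun', 'rise', 'sun', 'set', 'moon', 'rise']
</code></pre>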

<p>We now have multimodal LLMs that model token streams of audio, images, and text, and we will no doubt have ones feeding on additional streams of sensory data as they are increasingly paired with cameras on humans, objects, and robots.</p>

<p>Yet I also think Karpathy undersells something here: when LLMs suddenly exploded into general public awareness and fascination, it was not merely a “historical” accident that they were trained upon vast amounts of human-generated text and were able to reproduce and generate human language. As we’ve explored in this and <a href="https://languageandliteracy.blog/innate-vs">a previous series</a>, there is something about human language itself that is uniquely adapted to our brain circuitry and to the propagation of our culture within social interaction in our world. And being able to communicate with a powerful computational model through the medium of conversational human language has been a revolutionary advent. We are just in the beginning stages of grokking it.</p>

<p>As I <a href="https://x.com/mandercorn/status/1835279679650971667">tweeted in response</a> to Karpathy, token streams may be applied to anything, but human language seems uniquely suited to the advancement of combined human and machine learning – not only because we rely on it for communication, but also because of the <a href="https://languageandliteracy.blog/the-algebra-of-language-unveiling-the-statistical-tapestry-of-form-and-meaning">algebraic and statistical nature of our language</a>.</p>

<p>Recent case in point: the viral attention currently on NotebookLM’s Audio Overview. Listening to a conversation, however artificial, resonates with us, because conversation is baked into our social nature. And, surprisingly, it does a fairly good job of surfacing information from across multiple multimodal sources (and soon, across languages) that we find interesting, relevant, and meaningful.</p>

<p>Speaking of NotebookLM Audio Overview… here’s one derived from all the blog posts (except this one) from this series, as well as the sources–<a href="https://languageandliteracy.blog/language-and-llms">outlined in post 1</a>–that inspired them all: <a href="https://notebooklm.google.com/notebook/a4f35399-e288-4293-b2d2-0489e6b1f037/audio">https://notebooklm.google.com/notebook/a4f35399-e288-4293-b2d2-0489e6b1f037/audio</a></p>

<h3 id="4-i-claim-there-is-great-potential-for-ai-to-extend-our-cognitive-capabilities" id="4-i-claim-there-is-great-potential-for-ai-to-extend-our-cognitive-capabilities">4) I claim there is great potential for AI to extend our cognitive capabilities</h3>

<p>Yet there is a strong case that an equally great danger exists: use of LLMs can also reduce our cognitive capabilities.</p>

<p>Learning more formal content and skills, like what we learn in school or in a job, requires deliberate effort until we develop an unconscious fluency. If students learning new concepts and skills outsource their practice of that new learning (such as writing or math) to an LLM, then they will not–ironically–be able to develop the automatized internal knowledge and capacity they need to wield powerful tools like AI effectively.</p>

<p>When “experts” use tools like AI, they know where the gaps are in the output and are able to use it strategically to enhance their own work. A few examples of this:</p>
<ul><li><p>Simon Willison, a programmer who is <a href="https://simonwillison.net/">also a great communicator</a>, uses different LLMs to support his projects, and writes and speaks about how he does so. <a href="https://newsletter.pragmaticengineer.com/p/ai-tools-for-software-engineers-simon-willison">Here’s a podcast</a>, for example, where he explains how he uses them.</p></li>

<li><p>Nicholas Carlini, a research scientist at Google DeepMind, similarly <a href="https://nicholas.carlini.com/writing/2024/how-i-use-ai.html">wrote about how he uses AI</a> to support his work.</p></li>

<li><p>Cal Newport, who writes extensively about how to do “deep work” in a world of distractions, recently <a href="https://www.newyorker.com/culture/annals-of-inquiry/what-kind-of-writer-is-chatgpt">wrote in The New Yorker</a> how he has found ChatGPT useful to his writing.</p></li></ul>

<p>All the people above are highly skilled at what they do – so when they explore and then figure out how to use AI to support their work, they do so in a way that does not diminish their own hard-earned ability, but rather enhances and extends their capabilities.</p>

<p>On the other hand, for students–who are by definition <strong>novices</strong> in the skills and knowledge they are learning–an over-reliance on AI tools may limit their ability to develop skills such as literacy, critical thinking, problem-solving, and creativity.</p>

<p>Recent reports on AI in education, such as from <a href="https://cognitiveresonance.net/resources.html">Cognitive Resonance</a>, <a href="https://www.americanprogress.org/article/using-learning-science-to-analyze-the-risks-and-benefits-of-ai-in-k-12-education/">Center for American Progress</a>, and <a href="https://bellwether.org/publications/learning-systems/">Bellwether</a>, have rightfully raised this concern.</p>

<p>And all educators, whether in K-12 or higher ed, are seeing increasing use of AI by students to complete homework assignments, so this danger of truncating the development of internal capacity is real.</p>

<p>I think the steps we can take to address this are twofold:</p>
<ul><li><p>limit the use of digital technology for learners at the earliest stages of learning, whether they are in preK-3 or simply being introduced to a new concept</p></li>

<li><p>move practice of essential skills directly into the classroom as much as possible, while considering how AI could be used to extend, rather than diminish, any practice and feedback outside of the classroom</p></li></ul>

<p>In <a href="https://jacobian.org/2024/oct/1/ethical-public-sector-ai/">a post on ethical use of AI</a>, Jacob Kaplan-Moss argues that fully automated AI is unethical in the public sector due to its inherent biases and potential for unfairness in high-stakes situations. In contrast, the assistive use of AI can enhance human decision-making.</p>

<p>This assistive vs. automated use of AI may be a useful frame for thinking of how AI can be used most ethically and effectively in education. We want AI to be used to assist the learning process, rather than simply automating the solving of math problems or writing essays. This view aligns with <a href="https://english.elpais.com/technology/2024-10-03/ethan-mollick-analyst-students-who-use-ai-as-a-crutch-dont-learn-anything.html">Ethan Mollick’s idea of “co-intelligence”</a> as well.</p>

<p>So far, I find the most powerful and interesting assistive applications for AI are more focused on educators (“the experts”), rather than on students (“the novices”). Teachers can leverage AI to support administrative tasks, analyze student data, and consider additional enhancements to their instruction based on that data.</p>

<p>That said, I don’t think the assistive use cases of AI are limited to “experts” in a domain. AI can also help to equip those without knowledge and expertise in a specific area with the language they need to navigate learning or real-world communications more effectively. And there are some really interesting use cases of AI for feedback on student thinking and writing, when structured with specific guidelines and criteria and with the teacher in the loop.</p>

<p>But in the context of classroom learning, such uses must be very strategically designed and cautiously incorporated. For example, <a href="https://x.com/SebastienBubeck/status/1829701643925151757">see this explanation from Professor Michael Brenner</a> on how he has begun incorporating AI into his pedagogy. Note, though, that this example is from a graduate-level math class, so again, that novice vs. expert dynamic is quite different from what we would need to consider at a preK-8 level. And even at that graduate level, you can see there is quite a bit of complexity the instructor needed to consider and think through to design his course to leverage LLMs so strategically.</p>

<p>There’s a lot more to unpack here on all sides of the equation. I’ll leave this one here for now, accepting non-closure, and I hope to dig further into these tensions and opportunities in both this space and in my professional work.</p>

<h3 id="6-i-claim-that-llms-have-shown-that-language-can-be-learned-without-any-innate-programming-or-structure-therefore-demonstrating-the-power-of-statistical-implicit-learning" id="6-i-claim-that-llms-have-shown-that-language-can-be-learned-without-any-innate-programming-or-structure-therefore-demonstrating-the-power-of-statistical-implicit-learning">6) I claim that LLMs have shown that language can be learned without any innate programming or structure – therefore demonstrating the power of statistical, implicit learning</h3>

<p>I’d been in the “Chomsky is wrong” camp for a while now, but I happened to listen to a recent interview of Jean-Rémi King, a scientist at Meta AI, by Stephen Wilson on <a href="https://langneurosci.org/podcast/#ep27">The Language Neuroscience podcast</a> (did I tell you I’m a nerd?). Towards the end of the conversation, King warns against writing off Chomsky too readily, suggesting there is something intrinsic to the human brain in its readiness for language.</p>

<p>I uploaded the relevant portion of the transcript from the interview and asked Claude AI for a concise summary of King&#39;s main claims, which it willingly provided (while I’m sure it drew upon an unconscionable amount of energy):</p>

<blockquote><p>King argues that human brains likely don&#39;t use the same “next word prediction” principle as large language models for language acquisition, primarily because humans are exposed to far less linguistic data than these models.</p>

<p>He contends that while language models have shown impressive capabilities, they are extremely inefficient compared to human language learning, suggesting that we&#39;re missing some fundamental principles of how humans acquire language so efficiently.</p></blockquote>

<p>While I’ve tried to temper most of my pronouncements throughout this series, I think it’s important to acknowledge that the fact that LLMs can learn language from statistical associations of word tokens alone does not mean that is exactly how we humans must also learn language.</p>

<p>It is rather a proof of concept that language <em>can</em> be learned in this way (without any innate grammar or teaching of rules). But as King points out, this is via a scale of input that is orders of magnitude larger than what any child receives.</p>
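
<p>To put rough numbers on that gap (my own ballpark arithmetic using commonly cited order-of-magnitude estimates, not figures King stated): a child might hear on the order of ten million words a year, while recent large models are trained on trillions of tokens.</p>

<pre><code># Back-of-envelope comparison (ballpark public estimates, not King's figures).
child_words_per_year = 10_000_000                      # rough high-end estimate
child_input_by_age_10 = child_words_per_year * 10      # about 1e8 words

llm_training_tokens = 10_000_000_000_000               # about 1e13 tokens for recent models

ratio = llm_training_tokens / child_input_by_age_10
print(f"The model sees roughly {ratio:,.0f} times more input")   # ~100,000
</code></pre>

<p>Even with generous assumptions for the child, the model sees something like five orders of magnitude more linguistic input.</p>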

<p>That said, there are other Artificial Neural Networks (ANNs), such as in <a href="https://medium.com/@begus.gasper/artificial-and-biological-intelligence-humans-animals-and-machines-142bc3c4b304">the research of Gašper Beguš</a>, that learn from raw speech in an unsupervised manner, more closely mimicking human language acquisition. His lab has found interesting similarities between these ANNs and the human brain in processing language sounds – a parallel to King’s own research, which has found that LLMs can generate brain-like representations when predicting words from context.</p>

<p>And there will continue to be research into tinier models trained on sparser, and potentially richer, data.</p>

<p>But as King points out, there’s just so much more we need to learn. And this is exactly where I find all of this the most exciting.</p>

<p>Where I may be most rightfully critiqued <a href="https://write.as/manderson/llms-statistical-learning-and-explicit-teaching">in my last post</a>, and perhaps in other posts, is in extrapolating from the theoretical demonstration of LLMs to implications for classrooms.</p>

<p>So let me state my position a bit more clearly in case there was any confusion that I am falling onto the side of <a href="https://write.as/manderson/learning-to-read-an-unnatural-act">the Goodmans</a> or something. Children need consistency, stability, clarity, and coherence in their learning experiences, and teaching what is most important to know for a given subject directly and explicitly is critical. For children at the earliest stages of learning abstract skills and content, such as learning to read, explicit and well-structured teaching is essential. At the same time, however, we need to ensure that students have abundant structured opportunities to apply and practice what they are learning – and this is where ensuring they are spending more time reading, writing, and talking–connected to the content of what we are teaching–matters most.</p>

<p>If you have more critiques that I am missing in any of the above, please do share!</p>

<p>Egads, I think I may actually have ANOTHER post left in me after all of this. Who knew LLMs would be such an interesting topic?!</p>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:AI" class="hashtag"><span>#</span><span class="p-category">AI</span></a> <a href="https://languageandliteracy.blog/tag:LLMs" class="hashtag"><span>#</span><span class="p-category">LLMs</span></a> <a href="https://languageandliteracy.blog/tag:cognition" class="hashtag"><span>#</span><span class="p-category">cognition</span></a> <a href="https://languageandliteracy.blog/tag:research" class="hashtag"><span>#</span><span class="p-category">research</span></a> <a href="https://languageandliteracy.blog/tag:computation" class="hashtag"><span>#</span><span class="p-category">computation</span></a> <a href="https://languageandliteracy.blog/tag:models" class="hashtag"><span>#</span><span class="p-category">models</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/reviewing-claims-ive-made-on-llms</guid>
      <pubDate>Mon, 07 Oct 2024 00:10:15 +0000</pubDate>
    </item>
    <item>
      <title>Language—like reading—may not be innate</title>
      <link>https://languageandliteracy.blog/language-like-reading-may-not-be-innate?pk_campaign=rss-feed</link>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/b1U0s1kr.jpeg" alt="Colors of the mind"/>
Language is a uniquely human phenomenon that develops in children with remarkable ease and fluency. Yet questions remain about how we acquire language. Is it innately wired in our brain, or do we learn all facets rapidly from birth?</p>

<p>Two books – <em>Rethinking Innateness</em> and <em>The Language Game</em> – provide us with some fascinating perspectives on language learning that bear implications for how we think about learning to read and write, and furthermore, for how we talk about the power and limitations of AI.
</p>

<h1 id="a-review-of-where-we-ve-been" id="a-review-of-where-we-ve-been">A Review of Where We’ve Been</h1>

<p><a href="https://languageandliteracy.blog/natural-vs">In a previous series</a>, we pursued an interesting debate about whether learning to read is more unnatural than learning oral or signed languages. We also investigated the notion, frequently stated by <a href="https://languageandliteracy.blog/the-science-of-reading">“science of reading”</a> proponents, that <a href="https://write.as/manderson/our-brains-were-not-born-to-read-right">“our brains were not born to read,”</a> while our brains are “hard-wired” for language.</p>

<p>While I agree with researchers Gough, Hillinger, Liberman and others that written language is more complex and abstract than oral language and—hence—more difficult to acquire, I’m not convinced that calling it <em>unnatural</em> is most accurate. Instead, <a href="https://write.as/manderson/a-finale-learning-to-read-and-write-is-a-remarkable-human-feat">I suggest terming it <em>effortful</em></a>.</p>

<p>In <a href="https://write.as/manderson/the-relation-of-speech-to-reading-and-writing">one of the earlier papers</a> we examined, Liberman argued that oral language is pre-cognitive, meaning that it requires no cognition to learn and thus is more natural to acquire. He used this claim to counter the <a href="https://write.as/manderson/learning-to-read-an-unnatural-act">Goodmans’ assertion</a> that oral and written language were largely synonymous, and that kids therefore could learn to read merely through exposure to literacy, rather than explicit instruction in the alphabetic principle (“whole language”). While I most definitely don’t agree with the Goodmans, I paused on Liberman’s claim with some skepticism, as there are a subset of kids who also struggle to develop speech and language skills, just as there are a subset of kids who struggle to develop reading and writing skills.</p>

<p>Liberman also made another strong claim that I paused on: that the evolution of oral language is biological, while written language is cultural (<em>which parallels arguments that language is “biologically primary” while reading and writing are “biologically secondary,” which I have also questioned, given that making the distinction is harder than it seems when social and cultural advancements are deeply interwoven with human existence over generations of time</em>). But I mostly accepted this premise, as it seems to be self-evident that language is baked into our brains. After all, babies begin to attune to languages spoken around them <a href="https://www.wired.com/2013/01/utero-babies-languag/"><em>even while still in the womb</em></a>.</p>

<p>Liberman is not alone in making these assertions, I should hasten to add; I just bring up one of his papers because we spent time with it here. Noam Chomsky, for example, has long argued for a <a href="https://en.wikipedia.org/wiki/Universal_grammar">universal grammar</a>, which is taught in foundational courses on linguistics, and the related study of generative grammars is alive and well.</p>

<p>Why is this important? It’s important because whether we consider language “natural” or written language “unnatural” bears implications for how we decide to teach them (or not). If we think of language as completely innate, then perhaps we don’t think it requires much of any teaching that is explicit, systematic, or diagnostic. Or conversely, if we think of written language as wholly unnatural, we may not consider how to strategically design opportunities for implicit learning, volume, and exposure.</p>

<p>Yet I have just read two books, written in two different decades, that provide some really interesting critiques against the widely adopted supposition that language is innate.</p>

<h1 id="language-models" id="language-models">Language Models</h1>

<p>The first book, <em>Rethinking Innateness: A Connectionist Perspective on Development</em>, by Elizabeth Bates, Jeffrey Elman, Mark H. Johnson, Annette Karmiloff-Smith, Domenico Parisi and Kim Plunkett, was published in 1996, and approaches language through the lens of neuroscience, explaining connectionist models and their implications for neural development and learning. These models are not only part of the lineage of the current renaissance of Large Language Models, such as ChatGPT, but also part of a lineage of models that have informed our theoretical understanding of how children learn to read, and may continue to inform explorations of “statistical learning.”</p>

<p>I was led to this book from <a href="https://x.com/drmarcj/status/1662841595408838659?s=20">a recommendation</a> by Marc Joanisse, a researcher at Western University, when he commented on <a href="https://x.com/mandercorn/status/1662805794818076677?s=20">my tweet</a> (are we still calling them that?) about research on artificial neural networks that suggests they can accurately model language learning in human brains.</p>

<p>It was a great recommendation, and I found the book extremely relevant to ongoing conversations about AI and LLMs today, in addition to providing key insights from connectionist models into language and literacy development that challenge assumptions around innateness, such as:</p>
<ul><li>Simulations show that simple learning algorithms and architectures can enable rapid learning and sophisticated representations, such as the competencies seen in young infants, without any innate knowledge.</li>
<li><a href="https://unt.univ-cotedazur.fr/uoh/learn_teach_FL/affiche_theorie.php?id_activite=53">U-shaped learning</a> and discontinuous change also occur in neural networks without innate knowledge, due to architecture, input, and time spent on learning. This parallels studies of the development of linguistic abilities in children, such as the learning of past-tense and pronouns.</li>
<li>The way in which neural networks learn new things can be simple, yet the learning yields surprisingly complex results. This complexity emerges as the product of many simple interactions over time (<em>this point, written in 1996, seems incredibly prescient to me as a reader in 2023 using Claude2 to distill and summarize my notes from each book for this post</em>).</li>
<li>Connectionist models show global effects can emerge from local interactions rather than centralized control. Connectionist models also show how structured behaviors can emerge in neural networks through exposure to and interactions with the environment, without explicit rules or representations programmed in (which makes me think of <em>statistical learning</em>).</li></ul>
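
<p>That last point, about structured behavior emerging from exposure without explicit rules, is the one that most makes me think of statistical learning. As a purely illustrative toy (my own sketch, not a simulation from the book): in classic statistical-learning experiments, infants segment “words” out of a continuous syllable stream using nothing but the transitional probabilities between syllables, and a few lines of code can pick up the same structure.</p>

<pre><code>import random
from collections import Counter, defaultdict

# Nonsense "words" in the spirit of classic statistical-learning experiments
# (my own toy stimuli), concatenated into one continuous syllable stream:
# no pauses, no rules, just exposure.
words = ["bidaku", "padoti", "golabu"]
syllables = []
for _ in range(200):                                  # repeated exposure
    for w in random.sample(words, k=len(words)):      # shuffled each pass
        syllables += [w[i:i + 2] for i in range(0, len(w), 2)]

# Count how often each syllable follows each other syllable.
pair_counts = defaultdict(Counter)
for a, b in zip(syllables, syllables[1:]):
    pair_counts[a][b] += 1

def transition_prob(a, b):
    """Probability that syllable b follows syllable a in the stream."""
    total = sum(pair_counts[a].values())
    return pair_counts[a][b] / total

# Structure emerges from the statistics alone: transitions inside a word are
# fully predictable, while transitions across word boundaries are not.
print(round(transition_prob("bi", "da"), 2))   # 1.0 (word-internal)
print(round(transition_prob("ku", "pa"), 2))   # well below 1.0 (word boundary)
</code></pre>

<p>Nothing about word boundaries is ever taught here; the dip in predictability after “ku” is the boundary, discovered purely from the distribution of the input.</p>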

<h1 id="language-games" id="language-games">Language Games</h1>

<p>The second book, <a href="https://mitpressbookstore.mit.edu/book/9781541674981"><em>The Language Game: How Improvisation Created Language and Changed the World</em></a>, by Morten H. Christiansen and Nick Chater, was published last year, in 2022, and focuses more on the cultural evolution and social transmission of language, arguing that language is akin to a game of charades that is honed and passed on from generation to generation. I happened to check it out from the library and read it concurrently with <em>Rethinking Innateness</em>, and there was some great synergy between the two, especially around challenging the notion that language is innate. Some of the key points of the book:</p>
<ul><li>Language relies on and recruits existing cognitive mechanisms, becoming increasingly specialized through extensive practice and use.</li>
<li>Language evolves culturally to fit the human brain, not the reverse.</li>
<li>Language is shaped for learnability and for coordinating with other learners, not for abstract principles and rules. Children follow paths set by previous generations.</li>
<li>This cultural transmission across generations shapes language to be more learnable through reuse of memorable chunks (“constructions”).</li>
<li>Due to working memory limitations, more memorable chunks survive, causing a design without a designer. These chunks become increasingly standardized over time.</li>
<li>Language input must be processed immediately before it is lost (what the authors call the “Now-or-Never” bottleneck).</li>
<li>Chunking sounds into words and phrases buys more time to process meaning.</li>
<li>Gaining fluency with increasingly larger and more complex constructions of language requires extensive practice.</li></ul>

<h1 id="across-connectionism-and-charades" id="across-connectionism-and-charades">Across Connectionism and Charades</h1>

<p>Together, these books provide a picture of language as an emergent, complex cultural and statistical phenomenon that has evolved from simple learning mechanisms across generations. Rather than an innate universal grammar baked into children’s brains, language itself has adapted and molded over time to become essential to our human inheritance, as with clothing, pottery, or fire. Language emerges through social human communication and interaction. It becomes increasingly complex, yet also streamlined and standardized, without any explicit rules governing it beyond the constraints of our brains, tongues, and cognition.</p>

<p>This isn’t to say there isn’t something unique about the human brain architecture in comparison to our closest animal brethren—there clearly is—but rather that language has adapted symbiotically to that architecture, like a parasite to its host, rather than there being specific parts of our brain genetically pre-determined for language.</p>

<p>Like reading, using language drives increasing specialization of our brain—and this specialization, in turn, drives greater cognitive ability and communicative reach.</p>

<p>There’s a lot here to unpack and synthesize, but I wanted to begin bringing these together, because just as I feel myself pushing against the zeitgeist when I argue that calling learning to read “unnatural” isn’t quite right, so too are arguments that learning language is not “innate” swimming against the tide. These two counterclaims are interwoven, and I think worth further exploring.</p>

<p>Consider this post the first in an exploratory series. We’ll geek out on language development and its similarities and differences to literacy development, maybe dig into the relation of cognition and language and literacy a little, and riff on the implications for AI, ANNs, and LLMs.</p>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:natural" class="hashtag"><span>#</span><span class="p-category">natural</span></a> <a href="https://languageandliteracy.blog/tag:innateness" class="hashtag"><span>#</span><span class="p-category">innateness</span></a> <a href="https://languageandliteracy.blog/tag:unnatural" class="hashtag"><span>#</span><span class="p-category">unnatural</span></a> <a href="https://languageandliteracy.blog/tag:reading" class="hashtag"><span>#</span><span class="p-category">reading</span></a> <a href="https://languageandliteracy.blog/tag:neuralnetworks" class="hashtag"><span>#</span><span class="p-category">neuralnetworks</span></a> <a href="https://languageandliteracy.blog/tag:research" class="hashtag"><span>#</span><span class="p-category">research</span></a> <a href="https://languageandliteracy.blog/tag:brains" class="hashtag"><span>#</span><span class="p-category">brains</span></a> <a href="https://languageandliteracy.blog/tag:linguistics" class="hashtag"><span>#</span><span class="p-category">linguistics</span></a> <a href="https://languageandliteracy.blog/tag:models" class="hashtag"><span>#</span><span class="p-category">models</span></a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/language-like-reading-may-not-be-innate</guid>
      <pubDate>Sat, 12 Aug 2023 07:48:06 +0000</pubDate>
    </item>
    <item>
      <title>An Ontogenesis Model of Word Learning in a Second Language</title>
      <link>https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language?pk_campaign=rss-feed</link>
      <description>&lt;![CDATA[Ontogenesis model&#xA;&#xA;A recent paper caught my eye, Ontogenesis Model of the L2 Lexical Representation, and despite the immediate mind glazing effect of the word “ontogenesis,” I found the model well worth digging into and sharing here—and it may bear relevance to conversations on orthographic mapping.&#xA;&#xA;Bordag, D., Gor, K., &amp; Opitz, A. (2021). Ontogenesis Model of the L2 Lexical Representation. Bilingualism: Language and Cognition, 1–17. https://doi.org/10.1017/S1366728921000250&#xA;&#xA;How we learn words and all their phonological, morphological, orthographic, and semantic characteristics is a fascinating topic of research—most especially in the areas of written word recognition and in the learning of a new language.&#xA;&#xA;!--more--&#xA;&#xA;This paper thus struck me as an especially insightful attempt to synthesize much of that research. To be clear: this is a model that has not been directly tested, but it seems well-aligned to other theories like orthographic mapping and the lexical quality hypothesis, as well as explain some of the tension between regularity and irregularity in word forms and frequency.&#xA;&#xA;  “In intentional word learning from definitions, L2 words with easily encoded orthographic form are better retained. In incidental word learning, words with unusual form are more salient and more easily detected.”&#xA;&#xA;I enjoyed especially the visualizations of phonological, orthographic, and semantic mapping and how they can develop at different rates and trajectories but with interdependence.&#xA;&#xA;A couple of terms that are key to the ontogenesis model (the authors should perhaps come up with a catchier name):&#xA;&#xA;Fuzziness: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”&#xA;Optimum: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”&#xA;&#xA;These concepts give us a way of visualizing, as per the graphs above, how different dimensions of a word may develop over time. Our goal, of course, is to reach optimum encoding across the sounds, spelling, and meaning so that it is anchored in our long-term memory (i.e. fluent, automatic access and retrieval).&#xA;&#xA;  “Each lexical entry can comprise representations from the three domains, and each representation is interconnected with other representations of the same type. Each domain representation can thus develop its own, idiosyncratic network of connections to other representations. Together they constitute the phonological, orthographic, and semantic networks in the mental lexicon.&#xA;&#xA;  “The model sees a word’s lexical integration as a gradual process, in which connections to other representations grow in number and strength until the optimum is potentially reached. The optimum in this dimension can be described as an adequately rich network of appropriate connections. 
Fuzziness in this dimension then refers primarily to an inadequate number of connections to other representations (typically too few) and/or to their inadequate strength (typically too weak), as well as inappropriate connections (e.g., an erroneous connection between the phonological forms of through&#xA;and dough due to the influence of orthography).”&#xA;&#xA;The added complexity of learning words in a new language is that there are variable interactions across phonological, orthographic, and semantic dimensions with our native language.&#xA;&#xA;  “Depending on the grapheme-phoneme relationship between the L1 and L2 and within L2, simultaneous acquisition of orthographic information may thus move the phonological representation closer to or further away from its optimum (and vice versa). Furthermore, the effect of L1 orthography on spoken word recognition in L2 is modulated by L2 proficiency and word familiarity&#xA;&#xA;  …a new L2 form representation is connected not only to other, previously established, L2 form representations, but also to L1 forms. The OM thus differentiates between two subnetworks within the form network: an IntraNetwork and an InterNetwork. The IntraNetwork refers to the connections between a given L2 form and other L2 forms, as discussed above. The InterNetwork refers to cross-language connections, i.e., the connections between a given L2 form and L1 forms.”&#xA;&#xA;An interesting and insightful model! I look forward to seeing further studies drawing upon it.&#xA;&#xA;#language #literacy #models #learning #phonology #secondlanguageacquisition #multilingual&#xA;&#xA;a href=&#34;https://remark.as/p/languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language&#34;Discuss.../a]]&gt;</description>
      <content:encoded><![CDATA[<p><img src="https://i.snap.as/rtPMZO0g.png" alt="Ontogenesis model"/></p>

<p>A recent paper caught my eye, <a href="https://doi.org/10.1017/S1366728921000250">Ontogenesis Model of the L2 Lexical Representation</a>, and despite the immediate mind-glazing effect of the word “ontogenesis,” I found the model well worth digging into and sharing here—and it may bear relevance to conversations on orthographic mapping.</p>
<ul><li>Bordag, D., Gor, K., &amp; Opitz, A. (2021). Ontogenesis Model of the L2 Lexical Representation. Bilingualism: Language and Cognition, 1–17. <a href="https://doi.org/10.1017/S1366728921000250">https://doi.org/10.1017/S1366728921000250</a></li></ul>

<p>How we learn words and all their phonological, morphological, orthographic, and semantic characteristics is a fascinating topic of research—most especially in the areas of written word recognition and in the learning of a new language.</p>



<p>This paper thus struck me as an especially insightful attempt to synthesize much of that research. To be clear: this is a model that has not been directly tested, but it seems well aligned with other theories, like orthographic mapping and the lexical quality hypothesis, and it also seems to explain some of the tension <a href="https://languageandliteracy.blog/irregularity-enhances-learning-maybe">between regularity and irregularity</a> in word forms and frequency.</p>

<blockquote><p>“In intentional word learning from definitions, L2 words with easily encoded orthographic form are better retained. In incidental word learning, words with unusual form are more salient and more easily detected.”</p></blockquote>

<p>I especially enjoyed <a href="https://www.cambridge.org/core/journals/bilingualism-language-and-cognition/article/ontogenesis-model-of-the-l2-lexical-representation/9F2B9EC0D77B23A6EF59C3FBCFCBE02C">the visualizations</a> of phonological, orthographic, and semantic mapping and how they can develop at different rates and trajectories but with interdependence.</p>

<p>A couple of terms that are key to the ontogenesis model (the authors should perhaps come up with a catchier name):</p>
<ul><li><strong>Fuzziness</strong>: “inexact or ambiguous encoding of different components or dimensions of the lexical representation that can be caused by several linguistic, cognitive, and learning-induced factors. These factors include, among others, changes in neural plasticity, the complexity of mapping L2 semantic representations on the existing L1 semantic representations and of mapping L2 forms on the semantic representations, and problems with L2 phonological encoding”</li>
<li><strong>Optimum</strong>: “the ultimate attainment of a representation (or its individual components), i.e., the highest level of its acquisition, when the representation is properly encoded and no longer fuzzy”</li></ul>

<p>These concepts give us a way of visualizing, as per the graphs above, how different dimensions of a word may develop over time. Our goal, of course, is to reach optimum encoding across the sounds, spelling, and meaning of a word so that it is anchored in our long-term memory (i.e., fluent, automatic access and retrieval).</p>
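<p>The ontogenesis model is a verbal and visual model rather than a computational one, so just to make the “fuzziness shrinking toward an optimum” idea concrete, here’s a minimal toy sketch in Python. The decay rates and the exponential form are my own illustrative assumptions, not anything proposed in the paper; the only point is that each dimension can approach its optimum along its own trajectory.</p>

<pre><code># Toy sketch (mine, not the authors'): fuzziness shrinking toward an optimum
# at a different (made-up) rate for each dimension of a lexical representation.
DIMENSIONS = {"phonological": 0.30, "orthographic": 0.15, "semantic": 0.05}  # decay per exposure

def fuzziness(rate: float, exposures: int, initial: float = 1.0) -> float:
    """Remaining fuzziness after some number of exposures; 0.0 is the idealized optimum."""
    return initial * (1.0 - rate) ** exposures

for n in (0, 5, 20, 50):
    snapshot = {dim: round(fuzziness(rate, n), 2) for dim, rate in DIMENSIONS.items()}
    print(f"after {n:>2} exposures: {snapshot}")
</code></pre>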

<blockquote><p>“Each lexical entry can comprise representations from the three domains, and each representation is interconnected with other representations of the same type. Each domain representation can thus develop its own, idiosyncratic network of connections to other representations. Together they constitute the phonological, orthographic, and semantic networks in the mental lexicon.</p>

<p>“The model sees a word’s lexical integration as a gradual process, in which connections to other representations grow in number and strength until the optimum is potentially reached. The optimum in this dimension can be described as an adequately rich network of appropriate connections. Fuzziness in this dimension then refers primarily to an inadequate number of connections to other representations (typically too few) and/or to their inadequate strength (typically too weak), as well as inappropriate connections (e.g., an erroneous connection between the phonological forms of through and dough due to the influence of orthography).”</p></blockquote>

<p>The added complexity of learning words in a new language is that there are variable interactions across phonological, orthographic, and semantic dimensions with our native language.</p>

<blockquote><p>“Depending on the grapheme-phoneme relationship between the L1 and L2 and within L2, simultaneous acquisition of orthographic information may thus move the phonological representation closer to or further away from its optimum (and vice versa). Furthermore, the effect of L1 orthography on spoken word recognition in L2 is modulated by L2 proficiency and word familiarity</p>

<p>…a new L2 form representation is connected not only to other, previously established, L2 form representations, but also to L1 forms. The OM thus differentiates between two subnetworks within the form network: an IntraNetwork and an InterNetwork. The IntraNetwork refers to the connections between a given L2 form and other L2 forms, as discussed above. The InterNetwork refers to cross-language connections, i.e., the connections between a given L2 form and L1 forms.”</p></blockquote>
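<p>Again, the OM doesn’t specify an implementation, but a minimal sketch of my own may help make the IntraNetwork/InterNetwork distinction concrete. Everything here is assumed for illustration: the connection strengths are arbitrary, and the example imagines a German-speaking learner of English (with “Teig” as the L1 counterpart of “dough”); the weak “through”/“dough” link echoes the erroneous orthography-driven connection quoted above.</p>

<pre><code># Toy sketch (my own shorthand, not the paper's formalism) of a lexical entry
# whose form representation keeps two connection sets: an IntraNetwork
# (links to other L2 forms) and an InterNetwork (links to L1 forms).
from dataclasses import dataclass, field

@dataclass
class FormRepresentation:
    word: str
    language: str                                       # "L1" or "L2"
    intra_network: dict = field(default_factory=dict)   # same-language form -> strength
    inter_network: dict = field(default_factory=dict)   # cross-language form -> strength

    def connect(self, other: "FormRepresentation", strength: float) -> None:
        # Same-language connections go in the IntraNetwork, cross-language in the InterNetwork.
        target = self.intra_network if other.language == self.language else self.inter_network
        target[other.word] = strength

# Hypothetical L2 English entry for a German-speaking learner.
dough = FormRepresentation("dough", "L2")
dough.connect(FormRepresentation("through", "L2"), 0.2)  # weak, orthography-induced IntraNetwork link
dough.connect(FormRepresentation("Teig", "L1"), 0.8)     # InterNetwork link to the L1 translation
print(dough.intra_network, dough.inter_network)
</code></pre>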

<p>An interesting and insightful model! I look forward to seeing further studies drawing upon it.</p>

<p><a href="https://languageandliteracy.blog/tag:language" class="hashtag"><span>#</span><span class="p-category">language</span></a> <a href="https://languageandliteracy.blog/tag:literacy" class="hashtag"><span>#</span><span class="p-category">literacy</span></a> <a href="https://languageandliteracy.blog/tag:models" class="hashtag"><span>#</span><span class="p-category">models</span></a> <a href="https://languageandliteracy.blog/tag:learning" class="hashtag"><span>#</span><span class="p-category">learning</span></a> <a href="https://languageandliteracy.blog/tag:phonology" class="hashtag"><span>#</span><span class="p-category">phonology</span></a> <a href="https://languageandliteracy.blog/tag:secondlanguageacquisition" class="hashtag"><span>#</span><span class="p-category">secondlanguageacquisition</span></a> <a href="https://languageandliteracy.blog/tag:multilingual" class="hashtag"><span>#</span><span class="p-category">multilingual</span></a></p>

<p><a href="https://remark.as/p/languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language">Discuss...</a></p>
]]></content:encoded>
      <guid>https://languageandliteracy.blog/an-ontogenesis-model-of-word-learning-in-a-second-language</guid>
      <pubDate>Mon, 14 Mar 2022 01:18:58 +0000</pubDate>
    </item>
  </channel>
</rss>