“The success of large language models is the biggest surprise in my intellectual life. We learned that a lot of what we used to believe may be false and what I used to believe may be false. I used to really accept, to a large degree, the Chomskyan argument that the structures of language are too complex and not manifest in input so that you need to have innate machinery to learn them. You need to have a language module or language instinct, and it’s impossible to learn them simply by observing statistics in the environment.
If it’s true — and I think it is true — that the LLMs learn language through statistical analysis, this shows the Chomskyan view is wrong. This shows that, at least in theory, it’s possible to learn languages just by observing a billion tokens of language.”
In a previous series, “Innate vs. Developed,” we’ve also challenged the idea that language is entirely hardwired in our brains, highlighting the tension between our more recent linguistic innovations and our more ancient brain structures. Cormac McCarthy, the famed author of some of the most powerful literature ever written, did some fascinating pontificating on this very issue.
In this post, we’ll continue picking away at these tensions, considering implications for AI and LLMs.
“Over cultural evolution, the human species was so pressured for increased information capacity that they invented writing, a revolutionary leap forward in the development of our species that enables information capacity to be externalized, frees up internal processing and affords the development of more complex concepts. In other words, writing enabled humans to think more abstractly and logically by increasing information capacity. Today, humans have gone to even greater lengths: the Internet, computers and smartphones are testaments to the substantial pressure humans currently face — and probably faced in the past — to increase information capacity.”
According to the authors of the paper quoted above, the capacity to process and manage vast quantities of information is a defining characteristic of human intelligence. This ability has been extended over time through tools and techniques for externalizing information, such as language, writing, and digital technology. These advancements have, in turn, allowed for increasingly abstract and complex thought and technologies.
The paper, by Jessica Cantlon and Steven Piantadosi, further proposes that the power of scaling lies behind human intelligence, and that this same power of scaling lies behind the remarkable results achieved by artificial neural networks in areas such as speech recognition, LLMs, and computer vision. These accomplishments, they argue, have come not through specialized representations and domain-specific development, but through simpler techniques combined with increased computational power and data capacity.
Regularity and irregularity. Decodable and tricky words. Learnability and surprisal. Predictability and randomness. Low entropy and high entropy.
Why do such tensions exist in human language? And in our AI tools developed to both create code and use natural language, how can the precision required for computation co-exist alongside this necessary complexity and messiness of our human language?
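To make the "low entropy and high entropy" contrast concrete, here is a minimal sketch in Python, using invented example strings, that computes the Shannon entropy of a character distribution; repetitive, predictable text scores low, while near-random text scores high. This is only an illustration of the concept, not a claim about how any particular model measures language.

```python
# A hedged illustration of the low-entropy vs. high-entropy tension:
# Shannon entropy of a string's character distribution, in bits per character.
# The example strings are invented for demonstration purposes only.
from collections import Counter
from math import log2

def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the text's character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * log2(n / total) for n in counts.values())

regular = "the cat sat on the mat the cat sat on the mat"  # highly repetitive, predictable
messy = "xq7!zv kr9#pl mw2@bn yt5%cd"                      # closer to random noise

print(f"regular: {char_entropy(regular):.2f} bits/char")
print(f"messy:   {char_entropy(messy):.2f} bits/char")
# Natural language sits between these poles: regular enough to be learnable,
# surprising enough to carry new information.
```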
“. . . the fact, as suggested by these findings, that semantic properties can be extracted from the formal manipulation of pure syntactic properties – that meaning can emerge from pure form – is undoubtedly one of the most stimulating ideas of our time.”
In our last post, we began exploring what Large Language Models (LLMs) and their uncanny abilities might tell us about language itself. I posited that the power of LLMs stems from the statistical nature of language.
“Semantic gradients” are a tool used by teachers to broaden and deepen students' understanding of related words by plotting them in relation to one another. They often begin with antonyms at each end of the continuum. Here are two basic examples:
Now imagine taking this approach and quantifying the relationships between words by adding numbers to the line graph. Now imagine adding another axis to this graph, so that words are plotted in a three-dimensional space according to their relationships. Then add another dimension, and another . . . heck, make it tens of thousands more dimensions, relating all the words available in your lexicon across a high-dimensional space . . .
. . . and you may begin to envision one of the fundamental powers of Large Language Models (LLMs).
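For readers who like to see the idea in code, here is a minimal sketch of that intuition. The 4-dimensional vectors below are invented for illustration (real learned embeddings have hundreds or thousands of dimensions); relatedness is read off as cosine similarity, the geometric notion of "pointing in a similar direction" in the word space.

```python
# A toy semantic gradient in vector form: hot > warm > cool > cold.
# These 4-dimensional vectors are made up for illustration; real LLM
# embeddings are learned from data, not hand-assigned.
import numpy as np

toy_embeddings = {
    "hot":  np.array([ 0.9, 0.1, 0.3, 0.0]),
    "warm": np.array([ 0.6, 0.2, 0.3, 0.1]),
    "cool": np.array([-0.5, 0.2, 0.3, 0.1]),
    "cold": np.array([-0.9, 0.1, 0.3, 0.0]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two word vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for word in ["warm", "cool", "cold"]:
    sim = cosine_similarity(toy_embeddings["hot"], toy_embeddings[word])
    print(f"hot ~ {word}: {sim:.2f}")
# The similarity falls off monotonically (warm, then cool, then cold),
# just like the antonym continuum on a semantic gradient line.
```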
Thanks to the podcast Emerging Research in Educational Psychology, in which professor Jeff Greene speaks with professor Erika Patall about a meta-analysis she lead-authored, I learned about her paper, which synthesizes findings from a large number of studies on the impact of classroom structure. I thought some of the high-level takeaways were well worth sharing with you for our 4th research highlight in this series!
Citation: Patall, E. A., Yates, N., Lee, J., Chen, M., Bhat, B. H., Lee, K., Beretvas, S. N., Lin, S., Man Yang, S., Jacobson, N. G., Harris, E., & Hanson, D. J. (2024). A meta-analysis of teachers’ provision of structure in the classroom and students’ academic competence beliefs, engagement, and achievement. Educational Psychologist, 59(1), 42–70. https://doi.org/10.1080/00461520.2023.2274104
I think it’s no surprise to most educators that providing structure for kids, both in terms of the classroom environment and culture, and in terms of the design of instructional tasks, is critical to improving student learning. Part of this work is what we often term “classroom management,” but as the paper describes, the work is far more encompassing than that:
“In sum, creating structure is a multifaceted endeavor that involves a diverse assortment of teacher practices that can be used independently or in various combinations, as well as to various extents, and are all intended to organize and guide students’ school-relevant behavior in the process of learning in the classroom.”
Paper Citation: Capin, P., Vaughn, S., Miller, J. E., Miciak, J., Fall, A.-M., Roberts, G., Cho, E., Barth, A. E., Steinle, P. K., & Fletcher, J. M. (2023). Investigating the reading profiles of middle school emergent bilinguals with significant reading comprehension difficulties. Scientific Studies of Reading. https://doi.org/10.1080/10888438.2023.2254871
A few months ago, a study crossed my radar that caused me to stop, print it out, mark it up, and then begin digging into related studies, which is what I do when a study grabs my attention.
Getting into research is akin to getting into Miles Davis: if you like a given song or album, you may start checking out the other musicians he played with, and they'll lead you into a new and ever-expanding fractal universe, because Davis had a knack for collaborating with musicians who were geniuses in their own right. A few examples: John Coltrane, Tony Williams, Keith Jarrett, Herbie Hancock, John McLaughlin, Wayne Shorter, Jack DeJohnette; the list goes on and on.
In a previous post, Thinking Inside and Outside of Language, we channelled Cormac McCarthy and explored the tension between language and cognition. We dug in even further in Speaking Ourselves into Being and Others into Silence: The Power of Language, considering Plato's long-ago fears of the deceptive and distancing power of written language, and how bringing a critical consciousness to our use of language could temper unconscious biases and power dynamics.
If you find any of that interesting, I recommend reading this short interview in Nautilus Magazine, How to Quiet Your Mind Chatter, with Ethan Kross, an experimental psychologist and neuroscientist at the University of Michigan.
Two relevant quotes:
“What we’ve learned is that language provides us with a tool for coaching ourselves through our problems like we were talking to another person. It involves using your name and other non-first person pronouns, like “you” or “he” or “she.” That’s distanced self-talk.”
“The message behind mindfulness is sometimes taken too far in the sense of 'you should always be in the moment.' The human mind didn’t evolve to always be in the moment, and we can derive enormous benefit from traveling in time, thinking about the past and future.”
This has been a great year for education research. I thought it could be fun to review some of what has come across my own limited radar over the course of 2023.
The method I used to create this wrap-up was to go back through my Twitter timeline starting in January and pull all research-related tweets into a doc. I then sorted those by theme and ended up with several high-level buckets, with further sub-themes within and across those buckets. Note that I didn't also go through my Mastodon or Bluesky feeds, as this was time-consuming enough!
The rough big-ticket research themes I ended up with were:
Multilinguals and multilingualism
Reading
Morphology
The influence of physical or cultural environment
The content of teaching and learning
The precedence of academic skills over soft skills