“The success of large language models is the biggest surprise in my intellectual life. We learned that a lot of what we used to believe may be false and what I used to believe may be false. I used to really accept, to a large degree, the Chomskyan argument that the structures of language are too complex and not manifest in input so that you need to have innate machinery to learn them. You need to have a language module or language instinct, and it’s impossible to learn them simply by observing statistics in the environment.
If it’s true — and I think it is true — that the LLMs learn language through statistical analysis, this shows the Chomskyan view is wrong. This shows that, at least in theory, it’s possible to learn languages just by observing a billion tokens of language.”
We know that explicitly teaching unfamiliar words that students will encounter in written text is important. But what about the language teachers use throughout the school day? What implicit learning opportunities are constrained or afforded by the model of language a teacher provides while teaching, and how does that affect student learning?
We recently examined Philip Gough and Michael Hillinger's 1980 paper, Learning to Read: An Unnatural Act, in which they drew a neat analogy between learning to decode an alphabetic writing system and cryptanalysis. In this analogy, children aren't simply learning to decode; more precisely, they are learning to decipher the written code. The distinction highlights that learning to read in English is driven not by paired-associate learning but by internalizing an algorithm: a statistical, systematic, quasi-regular mapping.
The point is a sharp one: they were saying that we can't teach such a cipher directly. We can't just hand a kid the codebook.
So when I saw a reference recently to another Gough paper called Reading, spelling, and the orthographic cipher, co-written in 1992 with Connie Juel and Priscilla Griffith, I knew I needed to read this one, too.
The first thing that happened to reading is writing. For most of our history, humans have been able to speak but not read. Writing is a human creation, the first information technology, as much an invention as the telephone or computer.
—Mark Seidenberg, Language at the Speed of Sight
What is (un)natural about learning to read and write? We began our quest with this question, prompted by two references in a single sentence of a David Share paper.
Like learning to read (English) which Gough famously dubbed “unnatural” [43], see also [3], becoming aware of the constituent phonemes in spoken words does not come “naturally”.
—Share, D. L. (2021). Common Misconceptions about the Phonological Deficit Theory of Dyslexia. Brain Sciences, 11(11), 1510.
This led us to unpack three foundational papers, published between 1976 and 1992, that have given us some surprising twists and turns and even, dare I say, moments of clarity.