| Virgil | Lachlan Kermode |
| Editor | Lachlan Kermode |
An LLM, or Large Language Model, is the name given to the kind of machine learning architecture that powers conversational interfaces like ChatGPT, Claude, and Gemini.
Why distinguish large language models from (regularly sized) language models? The conventional wisdom is that the qualifier ‘large’ refers both to the enormous amount of text on which such systems are trained, comprising hundreds of billions of words from books, websites, and other sources, and to the massive number of parameters that they contain. These parameters are sometimes represented as ‘neurons’ (see neural net), as their technical mechanism was originally inspired by biological neurons [1].
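As a rough illustration (a sketch, not the code of any actual model), each ‘neuron’ computes a weighted sum of its inputs and passes it through a nonlinearity. The weights and bias in a unit like this are the ‘parameters’ being counted when a model is called large; modern LLMs contain billions of them.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial 'neuron': a weighted sum of inputs
    plus a bias, passed through a sigmoid nonlinearity.
    Each weight and the bias is one trainable parameter."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes z into (0, 1)
```

A large model is, in this crude sense, nothing more than a very great number of such units wired together, with their weights adjusted during training.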
The core neural net architecture of a ‘language model’, then, has been around since the 1960s. But the acronym LLM only really entered the scientific and software engineering vocabulary in 2017, when the growing adoption of the transformer architecture began to render the language generated by computational models intelligible and, at times, even insightful. This is also, perhaps, why we distinguish large language models from language models: there is a widespread sense that they only began to ‘work’, that is, to generate language convincing in a register that previous attempts had not reached, once the architectures and training data both became large enough.
The terminology of LLMs began to circulate in the layman’s lexicon around late 2022, when ChatGPT was launched to the public, giving the transformer architecture a stage and a spotlight…
[WIP]