The Language of Language Machines

LLM (Large Language Model)

Steward: Lachlan Kermode
Editor: Lachlan Kermode

An LLM, or Large Language Model, is the name given to the kind of machine learning architecture that powers conversational interfaces like ChatGPT, Claude, and Gemini.

Why distinguish large language models from (regularly sized) language models? The conventional wisdom is that the qualifier ‘large’ refers both to the enormous amount of text on which such systems are trained—which comprises hundreds of billions of words from books, websites, and other sources—and to the massive number of parameters that they contain. These parameters are sometimes represented as ‘neurons’ (see neural net), as the original motivation for their technical mechanism was inspired by biological neurons [1].
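The notion of a parameter-as-‘neuron’ can be sketched concretely: each artificial neuron is a weighted sum of its inputs passed through a nonlinearity, and the weights and bias are its parameters. The numbers below are purely illustrative, not drawn from any real model; an LLM simply contains billions of such parameters arranged in layers.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias term, squashed by a sigmoid
    # activation into the range (0, 1). The weights and bias are the
    # 'parameters' that training adjusts.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

# Two inputs, two weights, one bias: three parameters in total.
print(neuron([1.0, 0.5], [0.4, -0.2], 0.1))
```

Counting parameters this way makes the ‘large’ in LLM legible: the toy neuron above has three, whereas models of the ChatGPT generation have hundreds of billions.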

The core neural net architecture of a ‘language model’, then, has been around since the 1960s. But the acronym LLM only really entered the scientific and software engineering vocabulary in 2017, when the growing adoption of the transformer architecture began to render the language generated by computational models intelligible and, at times, even insightful. This is also, perhaps, why we distinguish large language models from language models: there is a widespread sense that they only began to ‘work’—that is, to generate language that was convincing in a register that previous attempts were not—once the architectures and training data both became large enough.

The terminology of LLMs began to circulate in the layman’s lexicon around late 2022, when ChatGPT was launched to the public, giving the transformer architecture a stage and a spotlight. LLMs are to ChatGPT as search engines are to Google: in each pair, the latter term names a specific company’s product, yet it has become a stand-in for the technical architecture that underlies it, on account of being the main vector of that architecture’s popularization. In the 2020s, LLM is also the name of the machine learning architecture to which one typically refers when one speaks about ‘AI’, a phraseology that tends to index the ‘latest and greatest’ technique in computing automation at any given historical moment.

In addition to the transformative experience of automatic language generation associated with ChatGPT, the architectural particulars of LLMs are also associated—though usually not entirely accurately—with several recent scientific and technical breakthroughs. The strong association that LLMs have with these breakthroughs gives them an outsized gravitas in the current discourse on the social disruption that AI and AGI are augured to effect. We limit ourselves to two of the most prominent applicative (mis)associations or, if you prefer, apparitions of LLMs here.

Google Deepmind’s AlphaFold [2], a neural net that can predict protein folding structure with a level of accuracy not previously possible, is perhaps the most important apparition of an LLM, as its developers were awarded the Nobel Prize in Chemistry in 2024 [3]. A case in point: AlphaFold’s technical architecture is often glossed rather imprecisely as ‘AI’, and is often miscredited as an application of LLMs to the domain of scientific-qua-chemical data. Though AlphaFold is a neural net that shares some architectural features with LLMs, such as attention, it does not operate over ‘language’ (represented as linear chains of tokens), and is therefore not really a Large Language Model.
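The ‘linear chain of tokens’ that distinguishes an LLM’s input can itself be sketched. Real LLMs use learned subword vocabularies (such as byte-pair encoding) rather than whole words; the toy whitespace tokenizer and vocabulary below are assumptions for illustration only.

```python
def tokenize(text, vocab):
    # Map each whitespace-separated word to an integer id, using 0 as
    # an 'unknown' id for out-of-vocabulary words. The result is the
    # linear chain of tokens over which a language model operates.
    return [vocab.get(word, 0) for word in text.lower().split()]

# A toy, hand-built vocabulary; real vocabularies hold tens of
# thousands of learned subword entries.
vocab = {"proteins": 1, "fold": 2, "into": 3, "structures": 4}

print(tokenize("Proteins fold into structures", vocab))  # [1, 2, 3, 4]
```

A model like AlphaFold, by contrast, consumes spatial and evolutionary representations of molecules rather than such one-dimensional token sequences, which is what the distinction above turns on.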

A second apparition is AI-generated imagery and video, exemplified by Stable Diffusion [4] and OpenAI’s Sora [5]. Analogously to AlphaFold, Stable Diffusion shares architectural features such as attention with LLMs, but also employs techniques from computer vision research in the 2010s (the previous ‘wave’ of techniques temporarily dubbed ‘artificial intelligence’), such as the UNet architecture [6], an approach originally designed to segment sections of images in biomedical applications. Sora, too, is not an LLM, but a recombination of architectural insights in neural net construction across various domains. The multimodality of these models—their ability to generate images or video from textual prompts—probably accounts for their common confusion with LLMs.

Bibliography