What are LLMs?
Given the explosion of interest in language models that OpenAI has unleashed upon the world, I keep getting the same questions from many of my friends – and indeed from random people I talk to: What is an LLM?
To reach more of you, here is how I would describe it to a lay person (not schooled in machine learning):
LLM = Large Language Model.
LLMs are machine learning models that have been trained to respond to human queries in natural language. That is, you can just ask them questions as if talking to a human, rather than having to learn a special query language. They also reply in natural language, so the replies sound as if a human had written them.
It is large because these models have been trained on huge amounts of data and have many billions of parameters. (You can think of a parameter like a single neuron in a brain, although that’s not quite accurate.) Here are the parameter counts for the various GPT models:
GPT-1: 117 million
GPT-2: 1.5 billion
GPT-3: 175 billion
GPT-4: not publicly disclosed by OpenAI (figures such as 170 trillion are unconfirmed rumours)
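To give a feel for what a parameter count means, here is a minimal Python sketch that counts the parameters of a tiny, made-up network. The layer sizes and the function name `count_parameters` are assumptions chosen purely for illustration; they do not describe how any GPT model is built. In this toy picture a parameter is the strength of one connection between units (plus one extra number per unit), which is one reason the neuron analogy is only approximate.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases in a simple fully connected network.

    layer_sizes is a hypothetical list such as [4, 8, 2]: 4 inputs,
    one hidden layer of 8 units, and 2 outputs.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out  # one weight per connection between the two layers
        biases = n_out          # one bias per unit in the receiving layer
        total += weights + biases
    return total


tiny = count_parameters([4, 8, 2])
print(tiny)  # 58
```

Even this toy network has 58 numbers to set during training. GPT-3, at 175 billion, has roughly three billion times as many.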
What is GPT?
GPT = Generative Pre-Trained Transformer.
Generative: First you teach the model, then you see whether it can generate things like those it has been taught. Imagine teaching a child to draw a hand, then testing whether the child can draw a hand on paper unaided.
Pre-Trained: These models are trained once to set all the parameters (fix all the neurons in place), and only then are they used. While the model is answering your questions, no new information modifies the pre-trained parameters.
Transformer: The relationships between the things you show the model, such as the words in a sequence, are transformed into an internal representation that the machine can use. It is almost like a machine-readable look-up table.
Imagine we have a black and white photograph of a baby. A computer can’t “see” the photograph in the way that we do (since internally it’s all electrical bits and bytes). So to create a computer model we can do the following:
We could digitise the photograph into pixels, where each pixel has a value from 0 to 255 depending on how dark (0) or bright (255) it is. Now let’s feed the photograph into the computer model by telling it the value of each pixel, pixel by pixel, starting from the top left and finishing at the bottom right (when we get to the right edge, we continue from the first pixel of the next row down).
We append the word “baby” next to this series of numbers. So we have transformed the photo of the baby into a digital representation, and linked it with a name (“baby”) that the computer can use. A transformer, therefore, is a system that takes in such complex data and converts it into some representation that we can then use for further computation.
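The photograph steps above can be sketched in a few lines of Python. This is only an illustration of the pixel-by-pixel reading and labelling described here, not how a real transformer is implemented; the 3x3 "photograph" and the function name `flatten_image` are made up for the example.

```python
def flatten_image(pixels):
    """Read a grayscale image row by row, from top left to bottom right."""
    flat = []
    for row in pixels:
        for value in row:
            if not 0 <= value <= 255:
                raise ValueError(f"pixel value out of range: {value}")
            flat.append(value)
    return flat


# A made-up 3x3 "photograph": 0 is dark, 255 is bright.
photo = [
    [0, 128, 255],
    [64, 200, 32],
    [255, 255, 0],
]

# Link the series of numbers with its name.
example = (flatten_image(photo), "baby")
print(example)  # ([0, 128, 255, 64, 200, 32, 255, 255, 0], 'baby')
```

The result is a plain list of numbers paired with a word, which is something a computer can store and compute with.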
This “representation” that we have created is called an embedding because, in neural networks, it is embedded in the machine learning model, such as an LLM.