Chances are you’ve come across the term “large language models,” or LLMs, in discussions about generative AI. But LLMs aren’t quite the same as well-known chatbots like ChatGPT, Google Gemini, Microsoft Copilot, Meta AI, and Anthropic’s Claude.
These AI chatbots can produce impressive output, but they don’t actually understand the meaning of words the way we do. Instead, they’re the interface we use to interact with large language models. The underlying technology is trained to recognize how words are used and which words frequently appear together, so it can predict future words, sentences, or paragraphs. Understanding how LLMs work is key to understanding how AI works. And as AI becomes increasingly common in our daily online experiences, that’s knowledge worth having.
Here’s what you need to know about LLMs and their role in AI.
What is a language model?
You can think of a language model as a predictor of words.
“A language model is something that attempts to forecast what language humans generate,” explained Mark Riedl, a professor at Georgia Tech School of Interactive Computing and an associate director at the Georgia Tech Machine Learning Center. “What characterizes a language model is its ability to predict future words based on previous ones.”
This is the basis of the autocomplete feature when you’re texting, as well as of AI chatbots.
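To make that concrete, here’s a toy sketch of autocomplete-style prediction in Python. The phrase counts are invented for illustration; a real system would tally them from enormous amounts of text.

```python
# A toy autocomplete: given the last two words typed, suggest the word
# that most often followed them before. The counts are made up.
from collections import Counter

history = {
    ("on", "my"): Counter({"way": 42, "own": 7, "phone": 3}),
    ("see", "you"): Counter({"soon": 18, "there": 11, "later": 9}),
}

def suggest(prev_two: tuple[str, str]) -> str:
    """Return the most frequent follower of this two-word context."""
    counts = history.get(prev_two)
    return counts.most_common(1)[0][0] if counts else ""

print(suggest(("on", "my")))    # -> "way"
print(suggest(("see", "you")))  # -> "soon"
```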
What is a large language model?
A large language model is trained on immense volumes of text from diverse sources. The size of these models is measured in “parameters.”
So, what exactly is a parameter?
Large language models are built on neural networks, which are machine learning systems that take an input and perform mathematical calculations to produce an output. The variables in those calculations are the parameters. A large language model has a billion parameters or more, and the biggest have hundreds of billions.
“We recognize they are large when they can generate coherent, fluid paragraphs,” Riedl stated.
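For a sense of what a parameter actually is, here’s a minimal Python sketch of a single neural network layer. Every number in its weight matrix and bias vector is one parameter, and the adjustable knobs add up fast.

```python
import numpy as np

# One neural network layer: output = inputs @ weights + bias.
# Every entry in `weights` and `bias` is a parameter, a number that
# training adjusts. LLMs stack many such layers, totaling billions.
rng = np.random.default_rng(0)
n_inputs, n_outputs = 512, 512
weights = rng.normal(size=(n_inputs, n_outputs))  # 262,144 parameters
bias = np.zeros(n_outputs)                        # 512 more

def layer(x: np.ndarray) -> np.ndarray:
    """Apply the layer to an input vector."""
    return x @ weights + bias

total_params = weights.size + bias.size
print(f"{total_params:,} parameters in this single layer")  # 262,656
```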
How do large language models learn?
LLMs acquire knowledge through a fundamental AI technique called deep learning.
“It’s quite similar to the way a child is taught—you present numerous examples,” noted Jason Alan Snyder, global CTO at ad agency Momentum Worldwide.
In simpler terms, you provide the LLM with a wide range of content (termed training data), such as books, articles, code, and social media updates, to assist it in understanding the usage of words across various contexts, as well as the subtle nuances in language. The methods of data collection and training employed by AI companies have sparked some controversy and lawsuits. Publishers like The New York Times and creators of copyrighted content are claiming that tech firms have utilized their copyrighted material without obtaining proper permission.
(Disclosure: Ziff Davis, CNET’s parent organization, filed a lawsuit against OpenAI in April, claiming the company violated Ziff Davis’ copyrights during the training and operation of its AI systems.)
AI models process far more than any person could read in a lifetime, on the order of trillions of tokens. Tokens help AI models break down and analyze text. You can think of an AI model as a reader that needs help. The model breaks a sentence into smaller chunks, or tokens, each equivalent to roughly four characters of English, or about three-quarters of a word, so that it can digest each piece and then the overall meaning.
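To see tokens firsthand, here’s a short example using tiktoken, OpenAI’s open-source tokenizer library. The cl100k_base vocabulary is just one tokenizer; other models split text differently.

```python
# Splitting a sentence into tokens with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "I went sailing on the deep blue sea."
token_ids = enc.encode(text)

print(token_ids)  # a list of integer IDs, one per token
print(f"{len(text)} characters -> {len(token_ids)} tokens")
# Typical English works out to roughly 4 characters per token.
print(enc.decode(token_ids))  # decodes back to the original sentence
```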
Subsequently, the LLM can evaluate how words relate to one another and ascertain which words commonly co-occur.
“It’s akin to constructing a vast map of word connections,” Snyder remarked. “Then it begins to engage in this fascinating activity, predicting what the next word might be… and it checks its prediction against the actual word in the data, adjusting the internal map based on its accuracy.”
This prediction and adjustment happens billions of times, so the LLM is constantly refining its understanding of language and getting better at spotting patterns and predicting future words. It can even absorb concepts and facts from the data to answer questions, generate creative text formats, and translate languages. But LLMs don’t understand the meaning of words the way we do; all they know are the statistical relationships.
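Here’s a deliberately tiny sketch of that predict-compare-adjust loop: a bigram model that learns which word tends to follow which from a toy corpus. Real LLMs run the same basic loop with transformer networks and trillions of tokens.

```python
import numpy as np

# A minimal "predict, compare, adjust" loop: learn P(next word | word)
# from a toy corpus by gradient descent on next-word prediction error.
corpus = "the deep blue sea the deep blue sky the deep blue sea".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

logits = np.zeros((len(vocab), len(vocab)))  # the model's parameters
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for epoch in range(200):
    for cur, nxt in zip(corpus, corpus[1:]):
        i, j = idx[cur], idx[nxt]
        p = softmax(logits[i])   # 1. predict a distribution over next words
        grad = p.copy()
        grad[j] -= 1.0           # 2. compare with the word that actually came
        logits[i] -= lr * grad   # 3. adjust the parameters toward the truth

p = softmax(logits[idx["blue"]])
print({w: round(float(p[idx[w]]), 2) for w in vocab})
# After training, "sea" and "sky" soak up the probability after "blue".
```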
LLMs also learn to improve their responses through reinforcement learning from human feedback, or RLHF.
“You receive a judgment or preference from humans regarding which response was preferable given the input it received,” mentioned Maarten Sap, an assistant professor at Carnegie Mellon University’s Language Technologies Institute. “From there, you can instruct the model on how to enhance its responses.”
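As a rough illustration, here’s what a single human-preference record might look like, along with the kind of pairwise loss a reward model could be trained on. The record and scores are made up; this is a sketch of the idea, not any company’s actual pipeline.

```python
import math

# A hypothetical human-preference record, the raw material of RLHF.
# Labelers compare two model responses to the same prompt; a reward
# model is trained so the chosen response outscores the rejected one.
preference_example = {
    "prompt": "Explain what a token is, in one sentence.",
    "chosen": "A token is a small chunk of text, often part of a word, "
              "that a language model reads and predicts.",
    "rejected": "Tokens are crypto assets traded on exchanges.",
}

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry-style loss: small when the reward model rates the
    chosen response well above the rejected one."""
    return -math.log(1 / (1 + math.exp(score_rejected - score_chosen)))

print(round(pairwise_loss(2.0, -1.0), 3))  # ~0.049: agrees with the humans
print(round(pairwise_loss(-1.0, 2.0), 3))  # ~3.049: needs adjusting
```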
What functions do large language models perform?
When given a sequence of words, an LLM will forecast the subsequent word in that sequence.
For instance, take the phrase, “I went sailing on the deep blue…”
Most individuals would likely suggest “sea” since sailing, deep, and blue are all terms commonly associated with the sea. In other words, each word provides context for the next.
“These large language models, due to their numerous parameters, can retain a wide variety of patterns,” Riedl stated. “They excel at identifying these cues and making very accurate predictions about what follows.”
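You can watch a real model make this kind of prediction. This sketch assumes the Hugging Face transformers library and the small, older GPT-2 model; bigger models make sharper predictions, but the mechanics are the same.

```python
# Asking GPT-2 for its top guesses at the next token
# (pip install transformers torch). Exact probabilities will vary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("I went sailing on the deep blue", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(int(i))!r}: {float(p):.3f}")
# Candidates like " sea", " water" and " ocean" should rank near the top.
```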
What types of language models exist?
There are several subcategories you may have heard of, such as small, reasoning, and open-source/open-weights models. Some of these models are multimodal, meaning they’re trained not just on text but also on images, video, and audio. They’re all still language models performing similar functions, but there are some key differences worth knowing.
Are small language models a reality?
Absolutely. Companies like Microsoft have developed smaller models that are intended to function “on device” without necessitating the same computational resources as an LLM, yet still empower users to harness the capabilities of generative AI.
What are AI reasoning models?
Reasoning models are a kind of LLM. They give you a peek at a chatbot’s train of thought while it’s answering your questions. You might have seen this if you’ve used DeepSeek, a Chinese AI chatbot.
What about models that are open-source and open-weights?
They’re still LLMs! These models aim to be more transparent about how they work. Open-source models let anyone see how the model was built, and they’re typically available for anyone to customize and build on. Open-weights models release the model’s trained parameters, the weights, which gives some visibility into how the model weighs particular features when it makes decisions, even if the training data and code stay private.
What strengths do large language models possess?
LLMs excel at discerning the relationships between words and generating text that feels natural.
“They receive an input, which may often be a set of commands like ‘Do this for me,’ or ‘Explain this,’ or ‘Summarize this,’ and are capable of extracting those patterns from the input to produce a lengthy, cohesive response,” Riedl remarked.
Where do large language models face challenges?
For one thing, they’re bad at reliably providing accurate information. Sometimes they fabricate details that merely sound plausible, as when ChatGPT cited six fake court cases in a legal brief, or when Google’s Bard (the predecessor to Gemini) mistakenly attributed the first images of an exoplanet to the James Webb Space Telescope. Those missteps are called hallucinations.
“They are exceedingly unreliable in that they often confabulate and create falsehoods,” Sap explained. “They are neither trained nor designed to generate truthful outputs.”
They also struggle with requests that differ significantly from anything they have encountered previously. This is due to their focus on recognizing and responding to established patterns.
A prime illustration is a math problem with an unusual set of numbers.
“It might fail to perform that calculation accurately because it isn’t truly solving math,” Riedl noted. “It attempts to link your math question to prior examples of math questions it has previously seen.”
And while they’re great at predicting words, they’re not good at anticipating the future, which includes planning and decision-making.
“The concept of planning in a way akin to humans, by considering various contingencies and alternatives and making choices, appears to be a considerable obstacle for our current large language models,” Riedl added.
Finally, they struggle with current events, because their training data typically only goes up to a certain point in time, and anything that happens after that falls outside their knowledge base. And because they can’t distinguish between what’s factually true and what’s merely likely, they can confidently give incorrect information about recent events.
They also engage with the world differently than we do.
“This creates challenges for them in understanding the subtleties and intricacies of current events, which often requires an awareness of context, social dynamics, and real-world implications,” Snyder stated.
How are LLMs incorporated with search engines?
We’re seeing retrieval capabilities develop that go beyond what the models were trained on, including connecting with search engines like Google so the models can run web searches and feed those results back into the LLM. This helps them better understand queries and deliver more timely responses.
“This ensures our language models remain current and relevant, because they can actually access new information online and integrate it,” Riedl noted.
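Conceptually, the pattern (often called retrieval-augmented generation) looks something like the sketch below; web_search and llm_generate are hypothetical stand-ins, not any particular company’s API.

```python
# A minimal sketch of retrieval-augmented generation (RAG).
# `web_search` and `llm_generate` are hypothetical placeholders for a
# real search API and a real model call; the pattern is the point.

def web_search(query: str, k: int = 3) -> list[str]:
    # Stand-in: a real version would call a search API for snippets.
    return [f"(snippet {i + 1} about: {query})" for i in range(k)]

def llm_generate(prompt: str) -> str:
    # Stand-in: a real version would send the prompt to a language model.
    return f"[model answer grounded in]\n{prompt}"

def answer_with_retrieval(question: str) -> str:
    snippets = web_search(question)           # 1. fetch fresh information
    sources = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer using only the sources below. If they don't contain "
        f"the answer, say so.\n\nSources:\n{sources}\n\nQuestion: {question}"
    )
    return llm_generate(prompt)               # 2. ground the model's reply

print(answer_with_retrieval("Who won the most recent World Cup?"))
```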
That was the idea, for example, with AI-powered Bing a while back. Rather than using search engines to enhance its chatbot’s answers, Microsoft looked to AI to improve its own search engine, by better parsing the true intent behind user queries and better ranking the results for those queries. Last November, OpenAI launched ChatGPT Search, which included access to information from select news publishers.
However, there are limitations. Utilizing web search could worsen hallucinations if sufficient fact-checking processes are not established. Additionally, LLMs would have to develop the ability to evaluate the credibility of online sources prior to referencing them. Google experienced this reality with the flawed initial launch of its AI Overviews search results. The search company later refined its AI Overviews outputs to minimize misleading or potentially harmful summaries. Nevertheless, even recent findings indicate that AI Overviews struggle to reliably provide the correct year.