What is a Large Language Model (LLM)?
A deep learning model trained on vast text data that can understand and generate human language across a broad range of tasks.
Definition
A Large Language Model (LLM) is a neural network with billions of parameters trained on massive text datasets to predict the next token in a sequence. Through this training process, LLMs develop surprisingly general capabilities: they can write code, answer questions, summarize documents, translate languages, reason through problems, and converse naturally. Claude, GPT-4, Gemini, and Llama are all LLMs. They are the foundation of the current AI revolution.
Why it matters
LLMs are the core technology underlying every AI product discussed on this site and most of the industry. Understanding what they are, how they work, and what they can and cannot do enables better product decisions, more effective prompting, and more realistic expectations about AI.
How it works
LLMs are transformer-based neural networks trained via next-token prediction on trillions of tokens of text (web pages, books, code, scientific papers). During training, the model learns representations of language so rich that general reasoning capabilities emerge. At inference time, the model generates text one token at a time, sampling from a probability distribution over the vocabulary.
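The last step described above, sampling the next token from a probability distribution over the vocabulary, can be sketched in a few lines of Python. This is a simplified illustration, not any real model's implementation: the logits and the tiny vocabulary below are made-up numbers, and a real LLM produces logits over tens of thousands of tokens from a transformer forward pass.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Convert raw model scores (logits) into a probability distribution.
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature=1.0):
    # Draw one token index from the distribution, as an LLM does at each
    # generation step before appending the token and repeating.
    probs = softmax(logits, temperature)
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

# Toy vocabulary and hypothetical logits for illustration only.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.5, 0.1]
print(vocab[sample_next_token(logits)])
```

Because sampling is stochastic, the same prompt can yield different outputs on different runs; setting temperature near zero makes the model almost always pick the highest-probability token.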
Examples in practice
Claude, GPT-4, Gemini
The most capable frontier LLMs, used as the reasoning core in AI products, coding agents, enterprise chatbots, and research tools.
Llama 3, Mistral
Open-weight LLMs that can be run on your own infrastructure. Used by teams with data sovereignty requirements or very high inference volumes where API costs are prohibitive.
