Retrieval-augmented generation (RAG) addresses a core limitation of large language models: they know only what was in their training data. For domains that change frequently — support docs, internal knowledge bases, market data — that's insufficient. RAG solves this by querying external sources at answer time, then passing the retrieved context to the model as additional input.
The typical flow: a user asks a question; a retrieval system (often a vector database) finds the most relevant documents; those documents are concatenated into the prompt; the LLM generates an answer grounded in that context. Chunking strategy, embedding model choice, and retrieval ranking all affect answer quality. Done well, RAG produces accurate, citable answers without retraining the model.
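The flow above can be sketched end to end. This is a minimal, self-contained illustration, not a production pattern: the corpus, the bag-of-words `embed` function, and the prompt template are all stand-ins for a real embedding model, vector database, and LLM call.

```python
import math
from collections import Counter

# Toy document store; a real system would hold chunked, embedded documents.
DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "Enterprise plans include SSO and audit logging.",
    "The API rate limit is 100 requests per minute per key.",
]

def embed(text):
    # Bag-of-words term counts stand in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query; return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context):
    # Concatenate retrieved documents into the prompt the LLM will see.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

context = retrieve("How fast are refunds processed?", DOCS)
prompt = build_prompt("How fast are refunds processed?", context)
# `prompt` would then be sent to the LLM to generate the grounded answer.
```

Because retrieval ranks the refund document highest, the grounding context in the prompt contains the answer ("5 business days") even though the model was never trained on it.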