Chunking Strategies for RAG: How to Split Your Documents

Why Chunking Is the Foundation of Good RAG

Before a retrieval-augmented generation system can answer a single question, you need to break your source documents into chunks. These chunks become the units your vector database stores and retrieves. Get the chunking wrong and your RAG system returns irrelevant context no matter how good your embeddings or language model are.

The core tension is this: small chunks give you precise retrieval but can lose surrounding context. Large chunks preserve context but introduce noise that confuses the model and wastes your context window. Your goal is to find the sweet spot for your specific document types and query patterns.

The Three Main Chunking Strategies

Fixed-size chunking is the simplest approach. You split every document into chunks of exactly N characters or tokens, with an optional overlap between adjacent chunks. It is fast, predictable, and easy to implement. The downside is that it ignores document structure entirely, so a chunk might start mid-sentence or cut a key idea in half.
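To make the idea concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It counts characters rather than tokens, and the function name and sample text are assumptions made up for illustration, not part of any library.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of chunk_size characters.

    Each window shares `overlap` characters with the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "Chunking splits documents. Retrieval finds the best chunks."
for chunk in fixed_size_chunks(sample, chunk_size=20, overlap=5):
    print(repr(chunk))
```

Notice that the windows pay no attention to words or sentences, which is exactly the weakness described above.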

Recursive character text splitting is a common default in production pipelines. You define a list of separators in order of preference: double newline, single newline, period followed by a space, space, then the empty string. The splitter tries to break at paragraph boundaries first, then line breaks, then sentence boundaries, then word boundaries, and falls back to a hard character cut only when nothing else fits. This preserves semantic units far better than fixed-size splitting.
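The separator fallback can be sketched in a few lines of plain Python. This is a simplified illustration, not how any particular library implements it: it has no overlap, it drops the separator it splits on, and the function name is hypothetical.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ", ""),
) -> list[str]:
    """Split on the first separator; recurse with the next one for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []

    # Last resort: no separators left (or the empty string), so cut by length.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece is still too long: try the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, remaining))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


doc = "Para one is short.\n\nPara two is a bit longer. It has two sentences."
for chunk in recursive_split(doc, chunk_size=30):
    print(repr(chunk))
```

The first paragraph fits and is kept whole. The second is too long, so the sketch falls through to the sentence separator and splits it there instead of cutting mid-word.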

Semantic chunking is the most advanced strategy. Instead of splitting on character boundaries, you compute an embedding for each sentence and start a new chunk wherever the similarity between adjacent sentences drops below a threshold. This produces semantically coherent chunks, but ingestion is slower and more expensive because every sentence has to be embedded. Save it for corpora where paragraph structure is inconsistent or missing.
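Here is a rough sketch of the threshold idea. To keep it self-contained, it uses a bag-of-words counter as a toy stand-in for a real embedding model. The toy_embed function, the 0.2 threshold, and the sample text are all assumptions chosen for illustration, so in practice you would swap in your own embedding call and tune the threshold on your data.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.2, embed=toy_embed) -> list[str]:
    """Start a new chunk when adjacent sentences fall below the similarity threshold."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []

    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sample = (
    "Vector databases store embeddings. "
    "Vector databases retrieve embeddings quickly. "
    "Bananas are rich in potassium. "
    "Bananas are yellow fruit."
)
for chunk in semantic_chunks(sample):
    print(repr(chunk))
```

With this sample, the two database sentences stay together and the two banana sentences stay together, because the only low-similarity boundary is between sentences two and three.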

Chunk Size and Overlap: Finding the Right Balance

For most RAG applications, start with a chunk size between 256 and 1024 tokens. Short documents like FAQ entries or product descriptions work well at 256 to 512 tokens. Long-form content like legal documents or technical manuals often benefits from 512 to 1024 tokens. Set your overlap to 10 to 15 percent of your chunk size so that context near chunk boundaries is not lost. Check which unit your splitter counts in: LangChain's text splitters, for example, measure chunk_size in characters by default, not tokens.
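The overlap arithmetic is easy to work through by hand. The sketch below does it for a hypothetical 2000-token document with a 512-token chunk size and 12.5 percent overlap. The function name and the document length are assumptions for illustration.

```python
import math


def overlap_plan(doc_len: int, chunk_size: int, overlap_ratio: float) -> tuple[int, int, int]:
    """Return (overlap, stride, number_of_chunks) for a sliding-window split."""
    overlap = round(chunk_size * overlap_ratio)
    stride = chunk_size - overlap  # new material contributed by each extra chunk
    n_chunks = 1 + math.ceil(max(0, doc_len - chunk_size) / stride)
    return overlap, stride, n_chunks


overlap, stride, n_chunks = overlap_plan(doc_len=2000, chunk_size=512, overlap_ratio=0.125)
print(f"overlap={overlap}, stride={stride}, chunks={n_chunks}")
# overlap=64, stride=448, chunks=5
```

A larger overlap shrinks the stride, so the same document produces more chunks and more vectors to store. That is the cost you trade for safer boundaries.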

Your retrieval quality also depends on how many chunks you retrieve per query, the top-k parameter. A smaller chunk size usually means you need a higher k to surface enough context. A larger chunk size lets you use a smaller k but risks including irrelevant material. Run offline evaluations with different combinations before committing to a configuration in production.
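An offline evaluation can be as simple as a grid over chunk size and k, scored by hit rate: the fraction of test questions for which at least one retrieved chunk contains the expected answer. The sketch below is a toy version under loud assumptions: it chunks by words, ranks chunks by keyword overlap instead of embeddings, and uses a made-up corpus and evaluation set. Treat it as a template for the loop, not as a realistic retriever.

```python
import re
from itertools import product


def word_chunks(text: str, size: int) -> list[str]:
    """Fixed-size chunks measured in words (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def terms(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def top_k(query: str, chunks: list[str], k: int) -> list[str]:
    """Toy retriever: rank chunks by how many query terms they share."""
    q = terms(query)
    return sorted(chunks, key=lambda c: len(q & terms(c)), reverse=True)[:k]


def hit_rate(corpus: str, eval_set: list[tuple[str, str]], size: int, k: int) -> float:
    chunks = word_chunks(corpus, size)
    hits = sum(
        any(answer in c for c in top_k(query, chunks, k))
        for query, answer in eval_set
    )
    return hits / len(eval_set)


# Hypothetical corpus and labeled questions, for illustration only.
corpus = (
    "The free plan allows 100 requests per minute. "
    "Paid plans allow 1000 requests per minute. "
    "API keys are rotated every 90 days. "
    "Keys can be revoked from the dashboard."
)
eval_set = [
    ("How many requests does the free plan allow?", "100 requests"),
    ("How often are API keys rotated?", "every 90 days"),
]

for size, k in product([8, 16], [1, 2]):
    rate = hit_rate(corpus, eval_set, size, k)
    print(f"chunk_size={size:>2} words, k={k}: hit rate {rate:.2f}")
```

In a real pipeline you would replace top_k with your vector store query and use a few dozen labeled questions drawn from actual user traffic.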

Recursive Chunking in Practice

The code below shows a fixed-size baseline and the recursive approach side by side. Run them on the same document and compare. You will typically see recursive chunks ending at natural paragraph or sentence boundaries, while fixed chunks cut arbitrarily mid-thought.

pip install langchain-text-splitters

from langchain_text_splitters import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
)

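# Read the whole document; an explicit encoding avoids platform-dependent defaults
with open("your_document.txt", encoding="utf-8") as f:
    raw_text = f.read()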

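# Fixed-size baseline. chunk_size and chunk_overlap are counted in
# characters here (the default length function is len), not tokens.
fixed_splitter = CharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    separator="",  # empty separator: cut purely by length
)
fixed_chunks = fixed_splitter.split_text(raw_text)
print(f"Fixed chunks: {len(fixed_chunks)}")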

# Recursive splitting (recommended for production)
recursive_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    # Tried in order: paragraph, line, sentence, word, then single characters
    separators=["\n\n", "\n", ". ", " ", ""],
)
recursive_chunks = recursive_splitter.split_text(raw_text)
print(f"Recursive chunks: {len(recursive_chunks)}")
if recursive_chunks:  # guard against an empty input file
    print(f"First chunk preview: {recursive_chunks[0][:200]}")

A chunk ending mid-sentence often has a misleading embedding because the semantic meaning is incomplete. Recursive splitting reduces this problem by preferring paragraph and sentence boundaries and cutting mid-sentence only as a last resort. Once you have your chunks, attach metadata like the source filename and page number before inserting them into your vector store so you can surface citations later.

Prompt

"I have a corpus of technical API documentation. Each page covers a single endpoint and runs about 2000 words. What chunk size and overlap would you recommend, and should I prepend the endpoint name to every chunk to improve retrieval quality?"

Aki Wijesundara
