RAG Chunking and the Architecture of Information
Working out how to optimise your content for LLMs?
AI search systems break content into small, distinct units of information (or ‘chunks’), grouping related concepts together rather than treating a page as an undifferentiated mass. This process is called chunking, and it lets an LLM quickly identify the parts of your content that answer a reader’s question.
So, imagine if you had a tool that could visualize how your content is chunked by LLMs.
That’s what our search engineers did.
They built a plug-in that gives you a blueprint for how LLMs read your content.
This is a hot topic because organisations that understand how information chunking works are better positioned to benefit from AI search.
This article reframes chunking not as a technical implementation detail, but as a lens through which we can understand the future of search and content structure.
Chunking in AI
As Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems become core components of AI-driven search and digital experiences, “chunking” has moved into the mainstream for anyone in the business of organising and retrieving information.
Understanding how chunking works offers critical insights into how generative systems retrieve, reason, and assemble information.
Read on for detailed insights and examples of how SEOs can implement chunking strategies in their search and content initiatives.
RAG reasoning
In modern implementations of RAG, chunking isn’t just about breaking content into manageable blocks; it’s about structuring knowledge in a way that enables reasoning.
Recent advances in systems like Search-o1, PIKE-RAG, and RAGFlow demonstrate how chunking affects what information is retrieved and how LLMs reason through it.
RAG-enhanced reasoning workflows typically do the following (sketched in code after this list):
- Decompose a query into sub-questions
- Retrieve relevant chunks for each sub-question
- Build a reasoning chain by iterating over these chunks
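A minimal sketch of that loop, assuming a hypothetical retrieve() function over a chunk index and an llm client with decompose() and answer() helpers (none of these name a specific library’s API):

def answer_with_reasoning(query, retrieve, llm):
    # Hypothetical sketch: retrieve() and llm are stand-ins, not a real library
    sub_questions = llm.decompose(query)    # 1. decompose into sub-questions
    reasoning_chain = []
    for sub_q in sub_questions:
        chunks = retrieve(sub_q, top_k=3)   # 2. retrieve relevant chunks
        step = llm.answer(sub_q, context=chunks)
        reasoning_chain.append(step)        # 3. each step feeds the next hop
    return llm.answer(query, context=reasoning_chain)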
This means the shape and granularity of your chunks determine what kind of reasoning steps the model can take. Poor chunking can:
- Obscure important relationships
- Cause the model to miss key details
- Force reasoning steps to be overly shallow or disconnected
Good chunking, by contrast, enables iterative inference. Each chunk retrieved becomes the foundation for the next question, the next hop, the next hypothesis – potentially the next Query Fan-Out.
Think of chunking as the layout of a workspace. When it’s clean, legible, and logically organised, reasoning flows more naturally.
What is chunking in a RAG system?
In Retrieval-Augmented Generation, chunking refers to the preprocessing step where documents are divided into smaller, meaningful pieces (chunks) that can be embedded, stored in a vector database, and later retrieved in response to a query.
Each chunk is treated as a retrievable unit of context.
The quality of these chunks (their boundaries, semantic coherence, and metadata) directly influences the performance of retrieval and, by extension, the relevance and clarity of the generated response.
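As a concrete, self-contained illustration of that cycle, here is a toy sketch that uses a bag-of-words count as a stand-in for a real embedding model and vector database:

import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would call an embedding model
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Index" each chunk as a retrievable unit of context
chunks = [
    "Chunking splits documents into retrievable units.",
    "Vector databases store embeddings for similarity search.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieve the chunk closest to the query
query = embed("How are documents split for retrieval?")
print(max(index, key=lambda item: cosine(query, item[1]))[0])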
Why chunking matters for information retrieval
From an information retrieval (IR) perspective, chunking is about trade-offs:
- Recall vs. precision: Larger chunks contain more context but may introduce noise. Smaller chunks are precise but might miss supporting context.
- Indexing efficiency: Uniform chunk sizes simplify vector database storage and lookup.
- Semantic coherence: Semantically meaningful chunks yield higher relevance in retrieval.
Well-chunked data is easier to retrieve accurately, reason over effectively, and rank meaningfully. Poor chunking leads to retrieval of irrelevant, redundant, or incomplete context.
Core chunking strategies (and IR implications)
Modern RAG implementations use a wide range of chunking strategies. Each brings its own assumptions about structure, meaning, and retrievability.
Here are 11 primary methods used for chunking.
1. Fixed-length chunking
Divides text by token or character count, often with overlap.
from langchain_text_splitters import CharacterTextSplitter

# Fixed-size character chunks with a 50-character overlap
splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(document)
- Strength: Simple and efficient with a predictable chunk size
- Weakness: May break semantic units mid-sentence or mid-paragraph
2. Sentence-based chunking
Splits content at sentence boundaries using NLP tools.
import nltk

nltk.download('punkt')
sentences = nltk.sent_tokenize(document)
- Strength: Maintains coherent, atomic units of meaning
- Weakness: May lack broader context for complex answers
3. Paragraph-based chunking
Divides by paragraphs, typically suitable for editorial content.
paragraphs = document.split("\n\n")
- Strength: Captures self-contained ideas or arguments
- Weakness: Paragraph lengths may be highly variable
4. Sliding window chunking
Creates overlapping chunks with a fixed stride to preserve continuity.
def sliding_window(tokens, window_size=500, stride=100):
    # Each window starts `stride` tokens after the last, so consecutive
    # chunks overlap by window_size - stride tokens
    return [tokens[i:i + window_size] for i in range(0, len(tokens), stride)]
- Strength: Maintains context across boundaries
- Weakness: Can introduce redundancy and increase indexing size
5. Semantic chunking
Uses embeddings or ML to split text based on topical coherence.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Approximates semantic boundaries by preferring paragraph, then line,
# then sentence separators before falling back to raw character counts
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". "],
    chunk_size=500,
    chunk_overlap=100,
)
chunks = splitter.split_text(document)
- Strength: Tailors chunks to meaning, not format
- Weakness: Computationally expensive
6. Recursive chunking
Applies hierarchical splitting, for example:
section → paragraph → sentence → tokens

def recursive_split(text, max_tokens):
    # Base case: the text already fits within the token budget
    if len(text.split()) <= max_tokens:
        return [text]
    # Try progressively finer separators: paragraphs first, then sentences
    parts = text.split("\n\n")
    if len(parts) == 1:
        parts = text.split(". ")
    if len(parts) == 1:
        # Fallback: hard-split by words so an unsplittable span cannot recurse forever
        words = text.split()
        return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]
    chunks = []
    for part in parts:
        chunks.extend(recursive_split(part, max_tokens))
    return chunks
- Strength: Preserves document logic
- Weakness: Requires fallbacks and good structure recognition
7. Context-enriched chunking
Attaches summaries or adjacent content to each chunk to increase context.
def add_contextual_metadata(chunks):
    return [
        f"Context: {chunks[i - 1] if i > 0 else ''}\n\nContent: {chunk}"
        for i, chunk in enumerate(chunks)
    ]
- Strength: Improves continuity across chunks
- Weakness: Increases chunk size and retrieval complexity
8. Agentic chunking
Uses LLMs to dynamically decide chunk boundaries based on content understanding.
# Pseudocode for LLM-assisted chunking
llm_prompt = f"Break this document into meaningful segments: {document}"
chunks = llm.call(llm_prompt)
- Strength: Adaptive to nuance and structure
- Weakness: High inference cost, not deterministic
9. Subdocument chunking
Combines chunks with high-level summaries of their source document.
summarized_chunks = [
    f"Summary: {summary}\n\nContent: {chunk}"
    for chunk, summary in zip(chunks, summaries)
]
- Strength: Supports multi-level reasoning and hierarchy
- Weakness: More metadata management required
10. Hybrid chunking
Combines multiple chunking methods (e.g., semantic + sliding window) for tailored pipelines.
- Strength: Flexible and customisable
- Weakness: Higher engineering overhead
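A minimal sketch of one such hybrid, treating paragraph boundaries as a cheap proxy for semantic units and applying a sliding window only to oversized paragraphs (window and stride values are illustrative):

def hybrid_chunk(document, window_size=500, stride=400):
    chunks = []
    for para in document.split("\n\n"):  # semantic-ish first pass
        tokens = para.split()
        if len(tokens) <= window_size:
            chunks.append(para)
        else:
            # Sliding-window second pass; overlap = window_size - stride
            chunks.extend(
                " ".join(tokens[i:i + window_size])
                for i in range(0, len(tokens), stride)
            )
    return chunks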
11. Modality-specific chunking
Applies different rules for tables, code, images, and plain text.
def modality_chunk(doc):
    # chunk_code() and semantic_chunk() are assumed modality-specific helpers
    if "```" in doc:
        return chunk_code(doc)      # fenced code: use code-aware chunking
    elif "|" in doc:
        return [doc]                # table: keep it whole
    else:
        return semantic_chunk(doc)  # plain text: semantic chunking
- Strength: Improves handling of mixed-format content
- Weakness: Requires dedicated logic per modality
Content chunking strategy suitability by content type
Different chunking mechanisms are better suited to specific types of content.
| Content type | Potentially most effective strategies |
| --- | --- |
| Long-form articles | Semantic, Paragraph, Recursive |
| Short blog posts | Sentence, Fixed-Length |
| Product pages | Semantic, Context-Enriched, Subdocument |
| Lead-gen landing pages | Hybrid, Context-Enriched, Semantic |
| Technical documentation | Recursive, Modality-Specific, Semantic |
| Code & API references | Modality-Specific, Recursive |
| FAQs & How-tos | Sentence, Context-Enriched |
Chunking in context: reasoning, metadata, and retrieval logic
Chunking is no longer just a method of segmentation; it’s a way to prepare information for inference and iterative reasoning.
Modern RAG systems are often expected to go beyond “retrieve and answer.”
They must synthesise multiple sources, infer relationships, and identify missing context.
- Chunk granularity affects reasoning depth: Overly fine chunks increase retrieval volume but dilute semantic continuity. Overly coarse chunks may include irrelevant material.
- Chunk order and structure influence retrieval sequences: Hierarchically organised chunks (e.g., via recursive or semantic chunking) enable stepwise traversal across a domain.
- Metadata is retrieval logic: It’s not just labels. Metadata fields like chunk_type, retrieval_score, or anticipated_question drive filtering, scoring, and follow-up queries (see the sketch after this list).
- RAG as a reasoning engine: In agentic systems like PIKE-RAG or Search-o1, chunk traversal mirrors a reasoning path, each chunk retrieved potentially triggers the next sub-question or inference.
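As a small illustration of metadata acting as retrieval logic, the sketch below filters and ranks chunks on hypothetical chunk_type and retrieval_score fields (the records themselves are made up for the example):

chunks = [
    {"content": "Plans start at £99/month.", "chunk_type": "pricing",
     "anticipated_question": "How much does it cost?", "retrieval_score": 0.91},
    {"content": "We were founded in 2012.", "chunk_type": "about",
     "anticipated_question": "Who are you?", "retrieval_score": 0.42},
]

# Metadata, not raw text, drives the filtering and scoring here
relevant = sorted(
    (c for c in chunks if c["chunk_type"] == "pricing" and c["retrieval_score"] > 0.5),
    key=lambda c: c["retrieval_score"],
    reverse=True,
)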
In this way, chunking determines what is thinkable.
A well-chunked corpus (the complete collection of documents or content that has been chunked and prepared for retrieval by the RAG system) allows the model to “walk” through a logical structure, discover gaps, iterate toward specificity, or gracefully end reasoning.
For SEOs and information architects, this means we’re not just trying to make content crawlable, but making it navigable in a reasoning graph.
How SEOs use content chunking to their advantage
While SEOs aren’t building RAG pipelines, we are in the business of structuring information for discoverability and interpretation.
From an information retrieval perspective, these chunking insights suggest a few key lessons.
Structure content for semantic retrieval
Modern retrieval depends on clear boundaries such as sections, lists, headings, and FAQs. Align your HTML and content format to create retrievable units of meaning.
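For instance, a retrieval pipeline might carve a page into heading-scoped sections. A minimal sketch using BeautifulSoup, with illustrative HTML and heading levels:

from bs4 import BeautifulSoup

html = "<h2>Pricing</h2><p>Plans start at ...</p><h2>FAQ</h2><p>How do I ...</p>"
soup = BeautifulSoup(html, "html.parser")

sections = []
for heading in soup.find_all(["h1", "h2", "h3"]):
    body = []
    for sibling in heading.find_next_siblings():
        if sibling.name in ("h1", "h2", "h3"):
            break  # the next heading starts a new retrievable unit
        body.append(sibling.get_text(" ", strip=True))
    sections.append({"heading": heading.get_text(strip=True), "content": " ".join(body)})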
Design content as pre-chunks
Think of each section as a self-contained answer candidate. This mirrors how vector stores handle pre-chunked, embedded units. Use semantic boundaries that align with user intent.
Focus on meaningful metadata
Treat structured data not just as SEO markup, but as IR metadata. It enhances retrievability in LLMs. Section-level tags, question labels, and timestamps improve downstream relevance.
Model the document as a retrieval graph
Ask yourself, “How will a system navigate this content?”
Will it traverse headings like a tree? Will it jump between FAQs like nodes? Design internal linking, hierarchy, and structure with this mental model.
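One way to pressure-test this is to model your sections and internal links as a small graph and check what a traversal can actually reach; a hypothetical sketch:

from collections import deque

# Hypothetical page structure: each section lists the sections it links to
page_graph = {
    "intro": ["features", "faq"],
    "features": ["pricing"],
    "pricing": ["faq"],
    "faq": [],
}

def reachable_from(start, graph):
    # Breadth-first traversal: which sections can a system reach from here?
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(reachable_from("intro", page_graph))  # {'intro', 'features', 'pricing', 'faq'}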
Prepare for reasoning systems, not just search engines
Future retrieval systems will reason, not just match. Chunking is your chance to guide the path of that reasoning. Consider:
- What gets retrieved?
- What follows what?
- When is the answer complete?
TL;DR
- Start with semantic splitting: Use document structure, like headings, FAQs, and paragraphs, as the foundation for chunking.
- Keep chunk sizes manageable: Aim for 300–800 tokens as a general guideline to stay within LLM and embedding context windows, unless you have specific LLMs and parameters to target.
- Apply recursive splitting where necessary: If a section is too large, break it down progressively, such as paragraph → sentence → tokens.
- Use overlap to preserve continuity: A 10–20% token overlap between adjacent chunks helps maintain coherence in most LLMs.
- Add meaningful metadata: Tag chunks with helpful fields such as titles, section names, or timestamps to support downstream retrieval logic.
- Choose simple, portable formats: Use JSON or JSON-LD to structure and transmit chunk metadata.
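A minimal sketch of one such chunk record, serialised as JSON (every field value here is illustrative):

import json

chunk_record = {
    "title": "Pricing FAQ",
    "section": "FAQ",
    "chunk_type": "faq_answer",
    "timestamp": "2025-01-15",
    "content": "Plans start at ...",
}
print(json.dumps(chunk_record, indent=2))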
Chunking is cognitive design
Chunking is a theory of how we package and transmit meaning. For RAG systems, it’s the interface between raw data and intelligent response.
For SEOs, content chunking is a call to move beyond headlines and keywords, toward designing documents that align with how modern systems actually retrieve and reason. Whether you’re structuring a knowledge base, a product catalogue, or a lead gen landing page, chunking reminds us that clarity is retrieval power.
We’re not just optimising for search algorithms anymore. SEOs who embrace content chunking stand to have an outsized advantage in AI search.
Find out more about the benefits of AI search for your business
No pressure. No obligation. Just 20 minutes of actionable insight.
On your free video call, we’ll cover:
- Your business and marketing goals
- How our SEO approach reduces reliance on paid ads
- How we de-risk your brand visibility with future-ready strategy
- How AI search can give you a competitive edge
At the end of the call, you’ll know exactly what to do next, whether or not you work with us. No sales. Just value and options.
Arrange discovery session