Chunking divides content into smaller pieces before they are embedded and stored in a vector database. The strategy you choose affects search quality and retrieval accuracy.
from agno.knowledge.chunking.semantic_chunking import SemanticChunking
from agno.knowledge.reader.pdf_reader import PDFReader

reader = PDFReader(
    chunking_strategy=SemanticChunking(),
)

Why Chunking Matters

Consider processing a recipe book with different strategies:
| Strategy | Result |
| --- | --- |
| Fixed Size (5000 chars) | May split recipes mid-instruction |
| Semantic | Keeps complete recipes together based on meaning |
| Document | Each page becomes a chunk |
The right strategy returns complete, relevant results. The wrong one returns fragments.
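The recipe-book comparison can be reproduced in a few lines of plain Python. This is only an illustrative sketch, not Agno's implementation: `fixed_size_chunks` and `paragraph_chunks` are hypothetical helpers, and splitting on blank lines is a simple stand-in for a strategy that respects content boundaries (real semantic chunking relies on embeddings).

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut the text every `size` characters, ignoring its structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines so each block of text stays whole."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


# Two tiny "recipes" separated by a blank line (made-up sample data).
recipes = (
    "Pancakes: mix flour, milk, and eggs. Fry until golden.\n\n"
    "Omelette: beat eggs with salt. Cook on low heat and fold."
)

fixed = fixed_size_chunks(recipes, 40)
by_recipe = paragraph_chunks(recipes)

print(fixed[0])      # stops partway through the pancake instructions
print(by_recipe[0])  # the complete pancake recipe
```

A query about pancakes that retrieves `fixed[0]` gets only part of the instructions, while `by_recipe[0]` contains the whole recipe.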

Available Strategies

Using with Readers

Pass a chunking strategy to any reader:
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.chunking.fixed_size_chunking import FixedSizeChunking
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector

reader = PDFReader(
    chunking_strategy=FixedSizeChunking(chunk_size=3000),
)

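# Placeholder connection string; point this at your own Postgres instance
db_url = "postgresql+psycopg://user:password@localhost:5432/dbname"

knowledge = Knowledge(
    vector_db=PgVector(table_name="docs", db_url=db_url),
)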

knowledge.insert(path="documents/", reader=reader)

Choosing a Strategy

| Content Type | Recommended Strategy | Why |
| --- | --- | --- |
| General text | Semantic | Maintains meaning and context |
| Structured docs | Document | Preserves sections and hierarchy |
| Markdown files | Markdown | Respects heading structure |
| CSV/tabular data | CSV Row | Each row is a logical unit |
| Source code | Code | Splits at function and class boundaries |
| Mixed content | Recursive | Handles multiple separator types |
| Need consistency | Fixed Size | Predictable chunk dimensions |
Each reader has a sensible default, but you can override it based on your content and retrieval needs.
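If you ingest many file types, the recommendations above can be encoded as a small lookup. The sketch below is hypothetical and not part of Agno: `STRATEGY_BY_EXTENSION` and `recommend_strategy` are illustrative names, the extension-to-strategy mapping is an assumption, and the function returns strategy names as strings rather than chunking objects.

```python
from pathlib import Path

# Assumed mapping from file extension to the strategy names in the table.
STRATEGY_BY_EXTENSION = {
    ".md": "Markdown",
    ".csv": "CSV Row",
    ".py": "Code",
    ".txt": "Semantic",
}


def recommend_strategy(path: str, default: str = "Recursive") -> str:
    """Return the recommended strategy name for a file, by extension."""
    suffix = Path(path).suffix.lower()
    # Unknown extensions fall back to the "mixed content" recommendation.
    return STRATEGY_BY_EXTENSION.get(suffix, default)


print(recommend_strategy("notes/README.md"))
print(recommend_strategy("exports/data.CSV"))
print(recommend_strategy("archive.bin"))
```

Treat the result as a starting point and adjust it after checking retrieval quality on your own content.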

Configuration

Most strategies accept configuration options:
# Fixed size with overlap
FixedSizeChunking(
    chunk_size=5000,       # Characters per chunk
    overlap=200,           # Overlap between chunks
)

# Semantic with threshold
SemanticChunking(
    similarity_threshold=0.7,  # Lower = more splits
)

# Recursive with custom separators
RecursiveChunking(
    separators=["\n\n", "\n", ". ", " "],
    chunk_size=4000,
)
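To show what an ordered list of separators means in practice, here is a minimal pure-Python sketch of recursive splitting. It is an assumption about the general technique, not Agno's `RecursiveChunking` code: `recursive_split` is a hypothetical function that tries the coarsest separator first and falls back to finer ones only for pieces that are still too large. For simplicity it drops the separator wherever it starts a new chunk.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the first separator, recursing with finer ones on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    head, *rest = separators
    chunks: list[str] = []
    current = ""
    for piece in text.split(head):
        candidate = piece if not current else current + head + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too large: retry with the finer separators.
            chunks.extend(recursive_split(piece, rest, chunk_size))
    if current:
        chunks.append(current)
    return chunks


text = "Intro paragraph.\n\nStep one. Step two is longer than the rest.\n\nDone."
chunks = recursive_split(text, ["\n\n", "\n", ". ", " "], chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Short paragraphs survive intact, and only the long middle paragraph is broken down at sentence and then word boundaries.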

Chunk Size Guidelines

| Chunk Size | Trade-off |
| --- | --- |
| Small (1000-3000 chars) | More precise retrieval, may lose context |
| Default (5000 chars) | Balanced precision and context |
| Large (8000+ chars) | More context, less targeted results |
Smaller chunks work better for specific questions. Larger chunks work better when context matters.
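Chunk size also determines how many chunks, and therefore how many embeddings, a document produces. The sketch below estimates that count for a fixed-size splitter. `chunk_count` is a hypothetical helper that assumes each chunk starts `chunk_size - overlap` characters after the previous one and that splitting stops once a chunk reaches the end of the document; a real splitter may differ at the edges.

```python
import math


def chunk_count(doc_chars: int, chunk_size: int, overlap: int = 0) -> int:
    """Estimate how many fixed-size chunks a document of `doc_chars` yields."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if doc_chars <= chunk_size:
        return 1 if doc_chars > 0 else 0
    step = chunk_size - overlap  # distance between consecutive chunk starts
    return 1 + math.ceil((doc_chars - chunk_size) / step)


# A 100,000-character document at the sizes from the table above.
for size in (1000, 5000, 8000):
    print(size, chunk_count(100_000, size))

# Adding overlap shortens the step, so the same document yields more chunks.
print(chunk_count(100_000, 5000, overlap=200))
```

Under these assumptions, moving from 1000 to 8000 characters cuts the number of stored chunks from 100 to 13, which is the storage side of the precision-versus-context trade-off.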
