Why Chunking Matters
Consider processing a recipe book with different strategies:

| Strategy | Result |
|---|---|
| Fixed Size (5000 chars) | May split recipes mid-instruction |
| Semantic | Keeps complete recipes together based on meaning |
| Document | Each page becomes a chunk |
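To make the first two rows concrete, here is a minimal sketch in plain Python. It does not use any library API: `fixed_size_chunks` and `blank_line_chunks` are hypothetical helpers written for this illustration, and splitting on blank lines is only a crude stand-in for real semantic chunking. The chunk size of 40 is chosen to keep the example small.

```python
def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut text into consecutive slices of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def blank_line_chunks(text: str) -> list[str]:
    """Split on blank lines, so each recipe in the sample stays whole."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


# Two tiny "recipes" separated by a blank line (sample data for this sketch).
recipes = (
    "Pancakes: mix flour, milk, and eggs. Fry until golden.\n\n"
    "Omelette: beat eggs with salt. Cook gently and fold."
)

fixed = fixed_size_chunks(recipes, 40)
by_recipe = blank_line_chunks(recipes)

for chunk in fixed:
    print(repr(chunk))      # the first slice stops partway through a step
for chunk in by_recipe:
    print(repr(chunk))      # one complete recipe per chunk
```

The fixed-size slices are predictable in length but ignore where instructions begin and end, which is the trade-off the table describes.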
Available Strategies
| Strategy | Description |
|---|---|
| Fixed Size | Split into uniform chunks by character count |
| Semantic | Split at natural breakpoints based on meaning |
| Recursive | Split using multiple separators hierarchically |
| Document | Preserve document structure (sections, pages) |
| Markdown | Split by heading structure |
| CSV Row | Each row becomes a chunk |
| Agentic | AI determines optimal boundaries |
| Code | Split at function and class boundaries using AST analysis |
| Custom | Build your own strategy |
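As an illustration of what "multiple separators hierarchically" can mean, the following is a rough sketch of one common approach to recursive splitting, not the library's actual implementation. The function name `recursive_split` and the separator order (blank line, newline, space) are assumptions made for this example.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and only falls back to finer
    separators for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard character cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # This separator does not occur; try the next, finer one.
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Intro paragraph.\n\n"
    "Step one is short.\n"
    "Step two is a much longer line of instructions."
)
for chunk in recursive_split(sample, 30):
    print(repr(chunk))
```

Paragraph breaks are respected where possible, and only the overlong final line is broken at spaces, so no chunk exceeds the limit.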
Using with Readers
Pass a chunking strategy to any reader.
Choosing a Strategy
| Content Type | Recommended Strategy | Why |
|---|---|---|
| General text | Semantic | Maintains meaning and context |
| Structured docs | Document | Preserves sections and hierarchy |
| Markdown files | Markdown | Respects heading structure |
| CSV/tabular data | CSV Row | Each row is a logical unit |
| Source code | Code | Splits at function and class boundaries |
| Mixed content | Recursive | Handles multiple separator types |
| Need consistency | Fixed Size | Predictable chunk dimensions |
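The recommendations above could be encoded as a small lookup. This is a sketch only: `recommend_strategy` and the file-extension mapping are hypothetical and not part of any library, and the strategy names are plain strings taken from the table rather than real classes.

```python
from pathlib import Path

# Strategy names come from the table above. Mapping them to file
# extensions is an assumption made for this sketch.
STRATEGY_BY_EXTENSION = {
    ".md": "Markdown",
    ".csv": "CSV Row",
    ".py": "Code",
    ".txt": "Semantic",
}


def recommend_strategy(filename: str, default: str = "Recursive") -> str:
    """Return a recommended strategy name for a file.

    Unknown extensions fall back to "Recursive", the table's
    suggestion for mixed content.
    """
    suffix = Path(filename).suffix.lower()
    return STRATEGY_BY_EXTENSION.get(suffix, default)


for name in ("README.md", "sales.csv", "utils.py", "archive.bin"):
    print(name, "->", recommend_strategy(name))
```

A file extension is only a proxy for content type, so a real pipeline would likely let callers override the choice.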
Configuration
Most strategies accept configuration options.
Chunk Size Guidelines
| Chunk Size | Trade-off |
|---|---|
| Small (1000-3000 chars) | More precise retrieval, may lose context |
| Default (5000 chars) | Balanced precision and context |
| Large (8000+ chars) | More context, less targeted results |
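A quick back-of-the-envelope calculation shows how these sizes change the number of chunks produced. The sketch below assumes simple fixed-size splitting with no overlap. The 12,000-character document length and the representative sizes picked from each range are illustrative values, not library defaults.

```python
import math

# One representative size per row of the table above (assumed values).
SIZES = {"small": 2000, "default": 5000, "large": 8000}


def chunk_count(document_length: int, chunk_size: int) -> int:
    """Number of fixed-size chunks needed to cover the document."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return math.ceil(document_length / chunk_size)


document_length = 12_000  # characters, illustrative
counts = {label: chunk_count(document_length, size) for label, size in SIZES.items()}

for label, count in counts.items():
    print(f"{label}: {count} chunks of up to {SIZES[label]} chars")
```

More, smaller chunks give retrieval finer targets, while fewer, larger chunks carry more surrounding context per result.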