Chunking Parameters

Validated on 15 Apr 2026 • Last edited on 27 Apr 2026

DigitalOcean Knowledge Bases let you store, index, and retrieve data from private files, websites, Spaces buckets, and other sources to power retrieval-augmented generation with your own content.

Chunking divides documents into smaller units for indexing and retrieval. This reference describes chunking parameters and how they interact with embeddings models.

For guidance on choosing a strategy, see our chunking best practices.

Parameters

The following parameters determine how each chunking strategy divides and structures your documents. All parameters must remain within the embeddings model’s token limits.

| Parameter | Applies To | Definition | Recommendation |
|---|---|---|---|
| `max_chunk_size` | Section-Based, Semantic, Fixed Length | Maximum number of tokens per chunk. Minimum is 100; maximum depends on the embeddings model. For fixed-length chunking, this value is the exact split size. | Section-Based: 800 for stable, readable chunks. Fixed Length: 500 for predictable cost and performance. Semantic: 700 for balanced semantic precision and manageable chunk count. |
| `semantic_threshold` | Semantic | Sensitivity to semantic shifts, ranging from 0.0 to 1.0. Lower values allow more variation and produce fewer chunks; higher values enforce stricter similarity and may split sentences. | 0.5 balances chunk quantity with meaningful semantic grouping. |
| `parent_chunk_size` | Hierarchical | Token size of parent chunks, used to provide broad context. Must be larger than `child_chunk_size`. | 1500 for wide context windows without excessive token cost. |
| `child_chunk_size` | Hierarchical | Token size of child chunks, used for retrieval. Must be smaller than `parent_chunk_size`. | 300 for focused retrieval. |
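The fixed-length and hierarchical strategies above can be sketched in a few lines. This is an illustrative toy version, not the platform's implementation: it counts whitespace-separated words as tokens, whereas the service tokenizes with the embeddings model itself, and the function names are invented for this example.

```python
def fixed_length_chunks(text: str, max_chunk_size: int) -> list[str]:
    """Split text into chunks of exactly max_chunk_size tokens each
    (the final chunk may be shorter). Tokens here are whitespace words;
    a real tokenizer would produce different boundaries."""
    if max_chunk_size < 100:
        raise ValueError("max_chunk_size must be at least 100")
    tokens = text.split()
    return [
        " ".join(tokens[i : i + max_chunk_size])
        for i in range(0, len(tokens), max_chunk_size)
    ]


def hierarchical_chunks(text: str, parent_chunk_size: int = 1500,
                        child_chunk_size: int = 300):
    """Return (parent, children) pairs: parents carry broad context,
    children are the smaller units actually used for retrieval."""
    if child_chunk_size >= parent_chunk_size:
        raise ValueError("child_chunk_size must be smaller than parent_chunk_size")
    parents = fixed_length_chunks(text, parent_chunk_size)
    return [(p, fixed_length_chunks(p, child_chunk_size)) for p in parents]
```

For example, a 3,200-token document with the default sizes yields three parents (1500, 1500, and 200 tokens), and each full-size parent splits into five 300-token children.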

Model-specific recommendations:

• GTE Large (v1.5): 400
• E5 Large (v2): 256
• BGE M3: 400
• All-MiniLM-L6-v2: 128
• Multi-QA-mpnet-base-dot-v1: 256
• Qwen3 Embedding 0.6B: 400
