module Memo::Chunking
Overview
Text chunking for semantic search
Splits large text into semantically meaningful chunks based on configurable limits:
- Text < no_chunk_threshold tokens: Keep whole (no chunking)
- Text > no_chunk_threshold tokens: Split on paragraph breaks (\n\n)
- Paragraphs > max_tokens: Further split on sentences
- Sentences < min_tokens: Combine with next sentence
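The rules above can be sketched roughly as follows. This is an illustrative approximation, not the library's source: `SketchConfig`, `sketch_tokens`, and `sketch_chunk` are hypothetical names, the field names are taken from the rule list, and the sentence-splitting regex and the handling of values exactly at a limit are assumptions.

```crystal
# Hypothetical stand-in for Config::Chunking; the real type may have
# different fields or defaults.
record SketchConfig,
  no_chunk_threshold : Int32,
  max_tokens : Int32,
  min_tokens : Int32

# Rough token estimate: characters divided by 4, rounded down.
def sketch_tokens(text : String) : Int32
  text.size // 4
end

def sketch_chunk(text : String, config : SketchConfig) : Array(String)
  # Rule 1: text below the threshold is kept whole.
  return [text] if sketch_tokens(text) < config.no_chunk_threshold

  chunks = [] of String

  # Rule 2: split on paragraph breaks (blank lines).
  text.split("\n\n", remove_empty: true).each do |paragraph|
    para = paragraph.strip
    next if para.empty?

    if sketch_tokens(para) <= config.max_tokens
      chunks << para
      next
    end

    # Rule 3: oversized paragraph, split after sentence-ending punctuation
    # (assumed heuristic).
    buffer = ""
    para.split(/(?<=[.!?])\s+/).each do |sentence|
      buffer = buffer.empty? ? sentence : "#{buffer} #{sentence}"
      # Rule 4: keep combining sentences until the buffer reaches min_tokens.
      if sketch_tokens(buffer) >= config.min_tokens
        chunks << buffer
        buffer = ""
      end
    end
    # A trailing fragment shorter than min_tokens is emitted as its own chunk.
    chunks << buffer unless buffer.empty?
  end

  chunks
end

config = SketchConfig.new(no_chunk_threshold: 5, max_tokens: 10, min_tokens: 3)
sketch_chunk("First paragraph here.\n\nSecond paragraph here.", config).each do |chunk|
  puts chunk
end
```

Note that this sketch does not re-check `max_tokens` after combining sentences, so a single very long sentence still yields an oversized chunk; whether the actual implementation guards against that is not stated here.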
Extended Modules
Defined in:
memo/chunking.cr
Instance Method Summary
- #chunk_text(text : String, config : Config::Chunking) : Array(String)
  Chunk text into segments based on configuration
- #estimate_tokens(text : String) : Int32
  Estimate token count (rough approximation: chars / 4)
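As a minimal sketch of the chars / 4 approximation, assuming character count (not byte count) and rounding down, neither of which is confirmed by this page. `estimate_tokens_sketch` is a hypothetical name, not the library method:

```crystal
# Hypothetical illustration of a chars / 4 token estimate.
def estimate_tokens_sketch(text : String) : Int32
  # String#size counts characters; // is integer (floor) division.
  text.size // 4
end

puts estimate_tokens_sketch("hello world") # 11 characters, so prints 2
```

An estimate like this avoids running a real tokenizer, at the cost of accuracy for text whose average token length differs much from four characters.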
Instance Method Detail
def chunk_text(text : String, config : Config::Chunking) : Array(String)
Chunk text into segments based on configuration
Returns an array of chunk strings