Chunk at semantic boundaries
Good default split points are heading changes, list boundaries and section-level code examples. Splitting in the middle of those structures usually harms retrieval quality.
Semantic chunking is mostly about respecting document structure. If headings, paragraphs and code examples survive extraction, chunking becomes a controlled operation instead of a repair job.
A chunker should adapt to content shape, not force every section into the same size bucket. Clean extraction gives that flexibility without losing topical coherence.
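As a rough illustration of splitting on structure rather than on a fixed window, here is a minimal sketch. It assumes the extracted text is Markdown-like, with blank lines between blocks, "#" headings and fenced code. The names split_blocks, chunk_by_structure and max_chars are hypothetical and do not come from any particular library.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker (three backticks)
HEADING = re.compile(r"^#{1,6}\s")  # assumption: ATX-style "#" headings


def split_blocks(text):
    """Split text into blocks separated by blank lines.

    Fenced code blocks stay whole even when they contain blank lines.
    Consecutive list items (no blank line between them) stay in one block.
    """
    blocks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            # A blank line outside a fence closes the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_by_structure(text, max_chars=800):
    """Group blocks into chunks, starting a new chunk at every heading.

    max_chars is a soft budget: a block is never split, so an oversized
    block simply becomes its own chunk.
    """
    chunks, current, size = [], [], 0
    for block in split_blocks(text):
        starts_section = bool(HEADING.match(block))
        too_big = size + len(block) > max_chars
        if current and (starts_section or too_big):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += len(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because the size limit only decides where to close a chunk between blocks, chunk length varies with the shape of the content. A heading, a list or a code example is never cut in half. A production chunker would likely count tokens instead of characters and handle tables as well.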
Internal links
Chunk by meaning rather than by arbitrary token windows: preserve headings, lists and tables before segmentation.
Reduce noisy HTML and preserve semantic structure so every prompt and chunk carries more useful signal per token.
Expose web content to agents and retrieval systems through a reader-style API that prioritizes clarity over browser markup.