Optimize before embedding
The cheapest chunk is the one you never create. Boilerplate removal and structure preservation should happen before embeddings, not after retrieval disappoints.
RAG cost is heavily shaped by input quality. If your ingestion layer passes noisy HTML forward, you pay more to embed, store, retrieve and prompt with content that never helps answer the question.
Use the same tokenizer for before and after measurements so teams can set budgets and catch regressions over time. Heuristic size estimates are not precise enough in production.
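A minimal sketch of that measurement, assuming a hypothetical `measure` helper and budget check that are not part of any particular library. The key point is that one `count_tokens` callable is used for both the raw and the cleaned text. The regex counter here is only a stand-in so the example runs on its own; in production you would pass a function that wraps the actual tokenizer of your embedding or chat model.

```python
import re
from dataclasses import dataclass
from typing import Callable


def standin_count_tokens(text: str) -> int:
    """Stand-in counter (an assumption for this sketch): words and punctuation marks.

    Replace with a wrapper around your model's real tokenizer in production.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


@dataclass(frozen=True)
class TokenReport:
    before: int
    after: int

    @property
    def saved(self) -> int:
        return self.before - self.after

    @property
    def reduction(self) -> float:
        # Share of tokens removed; 0.0 when there was nothing to measure.
        return self.saved / self.before if self.before else 0.0


def measure(raw: str, cleaned: str, count_tokens: Callable[[str], int]) -> TokenReport:
    """Count both versions with the SAME tokenizer so the numbers are comparable."""
    return TokenReport(before=count_tokens(raw), after=count_tokens(cleaned))


def within_budget(report: TokenReport, max_tokens: int, min_reduction: float) -> bool:
    """Hypothetical regression check: fail if a page grows past budget or cleanup stops helping."""
    return report.after <= max_tokens and report.reduction >= min_reduction


raw_html = "<div class='nav'>Home</div><p>Hello world.</p>"
cleaned_text = "Hello world."
report = measure(raw_html, cleaned_text, standin_count_tokens)
print(report.before, report.after, round(report.reduction, 2))
```

Logging `before`, `after` and `reduction` per page on every crawl gives a time series you can alert on when a site redesign quietly reintroduces noise.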
Repeated navigation and marketing modules multiply across crawls. Removing them once in the ingestion layer produces compounding savings across indexing and prompt usage.
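One way to remove repeated modules in the ingestion layer is to look at how many crawled pages share the same block of text. The sketch below assumes each page has already been split into text blocks, and that a block appearing on at least 80% of pages is boilerplate. The function names and the threshold are illustrative assumptions, not a prescribed method.

```python
import re
from collections import Counter


def normalize(block: str) -> str:
    """Collapse whitespace and case so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", block).strip().lower()


def find_boilerplate(pages: list[list[str]], min_share: float = 0.8) -> set[str]:
    """Return normalized blocks that appear on at least `min_share` of the pages."""
    doc_freq: Counter[str] = Counter()
    for blocks in pages:
        # Use a set so a block repeated within one page counts once for that page.
        doc_freq.update({normalize(b) for b in blocks})
    threshold = min_share * len(pages)
    return {block for block, n in doc_freq.items() if n >= threshold}


def strip_boilerplate(blocks: list[str], boilerplate: set[str]) -> list[str]:
    """Keep only the blocks that were not flagged as site-wide boilerplate."""
    return [b for b in blocks if normalize(b) not in boilerplate]


# Hypothetical crawl: the navigation bar repeats on every page.
pages = [
    ["Home | Docs | Pricing", "Intro to chunking"],
    ["Home | Docs  | Pricing", "Tokenizer notes"],
    ["Home | Docs | Pricing", "Start your free trial", "Table handling"],
]
boilerplate = find_boilerplate(pages)
cleaned_pages = [strip_boilerplate(blocks, boilerplate) for blocks in pages]
print(cleaned_pages)
```

Because the flagged set is computed once per crawl, every later stage (chunking, embedding, prompting) works on the smaller input. A block that appears on only a few pages, such as the trial banner above, falls below the threshold and is kept, so a lower threshold or an explicit selector list may be needed for less frequent marketing modules.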
Internal links
Reduce noisy HTML and preserve semantic structure so every prompt and chunk carries more useful signal per token.
Chunk by meaning instead of arbitrary token windows by preserving headings, lists and tables before segmentation.
Shrink token spend by converting bloated HTML into compact Markdown before chunking, prompting or embedding.