What the API needs to preserve
Production ingestion needs stable headings, paragraphs, lists, tables and code blocks because those are the structures retrieval systems and agents rely on.
AIngestor returns Markdown that is easier to chunk, cache and diff than raw HTML, while still keeping the original document hierarchy intact.
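To illustrate why preserved hierarchy makes Markdown easier to chunk, here is a minimal sketch of a heading-aware splitter. It is not part of AIngestor; `Chunk` and `chunk_by_heading` are hypothetical names, and the sketch assumes ATX-style (`#`) headings and simple fenced code blocks.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# ATX headings only ("# Title" to "###### Title"); Setext headings are ignored.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Simplification: any line starting with ``` or ~~~ toggles "inside a fence".
FENCE = re.compile(r"^\s*(```|~~~)")


@dataclass
class Chunk:
    """One section of the document plus the heading trail leading to it."""
    path: list[str]
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines).strip()


def chunk_by_heading(markdown: str) -> list[Chunk]:
    """Split Markdown at headings, never inside a fenced code block."""
    chunks = [Chunk(path=[])]          # holds any text before the first heading
    trail: list[tuple[int, str]] = []  # (level, title) of the open headings
    in_fence = False

    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            # Close headings at the same or a deeper level before opening this one.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, title))
            chunks.append(Chunk(path=[t for _, t in trail]))
        chunks[-1].lines.append(line)

    # Drop chunks that ended up empty (e.g. no text before the first heading).
    return [c for c in chunks if c.text]
```

Each chunk carries its heading trail (for example `["Guide", "Install"]`), so a retrieval system can attach section context to every chunk. Lines that look like headings inside a code fence stay with their code block.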
- Normalize public URLs and raw HTML into one output format.
- Report tokens before and after conversion for budget visibility.
- Keep useful links instead of flattening everything into plain text.
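The last two points can be checked on the client side. The sketch below is an illustration only: it does not call AIngestor and does not use its tokenizer. `estimate_tokens` is a crude word-and-punctuation count standing in for a real tokenizer, and `conversion_report` is a hypothetical helper that compares an HTML input with an already converted Markdown output.

```python
import re
from html.parser import HTMLParser


def estimate_tokens(text: str) -> int:
    """Rough token proxy: word runs plus individual punctuation marks.

    Assumption: this only approximates what a model tokenizer would count.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


class _HrefCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Inline Markdown links: [text](url) or [text](url "title").
# Image syntax ![alt](src) matches as well, which is harmless here.
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def conversion_report(html: str, markdown: str) -> dict:
    """Compare token estimates and check which source links survived."""
    collector = _HrefCollector()
    collector.feed(html)
    source_links = set(collector.hrefs)
    kept_links = set(MD_LINK.findall(markdown))

    before = estimate_tokens(html)
    after = estimate_tokens(markdown)
    return {
        "tokens_before": before,
        "tokens_after": after,
        "tokens_saved": before - after,
        "links_missing": sorted(source_links - kept_links),
    }
```

A non-empty `links_missing` list indicates that a conversion flattened a link into plain text, which is the failure mode the last bullet describes.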