Document Parsing - Raycaster AI

Overview

When you upload a PDF or DOCX file, Raycaster Doc automatically parses it into structured Markdown in the background. This parsed content powers AI features such as document reading in chat, semantic search, and review analysis, with no action required from you.

How Parsing Works

Upload triggers parsing

When a file is uploaded or replaced, a SHA-256 hash of the source file is computed.

Cache check

If parsed content already exists for this exact file hash, parsing is skipped entirely (cache hit).

Parse job queued

For new files, a background parse job is queued and processed asynchronously.

Parsed content stored

The output — per-page Markdown files and extracted media — is stored in a separate cache bucket.
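The four steps above can be sketched as a small content-addressed cache. This is a minimal illustration, not Raycaster's implementation: the in-memory `parsed_cache`, `parse_queue`, and `in_flight` structures and the function names are hypothetical stand-ins for the real cache bucket and job queue, and `parse_pages` is any callable you supply.

```python
import hashlib
from collections import deque

# Hypothetical in-memory stand-ins for the cache bucket and the background queue.
parsed_cache = {}      # file hash -> list of per-page Markdown strings
parse_queue = deque()  # (file hash, file bytes) pairs waiting to be parsed
in_flight = set()      # hashes that are queued but not yet parsed


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the source file's bytes."""
    return hashlib.sha256(data).hexdigest()


def handle_upload(data: bytes) -> str:
    """Return 'cache_hit' if parsed output exists for this hash, else 'queued'."""
    file_hash = sha256_of(data)
    if file_hash in parsed_cache:
        return "cache_hit"
    # Queue each distinct file at most once, so repeat uploads stay idempotent.
    if file_hash not in in_flight:
        in_flight.add(file_hash)
        parse_queue.append((file_hash, data))
    return "queued"


def run_next_job(parse_pages) -> str:
    """Parse the oldest queued file and store its pages under the file hash."""
    file_hash, data = parse_queue.popleft()
    parsed_cache[file_hash] = parse_pages(data)
    in_flight.discard(file_hash)
    return file_hash
```

Because the cache key is derived only from the file's bytes, two uploads of the same file resolve to the same entry regardless of filename or uploader.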

Parsing by File Type

Format	Parser	Output
PDF	Mistral OCR	Per-page Markdown + extracted images
DOCX	Reducto	Per-page Markdown + media
DOC	Reducto (legacy)	Per-page Markdown
Markdown / Plaintext	Native (no parsing needed)	Indexed directly
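
One way to picture the routing in the table above is a lookup keyed by file extension. The parser labels come from the table; the extension list (`.md`, `.txt`) and the `select_parser` helper are assumptions made for illustration, and the real service may detect file types differently.

```python
from pathlib import Path

# Parser labels mirror the table; the extension mapping itself is an assumption.
# None means the file is indexed directly without a parsing step.
PARSER_BY_EXTENSION = {
    ".pdf": "Mistral OCR",
    ".docx": "Reducto",
    ".doc": "Reducto (legacy)",
    ".md": None,
    ".txt": None,
}


def select_parser(filename: str):
    """Return the parser label for a filename, or None if no parsing is needed."""
    ext = Path(filename).suffix.lower()
    if ext not in PARSER_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {ext or filename}")
    return PARSER_BY_EXTENSION[ext]
```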

Cache Properties

Hash-based — Identical files uploaded by different users reuse the same parsed output
Idempotent — Re-uploading the same file doesn’t trigger redundant parsing
Automatic cleanup — When an artifact is deleted or a project is removed, its cached content is cleaned up

Parse Status

Each artifact tracks its parsing state internally:

Status	Meaning
`none`	No parsed content exists yet
`pending`	Parse job is queued or in progress
`ready`	Parsed content is available for AI features
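
The three states can be modeled as a small enum. The status values come from the table above; the transition map is an assumption inferred from the upload flow (for example, `none` to `ready` on a cache hit, and `ready` back to `none` when a file is replaced), and the helper names are hypothetical.

```python
from enum import Enum


class ParseStatus(str, Enum):
    NONE = "none"        # no parsed content exists yet
    PENDING = "pending"  # parse job is queued or in progress
    READY = "ready"      # parsed content is available for AI features


# Assumed transitions, not confirmed by the documentation.
ALLOWED_TRANSITIONS = {
    ParseStatus.NONE: {ParseStatus.PENDING, ParseStatus.READY},
    ParseStatus.PENDING: {ParseStatus.READY},
    ParseStatus.READY: {ParseStatus.NONE},
}


def can_transition(current: ParseStatus, target: ParseStatus) -> bool:
    """Check whether a status change is allowed under the assumed map."""
    return target in ALLOWED_TRANSITIONS[current]


def is_available(status: ParseStatus) -> bool:
    """Parsed content can be read by AI features only once it is ready."""
    return status is ParseStatus.READY
```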

What Uses Parsed Content

Parsed Markdown is consumed by several AI features behind the scenes:

Chat view tool — When the agent reads a document in text mode, it uses parsed Markdown for PDFs and DOCX files
Semantic search — Parsed content is chunked, embedded, and indexed in the vector database
Review runs — The review agent reads parsed content to analyze documents

You never interact with parsed files directly. The document viewer always shows the original source file. Parsing is a backend optimization that makes AI features fast and accurate.
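
To make the "chunked" step of semantic search concrete, here is a rough sketch of splitting parsed Markdown into chunks before embedding. The documentation does not describe the actual chunking strategy, so the paragraph-based approach, the `chunk_markdown` name, and the `max_chars` default are all assumptions.

```python
def chunk_markdown(text: str, max_chars: int = 500) -> list:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Chunks stay within max_chars where possible; a single paragraph longer
    than max_chars is kept whole rather than split mid-paragraph.
    """
    chunks = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded and written to the vector index; that part depends on the embedding model and database in use and is left out here.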

Limitations

PDF parsing supports files up to 25 MB
Very large documents may take a few minutes to parse
Complex layouts (multi-column, heavy tables) may have reduced parsing accuracy — use the visual mode in chat for layout-sensitive analysis
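
If you want to check a PDF against the size limit before uploading, a client-side guard might look like the sketch below. It assumes "25 MB" means 25 × 1024 × 1024 bytes, which the documentation does not specify, and the helper names are hypothetical.

```python
import os

# Assumption: the 25 MB limit is measured in binary megabytes.
MAX_PDF_BYTES = 25 * 1024 * 1024


def pdf_within_limit(size_bytes: int) -> bool:
    """Return True if a PDF of this size is within the assumed parsing limit."""
    return size_bytes <= MAX_PDF_BYTES


def check_pdf_file(path: str) -> bool:
    """Look up the file's size on disk and compare it with the limit."""
    return pdf_within_limit(os.path.getsize(path))
```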
