Files and Retrieval Advanced Guide #

Files & Retrieval is a processing chain, not a single switch. Upload limits, extraction, OCR, full-context injection, vectorization, semantic recall, RAG assembly, and context compaction all affect final answer quality. Administrators should know which setting controls which part of the chain and what happens when a dependency is unavailable.

Processing Chain #

After a user uploads a file, the system checks upload limits, stores and identifies the file, extracts text or runs OCR, then chooses full-context injection or retrieval. If extraction, indexing, or retrieval is unavailable, the system falls back to limited full text, image/attachment context, or the user question only.

From an administrator perspective, the chain has six stages:

1. Check file count, size, type, and user quota.
2. Store the file and identify type.
3. Extract text, parse documents, or run OCR.
4. Decide between full context and vector retrieval.
5. Recall relevant content for the conversation.
6. Compress context when needed while preserving key information.

Each administrator setting maps to one control point in this chain.
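
The routing and fallback behavior described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `FileState` fields, the order of the checks, and the returned labels are hypothetical and are not taken from the product's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileState:
    """Hypothetical snapshot of one uploaded file as it moves through the chain."""
    passed_upload_check: bool
    extracted_text: Optional[str]  # None when extraction/OCR is unavailable or failed
    fits_full_context: bool        # within the full-context thresholds
    index_ready: bool              # vector index usable with the current config


def choose_context_path(f: FileState) -> str:
    """Pick the context path for a file; the check order is an assumption."""
    if not f.passed_upload_check:
        return "rejected"
    if not f.extracted_text:
        # The file may still be previewable, but it contributes no text.
        return "question_only"
    if f.fits_full_context:
        return "full_context"
    if f.index_ready:
        return "retrieval"
    # Too large for full context and not retrievable yet.
    return "limited_full_text"
```

A useful property of a structure like this is that every degraded state has an explicit result, which makes it easier to see which stage caused a weak answer.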

Configuration Overview #

Open Files & Retrieval in the admin console. Configure the groups in pipeline order: upload limits, extraction, full context, vector retrieval, semantic enhancement, RAG, and context compression.

Group	Controls	If Missing
Upload limits	Whether files can enter the system	Upload may fail or storage may grow uncontrolled.
Extraction	Whether files become readable text	Users can preview files, but the model may not answer from text.
OCR	Whether images and scanned PDFs become text	Image text and scanned PDFs may not be searchable.
Full context	Whether small files enter context directly	Small files may be injected only partially, or large files may crowd the context.
Vector retrieval	Whether large/multiple files can retrieve chunks	RAG is unavailable and falls back to full text or user question.
Semantic enhancement	Whether historical conversation chunks are recalled	Long conversations lose earlier semantic context.
RAG	How evidence chunks are filtered and injected	Too little recall misses facts; too much adds noise.
Context compression	How long context is preserved	Long sessions may exceed budget or lose early facts.

Runtime Service Checks #

Files & Retrieval provides runtime status and test actions for key services, including Tika, Docling, MinerU, Tesseract OCR, RapidOCR, and Embedding. After changing a service address, auth token, timeout, or model name, save the configuration before testing the service.

Service	What to verify
Tika / Docling / MinerU	Document parsing service reachability, authentication, and timeout.
Tesseract OCR / RapidOCR	OCR service reachability and image-text recognition path.
Embedding	Vector service address, model, and authentication.

A passing runtime test does not guarantee that every file will parse successfully. Before release, upload sample files (PDF, Word, spreadsheets, images, scanned documents, and code), then check preview, extraction, index status, and RAG recall separately.

Upload Limits #

Setting	Function	Guidance
Attachments per message	Limits files attached to one message.	Prevent one message from triggering too much processing.
Default size limit	Default single-file size limit, shown in MB.	Set for common documents, then override images/documents separately.
User storage quota	Total storage per user; 0 means unlimited.	Recommended for team or public deployments.
MIME allowlist	Allows only selected file types.	Open only business-needed types and test extraction.
Image size limit	Overrides image attachment size.	Tighten when OCR or vision cost is high.
Document size limit	Overrides document attachment size.	Large documents should usually use RAG.
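
As a rough illustration of how these limits could combine, the sketch below checks one incoming file. The field names mirror the setting keys listed later in this guide, but the function itself, the check order, the treatment of every non-image file as a document, and the meaning of `0` for the image/document overrides and of an empty allowlist are assumptions. Only the quota's "0 means unlimited" rule comes from the table above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class UploadLimits:
    max_message_files: int
    max_upload_file_bytes: int
    user_storage_quota_bytes: int = 0                 # 0 means unlimited
    allowed_mime_types: FrozenSet[str] = frozenset()  # empty = allow all (assumption)
    image_max_bytes: int = 0                          # 0 = no override (assumption)
    doc_max_bytes: int = 0                            # 0 = no override (assumption)


def check_upload(limits: UploadLimits, attached_count: int, size_bytes: int,
                 mime: str, used_bytes: int) -> Optional[str]:
    """Return the name of the first failing check, or None if the file is accepted."""
    if attached_count + 1 > limits.max_message_files:
        return "count"
    if limits.allowed_mime_types and mime not in limits.allowed_mime_types:
        return "mime"
    # Images use the image override; everything else is treated as a document here.
    override = limits.image_max_bytes if mime.startswith("image/") else limits.doc_max_bytes
    size_limit = override or limits.max_upload_file_bytes
    if size_bytes > size_limit:
        return "size"
    quota = limits.user_storage_quota_bytes
    if quota and used_bytes + size_bytes > quota:
        return "quota"
    return None
```

Returning the failing check by name matches the fallback table's "Count, size, MIME, quota" checklist, so a rejection can be traced to one setting.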

Extraction decides whether files become readable text. Different services fit different files:

Service Type	Best For	Notes
General document extraction	PDF, Word, spreadsheets, slides, text	Extracts text from ordinary documents.
OCR	Images and scanned PDFs	Recognizes image text and scanned pages.
Complex document parsing	Layout-heavy PDFs, reports, papers	Preserves structure, paragraphs, and layout better.
LLM OCR	Hard images or documents	Higher cost; use with controlled scope.

Extraction and OCR #

Setting	Function	Guidance
Extraction engine	Built-in, Tika, Docling, or MinerU.	Built-in is lightweight; Tika covers common documents; Docling/MinerU fit complex PDFs.
Tika/Docling/MinerU address, key, timeout	Connects document extraction services.	Required when selecting the corresponding engine.
MinerU source	Cloud or self-hosted service.	Match your deployment and credentials.
Image OCR	Sends image attachments through OCR.	Enable for screenshots, receipts, and image text.
PDF OCR fallback	Runs OCR when PDF text extraction fails or is poor.	Enable for scanned PDFs.
OCR engine	RapidOCR, Tesseract, Paddle, Tencent, Aliyun, or LLM OCR.	Local engines lower cost; cloud engines improve stability; LLM OCR fits complex layouts.
OCR service settings	Address, key, timeout, region, endpoint, model, prompt as required.	Required fields depend on the selected OCR engine.

Full-context injection puts file text directly into the model context. It is complete and works well for small files, contracts, short code, precise review, and paragraph-by-paragraph analysis. The risk is context pressure: very large files can crowd out the user question, history, or other evidence.

Recommendation: use full context for small files and vector retrieval for large files. A long PDF should not be forced into full context when the user asks a local question.

Full Context #

Setting	Function	Guidance
Full-context limits	Enables size, token, and PDF page thresholds.	Recommended to prevent large files from crowding context.
Full-text size limit	Maximum text bytes for full injection; 0 means unlimited.	Set for small documents; large files should use RAG.
Full-context token limit	Token budget for full injection; 0 means unlimited.	Leave room for user question, history, and evidence.
Full-context PDF page limit	Maximum PDF pages for full injection; 0 means unlimited.	Multi-page PDFs usually fit RAG better.
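
The three thresholds and the "0 means unlimited" rule can be expressed compactly. The following is a minimal sketch, assuming that a file must satisfy all enabled thresholds to qualify for full injection. The function and parameter names are illustrative, not the product's API.

```python
def fits_full_context(text_bytes: int, tokens: int, pdf_pages: int = 0, *,
                      limits_enabled: bool = True,
                      max_bytes: int = 0,
                      max_tokens: int = 0,
                      max_pdf_pages: int = 0) -> bool:
    """Return True if a file may be injected in full.

    A limit of 0 means unlimited. Pass pdf_pages=0 for non-PDF files.
    """
    if not limits_enabled:
        return True

    def within(value: int, limit: int) -> bool:
        return limit == 0 or value <= limit

    return (within(text_bytes, max_bytes)
            and within(tokens, max_tokens)
            and within(pdf_pages, max_pdf_pages))
```

With all three limits left at 0, every file passes, which is why enabling the limits without setting thresholds does not protect the context window.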

Vector retrieval chunks files, creates embeddings, and builds indexes. It is the core path for knowledge bases, multiple files, large documents, and long-term material management. If the embedding model changes, old vectors and new vectors should not be mixed; rebuild the index.

Vector Retrieval #

Setting	Function	Guidance
Enable Embedding	Enables file vectorization and retrieval.	Required before RAG or semantic enhancement.
Embedding address and key	Connects the embedding service.	Must match model and dimensions.
Embedding model	Model used for vectorization.	Changing it marks old vectors stale.
Embedding timeout	Wait time for one embedding operation.	Increase for remote or large batches.
Vector dimensions	Dimension used for writes and search.	Must match model output; changes require reindex.
Normalize vectors	Normalizes vectors before storage/search.	Keep consistent across indexing and search.
Auto vectorization	Triggers async embedding after upload.	Recommended for retrieval readiness.
Batch size	Text count per embedding batch.	Balance throughput and timeout risk.
Chunk size	Token-estimated chunk size.	Too small breaks context; too large hurts recall precision.
Chunk overlap	Overlap between adjacent chunks.	Helps avoid cutting important facts.
Top K	Final chunks returned for injection.	Start small, then increase as needed.
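
To make the interaction between chunk size and chunk overlap concrete, here is a minimal sliding-window sketch. It operates on an already tokenized list. How the product estimates tokens is not specified here, so the tokenization step is left to the caller and the function name is hypothetical.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks
```

Because each window advances by `chunk_size - overlap`, a larger overlap produces more chunks for the same document, which increases embedding cost and index size.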

Index Status #

Status	Meaning	Action
Ready	Index works with current config.	Retrieval is usable.
Stale	Model or dimension changed.	Rebuild index.
Pending	Vectorization is queued or running.	Wait before testing retrieval.
Failed	Vectorization failed.	Check extracted text, embedding service, and chunk settings.
Empty	No index exists yet.	Configure Embedding and upload sample files.
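
One way to reason about these states is to compare the embedding configuration an index was built with against the current one. The sketch below is an assumption-laden illustration: the `EmbeddingConfig` type, the `job_state` values, and the precedence between states are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingConfig:
    model: str
    dimensions: int


def index_status(current: EmbeddingConfig,
                 indexed_with: Optional[EmbeddingConfig],
                 job_state: Optional[str] = None) -> str:
    """Derive an index status label; job_state is one of None, 'queued', 'running', 'failed'."""
    if job_state in ("queued", "running"):
        return "Pending"
    if job_state == "failed":
        return "Failed"
    if indexed_with is None:
        return "Empty"
    if indexed_with != current:
        # Model or dimension changed: old and new vectors must not be mixed.
        return "Stale"
    return "Ready"
```

Treating any difference in model or dimensions as stale is the conservative choice, since vectors from different models are not comparable.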

Semantic enhancement recalls related historical conversation chunks, while RAG retrieves file evidence. Message embedding must be enabled before semantic context recall. For RAG, too few chunks miss facts; too many chunks add noise and consume context. Tune from a small high-quality recall set.

Semantic Enhancement and RAG #

Setting	Function	Guidance
Message embedding	Embeds conversation messages after each turn.	Required before semantic context recall.
Semantic context recall	Recalls historical related chunks.	Useful for long sessions and ongoing projects.
RAG switch	Allows files to enter retrieval-augmented context.	Requires Embedding.
Similarity threshold	Minimum similarity for injected chunks.	Too high may return empty; too low adds noise.
Injection token budget	Limits evidence token usage.	Reserve room for prompt, history, and current question.
Fetch multiplier	Fetches more candidates before filtering.	Useful when documents are noisy.
Ready wait time	Waits for vectorization before sending.	Helps immediately after upload; too high slows replies.
Query history turns	Adds recent user turns to retrieval query.	Improves multi-turn follow-ups.
Retrieval cache TTL	Caches retrieval results.	Helps repeated questions; avoid long TTL for changing files.
Hybrid retrieval	Combines BM25 and vector retrieval.	Good for exact terms, numbers, and identifiers.
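
The sketch below shows one plausible way the similarity threshold, Top K, fetch multiplier, and injection token budget could interact. It is not the product's algorithm. The candidate tuple layout and the rule of skipping a chunk that does not fit the remaining budget are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (chunk text, similarity score, token count) -- hypothetical candidate shape
Candidate = Tuple[str, float, int]


def select_chunks(scored: Sequence[Candidate], *, top_k: int, min_similarity: float,
                  token_budget: int, fetch_multiplier: int = 1) -> List[str]:
    """Pick up to top_k chunks above the threshold that fit the token budget."""
    ranked = sorted(scored, key=lambda c: c[1], reverse=True)
    # Fetch extra candidates so that skipped chunks can be replaced.
    candidates = ranked[: top_k * max(1, fetch_multiplier)]

    selected: List[str] = []
    used = 0
    for text, similarity, tokens in candidates:
        if similarity < min_similarity or len(selected) == top_k:
            break  # ranked by similarity, so nothing later qualifies
        if used + tokens > token_budget:
            continue  # too large for the remaining budget; try the next one
        selected.append(text)
        used += tokens
    return selected
```

In this sketch the fetch multiplier only matters when a candidate is skipped: with a multiplier of 1 a skipped chunk leaves an empty slot, while a higher multiplier lets a lower-ranked chunk fill it. An empty result corresponds to the "RAG returns empty" row in the fallback rules.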

Context compression helps long conversations and large evidence sets keep important information. A compression model should be stable, inexpensive, and good at summarization. Async compression reduces blocking, but summaries may update slightly later. Test that facts, numbers, conclusions, and open tasks are preserved.

Context Compression #

Setting	Function	Guidance
Token budget truncation	Trims history based on model context window.	Recommended.
Compression turn threshold	Triggers compression by conversation turns.	Lower for long-running conversations.
Compression token threshold	Triggers compression by token size.	Keep below model context limit.
Recent turns preserved	Keeps recent raw turns before summarizing older content.	Preserve current task details.
Highlights per role	Number of summary items per role.	Balance detail and budget.
Snippet character limit	Max characters per compressed snippet.	Prevent one long message from dominating summary.
Evidence retention days	Retains RAG, history, and tool evidence; 0 means no auto-expiry.	Adjust for audit and storage needs.
LLM compression	Uses a model for semantic summaries.	Higher quality with extra cost.
Async compression	Runs compression in the background.	Reduces reply blocking, with slight delay.
Compression model	Model used for summaries.	Use a stable, inexpensive summarization model.
Failure threshold	Falls back to template summaries after repeated failures.	Protects the main chat flow.
Summary prompts	Custom full/light summary instructions.	Preserve facts, numbers, constraints, and open tasks.
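
The trigger thresholds and the "recent turns preserved" setting can be illustrated with a short sketch. It assumes that compression starts when either threshold is reached and that a whitespace word count is an acceptable stand-in for a token count. Both are assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_for_compression(turns: Sequence[str], *, turn_threshold: int,
                          token_threshold: int,
                          keep_recent: int) -> Tuple[List[str], List[str]]:
    """Return (older turns to summarize, recent turns kept verbatim)."""
    total_tokens = sum(len(t.split()) for t in turns)  # crude token estimate
    if len(turns) < turn_threshold and total_tokens < token_threshold:
        return [], list(turns)  # nothing to compress yet
    if keep_recent <= 0:
        return list(turns), []
    return list(turns[:-keep_recent]), list(turns[-keep_recent:])
```

Only the first list would be handed to the summarizer (LLM or template), so the current task's details in the recent turns are never paraphrased.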

Fallback Rules #

Condition	Behavior	Check
Upload check fails	File is rejected.	Count, size, MIME, quota.
Extraction unavailable	File remains previewable but has no searchable text.	Engine, address, timeout.
OCR unavailable	Image or scanned text cannot become searchable text.	OCR switches and engine settings.
Full context exceeds limits	System tries retrieval or fallback.	Size, token, PDF page limits.
Embedding missing	RAG and semantic recall are unavailable.	Embedding address, model, dimensions.
Index stale or failed	Retrieval is incomplete or unreliable.	Reindex, extracted text, chunk settings.
RAG returns empty	No chunks are injected; low-similarity chunks are not forced in.	Similarity, top K, hybrid retrieval.
Context too long	Truncation or compression runs.	Token budget and compression settings.

Recommended Setup Path #

1. Configure upload count, size, MIME, image/document limits, and user quota.
2. Configure extraction and test common file types.
3. Enable image OCR and PDF OCR fallback only when needed.
4. Enable full-context limits for small documents.
5. Configure Embedding model, dimensions, chunks, and auto vectorization.
6. Upload sample files and confirm index status is ready.
7. Enable RAG and tune similarity, top K, token budget, and hybrid retrieval.
8. For long sessions, enable message embedding, semantic recall, and context compression.

Practical Tips #

Do not enable everything at once. Stabilize upload, preview, and extraction first, then add OCR, vectorization, and RAG. After changing the embedding model, vector dimensions, chunking, or extraction engine, test with sample files. When users say the model cannot answer from a file, check upload limits, extraction result, index status, RAG settings, and context budget in that order.
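
The troubleshooting order above can be captured as an ordered checklist that stops at the first failure. This is a small sketch. The check names and the convention that a missing result counts as a failure are assumptions for illustration.

```python
from typing import Mapping, Optional

# Order taken from the troubleshooting advice above.
CHECK_ORDER = (
    "upload_limits",
    "extraction_result",
    "index_status",
    "rag_settings",
    "context_budget",
)


def first_failing_check(results: Mapping[str, bool]) -> Optional[str]:
    """Return the earliest check that failed or was not run, else None."""
    for name in CHECK_ORDER:
        if not results.get(name, False):
            return name
    return None
```

Stopping at the earliest failure avoids tuning RAG settings when the real problem is that no text was extracted in the first place.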

Complete Parameter List #

These are the current frontend configuration keys used by Files & Retrieval. They are useful for deployment review and troubleshooting.

Pipeline Stage	Setting Keys
Upload limits	`storage.max_message_files`, `file.allowed_mime_types`, `storage.max_upload_file_bytes`, `storage.user_storage_quota_bytes`, `file.image_max_bytes`, `file.doc_max_bytes`
Extraction engines	`extract.engine`, `extract.tika_base_url`, `extract.tika_auth_token`, `extract.tika_timeout_seconds`, `extract.docling_base_url`, `extract.docling_auth_token`, `extract.docling_timeout_seconds`, `extract.mineru_source`, `extract.mineru_base_url`, `extract.mineru_auth_token`, `extract.mineru_timeout_seconds`
OCR switches	`extract.image_ocr_enabled`, `extract.pdf_ocr_fallback_enabled`, `extract.ocr_engine`
Tesseract / RapidOCR / Paddle	`extract.tesseract_ocr_base_url`, `extract.tesseract_ocr_auth_token`, `extract.tesseract_ocr_timeout_seconds`, `extract.rapidocr_base_url`, `extract.rapidocr_auth_token`, `extract.rapidocr_timeout_seconds`, `extract.paddle_ocr_base_url`, `extract.paddle_ocr_auth_token`, `extract.paddle_ocr_timeout_seconds`
Cloud OCR	`extract.tencent_ocr_secret_id`, `extract.tencent_ocr_secret_key`, `extract.tencent_ocr_region`, `extract.tencent_ocr_endpoint`, `extract.tencent_ocr_timeout_seconds`, `extract.aliyun_ocr_access_key_id`, `extract.aliyun_ocr_access_key_secret`, `extract.aliyun_ocr_region`, `extract.aliyun_ocr_endpoint`, `extract.aliyun_ocr_timeout_seconds`
LLM OCR	`extract.llm_ocr_base_url`, `extract.llm_ocr_auth_token`, `extract.llm_ocr_model`, `extract.llm_ocr_timeout_seconds`, `extract.llm_ocr_prompt`
Full-context injection	`file.full_context_limit_enabled`, `file.file_full_context_max_bytes`, `file.full_context_max_tokens`, `file.full_context_pdf_max_pages`
Embedding	`file.embedding_enabled`, `file.embedding_host`, `file.embedding_key`, `file.rag_model`, `file.embedding_timeout_seconds`, `file.embedding_output_dimensions`, `file.embedding_normalize`, `file.embed_trigger_on_upload`, `file.embed_batch_size`, `file.embed_chunk_size_tokens`, `file.embed_chunk_overlap_tokens`
Message semantic enhancement	`chat.message_embedding_enabled`, `chat.semantic_context_enabled`
RAG	`chat.rag_enabled`, `file.rag_top_k`, `chat.rag_min_similarity`, `chat.rag_token_budget`, `chat.rag_fetch_multiplier`, `chat.rag_wait_ready_ms`, `chat.rag_query_history_turns`, `chat.rag_retrieval_cache_ttl_seconds`, `chat.rag_hybrid_enabled`

The user-side `chat.file_mode` setting is only a personal file preference. If administrators have not enabled Embedding or RAG, choosing `rag` falls back according to platform capabilities.
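
The relationship between the user preference and the platform switches might look like the sketch below. The fallback target (`"full_context"`) and the function name are assumptions. This guide only states that `rag` falls back when Embedding or RAG is not enabled, not which mode it falls back to.

```python
def effective_file_mode(user_mode: str, *, embedding_enabled: bool, rag_enabled: bool) -> str:
    """Resolve the mode actually used for a request (illustrative only)."""
    if user_mode == "rag" and not (embedding_enabled and rag_enabled):
        # Hypothetical fallback target; the real one depends on platform capabilities.
        return "full_context"
    return user_mode
```

The key point is that the user setting expresses a preference, while `file.embedding_enabled` and `chat.rag_enabled` decide whether that preference can be honored.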
