Files & Retrieval

Files & Retrieval controls upload, extraction, OCR, full-context injection, vector retrieval, semantic enhancement, RAG, and context compaction enhancement. It determines whether uploaded files can be previewed, extracted, retrieved, and used in conversation context.

Entry Point

Open Files & retrieval in the admin console. The page is grouped into Upload limits, Extraction, Full context, Embedding retrieval, Semantic enhancement, RAG, and Context compaction enhancement.

These settings affect subsequent file processing for all users. Before changing them, confirm that storage, the extraction services, the Embedding service, and the task models are available.

For detailed upload, extraction, full-context, vectorization, RAG, and compaction parameters, see Files & Retrieval Advanced.

Upload Limits

Upload limits control the number of attachments per message, the default file size limit, the per-user storage quota, the MIME allowlist, and the separate size limits for images and documents.

The MIME allowlist decides which file types users may upload. Size limits and the user quota protect storage cost and processing resources. Limits that are too strict hurt the upload experience; limits that are too loose increase storage and parsing load.
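The checks above can be sketched roughly as follows. This is a minimal illustration, not the platform's implementation: the `UploadLimits` class, the `check_upload` function, and every numeric value are hypothetical, and the rule that the default limit caps the image and document limits is an assumption. Only the setting key names in the comments come from the Setting Key Reference.

```python
from dataclasses import dataclass


@dataclass
class UploadLimits:
    # Illustrative values only; this page does not state the real defaults.
    max_message_files: int = 5                      # storage.max_message_files
    max_upload_file_bytes: int = 20 * 1024 * 1024   # storage.max_upload_file_bytes
    image_max_bytes: int = 10 * 1024 * 1024         # file.image_max_bytes
    doc_max_bytes: int = 20 * 1024 * 1024           # file.doc_max_bytes
    user_storage_quota_bytes: int = 1024 ** 3       # storage.user_storage_quota_bytes
    allowed_mime_types: frozenset = frozenset(      # file.allowed_mime_types
        {"application/pdf", "image/png", "text/plain"}
    )


def check_upload(limits, mime_type, size_bytes, files_in_message, used_bytes):
    """Return a list of violations; an empty list means the upload passes."""
    problems = []
    if mime_type not in limits.allowed_mime_types:
        problems.append(f"MIME type {mime_type} is not in the allowlist")
    if files_in_message >= limits.max_message_files:
        problems.append("too many attachments in this message")
    # Assumption: images use the image limit, everything else the document
    # limit, and the default limit is an upper bound for both.
    if mime_type.startswith("image/"):
        type_limit = limits.image_max_bytes
    else:
        type_limit = limits.doc_max_bytes
    if size_bytes > min(type_limit, limits.max_upload_file_bytes):
        problems.append("file exceeds the size limit")
    if used_bytes + size_bytes > limits.user_storage_quota_bytes:
        problems.append("user storage quota would be exceeded")
    return problems
```

Returning every violation at once, rather than stopping at the first, makes it easier to see which limit is too strict when users report failed uploads.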

Extraction

Extraction decides how documents, PDFs, spreadsheets, presentations, code, and text are processed. The platform can connect to Tika, Docling, MinerU, Tesseract OCR, RapidOCR, PaddleOCR, Tencent Cloud OCR, Aliyun OCR, and LLM OCR.

Different services fit different file types. Tika is good for general document text extraction. OCR is useful for scans and text inside images. Docling and MinerU fit complex document parsing. Use the connection test to confirm each service's address and credentials.
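One way to picture how the OCR switches interact with the extraction engine is the small decision function below. It is a sketch under assumptions: the function name, the step names, and the rule that a PDF with no extractable text falls back to OCR are illustrative guesses based on the setting names `extract.image_ocr_enabled` and `extract.pdf_ocr_fallback_enabled`, not a description of the platform's actual pipeline.

```python
def plan_extraction(mime_type, extracted_text, settings):
    """Return the next step for a file: 'engine', 'ocr', or 'done'.

    extracted_text is None until the extraction engine has run.
    """
    if mime_type.startswith("image/"):
        # Images have no text layer, so OCR is the only source of text.
        return "ocr" if settings.get("extract.image_ocr_enabled") else "done"
    if extracted_text is None:
        # Documents go to the configured engine first (extract.engine).
        return "engine"
    is_pdf = mime_type == "application/pdf"
    # Assumption: a PDF that yields only whitespace is treated as a scan.
    if is_pdf and not extracted_text.strip():
        if settings.get("extract.pdf_ocr_fallback_enabled"):
            return "ocr"
    return "done"
```

With both switches enabled, a scanned PDF is sent to the engine first and to OCR only after the engine returns no text, which keeps OCR cost off documents that already have a text layer.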

Full Context

Full context injection puts file content directly into conversation context. It fits small files, precise review, contract clause checks, and code snippet analysis.

Administrators can limit full-context text size, token budget, and PDF page count. These limits prevent a large file from consuming the whole context. With limits disabled, the result depends on the model's context capacity and on user judgment.
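As a rough illustration of how the three limits combine, the sketch below rejects a file as soon as any one limit is exceeded. The function name, the sample values in the usage note, and the four-characters-per-token estimate are assumptions made for the example; a real deployment would use the model's tokenizer. The setting keys are taken from the Setting Key Reference.

```python
def fits_full_context(text, pdf_pages, settings, count_tokens=None):
    """Decide whether a file may be injected whole into the context.

    pdf_pages is None for files that are not PDFs.
    """
    if not settings.get("file.full_context_limit_enabled", True):
        return True  # limits disabled: rely on the model's context window
    # Assumption: a crude 4-characters-per-token estimate replaces a tokenizer.
    count_tokens = count_tokens or (lambda s: len(s) // 4 + 1)
    if len(text.encode("utf-8")) > settings["file.file_full_context_max_bytes"]:
        return False
    if count_tokens(text) > settings["file.full_context_max_tokens"]:
        return False
    max_pages = settings["file.full_context_pdf_max_pages"]
    if pdf_pages is not None and pdf_pages > max_pages:
        return False
    return True
```

For example, with hypothetical limits of 1000 bytes, 100 tokens, and 10 pages, an 800-character file passes the byte check but fails the token check, so it would be a candidate for vector retrieval instead.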

Vector Retrieval

Vector retrieval converts file content into embeddings so that the chunks most relevant to a question can be retrieved and passed to the model. It fits large files, many files, and knowledge-base style Q&A.
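The general technique can be sketched in two steps: split the text into overlapping chunks, then rank the chunk embeddings by cosine similarity to the question embedding. The code below is a generic, self-contained illustration with hypothetical function names; it does not call an Embedding service, and it is not the platform's own retrieval code. The `size` and `overlap` parameters play the role of `file.embed_chunk_size_tokens` and `file.embed_chunk_overlap_tokens`.

```python
import math


def split_chunks(tokens, size, overlap):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap
    last_start = max(len(tokens) - overlap, 1)
    return [tokens[i:i + size] for i in range(0, last_start, step)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, indexed_chunks, top_k=3):
    """indexed_chunks is a list of (text, embedding) pairs.

    Returns up to top_k (similarity, text) pairs, best match first.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some tokens twice.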

When configuring Embedding, enable the service and fill in the service address, request model, and related parameters. After the Embedding model is changed, old vectors are marked stale and should be rebuilt from the index status panel.

Vector Index Status

Vector index status shows the signature together with the ready, stale, pending, failed, and empty states. When stale vectors are detected, use Reindex.

Reindex submits background work. With many files, it can take time. During rebuild, retrieval results for some files may be incomplete.
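A common way to detect stale vectors is to store, next to each file's vectors, a signature of the Embedding settings that produced them, and to compare it with the signature of the current settings. The sketch below illustrates that idea only. Which settings feed the signature, the record fields, and the meaning given to each state (for example, "empty" as a file that produced no chunks) are assumptions, not documented behavior.

```python
import hashlib
import json

# Assumption: these are the settings that change what a stored vector means.
SIGNATURE_KEYS = (
    "file.rag_model",
    "file.embedding_output_dimensions",
    "file.embedding_normalize",
)


def embedding_signature(settings):
    """Return a short, stable hash of the vector-defining settings."""
    relevant = {key: settings.get(key) for key in SIGNATURE_KEYS}
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def index_state(record, current_signature):
    """Classify one file's index record (a hypothetical dict)."""
    if record.get("error"):
        return "failed"
    if record.get("signature") is None:
        return "pending"  # not embedded yet
    if record["signature"] != current_signature:
        return "stale"    # built with different Embedding settings
    return "ready" if record.get("chunk_count", 0) > 0 else "empty"
```

Under this scheme, changing the model changes the signature, so every existing record reads as stale until a rebuild writes new vectors with the new signature.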

Semantic Enhancement and RAG

Semantic enhancement lets the system recall more relevant history messages or file content. RAG settings control retrieved chunks, recall strategy, evidence use, and context assembly.

Message embedding must be enabled before semantic context recall can work. RAG is useful when there is a large amount of material, when questions span several files, or when full-context injection would be too expensive.
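To make the interaction between chunk count, similarity threshold, and token budget concrete, here is a minimal selection sketch. The function and its greedy strategy are illustrative assumptions; the parameters mirror `file.rag_top_k`, `chat.rag_min_similarity`, and `chat.rag_token_budget`, but the platform's actual recall strategy is not described on this page.

```python
def select_evidence(scored_chunks, top_k, min_similarity, token_budget):
    """Pick chunks for the prompt.

    scored_chunks is a list of (similarity, token_count, text) tuples.
    """
    selected, used = [], 0
    ranked = sorted(scored_chunks, key=lambda chunk: chunk[0], reverse=True)
    for similarity, tokens, text in ranked:
        # Stop once the list is full or the remaining chunks are too weak.
        if len(selected) == top_k or similarity < min_similarity:
            break
        if used + tokens > token_budget:
            continue  # skip a chunk that would overflow the budget
        selected.append(text)
        used += tokens
    return selected
```

In this sketch a large, highly ranked chunk that does not fit is skipped rather than truncated, so a smaller chunk further down the ranking can still use the remaining budget.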

Context Compaction Enhancement

Context compaction enhancement helps with long conversations and workflows that involve large amounts of material. It can compress earlier content as the conversation nears the context limit, reduce blocking in the current reply path, and preserve key information.

The compaction model can follow the current model or use a dedicated task model. Choose a stable, cost-controlled model that summarizes well.
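The basic idea of compaction can be sketched as follows: once the conversation reaches some fraction of the context limit, older messages are replaced by a summary and the most recent ones are kept verbatim. Everything here is hypothetical, including the function name, the `trigger_ratio` and `keep_recent` parameters, and the token estimate. The `summarize` callable stands in for the compaction model, and the sketch runs inline, whereas the platform aims to keep this work off the reply path.

```python
def compact_history(messages, context_limit, summarize,
                    trigger_ratio=0.8, keep_recent=4, count_tokens=None):
    """Return the message list, compacted if it is close to the limit.

    messages is a list of strings, oldest first.
    """
    # Assumption: a crude 4-characters-per-token estimate replaces a tokenizer.
    count_tokens = count_tokens or (lambda s: len(s) // 4 + 1)
    total = sum(count_tokens(message) for message in messages)
    if total < trigger_ratio * context_limit or len(messages) <= keep_recent:
        return messages  # nothing to do yet
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # One summary message replaces everything except the recent turns.
    return [summarize(older)] + recent
```

Keeping the last few turns untouched preserves the exact wording the user is most likely to refer back to, while the summary carries the earlier key information at a lower token cost.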

Practical Tips

Configure upload limits and basic extraction first, then enable OCR, Embedding, and RAG. Use full context for small files and vector retrieval for large files. Rebuild indexes after changing the Embedding model. Before allowing a new file type, test extraction and preview with sample files.

Setting Key Reference

Files & Retrieval combines `storage.*`, `file.*`, `extract.*`, and some `chat.*` settings.

Group	Setting Keys
Upload limits	`storage.max_message_files`, `file.allowed_mime_types`, `storage.max_upload_file_bytes`, `storage.user_storage_quota_bytes`, `file.image_max_bytes`, `file.doc_max_bytes`
Extraction engines	`extract.engine`, `extract.tika_base_url`, `extract.tika_auth_token`, `extract.tika_timeout_seconds`, `extract.docling_base_url`, `extract.docling_auth_token`, `extract.docling_timeout_seconds`, `extract.mineru_source`, `extract.mineru_base_url`, `extract.mineru_auth_token`, `extract.mineru_timeout_seconds`
OCR	`extract.image_ocr_enabled`, `extract.pdf_ocr_fallback_enabled`, `extract.ocr_engine`, plus each OCR engine's URL, key, region, endpoint, timeout, and prompt fields.
Full-context injection	`file.full_context_limit_enabled`, `file.file_full_context_max_bytes`, `file.full_context_max_tokens`, `file.full_context_pdf_max_pages`
Vectorization	`file.embedding_enabled`, `file.embedding_host`, `file.embedding_key`, `file.rag_model`, `file.embedding_timeout_seconds`, `file.embedding_output_dimensions`, `file.embedding_normalize`, `file.embed_trigger_on_upload`, `file.embed_batch_size`, `file.embed_chunk_size_tokens`, `file.embed_chunk_overlap_tokens`
Semantic enhancement	`chat.message_embedding_enabled`, `chat.semantic_context_enabled`
RAG	`chat.rag_enabled`, `file.rag_top_k`, `chat.rag_min_similarity`, `chat.rag_token_budget`, `chat.rag_fetch_multiplier`, `chat.rag_wait_ready_ms`, `chat.rag_query_history_turns`, `chat.rag_retrieval_cache_ttl_seconds`, `chat.rag_hybrid_enabled`

Embedding readiness depends on `file.embedding_enabled`, `file.embedding_host`, and `file.rag_model`. RAG settings fully apply only when Embedding is ready and `chat.rag_enabled` is enabled.
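The dependency described above can be written as a small check over a settings dictionary. The helper names are hypothetical and the dictionary is a stand-in for however the platform stores its settings; the rule for semantic recall (both `chat.*` switches on) is a reading of the statement that message embedding must be enabled before semantic context recall.

```python
def embedding_ready(settings):
    """Embedding is ready when it is enabled and has a host and a model."""
    return bool(
        settings.get("file.embedding_enabled")
        and settings.get("file.embedding_host")
        and settings.get("file.rag_model")
    )


def rag_active(settings):
    """RAG settings apply only when Embedding is ready and RAG is enabled."""
    return embedding_ready(settings) and bool(settings.get("chat.rag_enabled"))


def semantic_recall_active(settings):
    """Assumption: semantic recall needs both message-level switches."""
    return bool(
        settings.get("chat.message_embedding_enabled")
        and settings.get("chat.semantic_context_enabled")
    )
```

A check like this is handy in a deployment script: if `rag_active` is false after a configuration change, the missing key is usually the Embedding host or model rather than the RAG switch itself.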
