Files & Retrieval is a processing chain, not a single switch. Upload limits, extraction, OCR, full-context injection, vectorization, semantic recall, RAG assembly, and context compaction all affect final answer quality. Administrators should know which setting controls which part of the chain and what happens when a dependency is unavailable.

Processing Chain

After a user uploads a file, the system checks upload limits, stores and identifies the file, extracts text or runs OCR, then chooses full-context injection or retrieval. If extraction, indexing, or retrieval is unavailable, the system falls back to limited full text, image/attachment context, or the user question only.

From an administrator's perspective, the chain has six stages:

  1. Check file count, size, type, and user quota.
  2. Store the file and identify its type.
  3. Extract text, parse documents, or run OCR.
  4. Decide between full context and vector retrieval.
  5. Recall relevant content for the conversation.
  6. Compress context when needed while preserving key information.

Each administrator setting maps to one control point in this chain.
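
The stage order above, together with the fallbacks described earlier, can be sketched as one decision function. This is a minimal Python illustration under stated assumptions, not the product's implementation: `FileState` and `context_source` are hypothetical names, each field stands in for the outcome of one stage, and stage 6 (compression) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileState:
    """Hypothetical per-file outcome of each stage in the chain."""
    passes_limits: bool            # stage 1: count, size, type, quota
    extracted_text: Optional[str]  # stage 3: None if extraction/OCR produced nothing
    fits_full_context: bool        # stage 4: within the full-context limits
    index_ready: bool              # stage 5: vector index is Ready for this file


def context_source(f: FileState) -> str:
    """Return which kind of context the model would receive for this file."""
    if not f.passes_limits:
        return "rejected"
    if f.extracted_text is None:
        # No readable text: only image/attachment context or the question itself.
        return "attachment_or_question_only"
    if f.fits_full_context:
        return "full_context"
    if f.index_ready:
        # Retrieved chunks (possibly none, if nothing passes the RAG filters).
        return "retrieved_chunks"
    # Text exists but retrieval is unavailable: fall back to limited full text.
    return "limited_full_text"
```

Because the branches run in stage order, a failure early in the chain (limits or extraction) makes every later setting irrelevant for that file.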

Configuration Overview

Open Files & Retrieval in the admin console. Configure from front to back: upload limits, extraction, full context, vector retrieval, semantic enhancement, RAG, and context compression.

| Group | Controls | If Missing |
|---|---|---|
| Upload limits | Whether files can enter the system | Uploads may fail, or storage may grow without control. |
| Extraction | Whether files become readable text | Users can preview files, but the model may not be able to answer from their text. |
| OCR | Whether images and scanned PDFs become text | Text in images and scanned PDFs may not be searchable. |
| Full context | Whether small files enter context directly | Small files may not be injected in full, or large files may crowd the context. |
| Vector retrieval | Whether large or multiple files can be retrieved as chunks | RAG is unavailable; the system falls back to full text or the user question only. |
| Semantic enhancement | Whether historical conversation chunks are recalled | Long conversations lose earlier semantic context. |
| RAG | How evidence chunks are filtered and injected | Too little recall misses facts; too much adds noise. |
| Context compression | How long context is preserved | Long sessions may exceed the budget or lose early facts. |

Upload Limits

| Setting | Function | Guidance |
|---|---|---|
| Attachments per message | Limits the number of files attached to one message. | Prevents one message from triggering too much processing. |
| Default size limit | Default single-file size limit, in MB. | Set it for common documents, then override images and documents separately. |
| User storage quota | Total storage per user; 0 means unlimited. | Recommended for team or public deployments. |
| MIME allowlist | Allows only the selected file types. | Open only the types the business needs, and test extraction for each. |
| Image size limit | Overrides the size limit for image attachments. | Tighten it when OCR or vision costs are high. |
| Document size limit | Overrides the size limit for document attachments. | Large documents should usually use RAG. |
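
To make the interaction between these limits concrete, here is a minimal Python sketch of an upload check. It assumes several things the settings above do not specify: the class and function names are hypothetical, the default values are illustrative, `image/*` types use the image override while every other type uses the document override, and the whole message is checked as one unit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

MB = 1024 * 1024  # limits are configured in MB; files are measured in bytes


@dataclass
class UploadLimits:
    max_attachments: int = 5                  # attachments per message (illustrative)
    default_size_mb: float = 20               # default single-file limit (illustrative)
    image_size_mb: Optional[float] = None     # override for image/* files
    document_size_mb: Optional[float] = None  # override for all other files
    user_quota_mb: float = 0                  # 0 means unlimited
    mime_allowlist: Set[str] = field(default_factory=set)


def size_limit_bytes(mime: str, limits: UploadLimits) -> float:
    """Use the image or document override if set, otherwise the default."""
    override = limits.image_size_mb if mime.startswith("image/") else limits.document_size_mb
    return (override if override is not None else limits.default_size_mb) * MB


def check_upload(files: List[Tuple[str, int]], used_bytes: int,
                 limits: UploadLimits) -> List[str]:
    """Return rejection reasons for one message's (mime, size) files; [] means accepted."""
    errors = []
    if len(files) > limits.max_attachments:
        errors.append("too many attachments")
    total = used_bytes  # storage already used by this user
    for mime, size in files:
        if mime not in limits.mime_allowlist:
            errors.append(f"type not allowed: {mime}")
        elif size > size_limit_bytes(mime, limits):
            errors.append(f"file too large: {mime}")
        total += size
    if limits.user_quota_mb > 0 and total > limits.user_quota_mb * MB:
        errors.append("user quota exceeded")
    return errors
```

Returning every reason at once, rather than stopping at the first, makes it easier to tell users which of count, size, MIME, or quota to fix.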

Extraction decides whether files become readable text. Different services fit different files:

| Service Type | Best For | Notes |
|---|---|---|
| General document extraction | PDF, Word, spreadsheets, slides, plain text | Extracts text from ordinary documents. |
| OCR | Images and scanned PDFs | Recognizes text in images and scanned pages. |
| Complex document parsing | Layout-heavy PDFs, reports, papers | Better preserves structure, paragraphs, and layout. |
| LLM OCR | Difficult images or documents | Higher cost; keep it to a controlled scope. |

Extraction and OCR

| Setting | Function | Guidance |
|---|---|---|
| Extraction engine | Built-in, Tika, Docling, or MinerU. | Built-in is lightweight; Tika covers common documents; Docling and MinerU fit complex PDFs. |
| Tika/Docling/MinerU address, key, timeout | Connects to the document extraction service. | Required when the corresponding engine is selected. |
| MinerU source | Cloud or self-hosted service. | Match your deployment and credentials. |
| Image OCR | Sends image attachments through OCR. | Enable for screenshots, receipts, and other images containing text. |
| PDF OCR fallback | Runs OCR when PDF text extraction fails or yields poor text. | Enable for scanned PDFs. |
| OCR engine | Rapid OCR, Tesseract, Paddle, Tencent, Aliyun, or LLM OCR. | Local engines lower cost; cloud engines improve stability; LLM OCR fits complex layouts. |
| OCR service settings | Address, key, timeout, region, endpoint, model, and prompt, as required. | Required fields depend on the selected OCR engine. |
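
The PDF OCR fallback needs some rule for "text extraction is poor". The sketch below uses one simple, hypothetical heuristic: a page is sent to OCR when it has almost no extracted text (typical of scanned pages) or when most of its characters are the Unicode replacement character `U+FFFD`. The thresholds, the per-page granularity, and the function names are assumptions; the rule the system actually applies may differ.

```python
from typing import List

REPLACEMENT_CHAR = "\ufffd"  # often emitted when text extraction cannot decode glyphs


def page_is_poor(text: str, min_chars: int = 50, max_garbage_ratio: float = 0.2) -> bool:
    """Heuristic: a page is poor if it has almost no text or is mostly garbage."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True  # likely a scanned image with no text layer
    garbage = stripped.count(REPLACEMENT_CHAR)
    return garbage / len(stripped) > max_garbage_ratio


def pages_needing_ocr(page_texts: List[str], pdf_ocr_fallback: bool) -> List[int]:
    """Return indexes of PDF pages to send through the configured OCR engine."""
    if not pdf_ocr_fallback:
        return []
    return [i for i, text in enumerate(page_texts) if page_is_poor(text)]
```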

Full-context injection puts file text directly into the model context. The model sees the complete text, so this works well for small files, contracts, short code, precise review, and paragraph-by-paragraph analysis. The risk is context pressure: very large files can crowd out the user question, history, or other evidence.

Recommendation: use full context for small files and vector retrieval for large files. A long PDF should not be forced into full context when the user asks a local question.

Full Context

| Setting | Function | Guidance |
|---|---|---|
| Full-context limits | Enables size, token, and PDF page thresholds. | Recommended, to keep large files from crowding the context. |
| Full-text size limit | Maximum text bytes for full injection; 0 means unlimited. | Size it for small documents; large files should use RAG. |
| Full-context token limit | Token budget for full injection; 0 means unlimited. | Leave room for the user question, history, and evidence. |
| Full-context PDF page limit | Maximum PDF pages for full injection; 0 means unlimited. | Multi-page PDFs usually fit RAG better. |
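
The following minimal Python sketch shows how these limits might decide between full context and retrieval. It assumes the three thresholds combine as an AND (a file is injected in full only if it passes every threshold, with 0 disabling that threshold); the names are hypothetical, and the 4-characters-per-token estimate is a rough heuristic standing in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FullContextLimits:
    enabled: bool           # the "Full-context limits" switch
    max_bytes: int = 0      # full-text size limit; 0 means unlimited
    max_tokens: int = 0     # full-context token limit; 0 means unlimited
    max_pdf_pages: int = 0  # full-context PDF page limit; 0 means unlimited


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); real tokenizers differ.
    return (len(text) + 3) // 4


def within(value: int, limit: int) -> bool:
    return limit == 0 or value <= limit


def fits_full_context(text: str, pdf_pages: Optional[int], limits: FullContextLimits) -> bool:
    """True if the file may be injected in full; otherwise it should go to retrieval."""
    if not limits.enabled:
        return True  # no thresholds configured
    return (within(len(text.encode("utf-8")), limits.max_bytes)
            and within(estimate_tokens(text), limits.max_tokens)
            and (pdf_pages is None or within(pdf_pages, limits.max_pdf_pages)))
```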

Vector retrieval chunks files, creates embeddings, and builds indexes. It is the core path for knowledge bases, multiple files, large documents, and long-term material management. If the embedding model changes, old vectors and new vectors should not be mixed; rebuild the index.

Vector Retrieval

| Setting | Function | Guidance |
|---|---|---|
| Enable Embedding | Enables file vectorization and retrieval. | Required before RAG or semantic enhancement. |
| Embedding address and key | Connects to the embedding service. | Must match the model and dimensions. |
| Embedding model | Model used for vectorization. | Changing it marks old vectors as stale. |
| Embedding timeout | Wait time for one embedding operation. | Increase it for remote services or large batches. |
| Vector dimensions | Dimension used for writes and search. | Must match the model's output; changes require a reindex. |
| Normalize vectors | Normalizes vectors before storage and search. | Keep it consistent across indexing and search. |
| Auto vectorization | Triggers asynchronous embedding after upload. | Recommended so files are ready for retrieval. |
| Batch size | Number of texts per embedding batch. | Balance throughput against timeout risk. |
| Chunk size | Chunk size, estimated in tokens. | Too small breaks context; too large hurts recall precision. |
| Chunk overlap | Overlap between adjacent chunks. | Helps avoid splitting important facts across chunks. |
| Top K | Number of final chunks returned for injection. | Start small, then increase as needed. |
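
Chunk size and chunk overlap define a sliding window over the text: each new chunk starts `chunk size - overlap` tokens after the previous one, so a fact near a boundary appears in both chunks. A minimal Python sketch, assuming whitespace-separated words as a stand-in for tokens (the system estimates tokens, possibly differently); `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into overlapping chunks of about `chunk_size` tokens.

    Whitespace-separated words stand in for tokens; a real deployment would
    count tokens with the embedding model's tokenizer.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each chunk repeats the previous chunk's last `overlap` words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

With chunk size 4 and overlap 1, `"a b c d e f g h i j"` becomes `["a b c d", "d e f g", "g h i j"]`: each boundary word is repeated once. A larger overlap protects more context at the cost of more chunks to embed.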

Index Status

| Status | Meaning | Action |
|---|---|---|
| Ready | The index works with the current configuration. | Retrieval is usable. |
| Stale | The embedding model or dimensions changed. | Rebuild the index. |
| Pending | Vectorization is queued or running. | Wait before testing retrieval. |
| Failed | Vectorization failed. | Check the extracted text, the embedding service, and chunk settings. |
| Empty | No index exists yet. | Configure Embedding and upload sample files. |
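
The status rules in the table can be read as a small function of the stored index and the current configuration. This Python sketch assumes a hypothetical `IndexRecord` that remembers the embedding model and dimensions used at build time plus a job state; the system's actual bookkeeping may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IndexStatus(Enum):
    READY = "Ready"
    STALE = "Stale"
    PENDING = "Pending"
    FAILED = "Failed"
    EMPTY = "Empty"


@dataclass
class IndexRecord:
    model: str       # embedding model the vectors were built with
    dimensions: int  # vector dimensions used at build time
    job_state: str   # hypothetical: "queued", "running", "done", or "failed"


def index_status(record: Optional[IndexRecord], model: str, dimensions: int) -> IndexStatus:
    """Derive the status shown to admins from the stored index and current config."""
    if record is None:
        return IndexStatus.EMPTY
    if record.job_state in ("queued", "running"):
        return IndexStatus.PENDING
    if record.job_state == "failed":
        return IndexStatus.FAILED
    if (record.model, record.dimensions) != (model, dimensions):
        return IndexStatus.STALE  # old and new vectors must not be mixed: rebuild
    return IndexStatus.READY
```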

Semantic enhancement recalls related chunks from conversation history, while RAG retrieves evidence from files. Semantic context recall requires message embedding to be enabled. For RAG, too few chunks miss facts, and too many add noise and consume context. Start with a small, high-quality recall set and widen it only as needed.
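
The trade-off between Top K, the similarity threshold, and the injection token budget can be sketched as a selection step over already-retrieved candidates. This is a minimal Python illustration, not the product's algorithm: `Candidate` and `select_evidence` are hypothetical, and the fetch multiplier is assumed to mean that the search step returns `top_k * fetch_multiplier` candidates before this filter runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    score: float  # similarity to the query; higher means more relevant
    tokens: int   # estimated token cost of injecting this chunk


def select_evidence(candidates: List[Candidate], top_k: int,
                    threshold: float, token_budget: int) -> List[Candidate]:
    """Keep at most top_k chunks above the threshold that fit the token budget.

    `candidates` would typically hold top_k * fetch_multiplier results, so
    filtering still leaves enough good chunks.
    """
    chosen: List[Candidate] = []
    used = 0
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if c.score < threshold or len(chosen) == top_k:
            break  # below threshold (as is everything after it), or Top K reached
        if used + c.tokens > token_budget:
            continue  # too large for the remaining budget; a smaller chunk may fit
        chosen.append(c)
        used += c.tokens
    return chosen
```

For example, with candidates scored 0.9, 0.8, 0.75, and 0.4 costing 300, 300, 100, and 50 tokens, `top_k=3`, `threshold=0.5`, and a 500-token budget, the sketch keeps the 0.9 and 0.75 chunks: the 0.8 chunk does not fit the remaining budget, and the 0.4 chunk is below the threshold.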

Semantic Enhancement and RAG

| Setting | Function | Guidance |
|---|---|---|
| Message embedding | Embeds conversation messages after each turn. | Required before semantic context recall. |
| Semantic context recall | Recalls related chunks from conversation history. | Useful for long sessions and ongoing projects. |
| RAG switch | Allows files to enter retrieval-augmented context. | Requires Embedding. |
| Similarity threshold | Minimum similarity for a chunk to be injected. | Too high may return nothing; too low adds noise. |
| Injection token budget | Limits the tokens used by evidence. | Reserve room for the prompt, history, and current question. |
| Fetch multiplier | Fetches extra candidates before filtering. | Useful when documents are noisy. |
| Ready wait time | Waits for vectorization to finish before sending. | Helps right after upload; too high slows replies. |
| Query history turns | Adds recent user turns to the retrieval query. | Improves multi-turn follow-up questions. |
| Retrieval cache TTL | Caches retrieval results. | Helps with repeated questions; avoid a long TTL for files that change. |
| Hybrid retrieval | Combines BM25 and vector retrieval. | Good for exact terms, numbers, and identifiers. |
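
Hybrid retrieval has to merge a BM25 ranking with a vector ranking. Reciprocal rank fusion (RRF) is one common way to do that; the settings above do not say which fusion method the system uses, so treat this Python sketch as an illustration of the idea rather than its implementation. The chunk IDs and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked chunk-ID lists (e.g. BM25 and vector) into one ranking.

    Each list adds 1 / (k + rank) to a chunk's score; k = 60 is the constant
    commonly used with RRF so that no single list's top hit dominates.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

Fusing `["c1", "c2", "c3"]` (BM25) with `["c2", "c4", "c1"]` (vector) yields `["c2", "c1", "c4", "c3"]`. `c3` survives even though only BM25 found it, which is why hybrid retrieval helps with exact terms, numbers, and identifiers.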

Context compression helps long conversations and large evidence sets keep important information. A compression model should be stable, inexpensive, and good at summarization. Async compression reduces blocking, but summaries may update slightly later. Test that facts, numbers, conclusions, and open tasks are preserved.

Context Compression

| Setting | Function | Guidance |
|---|---|---|
| Token budget truncation | Trims history based on the model's context window. | Recommended. |
| Compression turn threshold | Triggers compression after a number of conversation turns. | Lower it for long-running conversations. |
| Compression token threshold | Triggers compression at a token size. | Keep it below the model's context limit. |
| Recent turns preserved | Keeps recent turns verbatim before summarizing older content. | Preserves current task details. |
| Highlights per role | Number of summary items per role. | Balance detail against budget. |
| Snippet character limit | Maximum characters per compressed snippet. | Prevents one long message from dominating the summary. |
| Evidence retention days | Retains RAG, history, and tool evidence; 0 means no automatic expiry. | Adjust for audit and storage needs. |
| LLM compression | Uses a model for semantic summaries. | Higher quality at extra cost. |
| Async compression | Runs compression in the background. | Reduces reply blocking, with a slight delay in summary updates. |
| Compression model | Model used for summaries. | Use a stable, inexpensive summarization model. |
| Failure threshold | Falls back to template summaries after repeated failures. | Protects the main chat flow. |
| Summary prompts | Custom instructions for full and light summaries. | Preserve facts, numbers, constraints, and open tasks. |
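
Several compression settings are simple rules that can be sketched directly: the turn and token thresholds, recent turns preserved, the snippet character limit, and the failure threshold. The Python sketch below is illustrative only. All names are hypothetical, and it assumes that a threshold of 0 disables that trigger, that the failure counter resets after a successful LLM summary, and that the summarizers are passed in as plain functions.

```python
from typing import Callable, List, Tuple


def should_compress(turns: int, tokens: int, turn_threshold: int, token_threshold: int) -> bool:
    """Either threshold triggers compression; 0 disables a trigger (assumption)."""
    return ((turn_threshold > 0 and turns >= turn_threshold)
            or (token_threshold > 0 and tokens >= token_threshold))


def split_for_compression(turns: List[str], recent_turns: int) -> Tuple[List[str], List[str]]:
    """Return (older turns to summarize, recent turns kept verbatim)."""
    if recent_turns <= 0:
        return turns, []
    return turns[:-recent_turns], turns[-recent_turns:]


def clip_snippet(text: str, limit: int) -> str:
    """Apply the snippet character limit so one message cannot dominate."""
    return text[:limit]


class CompressionGuard:
    """After `failure_threshold` consecutive LLM failures, use template summaries."""

    def __init__(self, failure_threshold: int):
        self.failure_threshold = failure_threshold
        self.failures = 0

    def summarize(self, turns: List[str],
                  llm_summary: Callable[[List[str]], str],
                  template_summary: Callable[[List[str]], str]) -> str:
        if self.failures >= self.failure_threshold:
            return template_summary(turns)  # stop calling the failing model
        try:
            summary = llm_summary(turns)
        except Exception:
            self.failures += 1
            return template_summary(turns)  # fall back instead of failing the chat turn
        self.failures = 0
        return summary
```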

Fallback Rules

| Condition | Behavior | Check |
|---|---|---|
| Upload check fails | The file is rejected. | Count, size, MIME, quota. |
| Extraction unavailable | The file remains previewable but has no searchable text. | Engine, address, timeout. |
| OCR unavailable | Text in images or scanned pages cannot become searchable text. | OCR switches and engine settings. |
| Full context exceeds limits | The system tries retrieval, or falls back. | Size, token, and PDF page limits. |
| Embedding missing | RAG and semantic recall are unavailable. | Embedding address, model, dimensions. |
| Index stale or failed | Retrieval is incomplete or unreliable. | Reindex; check extracted text and chunk settings. |
| RAG returns empty | No chunks are injected, rather than forcing in irrelevant ones. | Similarity threshold, Top K, hybrid retrieval. |
| Context too long | Truncation or compression runs. | Token budget and compression settings. |

Recommended Setup Order

  1. Configure upload count, size, MIME, image/document limits, and user quota.
  2. Configure extraction and test common file types.
  3. Enable image OCR and PDF OCR fallback only when needed.
  4. Enable full-context limits for small documents.
  5. Configure Embedding model, dimensions, chunks, and auto vectorization.
  6. Upload sample files and confirm index status is ready.
  7. Enable RAG and tune similarity, top K, token budget, and hybrid retrieval.
  8. For long sessions, enable message embedding, semantic recall, and context compression.

Practical Tips

Do not enable everything at once. Stabilize upload, preview, and extraction first, then add OCR, vectorization, and RAG. After changing the embedding model, vector dimensions, chunking, or extraction engine, test with sample files. When users say the model cannot answer from a file, check upload limits, extraction result, index status, RAG settings, and context budget in that order.
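
The troubleshooting order in the last sentence can be written as a checklist that stops at the first failing stage. This Python sketch assumes a hypothetical `FileDiagnostics` record gathered from the admin console or logs. It also assumes the file goes through retrieval; for a file injected through full context, the index and RAG checks would not apply.

```python
from dataclasses import dataclass


@dataclass
class FileDiagnostics:
    upload_accepted: bool
    extracted_chars: int   # length of the extracted or OCR text
    index_status: str      # "Ready", "Stale", "Pending", "Failed", or "Empty"
    rag_enabled: bool
    evidence_chunks: int   # chunks that passed the similarity threshold and Top K
    prompt_tokens: int     # tokens in the assembled prompt
    context_window: int    # model context window, in tokens


def diagnose(d: FileDiagnostics) -> str:
    """Walk the checks in the recommended order and report the first problem."""
    if not d.upload_accepted:
        return "upload limits"
    if d.extracted_chars == 0:
        return "extraction result"
    if d.index_status != "Ready":
        return "index status"
    if not d.rag_enabled or d.evidence_chunks == 0:
        return "RAG settings"
    if d.prompt_tokens > d.context_window:
        return "context budget"
    return "no obvious problem"
```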