RAG Settings
Configure how your bot retrieves and uses context from your knowledge base to answer questions.
Core Settings
Top K (Number of Chunks)
top_k (integer, default: 5)
Number of document chunks to retrieve for each query.
Higher values provide more context but increase latency and cost. Lower values are faster but may miss relevant information.
- 3-5: Good for focused, specific queries
- 5-10: Better coverage for broad questions
- 10+: Comprehensive but slower
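To make the effect of top_k concrete, here is a minimal sketch of top-k selection over scored chunks. It assumes cosine similarity as the relevance score and uses hypothetical two-dimensional embeddings; the function names and data are illustrative, not the product's actual retrieval code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunks, top_k=5):
    """Return up to top_k (score, chunk_id) pairs, best first.

    `chunks` is a hypothetical list of (chunk_id, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), cid) for cid, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy data: three chunks with made-up embeddings.
chunks = [
    ("pricing", [1.0, 0.0]),
    ("refunds", [0.8, 0.6]),
    ("careers", [0.0, 1.0]),
]
top = retrieve_top_k([1.0, 0.0], chunks, top_k=2)
print([cid for _, cid in top])
```

Raising top_k only lengthens the returned list; every extra chunk is less similar than the ones before it, which is why larger values add coverage at the price of more text to process.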
Similarity Threshold
similarity_threshold (number, default: 0.5)
Minimum relevance score (0-1) for retrieved chunks.
Chunks with scores below this threshold are filtered out. Higher values mean stricter matching.
- 0.3-0.4: Lenient, more results
- 0.5-0.6: Balanced (recommended)
- 0.7+: Strict, only very relevant results
Tuning Threshold
If your bot says "I don't have information" too often, try lowering the threshold. If it gives irrelevant answers, raise it.
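The filtering behavior can be sketched as follows. The scores and chunk names are made up for illustration, and the sketch assumes a chunk is kept when its score is greater than or equal to the threshold.

```python
def filter_by_threshold(scored_chunks, similarity_threshold=0.5):
    """Keep only (score, chunk) pairs that meet the threshold."""
    return [(score, chunk) for score, chunk in scored_chunks
            if score >= similarity_threshold]

# Hypothetical retrieval scores for one query.
scored = [
    (0.82, "refund policy"),
    (0.55, "shipping times"),
    (0.41, "gift cards"),
    (0.28, "careers"),
]

# Compare a lenient, a balanced, and a strict setting.
for threshold in (0.3, 0.5, 0.7):
    kept = filter_by_threshold(scored, threshold)
    print(threshold, [chunk for _, chunk in kept])
```

With these numbers the lenient setting keeps three chunks, the balanced one keeps two, and the strict one keeps only the top match. If every chunk falls below the threshold, the bot has no context to answer from, which is the situation the tuning tip above describes.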
Advanced Retrieval
Reranking
use_reranking (boolean, default: false)
Enable Cohere reranking to improve retrieval quality.
Reranking uses a separate model to re-score retrieved chunks based on query relevance. This significantly improves result quality, especially for complex queries.
- Adds ~100-200ms latency
- Requires Cohere API access (included in Pro plans)
- Most impactful for larger knowledge bases
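The re-scoring step can be illustrated with a small sketch. This does not call the Cohere API: overlap_score is a hypothetical stand-in for the reranking model, chosen only so the example runs on its own. The point is the shape of the step, in which already-retrieved chunks are scored again against the query and reordered.

```python
def overlap_score(query, text):
    """Stand-in relevance scorer: fraction of query words found in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)

def rerank(query, chunks, score_fn=overlap_score, top_n=None):
    """Re-score retrieved chunks against the query and return them best first."""
    ranked = sorted(chunks, key=lambda text: score_fn(query, text), reverse=True)
    return ranked if top_n is None else ranked[:top_n]

# Candidates in the order a first-pass search might have returned them.
candidates = [
    "Our office hours are listed on the contact page",
    "To reset your password open account settings",
    "Password rules require twelve characters",
]
reranked = rerank("reset your password", candidates, top_n=2)
print(reranked)
```

A real reranking model replaces score_fn with a learned relevance score, which is where the extra latency comes from.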
Hybrid Search
use_hybrid_search (boolean, default: false)
Combine vector search with keyword (BM25) search.
Hybrid search combines semantic (vector) search with traditional keyword matching. Best for:
- Technical documentation with specific terms
- Queries containing product names or codes
- When exact term matching is important
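One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustration; how this product actually merges the two result lists is not stated here, so treat the fusion method, the constant k=60, and the document IDs as assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each list contributes 1 / (k + rank) to a document's fused score,
    so documents ranked well by both searches rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)

# Hypothetical results for a query containing an exact error code.
vector_ranking = ["setup-guide", "faq", "error-codes"]
keyword_ranking = ["error-codes", "setup-guide", "changelog"]

merged = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(merged)
```

Here error-codes ranks last in the vector results but first in the keyword results, so fusion lifts it above faq. That is the kind of exact-term rescue hybrid search is meant to provide.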
HyDE (Hypothetical Document Embeddings)
use_hyde (boolean, default: false)
Generate hypothetical answers to improve retrieval.
HyDE first generates a hypothetical answer to the query, then searches with that answer instead of the original query. It can improve results for complex or abstract queries.
- Adds ~500ms latency (requires extra LLM call)
- Most useful for vague or conceptual questions
- May not help for straightforward factual queries
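The HyDE flow can be sketched with injectable steps. Everything prefixed fake_ below is a hypothetical stand-in: a real system would call an LLM to draft the answer and an embedding model to vectorize it, whereas this toy version uses a canned answer and word sets so it runs on its own.

```python
def hyde_search(query, generate_answer, embed, search, top_k=5):
    """Search with the embedding of a hypothetical answer, not the raw query."""
    hypothetical_answer = generate_answer(query)  # the extra LLM call
    return search(embed(hypothetical_answer), top_k)

# Toy knowledge base.
DOCS = {
    "billing": "invoices are issued monthly and payment is charged to your card",
    "security": "data is encrypted at rest and in transit",
}

def fake_generate_answer(query):
    # Stand-in for an LLM: returns a canned hypothetical answer.
    return "You are charged monthly and invoices list each payment"

def fake_embed(text):
    # Stand-in for an embedding model: a set of lowercase words.
    return set(text.lower().split())

def fake_search(query_terms, top_k):
    # Rank documents by how many words they share with the query terms.
    ranked = sorted(DOCS, key=lambda d: len(query_terms & fake_embed(DOCS[d])),
                    reverse=True)
    return ranked[:top_k]

results = hyde_search("How does billing work?", fake_generate_answer,
                      fake_embed, fake_search, top_k=1)
print(results)
```

In this toy setup the raw question shares no words with either document, while the hypothetical answer shares several with the billing document. That gap is the intuition behind HyDE: an answer-shaped text tends to sit closer to the stored content than a short question does.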
Multi-Query
use_multi_query (boolean, default: false)
Generate multiple query variations for better recall.
Multi-query generates several variations of the user's query and searches with each one to improve recall. It is useful when users might phrase questions differently from your content.
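A minimal sketch of the idea follows: search with the original query plus each variation, then merge the results while keeping the best score seen for each chunk. The variation generator and the index are hypothetical stand-ins, and the merge rule (maximum score per chunk) is an assumption rather than documented behavior.

```python
def multi_query_retrieve(query, make_variations, retrieve, top_k=5):
    """Retrieve for the query and its variations, then merge by best score."""
    best = {}
    for variant in [query, *make_variations(query)]:
        for chunk_id, score in retrieve(variant):
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

def fake_variations(query):
    # Stand-in for an LLM that rephrases the user's query.
    return ["how do I cancel my subscription", "steps to end my plan"]

# Stand-in for a search index: canned (chunk_id, score) results per query.
FAKE_INDEX = {
    "stop paying": [("billing-faq", 0.42)],
    "how do I cancel my subscription": [("cancel-guide", 0.81), ("billing-faq", 0.50)],
    "steps to end my plan": [("cancel-guide", 0.77)],
}

def fake_retrieve(query):
    return FAKE_INDEX.get(query, [])

merged = multi_query_retrieve("stop paying", fake_variations, fake_retrieve, top_k=2)
print(merged)
```

The original phrasing alone finds only a weak match, while the rephrased queries surface the cancellation guide.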
Recommended Configurations
Simple Q&A (Default)
top_k: 5
similarity_threshold: 0.5
use_reranking: false
use_hybrid_search: false
Fast, good for straightforward questions with clear answers.
Customer Support
top_k: 7
similarity_threshold: 0.45
use_reranking: true
use_hybrid_search: true
Better coverage for varied customer questions.
Technical Documentation
top_k: 10
similarity_threshold: 0.4
use_reranking: true
use_hybrid_search: true
Comprehensive retrieval for detailed technical queries.
Research/Complex Queries
top_k: 10
similarity_threshold: 0.35
use_reranking: true
use_hyde: true
use_multi_query: true
Maximum quality for complex, nuanced questions. Higher latency.
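If you manage settings in code, the presets above could be captured as overrides on top of the documented defaults. This is only a sketch: the preset keys (simple_qa and so on) and the build_settings helper are hypothetical names, and only the setting names and values come from this page.

```python
# Documented defaults for each setting.
DEFAULTS = {
    "top_k": 5,
    "similarity_threshold": 0.5,
    "use_reranking": False,
    "use_hybrid_search": False,
    "use_hyde": False,
    "use_multi_query": False,
}

# Each preset lists only the values that differ from the defaults.
PRESETS = {
    "simple_qa": {},
    "customer_support": {"top_k": 7, "similarity_threshold": 0.45,
                         "use_reranking": True, "use_hybrid_search": True},
    "technical_docs": {"top_k": 10, "similarity_threshold": 0.4,
                       "use_reranking": True, "use_hybrid_search": True},
    "research": {"top_k": 10, "similarity_threshold": 0.35,
                 "use_reranking": True, "use_hyde": True,
                 "use_multi_query": True},
}

def build_settings(preset_name, **overrides):
    """Merge defaults, a preset, and ad hoc overrides, then validate."""
    settings = {**DEFAULTS, **PRESETS[preset_name], **overrides}
    unknown = set(settings) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    if settings["top_k"] < 1:
        raise ValueError("top_k must be at least 1")
    if not 0.0 <= settings["similarity_threshold"] <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")
    return settings

support = build_settings("customer_support")
tuned = build_settings("technical_docs", similarity_threshold=0.45)
```

Starting from a preset and overriding one value at a time makes it easier to see which change affected answer quality.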
Performance Trade-offs
| Setting | Latency Impact | Quality Impact |
|---|---|---|
| Higher top_k | +50-100ms per 5 chunks | Better coverage |
| Reranking | +100-200ms | Significantly better |
| Hybrid Search | +50ms | Better for keywords |
| HyDE | +500ms | Variable improvement |
| Multi-Query | +300ms | Better recall |
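The table can be turned into a rough estimate of added latency for a given configuration. The sketch below assumes the costs are additive, that the top_k cost scales linearly with chunks beyond the default of 5, and that the approximate figures apply to your workload. These are simplifications, so treat the result as a ballpark range rather than a measurement.

```python
# (low, high) added latency in milliseconds, taken from the table above.
FEATURE_COST_MS = {
    "use_reranking": (100, 200),
    "use_hybrid_search": (50, 50),
    "use_hyde": (500, 500),
    "use_multi_query": (300, 300),
}
TOP_K_COST_PER_5_CHUNKS_MS = (50, 100)

def estimate_added_latency_ms(settings, baseline_top_k=5):
    """Return a (low, high) estimate of latency added relative to the defaults."""
    extra_chunks = max(0, settings.get("top_k", baseline_top_k) - baseline_top_k)
    low = TOP_K_COST_PER_5_CHUNKS_MS[0] * extra_chunks / 5
    high = TOP_K_COST_PER_5_CHUNKS_MS[1] * extra_chunks / 5
    for flag, (flag_low, flag_high) in FEATURE_COST_MS.items():
        if settings.get(flag):
            low += flag_low
            high += flag_high
    return low, high

research = {"top_k": 10, "similarity_threshold": 0.35, "use_reranking": True,
            "use_hyde": True, "use_multi_query": True}
print(estimate_added_latency_ms(research))  # (950.0, 1100.0)
```

Under these assumptions the Research/Complex Queries configuration adds roughly a second per query, mostly from HyDE and multi-query.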
Debugging Retrieval
If your bot isn't finding the right content:
- Check that the content exists in your data sources
- Lower the similarity threshold
- Enable hybrid search for keyword-heavy queries
- Increase top_k to retrieve more candidates
- Enable reranking for better relevance
- Review source citations in test responses
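The checklist can also be expressed as a small diagnostic. This sketch assumes you can obtain raw (chunk_id, score) pairs for a test query, which may or may not be exposed in your setup; the diagnose function and the sample scores are hypothetical.

```python
def diagnose(expected_id, scored_chunks, top_k=5, similarity_threshold=0.5):
    """Explain why `expected_id` was or was not retrieved.

    `scored_chunks` is a hypothetical list of (chunk_id, score) pairs
    covering every candidate the search considered.
    """
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    ids = [chunk_id for chunk_id, _ in ranked]
    if expected_id not in ids:
        return "missing: check that the content exists in your data sources"
    position = ids.index(expected_id) + 1
    score = ranked[position - 1][1]
    if score < similarity_threshold:
        return (f"below threshold: score {score:.2f} < {similarity_threshold}; "
                "lower the threshold")
    if position > top_k:
        return (f"outside top_k: ranked {position} with top_k={top_k}; "
                "increase top_k or enable reranking")
    return "retrieved"

# Made-up scores for one test query.
scores = [("a", 0.9), ("b", 0.8), ("c", 0.45), ("d", 0.6)]
print(diagnose("c", scores))
print(diagnose("d", scores, top_k=2))
print(diagnose("z", scores))
```

Each outcome maps to one item in the list above: missing content, a threshold that is too strict, or a top_k that is too small.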

