RAG Settings

Configure how your bot retrieves and uses context from your knowledge base to answer questions.

Core Settings

Top K (Number of Chunks)

top_k (integer, default: 5)

Number of document chunks to retrieve for each query

Higher values provide more context but increase latency and cost. Lower values are faster but may miss relevant information.

3-5: Good for focused, specific queries
5-10: Better coverage for broad questions
10+: Comprehensive but slower
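As a rough illustration of what top_k controls, the sketch below keeps only the highest-scoring chunks from a scored candidate list. The function name and the (chunk, score) pair format are assumptions made for this example, not the product's internal API.

```python
import heapq

def top_k_chunks(scored_chunks, top_k=5):
    """Return the top_k highest-scoring (chunk, score) pairs, best first."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # nlargest avoids sorting the full candidate list when top_k is small.
    return heapq.nlargest(top_k, scored_chunks, key=lambda pair: pair[1])

# Hypothetical scored candidates, for illustration only.
scored = [
    ("refund policy", 0.82),
    ("shipping times", 0.41),
    ("returns form", 0.77),
    ("careers page", 0.12),
]
print(top_k_chunks(scored, top_k=2))
```

Raising top_k in this sketch simply lets more of the lower-ranked candidates through, which is where the extra context (and the extra cost) comes from.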

Similarity Threshold

similarity_threshold (number, default: 0.5)

Minimum relevance score (0-1) for retrieved chunks

Chunks with scores below this threshold are filtered out. Higher values mean stricter matching.

0.3-0.4: Lenient, more results
0.5-0.6: Balanced (recommended)
0.7+: Strict, only very relevant results

Tuning Threshold

If your bot says "I don't have information" too often, try lowering the threshold. If it gives irrelevant answers, raise it.
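The filtering rule can be sketched in a few lines. This is a minimal example with made-up scores; the function name is hypothetical and the real scoring happens inside the retrieval service.

```python
def filter_by_threshold(scored_chunks, similarity_threshold=0.5):
    """Drop chunks whose score falls below the threshold."""
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")
    return [(chunk, score) for chunk, score in scored_chunks
            if score >= similarity_threshold]

# Hypothetical scores, for illustration only.
candidates = [("pricing page", 0.62), ("changelog", 0.48), ("press kit", 0.31)]

print(filter_by_threshold(candidates, 0.5))  # balanced: one chunk survives
print(filter_by_threshold(candidates, 0.3))  # lenient: all three survive
print(filter_by_threshold(candidates, 0.7))  # strict: nothing survives
```

The strict case returns an empty list, which is the situation that typically leads to an "I don't have information" reply.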

Advanced Retrieval

Reranking

use_reranking (boolean, default: false)

Enable Cohere reranking to improve retrieval quality

Reranking uses a separate model to re-score the retrieved chunks against the query and reorder them by that score. This can significantly improve result quality, especially for complex queries.

Adds ~100-200ms latency
Requires Cohere API (included in Pro plans)
Most impactful for larger knowledge bases
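The shape of a reranking step can be sketched as follows. The scorer here is a simple word-overlap function standing in for the reranking model; it is an assumption chosen so the example runs offline and it does not call or imitate the Cohere API.

```python
def overlap_score(query, chunk):
    """Stand-in relevance scorer: fraction of query words found in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0

def rerank(query, chunks, score_fn=overlap_score, top_n=None):
    """Re-score already-retrieved chunks and return them best first."""
    ranked = sorted(chunks, key=lambda chunk: score_fn(query, chunk), reverse=True)
    return ranked if top_n is None else ranked[:top_n]

# Hypothetical retrieved chunks, in their original retrieval order.
retrieved = [
    "Shipping usually takes five business days",
    "How to reset your account password",
    "Password rules and account security tips",
]
print(rerank("reset account password", retrieved, top_n=2))
```

Because reranking only reorders what retrieval already returned, it tends to pair well with a higher top_k, which gives the second model more candidates to choose from.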

Hybrid Search

use_hybrid_search (boolean, default: false)

Combine vector search with keyword (BM25) search

Hybrid search combines semantic (vector) search with traditional keyword matching. Best for:

Technical documentation with specific terms
Queries containing product names or codes
When exact term matching is important
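This page does not say how the vector and keyword results are merged. One common approach is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of the product's implementation. The document IDs and the constant k=60 are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two search methods.
vector_ranking = ["doc_a", "doc_b", "doc_c"]
keyword_ranking = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
```

In this example doc_a and doc_c appear in both lists and end up ahead of doc_b and doc_d, which each appear in only one.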

HyDE (Hypothetical Document Embeddings)

use_hyde (boolean, default: false)

Generate hypothetical answers to improve retrieval

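HyDE first asks the LLM for a hypothetical answer to the query, then searches with that answer instead of the original question. This can improve results for complex or abstract queries, because a plausible answer tends to resemble the stored content more closely than the question does.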

Adds ~500ms latency (requires extra LLM call)
Most useful for vague or conceptual questions
May not help for straightforward factual queries
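The HyDE flow can be sketched with toy components. Everything here is a stand-in chosen so the example runs offline: embed is a bag-of-words counter rather than a real embedding model, and fake_llm returns a canned answer in place of the extra LLM call.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumption: real systems use a model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def hyde_search(query, chunks, generate_answer, top_k=1):
    """Search with a generated hypothetical answer instead of the raw query."""
    hypothetical = generate_answer(query)  # the extra LLM call
    answer_vec = embed(hypothetical)
    ranked = sorted(chunks, key=lambda c: cosine(answer_vec, embed(c)), reverse=True)
    return ranked[:top_k]

def fake_llm(query):
    """Hypothetical LLM stub returning a canned answer."""
    return "restart the router and wait for the status light to turn green"

kb_chunks = [
    "Billing cycles start on the first day of each month",
    "If the connection drops, restart the router until the status light turns green",
]
user_query = "why is my internet flaky?"
print(hyde_search(user_query, kb_chunks, fake_llm))
```

In this toy setup the raw question shares no words with either chunk, while the hypothetical answer overlaps heavily with the router chunk, which is the effect HyDE relies on.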

Multi-Query

use_multi_query (boolean, default: false)

Generate multiple query variations for better recall

Multi-query generates several rephrasings of the user's query and retrieves results for each one to improve recall. This is useful when users phrase questions differently from the wording in your content.
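One way to picture multi-query retrieval is the sketch below, which searches with the original query plus each variation and keeps the best score seen for every chunk. The variation generator, the search function, and the merge rule are all assumptions for illustration; the page does not describe how results are merged.

```python
def multi_query_search(query, make_variations, search, top_k=5):
    """Search with the query and its variations, keeping each chunk's best score."""
    best = {}
    for variant in [query, *make_variations(query)]:
        for chunk, score in search(variant):
            if score > best.get(chunk, float("-inf")):
                best[chunk] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Hypothetical stand-ins for the LLM rephraser and the retriever.
fake_index = {
    "cancel plan": [("Ending your subscription", 0.58)],
    "how do I end my subscription": [("Ending your subscription", 0.91),
                                     ("Refund policy", 0.47)],
    "stop billing": [("Refund policy", 0.52)],
}

def fake_variations(query):
    return ["how do I end my subscription", "stop billing"]

def fake_search(query):
    return fake_index.get(query, [])

print(multi_query_search("cancel plan", fake_variations, fake_search, top_k=2))
```

Here the terse query "cancel plan" matches weakly on its own, but a rephrasing closer to the content's wording finds the same chunk with a much higher score.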

Recommended Configurations

Simple Q&A (Default)

top_k: 5
similarity_threshold: 0.5
use_reranking: false
use_hybrid_search: false

Fast, good for straightforward questions with clear answers.

Customer Support

top_k: 7
similarity_threshold: 0.45
use_reranking: true
use_hybrid_search: true

Better coverage for varied customer questions.

Technical Documentation

top_k: 10
similarity_threshold: 0.4
use_reranking: true
use_hybrid_search: true

Comprehensive retrieval for detailed technical queries.

Research/Complex Queries

top_k: 10
similarity_threshold: 0.35
use_reranking: true
use_hyde: true
use_multi_query: true

Maximum quality for complex, nuanced questions. Higher latency.
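If you manage these settings in code, the presets above could be captured along these lines. The RagSettings class and the preset names are hypothetical; only the field names, defaults, and values come from this page.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class RagSettings:
    """Hypothetical container mirroring the documented settings and defaults."""
    top_k: int = 5
    similarity_threshold: float = 0.5
    use_reranking: bool = False
    use_hybrid_search: bool = False
    use_hyde: bool = False
    use_multi_query: bool = False

    def __post_init__(self):
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be between 0 and 1")

PRESETS = {
    "simple_qa": RagSettings(),
    "customer_support": RagSettings(top_k=7, similarity_threshold=0.45,
                                    use_reranking=True, use_hybrid_search=True),
    "technical_docs": RagSettings(top_k=10, similarity_threshold=0.4,
                                  use_reranking=True, use_hybrid_search=True),
    "research": RagSettings(top_k=10, similarity_threshold=0.35,
                            use_reranking=True, use_hyde=True, use_multi_query=True),
}

# Start from a preset and adjust one value while tuning.
tuned = replace(PRESETS["customer_support"], similarity_threshold=0.4)
print(tuned)
```

Starting from a preset and changing one setting at a time makes it easier to tell which change affected answer quality.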

Performance Trade-offs

Setting	Latency Impact	Quality Impact
Higher top_k	+50-100ms per 5 chunks	Better coverage
Reranking	+100-200ms	Significantly better
Hybrid Search	+50ms	Better for keywords
HyDE	+500ms	Variable improvement
Multi-Query	+300ms	Better recall
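The figures in the table can be added up to get a rough latency budget for a configuration. The sketch below assumes the top_k cost applies only to chunks beyond the default of 5, prorated per chunk, and that the individual costs simply add; both are simplifying assumptions, so treat the result as an estimate.

```python
def added_latency_ms(top_k=5, use_reranking=False, use_hybrid_search=False,
                     use_hyde=False, use_multi_query=False):
    """Return a (low, high) estimate of added latency in milliseconds."""
    extra_chunks = max(top_k - 5, 0)
    low = 50 * extra_chunks / 5    # +50-100ms per 5 extra chunks
    high = 100 * extra_chunks / 5
    if use_reranking:
        low, high = low + 100, high + 200
    if use_hybrid_search:
        low, high = low + 50, high + 50
    if use_hyde:
        low, high = low + 500, high + 500
    if use_multi_query:
        low, high = low + 300, high + 300
    return low, high

# Research/Complex Queries configuration from this page.
print(added_latency_ms(top_k=10, use_reranking=True,
                       use_hyde=True, use_multi_query=True))
```

For the Research/Complex Queries configuration this works out to roughly 950 to 1100 ms of added latency, most of it from HyDE and multi-query.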

Debugging Retrieval

If your bot isn't finding the right content:

Check that the content exists in your data sources
Lower the similarity threshold
Enable hybrid search for keyword-heavy queries
Increase top_k to retrieve more candidates
Enable reranking for better relevance
Review source citations in test responses