Reranking
Reranking significantly improves retrieval quality by using a dedicated model to re-score retrieved documents based on query relevance.
How Reranking Works
- Initial retrieval: Vector search finds candidate documents based on embedding similarity
- Reranking: A specialized model (Cohere) evaluates each candidate against the actual query
- Re-ordering: Documents are sorted by the reranker's scores
- Selection: Top results are passed to the LLM
Query: "How do I reset my password?"
| Rank | Initial Retrieval (Vector) | After Reranking |
| --- | --- | --- |
| 1 | Account settings overview | Password reset guide ⬆ |
| 2 | Password reset guide | Account security FAQ ⬆ |
| 3 | Login troubleshooting | Account settings overview ⬇ |
| 4 | Account security FAQ | Login troubleshooting ⬇ |
| 5 | User profile docs | User profile docs |
Reranking improves the order based on actual relevance.
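The four steps above can be sketched in a few lines of Python. This is only an illustration: `toy_relevance` is a hypothetical word-overlap scorer standing in for the real reranking model, the `Candidate` type and the vector scores are made up for the example, and none of these names are part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    vector_score: float  # illustrative similarity score from the initial vector search


def toy_relevance(query: str, title: str) -> float:
    """Toy stand-in for a reranking model: share of query words present in the title."""
    query_words = set(query.lower().strip("?").split())
    if not query_words:
        return 0.0
    title_words = set(title.lower().split())
    return len(query_words & title_words) / len(query_words)


def rerank(query: str, candidates: list[Candidate], top_n: int) -> list[Candidate]:
    # Score each candidate against the query itself, then sort by that score.
    # sorted() is stable, so ties keep their original vector-search order.
    ranked = sorted(candidates, key=lambda c: toy_relevance(query, c.title), reverse=True)
    # Selection: only the top results would be passed to the LLM.
    return ranked[:top_n]


# Initial retrieval order, as a vector search might return it.
candidates = [
    Candidate("Account settings overview", 0.82),
    Candidate("Password reset guide", 0.80),
    Candidate("Login troubleshooting", 0.78),
    Candidate("Account security FAQ", 0.75),
    Candidate("User profile docs", 0.70),
]

top = rerank("How do I reset my password?", candidates, top_n=3)
print([c.title for c in top])
```

A real reranking model produces far more nuanced scores than word overlap, so the exact order it returns will differ from this toy version. The shape of the pipeline (retrieve, score against the query, sort, select) is the same.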
Why Reranking Helps
Vector embeddings capture semantic similarity, but they have limitations:
- Vocabulary mismatch: Different words for the same concept may not be close in embedding space
- Nuance loss: Embeddings compress meaning, losing subtle distinctions
- Query-document asymmetry: Short queries vs. long documents have different characteristics
Reranking uses a cross-encoder that directly compares query and document text, catching relevance that embeddings miss.
Enabling Reranking
Enable reranking in your bot settings:
// API
PATCH /api/agents/{agentId}/
{
  "useReranking": true
}
// Dashboard
Settings → RAG Settings → Enable Reranking
Availability
Reranking is available on Pro and Enterprise plans. It uses Cohere's reranking API, which is included in your subscription.
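If you prefer to toggle the setting from a script, the PATCH request from the Enabling Reranking section could be built along these lines. This is a minimal sketch using only the Python standard library. The base URL, the token, the agent ID, and the Bearer authentication scheme are all placeholders assumed for illustration; only the path and the `useReranking` field come from this page.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own base URL and credentials.
BASE_URL = "https://example.com"
API_TOKEN = "YOUR_API_TOKEN"


def build_reranking_request(agent_id: str, enabled: bool) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that toggles reranking."""
    body = json.dumps({"useReranking": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/agents/{agent_id}/",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; use the scheme your account requires.
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )


request = build_reranking_request("agent_123", enabled=True)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything touches a live agent.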
Performance Impact
| Metric | Impact | Details |
| --- | --- | --- |
| Latency | +100-200ms | Additional time for the reranking API call |
| Quality | Significant | Typically 10-30% improvement in retrieval relevance |
| Cost | Included | No additional cost on Pro/Enterprise plans |
Best Practices
When to Use Reranking
- Large knowledge bases: More documents benefit more from better ranking
- Diverse content: Mix of topics, formats, or styles
- Complex queries: Multi-part or nuanced questions
- High-stakes applications: Customer support, compliance, research
When to Skip Reranking
- Small knowledge bases: <50 documents may not benefit much
- Latency-critical: When every millisecond matters
- Simple queries: Single-word lookups, exact matches
Combining with Other Features
Reranking + Hybrid Search
Hybrid search retrieves candidates from both vector and keyword search. Reranking then combines and orders these candidates:
- Vector search returns top 20 candidates
- Keyword search returns top 20 candidates
- Results are merged (deduped)
- Reranker scores and orders all candidates
- Top K are sent to LLM
This combination catches both semantic and keyword matches, with optimal ordering.
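The merge-then-rerank flow described above can be sketched as follows. The function names, document IDs, and scores are hypothetical, and `score_fn` stands in for whatever reranking model is used; this illustrates the general technique rather than the service's internal implementation.

```python
from typing import Callable


def merge_candidates(vector_hits: list[str], keyword_hits: list[str]) -> list[str]:
    """Merge two ranked lists of document IDs, dropping duplicates (first occurrence wins)."""
    return list(dict.fromkeys(vector_hits + keyword_hits))


def hybrid_rerank(
    query: str,
    vector_hits: list[str],
    keyword_hits: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> list[str]:
    merged = merge_candidates(vector_hits, keyword_hits)
    # The reranker scores every merged candidate against the query.
    ranked = sorted(merged, key=lambda doc_id: score_fn(query, doc_id), reverse=True)
    return ranked[:top_k]


# Illustrative data: "doc_b" is found by both searches and must appear only once.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d"]
fake_scores = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.1, "doc_d": 0.7}

result = hybrid_rerank(
    "example query",
    vector_hits,
    keyword_hits,
    score_fn=lambda query, doc_id: fake_scores[doc_id],
    top_k=2,
)
print(result)
```

Note that `doc_d` was found only by keyword search yet still reaches the final list, which is the benefit of reranking the merged pool rather than each list separately.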
Reranking + Higher Top K
With reranking, you can safely increase top_k without degrading quality. The reranker ensures only the best results are used:
Without reranking: top_k: 5 (raising it further risks including irrelevant results)
With reranking: top_k: 10 (the reranker filters down to the best results)
Debugging Reranking
Check if reranking is improving results:
- Test queries with reranking enabled and disabled
- Compare source citations in responses
- Look at confidence scores before and after reranking
- Monitor analytics for quality metrics
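When comparing results with reranking enabled and disabled, it can help to quantify how far each document moved. The helper below is a small sketch, not a built-in tool: `rank_changes` is a hypothetical name, and the two lists are assumed to be the ordered source titles you collected from each test run (here, the password-reset example from earlier on this page).

```python
def rank_changes(before: list[str], after: list[str]) -> dict[str, int]:
    """Return positions gained (positive) or lost (negative) by each document."""
    before_pos = {doc: i for i, doc in enumerate(before)}
    return {
        doc: before_pos[doc] - i
        for i, doc in enumerate(after)
        if doc in before_pos  # ignore documents that only appear after reranking
    }


without_reranking = [
    "Account settings overview",
    "Password reset guide",
    "Login troubleshooting",
    "Account security FAQ",
    "User profile docs",
]
with_reranking = [
    "Password reset guide",
    "Account security FAQ",
    "Account settings overview",
    "Login troubleshooting",
    "User profile docs",
]

for title, delta in rank_changes(without_reranking, with_reranking).items():
    print(f"{delta:+d}  {title}")
```

If most deltas are zero across a representative set of queries, reranking is probably not changing much for your knowledge base.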
Technical Details
RAG Chats uses Cohere's rerank-english-v2.0 model for English content. For multilingual content, we automatically use rerank-multilingual-v2.0.
| Setting | Value | Details |
| --- | --- | --- |
| Model | Cohere | rerank-english-v2.0 or rerank-multilingual-v2.0 |
| Max documents | 100 | Maximum documents reranked per query |
| Max tokens | 4096 | Per document limit for reranking |
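To get a rough sense of whether your content fits within these limits, a pre-check along the following lines may be useful. It is a sketch under a strong assumption: tokens are approximated by whitespace-separated words, which will not match the model's real tokenizer, so treat the result as an estimate only. The function names are hypothetical, and this page does not describe how the service itself handles documents that exceed the limits.

```python
MAX_DOCUMENTS = 100  # maximum documents reranked per query (from this page)
MAX_TOKENS = 4096    # per-document limit for reranking (from this page)


def approx_token_count(text: str) -> int:
    # Assumption: one whitespace-separated word is roughly one token.
    # Real tokenizers usually produce more tokens than this.
    return len(text.split())


def check_limits(documents: list[str]) -> list[str]:
    """Return human-readable warnings for anything that may exceed the limits."""
    warnings = []
    if len(documents) > MAX_DOCUMENTS:
        warnings.append(
            f"{len(documents)} documents exceeds the limit of {MAX_DOCUMENTS} per query"
        )
    for index, doc in enumerate(documents):
        tokens = approx_token_count(doc)
        if tokens > MAX_TOKENS:
            warnings.append(
                f"document {index} has about {tokens} tokens (limit {MAX_TOKENS})"
            )
    return warnings


sample = ["short document", "word " * 5000]
for warning in check_limits(sample):
    print(warning)
```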

