Reranking

Reranking significantly improves retrieval quality by using a dedicated model to re-score retrieved documents based on query relevance.

How Reranking Works

Initial retrieval: Vector search finds candidate documents based on embedding similarity
Reranking: A specialized model (Cohere) evaluates each candidate against the actual query
Re-ordering: Documents are sorted by the reranker's scores
Selection: Top results are passed to the LLM

Query: "How do I reset my password?"

Initial Retrieval (Vector):     After Reranking:
1. Account settings overview    1. Password reset guide ⬆
2. Password reset guide         2. Account security FAQ ⬆
3. Login troubleshooting        3. Account settings overview ⬇
4. Account security FAQ         4. Login troubleshooting ⬇
5. User profile docs            5. User profile docs -

Reranking improves the order based on actual relevance.
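
The four steps above can be sketched in a few lines. This is a minimal illustration, not the production pipeline: `overlap_score` is a hypothetical word-overlap stand-in for the reranking model, and the candidate list reuses the document titles from the example.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy stand-in for a reranking model: share of query words present in the document."""
    query_words = set(query.lower().replace("?", "").split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query, sort by score, keep the top_n."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    # sorted() is stable, so ties keep their initial retrieval order.
    scored = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Step 1 (initial retrieval) is assumed to have produced this ordering.
candidates = [
    "Account settings overview",
    "Password reset guide",
    "Login troubleshooting",
    "Account security FAQ",
    "User profile docs",
]

# Steps 2-4: score, re-order, select.
top = rerank("How do I reset my password?", candidates, overlap_score, top_n=3)
for doc, value in top:
    print(f"{value:.2f}  {doc}")
```

A real reranker scores full document text rather than titles, so its ordering will differ from this toy scorer beyond the first result.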

Why Reranking Helps

Vector embeddings capture semantic similarity, but they have limitations:

Vocabulary mismatch: Different words for the same concept may not be close in embedding space
Nuance loss: Embeddings compress meaning, losing subtle distinctions
Query-document asymmetry: Short queries vs. long documents have different characteristics

Reranking uses a cross-encoder that directly compares query and document text, catching relevance that embeddings miss.

Enabling Reranking

Enable reranking in your bot settings:

// API
PATCH /api/agents/{agentId}/
{
  "useReranking": true
}

// Dashboard
Settings → RAG Settings → Enable Reranking
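
As a rough sketch, the same API call could be made from Python with the standard library. The base URL, the bearer-token header, and the agent ID below are placeholders assumed for illustration; only the path and the `useReranking` field come from the request shown above.

```python
import json
import urllib.request

BASE_URL = "https://example.com"  # placeholder: replace with your API host
API_TOKEN = "YOUR_API_TOKEN"      # placeholder: the auth scheme is an assumption


def build_enable_reranking_request(agent_id: str, enabled: bool = True) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that toggles reranking."""
    body = json.dumps({"useReranking": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/agents/{agent_id}/",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )


request = build_enable_reranking_request("agent_123")  # hypothetical agent ID
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```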

Availability

Reranking is available on Pro and Enterprise plans. It uses Cohere's reranking API, which is included in your subscription.

Performance Impact

Latency: +100-200ms. Additional time for the reranking API call
Quality: Significant. Typically 10-30% improvement in retrieval relevance
Cost: Included. No additional cost on Pro/Enterprise plans

Best Practices

When to Use Reranking

Large knowledge bases: The more documents you have, the more you gain from better ranking
Diverse content: Mix of topics, formats, or styles
Complex queries: Multi-part or nuanced questions
High-stakes applications: Customer support, compliance, research

When to Skip Reranking

Small knowledge bases: Fewer than 50 documents may not benefit much
Latency-critical: When every millisecond matters
Simple queries: Single-word lookups, exact matches

Combining with Other Features

Reranking + Hybrid Search

Hybrid search retrieves candidates from both vector and keyword search. Reranking then combines and orders these candidates:

Vector search returns top 20 candidates
Keyword search returns top 20 candidates
Results are merged (deduped)
Reranker scores and orders all candidates
Top K are sent to LLM

This combination catches both semantic and keyword matches, and the reranker then orders them by relevance to the query.
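
The merge-and-rerank flow above might look roughly like this. It is a sketch under simple assumptions: documents are identified by string IDs, and `score` is a hypothetical callable standing in for the reranker.

```python
from typing import Callable, List


def merge_candidates(vector_hits: List[str], keyword_hits: List[str]) -> List[str]:
    """Combine both result lists, dropping duplicates but keeping first-seen order."""
    seen = set()
    merged = []
    for doc_id in vector_hits + keyword_hits:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


def hybrid_rerank(
    vector_hits: List[str],
    keyword_hits: List[str],
    score: Callable[[str], float],
    top_k: int,
) -> List[str]:
    """Merge, score every unique candidate, and return the top_k by score."""
    merged = merge_candidates(vector_hits, keyword_hits)
    return sorted(merged, key=score, reverse=True)[:top_k]


# Illustrative data: "b" appears in both result lists and is only scored once.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d"]
toy_scores = {"a": 0.2, "b": 0.9, "c": 0.1, "d": 0.7}  # made-up reranker scores

merged = merge_candidates(vector_hits, keyword_hits)
selected = hybrid_rerank(vector_hits, keyword_hits, toy_scores.__getitem__, top_k=2)
print(merged, selected)
```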

Reranking + Higher Top K

With reranking, you can safely increase top_k without degrading quality. The reranker ensures only the best results are used:

Without reranking: top_k: 5 (a higher value risks including irrelevant results)
With reranking: top_k: 10 (the reranker filters down to the best results)

Debugging Reranking

Check if reranking is improving results:

Test queries with reranking enabled and disabled
Compare source citations in responses
Look at confidence scores before and after reranking
Monitor analytics for quality metrics
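
One simple way to compare a query with reranking enabled and disabled is to compute how far each document moved between the two result lists. The helper below is a hypothetical sketch, not a built-in feature; it uses the two orderings from the password reset example earlier on this page.

```python
from typing import Dict, List


def rank_changes(before: List[str], after: List[str]) -> Dict[str, int]:
    """Return positions gained per document: positive = moved up, negative = moved down."""
    before_position = {doc: index for index, doc in enumerate(before)}
    return {
        doc: before_position[doc] - index
        for index, doc in enumerate(after)
        if doc in before_position  # skip documents that only appear after reranking
    }


without_reranking = [
    "Account settings overview",
    "Password reset guide",
    "Login troubleshooting",
    "Account security FAQ",
    "User profile docs",
]
with_reranking = [
    "Password reset guide",
    "Account security FAQ",
    "Account settings overview",
    "Login troubleshooting",
    "User profile docs",
]

changes = rank_changes(without_reranking, with_reranking)
for doc, delta in changes.items():
    print(f"{delta:+d}  {doc}")
```

If most test queries show little or no movement, reranking may not be adding much for that knowledge base.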

Technical Details

RAG Chats uses Cohere's rerank-english-v2.0 model for English content. For multilingual content, we automatically use rerank-multilingual-v2.0.

Model: Cohere rerank-english-v2.0 or rerank-multilingual-v2.0
Max documents: 100. Maximum documents reranked per query
Max tokens: 4096. Per document limit for reranking
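
To make the two limits concrete, here is a small sketch of how a candidate list could be trimmed to fit them. This page does not say what happens to input that exceeds the limits, so the truncation shown here is an assumption, and whitespace splitting is only a rough approximation of the model's tokenizer.

```python
from typing import List

MAX_DOCUMENTS = 100  # maximum documents reranked per query
MAX_TOKENS = 4096    # per-document limit


def prepare_for_reranking(
    documents: List[str],
    max_documents: int = MAX_DOCUMENTS,
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Keep the first max_documents candidates and cut each to max_tokens words."""
    prepared = []
    for text in documents[:max_documents]:
        tokens = text.split()  # assumption: whitespace words approximate tokens
        prepared.append(" ".join(tokens[:max_tokens]))
    return prepared


# 150 short candidates plus one oversized document at the front.
oversized = " ".join(["word"] * 5000)
candidates = [oversized] + [f"document {i}" for i in range(150)]

prepared = prepare_for_reranking(candidates)
print(len(prepared), len(prepared[0].split()))
```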