Reranking

Reranking significantly improves retrieval quality by using a dedicated model to re-score retrieved documents based on query relevance.

How Reranking Works

  1. Initial retrieval: Vector search finds candidate documents based on embedding similarity
  2. Reranking: A specialized model (Cohere) evaluates each candidate against the actual query
  3. Re-ordering: Documents are sorted by the reranker's scores
  4. Selection: Top results are passed to the LLM

Query: "How do I reset my password?"

Initial Retrieval (Vector):     After Reranking:
1. Account settings overview    1. Password reset guide ⬆
2. Password reset guide         2. Account security FAQ ⬆
3. Login troubleshooting        3. Account settings overview ⬇
4. Account security FAQ         4. Login troubleshooting ⬇
5. User profile docs            5. User profile docs -

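After reranking, the documents that directly answer the query (the password reset guide and the security FAQ) move to the top, while loosely related pages drop.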
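
The four steps above can be sketched in Python. This is a minimal illustration, not the product's implementation: `overlap_score` is a hypothetical stand-in for the reranking model (the real system calls Cohere), and the candidate list is hard-coded to mimic vector search output.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude proxy for what a real model reads."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a reranker: share of query words found in the document."""
    query_words = tokenize(query)
    if not query_words:
        return 0.0
    return len(query_words & tokenize(document)) / len(query_words)

def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    """Steps 2-4: score each candidate against the query, sort, keep the top results."""
    scored = [(overlap_score(query, doc), doc) for doc in candidates]
    # list.sort is stable, so tied documents keep their initial retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

# Step 1 (assumed): candidates as returned by vector search, in similarity order.
candidates = [
    "Account settings overview",
    "Password reset guide: how to reset your password",
    "Login troubleshooting",
]
top = rerank("How do I reset my password?", candidates, top_n=2)
print(top)
```

A real cross-encoder scores meaning rather than word overlap, but the control flow (score every candidate against the query, sort, truncate) has the same shape.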

Why Reranking Helps

Vector embeddings capture semantic similarity, but they have limitations:

  • Vocabulary mismatch: Different words for the same concept may not be close in embedding space
  • Nuance loss: Embeddings compress meaning, losing subtle distinctions
  • Query-document asymmetry: Short queries vs. long documents have different characteristics

Reranking uses a cross-encoder that directly compares query and document text, catching relevance that embeddings miss.

Enabling Reranking

Enable reranking in your bot settings:

// API
PATCH /api/agents/{agentId}/
{
  "useReranking": true
}

// Dashboard
Settings → RAG Settings → Enable Reranking
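
As a rough sketch, the API call above could be built from Python with the standard library. Only the path and the `useReranking` field come from this page; the base URL, the bearer-token header, and the helper name `build_enable_reranking_request` are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace with your actual API base URL.
BASE_URL = "https://example.invalid"

def build_enable_reranking_request(
    agent_id: str, api_key: str, enabled: bool = True
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that toggles reranking."""
    body = json.dumps({"useReranking": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/agents/{agent_id}/",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check your account's auth scheme.
            "Authorization": f"Bearer {api_key}",
        },
    )

request = build_enable_reranking_request("agent_123", "YOUR_API_KEY")
# To send it: urllib.request.urlopen(request)
```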

Availability

Reranking is available on Pro and Enterprise plans. It uses Cohere's reranking API, which is included in your subscription.

Performance Impact

  • Latency: +100-200ms (additional time for the reranking API call)
  • Quality: Significant (typically 10-30% improvement in retrieval relevance)
  • Cost: Included (no additional cost on Pro/Enterprise plans)

Best Practices

When to Use Reranking

  • Large knowledge bases: More documents benefit more from better ranking
  • Diverse content: Mix of topics, formats, or styles
  • Complex queries: Multi-part or nuanced questions
  • High-stakes applications: Customer support, compliance, research

When to Skip Reranking

  • Small knowledge bases: Fewer than 50 documents may not benefit much
  • Latency-critical applications: When the extra 100-200ms per query is unacceptable
  • Simple queries: Single-word lookups, exact matches

Combining with Other Features

Reranking + Hybrid Search

Hybrid search retrieves candidates from both vector and keyword search. Reranking then combines and orders these candidates:

  1. Vector search returns top 20 candidates
  2. Keyword search returns top 20 candidates
  3. Results are merged and deduplicated
  4. Reranker scores and orders all candidates
  5. Top K results are sent to the LLM

This combination catches both semantic and keyword matches, then orders them by relevance to the query.
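
The merge, dedupe, and rerank steps above can be sketched as follows. Everything here is illustrative: documents are represented by hypothetical IDs, `score` is any callable standing in for the reranker, and the scores are made up.

```python
from collections.abc import Callable

def merge_candidates(vector_hits: list[str], keyword_hits: list[str]) -> list[str]:
    """Step 3: merge both result lists, dropping duplicates in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for doc_id in vector_hits + keyword_hits:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged

def hybrid_rerank(
    vector_hits: list[str],
    keyword_hits: list[str],
    score: Callable[[str], float],
    top_k: int,
) -> list[str]:
    """Steps 3-5: merge, score every candidate, keep the top K."""
    merged = merge_candidates(vector_hits, keyword_hits)
    return sorted(merged, key=score, reverse=True)[:top_k]

# Hypothetical reranker scores keyed by document ID.
scores = {"doc_a": 0.42, "doc_b": 0.91, "doc_c": 0.15, "doc_d": 0.77}
result = hybrid_rerank(
    vector_hits=["doc_a", "doc_b", "doc_c"],
    keyword_hits=["doc_b", "doc_d"],
    score=lambda doc_id: scores[doc_id],
    top_k=2,
)
print(result)
```

Note that `doc_d` was found only by keyword search yet ends up in the final results, which is the benefit the combination is meant to provide.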

Reranking + Higher Top K

With reranking, you can safely increase top_k without degrading quality. The reranker ensures only the best results are used:

Without reranking: top_k: 5 (a higher value risks including irrelevant results)
With reranking: top_k: 10 (the reranker filters to the best results)

Debugging Reranking

Check if reranking is improving results:

  1. Test queries with reranking enabled and disabled
  2. Compare source citations in responses
  3. Look at confidence scores before and after reranking
  4. Monitor analytics for quality metrics
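
One way to make the first comparison concrete is to record the source order with reranking disabled and enabled, then compute how far each document moved. The helper name `rank_changes` is illustrative, and the titles reuse the password-reset example from earlier on this page.

```python
def rank_changes(before: list[str], after: list[str]) -> dict[str, int]:
    """Map each document to its position change; positive means it moved up."""
    before_pos = {doc: i for i, doc in enumerate(before)}
    return {
        doc: before_pos[doc] - i
        for i, doc in enumerate(after)
        if doc in before_pos
    }

before = [
    "Account settings overview",
    "Password reset guide",
    "Login troubleshooting",
    "Account security FAQ",
    "User profile docs",
]
after = [
    "Password reset guide",
    "Account security FAQ",
    "Account settings overview",
    "Login troubleshooting",
    "User profile docs",
]
changes = rank_changes(before, after)
for doc, delta in changes.items():
    print(f"{delta:+d}  {doc}")
```

If most documents show a change of zero across a representative set of test queries, reranking is likely adding latency without changing what the LLM sees.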

Technical Details

RAG Chats uses Cohere's rerank-english-v2.0 model for English content. For multilingual content, we automatically use rerank-multilingual-v2.0.

  • Model: Cohere rerank-english-v2.0 or rerank-multilingual-v2.0
  • Max documents: 100 (maximum documents reranked per query)
  • Max tokens: 4096 (per-document limit for reranking)
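
If you are preparing candidates yourself, a rough pre-check against these limits might look like the sketch below. It rests on two assumptions: tokens are approximated by whitespace-separated words (the model's real tokenizer will count differently), and oversize input is simply trimmed. This page does not state how the service itself handles input beyond the limits.

```python
MAX_DOCUMENTS = 100  # maximum documents reranked per query (limit listed above)
MAX_TOKENS = 4096    # per-document limit (limit listed above)

def fit_to_limits(documents: list[str]) -> list[str]:
    """Trim a candidate list to the documented limits.

    Assumption: one whitespace-separated word counts as one token.
    """
    trimmed = []
    for doc in documents[:MAX_DOCUMENTS]:
        words = doc.split()
        trimmed.append(" ".join(words[:MAX_TOKENS]))
    return trimmed

# Example: one very long document followed by many short ones.
docs = ["word " * 5000] + ["short"] * 150
fitted = fit_to_limits(docs)
print(len(fitted), len(fitted[0].split()))
```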