Advanced Topics
Take your RAG implementation to the next level with advanced retrieval techniques, performance optimization, and comprehensive monitoring.
Topics
Reranking
Improve retrieval quality with Cohere reranking
Hybrid Search
Combine vector and keyword search for better results
Analytics & Monitoring
Track performance, usage, and optimize over time
RAG Pipeline Overview
Understanding the full RAG pipeline helps you optimize each stage:
      User Query
           │
           ▼
┌─────────────────────┐
│  Query Processing   │ ← Multi-query, HyDE
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│    Vector Search    │ ← Embeddings, similarity
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Keyword Search    │ ← BM25 (if hybrid enabled)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Merge & Rerank    │ ← Cohere reranking
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Context Assembly   │ ← Top K chunks
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   LLM Generation    │ ← GPT-4, Claude, etc.
└──────────┬──────────┘
           │
           ▼
  Response + Sources
Optimization Strategies
Improve Relevance
- Enable reranking: Single biggest impact on quality
- Use hybrid search: Catches keyword-based queries
- Tune similarity threshold: Balance recall vs precision
- Increase top_k: More candidates for reranker
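To make the interaction between hybrid search and top_k concrete, here is a minimal sketch of merging a vector ranking and a keyword ranking into one candidate list for the reranker. It uses reciprocal rank fusion, a common merging technique chosen here for illustration; this page does not say which merge method the platform actually uses. The function names, the chunk IDs, and the constant k=60 are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def candidates_for_reranker(vector_hits, keyword_hits, top_k):
    """Return the top_k fused candidates to hand to a reranker."""
    fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
    return fused[:top_k]


# Hypothetical results, best match first in each list.
vector_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-d", "doc-b", "doc-a"]

candidates = candidates_for_reranker(vector_hits, keyword_hits, top_k=3)
print(candidates)
```

Chunks found by both searches rise to the top, while a chunk found only by keyword search ("doc-d") can still reach the reranker. A larger top_k widens that candidate pool at the cost of more work per query.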
Improve Speed
- Lower top_k: Fewer chunks to process
- Disable HyDE: Removes extra LLM call
- Use faster models: GPT-3.5 or Claude Haiku
- Reduce max_tokens: Shorter responses
Reduce Costs
- Lower top_k: Less context = fewer input tokens
- Use cheaper models: GPT-3.5 for simple queries
- Cache common queries: Response caching is an Enterprise feature
- Limit response length: Lower max_tokens
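The link between top_k and input cost can be estimated with simple arithmetic. The sketch below assumes that input tokens are roughly a fixed prompt overhead plus top_k chunks of average size. The chunk size, overhead, and per-token price are made-up figures for illustration, not values from any provider's price list.

```python
def estimate_input_tokens(top_k, avg_chunk_tokens, prompt_overhead_tokens):
    """Approximate input tokens per request: fixed prompt plus retrieved context."""
    return prompt_overhead_tokens + top_k * avg_chunk_tokens


def estimate_input_cost(top_k, avg_chunk_tokens, prompt_overhead_tokens,
                        price_per_1k_tokens):
    """Approximate input cost per request in the same currency as the price."""
    tokens = estimate_input_tokens(top_k, avg_chunk_tokens, prompt_overhead_tokens)
    return tokens / 1000 * price_per_1k_tokens


# Hypothetical numbers: 300-token chunks, 200-token system prompt and question,
# and a price of 0.001 per 1,000 input tokens.
for top_k in (10, 5):
    tokens = estimate_input_tokens(top_k, 300, 200)
    cost = estimate_input_cost(top_k, 300, 200, 0.001)
    print(f"top_k={top_k}: ~{tokens} input tokens, ~{cost:.4f} per request")
```

Under these assumptions, halving top_k from 10 to 5 cuts input tokens from about 3,200 to about 1,700 per request. Output tokens, which max_tokens limits, are billed separately and are not modeled here.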
Quality Evaluation
Systematically evaluate and improve your bot's quality:
- Build test set: Collect real user questions
- Define ground truth: Expected answers or sources
- Measure metrics: Relevance, accuracy, latency
- Iterate: Adjust settings, re-evaluate
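The evaluation loop above can be sketched as a small harness. This is an illustration under stated assumptions, not a built-in tool: `evaluate`, `fake_bot`, the test-set layout, and the file names are all hypothetical, and `fake_bot` stands in for whatever call you use to query your bot. The metric shown is a simple source hit rate (did the bot cite at least one expected source?) plus mean latency.

```python
import time


def evaluate(test_set, answer_fn):
    """Run every test question through answer_fn and summarize the results.

    answer_fn(question) must return a dict with a "sources" list.
    """
    hits = []
    latencies = []
    for case in test_set:
        start = time.perf_counter()
        result = answer_fn(case["question"])
        latencies.append(time.perf_counter() - start)
        # A hit means at least one cited source matches the ground truth.
        cited = set(result["sources"])
        hits.append(bool(cited & set(case["expected_sources"])))
    return {
        "source_hit_rate": sum(hits) / len(hits),
        "mean_latency_s": sum(latencies) / len(latencies),
    }


def fake_bot(question):
    """Stand-in for a real bot call (hypothetical canned answers)."""
    canned = {
        "How do I reset my password?": {"answer": "Use the reset link.",
                                        "sources": ["account.md"]},
        "What plans are available?": {"answer": "See our plans.",
                                      "sources": ["faq.md"]},
    }
    return canned[question]


test_set = [
    {"question": "How do I reset my password?", "expected_sources": ["account.md"]},
    {"question": "What plans are available?", "expected_sources": ["pricing.md"]},
]

report = evaluate(test_set, fake_bot)
print(report)
```

Re-running the same test set after each settings change gives a like-for-like comparison, which is what makes the final "iterate" step meaningful.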
Common Challenges
Bot says "I don't know" too often
- Lower similarity threshold
- Enable hybrid search
- Check content exists in knowledge base
- Enable allow_general_knowledge
Responses are slow
- Reduce top_k
- Disable HyDE and multi-query
- Use a faster model
- Check data source size: Larger sources can take longer to search
Wrong sources cited
- Enable reranking
- Raise similarity threshold
- Review document quality
- Update system prompt
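Two of the fixes above pull the similarity threshold in opposite directions: lowering it helps when the bot says "I don't know" too often, and raising it helps when wrong sources are cited. The sketch below shows that trade-off on a handful of hand-labeled chunks. The scores and relevance labels are invented for illustration, and the function name is an assumption.

```python
def precision_recall_at_threshold(scored_chunks, threshold):
    """Compute precision and recall for chunks kept at a similarity threshold.

    scored_chunks is a list of (similarity, is_relevant) pairs.
    """
    kept = [relevant for score, relevant in scored_chunks if score >= threshold]
    total_relevant = sum(1 for _, relevant in scored_chunks if relevant)
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical retrieval results for one query, labeled by hand.
scored = [(0.91, True), (0.84, True), (0.78, False), (0.72, True), (0.65, False)]

for threshold in (0.8, 0.7, 0.6):
    precision, recall = precision_recall_at_threshold(scored, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this sample, a threshold of 0.8 keeps only relevant chunks but misses one of them, while 0.7 finds all relevant chunks and lets one irrelevant chunk through. Running the same sweep on your own labeled queries is one way to pick a starting value.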
Enterprise Features
Additional capabilities for Enterprise plans:
- Response caching: Cache common queries
- Custom embeddings: Fine-tuned for your domain
- A/B testing: Compare configurations
- Advanced analytics: Detailed query insights
- SLA guarantees: Uptime and latency SLAs

