Hybrid Search

Hybrid search combines semantic vector search with traditional keyword (BM25) search, capturing both conceptual meaning and exact matches.

Vector vs Keyword Search

Vector Search (Default)

Converts text to numerical vectors (embeddings) and finds similar vectors. Great for:

  • Semantic understanding ("happy" ≈ "joyful")
  • Paraphrased queries
  • Conceptual similarity
  • Natural language questions

Limitations:

  • May miss exact keyword matches
  • Struggles with technical terms, codes, names
  • Less precise for specific phrases
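
To make "finds similar vectors" concrete, here is a minimal sketch of ranking by cosine similarity. The three-dimensional vectors are invented toy values for illustration only. Real embeddings come from an embedding model and have hundreds of dimensions, and the product's actual similarity metric is not stated here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", made up for this example (not produced by any model).
toy_embeddings = {
    "happy": [0.9, 0.1, 0.0],
    "joyful": [0.8, 0.2, 0.1],
    "SKU-12345": [0.0, 0.1, 0.9],
}

# Rank every entry by similarity to the query vector for "happy".
query = toy_embeddings["happy"]
ranked = sorted(
    toy_embeddings,
    key=lambda word: cosine_similarity(query, toy_embeddings[word]),
    reverse=True,
)
```

With these toy values, "joyful" lands right behind "happy", while the product code sits far away from both.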

Keyword Search (BM25)

Traditional text matching based on term frequency. Great for:

  • Exact term matching
  • Product codes, IDs, names
  • Technical terminology
  • Acronyms and abbreviations

Limitations:

  • No semantic understanding
  • Misses synonyms and paraphrases
  • Requires exact (or partial) word matches
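
As a rough illustration of term-frequency scoring, the sketch below implements the standard Okapi BM25 formula over whitespace-split tokens. The tokenizer and the parameter values `k1=1.5` and `b=0.75` are common defaults chosen for this example, not settings confirmed for this product.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # no exact token match, no contribution
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = term_freq[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

docs = ["Product SKU-12345 specs", "General return policy", "Refund FAQ"]
scores = bm25_scores("SKU-12345 return policy", docs)
```

"Refund FAQ" scores zero because it shares no token with the query, even though it is semantically related. That is the synonym gap listed above.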

How Hybrid Search Works

Query: "SKU-12345 return policy"

Vector Search Results:           Keyword Search Results:
1. General Return Policy        1. Product SKU-12345 specs ⭐
2. Refund FAQ                   2. SKU-12345 shipping info
3. Exchange Guidelines          3. Return Policy ⭐
4. Customer Service Guide       4. Order with SKU-12345

Hybrid Merge (with reranking):
1. Product SKU-12345 specs      (keyword match)
2. General Return Policy        (semantic match)
3. Return Policy                (both matches)
4. SKU-12345 shipping info      (keyword match)
5. Refund FAQ                   (semantic match)

Result: Captures both the specific product AND return policy info.

Enabling Hybrid Search

// API
PATCH /api/agents/{agentId}/
{
  "useHybridSearch": true
}

// Dashboard
Settings → RAG Settings → Enable Hybrid Search
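
The same API call could be made from a script. The sketch below builds the PATCH request with Python's standard library without sending it. The base URL, agent ID, and bearer-token header are placeholders assumed for illustration; only the path and the `useHybridSearch` field come from the example above.

```python
import json
import urllib.request

def build_enable_hybrid_request(base_url, agent_id, api_token):
    """Build (but do not send) the PATCH request that enables hybrid search."""
    body = json.dumps({"useHybridSearch": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/agents/{agent_id}/",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth. Use whatever scheme your account requires.
            "Authorization": f"Bearer {api_token}",
        },
    )

# Placeholder values, not real endpoints or credentials.
request = build_enable_hybrid_request(
    "https://api.example.com", "agent_123", "YOUR_API_TOKEN"
)
# To actually send it: urllib.request.urlopen(request)
```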

When to Use Hybrid Search

Recommended For

  • Technical documentation: API endpoints, error codes, config options
  • Product catalogs: SKUs, model numbers, product names
  • Support tickets: Ticket IDs, customer references
  • Mixed queries: "What is error E-4503?"
  • Compliance/legal: Specific policy names, section numbers

May Not Be Needed For

  • Conversational bots: Natural language questions
  • General knowledge bases: Prose content without codes
  • Creative content: Marketing copy, blog posts

Performance Impact

Metric    Impact     Notes
Latency   +30-50ms   Additional time for keyword search
Quality   Varies     Significant for keyword-heavy domains
Storage   Minimal    BM25 index adds ~10% to storage

Combine with Reranking

Hybrid search is most effective when combined with reranking: the reranker merges and reorders the results from both search methods.

Result Fusion

Hybrid search uses Reciprocal Rank Fusion (RRF) to combine results:

  • Each result gets a score based on its rank in each list
  • Scores are combined across lists
  • Documents appearing in both lists get boosted
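  • Final ranking balances both search methods

RRF Score = Σ 1/(k + rank)

where rank is the document's position in each result list, the sum runs over the lists, and k = 60 (a constant that keeps top-ranked results from dominating the score)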

Example:
- Doc A: Vector rank 1, Keyword rank 10
  Score = 1/(60+1) + 1/(60+10) = 0.0164 + 0.0143 = 0.0307

- Doc B: Vector rank 3, Keyword rank 2
  Score = 1/(60+3) + 1/(60+2) = 0.0159 + 0.0161 = 0.0320

Doc B ranks higher: it places well in both lists, while Doc A is strong in only one.
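
The fusion step can be sketched in a few lines. This is a minimal illustration of the RRF formula using the Doc A and Doc B ranks from the example, not the service's actual implementation. The function name and the dict-based input format are choices made for this sketch.

```python
def rrf_scores(rankings, k=60):
    """Combine ranked lists with Reciprocal Rank Fusion.

    rankings: list of dicts, each mapping a document ID to its
    1-based rank in one result list.
    """
    scores = {}
    for ranking in rankings:
        for doc_id, rank in ranking.items():
            # A document missing from a list simply gets no contribution from it.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

vector_ranks = {"Doc A": 1, "Doc B": 3}
keyword_ranks = {"Doc A": 10, "Doc B": 2}

scores = rrf_scores([vector_ranks, keyword_ranks])
fused = sorted(scores, key=scores.get, reverse=True)
```

Here `fused` puts Doc B ahead of Doc A, matching the worked example.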

Tuning Hybrid Search

Adjusting Weights

Enterprise plans can adjust the balance between vector and keyword search. The example below weights vector results at 70% and keyword results at 30%:

{
  "hybridSearchConfig": {
    "vectorWeight": 0.7,
    "keywordWeight": 0.3
  }
}
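
How the weights enter the fusion is not specified here. One plausible reading, shown purely as an assumption, is that each list's reciprocal-rank contribution is multiplied by its weight. The sketch below applies that idea to the Doc A and Doc B ranks from the Result Fusion example; `weighted_rrf` is a hypothetical helper, not part of the API.

```python
def weighted_rrf(vector_ranks, keyword_ranks,
                 vector_weight=0.7, keyword_weight=0.3, k=60):
    """Weighted RRF sketch: scale each list's 1/(k + rank) by its weight.

    Assumption: this is one common way to weight RRF, not the documented
    behavior of hybridSearchConfig.
    """
    scores = {}
    weighted_lists = (
        (vector_weight, vector_ranks),
        (keyword_weight, keyword_ranks),
    )
    for weight, ranks in weighted_lists:
        for doc_id, rank in ranks.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_ranks = {"Doc A": 1, "Doc B": 3}
keyword_ranks = {"Doc A": 10, "Doc B": 2}

default_order = weighted_rrf(vector_ranks, keyword_ranks, 0.7, 0.3)
semantic_heavy_order = weighted_rrf(vector_ranks, keyword_ranks, 0.8, 0.2)
```

Under this assumption, the 0.7 / 0.3 split still ranks Doc B first, while 0.8 / 0.2 moves Doc A (the top vector hit) to the front. Small weight changes can reorder closely scored results.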

Best Configurations

Use case          Vector / Keyword   Notes
General           0.7 / 0.3          Default balance, good for most cases
Technical docs    0.5 / 0.5          Equal weight for semantic and keyword
Product catalog   0.4 / 0.6          Favor keyword matches for SKUs, IDs
Conversational    0.8 / 0.2          Favor semantic for natural queries

Debugging Hybrid Search

  1. Test a query that includes specific terms/codes
  2. Compare results with hybrid enabled vs disabled
  3. Check if keyword-specific content is being found
  4. Review source citations for expected matches
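
Step 2 is easier to read as a diff. The helper below is a small sketch that compares two result lists you have collected yourself, one with hybrid search disabled and one with it enabled. It does not call the API, and `compare_results` is a name made up for this example. The sample titles are taken from the "SKU-12345 return policy" walkthrough.

```python
def compare_results(vector_only, hybrid):
    """Report how enabling hybrid search changed a query's result list."""
    vector_set, hybrid_set = set(vector_only), set(hybrid)
    return {
        # Results that only show up once keyword search contributes.
        "added_by_hybrid": [doc for doc in hybrid if doc not in vector_set],
        # Results pushed out of the list by the merge.
        "dropped_by_hybrid": [doc for doc in vector_only if doc not in hybrid_set],
        "kept": [doc for doc in hybrid if doc in vector_set],
    }

vector_only = [
    "General Return Policy",
    "Refund FAQ",
    "Exchange Guidelines",
    "Customer Service Guide",
]
hybrid = [
    "Product SKU-12345 specs",
    "General Return Policy",
    "Return Policy",
    "SKU-12345 shipping info",
    "Refund FAQ",
]

diff = compare_results(vector_only, hybrid)
```

If `added_by_hybrid` stays empty for queries that contain codes or IDs, keyword search is probably not contributing to those results.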

API Response

Source citations include the search method that found each result, helping you debug which method contributed each source.
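
Once you have the citations, grouping them by method gives a quick view of what each search contributed. The response schema is not shown here, so the field names `searchMethod` and `title` and the values `"vector"` and `"keyword"` below are hypothetical. Substitute the names from your actual API response.

```python
from collections import defaultdict

def group_by_method(citations, method_field="searchMethod"):
    """Group citation titles by the search method that found them.

    Assumption: each citation is a dict, and method_field names the key
    holding the search method. Both are placeholders for this sketch.
    """
    groups = defaultdict(list)
    for citation in citations:
        method = citation.get(method_field, "unknown")
        groups[method].append(citation.get("title"))
    return dict(groups)

# Hypothetical citation payload, shaped for illustration only.
citations = [
    {"title": "Product SKU-12345 specs", "searchMethod": "keyword"},
    {"title": "General Return Policy", "searchMethod": "vector"},
    {"title": "SKU-12345 shipping info", "searchMethod": "keyword"},
]

groups = group_by_method(citations)
```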