AI Models

Choose the AI model that powers your bot. Each model has different capabilities, speeds, and costs.

Available Models

OpenAI

GPT-4o (Recommended)

Best overall performance. Fast, accurate, and excellent at following instructions. Best for most production use cases.

GPT-4 Turbo (High-end)

128K context window. Great for very long documents or complex reasoning. More expensive than GPT-4o.

GPT-3.5 Turbo (Budget)

Faster and cheaper than GPT-4. Good for simple Q&A where cost is a concern.

Anthropic

Claude 3 Opus (Premium)

Most capable Claude model. Excellent reasoning, nuanced responses. Best for complex tasks.

Claude 3 Sonnet (Balanced)

Great balance of performance and cost. Recommended for production workloads.

Claude 3 Haiku (Fast)

Fastest Claude model. Best for high-volume, simple queries where speed matters.

Google

Gemini Pro (General)

Google's multimodal model. Good general-purpose performance with competitive pricing.

Model Comparison

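| Model           | Speed | Quality | Cost | Context |
|-----------------|-------|---------|------|---------|
| GPT-4o          | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | $$   | 128K    |
| GPT-4 Turbo     | ⚡⚡ | ⭐⭐⭐⭐⭐ | $$$  | 128K    |
| GPT-3.5 Turbo   | ⚡⚡⚡⚡ | ⭐⭐⭐ | $    | 16K     |
| Claude 3 Opus   | ⚡⚡ | ⭐⭐⭐⭐⭐ | $$$$ | 200K    |
| Claude 3 Sonnet | ⚡⚡⚡ | ⭐⭐⭐⭐ | $$   | 200K    |
| Claude 3 Haiku  | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | $    | 200K    |
| Gemini Pro      | ⚡⚡⚡ | ⭐⭐⭐⭐ | $$   | 32K     |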
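
The comparison above can also be encoded as data, which makes it easy to shortlist models against a set of requirements. The following is a minimal sketch, not part of any platform API: `ModelInfo`, `MODELS`, and `pick_model` are hypothetical names, and the numeric ratings simply count the symbols shown for each model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    speed: int      # number of lightning symbols in the comparison
    quality: int    # number of stars in the comparison
    cost: int       # number of dollar signs in the comparison
    context_k: int  # context window, in thousands of tokens


# Ratings transcribed from the Model Comparison section.
MODELS = [
    ModelInfo("GPT-4o", 3, 5, 2, 128),
    ModelInfo("GPT-4 Turbo", 2, 5, 3, 128),
    ModelInfo("GPT-3.5 Turbo", 4, 3, 1, 16),
    ModelInfo("Claude 3 Opus", 2, 5, 4, 200),
    ModelInfo("Claude 3 Sonnet", 3, 4, 2, 200),
    ModelInfo("Claude 3 Haiku", 5, 3, 1, 200),
    ModelInfo("Gemini Pro", 3, 4, 2, 32),
]


def pick_model(
    min_quality: int = 1, min_context_k: int = 0, max_cost: int = 4
) -> Optional[ModelInfo]:
    """Return the cheapest model meeting the constraints.

    Ties on cost go to the faster model. Returns None if nothing qualifies.
    """
    candidates = [
        m
        for m in MODELS
        if m.quality >= min_quality
        and m.context_k >= min_context_k
        and m.cost <= max_cost
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: (m.cost, -m.speed))
```

For example, `pick_model(min_quality=5)` shortlists the five-star models and returns the cheapest of them. The tie-break on speed is a design choice of this sketch; a real selection would also weigh how each model handles your own prompts.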

When to Use Each Model

GPT-4o (Recommended)

  • General production use
  • Customer support bots
  • Documentation assistants
  • Best balance of speed, quality, and cost

GPT-3.5 Turbo

  • High-volume, simple Q&A
  • Cost-sensitive applications
  • Internal tools where speed matters more than nuance

Claude 3 Models

  • Complex reasoning tasks
  • Very long documents (200K context)
  • When you need more nuanced responses
  • Opus for highest quality, Haiku for speed

Model Parameters

Temperature

temperature (number, default: 0.7)

Controls randomness in responses. Range: 0 to 2.

  • 0.0 - 0.3: Focused, near-deterministic, factual responses
  • 0.4 - 0.7: Balanced (recommended)
  • 0.8 - 1.0: More creative, varied responses
  • Above 1.0: Very creative, may be less coherent

For RAG Applications

Lower temperatures (0.3-0.5) often work better for RAG bots where accuracy matters more than creativity.
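
As a rough illustration of the temperature guidance, the sketch below validates a value against the documented 0 to 2 range and maps it to one of the bands listed earlier. The helper names `validate_temperature` and `describe_temperature` are hypothetical, and the settings dictionary is only an assumed shape that reuses the documented `temperature` and `maxTokens` parameter names.

```python
def validate_temperature(value: float) -> float:
    """Reject values outside the documented 0-2 range."""
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature must be between 0 and 2, got {value}")
    return value


def describe_temperature(value: float) -> str:
    """Map a temperature to the bands described in this section.

    Values that fall between two listed bands (for example 0.35) are
    assigned to the higher band; that tie-break is an assumption of
    this sketch, not documented behavior.
    """
    validate_temperature(value)
    if value <= 0.3:
        return "factual"
    if value <= 0.7:
        return "balanced"
    if value <= 1.0:
        return "creative"
    return "very creative"


# Hypothetical settings for a RAG bot, using a value from the 0.3-0.5 range.
rag_settings = {"temperature": validate_temperature(0.4), "maxTokens": 1000}
```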

Max Tokens

maxTokens (integer, default: 1000)

Maximum length of the generated response, in tokens.

  • 500-1000: Concise responses (recommended)
  • 1000-2000: Detailed explanations
  • 2000+: Long-form content, documentation
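
When choosing a `maxTokens` value, it can help to see how much of the context window remains for the prompt. The sketch below assumes the common convention that the prompt and the response share one context window, and it treats "K" in the comparison as 1,000 tokens; both are assumptions, and `remaining_input_budget` is a hypothetical helper.

```python
# Context windows transcribed from the Model Comparison section,
# treating "K" as 1,000 tokens (an assumption of this sketch).
CONTEXT_WINDOW_TOKENS = {
    "GPT-4o": 128_000,
    "GPT-4 Turbo": 128_000,
    "GPT-3.5 Turbo": 16_000,
    "Claude 3 Opus": 200_000,
    "Claude 3 Sonnet": 200_000,
    "Claude 3 Haiku": 200_000,
    "Gemini Pro": 32_000,
}


def remaining_input_budget(model: str, max_tokens: int) -> int:
    """Tokens left for the prompt after reserving max_tokens for the reply.

    The prompt here means everything sent to the model: system prompt,
    retrieved context, and the user's question.
    """
    window = CONTEXT_WINDOW_TOKENS[model]
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if max_tokens >= window:
        raise ValueError("max_tokens leaves no room for the prompt")
    return window - max_tokens
```

With the default of 1000, `remaining_input_budget("GPT-3.5 Turbo", 1000)` leaves 15,000 tokens for input, which shows why long-form settings matter most on small-context models.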

Changing Models

  1. Go to your bot's Settings → Model
  2. Select a new model from the dropdown
  3. Adjust temperature and max tokens if needed
  4. Test with sample queries before deploying

Model Switching

Different models may respond differently to the same prompt. When switching models, test thoroughly, paying particular attention to how the new model handles your system prompt.
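
One way to structure that testing is to run the same sample queries against the old and new models and flag the queries whose answers differ. This is a minimal sketch: `ask` stands in for whatever function you use to query a bot with a given model, and it is a hypothetical callable, not a real API.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (model_name, query) -> response text.
AskFn = Callable[[str, str], str]


def compare_models(
    ask: AskFn, models: List[str], queries: List[str]
) -> Dict[str, Dict[str, str]]:
    """Collect each model's answer to each query, keyed by query."""
    return {query: {model: ask(model, query) for model in models} for query in queries}


def differing_queries(results: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the queries for which the models did not all answer identically."""
    return [
        query
        for query, answers in results.items()
        if len(set(answers.values())) > 1
    ]
```

Exact string comparison is only a first pass. Model outputs usually differ in wording, so treat the flagged queries as a list to review by hand rather than as failures.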

Cost Optimization

  • Start with GPT-4o: Best value for most use cases
  • Consider GPT-3.5: For simple queries or cost-sensitive applications
  • Optimize context: A lower top_k (fewer retrieved chunks) reduces input tokens
  • Limit max tokens: Keep responses concise
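
To see how these settings affect token usage, the arithmetic can be sketched as follows. This assumes `top_k` is the number of retrieved chunks added to the prompt; the function names and the example token counts are illustrative assumptions, not measured values.

```python
def estimate_input_tokens(
    top_k: int,
    avg_chunk_tokens: int,
    system_prompt_tokens: int,
    question_tokens: int,
) -> int:
    """Rough input size: fixed prompt parts plus top_k retrieved chunks."""
    return system_prompt_tokens + question_tokens + top_k * avg_chunk_tokens


def estimate_max_total_tokens(input_tokens: int, max_tokens: int) -> int:
    """Upper bound on tokens per request: input plus the response cap."""
    return input_tokens + max_tokens


# Illustrative numbers only: 300-token chunks, a 200-token system prompt,
# and a 50-token question.
with_top_k_5 = estimate_input_tokens(5, 300, 200, 50)
with_top_k_3 = estimate_input_tokens(3, 300, 200, 50)
saved_per_request = with_top_k_5 - with_top_k_3
```

Under these assumed numbers, dropping `top_k` from 5 to 3 removes two chunks, or 600 input tokens, from every request.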