AI Models
Choose the AI model that powers your bot. Each model has different capabilities, speeds, and costs.
Available Models
OpenAI
GPT-4o (Recommended): Best overall performance. Fast, accurate, and excellent at following instructions. Best for most production use cases.
GPT-4 Turbo (High-end): 128K context window. Great for very long documents or complex reasoning. More expensive than GPT-4o.
GPT-3.5 Turbo (Budget): Faster and cheaper than GPT-4. Good for simple Q&A where cost is a concern.
Anthropic
Claude 3 Opus (Premium): Most capable Claude model. Excellent reasoning, nuanced responses. Best for complex tasks.
Claude 3 Sonnet (Balanced): Great balance of performance and cost. Recommended for production workloads.
Claude 3 Haiku (Fast): Fastest Claude model. Best for high-volume, simple queries where speed matters.
Google
Gemini Pro (General): Google's multimodal model. Good general-purpose performance with competitive pricing.
Model Comparison
| Model | Speed | Quality | Cost | Context |
|---|---|---|---|---|
| GPT-4o | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | $$ | 128K |
| GPT-4 Turbo | ⚡⚡ | ⭐⭐⭐⭐⭐ | $$$ | 128K |
| GPT-3.5 Turbo | ⚡⚡⚡⚡ | ⭐⭐⭐ | $ | 16K |
| Claude 3 Opus | ⚡⚡ | ⭐⭐⭐⭐⭐ | $$$$ | 200K |
| Claude 3 Sonnet | ⚡⚡⚡ | ⭐⭐⭐⭐ | $$ | 200K |
| Claude 3 Haiku | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | $ | 200K |
| Gemini Pro | ⚡⚡⚡ | ⭐⭐⭐⭐ | $$ | 32K |
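The comparison table can also be treated as data when shortlisting a model. The sketch below transcribes the ratings from the table (counting the ⚡, ⭐, and $ symbols) and filters by a few constraints. The `ModelInfo` class, the `shortlist` function, and the tie-breaking order are illustrative assumptions, not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    speed: int      # number of ⚡ in the table
    quality: int    # number of ⭐ in the table
    cost: int       # number of $ in the table
    context_k: int  # context window, in thousands of tokens


# Values transcribed from the comparison table.
MODELS = [
    ModelInfo("GPT-4o", 3, 5, 2, 128),
    ModelInfo("GPT-4 Turbo", 2, 5, 3, 128),
    ModelInfo("GPT-3.5 Turbo", 4, 3, 1, 16),
    ModelInfo("Claude 3 Opus", 2, 5, 4, 200),
    ModelInfo("Claude 3 Sonnet", 3, 4, 2, 200),
    ModelInfo("Claude 3 Haiku", 5, 3, 1, 200),
    ModelInfo("Gemini Pro", 3, 4, 2, 32),
]


def shortlist(min_quality: int = 1, max_cost: int = 4, min_context_k: int = 0) -> list[str]:
    """Return matching model names: highest quality first, then cheapest, then fastest."""
    matches = [
        m for m in MODELS
        if m.quality >= min_quality and m.cost <= max_cost and m.context_k >= min_context_k
    ]
    # Hypothetical ranking rule; adjust the sort key to your own priorities.
    matches.sort(key=lambda m: (-m.quality, m.cost, -m.speed))
    return [m.name for m in matches]


print(shortlist(min_quality=4, max_cost=2))      # ['GPT-4o', 'Claude 3 Sonnet', 'Gemini Pro']
print(shortlist(max_cost=1, min_context_k=100))  # ['Claude 3 Haiku']
```

The symbol counts are relative ratings, so the filter is only useful for narrowing the list; test the shortlisted models on your own queries before committing to one.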
When to Use Each Model
GPT-4o (Recommended)
- General production use
- Customer support bots
- Documentation assistants
- Best balance of speed, quality, and cost
GPT-3.5 Turbo
- High-volume, simple Q&A
- Cost-sensitive applications
- Internal tools where speed matters more than nuance
Claude 3 Models
- Complex reasoning tasks
- Very long documents (200K context)
- When you need more nuanced responses
- Opus for highest quality, Haiku for speed
Model Parameters
Temperature
temperature (number, default: 0.7): Controls randomness in responses. Range: 0-2.
- 0.0 - 0.3: Most deterministic, factual responses
- 0.4 - 0.7: Balanced (recommended)
- 0.8 - 1.0: More creative, varied responses
- Above 1.0: Very creative, may be less coherent
For RAG Applications
Lower temperatures (0.3-0.5) often work better for RAG bots where accuracy matters more than creativity.
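As a rough illustration of the ranges above, the following sketch checks that a temperature is within the documented 0-2 range and reports which band it falls into. The function names are hypothetical, and values that fall between the listed bands (for example 0.35) are assigned to the next band up, which is an assumption rather than documented behavior.

```python
def temperature_band(temperature: float) -> str:
    """Map a temperature to one of the bands described in this section."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be between 0 and 2, got {temperature}")
    if temperature <= 0.3:
        return "deterministic"
    if temperature <= 0.7:
        return "balanced"
    if temperature <= 1.0:
        return "creative"
    return "very creative"


def in_rag_range(temperature: float) -> bool:
    """True if the value sits in the 0.3-0.5 range suggested for RAG bots."""
    return 0.3 <= temperature <= 0.5


print(temperature_band(0.7))  # balanced
print(in_rag_range(0.4))      # True
```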
Max Tokens
maxTokens (integer, default: 1000): Maximum length of the generated response, in tokens.
- 500-1000: Concise responses (recommended)
- 1000-2000: Detailed explanations
- 2000+: Long-form content, documentation
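The two parameters can be sketched together as a small settings builder. Only the parameter names `temperature` and `maxTokens`, their defaults, and the 0-2 temperature range come from this page; the `build_model_settings` helper and the dictionary shape are assumptions for illustration and may not match how your bot's settings are actually stored.

```python
def build_model_settings(temperature: float = 0.7, max_tokens: int = 1000) -> dict:
    """Validate the two model parameters and return them as a settings dict."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be between 0 and 2, got {temperature}")
    if not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError(f"maxTokens must be a positive integer, got {max_tokens!r}")
    # Keys use the parameter names from this page; the dict itself is hypothetical.
    return {"temperature": temperature, "maxTokens": max_tokens}


# Documented defaults.
print(build_model_settings())
# A concise, accuracy-focused setup in the spirit of the RAG guidance.
print(build_model_settings(temperature=0.3, max_tokens=800))
```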
Changing Models
1. Go to bot Settings → Model
2. Select a new model from the dropdown
3. Adjust temperature and max tokens if needed
4. Test with sample queries before deploying
Model Switching
Different models may respond differently to the same prompt. Test thoroughly when switching models, and pay particular attention to how the new model handles your system prompt.
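One way to structure that testing is to run the same sample queries against each candidate model and review the answers side by side. The sketch below assumes you supply your own `ask(model, query)` function that calls your bot; `compare_models` and `fake_ask` are hypothetical names, and `fake_ask` is only a stand-in so the example runs on its own.

```python
from typing import Callable


def compare_models(
    ask: Callable[[str, str], str],
    models: list[str],
    queries: list[str],
) -> dict[str, dict[str, str]]:
    """Collect each model's answer to each sample query for side-by-side review."""
    return {query: {model: ask(model, query) for model in models} for query in queries}


def fake_ask(model: str, query: str) -> str:
    # Stand-in for a real call to your bot; replace with your own client code.
    return f"[{model}] answer to: {query}"


results = compare_models(
    fake_ask,
    models=["GPT-4o", "Claude 3 Sonnet"],
    queries=["How do I reset my password?"],
)
for query, answers in results.items():
    print(query)
    for model, answer in answers.items():
        print(f"  {model}: {answer}")
```

Keeping the system prompt, temperature, and max tokens fixed across runs makes it easier to attribute differences to the model itself.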
Cost Optimization
- Start with GPT-4o: Best value for most use cases
- Consider GPT-3.5: For simple queries or cost-sensitive applications
- Optimize context: A lower top_k retrieves fewer chunks, which reduces input tokens
- Limit max tokens: Keep responses concise
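To see how the last two tips affect spend, a back-of-the-envelope estimate helps. The sketch below assumes that top_k is the number of retrieved chunks added to the prompt and that each chunk is about the same size. All token counts and per-1K prices are made-up placeholders, not real pricing; substitute your provider's current rates.

```python
def estimate_input_tokens(
    top_k: int, chunk_tokens: int, system_prompt_tokens: int, question_tokens: int
) -> int:
    """Approximate prompt size: fixed overhead plus top_k retrieved chunks."""
    return system_prompt_tokens + question_tokens + top_k * chunk_tokens


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Approximate cost of one request; output_tokens is at most maxTokens."""
    return (
        input_tokens / 1000 * input_price_per_1k
        + output_tokens / 1000 * output_price_per_1k
    )


# Placeholder sizes: 400-token chunks, 200-token system prompt, 50-token question.
before = estimate_input_tokens(top_k=8, chunk_tokens=400, system_prompt_tokens=200, question_tokens=50)
after = estimate_input_tokens(top_k=4, chunk_tokens=400, system_prompt_tokens=200, question_tokens=50)
print(before, after)  # 3450 1850

# Placeholder prices per 1K tokens; worst case assumes the response uses all of maxTokens.
print(f"{estimate_cost(before, 1000, 0.01, 0.03):.4f}")
print(f"{estimate_cost(after, 500, 0.01, 0.03):.4f}")
```

Because retrieved chunks usually dominate the prompt, halving top_k in this example removes almost half of the input tokens, while lowering max tokens only caps the worst-case output cost.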

