PDF Documents
Upload PDF files directly to your knowledge base. This works well for manuals, documentation, reports, and other static documents.
Supported Formats
- PDF — Standard PDF documents
- Scanned PDFs — OCR is automatically applied
- Encrypted PDFs — Password-protected files (password required)
File Limits
- Max file size (varies): Free: 10MB, Starter: 50MB, Pro: 100MB, Enterprise: Custom
- Max pages (varies): Free: 100 pages, Starter: 500 pages, Pro: 1,000 pages per file
- Supported types (format): .pdf, .doc, .docx, .txt, .md, .pptx
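To catch limit violations before uploading, a local pre-check can help. The following is a minimal sketch, not part of the RAG Chats product: the names `PLAN_LIMITS`, `SUPPORTED_EXTENSIONS`, and `check_upload` are hypothetical, the numbers are copied from the limits listed above, and it assumes 1MB means 1024 * 1024 bytes, which the limits do not specify. Enterprise is left out because its limits are custom.

```python
import os

# Limits copied from the File Limits section; Enterprise is custom, so it is omitted.
PLAN_LIMITS = {
    "free": {"max_mb": 10, "max_pages": 100},
    "starter": {"max_mb": 50, "max_pages": 500},
    "pro": {"max_mb": 100, "max_pages": 1000},
}
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".pptx"}


def check_upload(filename, size_bytes, plan, pages=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    limits = PLAN_LIMITS[plan.lower()]
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    # Assumption: 1MB = 1024 * 1024 bytes.
    if size_bytes > limits["max_mb"] * 1024 * 1024:
        problems.append(f"file exceeds {limits['max_mb']}MB limit")
    # The page count is optional because reading it requires a PDF parser.
    if pages is not None and pages > limits["max_pages"]:
        problems.append(f"file exceeds {limits['max_pages']}-page limit")
    return problems
```

For example, `check_upload("manual.pdf", os.path.getsize("manual.pdf"), "starter")` returns an empty list when the file fits the Starter plan.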
Uploading Files
Navigate to Knowledge Base
From your bot's dashboard, click Knowledge Base in the sidebar.
Add File Source
Click + Add Source and select Upload Files.
Upload Files
Drag and drop files into the upload area, or click to browse. You can upload multiple files at once.
Wait for Processing
Files are automatically processed:
- Text extraction
- OCR for scanned documents
- Chunking into sections
- Embedding generation
- Indexing for search
Processing time depends on file size (typically 1-5 minutes).
API Upload
Upload files programmatically using multipart form data:
curl -X POST https://api.ragchats.ai/api/data-sources/upload/ \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@/path/to/document.pdf" \
  -F "name=Product Manual" \
  -F "organizationId=org_123"
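The same request can be assembled from a script. The sketch below uses only the Python standard library and mirrors the curl call: same URL, same `Authorization` header, and the same `file`, `name`, and `organizationId` form fields. The helper name `build_upload_request` is hypothetical, the file part is assumed to be a PDF, and the response format is not documented here, so the sketch builds the request without interpreting any reply.

```python
import uuid
import urllib.request

UPLOAD_URL = "https://api.ragchats.ai/api/data-sources/upload/"


def build_upload_request(token, filename, file_bytes, name, organization_id):
    """Build (but do not send) a multipart/form-data POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, as passed with -F in the curl example.
    for field, value in (("name", name), ("organizationId", organization_id)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{field}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part; a PDF content type is assumed here.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/pdf\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        UPLOAD_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the file in binary mode, pass the bytes to `build_upload_request`, and hand the result to `urllib.request.urlopen`.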
Processing Details
Text Extraction
RAG Chats extracts text while preserving structure like headers, paragraphs, lists, and tables. Metadata (author, creation date, etc.) is also extracted.
OCR for Scanned Documents
Scanned PDFs and images are processed with OCR (Optical Character Recognition) to extract text. Quality depends on scan resolution.
Best Results
For scanned documents, use at least 300 DPI resolution and ensure good contrast between text and background.
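The 300 DPI guideline can be checked from a scan's pixel dimensions and the physical page size. This is a small sketch with hypothetical function names; it assumes a US Letter page (8.5 x 11 inches) by default, which is an illustration choice and not something RAG Chats requires.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Return the lower of the horizontal and vertical dots per inch."""
    return min(width_px / width_in, height_px / height_in)


def meets_ocr_guideline(width_px, height_px, width_in=8.5, height_in=11.0, min_dpi=300):
    """True when the scan reaches the recommended minimum resolution."""
    # Default page size is US Letter; pass other dimensions for A4 and so on.
    return effective_dpi(width_px, height_px, width_in, height_in) >= min_dpi
```

A Letter page scanned at 2550 x 3300 pixels works out to 300 DPI and passes; the same page at 1700 x 2200 pixels is 200 DPI and does not.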
Chunking Strategy
Documents are split into chunks for optimal retrieval:
- Chunk size: ~500 tokens
- Overlap: ~50 tokens for context continuity
- Headers preserved for context
- Tables kept intact when possible
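To illustrate how chunk size and overlap interact, here is a minimal sliding-window sketch. It is not the RAG Chats implementation: the function name `chunk_tokens` is hypothetical, tokens are treated as an already-split list, and header preservation and table handling are left out.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With the defaults, a 1,200-token document yields three chunks starting at tokens 0, 450, and 900, and the last 50 tokens of each chunk repeat as the first 50 of the next.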
Best Practices
- Use descriptive filenames: "Q4-2024-Product-Manual.pdf" is better than "doc123.pdf"
- Ensure text is selectable: Native PDFs work better than scanned images
- Remove irrelevant pages: Cover pages, blank pages, and appendices can add noise
- Use clear headings: Well-structured documents produce better chunks
- Update regularly: Re-upload documents when content changes
Troubleshooting
Upload fails
- Check file size is within limits
- Ensure file is not corrupted
- Try removing password protection
Poor text extraction
- Use higher resolution scans
- Ensure document has good contrast
- Consider re-saving as native PDF from source application
Bot doesn't find content
- Wait for processing to complete (check status)
- Verify document contains the expected text
- Try lowering the similarity threshold in bot settings
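The effect of the similarity threshold can be seen in a small retrieval sketch. It assumes cosine similarity between a query embedding and each chunk embedding, which is a common choice but not confirmed for RAG Chats; the function names and the toy two-dimensional vectors are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, threshold):
    """Return chunk texts scoring at or above threshold, best match first.

    chunks is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [text for score, text in sorted(scored, reverse=True) if score >= threshold]
```

A chunk that scores about 0.71 against the query is dropped at a threshold of 0.8 but returned at 0.5, which is why a lower threshold can surface content the bot previously missed, at the cost of admitting weaker matches.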

