PDF Documents

Upload PDF files directly to your knowledge base. This works well for manuals, documentation, reports, and other static documents.

Supported Formats

PDF — Standard PDF documents
Scanned PDFs — OCR is automatically applied
Encrypted PDFs — Password-protected files (password required)

File Limits

Max file size (varies by plan)

Free: 10MB, Starter: 50MB, Pro: 100MB, Enterprise: Custom

Max pages (varies by plan)

Free: 100 pages, Starter: 500 pages, Pro: 1,000 pages per file

Supported types (file format)

.pdf, .doc, .docx, .txt, .md, .pptx
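The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the RAG Chats API: the names PLAN_LIMITS and check_upload are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the limits do not specify. Enterprise limits are custom and no Enterprise page limit is listed, so those checks are skipped.

```python
import os

MB = 1024 * 1024  # assumption: the docs do not say whether 1 MB is 10^6 or 2^20 bytes

# Values copied from the File Limits section; None means "not stated / custom".
PLAN_LIMITS = {
    "free": {"max_bytes": 10 * MB, "max_pages": 100},
    "starter": {"max_bytes": 50 * MB, "max_pages": 500},
    "pro": {"max_bytes": 100 * MB, "max_pages": 1000},
    "enterprise": {"max_bytes": None, "max_pages": None},
}
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".pptx"}


def check_upload(filename, size_bytes, plan, page_count=None):
    """Return a list of problems; an empty list means no listed limit is exceeded."""
    limits = PLAN_LIMITS[plan.lower()]
    problems = []

    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")

    max_bytes = limits["max_bytes"]
    if max_bytes is not None and size_bytes > max_bytes:
        problems.append(f"file exceeds the {plan} size limit of {max_bytes // MB}MB")

    max_pages = limits["max_pages"]
    if page_count is not None and max_pages is not None and page_count > max_pages:
        problems.append(f"file exceeds the {plan} page limit of {max_pages} pages")

    return problems
```

For example, check_upload("manual.pdf", 20 * MB, "free") reports one problem (the size limit), while the same file on the Starter plan reports none. Counting pages requires a PDF library, so page_count is left as an optional argument supplied by the caller.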

Uploading Files

Navigate to Knowledge Base

From your bot's dashboard, click Knowledge Base in the sidebar.

Add File Source

Click + Add Source and select Upload Files.

Upload Files

Drag and drop files into the upload area, or click to browse. You can upload multiple files at once.

Wait for Processing

Files are automatically processed:

Text extraction
OCR for scanned documents
Chunking into sections
Embedding generation
Indexing for search

Processing time depends on file size (typically 1-5 minutes).
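Because processing takes a few minutes, a script that uploads files may want to wait before querying the bot. The sketch below shows a generic polling loop. It is an illustration only: this page does not document a status endpoint, so get_status is a hypothetical callable you would supply, and the status names "ready" and "failed" are assumptions.

```python
import time


def wait_until_processed(get_status, timeout_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status.

    get_status is a caller-supplied function (hypothetical) that returns the
    current processing status as a string. Raises TimeoutError if no terminal
    status is seen after roughly timeout_s seconds of waiting.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in ("ready", "failed"):  # assumed status names
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still '{status}' after {waited:.0f} seconds")
        sleep(interval_s)
        waited += interval_s
```

The default timeout of 10 minutes leaves headroom over the typical 1-5 minute processing time. The sleep parameter exists so the loop can be exercised in tests without actually waiting.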

API Upload

Upload files programmatically using multipart form data:

curl -X POST https://api.ragchats.ai/api/data-sources/upload/ \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@/path/to/document.pdf" \
  -F "name=Product Manual" \
  -F "organizationId=org_123"

Processing Details

Text Extraction

RAG Chats extracts text while preserving structure like headers, paragraphs, lists, and tables. Metadata (author, creation date, etc.) is also extracted.

OCR for Scanned Documents

Scanned PDFs and images are processed with OCR (Optical Character Recognition) to extract text. Quality depends on scan resolution.

Best Results

For scanned documents, use at least 300 DPI resolution and ensure good contrast between text and background.
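To check whether an existing scan meets the 300 DPI recommendation, divide its pixel dimensions by the physical page size in inches. The helper below is a small illustrative sketch; the function name effective_dpi is hypothetical and not part of RAG Chats.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Return the lower of the horizontal and vertical resolution, in dots per inch."""
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


# A US Letter page (8.5 x 11 in) scanned to 2550 x 3300 pixels.
print(effective_dpi(2550, 3300, 8.5, 11.0))  # 300.0
```

A scan of the same page at 1275 x 1650 pixels works out to 150 DPI, which is below the recommended minimum.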

Chunking Strategy

Documents are split into chunks for optimal retrieval:

Chunk size: ~500 tokens
Overlap: ~50 tokens for context continuity
Headers preserved for context
Tables kept intact when possible
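The size and overlap figures above describe a sliding window over the document's tokens. The following sketch illustrates the idea under simplifying assumptions and is not RAG Chats' actual implementation: the function name chunk_tokens is hypothetical, tokens are whatever list you pass in (for example text.split() as a rough stand-in for a real tokenizer), and header preservation and table handling are not modeled.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

With the defaults, a 1,200-token document yields three chunks of 500, 500, and 300 tokens, and the last 50 tokens of each chunk reappear as the first 50 tokens of the next. That shared region is what keeps a sentence that straddles a boundary retrievable from either side.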

Best Practices

Use descriptive filenames: "Q4-2024-Product-Manual.pdf" is better than "doc123.pdf"
Ensure text is selectable: Native PDFs work better than scanned images
Remove irrelevant pages: Cover pages, blank pages, and appendices can add noise
Use clear headings: Well-structured documents produce better chunks
Update regularly: Re-upload documents when content changes

Troubleshooting

Upload fails

Check that the file size is within your plan's limit
Ensure the file is not corrupted
For encrypted PDFs, provide the password or remove password protection before uploading

Poor text extraction

Use higher-resolution scans (at least 300 DPI)
Ensure the document has good contrast between text and background
Consider re-saving as a native PDF from the source application

Bot doesn't find content

Wait for processing to complete (check status)
Verify that the document contains the expected text
Try lowering the similarity threshold in bot settings
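To see why lowering the similarity threshold can help, consider how a threshold filters retrieved chunks. The sketch below is illustrative only: it assumes cosine similarity between embedding vectors, which this page does not confirm, and the function names and toy two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, threshold):
    """Return chunk texts scoring at or above the threshold, best match first.

    chunks is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [text for score, text in sorted(scored, reverse=True) if score >= threshold]


chunks = [("exact", [1.0, 0.0]), ("related", [1.0, 1.0]), ("unrelated", [0.0, 1.0])]
print(retrieve([1.0, 0.0], chunks, threshold=0.8))  # ['exact']
print(retrieve([1.0, 0.0], chunks, threshold=0.5))  # ['exact', 'related']
```

The "related" chunk scores about 0.71, so it is dropped at a threshold of 0.8 and returned at 0.5. A lower threshold surfaces more loosely matching content, at the cost of admitting less relevant chunks.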