PDF Documents

Upload PDF files directly to your knowledge base. This works well for manuals, documentation, reports, and other static documents.

Supported Formats

  • PDF — Standard PDF documents
  • Scanned PDFs — OCR is automatically applied
  • Encrypted PDFs — Password-protected files (password required)

File Limits

Max file size (varies by plan)

Free: 10 MB, Starter: 50 MB, Pro: 100 MB, Enterprise: Custom

Max pages (varies by plan)

Free: 100 pages, Starter: 500 pages, Pro: 1,000 pages per file

Supported types (format)

.pdf, .doc, .docx, .txt, .md, .pptx
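A quick local check before uploading can catch the most common limit violations. The sketch below is illustrative only: the function name check_upload and the plan keys are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the limits above do not specify. Page counts are not checked because that requires a PDF parser.

```python
from pathlib import Path

# Limits copied from the File Limits section; Enterprise is "Custom", so it has no fixed value here.
MAX_SIZE_MB = {"free": 10, "starter": 50, "pro": 100}
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".pptx"}


def check_upload(filename: str, size_bytes: int, plan: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    limit_mb = MAX_SIZE_MB.get(plan.lower())
    # Assumption: 1 MB = 1024 * 1024 bytes; plans without a fixed limit are not size-checked.
    if limit_mb is not None and size_bytes > limit_mb * 1024 * 1024:
        problems.append(f"file exceeds the {limit_mb}MB limit for the {plan} plan")
    return problems
```

For example, check_upload("manual.pdf", 11 * 1024 * 1024, "free") reports a size problem, while the same file on the Starter plan passes.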

Uploading Files

Navigate to Knowledge Base

From your bot's dashboard, click Knowledge Base in the sidebar.

Add File Source

Click + Add Source and select Upload Files.

Upload Files

Drag and drop files into the upload area, or click to browse. You can upload multiple files at once.

Wait for Processing

Files are automatically processed:

  • Text extraction
  • OCR for scanned documents
  • Chunking into sections
  • Embedding generation
  • Indexing for search

Processing time depends on file size (typically 1-5 minutes).

API Upload

Upload files programmatically using multipart form data:

curl -X POST https://api.ragchats.ai/api/data-sources/upload/ \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@/path/to/document.pdf" \
  -F "name=Product Manual" \
  -F "organizationId=org_123"
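The same request can be assembled in Python with only the standard library. This is a rough sketch rather than an official client: the helper name build_upload_request is hypothetical, the endpoint and field names are taken from the curl example above, and the response format is not documented here, so the sketch only builds the request and leaves sending and response handling to the caller.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

UPLOAD_URL = "https://api.ragchats.ai/api/data-sources/upload/"


def build_upload_request(path: str, name: str, organization_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    file_path = Path(path)
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    parts = []
    # Plain text fields, using the same names as the -F flags in the curl example.
    for field, value in (("name", name), ("organizationId", organization_id)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{field}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part carries the raw bytes of the document.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{file_path.name}"\r\nContent-Type: {content_type}\r\n\r\n'
        ).encode()
        + file_path.read_bytes()
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        UPLOAD_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the result to urllib.request.urlopen inside a with block and inspect the status code of the response.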

Processing Details

Text Extraction

RAG Chats extracts text while preserving structure like headers, paragraphs, lists, and tables. Metadata (author, creation date, etc.) is also extracted.

OCR for Scanned Documents

Scanned PDFs and images are processed with OCR (Optical Character Recognition) to extract text. Quality depends on scan resolution.

Best Results

For scanned documents, use at least 300 DPI resolution and ensure good contrast between text and background.

Chunking Strategy

Documents are split into chunks for optimal retrieval:

  • Chunk size: ~500 tokens
  • Overlap: ~50 tokens for context continuity
  • Headers preserved for context
  • Tables kept intact when possible
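To make the size and overlap figures concrete, here is a minimal sliding-window sketch. It is not the RAG Chats implementation: the function name chunk_tokens is hypothetical, it works on a pre-tokenized list, and it ignores the header and table handling described above.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of chunk_size, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With the defaults, a 1,200-token document yields three chunks starting at tokens 0, 450, and 900, so the last 50 tokens of each chunk reappear at the start of the next. Splitting text on whitespace (text.split()) is a rough stand-in for a real tokenizer, which will count tokens differently.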

Best Practices

  • Use descriptive filenames: "Q4-2024-Product-Manual.pdf" is better than "doc123.pdf"
  • Ensure text is selectable: Native PDFs work better than scanned images
  • Remove irrelevant pages: Cover pages, blank pages, and appendices can add noise
  • Use clear headings: Well-structured documents produce better chunks
  • Update regularly: Re-upload documents when content changes

Troubleshooting

Upload fails

  • Check that the file size is within your plan's limits
  • Ensure the file is not corrupted
  • Try removing password protection

Poor text extraction

  • Use higher-resolution scans
  • Ensure the document has good contrast
  • Consider re-saving as a native PDF from the source application

Bot doesn't find content

  • Wait for processing to complete (check status)
  • Verify document contains the expected text
  • Try lowering the similarity threshold in the bot settings
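The effect of the similarity threshold can be illustrated with a toy retrieval step. This sketch assumes cosine similarity over embedding vectors, which this page does not confirm as the metric RAG Chats uses; the function names and the two-dimensional vectors are purely illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[tuple[str, list[float]]], threshold: float) -> list[str]:
    """Return the texts of chunks scoring at or above the threshold, best match first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [text for score, text in sorted(scored, reverse=True) if score >= threshold]


# Toy data: one exact match, one partially related chunk, one unrelated chunk.
chunks = [("exact", [1.0, 0.0]), ("related", [1.0, 1.0]), ("unrelated", [0.0, 1.0])]
strict = retrieve([1.0, 0.0], chunks, threshold=0.8)
relaxed = retrieve([1.0, 0.0], chunks, threshold=0.5)
```

Here strict contains only the exact match, while relaxed also includes the partially related chunk. A lower threshold surfaces more content at the cost of admitting weaker matches.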