The PDF API provides two methods to parse PDFs: from a URL or by uploading a file directly.
All requests must be authenticated by sending your API key as a Bearer token in the `Authorization` header:
`Authorization: Bearer YOUR_API_KEY`
Each successful PDF API call uses 5 credits per page. For example, a 10-page PDF would use 50 credits.
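
The pricing rule above can be turned into a quick estimate before submitting a document. This is a minimal sketch: the function name `credits_for_pdf` is illustrative and not part of the API, and it assumes only the flat 5-credits-per-page rate stated above.

```python
CREDITS_PER_PAGE = 5  # rate stated in the pricing note above


def credits_for_pdf(page_count: int) -> int:
    """Estimate the credits used by one successful call for a PDF of page_count pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return CREDITS_PER_PAGE * page_count


print(credits_for_pdf(10))  # the 10-page example from the text
```
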
To parse a PDF from a URL, perform a GET request to the following URL:
| Parameter | Type | Description |
|---|---|---|
| url | string | URL of the PDF file to parse. |
| output | string | (Optional) Output format. Either "html" (default) or "text". |
| use_cache | bool | (Optional) Specify if use of cache is permitted. Defaults to true. If set to the string "false", the cache is bypassed. |

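
As an illustration of how these query parameters fit together, the sketch below builds (but does not send) a GET request using only the Python standard library. The endpoint `https://api.example.com/pdf` is a hypothetical placeholder, since the real URL is not reproduced here, and the helper name `build_url_request` is likewise an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: substitute the real URL-parsing endpoint.
PDF_URL_ENDPOINT = "https://api.example.com/pdf"


def build_url_request(api_key: str, pdf_url: str,
                      output: str = "html", use_cache: bool = True) -> Request:
    """Build an authenticated GET request that asks the API to parse pdf_url."""
    if output not in ("html", "text"):
        raise ValueError('output must be "html" or "text"')
    params = {"url": pdf_url, "output": output}
    if not use_cache:
        # The cache is bypassed when use_cache is the string "false".
        params["use_cache"] = "false"
    request = Request(f"{PDF_URL_ENDPOINT}?{urlencode(params)}", method="GET")
    request.add_header("Authorization", f"Bearer {api_key}")
    return request
```

The returned object can be passed to `urllib.request.urlopen` and the response body decoded with `json.load`. Encoding the parameters with `urlencode` keeps a PDF URL that contains its own query string from colliding with the API's parameters.
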
To parse a PDF file directly, perform a POST request to the following URL:
Send a multipart form-data request with the following parameters:
| Parameter | Type | Description |
|---|---|---|
| file | file | PDF file to parse (required). |
| url | string | (Optional) URL reference for the PDF file. |
| output | string | (Optional) Output format. Either "html" (default) or "text". |
| use_cache | bool | (Optional) Specify if use of cache is permitted. Defaults to true. If set to the string "false", the cache is bypassed. |

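
A multipart form-data upload with these fields might be assembled as follows. This is a sketch using only the Python standard library: the endpoint `https://api.example.com/pdf/file`, the helper name `build_file_request`, and the default filename are hypothetical, and the request is built without being sent.

```python
import uuid
from urllib.request import Request

# Hypothetical placeholder: substitute the real file-upload endpoint.
PDF_FILE_ENDPOINT = "https://api.example.com/pdf/file"


def build_file_request(api_key: str, pdf_bytes: bytes, filename: str = "document.pdf",
                       output: str = "html", use_cache: bool = True) -> Request:
    """Build an authenticated multipart/form-data POST request carrying a PDF."""
    boundary = uuid.uuid4().hex
    fields = {"output": output}
    if not use_cache:
        fields["use_cache"] = "false"  # the string "false" bypasses the cache

    parts = []
    for name, value in fields.items():
        # Plain text form fields.
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The required "file" field with the raw PDF bytes.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/pdf\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + pdf_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary

    request = Request(PDF_FILE_ENDPOINT, data=b"".join(parts), method="POST")
    request.add_header("Authorization", f"Bearer {api_key}")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request
```

In practice an HTTP client library that encodes multipart bodies for you is usually more convenient. The manual version is shown only to make the structure of the request explicit.
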
On success, both APIs return a JSON dictionary containing the following keys:
| Key | Description |
|---|---|
| url | URL of the PDF (for URL API) or empty string (for File API) |
| title | Title |
| site_name | Website Name |
| thumbnail | Thumbnail Image |
| description | PDF description |
| author | Author's Name |
| date | Published date (UNIX time) |
| html | HTML with body of PDF (present when output is "html") |
| text | Plain text body of PDF (present when output is "text") |
| words | Number of words in the PDF |
| is_rtl | Always false for PDFs |
| images | List of images in the PDF |
| videos | Always empty array for PDFs |

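
Because only one of `html` or `text` is present depending on the requested output, and `date` is a UNIX timestamp, client code typically normalizes the response before use. The sketch below assumes the response has already been decoded into a Python dict. The helper name `summarize_result` and the choice to tolerate missing keys are assumptions, not documented behavior.

```python
from datetime import datetime, timezone


def summarize_result(result: dict) -> dict:
    """Pick the body that is present and convert the UNIX date to a UTC datetime."""
    if "html" in result:
        body, body_format = result["html"], "html"
    else:
        body, body_format = result.get("text", ""), "text"

    raw_date = result.get("date")
    # Assumption: a missing or non-numeric date is treated as unknown.
    published = (
        datetime.fromtimestamp(raw_date, tz=timezone.utc)
        if isinstance(raw_date, (int, float)) else None
    )
    return {
        "title": result.get("title", ""),
        "body": body,
        "body_format": body_format,
        "published": published,
        "words": result.get("words", 0),
        "image_count": len(result.get("images") or []),
    }
```
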
Possible status codes:
| Code | Reason |
|---|---|
| 200 | Success |
| 400 | Parameter missing or malformed |
| 401 | API key is invalid |
| 403 | Account suspended (payment error) |
| 409 | Exceeded monthly calls (Trial plan only) |
| 412 | Upstream parsing error or invalid response from parser |
| 429 | Rate limit exceeded |
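
One way to surface these codes in client code is to map them to their documented reasons and raise an exception for anything other than 200. This is a minimal sketch: `PdfApiError` and `raise_for_status` are illustrative names, not part of any official client.

```python
# Reasons copied from the status code table above.
STATUS_REASONS = {
    200: "Success",
    400: "Parameter missing or malformed",
    401: "API key is invalid",
    403: "Account suspended (payment error)",
    409: "Exceeded monthly calls (Trial plan only)",
    412: "Upstream parsing error or invalid response from parser",
    429: "Rate limit exceeded",
}


class PdfApiError(Exception):
    """Raised when the PDF API returns a non-200 status code."""

    def __init__(self, status: int, reason: str):
        super().__init__(f"PDF API request failed with {status}: {reason}")
        self.status = status
        self.reason = reason


def raise_for_status(status: int) -> None:
    """Return silently on 200; otherwise raise PdfApiError with the documented reason."""
    if status == 200:
        return
    reason = STATUS_REASONS.get(status, "Unexpected status code")
    raise PdfApiError(status, reason)
```

A caller might catch `PdfApiError` and inspect `status` to decide what to do next, for example waiting before retrying on 429. Whether and how long to wait is a client-side choice that is not specified here.
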