Most extraction integrations follow a three-step pattern: upload the file, start a run, then poll until results are ready. This decouples ingestion from processing so your system doesn’t have to block on a long-running AI call.

Sync vs async — which to use?

Synchronous (/extract)

One HTTP call. Blocks until done, returns the result directly. Best for interactive integrations, small files, or when you need the result immediately. Use when processing typically completes in under 60 seconds.

Async (3-step)

Upload, start, poll. Each step returns immediately. Best for large files, batch workloads, or when you want to decouple ingestion from processing.

The three steps

1. Upload the file

POST /projects/{project_name}/tables/{table_id}/files

Send the file along with its metadata. This creates a row in the extraction table associated with the file but does not start processing.

Returns: { "row_id": "uuid" } — hold onto this.
2. Start a run

POST /projects/{project_name}/tables/{table_id}/run

Pass the row_id from step 1. This enqueues the AI job and returns immediately.

Returns: { "run_id": "uuid" } — use this to poll.

Optional: pass "zero_retention": true to run without persisting intermediate results.
3. Poll for results

GET /projects/{project_name}/tables/{table_id}/run/{run_id}

Poll until status is done or error. A typical poll interval is 2–5 seconds.

Returns a DataRow with the current status and, when done, the extracted data.

Run status values

| Status | Meaning |
| --- | --- |
| pending | Queued, not yet started |
| running | AI is actively processing |
| done | Extraction complete — data field is populated |
| error | Processing failed |

Full example

import base64
import time

import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.cloudsquid.io/api"
HEADERS = {"X-API-Key": API_KEY, "Content-Type": "application/json"}
PROJECT = "my-project"
TABLE_ID = "your-table-uuid"

# Step 1: Upload
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode()

upload = requests.post(
    f"{BASE}/projects/{PROJECT}/tables/{TABLE_ID}/files",
    headers=HEADERS,
    json={
        "file": file_b64,
        "filename": "invoice.pdf",
        "mimetype": "application/pdf",
        "file_type": "binary"
    }
)
upload.raise_for_status()
row_id = upload.json()["row_id"]

# Step 2: Start run
run = requests.post(
    f"{BASE}/projects/{PROJECT}/tables/{TABLE_ID}/run",
    headers=HEADERS,
    json={"row_id": row_id}
)
run.raise_for_status()
run_id = run.json()["run_id"]

# Step 3: Poll until the run reaches a terminal status
while True:
    resp = requests.get(
        f"{BASE}/projects/{PROJECT}/tables/{TABLE_ID}/run/{run_id}",
        headers=HEADERS
    )
    resp.raise_for_status()
    result = resp.json()
    if result["status"] == "done":
        print(result["data"])
        break
    elif result["status"] == "error":
        raise RuntimeError(f"Extraction failed: {result}")
    time.sleep(3)
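The fixed 3-second interval above is fine for a quick script. For production polling you may also want a client-side timeout and exponential backoff; a minimal sketch (the endpoint path mirrors the example above, while `backoff_delay`'s base and cap values are illustrative choices, not part of the API):

```python
import time

import requests


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    """Exponential backoff: 2s, 4s, 8s, ... capped at 30s."""
    return min(base * (2 ** attempt), cap)


def poll_run(base_url, headers, project, table_id, run_id, timeout=300):
    """Poll a run until it reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    attempt = 0
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{base_url}/projects/{project}/tables/{table_id}/run/{run_id}",
            headers=headers,
        )
        resp.raise_for_status()
        result = resp.json()
        if result["status"] == "done":
            return result["data"]
        if result["status"] == "error":
            raise RuntimeError(f"Extraction failed: {result}")
        time.sleep(backoff_delay(attempt))
        attempt += 1
    raise TimeoutError(f"Run {run_id} did not finish within {timeout}s")
```

Backoff keeps the first checks responsive while easing pressure on the API for long-running extractions.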

File type options

The file_type field controls how Cloudsquid reads the file value:
| file_type | file field | Use when |
| --- | --- | --- |
| binary | Base64-encoded file content | Uploading directly from disk |
| uri | Signed URL string | File is hosted remotely — Cloudsquid fetches it |
| multipart | Array of parts | Email (RFC 822) with attachments |
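The binary and uri shapes can be sketched as small payload builders (field names follow the upload example above; the multipart part structure isn't specified here, so it is omitted):

```python
import base64


def binary_payload(filename: str, content: bytes,
                   mimetype: str = "application/pdf") -> dict:
    """file_type 'binary': the file field carries base64-encoded bytes."""
    return {
        "file": base64.b64encode(content).decode(),
        "filename": filename,
        "mimetype": mimetype,
        "file_type": "binary",
    }


def uri_payload(filename: str, signed_url: str,
                mimetype: str = "application/pdf") -> dict:
    """file_type 'uri': the file field is a signed URL that Cloudsquid fetches."""
    return {
        "file": signed_url,
        "filename": filename,
        "mimetype": mimetype,
        "file_type": "uri",
    }
```

The uri variant avoids pushing large files through your own service: hand over a signed URL and let Cloudsquid download the content directly.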

Prefer one call? Use synchronous extraction

The /extract endpoint blocks until done and returns results in a single response.
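A single-call version might look like the sketch below. The exact /extract route and payload shape are assumptions modeled on the async endpoints — check the API reference for the real signature; the file envelope reuses the fields from the upload example:

```python
import base64

import requests


def build_extract_payload(content: bytes, filename: str,
                          mimetype: str = "application/pdf") -> dict:
    """Same file envelope as the async upload step."""
    return {
        "file": base64.b64encode(content).decode(),
        "filename": filename,
        "mimetype": mimetype,
        "file_type": "binary",
    }


def extract_sync(base_url, headers, project, content, filename, timeout=90):
    # Hypothetical route — verify the exact /extract path in the API reference.
    resp = requests.post(
        f"{base_url}/projects/{project}/extract",
        headers=headers,
        json=build_extract_payload(content, filename),
        timeout=timeout,  # client-side cap; the call blocks until extraction finishes
    )
    resp.raise_for_status()
    return resp.json()
```

Set the client timeout generously: the request holds the connection open for the full extraction, which is exactly why this mode suits only jobs that finish within about a minute.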