Most extraction integrations follow a three-step pattern: upload the file, start a run, then poll until results are ready. This decouples ingestion from processing so your system doesn’t have to block on a long-running AI call.

Sync vs async — which to use?

Synchronous (/extract)

One HTTP call. Blocks until done, returns the result directly. Best for interactive integrations, small files, or when you need the result immediately. Use when processing typically completes in under 60 seconds.

Async (3-step)

Upload, start, poll. Each step returns immediately. Best for large files, batch workloads, or when you want to decouple ingestion from processing.

The three steps

1. Upload the file

POST /projects/{project_name}/tables/{table_id}/files

Send the file along with its metadata. This creates a row in the extraction table associated with the file but does not start processing.

Returns: { "row_id": "uuid" } — hold onto this.
2. Start a run

POST /projects/{project_name}/tables/{table_id}/run

Pass the row_id from step 1. This enqueues the AI job and returns immediately.

Returns: { "run_id": "uuid" } — use this to poll.

Optional: pass "zero_retention": true to run without persisting intermediate results.
3. Poll for results

GET /projects/{project_name}/tables/{table_id}/run/{run_id}

Poll until status is done or error. A typical poll interval is 2–5 seconds.

Returns a DataRow with the current status and, when done, the extracted data.

Run status values

| Status | Meaning |
| --- | --- |
| pending | Queued, not yet started |
| running | AI is actively processing |
| done | Extraction complete — data field is populated |
| error | Processing failed |

Full example

import base64
import time

import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.cloudsquid.io/api"
HEADERS = {"X-API-Key": API_KEY, "Content-Type": "application/json"}
PROJECT = "my-project"
TABLE_ID = "your-table-uuid"

# Step 1: Upload
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode()

upload = requests.post(
    f"{BASE}/projects/{PROJECT}/tables/{TABLE_ID}/files",
    headers=HEADERS,
    json={
        "file": file_b64,
        "filename": "invoice.pdf",
        "mimetype": "application/pdf",
        "file_type": "binary"
    }
)
upload.raise_for_status()
row_id = upload.json()["row_id"]

# Step 2: Start run
run = requests.post(
    f"{BASE}/projects/{PROJECT}/tables/{TABLE_ID}/run",
    headers=HEADERS,
    json={"row_id": row_id}
)
run.raise_for_status()
run_id = run.json()["run_id"]

# Step 3: Poll until the run reaches a terminal status
while True:
    resp = requests.get(
        f"{BASE}/projects/{PROJECT}/tables/{TABLE_ID}/run/{run_id}",
        headers=HEADERS
    )
    resp.raise_for_status()
    result = resp.json()
    if result["status"] == "done":
        print(result["data"])
        break
    elif result["status"] == "error":
        raise RuntimeError(f"Extraction failed: {result}")
    time.sleep(3)
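The fixed 3-second interval above is fine for a quick script. For production polling you may also want a client-side timeout and exponential backoff; a minimal sketch (the endpoint path mirrors the example above, while `backoff_delay`'s base and cap values are illustrative choices, not part of the API):

```python
import time

import requests


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0) -> float:
    """Exponential backoff: 2s, 4s, 8s, ... capped at 30s."""
    return min(base * (2 ** attempt), cap)


def poll_run(base_url, headers, project, table_id, run_id, timeout=300):
    """Poll a run until it reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    attempt = 0
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{base_url}/projects/{project}/tables/{table_id}/run/{run_id}",
            headers=headers,
        )
        resp.raise_for_status()
        result = resp.json()
        if result["status"] == "done":
            return result["data"]
        if result["status"] == "error":
            raise RuntimeError(f"Extraction failed: {result}")
        time.sleep(backoff_delay(attempt))
        attempt += 1
    raise TimeoutError(f"Run {run_id} did not finish within {timeout}s")
```

Backoff keeps the first checks responsive while easing pressure on the API for long-running extractions.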

File type options

The file_type field controls how Cloudsquid reads the file value:
| file_type | file field | Use when |
| --- | --- | --- |
| binary | Base64-encoded file content | Uploading directly from disk |
| uri | Signed URL string | File is hosted remotely — Cloudsquid fetches it |
| multipart | Array of parts | Email (RFC 822) with attachments |
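The binary and uri shapes can be sketched as small payload builders (field names follow the upload example above; the multipart part structure isn't specified here, so it is omitted):

```python
import base64


def binary_payload(filename: str, content: bytes,
                   mimetype: str = "application/pdf") -> dict:
    """file_type 'binary': the file field carries base64-encoded bytes."""
    return {
        "file": base64.b64encode(content).decode(),
        "filename": filename,
        "mimetype": mimetype,
        "file_type": "binary",
    }


def uri_payload(filename: str, signed_url: str,
                mimetype: str = "application/pdf") -> dict:
    """file_type 'uri': the file field is a signed URL that Cloudsquid fetches."""
    return {
        "file": signed_url,
        "filename": filename,
        "mimetype": mimetype,
        "file_type": "uri",
    }
```

The uri variant avoids pushing large files through your own service: hand over a signed URL and let Cloudsquid download the content directly.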

Prefer one call? Use synchronous extraction

The /extract endpoint blocks until done and returns results in a single response.
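A single-call version might look like the sketch below. The exact /extract route and payload shape are assumptions modeled on the async endpoints — check the API reference for the real signature; the file envelope reuses the fields from the upload example:

```python
import base64

import requests


def build_extract_payload(content: bytes, filename: str,
                          mimetype: str = "application/pdf") -> dict:
    """Same file envelope as the async upload step."""
    return {
        "file": base64.b64encode(content).decode(),
        "filename": filename,
        "mimetype": mimetype,
        "file_type": "binary",
    }


def extract_sync(base_url, headers, project, content, filename, timeout=90):
    # Hypothetical route — verify the exact /extract path in the API reference.
    resp = requests.post(
        f"{base_url}/projects/{project}/extract",
        headers=headers,
        json=build_extract_payload(content, filename),
        timeout=timeout,  # client-side cap; the call blocks until extraction finishes
    )
    resp.raise_for_status()
    return resp.json()
```

Set the client timeout generously: the request holds the connection open for the full extraction, which is exactly why this mode suits only jobs that finish within about a minute.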