A pipeline is the AI model configuration used for an extraction run. You set a default pipeline per extraction table via the settings API, and it applies to every run on that table unless overridden.

Comparison

| Pipeline | Speed | Accuracy | Best for |
|---|---|---|---|
| cloudsquid-flash | Fastest | Good | High-volume, cost-sensitive, simple schemas |
| cloudsquid-flash-v3 | Fast | Better | Default starting point; good balance of speed and quality |
| cloudsquid-pro-v2 | Slower (10–30 s) | High | Complex documents, nested schemas |
| cloudsquid-pro-v3 | Slower | Highest | Dense or visually complex layouts, highest accuracy requirements |

Start with cloudsquid-flash-v3 for all new tables. Switch to cloudsquid-pro-v3 only if accuracy on complex document layouts is insufficient.
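
The selection rule above can be expressed as a small decision helper. This is a hypothetical sketch, not part of any cloudsquid SDK. It assumes you measure accuracy yourself on a labelled sample of documents and choose a target threshold, since this page does not define what counts as "insufficient".

```python
# Pipeline identifiers from the comparison table.
KNOWN_PIPELINES = frozenset({
    "cloudsquid-flash",
    "cloudsquid-flash-v3",
    "cloudsquid-pro-v2",
    "cloudsquid-pro-v3",
})

DEFAULT_PIPELINE = "cloudsquid-flash-v3"   # recommended starting point
ESCALATION_PIPELINE = "cloudsquid-pro-v3"  # highest accuracy, slower


def next_pipeline(current: str, sample_accuracy: float, target: float) -> str:
    """Hypothetical helper: keep the current pipeline while measured accuracy
    meets the target; otherwise escalate to cloudsquid-pro-v3.

    sample_accuracy and target are fractions in [0, 1] that you define and
    measure yourself (an assumption; this page does not describe a metric).
    """
    if current not in KNOWN_PIPELINES:
        raise ValueError(f"unknown pipeline: {current!r}")
    if not (0.0 <= sample_accuracy <= 1.0 and 0.0 <= target <= 1.0):
        raise ValueError("accuracy and target must be between 0 and 1")
    if sample_accuracy >= target:
        return current
    return ESCALATION_PIPELINE


# A new table starts on the default pipeline.
pipeline = DEFAULT_PIPELINE
# Suppose a labelled sample of complex layouts scores 0.82 against a 0.95 target.
pipeline = next_pipeline(pipeline, sample_accuracy=0.82, target=0.95)
print(pipeline)  # cloudsquid-pro-v3
```

Keeping escalation explicit means a table only moves to the slower pipeline after a measured shortfall, rather than by default.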

Bounding boxes

Enabling bounding_boxes adds source-location metadata to each extracted value: a reference back to the exact position of that value in the original document. This is useful for auditability and human review workflows, but it increases processing time regardless of which pipeline is active.
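
Bounding boxes are toggled with the bounding_boxes field of the same extraction-settings body used to set the pipeline (see How to set a pipeline). The sketch below assembles that body with basic validation before it is sent. The field names come from the documented request; the helper function itself is hypothetical.

```python
import json

# Pipeline identifiers from the comparison table.
KNOWN_PIPELINES = frozenset({
    "cloudsquid-flash",
    "cloudsquid-flash-v3",
    "cloudsquid-pro-v2",
    "cloudsquid-pro-v3",
})


def extraction_settings_body(pipeline: str, bounding_boxes: bool = False,
                             review_mode: bool = False) -> dict:
    """Hypothetical helper that assembles the extraction-settings JSON body."""
    if pipeline not in KNOWN_PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    if not isinstance(bounding_boxes, bool) or not isinstance(review_mode, bool):
        raise TypeError("bounding_boxes and review_mode must be booleans")
    return {
        "active_pipeline": pipeline,
        "bounding_boxes": bounding_boxes,  # adds source locations; slower processing
        "review_mode": review_mode,
    }


# Enable source locations for a table that feeds a human-review workflow.
body = extraction_settings_body("cloudsquid-flash-v3", bounding_boxes=True)
print(json.dumps(body))
# {"active_pipeline": "cloudsquid-flash-v3", "bounding_boxes": true, "review_mode": false}
```

Because bounding boxes slow processing on every pipeline, leaving the flag off by default and enabling it only on tables used for audit or review keeps other tables fast.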

How to set a pipeline

Use the extraction settings endpoint to update a table’s active pipeline.
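```python
import requests

# Replace these placeholders with your own values.
API_KEY = "YOUR_API_KEY"
PROJECT = "my-project"
TABLE_ID = "TABLE_ID"

# Set the table's default pipeline; it applies to every run on this table unless overridden.
response = requests.patch(
    f"https://api.cloudsquid.io/api/projects/{PROJECT}/tables/{TABLE_ID}/extraction-settings",
    headers={"X-API-Key": API_KEY},
    json={
        "active_pipeline": "cloudsquid-flash-v3",
        "bounding_boxes": False,  # True adds source-location metadata but slows processing
        "review_mode": False,
    },
    timeout=30,
)
response.raise_for_status()  # surface 4xx/5xx errors instead of failing silently
```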
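
If you would rather avoid the third-party requests dependency, the same call can be made with Python's standard library. This sketch assumes the endpoint, header, and body shown above; the Content-Type header, the 30-second timeout, and the split into build and send functions are illustrative choices, not documented requirements. Building the request separately lets you inspect or unit-test it without touching the network.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.cloudsquid.io/api"


def build_settings_request(project: str, table_id: str, api_key: str, pipeline: str,
                           bounding_boxes: bool = False,
                           review_mode: bool = False) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the extraction-settings PATCH."""
    url = (f"{BASE_URL}/projects/{urllib.parse.quote(project, safe='')}"
           f"/tables/{urllib.parse.quote(table_id, safe='')}/extraction-settings")
    body = json.dumps({
        "active_pipeline": pipeline,
        "bounding_boxes": bounding_boxes,
        "review_mode": review_mode,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="PATCH",
    )


def update_extraction_settings(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


req = build_settings_request("my-project", "TABLE_ID", "YOUR_API_KEY", "cloudsquid-pro-v3")
print(req.get_method(), req.full_url)
# update_extraction_settings(req)  # sends the PATCH; needs real IDs and a valid key
```

To apply the change, call update_extraction_settings(req) once the placeholders are replaced with real values.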
