Bulk Download

Download historical data in bulk as Parquet files

The Bulk Download API allows you to export large volumes of historical data as Parquet files. This is ideal for data analysis workflows, backfilling local databases, or integrating with data processing tools such as pandas and Spark.

Overview

The bulk download process is asynchronous and consists of two steps:

  1. Submit a job - Request data for a specific vertical and date range
  2. Poll for completion - Check job status until files are ready, then download via pre-signed URLs

⚠️ Important: Pre-signed URLs expire after 1 hour (3600 seconds). Download your files promptly after the job completes.

Submitting a Download Job

Submit a bulk download job by specifying the vertical and date range.

POST /bulk-download/job-submit

| Parameter | Type | Description |
| --- | --- | --- |
| startDate | string | Start month in YYYY-MM format (e.g., 2024-01) |
| endDate | string | End month in YYYY-MM format (e.g., 2024-12) |
| vertical | string | Product vertical (see valid values below) |

Valid verticals: JET, NAPHTHA, ULSD, CRUDE, GASOLINE, FUEL_OIL


📘 Example:

POST /bulk-download/job-submit?startDate=2024-01&endDate=2024-12&vertical=CRUDE

Returns 202 Accepted on success:

{
    "job_id": "eyJxIjpudWxsLCJ1IjoyOTM3LC...",
    "date_range": {
        "effective_start": "2024-01",
        "effective_end": "2024-12"
    }
}
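
As a minimal sketch of the submit step in Python using requests (the BASE_URL host and the Authorization header value are placeholders, not values documented on this page):

    import requests

    BASE_URL = "https://api.example.com"  # placeholder: substitute your API host
    HEADERS = {"Authorization": "Bearer <your-jwt-token>"}  # placeholder JWT

    # Submit a bulk download job for CRUDE covering all of 2024
    resp = requests.post(
        f"{BASE_URL}/bulk-download/job-submit",
        params={"startDate": "2024-01", "endDate": "2024-12", "vertical": "CRUDE"},
        headers=HEADERS,
    )
    resp.raise_for_status()  # surfaces 400/403 responses as exceptions

    job = resp.json()
    job_id = job["job_id"]
    print("Submitted job:", job_id)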

⚠️ Date Range Adjustments: The API may adjust your requested date range based on your access permissions. Compare effective_start and effective_end in the response with your requested dates to verify the actual range being processed.
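
Continuing the sketch above, a quick check of the granted range against what was requested:

    # Compare the requested range with what the API actually granted
    requested_start, requested_end = "2024-01", "2024-12"
    granted = job["date_range"]
    if (granted["effective_start"], granted["effective_end"]) != (requested_start, requested_end):
        print("Range adjusted:", granted["effective_start"], "to", granted["effective_end"])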


Checking Job Status

Poll the job status endpoint until the job completes.

GET /bulk-download/job-status/{job_id}

| Status | Description |
| --- | --- |
| RUNNING | Job is still processing |
| SUCCEEDED | Job completed - pre-signed URLs are available |
| FAILED | Job failed - check error details |

📘 Example:

GET /bulk-download/job-status/{job_id}

{
    "status": "SUCCEEDED",
    "presigned_urls": [
        {
            "url": "https://s3.amazonaws.com/bucket/...",
            "path": "vertical=Gasoline/year=2024/month=11",
            "size": 17530971
        },
        {
            "url": "https://s3.amazonaws.com/bucket/...",
            "path": "vertical=Gasoline/year=2024/month=12",
            "size": 18232390
        }
    ]
}
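
Continuing the sketch from the submit step (BASE_URL, HEADERS, and job_id as defined there), a minimal polling loop followed by a prompt download; the 30-second interval is an arbitrary choice:

    import time

    import requests

    # Poll until the job leaves the RUNNING state
    while True:
        resp = requests.get(f"{BASE_URL}/bulk-download/job-status/{job_id}", headers=HEADERS)
        resp.raise_for_status()
        result = resp.json()
        if result["status"] != "RUNNING":
            break
        time.sleep(30)  # arbitrary poll interval

    if result["status"] == "FAILED":
        raise RuntimeError("Bulk download job failed; retry or contact support")

    # Download promptly: pre-signed URLs expire after 1 hour
    for entry in result["presigned_urls"]:
        # e.g. "vertical=Gasoline/year=2024/month=11" -> "vertical=Gasoline_year=2024_month=11.parquet"
        filename = entry["path"].replace("/", "_") + ".parquet"
        with requests.get(entry["url"], stream=True) as dl:
            dl.raise_for_status()
            with open(filename, "wb") as f:
                for chunk in dl.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
        print("Saved", filename, f"({entry['size']} bytes)")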

Data Format

Files are exported in Apache Parquet format, partitioned by:

  • vertical - Product vertical (e.g., Gasoline, Crude, Jet)
  • year - Four-digit year
  • month - Two-digit month

Each Parquet file contains the following columns (a short loading sketch follows the list):

  • id - Unique identifier for the record
  • generated_on - Timestamp when the data was generated
  • tenor_name - Tenor period name (e.g., "Apr 25")
  • price - Price value
  • meta_data - JSON object with additional metadata
  • dependencies - JSON array of dependency records
  • time_snapshot - Snapshot type (e.g., "london_snapshot")
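
For illustration, loading one of these files with pandas; the filename here is hypothetical, and decoding the JSON columns assumes they arrive as strings:

    import json

    import pandas as pd

    # Hypothetical local file saved by the download loop above
    df = pd.read_parquet("vertical=Gasoline_year=2024_month=11.parquet")

    # meta_data and dependencies hold JSON; decode them if they arrive as strings
    for col in ("meta_data", "dependencies"):
        if not df.empty and isinstance(df[col].iloc[0], str):
            df[col] = df[col].map(json.loads)

    print(df[["id", "tenor_name", "price", "time_snapshot"]].head())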

Python Recipes

We provide ready-to-use Python recipes for common bulk download workflows. These require requests, pandas, and pyarrow:

pip install requests pandas pyarrow

See our recipes for complete, copy-paste examples:

| Recipe | Description |
| --- | --- |
| Complete Bulk Download Workflow | Submit a job, poll for completion, and load data into a pandas DataFrame |
| Download to Local Files | Save Parquet files locally with meaningful filenames |
| Batch Processing with PyArrow | Process large datasets incrementally without loading everything into memory |
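
The recipes themselves are separate pages; as a rough sketch of the batch-processing pattern (filename hypothetical), pyarrow can stream record batches instead of materializing a whole file:

    import pyarrow.parquet as pq

    # Hypothetical local file; stream record batches to bound memory use
    pf = pq.ParquetFile("vertical=Gasoline_year=2024_month=11.parquet")
    for batch in pf.iter_batches(batch_size=10_000, columns=["tenor_name", "price"]):
        # Each batch is a pyarrow.RecordBatch; aggregate or transform it here
        print("batch of", batch.num_rows, "rows")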

Error Handling

Common error scenarios and how to handle them:

| Error | Cause | Solution |
| --- | --- | --- |
| 403 Forbidden | Invalid or expired token | Refresh your JWT token |
| 400 Bad Request | Invalid parameters | Check date format (YYYY-MM) and vertical name |
| Job FAILED status | Server-side processing error | Retry the job or contact support |
| URL expired | Pre-signed URL past 1-hour expiry | Re-run the job to get fresh URLs |
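
As a sketch of how the first two cases might be surfaced in code; the submit_job helper is hypothetical, and the messages simply restate the table above:

    import requests

    def submit_job(session: requests.Session, base_url: str, params: dict) -> dict:
        """Submit a bulk download job, surfacing the common failure modes."""
        resp = session.post(f"{base_url}/bulk-download/job-submit", params=params)
        if resp.status_code == 403:
            raise PermissionError("403 Forbidden: refresh your JWT token")
        if resp.status_code == 400:
            raise ValueError("400 Bad Request: check YYYY-MM dates and vertical name")
        resp.raise_for_status()
        return resp.json()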