Bulk Download

Download historical data in bulk as Parquet files

The Bulk Download API allows you to export large volumes of historical data as Parquet files. This is ideal for data analysis workflows, backfilling local databases, or integrating with data processing tools such as pandas and Spark.

Overview

The bulk download process is asynchronous and consists of two steps:

  1. Submit a job - Request data for a specific vertical and date range
  2. Poll for completion - Check job status until files are ready, then download via pre-signed URLs

⚠️ Important: Pre-signed URLs expire after 1 hour (3600 seconds). Download your files promptly after the job completes.

Submitting a Download Job

Submit a bulk download job by specifying the vertical and date range.

POST /bulk-download/job-submit

| Parameter | Type | Description |
| --- | --- | --- |
| startDate | string | Start month in YYYY-MM format (e.g., 2024-01) |
| endDate | string | End month in YYYY-MM format (e.g., 2024-12) |
| vertical | string | Product vertical (see valid values below) |

Valid verticals: JET, NAPHTHA, ULSD, CRUDE, GASOLINE, FUEL_OIL


📘 Example:

POST /bulk-download/job-submit?startDate=2024-01&endDate=2024-12&vertical=CRUDE

Returns 202 Accepted on success:

{
    "job_id": "eyJxIjpudWxsLCJ1IjoyOTM3LC...",
    "date_range": {
        "effective_start": "2024-01",
        "effective_end": "2024-12"
    }
}
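
As a minimal sketch of the submit step in Python using requests (the BASE_URL host and the Authorization header value are placeholders, not values documented on this page):

    import requests

    BASE_URL = "https://api.example.com"  # placeholder: substitute your API host
    HEADERS = {"Authorization": "Bearer <your-jwt-token>"}  # placeholder JWT

    # Submit a bulk download job for CRUDE covering all of 2024
    resp = requests.post(
        f"{BASE_URL}/bulk-download/job-submit",
        params={"startDate": "2024-01", "endDate": "2024-12", "vertical": "CRUDE"},
        headers=HEADERS,
    )
    resp.raise_for_status()  # surfaces 400/403 responses as exceptions

    job = resp.json()
    job_id = job["job_id"]
    print("Submitted job:", job_id)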

⚠️ Date Range Adjustments: The API may adjust your requested date range based on your access permissions. Compare effective_start and effective_end in the response with your requested dates to verify the actual range being processed.
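
Continuing the sketch above, a quick check of the granted range against what was requested:

    # Compare the requested range with what the API actually granted
    requested_start, requested_end = "2024-01", "2024-12"
    granted = job["date_range"]
    if (granted["effective_start"], granted["effective_end"]) != (requested_start, requested_end):
        print("Range adjusted:", granted["effective_start"], "to", granted["effective_end"])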


Checking Job Status

Poll the job status endpoint until the job completes.

GET /bulk-download/job-status/{job_id}

| Status | Description |
| --- | --- |
| RUNNING | Job is still processing |
| SUCCEEDED | Job completed - pre-signed URLs are available |
| FAILED | Job failed - check error details |

📘 Example:

GET /bulk-download/job-status/{job_id}

{
    "status": "SUCCEEDED",
    "presigned_urls": [
        {
            "url": "https://s3.amazonaws.com/bucket/...",
            "path": "vertical=Gasoline/year=2024/month=11",
            "size": 17530971
        },
        {
            "url": "https://s3.amazonaws.com/bucket/...",
            "path": "vertical=Gasoline/year=2024/month=12",
            "size": 18232390
        }
    ]
}
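
Continuing the sketch from the submit step (BASE_URL, HEADERS, and job_id as defined there), a minimal polling loop followed by a prompt download; the 30-second interval is an arbitrary choice:

    import time

    import requests

    # Poll until the job leaves the RUNNING state
    while True:
        resp = requests.get(f"{BASE_URL}/bulk-download/job-status/{job_id}", headers=HEADERS)
        resp.raise_for_status()
        result = resp.json()
        if result["status"] != "RUNNING":
            break
        time.sleep(30)  # arbitrary poll interval

    if result["status"] == "FAILED":
        raise RuntimeError("Bulk download job failed; retry or contact support")

    # Download promptly: pre-signed URLs expire after 1 hour
    for entry in result["presigned_urls"]:
        # e.g. "vertical=Gasoline/year=2024/month=11" -> "vertical=Gasoline_year=2024_month=11.parquet"
        filename = entry["path"].replace("/", "_") + ".parquet"
        with requests.get(entry["url"], stream=True) as dl:
            dl.raise_for_status()
            with open(filename, "wb") as f:
                for chunk in dl.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
        print("Saved", filename, f"({entry['size']} bytes)")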

Data Format

Files are exported in Apache Parquet format, partitioned by:

  • vertical - Product vertical (e.g., Gasoline, Crude, Jet)
  • year - Four-digit year
  • month - Two-digit month

Each Parquet file contains the following columns (a short loading sketch follows the list):

  • id - Unique identifier for the record
  • generated_on - Timestamp when the data was generated
  • tenor_name - Tenor period name (e.g., "Apr 25")
  • price - Price value
  • meta_data - JSON object with additional metadata
  • dependencies - JSON array of dependency records
  • time_snapshot - Snapshot type (e.g., "london_snapshot")
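
For illustration, loading one of these files with pandas; the filename here is hypothetical, and decoding the JSON columns assumes they arrive as strings:

    import json

    import pandas as pd

    # Hypothetical local file saved by the download loop above
    df = pd.read_parquet("vertical=Gasoline_year=2024_month=11.parquet")

    # meta_data and dependencies hold JSON; decode them if they arrive as strings
    for col in ("meta_data", "dependencies"):
        if not df.empty and isinstance(df[col].iloc[0], str):
            df[col] = df[col].map(json.loads)

    print(df[["id", "tenor_name", "price", "time_snapshot"]].head())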

Python Recipes

We provide ready-to-use Python recipes for common bulk download workflows. These require requests, pandas, and pyarrow:

pip install requests pandas pyarrow

See our recipes for complete, copy-paste examples:

| Recipe | Description |
| --- | --- |
| Complete Bulk Download Workflow | Submit a job, poll for completion, and load data into a pandas DataFrame |
| Download to Local Files | Save Parquet files locally with meaningful filenames |
| Batch Processing with PyArrow | Process large datasets incrementally without loading everything into memory |
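
The recipes themselves are separate pages; as a rough sketch of the batch-processing pattern (filename hypothetical), pyarrow can stream record batches instead of materializing a whole file:

    import pyarrow.parquet as pq

    # Hypothetical local file; stream record batches to bound memory use
    pf = pq.ParquetFile("vertical=Gasoline_year=2024_month=11.parquet")
    for batch in pf.iter_batches(batch_size=10_000, columns=["tenor_name", "price"]):
        # Each batch is a pyarrow.RecordBatch; aggregate or transform it here
        print("batch of", batch.num_rows, "rows")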

Error Handling

Common error scenarios and how to handle them:

| Error | Cause | Solution |
| --- | --- | --- |
| 403 Forbidden | Invalid or expired token | Refresh your JWT token |
| 400 Bad Request | Invalid parameters | Check date format (YYYY-MM) and vertical name |
| Job FAILED status | Server-side processing error | Retry the job or contact support |
| URL expired | Pre-signed URL past 1-hour expiry | Re-run the job to get fresh URLs |
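
As a sketch of how the first two cases might be surfaced in code; the submit_job helper is hypothetical, and the messages simply restate the table above:

    import requests

    def submit_job(session: requests.Session, base_url: str, params: dict) -> dict:
        """Submit a bulk download job, surfacing the common failure modes."""
        resp = session.post(f"{base_url}/bulk-download/job-submit", params=params)
        if resp.status_code == 403:
            raise PermissionError("403 Forbidden: refresh your JWT token")
        if resp.status_code == 400:
            raise ValueError("400 Bad Request: check YYYY-MM dates and vertical name")
        resp.raise_for_status()
        return resp.json()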