Bulk Download
Download historical data in bulk as Parquet files
The Bulk Download API allows you to export large volumes of historical data as Parquet files. This is ideal for data analysis workflows, backfilling local databases, or integrating with tools like pandas, Spark, or other data processing frameworks.
Overview
The bulk download process is asynchronous and consists of two steps:
- Submit a job - Request data for a specific vertical and date range
- Poll for completion - Check job status until files are ready, then download via pre-signed URLs
Important: Pre-signed URLs expire after 1 hour (3600 seconds). Download your files promptly after the job completes.
Submitting a Download Job
Submit a bulk download job by specifying the vertical and date range.
POST /bulk-download/job-submit
| Parameter | Type | Description |
|---|---|---|
| startDate | string | Start month in YYYY-MM format (e.g., 2024-01) |
| endDate | string | End month in YYYY-MM format (e.g., 2024-12) |
| vertical | string | Product vertical (see valid values below) |
Valid verticals: JET, NAPHTHA, ULSD, CRUDE, GASOLINE, FUEL_OIL
Example:

```
POST /bulk-download/job-submit?startDate=2024-01&endDate=2024-12&vertical=CRUDE
```

Returns 202 Accepted on success:

```json
{
  "job_id": "eyJxIjpudWxsLCJ1IjoyOTM3LC...",
  "date_range": {
    "effective_start": "2024-01",
    "effective_end": "2024-12"
  }
}
```
Date Range Adjustments: The API may adjust your requested date range based on your access permissions. Compare effective_start and effective_end in the response with your requested dates to verify the actual range being processed.
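As a minimal sketch, job submission might look like the following in Python. The base URL and bearer-token auth scheme here are assumptions; substitute your actual endpoint and credentials:

```python
import requests

BASE_URL = "https://api.example.com"  # assumed base URL; replace with your endpoint
TOKEN = "YOUR_JWT_TOKEN"              # assumed bearer-token auth

# Submit a bulk download job for all CRUDE data in 2024.
resp = requests.post(
    f"{BASE_URL}/bulk-download/job-submit",
    params={"startDate": "2024-01", "endDate": "2024-12", "vertical": "CRUDE"},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()  # 202 Accepted passes; 4xx/5xx raises

job = resp.json()
job_id = job["job_id"]

# Verify the effective range in case permissions narrowed the request.
print(job["date_range"]["effective_start"], job["date_range"]["effective_end"])
```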
Checking Job Status
Poll the job status endpoint until the job completes.
GET /bulk-download/job-status/{job_id}
| Status | Description |
|---|---|
| RUNNING | Job is still processing |
| SUCCEEDED | Job completed - pre-signed URLs are available |
| FAILED | Job failed - check error details |
Example:

```
GET /bulk-download/job-status/{job_id}
```

```json
{
  "status": "SUCCEEDED",
  "presigned_urls": [
    {
      "url": "https://s3.amazonaws.com/bucket/...",
      "path": "vertical=Gasoline/year=2024/month=11",
      "size": 17530971
    },
    {
      "url": "https://s3.amazonaws.com/bucket/...",
      "path": "vertical=Gasoline/year=2024/month=12",
      "size": 18232390
    }
  ]
}
```
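A simple polling loop, continuing the sketch above (it reuses BASE_URL, TOKEN, and job_id; the 10-second interval and 30-minute timeout are arbitrary choices, not API requirements):

```python
import time

import requests

# Poll until the job leaves the RUNNING state.
deadline = time.monotonic() + 30 * 60  # give up after 30 minutes
while True:
    resp = requests.get(
        f"{BASE_URL}/bulk-download/job-status/{job_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    status = resp.json()

    if status["status"] == "SUCCEEDED":
        urls = status["presigned_urls"]
        break
    if status["status"] == "FAILED":
        raise RuntimeError(f"Bulk download job failed: {status}")
    if time.monotonic() > deadline:
        raise TimeoutError("Job did not complete within 30 minutes")

    time.sleep(10)
```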
Data Format
Files are exported in Apache Parquet format, partitioned by:
- vertical - Product vertical (e.g., Gasoline, Crude, Jet)
- year - Four-digit year
- month - Two-digit month
Each parquet file contains the following columns:
- id - Unique identifier for the record
- generated_on - Timestamp when the data was generated
- tenor_name - Tenor period name (e.g., "Apr 25")
- price - Price value
- meta_data - JSON object with additional metadata
- dependencies - JSON array of dependency records
- time_snapshot - Snapshot type (e.g., "london_snapshot")
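Once downloaded, a file can be loaded with pandas (backed by pyarrow). A small sketch follows; the filename is hypothetical, and decoding the JSON columns assumes they arrive as strings:

```python
import json

import pandas as pd

# Load one downloaded partition (hypothetical filename).
df = pd.read_parquet("vertical=Gasoline_year=2024_month=11.parquet")

# meta_data and dependencies hold JSON; if they arrive as strings,
# decode them into Python objects for easier inspection.
df["meta_data"] = df["meta_data"].apply(
    lambda v: json.loads(v) if isinstance(v, str) else v
)

print(df[["id", "tenor_name", "price", "time_snapshot"]].head())
```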
Python Recipes
We provide ready-to-use Python recipes for common bulk download workflows. These require requests, pandas, and pyarrow:
```
pip install requests pandas pyarrow
```

See our recipes for complete, copy-paste examples:
| Recipe | Description |
|---|---|
| Complete Bulk Download Workflow | Submit a job, poll for completion, and load data into a pandas DataFrame |
| Download to Local Files | Save parquet files locally with meaningful filenames |
| Batch Processing with PyArrow | Process large datasets incrementally without loading everything into memory |
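As a rough sketch of the download step itself (continuing the earlier snippets, which produced a urls list): deriving a filename from the path field is just one convenient convention, and note that pre-signed URLs are typically self-authenticating, so no Authorization header is needed for the download.

```python
# Download each pre-signed URL to a local Parquet file, naming files
# after the partition path (e.g., vertical=Gasoline/year=2024/month=11).
for item in urls:
    filename = item["path"].replace("/", "_") + ".parquet"
    with requests.get(item["url"], stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                f.write(chunk)
    print(f"Saved {filename} ({item['size']} bytes expected)")
```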
Error Handling
Common error scenarios and how to handle them:
| Error | Cause | Solution |
|---|---|---|
| 403 Forbidden | Invalid or expired token | Refresh your JWT token |
| 400 Bad Request | Invalid parameters | Check date format (YYYY-MM) and vertical name |
| Job FAILED status | Server-side processing error | Retry the job or contact support |
| URL expired | Presigned URL past 1-hour expiry | Re-run the job to get fresh URLs |
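One way these cases might be handled in client code is sketched below; the refresh_token helper is hypothetical, so adapt it to your actual auth flow:

```python
def submit_with_retry(params, max_attempts=3):
    """Submit a job, refreshing the token on 403 and retrying."""
    global TOKEN
    for attempt in range(1, max_attempts + 1):
        resp = requests.post(
            f"{BASE_URL}/bulk-download/job-submit",
            params=params,
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
        if resp.status_code == 403:
            TOKEN = refresh_token()  # hypothetical helper: obtain a fresh JWT
            continue
        if resp.status_code == 400:
            # Bad parameters won't fix themselves; surface the error immediately.
            raise ValueError(f"Invalid request: {resp.text}")
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError(f"Job submission failed after {max_attempts} attempts")
```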