Partitions, Filenames and Filepaths

Partitions

Partitioning organizes data into directories based on specific fields. Queries that filter on those fields only need to scan the matching directories, which reduces the amount of data read and makes reads faster. By default, Pipelines partitions data by event date, with an hourly subdirectory inside each date as the example below shows. Custom partitioning will be supported in the future.

For example, the output from a Pipeline in your R2 bucket might look like this:

- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
- event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
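
Because the directory names follow a key=value pattern, the partition fields can be recovered from an object key with plain string handling. The following is a minimal sketch in Python, not part of Pipelines itself; the function name parse_partitions is hypothetical, and it assumes every partition directory has the form name=value as in the example above.

```python
from typing import Dict


def parse_partitions(key: str) -> Dict[str, str]:
    """Collect name=value directory segments from an object key.

    Hypothetical helper for illustration; assumes partition directories
    are written as name=value, as in the example output above.
    """
    partitions: Dict[str, str] = {}
    # Every segment except the last is a directory; the last is the filename.
    for segment in key.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep:  # ignore directories that are not name=value pairs
            partitions[name] = value
    return partitions


key = "event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz"
print(parse_partitions(key))  # {'event_date': '2024-09-06', 'hr': '15'}
```

A reader that only needs one day or one hour can use these fields to skip every object whose partition values do not match, without opening the file.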

Filepath

Customizing the filepath allows you to store data with a specific prefix inside your specified R2 bucket. The data will remain partitioned by date.

To modify the prefix for a Pipeline using Wrangler:

wrangler pipelines update <pipeline-name> --filepath "test"

All output records generated by your pipeline will then be stored under the prefix “test”, and the object keys will look like this:

- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
- test/event_date=2024-09-06/hr=15/37db9289-15ba-4e8b-9231-538dc7c72c1e-15.json.gz
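
When reading the data back, the same layout can be used to build a key prefix that limits an object listing to a single date or hour. The sketch below is illustrative only: partition_prefix is a hypothetical helper, and the two-digit zero padding of the hour is an assumption, since the example above only shows hr=15.

```python
from datetime import date
from typing import Optional


def partition_prefix(filepath: str, day: date, hour: Optional[int] = None) -> str:
    """Build the key prefix for one date, or one hour, of pipeline output.

    Hypothetical helper for illustration. `filepath` is the custom prefix
    configured on the pipeline; pass "" if none was set.
    """
    parts = []
    if filepath:
        parts.append(filepath.strip("/"))
    parts.append(f"event_date={day.isoformat()}")
    if hour is not None:
        # Assumption: hours are written as two digits (for example hr=05).
        parts.append(f"hr={hour:02d}")
    return "/".join(parts) + "/"


print(partition_prefix("test", date(2024, 9, 6), 15))  # test/event_date=2024-09-06/hr=15/
print(partition_prefix("", date(2024, 9, 6)))          # event_date=2024-09-06/
```

The resulting string can be passed as the prefix of an object-listing request against the bucket, so that only the keys for that partition are returned.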