Writes events to a URI using hive partitioning.
```tql
to_hive uri:string, partition_by=list<field>, format=string, [timeout=duration, max_size=int, compression=string]
```

## Description

Hive partitioning is a partitioning scheme where a set of fields is used to
partition events. For each combination of these fields, a directory is derived
under which all events with the same field values will be stored. For example,
if the events are partitioned by the fields `year` and `month`, then the files
in the directory `/year=2024/month=10` will contain all events where
`year == 2024 and month == 10`.
Files within each partition directory are named using UUIDv7 for guaranteed uniqueness and natural time-based ordering. This prevents filename conflicts when multiple processes write to the same partition simultaneously.
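For instance, a minimal sketch of this layout, assuming events that already carry `year` and `month` fields and a hypothetical `/tmp/events/` output path:

```tql
from {year: 2024, month: 10, msg: "first"}, {year: 2024, month: 11, msg: "second"}
to_hive "/tmp/events/", partition_by=[year, month], format="json"

// -> /tmp/events/year=2024/month=10/<uuid>.json contains {"msg": "first"}
// -> /tmp/events/year=2024/month=11/<uuid>.json contains {"msg": "second"}
```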
### uri: string

The base URI for all partitions.
### partition_by = list<field>

A list of fields that will be used for partitioning. Note that these fields are
elided from the output, as their values are already encoded in the partition
path.
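To illustrate the elision, a minimal sketch (the `/tmp/out/` path and the fields `a` and `b` are placeholders):

```tql
from {a: 1, b: 2}
to_hive "/tmp/out/", partition_by=[a], format="json"

// The written event is {"b": 2}; the value of `a` appears only in the
// partition path /tmp/out/a=1/<uuid>.json.
```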
### format = string

The name of the format that will be used for writing, for example `json` or
`parquet`. This will also be used for the file extension.
### timeout = duration (optional)

The time after which a new file will be opened for the same partition group.

Defaults to `5min`.
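For example, a sketch that rotates files after 10 minutes instead of the default (the path and field are placeholders):

```tql
to_hive "/tmp/out/", partition_by=[a], format="json", timeout=10min
```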
### max_size = int (optional)

The total file size after which a new file will be opened for the same partition
group. Note that files will typically be slightly larger than this limit,
because a new file is opened only after the limit has been exceeded.

Defaults to `100M`.
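Both rotation limits can be set together; a minimal sketch with placeholder path, field, and values:

```tql
to_hive "/tmp/out/", partition_by=[a], format="json", timeout=15min, max_size=500M
```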
### compression = string (optional)

Compress the output files with the given compression algorithm. See the docs for
the `compress` operator for supported compression algorithms.
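For instance, assuming `zstd` is among the algorithms listed in the `compress` docs, a sketch:

```tql
to_hive "/tmp/out/", partition_by=[a], format="json", compression="zstd"
```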
## Examples

### Partition by a single field into local JSON files

```tql
from {a: 0, b: 0}, {a: 0, b: 1}, {a: 1, b: 2}
to_hive "/tmp/out/", partition_by=[a], format="json"

// This pipeline produces two files:
// -> /tmp/out/a=0/<uuid>.json:
// {"b": 0}
// {"b": 1}
// -> /tmp/out/a=1/<uuid>.json:
// {"b": 2}
```

### Write a Parquet file into Azure Blob Store
Write as Parquet into the Azure Blob Filesystem, partitioned by year, month, and day.

```tql
to_hive "abfs://domain/bucket", partition_by=[year, month, day], format="parquet"

// -> abfs://domain/bucket/year=<year>/month=<month>/day=<day>/<uuid>.parquet
```

### Write partitioned JSON into an S3 bucket
Write JSON into S3, partitioned by year and month, opening a new file after 1 GB.

```tql
year = ts.year()
month = ts.month()
to_hive "s3://my-bucket/some/subdirectory", partition_by=[year, month], format="json", max_size=1G

// -> s3://my-bucket/some/subdirectory/year=<year>/month=<month>/<uuid>.json
```