Writes events to one or more blobs in Azure Blob Storage.

to_azure_blob_storage url:string, [account_key=string, max_size=int,
timeout=duration, partition_by=list<field>] {}

The to_azure_blob_storage operator writes events to Azure Blob Storage, automatically opening new blobs when a rotation condition triggers. It supports hive-style partitioning through a ** placeholder in the URL and per-partition unique blob names through a {uuid} placeholder.

By default, authentication is handled by the Azure SDK's credential chain, which may read from multiple environment variables, such as:

  • AZURE_TENANT_ID
  • AZURE_CLIENT_ID
  • AZURE_CLIENT_SECRET
  • AZURE_AUTHORITY_HOST
  • AZURE_CLIENT_CERTIFICATE_PATH
  • AZURE_FEDERATED_TOKEN_FILE

url: string

URL identifying the Azure Blob Storage location where data should be written.

Supported URI formats:

  1. abfs[s]://<account>.blob.core.windows.net[/<container>[/<path>]]
  2. abfs[s]://<container>@<account>.dfs.core.windows.net[/<path>]
  3. abfs[s]://[<account>@]<host>[.<domain>][:<port>][/<container>[/<path>]]
  4. abfs[s]://[<account>@]<container>[/<path>]

Formats (1) and (2) are compatible with Azure Data Lake Storage Gen2 URIs, format (3) targets Azure Blob Storage-compatible services such as Azurite, and format (4) is shorthand for (1) and (2).
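
For instance, format (3) lets you write to a local Azurite emulator. A minimal sketch, assuming Azurite listens on its default blob port 10000 and using its well-known development account and key:

to_azure_blob_storage "abfs://devstoreaccount1@localhost:10000/my-container/out_{uuid}.json",
  account_key="Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==" {
  write_ndjson
}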

The path portion may contain two placeholders:

  • ** marks the location where hive partition segments are inserted. When present, partition_by must also be set, and vice versa.
  • {uuid} expands to a fresh UUIDv7 for every blob. This is required when partitioning or when rotation can produce multiple blobs, so that rotated or per-partition blobs do not overwrite each other.
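
To illustrate how the two placeholders interact, here is a sketch that partitions by a hypothetical env field. The ** expands to env=<value> and {uuid} expands to a fresh UUIDv7 per blob:

// Writes blobs such as events/env=prod/part-<uuid>.ndjson,
// one per distinct value of the hypothetical `env` field.
to_azure_blob_storage "abfs://my-container/events/**/part-{uuid}.ndjson",
  partition_by=[env] {
  write_ndjson
}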

account_key = string

Account key for authenticating with Azure Blob Storage. If unset, the Azure SDK's credential chain described above is used.

max_size = int

Rotates to a new blob after the current blob exceeds this size in bytes. Because rotation only fires after the threshold is crossed, individual blobs may be slightly larger than max_size.

Defaults to 100M.
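
As a sketch, the following rotates to a new blob after roughly 250 MB, assuming the same size-suffix notation as the 100M default. The {uuid} placeholder keeps rotated blobs from overwriting each other:

to_azure_blob_storage "abfs://my-container/logs/chunk_{uuid}.json",
  max_size=250M {
  write_ndjson
}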

timeout = duration

Rotates to a new blob after the current blob has been open for this duration. Rotation is measured from the time the blob was first opened.

Defaults to 5min.

partition_by = list<field>

A list of fields used to partition events into separate blobs. For every distinct combination of partition-field values, a separate blob (or group of rotated blobs) is written. The URL must contain a ** placeholder, which is replaced by the hive-style path field1=value1/field2=value2/….

Unlike to_hive, the partitioning fields are not stripped from the written events; they remain in each record, as the sketch below shows.
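
If you want to_hive-like output where the partition values appear only in the path, one option is to drop those fields inside the subpipeline before writing. A sketch, reusing the year/month partitioning from the example further below:

to_azure_blob_storage "abfs://my-container/events/**/data_{uuid}.parquet",
  partition_by=[year, month] {
  // Remove the partition fields from each record; their values are
  // already encoded in the hive-style path.
  drop year, month
  write_parquet
}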

The trailing pipeline transforms the incoming events into the byte stream that is written to each blob. It must return bytes, so it must end with a writer such as write_ndjson, write_parquet, or write_csv, optionally followed by a compressor such as compress_gzip.

The subpipeline runs once per output blob. When rotation or partitioning produces a new blob, a new instance of the subpipeline is spawned for it.
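
For example, to write gzip-compressed NDJSON, append compress_gzip after the writer; a minimal sketch:

to_azure_blob_storage "abfs://my-container/events/out_{uuid}.ndjson.gz" {
  write_ndjson
  compress_gzip
}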

to_azure_blob_storage "abfs://my-container/events/out_{uuid}.json" {
write_ndjson
}

Partition events into Parquet blobs by date

to_azure_blob_storage "abfs://my-container/events/**/data_{uuid}.parquet",
partition_by=[year, month] {
write_parquet
}
to_azure_blob_storage "abfs://container/data/out_{uuid}.json",
account_key="your-account-key" {
write_ndjson
}
to_azure_blob_storage "abfs://my-container/logs/events_{uuid}.json",
timeout=5min {
write_ndjson
}
