Writes events to one or more blobs in Azure Blob Storage.

to_azure_blob_storage url:string, [account_key=string, max_size=int,
timeout=duration, partition_by=list<field>] {}

The to_azure_blob_storage operator writes events to Azure Blob Storage, automatically opening new blobs when a rotation condition triggers. It supports hive-style partitioning through a ** placeholder in the URL and per-partition unique blob names through a {uuid} placeholder.

By default, authentication is handled by the Azure SDK's credential chain, which may read from multiple environment variables, such as:

  • AZURE_TENANT_ID
  • AZURE_CLIENT_ID
  • AZURE_CLIENT_SECRET
  • AZURE_AUTHORITY_HOST
  • AZURE_CLIENT_CERTIFICATE_PATH
  • AZURE_FEDERATED_TOKEN_FILE

url: string

URL identifying the Azure Blob Storage location where data should be written.

Supported URI formats:

  1. abfs[s]://<account>.blob.core.windows.net[/<container>[/<path>]]
  2. abfs[s]://<container>@<account>.dfs.core.windows.net[/<path>]
  3. abfs[s]://[<account>@]<host>[.<domain>][:<port>][/<container>[/<path>]]
  4. abfs[s]://[<account>@]<container>[/<path>]

Formats (1) and (2) are compatible with Azure Data Lake Storage Gen2 URIs, format (3) targets Azure Blob Storage-compatible services such as Azurite, and format (4) is shorthand for (1) and (2).
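
For instance, format (3) lets you write to a local Azurite emulator. A minimal sketch, assuming Azurite listens on its default blob port 10000 and using its well-known development account and key:

to_azure_blob_storage "abfs://devstoreaccount1@localhost:10000/my-container/out_{uuid}.json",
  account_key="Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==" {
  write_ndjson
}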

The path portion may contain two placeholders:

  • ** marks the location where hive partition segments are inserted. When present, partition_by must also be set, and vice versa.
  • {uuid} expands to a fresh UUIDv7 for every blob. This is required when partitioning or when rotation can produce multiple blobs, so that rotated or per-partition blobs do not overwrite each other.
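
To illustrate how the two placeholders interact, here is a sketch that partitions by a hypothetical env field. The ** expands to env=<value> and {uuid} expands to a fresh UUIDv7 per blob:

// Writes blobs such as events/env=prod/part-<uuid>.ndjson,
// one per distinct value of the hypothetical `env` field.
to_azure_blob_storage "abfs://my-container/events/**/part-{uuid}.ndjson",
  partition_by=[env] {
  write_ndjson
}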

account_key = string

Account key for authenticating with Azure Blob Storage. If unset, the Azure SDK's credential chain described above is used.

max_size = int

Rotates to a new blob after the current blob exceeds this size in bytes. Because rotation only fires after the threshold is crossed, individual blobs may be slightly larger than max_size.

Defaults to 100M.
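
As a sketch, the following rotates to a new blob after roughly 250 MB, assuming the same size-suffix notation as the 100M default. The {uuid} placeholder keeps rotated blobs from overwriting each other:

to_azure_blob_storage "abfs://my-container/logs/chunk_{uuid}.json",
  max_size=250M {
  write_ndjson
}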

timeout = duration

Rotates to a new blob after the current blob has been open for this duration. Rotation is measured from the time the blob was first opened.

Defaults to 5min.

partition_by = list<field>

A list of fields used to partition events into separate blobs. For every distinct combination of partition-field values, a separate blob (or group of rotated blobs) is written. The URL must contain a ** placeholder, which is replaced by the hive-style path field1=value1/field2=value2/….

Unlike to_hive, the partitioning fields are not stripped from the written events; they remain in each record, as the sketch below shows.
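
If you want to_hive-like output where the partition values appear only in the path, one option is to drop those fields inside the subpipeline before writing. A sketch, reusing the year/month partitioning from the example further below:

to_azure_blob_storage "abfs://my-container/events/**/data_{uuid}.parquet",
  partition_by=[year, month] {
  // Remove the partition fields from each record; their values are
  // already encoded in the hive-style path.
  drop year, month
  write_parquet
}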

The trailing pipeline transforms the incoming events into the byte stream that is written to each blob. It must return bytes, so it must end with a writer such as write_ndjson, write_parquet, or write_csv, optionally followed by a compressor such as compress_gzip.

The subpipeline runs once per output blob. When rotation or partitioning produces a new blob, a new instance of the subpipeline is spawned for it.
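
For example, to write gzip-compressed NDJSON, append compress_gzip after the writer; a minimal sketch:

to_azure_blob_storage "abfs://my-container/events/out_{uuid}.ndjson.gz" {
  write_ndjson
  compress_gzip
}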

to_azure_blob_storage "abfs://my-container/events/out_{uuid}.json" {
write_ndjson
}

Partition events into Parquet blobs by date

to_azure_blob_storage "abfs://my-container/events/**/data_{uuid}.parquet",
partition_by=[year, month] {
write_parquet
}
to_azure_blob_storage "abfs://container/data/out_{uuid}.json",
account_key="your-account-key" {
write_ndjson
}
to_azure_blob_storage "abfs://my-container/logs/events_{uuid}.json",
timeout=5min {
write_ndjson
}
