Reads one or more files from Azure Blob Storage.
from_azure_blob_storage url:string, [account_key=string, watch=bool, remove=bool, rename=string->string, path_field=field] { … }
Description
The from_azure_blob_storage operator reads files from Azure Blob Storage, with support for glob patterns, automatic format detection, and file monitoring.
By default, authentication is handled by the Azure SDK’s credential chain, which may read from several environment variables, such as:
AZURE_TENANT_ID
AZURE_CLIENT_ID
AZURE_CLIENT_SECRET
AZURE_AUTHORITY_HOST
AZURE_CLIENT_CERTIFICATE_PATH
AZURE_FEDERATED_TOKEN_FILE
url: string
URL identifying the Azure Blob Storage location to read data from.
The wildcards * and ** have special meanings: * matches everything
except /, and ** matches everything including /. The sequence /**/ can also
match nothing. For example, container/**/data matches container/data as well as container/a/b/data.
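To make the wildcard rules concrete, here is a minimal Python sketch that translates a pattern into a regular expression following the three rules above. It is an illustration, not Tenzir's matcher: the helper names `glob_to_regex` and `matches` are hypothetical, and the sketch assumes that only `*` and `**` are special, so every other character matches literally.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a URL glob into a regex using the wildcard rules above."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/**/", i):
            # `/**/` matches `/` alone or `/` + anything + `/`, so it may match nothing.
            parts.append("/(?:.*/)?")
            i += 4
        elif pattern.startswith("**", i):
            # `**` matches everything, including `/`.
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            # `*` matches everything except `/`, i.e. stays within one segment.
            parts.append("[^/]*")
            i += 1
        else:
            # Assumption: every other character is matched literally.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    # fullmatch: the pattern must cover the entire path.
    return glob_to_regex(pattern).fullmatch(path) is not None


print(matches("container/**/data", "container/data"))  # True
print(matches("data/*.json", "data/sub/a.json"))       # False
print(matches("data/**.json", "data/sub/a.json"))      # True
```

Because the match is anchored at both ends, data/*.json rejects files in subdirectories while data/**.json accepts them.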
Supported URI formats:
(1) abfs[s]://<account>.blob.core.windows.net[/<container>[/<path>]]
(2) abfs[s]://<container>@<account>.dfs.core.windows.net[/<path>]
(3) abfs[s]://[<account>@]<host>[.<domain>][:<port>][/<container>[/<path>]]
(4) abfs[s]://[<account>@]<container>[/<path>]
Formats (1) and (2) are compatible with Azure Data Lake Storage Gen2 URIs, (3) is for Azure Blob Storage-compatible services such as Azurite, and (4) is a shorter form of (1) and (2).
account_key = string (optional)
Account key for authenticating with Azure Blob Storage.
watch = bool (optional)
In addition to processing all existing files, this option keeps the operator running and watches for new files that match the given URL. Currently, the operator scans for new files up to every 10 seconds.
Defaults to false.
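One way to picture watch=true is a loop that lists the matching files on every scan and emits only paths it has not seen before. The Python sketch below is a hypothetical model, not the operator's implementation: `watch_paths` and the `list_matching` callback (standing in for a filtered storage listing) are illustrative names, files are identified by path alone (so a modified file is not emitted again), and it runs a fixed number of scans instead of waiting between them.

```python
from typing import Callable, Iterable, Iterator


def watch_paths(list_matching: Callable[[], Iterable[str]], scans: int) -> Iterator[str]:
    """Yield each matching path once, in the scan where it first appears.

    The first scan covers all existing files; later scans surface only new ones.
    A real watcher would wait between scans rather than stop after `scans` rounds.
    """
    seen = set()
    for _ in range(scans):
        # Sort within a scan so the output order is deterministic.
        for path in sorted(list_matching()):
            if path not in seen:
                seen.add(path)
                yield path


# Three simulated listings of the storage location over time.
snapshots = iter([["a.json"], ["a.json", "b.json"], ["b.json", "c.json"]])
print(list(watch_paths(lambda: next(snapshots), scans=3)))  # ['a.json', 'b.json', 'c.json']
```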
remove = bool (optional)
Deletes files after they have been read completely.
Defaults to false.
rename = string -> string (optional)
Renames files after they have been read completely. The lambda function receives the original path as its argument and must return the new path.
If the target path already exists, the operator will overwrite the file.
The operator automatically creates any intermediate directories required for the
target path. If the target path ends with a trailing slash (/), the original
filename will be automatically appended to create the final path.
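The trailing-slash rule amounts to a small path computation. The hypothetical Python helper below, `resolve_rename_target`, applies a rename lambda and appends the original filename when the result ends in /. It only computes the destination string; creating intermediate directories and overwriting existing files are not modeled.

```python
import posixpath
from typing import Callable


def resolve_rename_target(original: str, rename: Callable[[str], str]) -> str:
    """Apply a rename lambda and expand a trailing `/` to keep the filename."""
    target = rename(original)
    if target.endswith("/"):
        # Directory-like target: append the original file's name.
        target += posixpath.basename(original)
    return target


print(resolve_rename_target("data/a.json", lambda p: "/archive/"))      # /archive/a.json
print(resolve_rename_target("data/a.json", lambda p: "/archive/" + p))  # /archive/data/a.json
```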
path_field = field (optional)
Inserts the path of the file that an event originated from into the given field before emitting the event.
By default, paths will not be inserted into the outgoing events.
{ … } (optional)
Pipeline to use for parsing the file. By default, the pipeline is derived from the file's path and handles decompression, if applicable, in addition to parsing.
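As an illustration of deriving a parsing pipeline from a path, the Python sketch below strips an optional compression suffix and then picks a reader from the remaining extension. The suffix tables and the `default_pipeline` helper are hypothetical and deliberately small, and the operator names appear only as labels; the operator's actual mapping may cover other extensions or behave differently.

```python
import posixpath

# Hypothetical, deliberately small suffix tables for illustration only.
DECOMPRESSORS = {".gz": "decompress_gzip", ".zst": "decompress_zstd", ".bz2": "decompress_bz2"}
READERS = {".json": "read_json", ".csv": "read_csv", ".parquet": "read_parquet"}


def default_pipeline(path: str) -> list:
    """Return the steps a path-derived parsing pipeline might consist of."""
    root, ext = posixpath.splitext(path)
    steps = []
    if ext in DECOMPRESSORS:
        # Decompress first, then inspect the next suffix for the data format.
        steps.append(DECOMPRESSORS[ext])
        root, ext = posixpath.splitext(root)
    if ext not in READERS:
        raise ValueError(f"cannot infer a format from {path!r}")
    steps.append(READERS[ext])
    return steps


print(default_pipeline("logs/eve.json.gz"))  # ['decompress_gzip', 'read_json']
print(default_pipeline("data/users.csv"))    # ['read_csv']
```

Passing an explicit { … } pipeline sidesteps this inference entirely, which is what the Suricata example does with read_suricata.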
Examples
Read every JSON file from a container
from_azure_blob_storage "abfs://my-container/data/**.json"
Read CSV files using account key authentication
from_azure_blob_storage "abfs://container/data.csv", account_key="your-account-key"
Read Suricata EVE JSON logs continuously
from_azure_blob_storage "abfs://logs/suricata/**.json", watch=true {
  read_suricata
}
Process files and move them to an archive container
from_azure_blob_storage "abfs://input/**.json", rename=(path => "/archive/" + path)
Add source path to events
from_azure_blob_storage "abfs://data/**.json", path_field=source_file