Better Performance

This release delivers significant performance improvements for deployments running many concurrent pipelines, making Tenzir more robust under high load. New features include AWS role assumption support, character-based string trimming, and improved HTTP error handling. Additionally, this release adds several new operators and various bug fixes.

Download the release on GitHub.

Roles in save_s3 and to_amazon_security_lake

We have added new options for assuming a role to the save_s3 and to_amazon_security_lake operators. You can specify an AWS role that the operator assumes for authorization, and optionally an external_id to use alongside the role.
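
As a sketch, a pipeline writing to S3 under an assumed role might look as follows; the bucket, role ARN, and external ID are placeholders, and the option names follow the description above:

from {message: "hello"}
write_ndjson
// Assume the given role for authorization. All values below are
// placeholders; consult the operator documentation for the exact spelling.
save_s3 "s3://my-bucket/out.ndjson", role="arn:aws:iam::123456789012:role/tenzir-writer", external_id="tenzir"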

By @raxyte, @IyeOnline in #5391.

Character-based trimming with trim(), trim_start(), and trim_end()

The trim(), trim_start(), and trim_end() functions can now remove specific characters from strings, not just whitespace. Pass a second argument containing the set of characters to remove:

from {
  path: "/path/to/file/".trim("/"),
  decorated: "--hello--world--".trim("-"),
  complex: "/-/data/-/".trim("/-")
}
{
  path: "path/to/file",
  decorated: "hello--world",
  complex: "data"
}

Each character in the second argument is treated individually, not as a complete string to match:

from {
  // Removes 'a', 'e', and 'g' from both ends
  chars: "abcdefg".trim("aeg"),
  // Removes any 'o', 'l', 'e', or 'h' from both ends
  word: "helloworldhello".trim("olleh")
}
{
  chars: "bcdef",
  word: "world"
}

This also works with trim_start() and trim_end() for one-sided trimming:

from {
  start: "///api/v1/users".trim_start("/"),
  end: "data.csv.tmp.....".trim_end(".tmp")
}
{
  start: "api/v1/users",
  end: "data.csv"
}

By @mavam in #5389.

Error handling in from_http and http

The from_http and http operators now provide an error_field option that lets you specify a field to receive the error response as a blob. When you set this option, the operators keep events with status codes outside the 200–399 range so you can handle them manually.
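
For example, the following sketch (with a placeholder URL) keeps failed responses and stores the raw error body in a field named error:

// Failed responses (status outside 200–399) are kept, with the raw
// response body stored as a blob in the `error` field.
from_http "https://api.example.org/data", error_field=error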

By @raxyte in #5358.

Versioned sources in to_amazon_security_lake operator

The to_amazon_security_lake operator now supports versioned custom sources, such as:

let $lake_url = "s3://aws-security-data-lake-eu-west-2-lake-abcdefghijklmnopqrstuvwxyz1234/ext/tnz-ocsf-dns/1.0/"
to_amazon_security_lake $lake_url, …

By @IyeOnline in #5369.

New drop_null_fields operator

The new drop_null_fields operator removes fields containing null values from events. Without arguments, it drops all fields with null values. With field arguments, it drops the specified fields if they contain null values; for record fields, it also recursively drops all null fields within them.

Drop all null fields:

from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"]
}
drop_null_fields
{
  id: 42,
  user: {
    name: "alice",
  },
  tags: [
    "security",
    "audit",
  ],
}

Drop specific null fields:

from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"]
}
drop_null_fields user.email
{
  id: 42,
  user: {
    name: "alice",
  },
  status: null,
  tags: [
    "security",
    "audit",
  ],
}

Note that status remains because it wasn’t specified in the field list.

When specifying a record field, all null fields within it are removed:

from {
  user: {name: "alice", email: null, role: null},
  settings: {theme: "dark", notifications: null}
}
drop_null_fields user
{
  user: {
    name: "alice",
  },
  settings: {
    theme: "dark",
    notifications: null,
  },
}

The user.email and user.role fields are removed because they are null fields within the specified user record. The settings.notifications field remains because settings was not specified.

By @mavam in #5370.

More Parquet types in read_parquet

Tenzir does not support all types that Parquet supports. We have enabled the read_parquet operator to accept additional types that are convertible to supported ones: it converts integer, floating-point, and time types to the appropriate (wider) Tenzir type. For example, if your Parquet file contains a column of type int32, it is now read as int64 instead of the entire file being rejected.
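
For instance, reading a Parquet file with narrower numeric columns now works transparently (the file name is a placeholder):

load_file "events.parquet"
// int32 columns arrive as int64, float32 as double, and so on.
read_parquet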

By @IyeOnline in #5373.

Dynamic log types in to_google_secops

The to_google_secops operator now supports dynamic log types. You can set the log_type option to any expression that evaluates to a string, e.g.:

from {type: "CUSTOM_DNS", text: "..."},
  {type: "BIND_DNS", text: "..."}
to_google_secops log_type=type, log_text=text, ...

By @raxyte in #5365.

New read_all operator

The new read_all operator produces a single event for its entire input stream.
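
A minimal sketch with a placeholder file path; instead of one event per line, the whole input arrives as a single event:

load_file "/var/log/system.log"
// Collect the entire byte stream into a single event.
read_all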

By @jachris in #5368.

Account key authentication for Azure Blob Storage

The load_azure_blob_storage and save_azure_blob_storage operators now support account key (shared key) authentication via a new account_key option. This provides an additional method for accessing Azure Blob Storage, alongside existing authentication options.
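
For example, with placeholder account, container, and secret names:

// Authenticate with a shared account key; the URI and the secret holding
// the key are placeholders.
load_azure_blob_storage "abfss://container@account.dfs.core.windows.net/data.json", account_key=secret("AZURE_ACCOUNT_KEY")
read_json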

By @raxyte in #5380.

Better performance with many concurrent pipelines

Tenzir can now handle significantly more concurrent pipelines without becoming unresponsive. These improvements make the system considerably more robust under high load, with response times remaining stable even with thousands of concurrent pipelines.

By @jachris in #5382.

Improvements to context::enrich

The context::enrich operator now allows using mode="append" even if the enrichment does not have exactly the same type as the existing data, as long as the two types are compatible.

Furthermore, mode="ocsf" now returns null if no enrichment took place instead of a record with a null data field.
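
As a sketch, assuming a context named "threats" exists and keying on a src_ip field:

// Append matching context data to each event. With this release, "append"
// also accepts compatible (not byte-identical) types.
context::enrich "threats", key=src_ip, mode="append"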

By @jachris in #5388.

Context data no longer counts toward ingress and egress

The data flowing through the context:: family of operators is no longer counted as actual ingress and egress.

By @jachris in #5383.

Fixed secrets in headers argument in from_http

We fixed a crash when using a secret in the headers argument of the from_http operator.

By @IyeOnline in #5376.

Fixed crash on unsupported Parquet types

Tenzir and the read_parquet operator only support a subset of all Parquet types. Reading an unsupported Parquet file could previously crash Tenzir in some situations. This is now fixed; the operator raises an error instead.

By @IyeOnline in #5373.

Fixed issue with table creation in to_clickhouse

Multiple to_clickhouse operators can now attempt to create the same ClickHouse table at the same time without an error.

By @IyeOnline in #5360.

Fixed to_amazon_security_lake partitioning

The to_amazon_security_lake operator previously partitioned output incorrectly as …/accountID=…. It now uses the correct …/accountId=….

By @IyeOnline in #5369.

Fixed map over empty lists

Previously, the map function would return the input list when the input was empty, possibly producing type warnings downstream. It now correctly returns list<null> instead.
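
A minimal illustration of the fixed behavior, assuming the list where method to produce a typed empty list:

from {}
// Build an empty-but-typed list (list<int64>) by filtering everything out.
xs = [1, 2, 3].where(x, x > 5)
// Previously, map echoed back `xs` with its list<int64> type; it now
// returns an empty list<null>, avoiding type warnings downstream.
ys = xs.map(x, x * 2)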

By @jachris in #5385.

Formatting ip and subnet values in to_amazon_security_lake

The to_amazon_security_lake operator now correctly formats ip and subnet values as strings and formats timestamps using millisecond precision, similar to the Security Lake built-in sources.

By @raxyte in #5387.
