This release delivers significant performance improvements for deployments with many concurrent pipelines, making Tenzir more robust under high load. New features include AWS role assumption support, trimming of custom characters from strings, and improved HTTP error handling. Additionally, this release adds several new operators and comes with various bug fixes.
Download the release on GitHub.
Features
Roles in save_s3 and to_amazon_security_lake

We have added new options to assume a role to the save_s3 and to_amazon_security_lake operators. You can specify an AWS role, and the operators will assume this role for authorization. Additionally, you can optionally specify an external_id to use alongside the role.
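As an illustration, here is a minimal sketch; the bucket and role ARN are hypothetical, and we assume the role option is simply named role (only external_id is named explicitly above):

from {event: "test"}
write_ndjson
// Assume the given role (and pass the optional external ID) before writing to S3.
save_s3 "s3://my-bucket/events.ndjson", role="arn:aws:iam::123456789012:role/tenzir-writer", external_id="my-external-id"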
By @raxyte, @IyeOnline in #5391.
Trimming custom characters
The trim(), trim_start(), and trim_end() functions can now remove specific characters from strings, not just whitespace. Pass a second argument containing a string in which each character is a character to remove:
from {
  path: "/path/to/file/".trim("/"),
  decorated: "--hello--world--".trim("-"),
  complex: "/-/data/-/".trim("/-"),
}

{
  path: "path/to/file",
  decorated: "hello--world",
  complex: "data",
}
Each character in the second argument is treated individually, not as a complete string to match:
from {
  // Removes 'a', 'e', and 'g' from both ends
  chars: "abcdefg".trim("aeg"),
  // Removes any 'o', 'l', 'e', or 'h' from both ends
  word: "helloworldhello".trim("olleh"),
}

{
  chars: "bcdef",
  word: "world",
}
This also works with trim_start() and trim_end() for one-sided trimming:
from {
  start: "///api/v1/users".trim_start("/"),
  end: "data.csv.tmp.....".trim_end("."),
}

{
  start: "api/v1/users",
  end: "data.csv.tmp",
}
Handling HTTP error status codes
The from_http and http operators now provide an error_field option that lets you specify a field to receive the error response as a blob. When you set this option, the operators keep events with status codes outside the 200–399 range so you can handle them manually.
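For example, a minimal sketch against a hypothetical endpoint; error_field is the new option:

from_http "https://api.example.org/items", error_field=error
// Responses with status codes outside the 200–399 range are kept, with the
// error response stored as a blob in `error`, so you can route or inspect them.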
Versioned sources in to_amazon_security_lake operator

The to_amazon_security_lake operator now supports versioned custom sources, such as:

let $lake_url = "s3://aws-security-data-lake-eu-west-2-lake-abcdefghijklmnopqrstuvwxyz1234/ext/tnz-ocsf-dns/1.0/"
to_amazon_security_lake $lake_url, …
By @IyeOnline in #5369.
Dropping null fields
The new drop_null_fields operator removes fields containing null values from events. Without arguments, it drops all fields with null values. With field arguments, it drops the specified fields if they contain null values, and for record fields, it also recursively drops all null fields within them.
Drop all null fields:
from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"],
}
drop_null_fields

{
  id: 42,
  user: {
    name: "alice",
  },
  tags: [
    "security",
    "audit",
  ],
}
Drop specific null fields:
from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"],
}
drop_null_fields user.email

{
  id: 42,
  user: {
    name: "alice",
  },
  status: null,
  tags: [
    "security",
    "audit",
  ],
}
Note that status remains because it wasn’t specified in the field list.
When specifying a record field, all null fields within it are removed:
from {
  user: {name: "alice", email: null, role: null},
  settings: {theme: "dark", notifications: null},
}
drop_null_fields user

{
  user: {
    name: "alice",
  },
  settings: {
    theme: "dark",
    notifications: null,
  },
}
The user.email and user.role fields are removed because they are null fields within the specified user record. The settings.notifications field remains because settings was not specified.
More supported types in read_parquet
Tenzir’s type system does not support all types that Parquet supports. We have enabled the read_parquet operator to accept more types that are convertible to supported types. It will convert integer, floating point, and time types to the appropriate (wider) Tenzir type. For example, if your Parquet file contains a column of type int32, it will now be read in as int64 instead of rejecting the entire file.
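For instance, a sketch with a hypothetical file whose count column is stored as a 32-bit integer:

// `events.parquet` is a hypothetical file that stores `count` as int32.
load_file "events.parquet"
read_parquet
// The `count` column now arrives as int64 instead of the file being rejected.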
By @IyeOnline in #5373.
Dynamic log_type for to_google_secops

The to_google_secops operator now supports dynamic log_types. You can set the option to any expression evaluating to a string, e.g.:

from {type: "CUSTOM_DNS", text: "..."}, {type: "BIND_DNS", text: "..."}
to_google_secops log_type=type, log_text=text, ...
New read_all operator

The read_all operator produces a single event for its entire input stream.
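For instance, a sketch that collapses a whole file into one event, assuming a hypothetical file name:

load_file "notes.txt"
// Emits exactly one event for the complete input stream.
read_all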
Account key authentication for Azure Blob Storage
The load_azure_blob_storage and save_azure_blob_storage operators now support account key (shared key) authentication via a new account_key option. This provides an additional method for accessing Azure Blob Storage, alongside existing authentication options.
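For example, a sketch with hypothetical account, container, and secret names; only the account_key option comes from this release, and the abfss:// URL form is an assumption:

load_azure_blob_storage "abfss://logs@myaccount.dfs.core.windows.net/events.json", account_key=secret("AZURE_ACCOUNT_KEY")
read_json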
Changes
Performance improvements

Tenzir can now handle significantly more concurrent pipelines without becoming unresponsive. These improvements make the system more robust under high load, with response times remaining stable even with thousands of concurrent pipelines.
Improvements to context::enrich
The context::enrich operator now allows using mode="append" even if the enrichment does not have exactly the same type as the existing field, as long as the types are compatible.
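As an illustration, a sketch that assumes a lookup-table context named "threats" and the key option for selecting the lookup field:

from {src_ip: 203.0.113.7}
// With compatible (not necessarily identical) types, the enrichment is appended.
context::enrich "threats", key=src_ip, mode="append"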
Furthermore, mode="ocsf" now returns null if no enrichment took place, instead of a record with a null data field.
Bug Fixes
Context operator metrics

The data flowing through the context:: family of operators is no longer counted as actual ingress and egress.
Fixed secrets in headers argument in from_http

We fixed a crash when using a secret in the headers argument of the from_http operator.
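For reference, a sketch of the kind of invocation that previously crashed, assuming a stored secret named "API_TOKEN" and a hypothetical URL:

from_http "https://api.example.org/v1/events", headers={Authorization: secret("API_TOKEN")}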
By @IyeOnline in #5376.
Fixed crash in read_parquet
Tenzir and the read_parquet operator only support a subset of all Parquet types. Reading an unsupported Parquet file could previously crash Tenzir in some situations. This is now fixed, and the operator instead raises an error.
By @IyeOnline in #5373.
Fixed issue with table creation in to_clickhouse
Multiple to_clickhouse operators can now attempt to create the same ClickHouse table at the same time without an error.
By @IyeOnline in #5360.
Fixed to_amazon_security_lake partitioning

The to_amazon_security_lake operator incorrectly partitioned as …/accountID=…. It now uses the correct …/accountId=….
By @IyeOnline in #5369.
Return type of map for empty lists

Previously, the map function would return the input list when the input was empty, possibly producing type warnings downstream. It now correctly returns an empty list of type list<null> instead.
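A small sketch of the corrected behavior, assuming the usual lambda form of map:

from {xs: []}
ys = xs.map(x => x * 2)
// `ys` is now an empty list of type list<null> rather than a copy of the input list.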
Formatting ip and subnet values in to_amazon_security_lake

The to_amazon_security_lake operator now correctly formats ip and subnet values as strings and formats timestamps using millisecond precision, similar to the Security Lake built-in sources.