This release delivers significant performance improvements for situations with many concurrent pipelines, making Tenzir more robust under high-load scenarios. New features include AWS role assumption support, enhanced string trimming functionality, and improved HTTP error handling capabilities. Additionally, this release adds several new operators and comes with various bug fixes.
Download the release on GitHub.
Features
Roles in save_s3 and to_amazon_security_lake
We have added new options for assuming a role to the save_s3 and
to_amazon_security_lake operators. You can specify an AWS role, and the
operators will assume this role for authorization. Optionally, you can also
specify an external_id to use alongside the role.
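As a hedged sketch (the bucket, role ARN, and external ID are placeholders, and the role option spelling is assumed from the description above):
```tql
// Hypothetical bucket, role ARN, and external ID; the `role` option
// name is assumed from the description above.
save_s3 "s3://my-bucket/events/output.ndjson",
  role="arn:aws:iam::123456789012:role/tenzir-writer",
  external_id="example-external-id"
```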
By @raxyte, @IyeOnline in #5391.
Trimming custom characters
The trim(), trim_start(), and trim_end() functions can now remove
specific characters from strings, not just whitespace. Pass a second argument
containing a string where each character represents a character to remove:
```tql
from {
  path: "/path/to/file/".trim("/"),
  decorated: "--hello--world--".trim("-"),
  complex: "/-/data/-/".trim("/-")
}
```

```tql
{
  path: "path/to/file",
  decorated: "hello--world",
  complex: "data"
}
```

Each character in the second argument is treated individually, not as a complete string to match:
```tql
from {
  // Removes 'a', 'e', and 'g' from both ends
  chars: "abcdefg".trim("aeg"),
  // Removes any 'o', 'l', 'e', or 'h' from both ends
  word: "helloworldhello".trim("olleh")
}
```

```tql
{
  chars: "bcdf",
  word: "wr"
}
```

This also works with trim_start() and trim_end() for one-sided trimming:
```tql
from {
  start: "///api/v1/users".trim_start("/"),
  end: "data.csv.tmp.....".trim_end(".")
}
```

```tql
{
  start: "api/v1/users",
  end: "data.csv"
}
```

Handling HTTP error status codes
The from_http and http operators now provide an error_field option that
lets you specify a field to receive the error response as a blob. When you set
this option, the operators keep events with status codes outside the 200–399
range so you can handle them manually.
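For example, a minimal sketch, assuming a hypothetical endpoint; the field name error is a placeholder:
```tql
// Responses with status codes outside 200–399 are kept, with the raw
// response body stored as a blob in `error`. The URL is a placeholder.
from_http "https://api.example.com/data", error_field=error
```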
Versioned sources in to_amazon_security_lake operator
The to_amazon_security_lake operator now supports versioned custom sources,
such as:

```tql
let $lake_url = "s3://aws-security-data-lake-eu-west-2-lake-abcdefghijklmnopqrstuvwxyz1234/ext/tnz-ocsf-dns/1.0/"
to_amazon_security_lake $lake_url, …
```

By @IyeOnline in #5369.
Dropping null fields
The new drop_null_fields operator removes fields containing null values from
events. Without arguments, it drops all fields with null values. With field
arguments, it drops the specified fields if they contain null values, and for
record fields, it also recursively drops all null fields within them.
Drop all null fields:
```tql
from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"]
}
drop_null_fields
```

```tql
{
  id: 42,
  user: {
    name: "alice",
  },
  tags: [
    "security",
    "audit",
  ],
}
```

Drop specific null fields:
```tql
from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"]
}
drop_null_fields user.email
```

```tql
{
  id: 42,
  user: {
    name: "alice",
  },
  status: null,
  tags: [
    "security",
    "audit",
  ],
}
```

Note that status remains because it wasn’t specified in the field list.
When specifying a record field, all null fields within it are removed:
```tql
from {
  user: {name: "alice", email: null, role: null},
  settings: {theme: "dark", notifications: null}
}
drop_null_fields user
```

```tql
{
  user: {
    name: "alice",
  },
  settings: {
    theme: "dark",
    notifications: null,
  },
}
```

The user.email and user.role fields are removed because they are null fields
within the specified user record. The settings.notifications field remains
because settings was not specified.
More supported types in read_parquet
Tenzir does not support all types that Parquet supports. We have enabled the
read_parquet operator to accept more types that are convertible to supported
types. It will convert integer, floating point, and time types to the appropriate
(wider) Tenzir type. For example, if your Parquet file contains a column of type
int32, it will now be read in as int64 instead of rejecting the entire file.
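As a sketch, assuming a Parquet file at a placeholder path:
```tql
// The path is a placeholder. A column of Parquet type int32 is now
// widened to Tenzir's int64 instead of the file being rejected.
load_file "/data/events.parquet"
read_parquet
```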
By @IyeOnline in #5373.
Dynamic log_type for to_google_secops
The to_google_secops operator now supports dynamic log_type values. You can set
the option to any expression evaluating to a string, e.g.:
```tql
from {type: "CUSTOM_DNS", text: "..."},
     {type: "BIND_DNS", text: "..."}
to_google_secops log_type=type, log_text=text, ...
```

New read_all operator
The read_all operator produces a single event for its entire input stream.
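For example, a minimal sketch, assuming a local file at a placeholder path:
```tql
// Emit a single event containing the whole file, rather than one
// event per parsed element. The path is a placeholder.
load_file "/tmp/report.txt"
read_all
```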
Account key authentication for Azure Blob Storage
The load_azure_blob_storage and save_azure_blob_storage operators now
support account key (shared key) authentication via a new account_key option.
This provides an additional method for accessing Azure Blob Storage, alongside
existing authentication options.
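A minimal sketch, assuming a hypothetical container URI and secret name; account_key is the new option:
```tql
// The URI and secret name are placeholders; `account_key` is the new option.
load_azure_blob_storage "abfss://my-container/path/events.json",
  account_key=secret("azure-account-key")
read_json
```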
Changes
Performance improvements
Tenzir can now handle significantly more concurrent pipelines without becoming unresponsive. The system is now much more robust under high load, and response times remain stable even with thousands of concurrent pipelines.
Improvements to context::enrich
The context::enrich operator now allows using mode="append" even if the
enrichment does not have the exact same type as the existing type, as long as
they are compatible.
Furthermore, mode="ocsf" now returns null if no enrichment took place
instead of a record with a null data field.
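For illustration, a hedged sketch; the context name and key field are placeholders, and the exact argument spelling may differ:
```tql
// Context name and key field are placeholders. With mode="append",
// compatible (not necessarily identical) types are now accepted.
context::enrich "threat-intel", key=src_ip, mode="append"
```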
Bug Fixes
Context operator metrics
The data flowing through the context:: family of operators is no longer
counted as actual ingress and egress.
Fixed secrets in headers argument in from_http
We fixed a crash when using a secret in the headers argument of the from_http
operator.
By @IyeOnline in #5376.
Fixed crash in read_parquet
Tenzir and the read_parquet operator only support a subset of all Parquet types.
Reading an unsupported Parquet file could previously crash Tenzir in some
situations. This is now fixed and the operator instead raises an error.
By @IyeOnline in #5373.
Fixed issue with table creation in to_clickhouse
Multiple to_clickhouse operators can now attempt to create the same ClickHouse
table at the same time without an error.
By @IyeOnline in #5360.
Fixed to_amazon_security_lake partitioning
The to_amazon_security_lake operator incorrectly partitioned output paths as …/accountID=…. It now
uses the correct …/accountId=….
By @IyeOnline in #5369.
Return type of map for empty lists
Previously, the map function would return the input list when the input was
empty, possibly producing type warnings downstream. It now correctly returns
list<null> instead.
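A minimal sketch of the corrected behavior; field names are placeholders:
```tql
// `xs` is empty, so the result is an empty list of type list<null>
// instead of a copy of the input list's type.
from {xs: []}
ys = xs.map(x, x + 1)
```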
Formatting ip and subnet values in to_amazon_security_lake
The to_amazon_security_lake operator now correctly formats ip and subnet
values as strings and formats timestamps using millisecond precision, similar
to the Security Lake built-in sources.