This release delivers significant performance improvements for situations with many concurrent pipelines, making Tenzir more robust under high-load scenarios. New features include AWS role assumption support, enhanced string trimming functionality, and improved HTTP error handling capabilities. Additionally, this release adds several new operators and comes with various bug fixes.
Download the release on GitHub.
Features
Roles in save_s3 and to_amazon_security_lake
We have added new options for assuming a role to the save_s3 and
to_amazon_security_lake operators. You can specify an AWS role, and the
operators will assume this role for authorization. Optionally, you can also
specify an external_id to use alongside the role.
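As a hedged sketch (the bucket, role ARN, and external ID are placeholders, and the role option spelling is assumed from the description above):
```tql
// Hypothetical bucket, role ARN, and external ID; the `role` option
// name is assumed from the description above.
save_s3 "s3://my-bucket/events/output.ndjson",
  role="arn:aws:iam::123456789012:role/tenzir-writer",
  external_id="example-external-id"
```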
By @raxyte, @IyeOnline in #5391.
Trimming custom characters
The trim(), trim_start(), and trim_end() functions can now remove
specific characters from strings, not just whitespace. Pass a second argument
containing a string where each character represents a character to remove:
```tql
from {
  path: "/path/to/file/".trim("/"),
  decorated: "--hello--world--".trim("-"),
  complex: "/-/data/-/".trim("/-")
}
```

```tql
{
  path: "path/to/file",
  decorated: "hello--world",
  complex: "data"
}
```

Each character in the second argument is treated individually, not as a complete string to match:
```tql
from {
  // Removes 'a', 'e', and 'g' from both ends
  chars: "abcdefg".trim("aeg"),
  // Removes any 'o', 'l', 'e', or 'h' from both ends
  word: "helloworldhello".trim("olleh")
}
```

```tql
{
  chars: "bcdf",
  word: "wr"
}
```

This also works with trim_start() and trim_end() for one-sided trimming:
```tql
from {
  start: "///api/v1/users".trim_start("/"),
  end: "data.csv.tmp.....".trim_end(".")
}
```

```tql
{
  start: "api/v1/users",
  end: "data.csv"
}
```

Handling HTTP error status codes
The from_http and http operators now provide an error_field option that
lets you specify a field to receive the error response as a blob. When you set
this option, the operators keep events with status codes outside the 200–399
range so you can handle them manually.
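For example, a minimal sketch, assuming a hypothetical endpoint; the field name error is a placeholder:
```tql
// Responses with status codes outside 200–399 are kept, with the raw
// response body stored as a blob in `error`. The URL is a placeholder.
from_http "https://api.example.com/data", error_field=error
```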
Versioned sources in to_amazon_security_lake operator
The to_amazon_security_lake operator now supports versioned custom sources,
such as:

```tql
let $lake_url = "s3://aws-security-data-lake-eu-west-2-lake-abcdefghijklmnopqrstuvwxyz1234/ext/tnz-ocsf-dns/1.0/"
to_amazon_security_lake $lake_url, …
```

By @IyeOnline in #5369.
Dropping null fields
The new drop_null_fields operator removes fields containing null values from
events. Without arguments, it drops all fields with null values. With field
arguments, it drops the specified fields if they contain null values, and for
record fields, it also recursively drops all null fields within them.
Drop all null fields:
```tql
from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"]
}
drop_null_fields
```

```tql
{
  id: 42,
  user: {
    name: "alice",
  },
  tags: [
    "security",
    "audit",
  ],
}
```

Drop specific null fields:
```tql
from {
  id: 42,
  user: {name: "alice", email: null},
  status: null,
  tags: ["security", "audit"]
}
drop_null_fields user.email
```

```tql
{
  id: 42,
  user: {
    name: "alice",
  },
  status: null,
  tags: [
    "security",
    "audit",
  ],
}
```

Note that status remains because it wasn’t specified in the field list.
When specifying a record field, all null fields within it are removed:
```tql
from {
  user: {name: "alice", email: null, role: null},
  settings: {theme: "dark", notifications: null}
}
drop_null_fields user
```

```tql
{
  user: {
    name: "alice",
  },
  settings: {
    theme: "dark",
    notifications: null,
  },
}
```

The user.email and user.role fields are removed because they are null fields
within the specified user record. The settings.notifications field remains
because settings was not specified.
More supported types in read_parquet
Tenzir does not support all types that Parquet supports. We have enabled the
read_parquet operator to accept more types that are convertible to supported
types. It will convert integer, floating point, and time types to the appropriate
(wider) Tenzir type. For example, if your Parquet file contains a column of type
int32, it will now be read in as int64 instead of rejecting the entire file.
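As a sketch, assuming a Parquet file at a placeholder path:
```tql
// The path is a placeholder. A column of Parquet type int32 is now
// widened to Tenzir's int64 instead of the file being rejected.
load_file "/data/events.parquet"
read_parquet
```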
By @IyeOnline in #5373.
Dynamic log_type for to_google_secops
The to_google_secops operator now supports dynamic log_type values. You can set
the option to any expression evaluating to a string, e.g.:
```tql
from {type: "CUSTOM_DNS", text: "..."},
     {type: "BIND_DNS", text: "..."}
to_google_secops log_type=type, log_text=text, ...
```

New read_all operator
The read_all operator produces a single event for its entire input stream.
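For example, a minimal sketch, assuming a local file at a placeholder path:
```tql
// Emit a single event containing the whole file, rather than one
// event per parsed element. The path is a placeholder.
load_file "/tmp/report.txt"
read_all
```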
Account key authentication for Azure Blob Storage
The load_azure_blob_storage and save_azure_blob_storage operators now
support account key (shared key) authentication via a new account_key option.
This provides an additional method for accessing Azure Blob Storage, alongside
existing authentication options.
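A minimal sketch, assuming a hypothetical container URI and secret name; account_key is the new option:
```tql
// The URI and secret name are placeholders; `account_key` is the new option.
load_azure_blob_storage "abfss://my-container/path/events.json",
  account_key=secret("azure-account-key")
read_json
```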
Changes
Performance improvements
Tenzir can now handle significantly more concurrent pipelines without becoming unresponsive. The system is now much more robust under high load, and response times remain stable even with thousands of concurrent pipelines.
Improvements to context::enrich
The context::enrich operator now allows using mode="append" even if the
enrichment does not have the exact same type as the existing type, as long as
they are compatible.
Furthermore, mode="ocsf" now returns null if no enrichment took place
instead of a record with a null data field.
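For illustration, a hedged sketch; the context name and key field are placeholders, and the exact argument spelling may differ:
```tql
// Context name and key field are placeholders. With mode="append",
// compatible (not necessarily identical) types are now accepted.
context::enrich "threat-intel", key=src_ip, mode="append"
```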
Bug Fixes
Context operator metrics
The data flowing through the context:: family of operators is no longer
counted as actual ingress and egress.
Fixed secrets in headers argument in from_http
We fixed a crash when using a secret in the headers argument of the from_http
operator.
By @IyeOnline in #5376.
Fixed crash in read_parquet
Tenzir and the read_parquet operator only support a subset of all Parquet types.
Reading an unsupported Parquet file could previously crash Tenzir in some
situations. This is now fixed and the operator instead raises an error.
By @IyeOnline in #5373.
Fixed issue with table creation in to_clickhouse
Multiple to_clickhouse operators can now attempt to create the same ClickHouse
table at the same time without an error.
By @IyeOnline in #5360.
Fixed to_amazon_security_lake partitioning
The to_amazon_security_lake operator incorrectly partitioned output paths as …/accountID=…. It now
uses the correct …/accountId=….
By @IyeOnline in #5369.
Return type of map for empty lists
Previously, the map function would return the input list when the input was
empty, possibly producing type warnings downstream. It now correctly returns
list<null> instead.
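A minimal sketch of the corrected behavior; field names are placeholders:
```tql
// `xs` is empty, so the result is an empty list of type list<null>
// instead of a copy of the input list's type.
from {xs: []}
ys = xs.map(x, x + 1)
```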
Formatting ip and subnet values in to_amazon_security_lake
The to_amazon_security_lake operator now correctly formats ip and subnet
values as strings and formats timestamps using millisecond precision, similar
to the Security Lake built-in sources.