Tenzir Node v5.22.0

This release introduces support for arguments in user-defined operators, letting operators declare positional and named parameters with optional default values and use them just like built-in operators. It also enhances parser behavior for duplicate keys and includes several important stability, parsing, and retention improvements to make pipelines more flexible and reliable.

🚀 Features

Argument support for User-defined operators

Dec 17, 2025 · @tobim

User-defined operators in packages can now declare arguments in their YAML frontmatter, enabling parameterized operator definitions with the same calling convention as built-in operators.

Arguments can be positional or named. Both kinds support optional default values, and callers can pass literals, constant expressions, or dynamically evaluated runtime expressions such as fields.

For example, create a reusable operator to set fields dynamically:

---
description: "Set a field to a value"
args:
  positional:
    - name: field
      type: field
    - name: value
      type: string
  named:
    - name: prefix
      type: string
      default: ""
---
$field = $prefix + $value

Use the operator with both constant and runtime arguments:

from {x: 1}
mypkg::set_field this.name, "Alice", prefix="User: "

{
  x: 1,
  name: "User: Alice",
}

Parameters can be typed by giving a type name in the type field of the declaration. If the passed expression can be evaluated at instantiation time, it is checked against the declared type, and a diagnostic is returned if it does not match. If the expression references runtime data, the type check is skipped and potential errors are flagged at runtime instead. Field-path arguments (declared via type: field) accept selectors and cannot declare defaults.

Filter files by modification times with `max_age`

Dec 16, 2025 · @raxyte · #5611

The from_file, from_s3, from_gcs, and from_azure_blob_storage operators now support an optional max_age parameter that filters files based on their last modification time. Only files modified within the specified duration from now will be processed.

Example

Process only files modified in the last hour:

from_file "/var/log/security/*.json", max_age=1h
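The filtering rule can be pictured with a small Python sketch. This is not how the operators are implemented; the helper names is_recent and recent_files are hypothetical, and treating a file whose age is exactly max_age as still included is an assumption.

```python
import os
import time
from datetime import timedelta


def is_recent(mtime, max_age, now):
    """Return True if a file modified at `mtime` is no older than `max_age`.

    `mtime` and `now` are POSIX timestamps in seconds. The inclusive
    boundary is an assumption made for this sketch.
    """
    return now - mtime <= max_age.total_seconds()


def recent_files(paths, max_age):
    """Keep only the paths whose last modification lies within `max_age`."""
    now = time.time()
    return [p for p in paths if is_recent(os.path.getmtime(p), max_age, now)]


# A file touched 30 minutes ago passes a one-hour limit; one touched
# two hours ago does not.
now = 1_000_000.0
print(is_recent(now - 30 * 60, timedelta(hours=1), now))   # True
print(is_recent(now - 2 * 3600, timedelta(hours=1), now))  # False
```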

Improved Google Cloud PubSub Integration

Dec 15, 2025 · @IyeOnline · #5593

We have improved our Google Cloud PubSub integration with the addition of the new from_google_cloud_pubsub and to_google_cloud_pubsub operators.

These operators are direct void -> event and event -> void operators, which means that they ensure a 1:1 relation between events and messages.

The from_google_cloud_pubsub operator can also attach metadata such as message ID, publish time, and attributes for downstream enrichment.

The legacy load_google_cloud_pubsub and save_google_cloud_pubsub operators are deprecated in favor of these event-preserving counterparts.

Backpressure and connection limits for HTTP server

Dec 12, 2025 · @raxyte · #5601

The from_http operator in server mode now implements backpressure, waiting for each request to be processed before accepting new data. This prevents memory pressure during traffic spikes from webhook integrations or log receivers.

A new max_connections parameter limits simultaneous connections:

from_http "0.0.0.0:8080", server=true, max_connections=50

The default is 10 connections. Additional connections are rejected until a slot frees up, keeping your pipelines stable under heavy load.

Getting data from SentinelOne Data Lake

Dec 12, 2025 · @raxyte · #5599

The new from_sentinelone_data_lake operator allows you to query the SentinelOne Singularity Data Lake using PowerQuery and retrieve security events directly into your Tenzir pipelines. Tenzir’s integrations with SentinelOne now allow you to send data to and load data from SentinelOne Data Lakes.

Example

Query threat events and filter by severity:

from_sentinelone_data_lake "https://xdr.eu1.sentinelone.net",
  token=secret("sentinelone-token"),
  query="severity > 3 | columns id",
  start=now()-7d

The operator sends a request to the /api/powerQuery endpoint with optional time range filters and parses the tabular response into events for downstream processing.

Support for duplicate keys in parsers

Dec 8, 2025 · @IyeOnline · #5445

Our parsers now have improved support for repeated keys in an event. Previously, a later key-value pair would always overwrite the earlier one. With this change, the value is transparently upgraded to a list that holds all values.
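To make the new behavior concrete, here is a minimal Python sketch of merging repeated keys into a list. It only illustrates the described semantics and is not Tenzir's parser code; the helper name merge_duplicate_keys and the sample input are hypothetical, and the ordering of values within the list is an assumption.

```python
import json


def merge_duplicate_keys(pairs):
    """Collect the values of repeated keys into a list instead of overwriting."""
    result = {}
    upgraded = set()  # keys whose value has already been turned into a list
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif key in upgraded:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            upgraded.add(key)
    return result


# json.loads hands every object's key-value pairs to the hook in order.
raw = '{"tag": "a", "id": 1, "tag": "b"}'
event = json.loads(raw, object_pairs_hook=merge_duplicate_keys)
print(event)  # {'tag': ['a', 'b'], 'id': 1}
```

With the previous overwrite behavior, the same input would have produced only the last value, "b", for tag.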

Getting Kafka records with `from_kafka`

Dec 4, 2025 · @raxyte · #5575

The new from_kafka operator emits one event per Kafka message and thereby preserves message boundaries, unlike load_kafka, which is now deprecated.

Example

Use from_kafka to parse JSON events from a topic:

from_kafka "events"
this = message.parse_json()

Support GOOGLE_CLOUD_PROJECT environment variable in `to_google_cloud_logging` operator

Dec 3, 2025 · @lava · #5591

The to_google_cloud_logging operator now checks the GOOGLE_CLOUD_PROJECT environment variable if no explicit project ID is given, before falling back to the Google Metadata service.
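The resulting lookup order can be sketched in Python as follows. This is an illustration only, not the operator's code: resolve_project_id and metadata_lookup are hypothetical names, the metadata service is represented by a caller-supplied function, and treating an empty environment variable as unset is an assumption.

```python
import os


def resolve_project_id(explicit=None, env=None, metadata_lookup=None):
    """Pick a project ID: explicit argument, then environment, then metadata."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    from_env = env.get("GOOGLE_CLOUD_PROJECT")
    if from_env:  # assumption: an empty value counts as unset
        return from_env
    # Hypothetical stand-in for querying the Google Metadata service.
    return metadata_lookup() if metadata_lookup else None


env = {"GOOGLE_CLOUD_PROJECT": "env-project"}
print(resolve_project_id("explicit-project", env))          # explicit-project
print(resolve_project_id(None, env))                        # env-project
print(resolve_project_id(None, {}, lambda: "meta-project"))  # meta-project
```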

Run pipelines with uvx tenzir

Oct 27, 2025 · @tobim

The tenzir binary is now bundled directly with the tenzir Python wheel. This means you can run Tenzir pipelines on any machine with uv installed, without any separate installation steps.

Just use uvx:

uvx tenzir 'version'

The bundled binary is available for Apple Silicon Macs, aarch64 Linux, and x86_64 Linux. On other platforms, the wheel only contains the Python bindings and you need to install the tenzir binary separately.

🔧 Changes

Simplified `publish` and `subscribe` connection

Dec 16, 2025 · @IyeOnline · #5597

We made an under-the-hood change to the publish and subscribe implementation that reduces the overhead when publishing to high-throughput topics.

Removed `gcps://` URI scheme

Dec 15, 2025 · @IyeOnline · #5593

We have removed the gcps:// URI scheme, which previously dispatched to load_google_cloud_pubsub and save_google_cloud_pubsub. As these operators are deprecated and will be removed, the scheme is being retired as well.

Update default retention policies for metrics and diagnostics

Dec 5, 2025 · @lava · #5594

Tenzir now applies default retention policies for internal metrics and diagnostics:

Metrics (schema tenzir.metrics.*): Retained for 16 days by default
Diagnostics (schema tenzir.diagnostics.*): Retained for 30 days by default

These defaults help manage storage usage while keeping sufficient history for troubleshooting. You can customize these settings:

tenzir:
  retention:
    metrics: 16d      # Retention period for general metrics
    diagnostics: 30d  # Retention period for diagnostics

Set any retention period to 0 to disable automatic deletion for that category.

🐞 Bug Fixes

Fixed an assertion in parsers

Dec 8, 2025 · @IyeOnline · #5595

When parsing typed data (e.g., integers in JSON) with a predefined schema that expected a different type (e.g., a time), the parser would crash with an assertion failure.

This is now resolved: the field is set to null and a warning is emitted.

Fixed missing Zeek fields

Dec 8, 2025 · @IyeOnline · #5445

Zeek JSON contains fields such as io.data.read.bytes and io.data.read.bytes.per-second. These fields would previously overwrite each other in order of appearance.

With this change, bytes is now a record, and the original value is kept under the empty key "".
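The resulting shape can be illustrated with a short Python sketch that turns dotted keys into nested records. It is not Tenzir's implementation; the function name unflatten and the sample values 100 and 5 are hypothetical.

```python
def unflatten(flat):
    """Turn dotted keys into nested records.

    A value that collides with a record of the same name is kept inside
    that record under the empty key "".
    """
    root = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = root
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Either nothing is here yet, or a plain value is: in the
                # latter case, promote it to a record and keep it under "".
                child = {"": child} if part in node else {}
                node[part] = child
            node = child
        if isinstance(node.get(leaf), dict):
            node[leaf][""] = value  # the record arrived first
        else:
            node[leaf] = value
    return root


flat = {"io.data.read.bytes": 100, "io.data.read.bytes.per-second": 5}
print(unflatten(flat))
# {'io': {'data': {'read': {'bytes': {'': 100, 'per-second': 5}}}}}
```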

Removed warning for `void` metrics

Dec 8, 2025 · @jachris · #5598

The non-actionable warning “received an operator metric without a unit”, which was sometimes emitted for closed subpipelines, has been removed.

Fixed an assertion failure in parsers

Dec 3, 2025 · @IyeOnline · #5590

We fixed a bug in a common component used across all parsers, which could enter an inconsistent state, leading to an “unexpected internal error: unreachable”.

