This release introduces support for arguments in user-defined operators, letting operators declare positional and named parameters with optional default values and use them just like built-in operators. It also enhances parser behavior for duplicate keys and includes several important stability, parsing, and retention improvements to make pipelines more flexible and reliable.

Argument support for User-defined operators

Dec 17, 2025 · @tobim

User-defined operators in packages can now declare arguments in their YAML frontmatter, enabling parameterized operator definitions with the same calling convention as built-in operators.

Arguments can be positional or named. Both support optional default values and can be called with literals, constant expressions, or dynamically evaluated runtime expressions such as fields.

For example, create a reusable operator to set fields dynamically:

---
description: "Set a field to a value"
args:
  positional:
    - name: field
      type: field
    - name: value
      type: string
  named:
    - name: prefix
      type: string
      default: ""
---
$field = $prefix + $value

Use the operator with both constant and runtime arguments:

from {x: 1}
mypkg::set_field this.name, "Alice", prefix="User: "

{
  x: 1,
  name: "User: Alice",
}

Set a parameter's type by giving a type name in its type field. If the passed expression can be evaluated at instantiation time, it is checked against the declared type, and a diagnostic is returned on mismatch. If the expression references runtime data, the check is skipped at instantiation and errors are flagged at runtime instead. Field-path arguments (declared via type: field) accept selectors and cannot declare defaults.

Filter files by modification times with max_age

Dec 16, 2025 · @raxyte · #5611

The from_file, from_s3, from_gcs, and from_azure_blob_storage operators now support an optional max_age parameter that filters files based on their last modification time. Only files modified within the specified duration from now will be processed.

Example

Process only files modified in the last hour:

from_file "/var/log/security/*.json", max_age=1h

Dec 15, 2025 · @IyeOnline · #5593

We have improved our Google Cloud PubSub integration with the addition of the new from_google_cloud_pubsub and to_google_cloud_pubsub operators.

These operators are direct void -> event and event -> void operators, which means that they ensure a 1:1 relation between events and messages.

The from_google_cloud_pubsub operator can also attach metadata such as message ID, publish time, and attributes for downstream enrichment.

The legacy load_google_cloud_pubsub and save_google_cloud_pubsub operators are deprecated in favor of these event-preserving counterparts.

Backpressure and connection limits for HTTP server

Dec 12, 2025 · @raxyte · #5601

The from_http operator in server mode now implements backpressure, waiting for each request to be processed before accepting new data. This prevents memory pressure during traffic spikes from webhook integrations or log receivers.

A new max_connections parameter limits simultaneous connections:

from_http "0.0.0.0:8080", server=true, max_connections=50

The default is 10 connections. Additional connections are rejected until a slot frees up, keeping your pipelines stable under heavy load.
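As a rough mental model of the connection limit, consider a fixed pool of slots where a new connection is turned away when no slot is free. The Python sketch below assumes this model; the class `ConnectionLimiter` and its method names are hypothetical and do not reflect how from_http is implemented.

```python
import threading


class ConnectionLimiter:
    """Hands out a fixed number of connection slots without blocking."""

    def __init__(self, max_connections=10):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_accept(self):
        # Returns False immediately when all slots are taken, which
        # models rejecting the additional connection.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Called when a connection closes and frees its slot.
        self._slots.release()
```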

Dec 12, 2025 · @raxyte · #5599

The new from_sentinelone_data_lake operator allows you to query the SentinelOne Singularity Data Lake using PowerQuery and retrieve security events directly into your Tenzir pipelines. Tenzir’s integrations with SentinelOne now allow you to send data to and load data from SentinelOne Data Lakes.

Example

Query threat events and filter by severity:

from_sentinelone_data_lake "https://xdr.eu1.sentinelone.net",
  token=secret("sentinelone-token"),
  query="severity > 3 | columns id",
  start=now()-7d

The operator sends a request to the /api/powerQuery endpoint with optional time range filters and parses the tabular response into events for downstream processing.

Dec 8, 2025 · @IyeOnline · #5445

Our parsers now have improved support for repeated keys in an event. Previously, a later key-value pair would always overwrite the previous one. With this change, the value is transparently upgraded to a list of values.
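To illustrate the difference between overwriting and upgrading to a list, here is a small Python sketch that reproduces the described behavior with the standard `json` module. It is a model of the semantics only; the helper `merge_repeated_keys` and the sample event are made up for this example, and details such as how Tenzir treats values that are already lists are not covered by the text above.

```python
import json


def merge_repeated_keys(pairs):
    """Build a dict in which repeated keys collect their values in a list."""
    result = {}
    repeated = set()  # keys that have already been upgraded to a list
    for key, value in pairs:
        if key not in result:
            result[key] = value
        elif key in repeated:
            result[key].append(value)
        else:
            # Second occurrence: upgrade the single value to a list.
            result[key] = [result[key], value]
            repeated.add(key)
    return result


event = json.loads(
    '{"tag": "a", "id": 7, "tag": "b"}',
    object_pairs_hook=merge_repeated_keys,
)
print(event)  # {'tag': ['a', 'b'], 'id': 7}
```

Without the hook, `json.loads` keeps only the last value (`"b"`), which corresponds to the previous overwrite behavior.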

Dec 4, 2025 · @raxyte · #5575

The new from_kafka operator emits one event per Kafka message and thereby preserves event boundaries, unlike load_kafka, which is now deprecated.

Example

Use from_kafka to parse JSON events from a topic:

from_kafka "events"
this = message.parse_json()

Support GOOGLE_CLOUD_PROJECT environment variable in to_google_cloud_logging operator

Dec 3, 2025 · @lava · #5591

The to_google_cloud_logging operator now checks the GOOGLE_CLOUD_PROJECT environment variable if no explicit project ID is given, before falling back to the Google Metadata service.
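The resulting lookup order is: explicit argument, then environment variable, then metadata service. A minimal Python sketch of that precedence follows; `resolve_project` and the `metadata_lookup` callback are hypothetical names used for illustration and are not part of Tenzir.

```python
import os


def resolve_project(explicit=None, environ=os.environ, metadata_lookup=lambda: None):
    """Pick a project ID using the precedence described above."""
    if explicit:
        return explicit
    from_env = environ.get("GOOGLE_CLOUD_PROJECT")
    if from_env:
        return from_env
    # Last resort: ask the (hypothetical) metadata service callback.
    return metadata_lookup()
```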

Oct 27, 2025 · @tobim

The tenzir binary is now bundled directly with the tenzir Python wheel. This means you can run Tenzir pipelines on any machine with uv installed, without any separate installation steps.

Just use uvx:

uvx tenzir 'version'

The bundled binary is available for Apple Silicon Macs, aarch64 Linux, and x86_64 Linux. On other platforms, the wheel only contains the Python bindings and you need to install the tenzir binary separately.

Simplified publish and subscribe connection

Dec 16, 2025 · @IyeOnline · #5597

We made an under-the-hood change to the publish and subscribe implementation that reduces the overhead when publishing to high-throughput topics.

Dec 15, 2025 · @IyeOnline · #5593

We have removed the gcps:/ URI scheme, which previously dispatched to load_google_cloud_pubsub and save_google_cloud_pubsub. Because these operators are deprecated and will be removed, the scheme is being retired as well.

Update default retention policies for metrics and diagnostics

Dec 5, 2025 · @lava · #5594

Tenzir now applies default retention policies for internal metrics and diagnostics:

  • Metrics (schema tenzir.metrics.*): Retained for 16 days by default
  • Diagnostics (schema tenzir.diagnostics.*): Retained for 30 days by default

These defaults help manage storage usage while keeping sufficient history for troubleshooting. You can customize these settings:

tenzir.yaml
tenzir:
  retention:
    metrics: 16d # Retention period for general metrics
    diagnostics: 30d # Retention period for diagnostics

Set any retention period to 0 to disable automatic deletion for that category.

Dec 8, 2025 · @IyeOnline · #5595

When parsing typed data (e.g., integers in JSON) with a predefined schema that expected a different type (e.g., a time), the parser would crash with an assertion failure.

This has been fixed: the field is now set to null and a warning is emitted.

Dec 8, 2025 · @IyeOnline · #5445

Zeek JSON contains fields such as io.data.read.bytes and io.data.read.bytes.per-second. These fields would previously overwrite each other in order of appearance.

With this change, bytes is now a record and the original value is kept under the key "".
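The resulting shape is easiest to see on a small example. The Python sketch below nests dotted keys and moves a colliding value under the empty key, as described above. It is an illustrative model, not Tenzir's parser; the function name `unflatten` and the sample values are assumptions.

```python
def unflatten(flat):
    """Nest dotted keys; a value that collides with a record moves under ""."""
    root = {}
    for dotted, value in flat.items():
        node = root
        *parents, leaf = dotted.split(".")
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Either nothing is here yet, or a plain value is: in the
                # latter case, keep that value under the empty key.
                child = {} if part not in node else {"": child}
                node[part] = child
            node = child
        if isinstance(node.get(leaf), dict):
            # A record already exists at this position.
            node[leaf][""] = value
        else:
            node[leaf] = value
    return root


print(unflatten({"io.data.read.bytes": 10, "io.data.read.bytes.per-second": 2}))
# {'io': {'data': {'read': {'bytes': {'': 10, 'per-second': 2}}}}}
```

In this model, both values survive regardless of the order in which the two fields appear.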

Dec 8, 2025 · @jachris · #5598

The non-actionable warning “received an operator metric without a unit”, which was sometimes emitted for closed subpipelines, has been removed.

Dec 3, 2025 · @IyeOnline · #5590

We fixed a bug in a common component used across all parsers, which could enter an inconsistent state, leading to an “unexpected internal error: unreachable”.