OCSF Casting & Recursion

This release introduces the ocsf::cast operator to streamline schema transformations for OCSF events and adds support for one-level recursion in OCSF objects, enabling recursive relations such as process.parent_process and analytic.related_analytics.

Download the release on GitHub.

Features

`ocsf::cast` operator

The new ocsf::cast operator handles common schema transformations when working with OCSF events, such as homogenizing events of the same OCSF type or converting timestamps to integer counts to strictly adhere to the schema. It also deprecates the less flexible ocsf::apply operator, which is now equivalent to ocsf::cast null_fill=true.
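
As a minimal sketch of the migration described above, the following pipeline replaces the deprecated operator with its stated equivalent. The input event is illustrative only, and null_fill=true is the single option taken from these notes; any other options of ocsf::cast are not shown here.

```tql
// Illustrative event; the field values are made up for this sketch.
from {
  metadata: {version: "1.5.0"},
  class_uid: 1007,
  process: {pid: 1234},
}
// Before (deprecated):
//   ocsf::apply
// After (equivalent according to this release):
ocsf::cast null_fill=true
```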

By @raxyte in #5502.

Changes

Reduced memory consumption during import

Importing events now uses significantly less memory. Previously, importing would leave a trail of memory usage that only decreased slowly over a period corresponding to tenzir.active-partition-timeout. Now, events are released immediately after being written to disk, preventing unnecessary memory accumulation.

We also eliminated redundant copies throughout the import path, reducing memory usage by 2-4x depending on the dataset. Additionally, we optimized the memory usage of buffered synopses, which are used internally when building indexes during import. This optimization avoids unnecessary copies of strings and IP addresses, roughly halving the memory consumption of the underlying component.

By @jachris in #5532, #5533, #5535.

Dynamic cleanup of expired keys in `deduplicate`

The deduplicate operator now also takes the configured timeouts into account when deciding how often to clean up expired state. This results in lower memory usage if a timeout is under 15 minutes.
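
For illustration, a pipeline that would benefit from this change might look like the sketch below. It assumes the create_timeout option of deduplicate; the field names, values, and the 5-minute timeout are hypothetical and not taken from this release.

```tql
// Hypothetical input: field names and values are made up for this sketch.
from {src_ip: 10.0.0.1, dst_ip: 10.0.0.2},
     {src_ip: 10.0.0.1, dst_ip: 10.0.0.2}
// A timeout well under 15 minutes, so expired keys should now be
// cleaned up more frequently than before.
deduplicate src_ip, dst_ip, create_timeout=5min
```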

By @raxyte in #5534.

Flip pipeline subprocesses option semantics

We renamed the configuration option to tenzir.pipeline-subprocesses and kept the feature opt-in to avoid surprising users upgrading from earlier releases. Set the option to true to enable subprocess execution:

tenzir:
  pipeline-subprocesses: true

By @mavam in #5537.

Expose one-level recursion for OCSF objects

We now support recursive OCSF objects at depth one instead of dropping them entirely. For example, pipelines can safely follow relationships such as process.parent_process or analytic.related_analytics:

from {
  metadata: {version: "1.5.0"},
  class_uid: 1007,
  process: {
    pid: 1234,
    parent_process: {
      pid: 5678,
    },
  },
}
ocsf::apply

// New!
assert process.parent_process.pid == 5678
assert not process.parent_process.has("parent_process")

The first assertion now succeeds because parent_process is retained. The second assertion confirms that deeper ancestry is trimmed automatically, preserving schema compatibility for downstream consumers.

By @mavam in #5529.

Bug Fixes

Lambda capture extraction

Lambda captures now work correctly for field accesses where the left side is not a constant field path. For example, .map(x => a[x].b) previously did not capture a, even though that is required to correctly evaluate the body of the lambda. This now works as expected.
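
A small sketch of the affected pattern follows. The event layout is hypothetical: it assumes that a is a list of records and idx a list of positions into it, so that the lambda body has to read a from the enclosing event.

```tql
// Hypothetical event: `a` is a list of records, `idx` holds positions into `a`.
from {
  a: [{b: 10}, {b: 20}],
  idx: [1, 0],
}
// The body indexes into `a` with the lambda parameter `x`,
// so `a` must be captured for the lambda to evaluate correctly.
picked = idx.map(x => a[x].b)
```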

By @jachris in #5538.