OCSF Trim & Derive

This release introduces two new powerful OCSF operators that automate enum derivation and provide intelligent field trimming. The update also includes string padding functions, better HTTP requests, IP categorization and much more!

Download the release on GitHub.

Features

Improved node robustness

We added an experimental feature to run node-independent operators of a pipeline in dedicated subprocesses. This brings improved error resilience and resource utilization. You can opt-in to this feature with the setting tenzir.disable-pipeline-subprocesses: false in tenzir.yaml. We plan to enable this feature by default in the future.

By @tobim in #5233.

Operations on concatenated secrets

You can now arbitrarily nest operations on secrets. This is useful for APIs that expect authentication is an encoded blob:

let $headers = {
  auth: f"{secret("user")}:{secret("password")}".encode_base64()
}

By @IyeOnline in #5324.

New string padding functions

Ever tried aligning threat actor names in your incident reports? Or formatting CVE IDs with consistent spacing for your vulnerability dashboard? We’ve all been there, fighting with inconsistent string lengths that make our security tools output look like alphabet soup. 🍲

Meet your new formatting friends: pad_start() and pad_end()!

Live Threat Feed Dashboard

Create a real-time threat indicator board with perfectly aligned columns:

from {time: "14:32", actor: "APT29", target: "energy", severity: 9},
     {time: "14:35", actor: "Lazarus", target: "finance", severity: 10},
     {time: "14:41", actor: "APT1", target: "defense", severity: 8}
select threat_line = time + " │ " + actor.pad_end(12) + " │ " +
                     target.pad_end(10) + " │ " + severity.string().pad_start(2, "0")
write_lines

14:32 │ APT29        │ energy     │ 09
14:35 │ Lazarus      │ finance    │ 10
14:41 │ APT1         │ defense    │ 08

CVE Priority Matrix

Format CVE IDs and CVSS scores for your vulnerability management system:

from {cve: "CVE-2024-1337", score: 9.8, vector: "network", status: "🔴"},
     {cve: "CVE-2024-42", score: 7.2, vector: "local", status: "🟡"},
     {cve: "CVE-2024-31415", score: 5.4, vector: "physical", status: "🟢"}
select priority = status + " " + cve.pad_end(16) + " [" +
                  score.string().pad_start(4) + "] " + vector.pad_start(10, "·")
write_lines

🔴 CVE-2024-1337    [ 9.8] ···network
🟡 CVE-2024-42      [ 7.2] ·····local
🟢 CVE-2024-31415   [ 5.4] ··physical

Network Flow Analysis

Build clean firewall logs with aligned source/destination pairs:

from {src: "10.0.0.5", dst: "8.8.8.8", proto: "DNS", bytes: 234},
     {src: "192.168.1.100", dst: "13.107.42.14", proto: "HTTPS", bytes: 8924},
     {src: "172.16.0.50", dst: "185.199.108.153", proto: "SSH", bytes: 45812}
select flow = src.pad_start(15) + " → " + dst.pad_start(15) +
              " [" + proto.pad_end(5) + "] " + bytes.string().pad_start(7) + " B"
write_lines

       10.0.0.5 →         8.8.8.8 [DNS  ]     234 B
  192.168.1.100 →    13.107.42.14 [HTTPS]    8924 B
    172.16.0.50 → 185.199.108.153 [SSH  ]   45812 B

Both padding functions accept three parameters:

String to pad (required)
Target length (required)
Padding character (optional, defaults to space)

If your string is already longer than the target length, it returns unchanged. Multi-character padding? That’s a paddlin’ (returns an error).

Your SOC dashboards never looked so clean! 🎯

By @mavam in #5344.

Sinks in HTTP Parsing Pipelines

Parsing pipeline in the from_http and http operators now support sinks. This worked already in from_file parsing pipelines and now works, as expected, also in the HTTP parsing pipelines. For example, you can now write:

from_http "https://cra.circl.lu/opendata/geo-open/mmdb-country-asn/latest.mmdb" {
  context::load "geo-open-country-asn"
}

By @mavam in #5343.

HTTP request body encoding

The from_http and http operators now support using record values for the request body parameter. By default, the record is serialized as JSON. You can also specify encode="form" to send the body as URL-encoded form data. When using form encoding, nested fields are flattened using dot notation (e.g., foo: {bar: "baz"} => foo.bar=baz). This supersedes the payload parameter, which therefore is now deprecated.

Examples

By default, setting body to a record will JSON-encode it:

http "https://api.example.com/data", body={foo: "bar", count: 42}

POST /data HTTP/1.1
Host: api.example.com
Content-Type: application/json
Content-Length: 33

{
  "foo": "bar",
  "count": 42
}

To change the encoding, you can use the encode option:

http "https://api.example.com/data",
  body={foo: {bar: "baz"}, count: 42},
  encode="form"

POST /data HTTP/1.1
Host: api.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 20

foo.bar=baz&count=42

Arbitrary body contents can be sent by using a string or blob:

http "https://api.example.com/data", body="hello world!"

POST /data HTTP/1.1
Host: api.example.com
Content-Length: 12

hello world!

By @raxyte in #5305.

IP address categorization functions

Ever wondered if that suspicious traffic is coming from inside the corporate network? 🏢 We’ve got you covered with a new suite of IP address classification functions that make network analysis a breeze.

is_private() - Quickly spot internal RFC 1918 addresses in your logs. Perfect for identifying lateral movement or distinguishing between internal and external threats:

where src_ip.is_private() and dst_ip.is_global()
// Catch data exfiltration attempts from your internal network

is_global() - Find publicly routable addresses. Essential for tracking external attackers or monitoring outbound connections:

where src_ip.is_global() and failed_login_count > 5
// Detect brute force attempts from the internet

is_multicast() - Identify multicast traffic (224.0.0.0/4, ff00::/8). Great for spotting mDNS, SSDP, and other broadcast protocols that shouldn’t cross network boundaries:

where dst_ip.is_multicast() and src_ip.is_global()
// Flag suspicious multicast from external sources

is_link_local() - Detect link-local addresses (169.254.0.0/16, fe80::/10). Useful for identifying misconfigurations or APIPA fallback:

where server_ip.is_link_local()
// Find services accidentally binding to link-local addresses

is_loopback() - Spot loopback addresses (127.0.0.0/8, ::1). Hunt for suspicious local connections or tunneled traffic:

where src_ip != dst_ip and dst_ip.is_loopback()
// Unusual loopback connections might indicate malware

ip_category() - Get the complete classification in one shot. Returns: “global”, “private”, “multicast”, “link_local”, “loopback”, “broadcast”, or “unspecified”:

where src_ip.ip_category() == "private" and dst_ip.ip_category() == "multicast"
// Analyze traffic patterns by IP category

These functions work seamlessly with both IPv4 and IPv6 addresses, making them future-proof for your dual-stack environments. Happy hunting! 🔍

By @mavam in #5336.

`ocsf::trim` and `ocsf::derive`

Tenzir now provides two new operators for processing OCSF events:

ocsf::derive automatically assigns enum strings from their integer counterparts and vice versa. It performs bidirectional enum derivation for OCSF events and validates consistency between existing enum values.

from {
  activity_id: 1,
  class_uid: 1001,
  metadata: {version: "1.5.0"},
}
ocsf::derive

This transforms the event to include the derived activity_name: "Create" and class_name: "File System Activity" fields.

ocsf::trim intelligently removes fields from OCSF events to reduce data size while preserving essential information. You can also have explicit control over optional and recommended field removal.

from {
  class_uid: 3002,
  class_name: "Authentication",
  user: {
    name: "alice",
    display_name: "Alice",
  },
  status: "Success",
}
ocsf::trim

This removes non-essential fields like class_name and user.display_name while keeping critical information intact.

By @jachris in #5330.

Compression for `write_bitz`

Tenzir’s internal wire format, which is accessible through the read_bitz and write_bitz operators, now uses Zstd compression internally, resulting in a significantly smaller output size. This change is backwards-compatible.

By @dominiklohmann in #5335.

Changes

Better query optimization

Previously, queries that used export followed by a where that used fields such as this["field name"] were not optimized. Now, the same optimizations apply as with normal fields, improving the performance of such queries.

By @jachris in #5362.

Improved `join` behavior

The join function now also works with empty lists that are typed as list<null>. Furthermore, it now emits more helpful warnings.

By @jachris in #5356.

Respecting error responses from Azure Log Analytics

The to_azure_log_analytics operator now emits an error when it receives any response considering an internal error. Those normally indicate configuration errors and the pipeline will now stop with an error instead of continuing to send data that will not be received correctly.

By @tobim in #5314.

Renamed `to_asl`

We renamed our Amazon Security Lake integration operator from to_asl to to_amazon_security_lake. The old name is now deprecated and will be removed in the future.

By @IyeOnline in #5340.

`kv` parser no longer produces empty fields

Our Key-Value parsers (the read_kv operator and parse_kv function) previously produced empty values if the value_split was not found.

With this change, a “field” missing a value_split is considered an extension of the previous fields value instead:

from \
  {input: "x=1 y=2 z=3 4 5 a=6"},
this = { ...input.parse_kv() }

Previous result:

{x:1, y:2, z:"3", "4":"", "5":"", a:6}

New result:

{x:1, y:2, z:"3 4 5", a:6}

By @IyeOnline in #5313.

Bug Fixes

Non-default databases in `to_clickhouse`

The to_clickhouse operator erroneously rejected table arguments of the form database_name.table_name. This is now fixed, allowing you to write to non-default databases.

By @IyeOnline in #5355.

Remove file size limit from Amazon Security Lake Integration

We removed the 256MB file size limit from the Amazon Security Lake integration.

By @IyeOnline in #5340.

Newlines before `else`

Previously, the if … { … } else { … } construct required that there was no newline before else. This restriction is now lifted, which allows placing else at the beginning of the line:

if x { … }
else if y { … }
else { … }

By @jachris in #5348.

Fixed `encrypt_cryptopan` function

We fixed a bug that sometimes caused the encrypt_cryptopan function to fail with the error “got ip, expected ip”, which was caused by an incorrect type check. The function now works as expected again.

By @dominiklohmann in #5345.

Fixed `list_separator` option name in `print_csv`

The print_csv, print_ssv and print_tsv functions had an option incorrectly named field_separator. Instead, these functions have an option list_separator now, allowing you to change the list separator. You cannot set a custom field_separator on these functions. If you want to print with custom field_separators, use print_xsv instead.

By @IyeOnline in #5357.

Fix `context::create_geoip` without `db_path`

The context::create_geoip operator failed with a message_mismatch error when no db_path option was provided. This was caused by an internal serialization error, which we now fixed. This is the only known place where this error occurred.

By @dominiklohmann in #5342.

Fix `http` operator pagination

The http operator dropped all provided HTTP headers after the first request when performing paginated requests. The operator now preserves the headers for all requests.

By @mavam in #5332.