🚀 Features
Implement a fallback parser mechanism for extensions that don’t have …
Aug 4, 2023 · @Dakostu · #3422
The json parser now serves as a fallback parser for all files whose
extensions do not have a default parser in Tenzir.
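The fallback rule can be pictured as a lookup with a default. The following is only a sketch: the extension table and the `pick_parser` helper are hypothetical and do not reflect Tenzir's actual defaults or code.

```python
from pathlib import Path

# Hypothetical extension-to-parser table, for illustration only.
# The real default parsers are defined by Tenzir, not by this sketch.
DEFAULT_PARSERS = {".csv": "csv", ".pcap": "pcap", ".yaml": "yaml"}


def pick_parser(path: str) -> str:
    """Return the parser for a file, falling back to json when none is known."""
    extension = Path(path).suffix.lower()
    return DEFAULT_PARSERS.get(extension, "json")
```

With this table, `pick_parser("events.log")` and `pick_parser("events")` both yield `"json"` because neither extension has an entry.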
Add show operator
The new show source operator makes it possible to gather meta information
about Tenzir. For example, the provided introspection capabilities allow for
emitting existing formats, connectors, and operators.
Improve metrics (and some other things)
Jul 24, 2023 · @dominiklohmann · #3390
The sort operator now also works for ip and enum fields.
tenzir --dump-metrics '<pipeline>' prints a performance overview of the
executed pipeline on stderr at the end.
Expose pipeline operator metrics in execution node and pipeline executor
Jul 24, 2023 · @Dakostu · #3376
Pipeline metrics (total ingress/egress amount and average rate per second) are
now visible in the pipeline-manager, via the metrics field in the
/pipeline/list endpoint result.
Expose the batch operator underlying rebuild
Jul 24, 2023 · @dominiklohmann · #3391
The batch <limit> operator allows expert users to control batch sizes in
pipelines explicitly.
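As a rough mental model, and assuming `<limit>` is the maximum number of events per batch, rebatching a stream looks like the sketch below. The `batch` function here is an illustration in Python, not the operator's implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch(events: Iterable[T], limit: int) -> Iterator[List[T]]:
    """Regroup a stream of events into batches of at most `limit` events."""
    if limit < 1:
        raise ValueError("limit must be positive")
    iterator = iter(events)
    # Pull up to `limit` events at a time until the stream is exhausted.
    while chunk := list(islice(iterator, limit)):
        yield chunk
```

Only the last batch may be smaller than the limit.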
Add --append and --real-time to directory saver
The directory saver now supports the two arguments -a|--append and
-r|--realtime that have the same semantics as they have for the file saver:
open files in the directory in append mode (instead of overwriting) and flush
the output buffers on every update.
Add colors to JSON printer
The json printer can now colorize its output when given the
-C|--color-output option. The -M|--monochrome-output option explicitly
disables coloring.
Revamp packet acquisition and parsing
The new nic plugin provides a loader that acquires packets from a network
interface card using libpcap. It emits chunks of data in the PCAP file format so
that the pcap parser can process them as if packets come from a trace file.
The new decapsulate operator processes events of type pcap.packet and emits
new events of type tenzir.packet that contain the decapsulated PCAP packet
with packet header fields from the link, network, and transport layer. The
operator also computes a Community ID.
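For readers unfamiliar with Community ID, here is a minimal sketch of the publicly specified version 1 scheme for TCP and UDP flows. It is an independent illustration, not the decapsulate operator's code; the function name and the default seed of 0 are assumptions, and ICMP port mapping is left out.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(src_ip, dst_ip, src_port, dst_port, proto, seed=0):
    """Compute a Community ID v1 string for a TCP/UDP flow tuple."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    # Order the endpoints so both directions of a flow hash identically.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port
    data = (
        struct.pack("!H", seed)  # 2-byte seed, network byte order
        + src
        + dst
        + struct.pack("!BBHH", proto, 0, src_port, dst_port)  # proto, pad, ports
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")
```

Because the endpoints are ordered before hashing, a request and its response map to the same identifier, which is what makes the value useful for correlating events across tools.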
Use load - and read json as implicit sources
Jul 10, 2023 · @dominiklohmann · #3329
Pipelines executed locally with tenzir now use load - and read json as
implicit sources. This complements save - and write json --pretty as
implicit sinks.
Implement the unflatten operator
Jul 7, 2023 · @Dakostu · #3304
The unflatten [<separator>] operator unflattens data structures by creating
nested records out of fields whose names contain a <separator>.
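The effect of unflattening can be sketched in a few lines of Python. This is an illustration of the idea only: the default separator of `.` is an assumption here, and conflicting keys (for example `a` and `a.b` in the same record) are not handled.

```python
def unflatten(record: dict, separator: str = ".") -> dict:
    """Build nested records from field names that contain the separator."""
    result = {}
    for key, value in record.items():
        *parents, leaf = key.split(separator)
        node = result
        # Walk (and create) the chain of nested records for this field.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result
```

For example, `{"id.orig_h": "10.0.0.1", "id.resp_p": 80}` becomes a record with a single nested `id` record holding `orig_h` and `resp_p`.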
Add a --schema option to the JSON parser
Jul 4, 2023 · @dominiklohmann · #3295
The --schema option for the JSON parser allows for setting the target schema
explicitly by name.
Unroll the Zeek TSV header parsing loop
Jul 4, 2023 · @Dakostu · #3291
The zeek-tsv parser sometimes failed to parse Zeek TSV logs, wrongly
reporting that the header ended too early. This bug no longer exists.
Fix sporadic stalling of pipelines
Jun 30, 2023 · @dominiklohmann · #3264
The pipeline manager now accepts empty strings for the optional name. The
/create endpoint returns a list of diagnostics if pipeline creation fails,
and if start_when_created is set, the endpoint now returns only after the
pipeline execution has been fully started. The /list endpoint now returns
the diagnostics collected for every pipeline so far. The /delete endpoint
now returns an empty object if the request is successful.
Change summarize to operate across schemas
Jun 23, 2023 · @jachris · #3250
The summarize operator now works across multiple schemas and can combine
events of different schemas into one group. It now also treats missing columns
as having null values.
The by clause of summarize is now optional. If it is omitted, all events
are assigned to the same group.
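The grouping behavior described above can be sketched as follows. The helper `summarize_sum` is hypothetical and models only a sum aggregation; that null values are skipped while summing is an assumption of this sketch, not a statement about Tenzir's aggregation functions.

```python
from collections import defaultdict


def summarize_sum(events, field, by=None):
    """Sum `field` per group; missing columns are treated as null (None)."""
    groups = defaultdict(list)
    for event in events:
        # Without a `by` column, every event lands in the same group.
        key = event.get(by) if by is not None else None
        groups[key].append(event.get(field))
    return {
        key: sum(value for value in values if value is not None)
        for key, values in groups.items()
    }
```

Events of different shapes can share a group: an event lacking the `by` column is grouped under null, and an event lacking the aggregated field contributes a null value.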
Implement top and rare
Jun 21, 2023 · @dominiklohmann · #3176
The top <field> operator makes it easy to find the most common values for the
given field. Likewise, rare <field> returns the least common values for the
given field.
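Conceptually, both operators count the values of a field and order them by frequency. The Python sketch below illustrates that idea; the `limit` parameter and the output shape are assumptions made for the example and are not taken from the operators' interface.

```python
from collections import Counter


def top(events, field, limit=10):
    """Return the most common values of `field` with their counts."""
    counts = Counter(event.get(field) for event in events)
    return counts.most_common(limit)


def rare(events, field, limit=10):
    """Return the least common values of `field` with their counts."""
    counts = Counter(event.get(field) for event in events)
    return sorted(counts.items(), key=lambda item: item[1])[:limit]
```

Both helpers differ only in sort direction, which mirrors how the two operators are described as counterparts.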
Add diagnostics (and some other improvements)
Jun 20, 2023 · @jachris · #3223
In addition to tenzir "<pipeline>", there now is tenzir -f <file>, which
loads and executes the pipeline defined in the given file.
The pipeline parser now emits helpful and visually pleasing diagnostics.
Implement the serve operator and /serve endpoint
Jun 1, 2023 · @dominiklohmann · #3180
The serve operator and /serve endpoint supersede the experimental /query
endpoint. The operator is a sink for events, and bridges a pipeline into a
RESTful interface from which events can be pulled incrementally.
Rename #type to #schema and introduce #schema_id
Jun 1, 2023 · @jachris · #3183
The new #schema_id meta extractor returns a unique fingerprint for the schema.
Apply the changes from the new pipeline_manager plugin
May 30, 2023 · @Dakostu · #3164
The new pipeline-manager is a proprietary plugin that allows for creating, updating and persisting pipelines. The included RESTful interface allows for easy access and modification of these pipelines.
A collection of minor UX improvements
The --timeout option for the vast status command allows for defining how
long VAST waits for components to report their status. The option defaults to 10
seconds.
PRs 3128-3173-3193
May 17, 2023 · @dominiklohmann · #3128
The sink operator import persists events in a VAST node.
The source operator export retrieves events from a VAST node.
The repeat operator repeats its input a given number of times.
Implement a sort operator
May 17, 2023 · @dominiklohmann · #3155
The new sort operator allows for arranging events by field, in ascending and
descending order. The current version is still “beta” and has known limitations.
Add a --cumulative option to the measure operator
May 17, 2023 · @dominiklohmann · #3156
The measure operator now returns running totals with the --cumulative
option.
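The difference between per-batch values and running totals can be shown with a small sketch. It assumes that the non-cumulative output is one event count per batch; the `measure` function below is an illustration, not the operator itself.

```python
from itertools import accumulate
from typing import Iterable, List


def measure(batch_sizes: Iterable[int], cumulative: bool = False) -> List[int]:
    """Report per-batch event counts, or running totals when cumulative."""
    sizes = list(batch_sizes)
    # accumulate() turns [3, 5, 2] into the running totals [3, 8, 10].
    return list(accumulate(sizes)) if cumulative else sizes
```

With batches of 3, 5, and 2 events, the plain output is one count per batch, while the cumulative output grows monotonically to the total of 10.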
Add the enumerate operator
The new enumerate operator prepends a column with the row number of the input
records.
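In Python terms, the effect resembles the sketch below. The column name `#` and the starting index of 0 are assumptions chosen for the example; the changelog entry does not specify either.

```python
def enumerate_events(events, field="#"):
    """Prepend a row-number column to every event (0-based in this sketch)."""
    # Building the dict with the counter first keeps it as the leading column.
    return [{field: index, **event} for index, event in enumerate(events)]
```

Since Python dictionaries preserve insertion order, the row number ends up as the first column of each record.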
Avoid crashing when reading a pre-2.0 partition
Mar 16, 2023 · @dominiklohmann · #3018
The flatten [<separator>] operator flattens nested data structures by joining
nested records with the specified separator (defaults to .) and merging lists.
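The record-joining part of flattening can be sketched as a short recursive function. This illustration covers nested records only; the list merging mentioned above is not modeled, and lists are passed through unchanged.

```python
def flatten(record: dict, separator: str = ".") -> dict:
    """Join nested record keys with the separator into a single flat record."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse into the nested record and prefix its keys.
            for sub_key, sub_value in flatten(value, separator).items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat
```

For example, a record with a nested `id` record containing `orig_h` yields the flat field `id.orig_h`.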
🔧 Changes
Improve low-load memory consumption
The default interval between two automatic rebuilds is now set to 2 hours and
can be configured with the rebuild-interval option.
Transform read and write into parse and print
Jul 18, 2023 · @jachris · #3365
The parse and print operators have been renamed to read and write,
respectively. The read ... [from ...] and write ... [to ...] operators
are not available anymore. If you did not specify a connector, you can
continue using read ... and write ... in many cases. Otherwise, use
from ... [read ...] and to ... [write ...] instead.
Add colors to JSON printer
We removed the --pretty option from the json printer. This option is now the
default. To switch to NDJSON, use -c|--compact-output.
Remove previously deprecated options
Jul 13, 2023 · @dominiklohmann · #3358
The previously deprecated options tenzir.pipelines (replaced with
tenzir.operators) and tenzir.pipeline-triggers (no replacement) no longer
exist.
The previously deprecated types addr, count, int, and real
(replaced with ip, uint64, int64, and double, respectively) no longer
exist.
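When updating existing schema definitions, the type renames listed above amount to a simple lookup. The sketch below assumes a schema is available as a mapping from field name to type name; the `migrate_types` helper is hypothetical and not part of Tenzir.

```python
# Old type names and their replacements, as listed in this entry.
TYPE_RENAMES = {"addr": "ip", "count": "uint64", "int": "int64", "real": "double"}


def migrate_types(fields: dict) -> dict:
    """Replace removed type names, leaving all other types untouched."""
    return {
        name: TYPE_RENAMES.get(type_name, type_name)
        for name, type_name in fields.items()
    }
```

Types that were never deprecated, such as `string`, pass through unchanged.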
Revamp packet acquisition and parsing
We reimplemented the old pcap plugin as a format. The command tenzir-ctl import pcap no longer works. Instead, the new pcap plugin provides a parser
that emits pcap.packet events, as well as a printer that generates a PCAP file
when provided with these events.
Tune defaults and demo-node experience
We reduced the default batch-timeout from ten seconds to one second in order to
improve the user experience of interactive pipelines with data acquisition.
We reduced the default active-partition-timeout from 5 minutes to 30 seconds
to reduce the time until data is persisted.
Delete delete_when_stopped from the pipeline manager
Jul 4, 2023 · @jachris · #3292
The delete_when_stopped flag was removed from the pipeline manager REST API.
Change summarize to operate across schemas
Jun 23, 2023 · @jachris · #3250
The aggregation functions in a summarize operator can now receive only a
single extractor instead of multiple ones.
The behavior for absent columns and aggregations across multiple schemas was changed.
Add diagnostics (and some other improvements)
Jun 20, 2023 · @jachris · #3223
We changed the default connector of read <format> and write <format> for
all formats to stdin and stdout, respectively.
We removed language plugins in favor of operator-based integrations.
The interface of the operator, loader, parser, printer and saver plugins was changed.
Rename package artifacts from vast to tenzir
The Debian package for Tenzir replaces previous VAST installations and attempts
to migrate existing data from VAST to Tenzir in the process. You can opt out of
this migration by creating the file /var/lib/vast/disable-migration.
Remove the prefix() function from the REST endpoint plugin API
We removed the rest_endpoint_plugin::prefix() function from
the public API of the rest_endpoint_plugin class. For a migration,
existing users should prepend the prefix manually to all endpoints
defined by their plugin.
Rename default database directory to tenzir.db
Jun 9, 2023 · @dominiklohmann · #3212
The default database directory moved from vast.db to tenzir.db. Use the
option tenzir.db-directory to manually set the database directory path.
Remove lsvast
Jun 9, 2023 · @dominiklohmann · #3211
The debugging utility lsvast no longer exists. Pipelines replace most of its
functionality.
Change Arrow extension type and metadata prefixes
Jun 9, 2023 · @dominiklohmann · #3208
We now register extension types as tenzir.ip, tenzir.subnet, and
tenzir.enumeration instead of vast.address, vast.subnet, and
vast.enumeration, respectively. Arrow schema metadata now has a TENZIR:
prefix instead of a VAST: prefix.
Switch /status to POST
The HTTP method of the status endpoint in the experimental REST API is now POST.
Introduce the tenzir and tenzird binaries
Jun 2, 2023 · @dominiklohmann · #3187
VAST is now called Tenzir. The tenzir binary replaces vast exec to execute a
pipeline. The tenzird binary replaces vast start to start a node. The
tenzirctl binary continues to offer all functionality that vast previously
offered until all commands have been migrated to pipeline operators.
Implement the serve operator and /serve endpoint
Jun 1, 2023 · @dominiklohmann · #3180
The default port of the web plugin changed from 42001 to 5160. This change avoids collisions from dynamic port allocation on Linux systems.
Rename #type to #schema and introduce #schema_id
Jun 1, 2023 · @jachris · #3183
The #type meta extractor was renamed to #schema.
Remove old commands
May 25, 2023 · @dominiklohmann · #3166
The stop command no longer exists. Shut down VAST nodes using CTRL-C instead.
The version command no longer exists. Use the more powerful version pipeline
operator instead.
The spawn source and spawn sink commands no longer exist. To import data
remotely, run a pipeline in the form of remote from … | … | import, and to
export data remotely, run a pipeline in the form of export | … | remote to ….
The lower-level peer, kill, and send commands no longer exist.
🐞 Bug Fixes
Fix shutdown of sources and importer
Jun 8, 2023 · @dominiklohmann · #3207
Import processes sometimes failed to shut down automatically when the node exited. They now shut down reliably.
Fix reconnect attempts for remote pipelines
Jun 2, 2023 · @dominiklohmann · #3188
Starting a remote pipeline with vast exec failed when the node was not
reachable yet. Like other commands, executing a pipeline now waits until the
node is reachable before starting.
Add a changelog entry for the compaction fix
We fixed a bug in the compaction plugin that prevented it from applying the configured weights when it was used for the first time on a database.
Fix rare crash when transforming sliced nested arrays
May 25, 2023 · @dominiklohmann · #3171
Transformation operators like summarize, sort, put, extend, or
replace no longer crash sporadically when they reference a nested field
after a preceding head or tail operator.
The tail operator sometimes returned more events than specified. This no
longer happens.