Download the release on GitHub.
Features
Add the enumerate operator
The new enumerate operator prepends a column with the row number of the input records.
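The effect of enumerate can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Tenzir's implementation; the column name "#" and the starting index 0 are assumptions made for the example.

```python
from typing import Iterable, Iterator


def enumerate_rows(records: Iterable[dict], field: str = "#", start: int = 0) -> Iterator[dict]:
    """Yield each record with a row-number column placed first.

    The default column name and start index are assumptions for this sketch.
    """
    for number, record in enumerate(records, start=start):
        # Building a new dict puts the row-number key ahead of the existing keys.
        yield {field: number, **record}


rows = list(enumerate_rows([{"src": "10.0.0.1"}, {"src": "10.0.0.2"}]))
```

Each output record carries the row number as its first key, followed by the original fields in their original order.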
Add colors to JSON printer
The json printer now colorizes its output when given the -C|--color-output option. The -M|--monochrome-output option explicitly disables coloring.
Add show operator
The new show source operator makes it possible to gather meta information about Tenzir. For example, the provided introspection capabilities allow for emitting existing formats, connectors, and operators.
Rename #type to #schema and introduce #schema_id
The new #schema_id meta extractor returns a unique fingerprint for the schema.
Expose the batch operator underlying rebuild
The batch <limit> operator allows expert users to control batch sizes in pipelines explicitly.
By @dominiklohmann in #3391.
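As a rough mental model of explicit batching, the following Python sketch regroups a stream of events into chunks. It assumes that the limit is an upper bound on the number of events per batch; the function name and that interpretation are illustrative, not taken from Tenzir.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def rebatch(events: Iterable[dict], limit: int) -> Iterator[List[dict]]:
    """Regroup events into batches holding at most `limit` events each."""
    if limit < 1:
        raise ValueError("limit must be positive")
    iterator = iter(events)
    while True:
        chunk = list(islice(iterator, limit))
        if not chunk:
            return
        yield chunk


sizes = [len(chunk) for chunk in rebatch(({"n": i} for i in range(5)), 2)]
```

With five input events and a limit of two, the sketch produces batches of two, two, and one event.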
Revamp packet acquisition and parsing
The new nic plugin provides a loader that acquires packets from a network interface card using libpcap. It emits chunks of data in the PCAP file format so that the pcap parser can process them as if the packets came from a trace file.
The new decapsulate operator processes events of type pcap.packet and emits new events of type tenzir.packet that contain the decapsulated PCAP packet with packet header fields from the link, network, and transport layer. The operator also computes a Community ID.
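For readers unfamiliar with Community ID, here is a minimal Python sketch of the version 1 flow hash for TCP and UDP flows, written from the public Community ID specification. It is not Tenzir's code, it omits the ICMP port mapping, and the function name is hypothetical.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    proto: int, seed: int = 0) -> str:
    """Compute a Community ID v1 string for a TCP/UDP flow (sketch only)."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    # Order the endpoints so that both directions of a flow hash identically.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port
    # Layout: seed, source address, destination address, protocol,
    # one padding byte, source port, destination port (network byte order).
    data = (struct.pack("!H", seed) + src + dst
            + struct.pack("!BBHH", proto, 0, src_port, dst_port))
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


forward = community_id_v1("128.232.110.120", "66.35.250.204", 34855, 80, proto=6)
reverse = community_id_v1("66.35.250.204", "128.232.110.120", 80, 34855, proto=6)
```

The key property is that forward and reverse are equal, which lets events from different tools be correlated by flow regardless of packet direction.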
Add --append and --real-time to directory saver
The directory saver now supports the two arguments -a|--append and -r|--realtime that have the same semantics as they have for the file saver: open files in the directory in append mode (instead of overwriting) and flush the output buffers on every update.
Use load - and read json as implicit sources
Pipelines executed locally with tenzir now use load - and read json as implicit sources. This complements save - and write json --pretty as implicit sinks.
By @dominiklohmann in #3329.
Fix sporadic stalling of pipelines
The pipeline manager now accepts empty strings for the optional name. The /create endpoint returns a list of diagnostics if pipeline creation fails, and if start_when_created is set, the endpoint now returns only after the pipeline execution has been fully started. The /list endpoint now returns the diagnostics collected for every pipeline so far. The /delete endpoint now returns an empty object if the request is successful.
By @dominiklohmann in #3264.
Add a --schema option to the JSON parser
The --schema option for the JSON parser allows for setting the target schema explicitly by name.
By @dominiklohmann in #3295.
Expose pipeline operator metrics in execution node and pipeline executor
Pipeline metrics (total ingress/egress amount and average rate per second) are now visible in the pipeline-manager, via the metrics field in the /pipeline/list endpoint result.
Implement top and rare
The top <field> operator makes it easy to find the most common values for the given field. Likewise, rare <field> returns the least common values for the given field.
By @dominiklohmann in #3176.
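Conceptually, top and rare count how often each value of a field occurs and order the result by that count. The Python sketch below illustrates the idea with hypothetical helper functions; the output shape and the handling of events that lack the field are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def value_counts(events: Iterable[dict], field: str) -> Counter:
    """Count occurrences of each value of `field`, skipping events without it."""
    return Counter(event[field] for event in events if field in event)


def top(events: Iterable[dict], field: str) -> List[Tuple[object, int]]:
    """Most common values first."""
    return sorted(value_counts(events, field).items(), key=lambda kv: kv[1], reverse=True)


def rare(events: Iterable[dict], field: str) -> List[Tuple[object, int]]:
    """Least common values first."""
    return sorted(value_counts(events, field).items(), key=lambda kv: kv[1])


events = [{"proto": p} for p in ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]]
most_common = top(events, "proto")
least_common = rare(events, "proto")
```

Here most_common starts with tcp (three occurrences) and least_common starts with icmp (one occurrence).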
Implement the unflatten operator
The unflatten [<separator>] operator unflattens data structures by creating nested records out of fields whose names contain a <separator>.
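The transformation can be sketched in Python as follows. This is an illustration only: the default separator of "." is an assumption, and conflicts such as a field named a alongside a field named a.b are not handled.

```python
def unflatten(record: dict, separator: str = ".") -> dict:
    """Turn fields like 'id.orig_h' into nested records (sketch only)."""
    result: dict = {}
    for key, value in record.items():
        # Everything before the last separator names the enclosing records.
        *parents, leaf = key.split(separator)
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


nested = unflatten({"id.orig_h": "10.0.0.1", "id.resp_p": 80, "proto": "tcp"})
```

Fields that do not contain the separator, such as proto above, stay at the top level.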
Implement a sort operator
The new sort operator allows for arranging events by field, in ascending and descending order. The current version is still “beta” and has known limitations.
By @dominiklohmann in #3155.
Add a --cumulative option to the measure operator
The measure operator now returns running totals with the --cumulative option.
By @dominiklohmann in #3156.
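A running total is simply the prefix sum of the individual measurements. The following Python sketch assumes, for illustration, that each measurement is an event count for one batch; the numbers are made up.

```python
from itertools import accumulate

# Hypothetical per-batch event counts, as reported without --cumulative.
batch_sizes = [100, 250, 50]

# With running totals, each value includes all batches seen so far.
running_totals = list(accumulate(batch_sizes))  # [100, 350, 400]
```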
Change summarize to operate across schemas
The summarize operator now works across multiple schemas and can combine events of different schemas into one group. It now also treats missing columns as having null values.
The by clause of summarize is now optional. If it is omitted, all events are assigned to the same group.
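The grouping rules described above can be illustrated with a small Python sketch. It uses a sum aggregation, represents null as None, and assumes that the aggregation skips null values; the function name and that last assumption are not taken from Tenzir.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def summarize_sum(events: Iterable[dict], value_field: str,
                  by: Optional[str] = None) -> Dict[object, int]:
    """Sum `value_field` per group; events from any schema may share a group."""
    groups = defaultdict(list)
    for event in events:
        # Without a `by` field, every event lands in the same group.
        key = event.get(by) if by is not None else None
        # A missing column is treated as a null value.
        groups[key].append(event.get(value_field))
    return {key: sum(v for v in values if v is not None)
            for key, values in groups.items()}


events = [
    {"host": "a", "bytes": 10},   # one schema
    {"host": "a", "query": "x"},  # another schema without `bytes`
    {"host": "b", "bytes": 5},
]
per_host = summarize_sum(events, "bytes", by="host")
overall = summarize_sum(events, "bytes")
```

The second event has no bytes column, yet it still joins the group for host a and contributes a null value.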
Add diagnostics (and some other improvements)
In addition to tenzir "<pipeline>", there is now tenzir -f <file>, which loads and executes the pipeline defined in the given file.
The pipeline parser now emits helpful and visually pleasing diagnostics.
Implement the serve operator and /serve endpoint
The serve operator and /serve endpoint supersede the experimental /query endpoint. The operator is a sink for events, and bridges a pipeline into a RESTful interface from which events can be pulled incrementally.
By @dominiklohmann in #3180.
Apply the changes from the new pipeline_manager plugin
The new pipeline-manager is a proprietary plugin that allows for creating, updating and persisting pipelines. The included RESTful interface allows for easy access and modification of these pipelines.
Implement a fallback parser mechanism for extensions that don’t have …
The json parser now serves as a fallback parser for all files whose extensions do not have a default parser in Tenzir.
Avoid crashing when reading a pre-2.0 partition
The flatten [<separator>] operator flattens nested data structures by joining nested records with the specified separator (defaults to .) and merging lists.
By @dominiklohmann in #3018.
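The record-joining part of flatten can be sketched in Python as shown below. This is an illustration of the described behavior rather than Tenzir's implementation, and it deliberately leaves out the merging of lists.

```python
def flatten(record: dict, separator: str = ".") -> dict:
    """Join nested record keys with `separator` (list merging is not covered)."""
    flat: dict = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse and prefix every inner key with the outer key.
            for sub_key, sub_value in flatten(value, separator).items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


flat = flatten({"id": {"orig_h": "10.0.0.1", "resp_p": 80}, "proto": "tcp"})
```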
PRs 3128-3173-3193
The sink operator import persists events in a VAST node.
The source operator export retrieves events from a VAST node.
The repeat operator repeats its input a given number of times.
By @dominiklohmann in #3128.
Improve metrics (and some other things)
The sort operator now also works for ip and enum fields.
tenzir --dump-metrics '<pipeline>' prints a performance overview of the executed pipeline on stderr at the end.
By @dominiklohmann in #3390.
A collection of minor UX improvements
The --timeout option for the vast status command allows for defining how long VAST waits for components to report their status. The option defaults to 10 seconds.
Unroll the Zeek TSV header parsing loop
The zeek-tsv parser sometimes failed to parse Zeek TSV logs, wrongly reporting that the header ended too early. This bug no longer exists.
Changes
Rename package artifacts from vast to tenzir
The Debian package for Tenzir replaces previous VAST installations and attempts to migrate existing data from VAST to Tenzir in the process. You can opt out of this migration by creating the file /var/lib/vast/disable-migration.
Change Arrow extension type and metadata prefixes
We now register extension types as tenzir.ip, tenzir.subnet, and tenzir.enumeration instead of vast.address, vast.subnet, and vast.enumeration, respectively. Arrow schema metadata now has a TENZIR: prefix instead of a VAST: prefix.
By @dominiklohmann in #3208.
Introduce the tenzir and tenzird binaries
VAST is now called Tenzir. The tenzir binary replaces vast exec to execute a pipeline. The tenzird binary replaces vast start to start a node. The tenzirctl binary continues to offer all functionality that vast previously offered until all commands have been migrated to pipeline operators.
By @dominiklohmann in #3187.
Delete delete_when_stopped from the pipeline manager
The delete_when_stopped flag was removed from the pipeline manager REST API.
Transform read and write into parse and print
The parse and print operators have been renamed to read and write, respectively. The read ... [from ...] and write ... [to ...] operators are not available anymore. If you did not specify a connector, you can continue using read ... and write ... in many cases. Otherwise, use from ... [read ...] and to ... [write ...] instead.
Rename #type to #schema and introduce #schema_id
The #type meta extractor was renamed to #schema.
Tune defaults and demo-node experience
We reduced the default batch-timeout from ten seconds to one second to improve the user experience of interactive pipelines with data acquisition.
We reduced the default active-partition-timeout from 5 minutes to 30 seconds to reduce the time until data is persisted.
Remove old commands
The stop command no longer exists. Shut down VAST nodes using CTRL-C instead.
The version command no longer exists. Use the more powerful version pipeline operator instead.
The spawn source and spawn sink commands no longer exist. To import data remotely, run a pipeline in the form of remote from … | … | import, and to export data remotely, run a pipeline in the form of export | … | remote to ….
The lower-level peer, kill, and send commands no longer exist.
By @dominiklohmann in #3166.
Remove lsvast
The debugging utility lsvast no longer exists. Pipelines replace most of its functionality.
By @dominiklohmann in #3211.
Revamp packet acquisition and parsing
We reimplemented the old pcap plugin as a format. The command tenzir-ctl import pcap no longer works. Instead, the new pcap plugin provides a parser that emits pcap.packet events, as well as a printer that generates a PCAP file when provided with these events.
Add colors to JSON printer
We removed the --pretty option from the json printer because pretty-printing is now the default. To switch to NDJSON, use -c|--compact-output.
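The difference between pretty-printed JSON and NDJSON can be shown with Python's standard json module. This only illustrates the two layouts in general; it makes no claim about the exact whitespace or key order that Tenzir's printer emits.

```python
import json

events = [{"id": 1, "proto": "tcp"}, {"id": 2, "proto": "udp"}]

# Pretty-printed: each event spans several indented lines.
pretty = "\n".join(json.dumps(event, indent=2) for event in events)

# NDJSON: exactly one compact JSON object per line.
ndjson = "\n".join(json.dumps(event, separators=(",", ":")) for event in events)
```

NDJSON is the easier format for line-oriented tools, because every line is a complete event.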
Change summarize to operate across schemas
The aggregation functions in a summarize operator can now receive only a single extractor instead of multiple ones.
The behavior for absent columns and aggregations across multiple schemas was changed.
Remove the prefix() function from the REST endpoint plugin API
We removed the rest_endpoint_plugin::prefix() function from the public API of the rest_endpoint_plugin class. To migrate, existing users should prepend the prefix manually to all endpoints defined by their plugin.
Implement the serve operator and /serve endpoint
The default port of the web plugin changed from 42001 to 5160. This change avoids collisions from dynamic port allocation on Linux systems.
By @dominiklohmann in #3180.
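The collision risk follows from the range that Linux uses for dynamically allocated (ephemeral) ports. The Python check below assumes the common kernel default for net.ipv4.ip_local_port_range of 32768 to 60999; individual systems may be configured differently.

```python
# Assumed Linux default for net.ipv4.ip_local_port_range: 32768-60999.
EPHEMERAL_PORTS = range(32768, 61000)


def may_collide(port: int) -> bool:
    """Return True if the kernel could hand out `port` as an ephemeral port."""
    return port in EPHEMERAL_PORTS


old_port_at_risk = may_collide(42001)  # True: inside the dynamic range
new_port_at_risk = may_collide(5160)   # False: outside the dynamic range
```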
Switch /status to POST
The HTTP method of the status endpoint in the experimental REST API is now POST.
Add diagnostics (and some other improvements)
We changed the default connector of read <format> and write <format> for all formats to stdin and stdout, respectively.
We removed language plugins in favor of operator-based integrations.
The interface of the operator, loader, parser, printer and saver plugins was changed.
Improve low-load memory consumption
The default interval between two automatic rebuilds is now set to 2 hours and can be configured with the rebuild-interval option.
Remove previously deprecated options
The previously deprecated options tenzir.pipelines (replaced with tenzir.operators) and tenzir.pipeline-triggers (no replacement) no longer exist.
The previously deprecated types addr, count, int, and real (replaced with ip, uint64, int64, and double, respectively) no longer exist.
By @dominiklohmann in #3358.
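When updating schema definitions by hand, the type replacements listed above amount to a simple lookup. The Python sketch below applies them to a field-to-type mapping; that mapping is a made-up representation for illustration, not Tenzir's schema format.

```python
# Replacements taken from the note above.
TYPE_RENAMES = {"addr": "ip", "count": "uint64", "int": "int64", "real": "double"}


def modernize_types(fields: dict) -> dict:
    """Replace removed type names, leaving all other types untouched."""
    return {name: TYPE_RENAMES.get(type_name, type_name)
            for name, type_name in fields.items()}


updated = modernize_types({"src": "addr", "hits": "count", "note": "string"})
```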
Rename default database directory to tenzir.db
The default database directory moved from vast.db to tenzir.db. Use the option tenzir.db-directory to manually set the database directory path.
By @dominiklohmann in #3212.
Bug Fixes
Fix shutdown of sources and importer
Import processes sometimes failed to shut down automatically when the node exited. They now shut down reliably.
By @dominiklohmann in #3207.
Fix rare crash when transforming sliced nested arrays
Transformation operators like summarize, sort, put, extend, or replace no longer sporadically crash when they reference a nested field after a preceding head or tail operator.
The tail operator sometimes returned more events than specified. This no longer happens.
By @dominiklohmann in #3171.
Add a changelog entry for the compaction fix
We fixed a bug in the compaction plugin that prevented it from applying the configured weights when it was used for the first time on a database.
Fix reconnect attempts for remote pipelines
Starting a remote pipeline with vast exec failed when the node was not reachable yet. Like other commands, executing a pipeline now waits until the node is reachable before starting.
By @dominiklohmann in #3188.