Changelog
This changelog documents all notable changes to Tenzir and is updated on every release.
v4.6.0
Changes
- Ingress and egress metrics for pipelines now indicate whether the pipeline sent/received events to/from outside of the node with a new `internal` flag. For example, when using the `export` operator, data is entering the pipeline from within the node, so its ingress is considered internal. #3658
- We renamed our Python package from `pytenzir` to `tenzir`. #3660
- We renamed the `--bind` option of the `zmq` connector to `--listen`. #3664
Features
- The `python` operator adds the ability to perform arbitrary event-to-event transformations with the full power of Python 3. #3592
- The operators `from`, `to`, `load`, and `save` support using URLs and file paths directly as their argument. For example, `load https://example.com` means `load https https://example.com`, and `save local-file.json` means `save file local-file.json`. #3608
- The new `--internal` flag for the `export` operator returns internal events collected by the system, for example pipeline metrics. #3619
- The `syslog` parser allows reading both RFC 5424 and RFC 3164 syslog messages. #3645
- Use `show` without an aspect to return information about all aspects of a node. #3650
- The new `yield` operator extracts nested records with the ability to unfold lists. #3651
- When using `from <URL>` and `to <URL>` without specifying the format explicitly using a `read`/`write` argument, the default format is determined by the file extension for all loaders and savers, if possible. Previously, that was only done when using the `file` loader/saver. Additionally, if the file name indicates some sort of compression (e.g., `.gz`), compression and decompression are performed automatically. For example, `from https://example.com/myfile.yml.gz` is expanded to `load https://example.com/myfile.yml.gz | decompress gzip | read yaml` automatically. #3653
- We added a new `tcp` connector that allows reading raw bytes from TCP or TLS connections. #3664
- The new, experimental `parse` operator applies a parser to the string stored in a given field. #3665
- We optimized the behavior of the `serve` operator to respond quicker and cause less system load for pipelines that take a long time to generate the first result. The new `min_events` parameter can be used to implement long-polling behavior for clients of `/serve`. #3666
- The new `apply` operator includes pipelines defined in other files. #3677
- Use `--allow-comments` with the `xsv` parser (incl. `csv`, `tsv`, and `ssv`) to treat lines beginning with `'#'` as comments. #3681
- The closed-source `context` plugin offers backend functionality for finding matches between data sets. #3684
- The new `lookup-table` built-in is a hashtable-based contextualization algorithm that enriches events based on a unique value. #3684
- The JSON format has a new `--arrays-of-objects` parameter that allows for parsing a JSON array of JSON objects into an event for each object. #3684
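The extension-based expansion of `from <URL>` described above can be pictured with a small Python model. This is only a sketch of the documented behavior, not Tenzir's implementation: the function name `expand_from` and both lookup tables are hypothetical, and the real set of recognized extensions may differ.

```python
from urllib.parse import urlparse

# Hypothetical lookup tables; the extensions Tenzir actually recognizes may differ.
COMPRESSION_BY_EXT = {"gz": "gzip", "bz2": "bz2", "zst": "zstd"}
FORMAT_BY_EXT = {"yml": "yaml", "yaml": "yaml", "json": "json", "csv": "csv"}


def expand_from(url: str) -> str:
    """Expand `from <URL>` into explicit load/decompress/read stages."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    parts = filename.split(".")[1:]  # extensions only, e.g. ["yml", "gz"]
    stages = [f"load {url}"]
    # A trailing compression extension adds a decompress stage.
    if parts and parts[-1] in COMPRESSION_BY_EXT:
        stages.append(f"decompress {COMPRESSION_BY_EXT[parts.pop()]}")
    # The remaining last extension, if known, selects the format.
    if parts and parts[-1] in FORMAT_BY_EXT:
        stages.append(f"read {FORMAT_BY_EXT[parts[-1]]}")
    return " | ".join(stages)


print(expand_from("https://example.com/myfile.yml.gz"))
# load https://example.com/myfile.yml.gz | decompress gzip | read yaml
```

When the extension is unknown, this sketch simply emits no `read` stage, which mirrors the "if possible" wording in the entry.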
Bug Fixes
- `export --live` now respects a subsequent `where <expr>` instead of silently discarding the filter expression. #3619
- Using the `sort` operator with polymorphic inputs no longer leads to a failing assertion under some circumstances. #3655
- The `csv`, `ssv`, and `tsv` parsers now correctly support empty strings, lists, and null values. #3687
- The `tail` operator no longer hangs occasionally. #3687
v4.5.0
Changes
- The operators `drop`, `pseudonymize`, `put`, `extend`, `replace`, `rename`, and `select` were converted from suffix matching to prefix matching and can therefore address records now. #3616
- Sparse indexes for time and bool fields are now always enabled, accelerating lookups against them. #3639
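To illustrate the difference between suffix and prefix matching of field paths, here is a minimal Python sketch. It assumes that extractors and field paths are compared component-wise on dot-separated names. The helper names and sample fields are hypothetical, and the exact matching rules in Tenzir may be more nuanced.

```python
def matches_suffix(extractor: str, field_path: str) -> bool:
    """Old behavior (assumed): extractor matches the trailing path components."""
    want, have = extractor.split("."), field_path.split(".")
    return have[-len(want):] == want


def matches_prefix(extractor: str, field_path: str) -> bool:
    """New behavior (assumed): extractor matches the leading path components."""
    want, have = extractor.split("."), field_path.split(".")
    return have[:len(want)] == want


# Hypothetical leaf fields of an event with a nested record `id`.
fields = ["id.orig_h", "id.resp_h", "proto"]

print([f for f in fields if matches_suffix("orig_h", f)])  # ['id.orig_h']
print([f for f in fields if matches_prefix("id", f)])      # ['id.orig_h', 'id.resp_h']
```

Under prefix matching, the extractor `id` selects every leaf below the record `id`, which is why whole records become addressable.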
Features
- The `api` source operator interacts with Tenzir's REST API without needing to spin up a web server, making all APIs accessible from within pipelines. #3630
- In `where <expression>`, the types of numeric literals and numeric fields in an equality or relational comparison no longer need to match exactly. The literals `+42`, `42` or `42.0` now compare against fields of types `int64`, `uint64`, and `double` as expected. #3634
- The `import` operator now flushes events to disk automatically before returning, ensuring that they are available immediately for subsequent uses of the `export` operator. #3638
- Lookups against uint64, int64, double, and duration fields now always use sparse indexes, which improves the performance of `export | where <expression>` for some expressions. #3639
- If the `summarize` operator has no `by` clause, it now returns a result even if there is no input. For example, `summarize num=count(.)` returns an event with `{"num": 0}`. Aggregation functions which do not have a single default value, for example because it would depend on the input type, return `null`. #3640
- The `tenzir.disable-plugins` option is a list of names of plugins and builtins to explicitly forbid from being used in Tenzir. For example, adding `shell` will prohibit use of the `shell` operator builtin, and adding `kafka` will prohibit use of the `kafka` connector plugin. This allows for more fine-grained control than the `tenzir.allow-unsafe-pipelines` option. #3642
Bug Fixes
- The long option `--append` for the `file` and `directory` savers now works as documented. Previously, only the short option worked correctly. #3629
- The `exporter.*` metrics will now be emitted in case the exporter finishes early. #3633
v4.4.0
Changes
- The `string` type is now restricted to valid UTF-8 strings. Use `blob` for arbitrary binary data. #3581
- The new `autostart` and `autodelete` parameters for the pipeline manager supersede the `start_when_created` and `restart_with_node` parameters and extend restarting and deletion possibilities for pipelines. #3585
Features
- The new `amqp` connector enables interaction with an AMQP 0-9-1 exchange, supporting working with messages as producer (saver) and consumer (loader). #3546
- The new `completed` pipeline state in the pipeline manager shows when a pipeline has finished execution. #3554
- If the node with running pipelines crashes, they will be marked as `failed` upon restarting. #3554
- The new `velociraptor` source supports submitting VQL queries to a Velociraptor server. The operator communicates with the server via gRPC using a mutually authenticated and encrypted connection with client certificates. For example, `velociraptor -q "select * from pslist()"` lists processes and their running binaries. #3556
- The output of `show partitions` includes a new `events` field that shows the number of events kept in that partition. E.g., the pipeline `show partitions | summarize events=sum(events) by schema` shows the number of events per schema stored at the node. #3580
- The new `blob` type can be used to represent arbitrary binary data. #3581
- The new `ttl_expires_in_ns` shows the remaining time to live for a pipeline in the pipeline manager. #3585
- The new `yara` operator matches Yara rules on byte streams, producing structured events when rules match. #3594
- `show serves` displays all currently active serve IDs in the `/serve` API endpoint, showing an overview of active pipelines with an on-demand API. #3596
- The `export` operator now has a `--live` option to continuously emit events as they are imported instead of those that already reside in the database. #3612
Bug Fixes
- Pipelines ending with the `serve` operator no longer incorrectly exit 60 seconds after transferring all events to the `/serve` endpoint, but rather wait until all events were fetched from the endpoint. #3562
- Shutting down a node immediately after starting it no longer waits for all partitions to be loaded. #3562
- When using `read json`, incomplete objects (e.g., due to truncated files) are now reported as an error instead of silently discarded. #3570
- Having duplicate field names in `zeek-tsv` data no longer causes a crash, but rather errors out gracefully. #3578
- The `csv` parser (or more generally, the `xsv` parser) now attempts to parse fields in order to infer their types. #3582
- A regression in Tenzir v4.3 caused exports to often consider all partitions as candidates. Pipelines of the form `export | where <expr>` now work as expected again and only load relevant partitions from disk. #3599
- The long option `--skip-empty` for `read lines` now works as documented. #3599
- The `zeek-tsv` parser is now able to handle fields of type `subnet` correctly. #3606
v4.3.0
Changes
- We made it easier to reuse the default `zmq` socket endpoint by disabling socket lingering, thereby immediately relinquishing resources when terminating a ZeroMQ pipeline. With the linger period changed from infinite to 0, pending messages are no longer buffered in memory after closing a ZeroMQ socket. #3536
- Tenzir no longer builds dense indexes for imported events. Dense indexes improved query performance at the cost of higher memory usage. However, over time the performance improvement became smaller due to other improvements in the underlying storage engine. #3552
- Tenzir no longer supports models in taxonomies. Since Tenzir v4.0 they were only supported in the deprecated `tenzir-ctl export` and `tenzir-ctl count` commands. We plan to bring the functionality back in the future with more powerful expressions in TQL. #3552
Features
- The `yaml` format supports reading and writing YAML documents and streams. #3456
- The new `fluent-bit` source and sink operators provide an interface to the Fluent Bit ecosystem. The source operator maps to a Fluent Bit input and the sink operator to a Fluent Bit output. #3461
- The performance of the `json`, `suricata` and `zeek-json` parsers was improved. #3503
- The `json` parser has a new `--raw` flag, which uses the raw type of JSON values instead of trying to infer one. For example, strings with IP addresses are given the type `string` instead of `ip`. #3503
- A dedicated `null` type was added. #3503
- Empty records are now allowed. Operators that previously discarded empty records (for example, `drop`) now preserve them. #3503
- The pipeline manager now supports user-provided labels for pipelines. #3541
Bug Fixes
- The `json`, `suricata` and `zeek-json` parsers are now more stable and should now parse all inputs correctly. #3503
- `null` records are no longer incorrectly transformed into records with `null` fields. #3503
- The type of the `quic.version` field in the built-in `suricata.quic` schema was fixed. It is now a string instead of an integer. #3533
- The `http` loader no longer ignores the values of user-provided custom headers. #3535
- The `parquet` and `feather` formats no longer throw assertions during normal usage. #3537
- The `zeek.software` schema no longer contains an incomplete `version` record type. #3538
- The `version.minor` type in the `zeek.software` schema is now a `uint64` instead of a `double` to comply with Zeek's version structure. #3538
- The web server no longer crashes when receiving requests during shutdown. #3553
v4.2.0
Changes
- The long option name `--emit-file-header` of the `pcap` parser is now called `--emit-file-headers` (plural) to streamline it with the `nic` loader and the new capability to process concatenated PCAP files. #3513
- The `decapsulate` operator no longer drops the PCAP packet data in incoming events. #3515
Features
- The new `s3` connector enables the user to import/export file data from/to S3 buckets. #3496
- The new `zmq` connector ships with a saver and loader for interacting with ZeroMQ. The loader (source) implements a connecting `SUB` socket and the saver (sink) a binding `PUB` socket. The `--bind` or `--connect` flags make it possible to control the direction of connection establishment. #3497
- The new `gcs` connector enables the user to import/export file data from/to GCS buckets. #3498
- The new connectors `http`, `https`, `ftp`, and `ftps` simplify using remote files in pipelines via HTTP(S) and FTP(S). #3499
- The new `lines` parser splits its input at newline characters and produces events with a single field containing the line. #3511
- The `pcap` parser can now process a stream of concatenated PCAP files. On the command line, you can now parse traces with `cat *.pcap | tenzir 'read pcap'`. When providing `--emit-file-headers`, each intermediate file header yields a separate event. #3513
- The `nic` loader has a new option `--emit-file-headers` that prepends a PCAP file header for every batch of bytes that the loader produces, yielding a stream of concatenated PCAP files. #3513
- You can now write `show nics` to get a list of network interfaces. Use `show nics | select name` to get a list of possible interface names for `from nic`. #3517
Bug Fixes
- Pipelines now show up in the "stopped" instead of the "created" state after the node restarted. #3487
v4.1.0
Changes
- The `version` operator no longer exists. Use `show version` to get the Tenzir version instead. The additional information that `version` produced is now available as `show build`, `show dependencies`, and `show plugins`. #3442
Features
- The new `sigma` operator filters its input with Sigma rules and outputs matching events alongside the matched rule. #3138
- The `compress [--level <level>] <codec>` and `decompress <codec>` operators enable streaming compression and decompression in pipelines for `brotli`, `bz2`, `gzip`, `lz4`, and `zstd`. #3443
- The `show config` aspect returns the configuration currently in use, combining options set in the configuration file, the command line, and environment options. #3455
- The new `show pipelines` aspect displays a list of all managed pipelines. #3457
- The `pause` action in the `/pipeline/update` endpoint suspends a pipeline and sets its state to `paused`. Resume it with the `start` action. #3471
- Newly created pipelines are now in a new `created` rather than `stopped` state. #3471
- The `rendered` field in the pipeline manager diagnostics delivers a displayable version of the diagnostic's error message. #3479
- Pipelines that encounter an error during execution are now in a new `failed` rather than `stopped` state. #3479
Bug Fixes
- Pipeline operators that create output independent of their input now emit their output instantly instead of waiting for further input. This makes the `shell` operator more reliable. #3470
- The `show <aspect>` operator wrongfully required unsafe pipelines to be allowed for some aspects. This is now fixed. #3470
v4.0.1
Features
- It is now possible to replace the schema name with `replace #schema="new_name"`. #3451
v4.0.0
Breaking Changes
- The `stop` command no longer exists. Shut down VAST nodes using CTRL-C instead. #3166
- The `version` command no longer exists. Use the more powerful `version` pipeline operator instead. #3166
- The `spawn source` and `spawn sink` commands no longer exist. To import data remotely, run a pipeline in the form of `remote from … | … | import`, and to export data remotely, run a pipeline in the form of `export | … | remote to …`. #3166
- The lower-level `peer`, `kill`, and `send` commands no longer exist. #3166
- The `#type` meta extractor was renamed to `#schema`. #3183
- VAST is now called Tenzir. The `tenzir` binary replaces `vast exec` to execute a pipeline. The `tenzird` binary replaces `vast start` to start a node. The `tenzirctl` binary continues to offer all functionality that `vast` previously offered until all commands have been migrated to pipeline operators. #3187
- The Debian package for Tenzir replaces previous VAST installations and attempts to migrate existing data from VAST to Tenzir in the process. You can opt out of this migration by creating the file `/var/lib/vast/disable-migration`. #3203
- We removed the `rest_endpoint_plugin::prefix()` function from the public API of the `rest_endpoint_plugin` class. For a migration, existing users should prepend the prefix manually to all endpoints defined by their plugin. #3221
- We changed the default connector of `read <format>` and `write <format>` for all formats to `stdin` and `stdout`, respectively. #3223
- We removed language plugins in favor of operator-based integrations. #3223
- The interface of the operator, loader, parser, printer and saver plugins was changed. #3223
- The aggregation functions in a `summarize` operator can now receive only a single extractor instead of multiple ones. #3250
- The behavior for absent columns and aggregations across multiple schemas was changed. #3250
- We reimplemented the old `pcap` plugin as a format. The command `tenzir-ctl import pcap` no longer works. Instead, the new `pcap` plugin provides a parser that emits `pcap.packet` events, as well as a printer that generates a PCAP file when provided with these events. #3263
- The `delete_when_stopped` flag was removed from the pipeline manager REST API. #3292
- We removed the `--pretty` option from the `json` printer. This option is now the default. To switch to NDJSON, use `-c|--compact-output`. #3343
- The previously deprecated options `tenzir.pipelines` (replaced with `tenzir.operators`) and `tenzir.pipeline-triggers` (no replacement) no longer exist. #3358
- The previously deprecated types `addr`, `count`, `int`, and `real` (replaced with `ip`, `uint64`, `int64`, and `double`, respectively) no longer exist. #3358
- The `parse` and `print` operators have been renamed to `read` and `write`, respectively. The `read ... [from ...]` and `write ... [to ...]` operators are not available anymore. If you did not specify a connector, you can continue using `read ...` and `write ...` in many cases. Otherwise, use `from ... [read ...]` and `to ... [write ...]` instead. #3365
Changes
- The default port of the web plugin changed from 42001 to 5160. This change avoids collisions from dynamic port allocation on Linux systems. #3180
- The HTTP method of the status endpoint in the experimental REST API is now `POST`. #3194
- We now register extension types as `tenzir.ip`, `tenzir.subnet`, and `tenzir.enumeration` instead of `vast.address`, `vast.subnet`, and `vast.enumeration`, respectively. Arrow schema metadata now has a `TENZIR:` prefix instead of a `VAST:` prefix. #3208
- The debugging utility `lsvast` no longer exists. Pipelines replace most of its functionality. #3211
- The default database directory moved from `vast.db` to `tenzir.db`. Use the option `tenzir.db-directory` to manually set the database directory path. #3212
- We reduced the default `batch-timeout` from ten seconds to one second to improve the user experience of interactive pipelines with data acquisition. #3320
- We reduced the default `active-partition-timeout` from 5 minutes to 30 seconds to reduce the time until data is persisted. #3320
- The default interval between two automatic rebuilds is now set to 2 hours and can be configured with the `rebuild-interval` option. #3377
Features
- The `flatten [<separator>]` operator flattens nested data structures by joining nested records with the specified separator (defaults to `.`) and merging lists. #3018
- The sink operator `import` persists events in a VAST node. #3128 #3173 #3193
- The source operator `export` retrieves events from a VAST node. #3128 #3173 #3193
- The `repeat` operator repeats its input a given number of times. #3128 #3173 #3193
- The new `enumerate` operator prepends a column with the row number of the input records. #3142
- The new `sort` operator allows for arranging events by field, in ascending and descending order. The current version is still "beta" and has known limitations. #3155
- The `measure` operator now returns running totals with the `--cumulative` option. #3156
- The `--timeout` option for the `vast status` command allows for defining how long VAST waits for components to report their status. The option defaults to 10 seconds. #3162
- The new pipeline-manager is a proprietary plugin that allows for creating, updating and persisting pipelines. The included RESTful interface allows for easy access and modification of these pipelines. #3164
- The `top <field>` operator makes it easy to find the most common values for the given field. Likewise, `rare <field>` returns the least common values for the given field. #3176
- The `serve` operator and `/serve` endpoint supersede the experimental `/query` endpoint. The operator is a sink for events, and bridges a pipeline into a RESTful interface from which events can be pulled incrementally. #3180
- The new `#schema_id` meta extractor returns a unique fingerprint for the schema. #3183
- In addition to `tenzir "<pipeline>"`, there now is `tenzir -f <file>`, which loads and executes the pipeline defined in the given file. #3223
- The pipeline parser now emits helpful and visually pleasing diagnostics. #3223
- The `summarize` operator now works across multiple schemas and can combine events of different schemas into one group. It now also treats missing columns as having `null` values. #3250
- The `by` clause of `summarize` is now optional. If it is omitted, all events are assigned to the same group. #3250
- The new `nic` plugin provides a loader that acquires packets from a network interface card using libpcap. It emits chunks of data in the PCAP file format so that the `pcap` parser can process them as if packets come from a trace file. #3263
- The new `decapsulate` operator processes events of type `pcap.packet` and emits new events of type `tenzir.packet` that contain the decapsulated PCAP packet with packet header fields from the link, network, and transport layer. The operator also computes a Community ID. #3263
- The pipeline manager now accepts empty strings for the optional `name`. The `/create` endpoint returns a list of diagnostics if pipeline creation fails, and if `start_when_created` is set, the endpoint now returns only after the pipeline execution has been fully started. The `/list` endpoint now returns the diagnostics collected for every pipeline so far. The `/delete` endpoint now returns an empty object if the request is successful. #3264
- The `zeek-tsv` parser sometimes failed to parse Zeek TSV logs, wrongly reporting that the header ended too early. This bug no longer exists. #3291
- The `--schema` option for the JSON parser allows for setting the target schema explicitly by name. #3295
- The `unflatten [<separator>]` operator unflattens data structures by creating nested records out of fields whose names contain a `<separator>`. #3304
- Pipelines executed locally with `tenzir` now use `load -` and `read json` as implicit sources. This complements `save -` and `write json --pretty` as implicit sinks. #3329
- The `json` printer can now colorize its output by providing the `-C|--color-output` option, and explicitly disable coloring via `-M|--monochrome-output`. #3343
- Pipeline metrics (total ingress/egress amount and average rate per second) are now visible in the `pipeline-manager`, via the `metrics` field in the `/pipeline/list` endpoint result. #3376
- The `directory` saver now supports the two arguments `-a|--append` and `-r|--realtime` that have the same semantics as they have for the `file` saver: open files in the directory in append mode (instead of overwriting) and flush the output buffers on every update. #3379
- The `sort` operator now also works for `ip` and `enum` fields. #3390
- `tenzir --dump-metrics '<pipeline>'` prints a performance overview of the executed pipeline on stderr at the end. #3390
- The `batch <limit>` operator allows expert users to control batch sizes in pipelines explicitly. #3391
- The new `show` source operator makes it possible to gather meta information about Tenzir. For example, the provided introspection capabilities allow for emitting existing formats, connectors, and operators. #3414
- The `json` parser now serves as a fallback parser for all files whose extensions do not have any default parser in Tenzir. #3422
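The `flatten` and `unflatten` operators listed above are inverses on nested records. The following Python sketch models that relationship on plain dictionaries. It is an illustration under simplifying assumptions rather than the operators' implementation: it ignores the list merging that `flatten` also performs, and the sample event is hypothetical.

```python
def flatten(record: dict, sep: str = ".") -> dict:
    """Join the keys of nested records with `sep` (lists are left untouched here)."""
    out = {}
    for key, value in record.items():
        if isinstance(value, dict) and value:
            for sub_key, sub_value in flatten(value, sep).items():
                out[f"{key}{sep}{sub_key}"] = sub_value
        else:
            out[key] = value
    return out


def unflatten(record: dict, sep: str = ".") -> dict:
    """Create nested records out of keys that contain `sep`."""
    out = {}
    for key, value in record.items():
        *parents, leaf = key.split(sep)
        node = out
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return out


# Hypothetical sample event.
event = {"id": {"orig_h": "10.0.0.1", "resp_p": 80}, "proto": "tcp"}
flat = flatten(event)
print(flat)  # {'id.orig_h': '10.0.0.1', 'id.resp_p': 80, 'proto': 'tcp'}
print(unflatten(flat) == event)  # True
```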
Bug Fixes
- Using transformation operators like `summarize`, `sort`, `put`, `extend`, or `replace` no longer sometimes crashes after a preceding `head` or `tail` operator when referencing a nested field. #3171
- The `tail` operator sometimes returned more events than specified. This no longer happens. #3171
- We fixed a bug in the compaction plugin that prevented it from applying the configured weights when it was used for the first time on a database. #3185
- Starting a remote pipeline with `vast exec` failed when the node was not reachable yet. Like other commands, executing a pipeline now waits until the node is reachable before starting. #3188
- Import processes sometimes failed to shut down automatically when the node exited. They now shut down reliably. #3207
v3.1.0
Changes
- The `/query` REST endpoint no longer accepts an expression at the start of the query. Instead, use `where <expr> | ...`. #3015
- As already announced with the VAST v3.0 release, the `vast.pipeline-triggers` option no longer functions. The feature will be replaced with node ingress/egress pipelines that fit better into a multi-node model than the previous feature that was built under the assumption of a client/server model with a single server. #3052
- The bundled systemd service is now configured to restart VAST in case of a failure. #3058
- The `vast.operators` section in the configuration file supersedes the now deprecated `vast.pipelines` section and more generally enables user-defined operators. Defined operators now must use the new, textual format introduced with VAST v3.0, and are available for use in all places where pipelines are supported. #3067
- The `exporter.*` metrics no longer exist, and will return in a future release as a more generic instrumentation mechanism for all pipelines. #3076
Features
- The `put` operator is the new companion to the existing `extend` and `replace` operators. It specifies the output fields exactly, referring either to input fields with an extractor, metadata with a selector, or a fixed value. #3036 #3039 #3089
- The `extend` and `replace` operators now support assigning extractors and selectors in addition to just fixed values. #3036 #3039 #3089
- The new `tail` pipeline operator limits its output to the latest events, up to a specified number. The operator takes the limit as an optional argument, with the default value being 10. #3050
- The newly-added `unique` operator removes adjacent duplicates. #3051
- User-defined operator aliases make pipelines easier to use by enabling users to encapsulate a pipeline into a new operator. To define a user-defined operator alias, add an entry to the `vast.operators` section of your configuration. #3064
- Compaction now makes use of the new pipeline operators, and allows pipelines to be defined inline in addition to the now deprecated `vast.pipelines` configuration section. #3064
- The `count_distinct` aggregation function returns the number of distinct, non-null values. #3068
- The `vast export` command now accepts the new pipelines as input. Furthermore, `vast export <expr>` is now deprecated in favor of `vast export 'where <expr>'`. #3076
- The new `from <connector> [read <format>]`, `read <format> [from <connector>]`, `write <format> [to <connector>]`, and `to <connector> [write <format>]` operators bring together a connector and a format to produce and consume events, respectively. Their lower-level building blocks `load <connector>`, `parse <format>`, `print <format>`, and `save <connector>` enable expert users to operate on raw byte streams directly. #3079
- The new `file` connector enables the user to process file input/output as data in a pipeline. This includes regular files, UDS files as well as `stdin/stdout`. #3085 #3088 #3097
- The `inspect` operator replaces the events or bytes it receives with incremental metrics describing the input. #3093
- The new `directory` sink creates a directory with a file for each schema in the specified format. #3098
- The `feather` and `parquet` formats allow for reading and writing events from and to Apache Feather V2 and Apache Parquet files, respectively. #3103
- The `xsv` format enables the user to parse and print character-separated values, with the additional `csv`, `tsv` and `ssv` formats as sane defaults. #3104
- The `cef` parser allows for using the CEF format with the new pipelines. #3110
- The `zeek-tsv` format parses and prints Zeek's native tab-separated value (TSV) representation of logs. #3114
- Pipelines may now span across multiple processes. This will enable upcoming operators that do not just run locally in the `vast exec` process, but rather connect to a VAST node and partially run in that node. The new operator modifiers `remote` and `local` allow expert users to control where parts of their pipeline run explicitly, e.g., to offload compute to a more powerful node. Potentially unsafe use of these modifiers requires setting `vast.allow-unsafe-pipelines` to `true` in the configuration file. #3119
- The `vast exec` command now supports implicit sinks for pipelines that end in events or bytes: `write json --pretty` and `save file -`, respectively. #3123
- The `--pretty` option for the JSON printer enables multi-line output. #3123
- The new `version` source operator yields a single event containing VAST's version and a list of enabled plugins. #3123
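The semantics of `unique` (drop adjacent duplicates only) and `tail` (keep the last N events, 10 by default) can be modeled in a few lines of Python. This is a sketch over in-memory lists with hypothetical sample events. The real operators work on streams of events.

```python
from collections import deque
from itertools import groupby


def unique(events):
    """Drop adjacent duplicates; equal events that are not adjacent survive."""
    return [key for key, _ in groupby(events)]


def tail(events, limit=10):
    """Keep only the last `limit` events (the documented default is 10)."""
    return list(deque(events, maxlen=limit))


# Hypothetical sample events.
events = [{"x": 1}, {"x": 1}, {"x": 2}, {"x": 1}]
print(unique(events))   # [{'x': 1}, {'x': 2}, {'x': 1}]
print(tail(events, 2))  # [{'x': 2}, {'x': 1}]
```

Note that the trailing `{"x": 1}` is kept by `unique` because it is not adjacent to the earlier ones.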
Bug Fixes
- VAST incorrectly handled subnets using IPv6 addresses for which an equivalent IPv4 address existed. This is now done correctly. For example, the query `where :ip !in ::ffff:0:0/96` now returns all events containing an IP address that cannot be represented as an IPv4 address. As an additional safeguard, the VAST language no longer allows for constructing subnets for IPv4 addresses with lengths greater than 32. #3060
- The `distinct` function silently performed a different operation on lists, returning the distinct non-null elements in the list rather than operating on the list itself. This special-casing no longer exists, and instead the function now operates on the list itself. This feature will return in the future as unnesting on the extractor level via `distinct(field[])`, but for now it has to go to make the `distinct` aggregation function work consistently. #3068
- Tokens created with `vast web generate-token` now persist correctly, and work across restarts of VAST. #3086
- The matcher plugin no longer causes deadlocks through detached matcher clients. #3115
- The `tenzir/vast` image now listens on `0.0.0.0:5158` instead of `127.0.0.1:5158` by default, which aligns the behavior with the `tenzir/vast-slim` image. #3137
- Some pipelines in compaction caused transformed partitions to be treated as if they were older than they were supposed to be, causing them to be picked up again for deletion too early. This bug no longer exists, and compacted partitions are now considered at most as old as the oldest event before compaction. #3141
- The `rebuilder.partitions.remaining` metric sometimes reported wrong values when partitions for at least one schema did not need to be rebuilt. We aligned the metrics with the actual functionality. #3147
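The subnet `::ffff:0:0/96` in the first fix above is the IPv4-mapped range of IPv6. A short Python sketch using the standard `ipaddress` module shows the membership test that the query `where :ip !in ::ffff:0:0/96` relies on. The helper name `is_ipv4_representable` is hypothetical, and this models the idea only, not VAST's internal address handling.

```python
import ipaddress

# The IPv4-mapped IPv6 range: every IPv4 address has an equivalent in here.
V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def is_ipv4_representable(text: str) -> bool:
    """True for IPv4 addresses and for IPv6 addresses in the IPv4-mapped range."""
    addr = ipaddress.ip_address(text)
    return addr.version == 4 or addr in V4_MAPPED


for candidate in ["10.0.0.1", "::ffff:10.0.0.1", "2001:db8::1"]:
    print(candidate, is_ipv4_representable(candidate))
# 10.0.0.1 True
# ::ffff:10.0.0.1 True
# 2001:db8::1 False
```

Only the last address would pass a filter for addresses outside `::ffff:0:0/96`.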
v3.0.4
Bug Fixes
- Automatic rebuilds now correctly consider only outdated or undersized partitions. #3083
- The `--all` flag for the `rebuild` command now consistently causes all partitions to be rebuilt, aligning its functionality with its documentation. #3083
v3.0.3
Changes
- VAST now depends on the Boost C++ libraries. #3043
- VAST's rebuilding and compaction features now interfere less with queries. This patch was also backported as VAST v2.4.2 to enable a smoother upgrade to VAST v3.x. #3047
Features
- The new `vast exec` command executes a pipeline locally. It takes a single argument representing a closed pipeline, and immediately executes it. This is the foundation for a new, pipeline-first VAST, in which most operations are expressed as pipelines. #3004 #3010
v3.0.2
Bug Fixes
- VAST no longer miscalculates the `rebuild` metrics. #3026
v3.0.1
Features
- The VAST language now supports comments using the familiar `/* comment */` notation. This makes it easy to document multi-line pipelines inline. #3011
Bug Fixes
- VAST no longer crashes when reading an unsupported partition from VAST v1.x. Instead, the partition is ignored correctly. Since v2.2, VAST automatically rebuilds partitions in the background to ensure compatibility. #3018
- Automatic partition rebuilding both updates partitions with an outdated storage format and merges undersized partitions continuously in the background. This now also works as expected for outdated but not undersized partitions. #3020
v3.0.0
Breaking Changes
The match operator
~
, its negation!~
, and thepattern
type no longer exist. Use queries of the formslhs == /rhs/
andlhs != /rhs/
instead for queries using regular expressions. #2769 #2873vast status
does not work anymore with an embedded node (i.e., spawned with the-N
parameter). #2771The
#field
meta extractor no longer exists. UseX != null
over#field == "X"
to check for existence for the fieldX
. #2776VAST no longer supports reading partitions created with VAST versions older than VAST v2.2. Since VAST v2.2, VAST continuously upgrades old partitions to the most recent internal format while running. #2778 #2797 #2798
We removed the broker plugin that enabled direct Zeek 3.x log transfer to VAST. The plugin will return in the future rewritten for Zeek 5+. #2796
VAST now ignores the previously deprecated options vast.meta-index-fp-rate, vast.catalog-fp-rate, vast.transforms and vast.transform-triggers. Similarly, setting vast.store-backend to segment-store now results in an error rather than a graceful fallback to the default store. #2832
Boolean literals in expressions have a new syntax: true and false replace the old representations T and F. For example, the query suricata.alert.alerted == T is no longer valid; use suricata.alert.alerted == true instead. #2844
The builtin types count, int, real, and addr were renamed to uint64, int64, double, and ip respectively. For backwards-compatibility, VAST still supports parsing the old type tokens in schema files. #2864
The explore and pivot commands are now unavailable. They will be reintroduced as pipeline operators in the future. #2898
For the experimental REST API, the result format of the /export endpoint was modified: The num_events key was renamed to num-events, and the version key was removed. #2899
The map type no longer exists: instead of map<T, U>, use the equivalent list<record{ key: T, value: U }>. #2976
We renamed the identity operator to pass. #2980
The REST API does not contain the /export and /export/with-schemas endpoints anymore. Any previous queries using those endpoints have to be sent to the /query endpoint now. #2990
From now on VAST will use TCP port 5158 for its native inter process communication. This change avoids collisions from dynamic port allocation on Linux systems. #2998
The non-value literal in expressions has a new syntax: null replaces its old representation nil. For example, the query x != nil is no longer valid; use x != null instead. #2999
The vast.pipeline-triggers option is deprecated; while it continues to work as-is, support for it will be removed in the next release. Use the new inline import and export pipelines instead. They will return as more generally applicable node ingress and egress pipelines in the future. #3008
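To illustrate the removal of the map type noted above, the following Python sketch reshapes a dictionary into the list-of-records form that list<record{ key: T, value: U }> describes. The helper name map_to_records is hypothetical and not part of VAST; it only shows how data could be restructured on the producer side under that assumption.

```python
from typing import Any


def map_to_records(mapping: dict[Any, Any]) -> list[dict[str, Any]]:
    """Reshape a key/value mapping into a list of {key, value} records.

    Hypothetical helper for illustration; VAST does not ship this function.
    """
    return [{"key": key, "value": value} for key, value in mapping.items()]


if __name__ == "__main__":
    # A former map<string, uint64> value expressed as a list of records.
    print(map_to_records({"tcp": 6, "udp": 17}))
```

Each entry keeps its key and value side by side, so no information is lost in the conversion.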
Changes
VAST now comes with a role definition for Ansible. You can find it directly in the ansible subdirectory. #2604
Building VAST now requires CAF 0.18.7. VAST supports setting advanced options for CAF directly in its configuration file under the caf section. If you were using any of these, compare them against the bundled vast.yaml.example file to see if you need to make any changes. The change has (mostly positive) performance and stability implications throughout VAST, especially in high-load scenarios. #2693 #2923
OpenSSL is now a required dependency. #2719
vast status no longer shows type registry-related information. Instead, refer to vast show for detailed type metadata information. #2745
Blocking imports now imply that ingested data gets persisted to disk before the vast import process exits. #2807 #2848
Plugin names are now case-insensitive. #2832
The per-schema event distribution moved from index.statistics.layouts to catalog.schemas, and additionally includes information about the import time range and the number of partitions VAST knows for the schema. The number of events per schema no longer includes events that are yet unpersisted. #2852
The bundled Zeek schema no longer includes the _path field included in Zeek JSON. Use #type == "zeek.foo" over _path == "foo" for querying data ingested using vast import zeek-json. #2887
We removed the frontend prototype bundled with the web plugin. Some parts of the frontend that we have in development are designed to be closed-source, and it is easier to develop at the current development stage in a single repository that is not bound to the release process of VAST itself. An open-source version of the frontend may return in the future. #2922 #2927
Features
The cef import format allows for reading events in the Common Event Format (CEF) via vast import cef < cef.log. #2216
VAST installations and packages now include Python bindings in a site-package under <install-prefix>/lib/python*/site-packages/vast. #2636
VAST now imports Arrow IPC data, which is the same format it already supports for export. #2707
The new pseudonymize pipeline operator pseudonymizes IP addresses in user-specified fields. #2719
We now offer a tenzir/vast-slim image as an alternative to the tenzir/vast image. The image is minimal in size and supports the same features as the regular image, but does not support building additional plugins against it and mounting in additional plugins. #2742
The new /query endpoint for the experimental REST API allows users to receive query data in multiple steps, as opposed to a oneshot export. #2766
Queries of the forms :string == /pattern/, field == /pattern/, #type == /pattern/, and their respective negations now work as expected. #2769
The /export family of endpoints now accepts an optional pipeline parameter to specify an ad-hoc pipeline that should be applied to the exported data. #2773
We changed VAST client processes to attempt connecting to a VAST server multiple times until the configured connection timeout (vast.connection-timeout, defaults to 5 minutes) runs out. A fixed delay between connection attempts (vast.connection-retry-delay, defaults to 3 seconds) ensures that clients do not stress the server too much. Set the connection timeout to zero to let VAST clients attempt connecting indefinitely, and the delay to zero to disable the retry mechanism. #2835
The JSON export format gained the options --omit-empty-records, --omit-empty-lists, and --omit-empty-maps, which cause empty records, lists, and maps not to be rendered respectively. The options may be combined together with the existing --omit-nulls option. Use --omit-empty to set all four flags at once. #2856
The export and import commands now support an optional pipeline string that allows for chaining pipeline operators together and executing such a pipeline on outgoing and incoming data. This feature is experimental and the syntax is subject to change without notice. New operators are only available in the new pipeline syntax, and the old YAML syntax is deprecated. #2877 #2904 #2907
The new head and taste operators limit results to the specified number of events. The head operator applies this limit for all events, and the taste operator applies it per schema. Both operators take the limit as an optional argument, with the default value being 10. #2891
The experimental web frontend now correctly responds to CORS preflight requests. To configure CORS behavior, the new vast.web.cors-allowed-origin config option can be used. #2944
Patterns now support case insensitivity by adding i to the pattern string, e.g., /^\w{3}$/i. #2951
The sigma plugin now treats Sigma strings as case-insensitive patterns during the transpilation process. #2974
The experimental web plugin now serves its own API specification at the new '/openapi.json' endpoint. #2981
Extractors such as x and :T can now expand to the predicates x != null and :T != null, respectively. #2984
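The difference between the head and taste operators described above can be sketched in a few lines of Python. This is not VAST's implementation: the generator names mirror the operators only for readability, and the assumption that every event carries its schema name in a "schema" field is made up for this illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def head(events: Iterable[dict], limit: int = 10) -> Iterator[dict]:
    """Yield at most `limit` events overall."""
    for index, event in enumerate(events):
        if index >= limit:
            break
        yield event


def taste(events: Iterable[dict], limit: int = 10) -> Iterator[dict]:
    """Yield at most `limit` events for each schema."""
    seen: Counter = Counter()
    for event in events:
        schema = event["schema"]  # assumed field holding the schema name
        if seen[schema] < limit:
            seen[schema] += 1
            yield event


if __name__ == "__main__":
    events = [{"schema": s} for s in ["a", "a", "b", "a", "b"]]
    print(list(head(events, 2)))   # first two events, regardless of schema
    print(list(taste(events, 1)))  # first event of each schema
```

With a stream that mixes several schemas, head stops after the first few events of the stream, whereas taste keeps a sample of every schema it encounters.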
Bug Fixes
Attempting to connect with thousands of clients around the same time sometimes crashed the VAST server. This no longer occurs. #2693
The replace and extend pipeline operators wrongly inferred IP address, subnet, pattern, and map values as strings. They are now inferred correctly. To force a value to be inferred as a string, wrap it inside double quotes. #2768
VAST now shuts down instantly when metrics are enabled instead of being held alive for up to the duration of the telemetry interval (10 seconds). #2832
The web plugin now reacts correctly to CTRL-C by stopping itself. #2860
VAST no longer ignores existing PID lock files on Linux. #2861
The start commands specified with the vast.start.commands option are now run asynchronously. This means that commands that block indefinitely will no longer prevent execution of subsequent commands, and allow for correct signal handling. #2868
The Zeek TSV reader now respects the schema files in the bundled zeek.schema file, and produces data of the same schema as the Zeek JSON reader. E.g., instead of producing a top-level ip field id.orig_h, the reader now produces a top-level record field id that contains the ip field orig_h, effectively unflattening the data. #2887
Pipelines that reduce the number of events do not prevent vast export processes that have a max-events limit from terminating any more. #2896
We fixed incorrect printing of human-readable durations in some edge cases. E.g., the value 1.999s was rendered as 1.1s instead of the expected 2.0s. This bug affected the JSON and CSV export formats, and all durations printed in log messages or the status command. #2906
Options passed in the caf.openssl section in the configuration file or as VAST_CAF__OPENSSL__* environment variables are no longer ignored. #2908
The VAST client will now terminate properly when using the count command with a query which delivers zero results. #2924
VAST no longer crashes when it encounters an invalid type expression in a schema. #2977
Compaction now retries immediately on failure instead of waiting for the configured scan interval to expire again. #3006
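The unflattening performed by the Zeek TSV reader, mentioned above, can be pictured with a small Python sketch. The function unflatten is a hypothetical stand-in, not VAST code, and the sample field values are invented; it merely shows how a dotted top-level field such as id.orig_h becomes a field orig_h inside a record id.

```python
from typing import Any


def unflatten(flat: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Turn dotted top-level keys into nested records."""
    nested: dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(sep)
        record = nested
        for name in parents:
            # Descend into the parent record, creating it on first use.
            record = record.setdefault(name, {})
        record[leaf] = value
    return nested


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(unflatten({"id.orig_h": "10.0.0.1", "id.orig_p": 1234, "uid": "C1"}))
```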
v2.4.2
Changes
- VAST's rebuilding and compaction features now interfere less with queries. #3047
v2.4.1
Features
- VAST's Feather store now yields initial results much faster and performs better when running queries affecting a large number of partitions by doing smaller incremental disk reads as needed rather than one large disk read upfront. #2805
v2.4.0
Changes
VAST now emits per-component memory usage metrics under the keys index.memory-usage and catalog.memory-usage. #2471
We changed the default VAST endpoint from localhost to 127.0.0.1. This ensures the listening address is deterministic and not dependent on the host-specific IPv4 and IPv6 resolution. For example, resolving localhost yields a list of addresses, and if VAST fails to bind on the first (e.g., due to a lingering socket) it would silently go to the next. Taking name resolution out of the equation fixes such issues. Set the option vast.endpoint to override the default endpoint. #2512
Building VAST from source now requires CMake 3.19 or greater. #2582
The default store backend of VAST is now feather. Reading from VAST's custom segment-store backend is still transparently supported, but new partitions automatically write to the Apache Feather V2 backend instead. #2587
We removed PyVAST from the code base in favor of the new Python bindings. PyVAST continues to work as a thin wrapper around the VAST binary, but will no longer be released alongside VAST. #2674
Building VAST from source now requires Apache Arrow 10.0 or newer. #2685
The vast dump command is now called vast show. #2686
VAST now loads all plugins by default. To revert to the old behavior, explicitly set the vast.plugins option to have no value. #2689
Features
We now also distribute VAST as a Debian package with every new release. The Debian package automatically installs a systemd service and creates a vast user for the VAST process. #2513 #2738
VAST Cloud now has a MISP plugin that enables adding a MISP instance to the cloud stack. #2548
The new experimental web plugin offers a RESTful API to VAST and a bundled web user interface in Svelte. #2567 #2614 #2638 #3681
VAST now emits metrics for filesystem access under the keys posix-filesystem.{checks,writes,reads,mmaps,erases,moves}.{successful,failed,bytes}. #2572
VAST now ships a Docker Compose file. In particular, the Docker Compose stack now has a TheHive integration that can run VAST queries as a Cortex Analyzer. #2574 #2652
VAST Cloud can now expose HTTP services using Cloudflare Access. #2578
Rebuilding partitions now additionally rebatches the contained events to vast.import.batch-size events per batch, which accelerates queries against partitions that previously had undersized batches. #2583
VAST has a new configuration setting, vast.zstd-compression-level, to control the compression level of the Zstd algorithm used in both the Feather and Parquet store backends. The default level is set by the Apache Arrow library, and for Parquet is no longer explicitly defaulted to 9. #2623
VAST has three new metrics: catalog.num-partitions-total, catalog.num-events-total, and ingest-total that sum up all schema-based metrics by their respective schema-based metric counterparts. #2682
Queries without acceleration from a dense index run significantly faster, e.g., initial tests show a 2x performance improvement for substring queries. #2730
Bug Fixes
VAST now skips unreadable partitions while starting up, instead of aborting the initialization routine. #2515
Rebuilding of heterogeneous partitions no longer freezes the entire rebuilder on pipeline failures. #2530
VAST no longer attempts to hard-kill itself if the shutdown did not finish within the configured grace period. The option vast.shutdown-grace-period no longer exists. We recommend setting TimeoutStopSec=180 in the VAST systemd service definition to restore the previous behavior. #2568
The error message on connection failure now contains a correctly formatted target endpoint. #2609
The UDS metrics sink no longer deadlocks due to suspended listeners. #2635
VAST now ejects partitions from the LRU cache if they fail to load with an I/O error. #2642
The systemd service no longer fails if the home directory of the vast user is not in /var/lib/vast. #2734
v2.3.1
Bug Fixes
We fixed an indefinite hang that occurred when attempting to apply a pipeline to a partition that is not a valid flatbuffer. #2624
VAST now properly regenerates any corrupted, oversized partitions it encounters during startup, provided that the corresponding store files are available. These files could be produced by versions up to and including VAST v2.2, when using configurations with an increased maximum partition size. #2631
v2.3.0
Changes
We improved the operability of VAST servers under high load from automated low-priority queries. VAST now considers queries issued with --low-priority, such as automated retro-match queries, with even less priority compared to regular queries (down from 33.3% to 4%) and internal high-priority queries used for rebuilding and compaction (down from 12.5% to 1%). #2484
The default value for vast.active-partition-timeout is now 5 minutes (down from 1 hour), causing VAST to persist underfull partitions earlier. #2493
We split the vast rebuild command into two: vast rebuild start and vast rebuild stop. Rebuild orchestration now runs server-side, and only a single rebuild may run at a given time. We also made it more intuitive to use: --undersized now implies --all, and a new --detached option allows for running rebuilds in the background. #2493
Features
VAST's partition indexes are now optional, allowing operators to control the trade-off between disk-usage and query performance for every field. #2430
We can now use matchers in AWS using the vast-cloud CLI matcher plugin. #2473
VAST now continuously rebuilds outdated and merges undersized partitions in the background. The new option vast.automatic-rebuild controls how many resources to spend on this. To disable this behavior, set the option to 0; the default is 1. #2493
Rebuilding now emits metrics under the keys rebuilder.partitions.{remaining,rebuilding,completed}. The vast status rebuild command additionally shows information about the ongoing rebuild. #2493
The new vast.connection-timeout option allows for configuring the timeout VAST clients use when connecting to a VAST server. The value defaults to 10s; setting it to a zero duration produces an infinite timeout. #2499
Bug Fixes
VAST properly processes queries for fields with the skip attribute. #2430
VAST can now store data in segments bigger than 2GiB in size each. #2449
VAST can now store column indexes that are bigger than 2GiB. #2449
VAST no longer occasionally prints warnings about no longer available partitions when queries run concurrently to imports. #2500
Configuration options representing durations with an associated command-line option like vast.connection-timeout and --connection-timeout were not picked up from configuration files or environment variables. This now works as expected. #2503
Partitions now fail early when their stores fail to load from disk, detailing what went wrong in an error message. #2507
We changed the way vast-cloud is loading its cloud plugins to make it more explicit. This avoids inconsistent defaults assigned to variables when using core commands on specific plugins. #2510
The rebuild command, automatic rebuilds, and compaction are now much faster, and match the performance of the import command for building indexes. #2515
Fixed a race condition where the output of a partition transform could be reused before it was fully written to disk, for example when running vast rebuild. #2543
v2.2.0
Changes
Metrics for VAST's store lookups now use the keys {active,passive}-store.lookup.{runtime,hits}. The store type metadata field now distinguishes between the various supported store types, e.g., parquet, feather, or segment-store, rather than containing active or passive. #2413
The summarize pipeline operator is now a builtin; the previously bundled summarize plugin no longer exists. Aggregation functions in the summarize operator are now plugins, which makes them easily extensible. The syntax of summarize now supports specification of output field names, similar to SQL's AS in SELECT f(x) AS name. #2417
The undocumented count pipeline operator no longer exists. #2417
The put pipeline operator is now called select, as we've abandoned plans to integrate the functionality of replace into it. #2423
The replace pipeline operator now supports multiple replacements in one configuration, which aligns the behavior with other operators. #2423
Transforms are now called pipelines. In your configuration, replace transform with pipeline in all keys. #2429
An init command was added to vast-cloud to help get out of inconsistent Terraform states. #2435
Features
The new flush command causes VAST to decommission all currently active partitions, i.e., write all active partitions to disk immediately regardless of their size or the active partition timeout. This is particularly useful for testing, or when needing to guarantee in automated scripts that input is available for operations that only work on persisted passive partitions. The flush command returns only after all active partitions were flushed to disk. #2396
The summarize operator supports three new aggregation functions: sample takes the first value in every group, distinct filters out duplicate values, and count yields the number of values. #2417
The drop pipeline operator now drops entire schemas specified by name in the schemas configuration key in addition to dropping fields by extractors in the fields configuration key. #2419
The new extend pipeline operator allows for adding new fields with fixed values to data. #2423
The cloud execution commands (run-lambda and execute-command) now accept scripts from file-like handles. To improve the usability of this feature, the whole host file system is now mounted into the CLI container. #2446
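The three aggregation functions added to summarize in this release (sample, distinct, and count) can be illustrated with a short Python sketch. This is only a model of the described semantics, not VAST's implementation: the summarize helper and its parameters are hypothetical, and the choice to keep distinct values in first-seen order is an assumption.

```python
from collections import defaultdict
from typing import Any, Callable


def sample(values: list) -> Any:
    """Take the first value of the group."""
    return values[0]


def distinct(values: list) -> list:
    """Drop duplicate values (first-seen order is an assumption)."""
    return list(dict.fromkeys(values))


def count(values: list) -> int:
    """Yield the number of values in the group."""
    return len(values)


def summarize(events: list[dict], group_by: str, field: str,
              aggregate: Callable[[list], Any]) -> dict:
    """Group events by `group_by` and aggregate `field` within each group."""
    groups: dict[Any, list] = defaultdict(list)
    for event in events:
        groups[event[group_by]].append(event[field])
    return {key: aggregate(values) for key, values in groups.items()}


if __name__ == "__main__":
    events = [
        {"src": "a", "port": 80},
        {"src": "a", "port": 443},
        {"src": "a", "port": 80},
        {"src": "b", "port": 22},
    ]
    for function in (sample, distinct, count):
        print(function.__name__, summarize(events, "src", "port", function))
```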
Bug Fixes
VAST will export real values in JSON consistently with at least one decimal place. #2393
VAST is now able to detect corrupt index files and will attempt to repair them on startup. #2431
The JSON export with --omit-nulls now correctly handles nested records whose first field is null instead of dropping them entirely. #2447
We fixed a race condition when VAST crashed while applying a partition transform, leading to data duplication. #2465
The rebuild command no longer crashes on failure, and displays the encountered error instead. #2466
Missing arguments for the --plugins, --plugin-dirs, and --schema-dirs command line options no longer cause VAST to crash occasionally. #2470
v2.1.0
Changes
The mdx-regenerate tool is no longer part of VAST binary releases. #2260
Partition transforms now always emit homogenous partitions, i.e., one schema per partition. This makes compaction and aging more efficient. #2277
VAST now requires Arrow >= v8.0.0. #2284
The vast.store-backend configuration option no longer supports archive, and instead always uses the superior segment-store. Events stored in the archive will continue to be available in queries. #2290
The vast.use-legacy-query-scheduler option is now ignored because the legacy query scheduler has been removed. #2312
VAST will from now on always format time and timestamp values with six decimal places (microsecond precision). The old behavior used a precision that depended on the actual value. This may require action for downstream tooling like metrics collectors that expect nanosecond granularity. #2380
Features
The lsvast tool can now print contents of individual .mdx files. It now has an option to print raw Bloom filter contents of string and IP address synopses. #2260
The mdx-regenerate tool was renamed to vast-regenerate and can now also regenerate an index file from a list of partition UUIDs. #2260
VAST now compresses data with Zstd. When persisting data to the segment store, the default configuration achieves over 2x space savings. When transferring data between client and server processes, compression reduces the amount of transferred data by up to 5x. This allowed us to increase the default partition size from 1,048,576 to 4,194,304 events, and the default number of events in a single batch from 1,024 to 65,536. The performance increase comes at the cost of a ~20% memory footprint increase at peak load. Use the option vast.max-partition-size to tune this space-time tradeoff. #2268
VAST now produces additional metrics under the keys ingest.events, ingest.duration and ingest.rate. Each of those gets issued once for every schema that VAST ingested during the measurement period. Use the metadata_schema key to disambiguate the metrics. #2274
A new parquet store plugin allows VAST to store its data as parquet files, increasing storage efficiency at the expense of higher deserialization costs. Storage requirements for the VAST database are reduced by another 15-20% compared to the existing segment store with Zstd compression enabled. CPU usage for suricata import is up ~10%, mostly related to the more expensive serialization. Deserialization (reading) of a partition is significantly more expensive, increasing CPU utilization by about 100%, and should be carefully considered and compared to the potential reduction in storage cost and I/O operations. #2284
The status command now supports filtering by component name. E.g., vast status importer index only shows the status of the importer and index components. #2288
VAST emits the new metric partition.events-written when writing a partition to disk. The metric's value is the number of events written, and the metadata_schema field contains the name of the partition's schema. #2302
The new rebuild command rebuilds old partitions to take advantage of improvements in newer VAST versions. Rebuilding takes place in the VAST server in the background. This process merges partitions up to the configured max-partition-size, turns VAST v1.x's heterogeneous into VAST v2.x's homogenous partitions, migrates all data to the currently configured store-backend, and upgrades to the most recent internal batch encoding and indexes. #2321
PyVAST now supports running client commands for VAST servers running in a container environment, if no local VAST binary is available. Specify the container keyword to customize this behavior. It defaults to {"runtime": "docker", "name": "vast"}. #2334 @KaanSK
The csv import gained a new --seperator='x' option that defaults to ','. Set it to '\t' to import tab-separated values, or ' ' to import space-separated values. #2336
VAST now compresses on-disk indexes with Zstd, resulting in a 50-80% size reduction depending on the type of indexes used, and reducing the overall index size to below the raw data size. This improves retention spans significantly. For example, using the default configuration, the indexes for suricata.ftp events now use 75% less disk space, and suricata.flow 30% less. #2346
The index statistics in vast status --detailed now show the event distribution per schema as a percentage of the total number of events in addition to the per-schema number, e.g., for suricata.flow events under the key index.statistics.layouts.suricata.flow.percentage. #2351
The output of vast status --detailed now shows metadata from all partitions under the key .catalog.partitions. Additionally, the catalog emits metrics under the keys catalog.num-events and catalog.num-partitions containing the number of events and partitions respectively. The metrics contain the schema name in the field metadata_schema and the (internal) partition version in the field metadata_partition-version. #2360 #2363
The VAST Cloud CLI can now authenticate to the Tenzir private registry and download the vast-pro image (including plugins such as Matcher). The deployment script can now be configured to use a specific image and can thus be set to use vast-pro. #2415
Bug Fixes
VAST no longer crashes when importing map or pattern data annotated with the #skip attribute. #2286
The command-line options --plugins, --plugin-dirs, and --schema-dirs now correctly overwrite their corresponding configuration options. #2289
VAST no longer crashes when a query arrives at a newly created active partition in the time window between the partition creation and the first event arriving at the partition. #2295
Setting the environment variable VAST_ENDPOINT to a host:port pair no longer fails on startup with a parse error. #2305
VAST no longer hangs when it is shut down while still importing events. #2324
VAST now reads the default false-positive rate for sketches correctly. This broke accidentally with the v2.0 release. The option moved from vast.catalog-fp-rate to vast.index.default-fp-rate. #2325
The parser for real values now understands scientific notation, e.g., 1.23e+42. #2332
The csv import no longer crashes when the CSV file contains columns not present in the selected schema. Instead, it imports these columns as strings. #2336
vast export csv now renders enum columns in their string representation instead of their internal numerical representation. #2336
The JSON import now treats time and duration fields correctly for JSON strings containing a number, i.e., the JSON string "1654735756" now behaves just like the JSON number 1654735756 and for a time field results in the value 2022-06-09T00:49:16.000Z. #2340
VAST will no longer terminate when it can't write any more data to disk. Incoming data will still be accepted but discarded. We encourage all users to enable the disk-monitor or compaction features as a proper solution to this problem. #2376
VAST no longer ignores environment variables for plugin-specific options. E.g., the environment variable VAST_PLUGINS__FOO__BAR now correctly refers to the bar option of the foo plugin, i.e., plugins.foo.bar. #2390
We improved the mechanism to recover the database state after an unclean shutdown. #2394
v2.0.0
Breaking Changes
We removed the experimental vast get command. It relied on an internal unique event ID that was only exposed to the user in debug messages. This removal is a preparatory step towards a simplification of some of the internal workings of VAST. #2121
The meta-index is now called the catalog. This affects multiple metrics and entries in the output of vast status, and the configuration option vast.meta-index-fp-rate, which is now called vast.catalog-fp-rate. #2128
The command line option --verbosity has the new name --console-verbosity. This synchronizes the CLI interface with the configuration file that solely understands the option vast.console-verbosity. #2178
Multiple transform steps now have new names: select is now called where, delete is now called drop, project is now called put, and aggregate is now called summarize. This breaking change is in preparation for an upcoming feature that improves the capability of VAST's query language. #2228
The layout-names option of the rename transform step was renamed schemas. The step now additionally supports renaming fields. #2228
Changes
VAST ships experimental Terraform scripts to deploy on AWS Lambda and Fargate. #2108
We revised the query scheduling logic to exploit synergies when multiple queries run at the same time. In that vein, we updated the related metrics with more accurate names to reflect the new mechanism. The new keys scheduler.partition.materializations, scheduler.partition.scheduled, and scheduler.partition.lookups provide periodic counts of partitions loaded from disk and scheduled for lookup, and the overall number of queries issued to partitions, respectively. The keys query.workers.idle and query.workers.busy were renamed to scheduler.partition.remaining-capacity and scheduler.partition.current-lookups. Finally, the key scheduler.partition.pending counts the number of currently pending partitions. It is still possible to opt out of the new scheduling algorithm with the (deprecated) option --use-legacy-query-scheduler. #2117
VAST now requires Apache Arrow >= v7.0.0. #2122
VAST's internal data model now completely preserves the nesting of the stored data when using the arrow encoding, and maps the pattern, address, subnet, and enumeration types onto Arrow extension types rather than using the underlying representation directly. This change enables use of the export arrow command without needing information about VAST's type system. #2159
Transform steps that add or modify columns now transform the columns in-place rather than at the end, preserving the nesting structure of the original data. #2159
The deprecated msgpack encoding no longer exists. Data imported using the msgpack encoding can still be accessed, but new data will always use the arrow encoding. #2159
Client commands such as vast export or vast status now create fewer threads at runtime, reducing the risk of hitting system resource limits. #2193
The index section in the status output no longer contains the catalog and catalog-bytes keys. The information is already present in the top-level catalog section. #2233
Features
The new vast.index section in the configuration supports adjusting the false-positive rate of first-stage lookups for individual fields, allowing users to optimize the time/space trade-off for expensive queries. #2065
VAST now creates one active partition per layout, rather than having a single active partition for all layouts. #2096
The new option vast.active-partition-timeout controls the time after which an active partition is flushed to disk. The timeout may hit before the partition size reaches vast.max-partition-size, allowing for an additional temporal control for data freshness. The active partition timeout defaults to 1 hour. #2096
The output of vast status now displays the total number of events stored under the key index.statistics.events.total. #2133
The disk monitor has new status entries blacklist and blacklist-size containing information about partitions that failed to be erased. #2160
VAST now has complete support for passing environment variables as an alternate path to configuration files. Environment variables have lower precedence than CLI arguments and higher precedence than config files. Variable names of the form VAST_FOO__BAR_BAZ map to vast.foo.bar-baz, i.e., __ is a record separator and _ translates to -. This does not apply to the prefix VAST_, which is considered the application identifier. Only variables with non-empty values are considered. #2162
VAST v1.0 deprecated the experimental aging feature. Given popular demand we've decided to un-deprecate it, and to actually implement it on top of the same building blocks the compaction mechanism uses. This means that it is now fully working and no longer considered experimental. #2186
The replace transform step now allows for setting values of complex types, e.g., lists or records. #2228
The lsvast tool now prints the whole store contents when given a store file as an argument. #2247
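The environment variable naming rule described above (VAST_FOO__BAR_BAZ maps to vast.foo.bar-baz) can be expressed as a short Python sketch. The function names env_to_option and options_from_env are hypothetical, and the code only models the rule as stated in this entry; special cases such as plugin-specific options are not covered.

```python
from typing import Mapping

PREFIX = "VAST_"  # application identifier, exempt from the translation


def env_to_option(name: str) -> str:
    """Translate an environment variable name into a configuration key."""
    if not name.startswith(PREFIX):
        raise ValueError(f"{name!r} does not start with {PREFIX!r}")
    body = name[len(PREFIX):]
    # "__" separates records; a single "_" becomes "-".
    parts = [part.replace("_", "-").lower() for part in body.split("__")]
    return ".".join(["vast", *parts])


def options_from_env(environ: Mapping[str, str]) -> dict[str, str]:
    """Collect options from an environment, skipping empty values."""
    return {
        env_to_option(name): value
        for name, value in environ.items()
        if name.startswith(PREFIX) and value
    }


if __name__ == "__main__":
    print(env_to_option("VAST_FOO__BAR_BAZ"))
    print(options_from_env({"VAST_ENDPOINT": "127.0.0.1:5158", "VAST_EMPTY": ""}))
```

Passing a plain mapping instead of reading os.environ directly keeps the sketch easy to try out with made-up variables.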
Bug Fixes
The explore command now properly terminates after the requested number of results are delivered. #2120
The count --estimate erroneously materialized store files from disk, resulting in an unneeded performance penalty. VAST now answers approximate count queries by solely consulting the relevant index files. #2146
The import zeek command now correctly marks the event timestamp using the timestamp type alias for all inferred schemas. #2155
Some queries could get stuck when an importer would time out during the meta index lookup. This race condition no longer exists. #2167
We optimized the queue size of the logger for commands other than vast start. Client commands now show a significant reduction in memory usage and startup time. #2176
The CSV parser no longer fails when encountering integers when floating point values were expected. #2184
The vast(1) man-page is no longer empty for VAST distributions with static binaries. #2190
VAST servers no longer accept queries after initiating shutdown. This fixes a potential infinite hang if new queries were coming in faster than VAST was able to process them. #2215
VAST no longer sometimes crashes when aging or compaction erase whole partitions. #2227
Environment variables for options that specify lists now consistently use comma-separators and respect escaping with backslashes. #2236
The JSON import no longer rejects non-string selector fields. Instead, it always uses the textual JSON representation as a selector. E.g., the JSON object
{id:1,...}
imported viavast import json --selector=id:mymodule
now matches the schema namedmymodule.1
rather than erroring because theid
field is not a string. #2255Transform steps removing all nested fields from a record leaving only empty nested records no longer cause VAST to crash. #2258
The query optimizer incorrectly transformed queries with conjunctions or disjunctions with several operands testing against the same string value, leading to missing results. This was rarely an issue in practice before the introduction of homogeneous partitions with the v2.0 release. #2264
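The JSON selector behavior from #2255 above can be illustrated with a short Python sketch. It is not the importer's code: the function `schema_name` is hypothetical, and it assumes that string values are used without quotes while all other values use their JSON text form.

```python
import json


def schema_name(event, field, prefix):
    """Derive a schema name from a selector such as `id:mymodule`."""
    value = event[field]
    # Assumption: strings are used as-is, other values as their JSON text.
    text = value if isinstance(value, str) else json.dumps(value)
    return f"{prefix}.{text}"


print(schema_name({"id": 1}, "id", "mymodule"))  # mymodule.1
```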
v1.1.2
Bug Fixes
- Terminating or timing out exports during the catalog lookup no longer causes query workers to become stuck indefinitely. #2165
v1.1.1
Bug Fixes
The disk monitor now correctly continues deleting until below the low water mark after a partition failed to delete. #2160
We fixed a rarely occurring race condition that caused query workers to become stuck after delivering all results until the corresponding client process terminated. #2160
Queries that timed out or were externally terminated while in the query backlog and with more than five unhandled candidate partitions no longer permanently get stuck. #2160
v1.1.0
Changes
VAST no longer attempts to interpret query expressions as Sigma rules automatically. Instead, this functionality moved to a dedicated
sigma
query language plugin that must explicitly be enabled at build time. #2074The
msgpack
encoding option is now deprecated. VAST issues a warning on startup and automatically uses thearrow
encoding instead. A future version of VAST will remove this option entirely. #2087The experimental aging feature is now deprecated. The compaction plugin offers a superset of the aging functionality. #2087
Actor names in log messages now have an
-ID
suffix to make it easier to tell multiple instances of the same actor apart, e.g.,exporter-42
. #2119We fixed an issue where partition transforms that erase complete partitions trigger an internal assertion failure. #2123
Features
The built-in
select
andproject
transform steps now correctly handle dropping all rows and columns respectively, effectively deleting the input data. #2064 #2082VAST has a new query language plugin type that allows for adding additional query language frontends. The plugin performs one function: compile user input into a VAST expression. The new
sigma
plugin demonstrates usage of this plugin type. #2074The new built-in
rename
transform step allows for renaming event types during a transformation. This is useful when you want to ensure that a repeatedly triggered transformation does not affect already transformed events. #2076The new
aggregate
transform plugin allows for flexibly grouping and aggregating events. We recommend using it alongside thecompaction
plugin, e.g., for rolling up events into a more space-efficient representation after a certain amount of time. #2076
Bug Fixes
A performance bug in the first stage of query evaluation caused VAST to return too many candidate partitions when querying for a field suffix. For example, a query for the
ts
field commonly used in Zeek logs also included partitions fornetflow.pkts
fromsuricata.netflow
events. This bug no longer exists, resulting in a considerable speedup of affected queries. #2086VAST does not lose query capacity when backlogged queries are cancelled any more. #2092
VAST now correctly adjusts the index statistics when applying partition transforms. #2097
We fixed a bug that potentially resulted in the wrong subset of partitions to be considered during query evaluation. #2103
v1.0.0
Changes
Building VAST now requires Arrow >= 6.0. #2033
VAST no longer uses calendar-based versioning. Instead, it uses a semantic versioning scheme. A new VERSIONING.md document installed alongside VAST explores the semantics in-depth. #2035
Plugins now have a separate version. The build scaffolding installs README.md and CHANGELOG.md files in the plugin source tree root automatically. #2035
Features
VAST has a new transform step:
project
, which keeps the fields with configured key suffixes and removes the rest from the input. At the same time, thedelete
transform step can remove not only one but multiple fields from the input based on the configured key suffixes. #2000The new
--omit-nulls
option to thevast export json
command causes VAST to skip over fields in JSON objects whose value isnull
when rendering them. #2004VAST has a new transform step:
select
, which keeps rows matching the configured expression and removes the rest from the input. #2014The
#import_time
meta extractor allows for querying events based on the time they arrived at the VAST server process. It may only be used for comparisons with time value literals, e.g.,vast export json '#import_time > 1 hour ago'
exports all events that were imported within the last hour as NDJSON. #2019
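The effect of the `--omit-nulls` option described above can be approximated in Python. This is a sketch only: the function `omit_nulls` is hypothetical, and how VAST treats null list elements or records that become empty is not stated in the changelog, so the choices below are assumptions.

```python
import json


def omit_nulls(value):
    """Recursively drop object fields whose value is None (JSON null)."""
    if isinstance(value, dict):
        return {k: omit_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Assumption: null list elements are kept; only object fields are skipped.
        return [omit_nulls(v) for v in value]
    return value


event = {"ts": "2021-01-01", "uid": None, "id": {"orig_h": "10.0.0.1", "resp_h": None}}
print(json.dumps(omit_nulls(event)))
# {"ts": "2021-01-01", "id": {"orig_h": "10.0.0.1"}}
```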
Bug Fixes
The index now emits the metrics
query.backlog.{low,normal}
andquery.workers.{idle,busy}
reliably. #2032VAST no longer ignores the
--schema-dirs
option when using--bare-mode
. #2046Starting VAST no longer fails if creating the database directory requires creating intermediate directories. #2046
2021.12.16
Changes
- VAST's internal type system has a new on-disk data representation. While we still support reading older databases, reverting to an older version of VAST will not be possible after this change. Alongside this change, we've implemented numerous fixes and streamlined handling of field name lookups, which now more consistently handles the dot-separator. E.g., the query
#field == "ip"
still matches the fieldsource.ip
, but no longer the fieldsource_ip
. The change is also performance-relevant in the long-term: For data persisted from previous versions of VAST we convert to the new type system on the fly, and for newly ingested data we now have near zero-cost deserialization for types, which should result in an overall speedup once the old data is rotated out by the disk monitor. #1888
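The dot-separated field name lookup described in #1888 above can be sketched as a suffix match over name components. The function `matches_field` is a hypothetical illustration and not VAST's lookup code.

```python
def matches_field(query, field_name):
    """Return True if `query` equals `field_name` or is a dot-delimited suffix."""
    q = query.split(".")
    f = field_name.split(".")
    # Compare whole components, so "ip" never matches inside "source_ip".
    return len(q) <= len(f) and f[len(f) - len(q):] == q


print(matches_field("ip", "source.ip"))  # True
print(matches_field("ip", "source_ip"))  # False
```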
Features
All metrics events now contain the version of VAST. Additionally, VAST now emits startup and shutdown metrics at the start and stop of the VAST server. #1973
JSON field selectors are now configurable instead of being hard-coded for Suricata Eve JSON and Zeek Streaming JSON. E.g.,
vast import json --selector=event_type:suricata
is now equivalent tovast import suricata
. This allows for easier integration of JSONL data containing a field that indicates its type. #1974Metrics events now optionally contain a metadata field that is a key-value mapping of string to string, allowing for finer-grained introspection. For now this enables correlation of metrics events and individual queries. A set of new metrics for query lookup use this feature to include the query ID. #1987 #1992
Bug Fixes
- The field-based default selector of the JSON import now correctly matches types with nested record types. #1988
2021.11.18
Changes
The
max-queries
configuration option now works at a coarser granularity. It used to limit the number of queries that could simultaneously retrieve data, but it now sets the number of queries that can be processed at the same time. #1896VAST no longer vendors xxHash, which is now a regular required dependency. Internally, VAST switched its default hash function to XXH3, providing a speedup of up to 3x. #1905
Building VAST from source now requires CMake 3.18+. #1914
A recently added feature allows for exporting everything when no query is provided. We've restricted this to prefer reading a query from stdin if available. Additionally, conflicting ways to read the query now trigger errors. #1917
Features
A new 'apply' handler in the index gives plugin authors the ability to apply transforms over entire partitions. Previously, transforms were limited to streams of table slices during import or export. #1887
The export command now has a
--low-priority
option to reduce the priority of the request while query backlogs are being worked down. #1929 #1947The keys
query.backlog.normal
andquery.backlog.low
have been added to the metrics output. The values indicate the number of quries that are currently in the backlog. #1942
Bug Fixes
The timeout duration to delete partitions has been increased to one minute, reducing the frequency of warnings for hitting this timeout significantly. #1897
When reading IPv6 addresses from PCAP data, only the first 4 bytes were considered. VAST now stores all 16 bytes. #1905
Store files now get deleted correctly if the database directory differs from the working directory. #1912
Debug builds of VAST no longer segfault on a status request with the
--debug
option. #1915The
suricata.dns
schema has been updated to match the currently used EVE-JSON structure output by recent Suricata versions. #1919VAST no longer tries to create indexes for fields of type
list<record{...}>
as that wasn't supported in the first place. #1933Static plugins are no longer always loaded, but rather need to be explicitly enabled as documented. To restore the behavior from before this bug fix, set
vast.plugins: [bundled]
in your configuration file. #1959
2021.09.30
Changes
The default store backend now is
segment-store
in order to enable the use of partition transforms in the future. To continue using the (now deprecated) legacy store backend, setvast.store-backend
to archive. #1876Example configuration files are now installed to the datarootdir as opposed to the sysconfdir in order to avoid overriding previously installed configuration files. #1880
Features
If present in the plugin source directory, the build scaffolding now automatically installs
<plugin>.yaml.example
files, commenting out every line so the file has no effect. This serves as documentation for operators that can modify the installed file in-place. #1860The
broker
plugin is now also a writer plugin on top of already being a reader plugin. The new plugin enables exporting query results directly into a Zeek process, e.g., to write Zeek scripts that incorporate context from the past. Run vast export broker <expr>
to ship events via Broker that Zeek dispatches under the eventVAST::data(layout: string, data: any)
. #1863The new tool
mdx-regenerate
allows operators to re-create all.mdx
files in a database directory to the latest file format version while VAST is running. This is useful for advanced users in preparation for version upgrades that bump the format version. #1866Running
vast status --detailed
now lists all loaded configuration files undersystem.config-files
. #1871The query argument to the export and count commands may now be omitted, which causes the commands to operate on all data. Note that this may be a very expensive operation, so use with caution. #1879
The output of
vast status --detailed
now contains information about queries that are currently processed in the index. #1881
Bug Fixes
The status command no longer occasionally contains garbage keys when the VAST server is under high load. #1872
Remote sources and sinks are no longer erroneously included in the output of VAST status. #1873
The index now correctly cancels pending queries when the requester dies. #1884
Import filter expressions now work correctly with queries using field extractors, e.g.,
vast import suricata 'event_type == "alert"' < path/to/eve.json
. #1885Expression predicates of the
#field
type now produce error messages instead of empty result sets for operations that are not supported. #1886The disk monitor no longer fails to delete segments of particularly busy partitions with the
segment-store
store backend. #1892
2021.08.26
Changes
VAST no longer strips link-layer framing when ingesting PCAPs. The stored payload is the raw PCAP packet. Similarly,
vast export pcap
now includes Ethernet link-layer framing, per libpcap's DLT_EN10MB
link type. #1797Strings in error or warning log messages are no longer escaped, greatly improving readability of messages containing nested error contexts. #1842
VAST now supports building against {fmt} 8 and spdlog 1.9.2, and now requires at least {fmt} 7.1.3. #1846
VAST now ships with an updated schema type for the
suricata.dhcp
event, covering all fields of the extended output. #1854
Features
The
segment-store
store backend works correctly withvast get
andvast explore
. #1805VAST can now process Eve JSON events of type
suricata.packet
that Suricata emits when the config optiontagged-packets
is set and a rule tags a packet using, e.g.,tag:session,5,packets;
. #1819 #1833
Bug Fixes
Previously missing fields of Suricata event types are now part of the concept definitions of
net.src.ip
,net.src.port
,net.dst.ip
,net.dst.port
,net.app
,net.proto
,net.community_id
,net.vlan
, andnet.packets
. #1798Invalid segment files will no longer crash VAST at startup. #1820
Plugins in the prebuilt Docker images no longer show
unspecified
as their version. #1828The configuration options
vast.metrics.{file,uds}-sink.path
now correctly specify paths relative to the database directory of VAST, rather than the current working directory of the VAST server. #1848The
segment-store
store backend and built-in transform steps (hash
,replace
, anddelete
) now function correctly in static VAST binaries. #1850The output of VAST status now includes status information for sources and sinks spawned in the VAST node, i.e., via
vast spawn source|sink <format>
rather thanvast import|export <format>
. #1852In order to align with the GNU Coding Standards, the static binary (and other relocatable binaries) now uses
/etc
as sysconfdir for installations to/usr/bin/vast
. #1856VAST now only switches to journald style logging by default when it is actually supported. #1857
The CSV parser now correctly parses quoted fields in non-string types. E.g.,
"127.0.0.1"
in CSV now successfully parses when a matching schema contains an address
type field. #1858The memory counts in the output of
vast status
now represent bytes consistently, as opposed to a mix of bytes and kilobytes. #1862
2021.07.29
Changes
VAST no longer officially supports Debian Buster with GCC-8. In CI, VAST now runs on Debian Bullseye with GCC-10. The provided Docker images now use
debian:bullseye-slim
as base image. Users that require Debian Buster support should use the provided static builds instead. #1765From now on VAST is compiled with the C++20 language standard. Minimum compiler versions have increased to GCC 10, Clang 11, and AppleClang 12.0.5. #1768
The
vast
binaries in our prebuilt Docker images no longer contain AVX instructions for increased portability. Building the image locally continues to add supported auto-vectorization flags automatically. #1778The following new build options exist:
VAST_ENABLE_AUTO_VECTORIZATION
enables/disables all auto-vectorization flags, andVAST_ENABLE_SSE_INSTRUCTIONS
enables-msse
; similar options exist for SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, and AVX2. #1778
Features
VAST has a new
store_plugin
type for custom store backends that hold the raw data of a partition. The new settingvast.store-backend
controls the selection of the store implementation, which has a default value of segment-store
. This is still an opt-in feature: unless the configuration value is set, VAST defaults to the old implementation. #1720 #1762 #1802VAST now supports import filter expressions. They act as the dual to export query expressions:
vast import suricata '#type == "suricata.alert"' < eve.json
will import onlysuricata.alert
events, discarding all other events. #1742VAST now comes with a
tenzir/vast-dev
Docker image in addition to the regulartenzir/vast
. Thevast-dev
image targets development contexts, e.g., when building additional plugins. The image contains all build-time dependencies of VAST and runs asroot
rather than thevast
user. #1749lsvast
now prints extended information for hash indexes. #1755The new Broker plugin enables seamless log ingestion from Zeek to VAST via a TCP socket. Broker is Zeek's messaging library and the plugin turns VAST into a Zeek logger node. Use
vast import broker
to establish a connection to a Zeek node and acquire logs. #1758Plugin versions are now unique to facilitate debugging. They consist of three optional parts: (1) the CMake project version of the plugin, (2) the Git revision of the last commit that touched the plugin, and (3) a
dirty
suffix for uncommitted changes to the plugin. Plugin developers no longer need to specify the version manually in the plugin entrypoint. #1764
VAST now supports the arm64 architecture. #1773
Installing VAST now includes a
vast.yaml.example
configuration file listing all available options. #1777VAST now exports per-layout import metrics under the key
<reader>.events.<layout-name>
in addition to the regular<reader>.events
. This makes it easier to understand the event type distribution. #1781The static binary now bundles the Broker plugin. #1789
Bug Fixes
Configuring VAST to use CAF's built-in OpenSSL module via the
caf.openssl.*
options now works again as expected. #1740
The
status
command now prints information about input and output transformations. #1748A
[*** LOG ERROR #0001 ***]
error message on startup under Linux no longer occurs. #1754Queries against fields using a
#index=hash
attribute could have missed some results. Fixing a bug in the offset calculation during bitmap processing resolved the issue. #1755A regression caused VAST's plugins to be loaded in random order, which printed a warning about mismatching plugins between client and server. The order is now deterministic. #1756
VAST does not abort JSON imports anymore when encountering something other than a JSON object, e.g., a number or a string. Instead, VAST skips the offending line. #1759
Import processes now respond quicker. Shutdown requests are no longer delayed when the server process has busy imports, and metrics reports are now written in a timely manner. #1771
Particularly busy imports caused the shutdown of the server process to hang, if import processes were still running or had not yet flushed all data. The server now shuts down correctly in these cases. #1771
The static binary no longer behaves differently than the regular build with regards to its configuration directories: system-wide configuration files now reside in
<prefix>/etc/vast/vast.yaml
rather than/etc/vast/vast.yaml
. #1777The
VAST_ENABLE_JOURNALD_LOGGING
CMake option is no longer ignored. #1780Plugins built against an external libvast no longer require the
CMAKE_INSTALL_LIBDIR
to be specified as a path relative to the configuredCMAKE_INSTALL_PREFIX
. This fixes an issue with plugins in separate packages for some package managers, e.g., Nix. #1786The official Docker image and static binary distribution of VAST now produce the correct version output for plugins from the
vast version
command. #1799The disk budget feature no longer triggers a rare segfault while deleting partitions. #1804 #1809
2021.06.24
Breaking Changes
Apache Arrow is now a required dependency. The previously deprecated build option
-DVAST_ENABLE_ARROW=OFF
no longer exists. #1683VAST no longer loads static plugins by default. Generally, VAST now treats static plugins and bundled dynamic plugins equally, allowing users to enable or disable static plugins as needed for their deployments. #1703
Changes
The VAST community chat moved from Gitter to Slack. Join us in the
#vast
channel for vibrant discussions. #1696The tenzir/vast Docker image bundles the PCAP plugin. #1705
VAST merges lists from configuration files. E.g., running VAST with
--plugins=some-plugin
andvast.plugins: [other-plugin]
in the configuration now results in bothsome-plugin
andother-plugin
being loaded (sorted by the usual precedence), instead of justsome-plugin
. #1721 #1734
Features
The new option
vast.start.commands
allows for specifying an ordered list of VAST commands that run after successful startup. The effect is the same as first starting a node, and then using another VAST client to issue commands. This is useful for commands that have side effects that cannot be expressed through the config file, e.g., starting a source inside the VAST server that listens on a socket or reads packets from a network interface. #1699The options
vast.plugins
andvast.plugin-dirs
may now be specified on the command line as well as the configuration. Use the options--plugins
and--plugin-dirs
respectively. #1703Add the reserved plugin name
bundled
tovast.plugins
to load all bundled plugins, i.e., static or dynamic plugins built alongside VAST, or use --plugins=bundled
on the command line. The reserved plugin nameall
causes all bundled and external plugins to be loaded, i.e., all shared libraries matchinglibvast-plugin-*
from the configuredvast.plugin-dirs
. #1703It's now possible to configure the VAST endpoint as an environment variable by setting
VAST_ENDPOINT
. This has higher precedence than settingvast.endpoint
in configuration files, but lower precedence than passing--endpoint=
on the command-line. #1714Plugins load their respective configuration from
<configdir>/vast/plugin/<plugin-name>.yaml
in addition to the regular configuration file at<configdir>/vast/vast.yaml
. The new plugin-specific file does not require putting configuration under the keyplugins.<plugin-name>
. This allows for deploying plugins without needing to touch the<configdir>/vast/vast.yaml
configuration file. #1724
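The precedence for the endpoint setting described in #1714 above (command line over `VAST_ENDPOINT` over `vast.endpoint` in configuration files) can be sketched as follows. The function `resolve_endpoint` and the flat dictionary used for the configuration are hypothetical and only serve the illustration.

```python
import os


def resolve_endpoint(cli_value=None, config=None, environ=None):
    """Pick the endpoint from the highest-precedence source that sets it."""
    environ = os.environ if environ is None else environ
    config = config or {}
    # 1. The --endpoint= command-line option wins.
    if cli_value:
        return cli_value
    # 2. Then the VAST_ENDPOINT environment variable, if non-empty.
    env_value = environ.get("VAST_ENDPOINT")
    if env_value:
        return env_value
    # 3. Finally the configuration file value (assumed flat key here).
    return config.get("vast.endpoint")


print(resolve_endpoint(None, {"vast.endpoint": "c:3"}, {"VAST_ENDPOINT": "b:2"}))  # b:2
```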
Bug Fixes
VAST no longer crashes when querying for string fields with non-string values. Instead, an error message warns the user about an invalid query. #1685
Building plugins against an installed VAST no longer requires manually specifying
-DBUILD_SHARED_LIBS=ON
. The option is now correctly enabled by default for external plugins. #1697The UDS metrics sink continues to send data when the receiving socket is recreated. #1702
The
vast.log-rotation-threshold
option was silently ignored, causing VAST to always use the default log rotation threshold of 10 MiB. The option works as expected now. #1709Additional tags for the tenzir/vast Docker image for the release versions exist, e.g.,
tenzir/vast:2021.05.27
. #1711The
import csv
command handles quoted fields correctly. Previously, the quotes were part of the parsed value, and field separators in quoted strings caused the parser to fail. #1712Import processes no longer hang on receiving SIGINT or SIGKILL. Instead, they shut down properly after flushing yet to be processed data. #1718
2021.05.27
Breaking Changes
Schemas are no longer implicitly shared between sources, i.e., an
import
process importing data with a custom schema will no longer affect other sources started at a later point in time. Schemas known to the VAST server process are still available to allimport
processes. We do not expect this change to have a real-world impact, but it could break setups where some sources have been installed on hosts without their own schema files, the VAST server did not have up-to-date schema files, and other sources were (ab)used to provide the latest type information. #1656The
configure
script was removed. This was a custom script that mimicked the functionality of an autotools-basedconfigure
script by writing directly to the cmake cache. Instead, users now must use thecmake
and/orccmake
binaries directly to configure VAST. #1657
Changes
- Building VAST without Apache Arrow via
-DVAST_ENABLE_ARROW=OFF
is now deprecated, and support for the option will be removed in a future release. As the Arrow ecosystem and libraries matured, we feel confident in making it a required dependency and plan to build upon it more in the future. #1682
Features
The new transforms feature allows VAST to apply transformations to incoming and outgoing data. A transform consists of a sequence of steps that execute sequentially, e.g., to remove, overwrite, hash, or encrypt data. A new plugin type makes it easy to write custom transforms. #1517 #1656
Plugin schemas are now installed to
<datadir>/vast/plugin/<plugin>/schema
, while VAST's built-in schemas reside in<datadir>/vast/schema
. The load order guarantees that plugins are able to reliably override the schemas bundled with VAST. #1608The new option
vast export --timeout=<duration>
allows for setting a timeout for VAST queries. Cancelled exports result in a non-zero exit code. #1611To enable easier post-processing, the new option
vast.export.json.numeric-durations
switches JSON output ofduration
types from human-readable strings (e.g.,"4.2m"
) to numeric (e.g.,252.15
) in fractional seconds. #1628The
status
command now prints the VAST server version information under theversion
key. #1652The new setting
vast.disk-monitor-step-size
enables the disk monitor to remove N partitions at once before re-checking if the new size of the database directory is now small enough. This is useful when checking the size of a directory is an expensive operation itself, e.g., on compressed filesystems. #1655
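The idea behind `vast.disk-monitor-step-size` from #1655 above can be shown with a small simulation. This is a simplified model and not the disk monitor's code: the function `erase_until_below`, the list of partition sizes, and the oldest-first order are assumptions made for the example.

```python
def erase_until_below(partitions, budget, step_size=1):
    """Erase partitions in batches of `step_size` until the total fits `budget`.

    `partitions` is a list of partition sizes, oldest first (hypothetical model).
    Returns the remaining partitions and the number of size checks performed.
    """
    remaining = list(partitions)
    checks = 1  # the initial size check
    while remaining and sum(remaining) > budget:
        del remaining[:step_size]  # remove up to N partitions at once
        checks += 1  # re-check the size only after the whole batch
    return remaining, checks


print(erase_until_below([10] * 10, budget=50, step_size=1)[1])  # 6
print(erase_until_below([10] * 10, budget=50, step_size=3)[1])  # 3
```

A larger step size trades precision for fewer size checks: the second call performs half as many checks but erases one partition more than strictly necessary.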
Bug Fixes
VAST now correctly refuses to run when loaded plugins fail their initialization, i.e., are in a state that cannot be reasoned about. #1618
A recent change caused imports over UDP not to forward their events to the VAST server process. Running
vast import -l :<port>/udp <format>
now works as expected again. #1622
Non-relocatable VAST binaries no longer look for configuration, schemas, and plugins in directories relative to the binary location. Vice versa, relocatable VAST binaries no longer look for configuration, schemas, and plugins in their original install directory, and instead always use paths relative to their binary location. On macOS, we now always build relocatable binaries. Relocatable binaries now work correctly on systems where the library install directory is
lib64
instead oflib
. #1624VAST no longer erroneously skips the version mismatch detection between client and server. The check now additionally compares running plugins. #1652
Executing VAST's unit test suite in parallel no longer fails. #1659
VAST and transform plugins now build without Arrow support again. #1673
The
delete
transform step correctly deletes fields from the layout when running VAST with Arrow disabled. #1673VAST no longer erroneously warns about a version mismatch between client and server when their plugin load order differs. #1679
2021.04.29
Breaking Changes
The previously deprecated (#1409) option
vast.no-default-schema
no longer exists. #1507Plugins configured via
vast.plugins
in the configuration file can now be specified using either the plugin name or the full path to the shared plugin library. We no longer allow omitting the extension from specified plugin files, and recommend using the plugin name as a more portable solution, e.g.,example
overlibexample
and/path/to/libexample.so
over/path/to/libexample
. #1527The previously deprecated usage (#1354) of format-independent options after the format in commands is now no longer possible. This affects the options
listen
,read
,schema
,schema-file
,type
, anduds
for import commands and thewrite
anduds
options for export commands. #1529Plugins must define a separate entrypoint in their build scaffolding using the argument
ENTRYPOINT
to the CMake functionVASTRegisterPlugin
. If only a single value is given to the argumentSOURCES
, it is interpreted as theENTRYPOINT
automatically. #1549To avoid confusion between the PCAP plugin and libpcap, which both have a library file named
libpcap.so
, we now generally prefix the plugin library output names withvast-plugin-
. E.g., the PCAP plugin library file is now named libvast-plugin-pcap.so
. Plugins specified with a full path in the configuration undervast.plugins
must be adapted accordingly. #1593
Changes
The metrics for Suricata Eve JSON and Zeek Streaming JSON imports are now under the categories
suricata-reader
andzeek-reader
respectively so they can be distinguished from the regular JSON import, which is still underjson-reader
. #1498VAST now ships with a schema record type for Suricata's
rfb
event type. #1499 @sattaWe upstreamed the Debian patches provided by @satta. VAST now prefers an installed
tsl-robin-map>=0.6.2
to the bundled one unless configured with--with-bundled-robin-map
, and we provide a manpage forlsvast
ifpandoc
is installed. #1515The Suricata
dns
schema type now defines thedns.grouped.A
field containing a list of all returned addresses. #1531The status output of Analyzer Plugins moved from the
importer.analyzers
key into the top-level record. #1544The new option
--disable-default-config-dirs
disables the loading of user and system configuration, schema, and plugin directories. We use this option internally when running integration tests. #1557Building VAST now requires CMake >= 3.15. #1559
The VAST community chat moved from Element to Gitter. Join us at gitter.im/tenzir/vast or via Matrix at
#tenzir_vast:gitter.im
. #1591
Features
The disk monitor gained a new
vast.start.disk-budget-check-binary
option that can be used to specify an external binary to determine the size of the database directory. This can be useful in cases wherestat()
does not give the correct answer, e.g. on compressed filesystems. #1453The
VAST_PLUGIN_DIRS
andVAST_SCHEMA_DIRS
environment variables allow for setting additional plugin and schema directories separated with:
with higher precedence than other plugin and schema directories. #1532 #1541It is now possible to build plugins against an installed VAST. This requires a slight adaptation to every plugin's build scaffolding. The example plugin was updated accordingly. #1532
Component Plugins are a new category of plugins that execute code within the VAST server process. Analyzer Plugins are now a specialization of Component Plugins, and their API remains unchanged. #1544 #1547 #1588
Reader Plugins and Writer Plugins are a new family of plugins that add import/export formats. The previously optional PCAP format moved into a dedicated plugin. Configure with
--with-pcap-plugin
and addpcap
tovast.plugins
to enable the PCAP plugin. #1549
Bug Fixes
VAST no longer erroneously tries to load explicitly specified plugins dynamically that are linked statically. #1528
Custom commands from plugins ending in start no longer try to write to the server instead of the client log file. #1530
Linking against an installed VAST via CMake now correctly resolves VAST's dependencies. #1532
VAST no longer refuses to start when any of the configuration file directories is unreadable, e.g., because VAST is running in a sandbox. #1533
The CSV reader no longer crashes when encountering nested type aliases. #1534
The command-line parser no longer crashes when encountering a flag with missing value in the last position of a command invocation. #1536
A bug in the parsing of ISO8601 formatted dates that incorrectly adjusted the time to the UTC timezone has been fixed. #1537
Plugin unit tests now correctly load and initialize their respective plugins. #1549
The shutdown logic contained a bug that would make the node fail to terminate in case a plugin actor is registered at said node. #1563
A race condition in the shutdown logic that caused an assertion was fixed. #1563
VAST now correctly builds within shallow clones of the repository. If the build system is unable to determine the correct version from git-describe, it now always falls back to the version of the last release. #1570
We fixed a regression that made it impossible to build static binaries from outside of the repository root directory. #1573
The VASTRegisterPlugin CMake function now correctly removes the ENTRYPOINT from the given SOURCES, allowing for plugin developers to easily glob for sources again. #1573
The exporter.selectivity metric is now 1.0 instead of NaN for idle periods. #1574
VAST no longer renders non-finite JSON numbers as NaN, -NaN, inf, or -inf, which resulted in invalid JSON output. Instead, such numbers are now rendered as null. #1574
Specifying relative CMAKE_INSTALL_*DIR in the build configuration no longer causes VAST not to pick up system-wide installed configuration files, schemas, and plugins. The configured install prefix is now used correctly. The defunct VAST_SYSCONFDIR, VAST_DATADIR, and VAST_LIBDIR CMake options no longer exist. Use a combination of CMAKE_INSTALL_PREFIX and CMAKE_INSTALL_*DIR instead. #1580
Spaces before SI prefixes in command line arguments and configuration options are now generally ignored, e.g., it is now possible to set the disk monitor budgets to 2 GiB rather than 2GiB. #1590
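The whitespace-tolerant size parsing from #1590 can be pictured with a minimal Python sketch. It is not VAST's parser: the function name parse_size and the set of accepted units are assumptions made for illustration only.

```python
import re

# Multipliers for a few common units. Which units VAST itself accepts is an
# assumption here; only "GiB" appears in the changelog entry.
_UNITS = {
    "B": 1,
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40,
}

# A number, optional whitespace, then a unit.
_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")


def parse_size(text: str) -> int:
    """Parse strings such as '2GiB' or '2 GiB' into a number of bytes."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * _UNITS[unit]
```

With this sketch, parse_size("2 GiB") and parse_size("2GiB") yield the same byte count.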
2021.03.25
Breaking Changes
The previously deprecated #timestamp extractor has been removed from the query language entirely. Use :timestamp instead. #1399
Plugins can now be linked statically against VAST. A new VASTRegisterPlugin CMake function enables easy setup of the build scaffolding required for plugins. Configure with --with-static-plugins or build a static binary to link all plugins built alongside VAST statically. All plugin build scaffoldings must be adapted; older plugins no longer work. #1445 #1452
Changes
The default size of table slices (event batches) created by vast import processes has been changed from 1,000 to 1,024. #1396
VAST now ships with schema record types for Suricata's mqtt and anomaly event types. #1408 @satta
The option vast.no-default-schema is deprecated, as it is no longer needed to override types from bundled schemas. #1409
Query latency for expressions that contain concept names has improved substantially. For DB sizes in the TB region, and with a large variety of event types, queries with a high selectivity experience speedups of up to 5x. #1433
The zeek-to-vast utility was moved to the tenzir/zeek-vast repository. All options related to zeek-to-vast and the bundled Broker submodule were removed. #1435
The type extractor in the expression language now works with type aliases. For example, given the type definition for port from the base schema type port = count, a search for :count will also consider fields of type port. #1446
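To illustrate the alias-aware type extractor from #1446, here is a minimal Python sketch. The alias table, the helper names, and the example schema are hypothetical; only the alias port = count comes from the entry above.

```python
# Alias table: alias name -> underlying type name (only `port` is from the changelog).
ALIASES = {"port": "count"}


def alias_chain(type_name, aliases):
    """Return the type followed by every type it is (transitively) an alias of."""
    chain = [type_name]
    while chain[-1] in aliases and aliases[chain[-1]] not in chain:
        chain.append(aliases[chain[-1]])
    return chain


def fields_matching(extractor, schema, aliases):
    """Return field names whose type is `extractor` or an alias resolving to it."""
    return [
        name
        for name, type_name in schema.items()
        if extractor in alias_chain(type_name, aliases)
    ]
```

For a schema such as {"src_port": "port", "bytes": "count"}, the extractor count selects both fields, while port selects only src_port.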
Features
The schema language now supports 4 operations on record types: + combines the fields of 2 records into a new record. <+ and +> are variations of + that give precedence to the left and right operand respectively. - creates a record with the field specified as its right operand removed. #1407 #1487 #1490
VAST now supports nested records in Arrow table slices and in the JSON import, e.g., data of type list<record<name: string, age: count>>. While nested record fields are not yet queryable, ingesting such data will no longer cause VAST to crash. MessagePack table slices don't support records in lists yet. #1429
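The four record operations can be modeled with Python dictionaries that map field names to type names. This is a sketch of the described semantics, not VAST's implementation. The function names are made up, and treating a name clash under plain + as an error is an assumption, since the entry only says how <+ and +> resolve clashes.

```python
def combine(left, right):
    """`+`: merge two records; a field present in both is rejected (assumption)."""
    clash = left.keys() & right.keys()
    if clash:
        raise ValueError(f"conflicting fields: {sorted(clash)}")
    return {**left, **right}


def combine_left(left, right):
    """`<+`: on a name clash, the field from the left operand wins."""
    return {**right, **left}


def combine_right(left, right):
    """`+>`: on a name clash, the field from the right operand wins."""
    return {**left, **right}


def remove(record, field):
    """`-`: return the record without the named field."""
    return {name: ty for name, ty in record.items() if name != field}
```

Field order in the results is incidental here; the entry does not specify it.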
Bug Fixes
Some non-null pointers were incorrectly rendered as *nullptr in log messages. #1430
Data that was ingested before the deprecation of the #timestamp attribute wasn't exported correctly with newer versions. This is now corrected. #1432
The JSON parser now accepts data with numerical or boolean values in fields that expect strings according to the schema. VAST converts these values into string representations. #1439
A query for a field or field name suffix that matches multiple fields of different types would erroneously return no results. #1447
The disk monitor now correctly erases partition synopses from the meta index. #1450
The archive, index, source, and sink components now report metrics when idle instead of omitting them entirely. This allows for distinguishing between idle and not running components from the metrics. #1451
VAST no longer crashes when the disk monitor tries to calculate the size of the database while files are being deleted. Instead, it will retry after the configured scan interval. #1458
Insufficient permissions for one of the paths in the schema-dirs option would lead to a crash in vast start. #1472
A race condition during server shutdown could lead to an invariant violation, resulting in a firing assertion. Streamlining the shutdown logic resolved the issue. #1473 #1485
Enabling the disk budget feature no longer prevents the server process from exiting after it was stopped. #1495
2021.02.24
Breaking Changes
VAST switched to spdlog >= 1.5.0 for logging. For users, this means: The vast.console-format and vast.file-format options must now be specified using the spdlog pattern syntax as described in the spdlog documentation. All settings under caf.logger.* are now ignored by VAST, and only the vast.* counterparts are used for logger configuration. #1223 #1328 #1334 #1390 @a4z
VAST now requires {fmt} >= 5.2.1 to be installed. #1330
All options in vast.metrics.* had underscores in their names replaced with dashes to align with other options. For example, vast.metrics.file_sink is now vast.metrics.file-sink. The old options no longer work. #1368
User-supplied schema files are now picked up from <SYSCONFDIR>/vast/schema and <XDG_CONFIG_HOME>/vast/schema instead of <XDG_DATA_HOME>/vast/schema. #1372
The previously deprecated options vast.spawn.importer.ids and vast.schema-paths no longer work. Furthermore, queries spread over multiple arguments are now disallowed instead of triggering a deprecation warning. #1374
The special meaning of the #timestamp attribute has been removed from the schema language. Timestamps can from now on be marked as such by using the timestamp type instead. Queries of the form #timestamp <op> value remain operational but are deprecated in favor of :timestamp. Note that this change also affects :time queries, which aren't supersets of #timestamp queries any longer. #1388
Changes
Schema parsing now uses a 2-pass loading phase so that type aliases can reference other types that are later defined in the same directory. Additionally, type definitions from already parsed schema dirs can be referenced from schema types that are parsed later. Types can also be redefined in later directories, but a type cannot be defined twice in the same directory. #1331
The infer command has an improved heuristic for the number types int, count, and real. #1343 #1356 @ngrodzitski
The options listen, read, schema, schema-file, type, and uds can from now on be supplied to the import command directly. Similarly, the options write and uds can be supplied to the export command. All options can still be used after the format subcommand, but that usage is deprecated. #1354
The query normalizer interprets value predicates of type subnet more broadly: given a subnet S, the parser expands this to the expression :subnet == S || :addr in S. This change makes it easier to search for IP addresses belonging to a specific subnet. #1373
The output of vast help and vast documentation now goes to stdout instead of to stderr. Erroneous invocations of vast also print the helptext, but in this case the output still goes to stderr to avoid interference with downstream tooling. #1385
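The subnet expansion from #1373 (:subnet == S || :addr in S) can be sketched in Python with the standard ipaddress module. The function name is hypothetical, and the sketch evaluates the expanded predicate against one typed value rather than against VAST's index.

```python
import ipaddress

_NETWORKS = (ipaddress.IPv4Network, ipaddress.IPv6Network)
_ADDRESSES = (ipaddress.IPv4Address, ipaddress.IPv6Address)


def matches_subnet_predicate(value, subnet):
    """Evaluate `:subnet == S || :addr in S` for a single value."""
    if isinstance(value, _NETWORKS):
        # Subnet-typed values match only if they equal S.
        return value == subnet
    if isinstance(value, _ADDRESSES):
        # Address-typed values match if they lie inside S.
        return value in subnet
    # Values of any other type are not covered by the expansion.
    return False
```

For S = 10.0.0.0/8, the address 10.1.2.3 and the subnet 10.0.0.0/8 both match, while 10.0.0.0/16 does not, because the subnet branch tests equality rather than containment.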
Experimental Features
- Sigma rules are now a valid format to represent query expressions. VAST parses the detection attribute of a rule and translates it into a native query expression. To run a query using a Sigma rule, pass it on standard input, e.g., vast export json < rule.yaml. #1379
Features
VAST rotates server logs by default. The new config options vast.disable-log-rotation and vast.log-rotation-threshold can be used to control this behaviour. #1223 #1362
The meta index now stores partition synopses in separate files. This will decrease restart times for systems with large databases, slow disks and aggressive readahead settings. A new config setting vast.meta-index-dir allows storing the meta index information in a separate directory. #1330 #1376
The JSON import now always relies upon simdjson. The previously experimental --simdjson option to the vast import json|suricata|zeek-json commands no longer exists as the feature is considered stable. #1343 #1356 @ngrodzitski
The new options vast.metrics.file-sink.real-time and vast.metrics.uds-sink.real-time enable real-time metrics reporting for the file sink and UDS sink respectively. #1368
The type extractor in the expression language now works with user-defined types. For example, the type port is defined as type port = count in the base schema. This type can now be queried with an expression like :port == 80. #1382
Bug Fixes
An ordering issue introduced in #1295 that could lead to a segfault with long-running queries was reverted. #1381
A bug in the new simdjson-based JSON reader introduced in #1356 could trigger an assertion in the vast import process if an input field could not be converted to the field type in the target layout. This is no longer the case. #1386
2021.01.28
Breaking Changes
The new short options -v, -vv, -vvv, -q, -qq, and -qqq map onto the existing verbosity levels. The existing short syntax, e.g., -v debug, no longer works. #1244
The GitHub CI changed to Debian Buster and produces Debian artifacts instead of Ubuntu artifacts. Similarly, the Docker images we provide on Docker Hub use Debian Buster as base image. To build Docker images locally, users must set DOCKER_BUILDKIT=1 in the build environment. #1294
Changes
VAST preserves nested JSON objects in events instead of formatting them in a flattened form when exporting data with vast export json. The old behavior can be enabled with vast export json --flatten. #1257 #1289
vast start prints the endpoint it is listening on when providing the option --print-endpoint. #1271
The option vast.schema-paths is renamed to vast.schema-dirs. The old option is deprecated and will be removed in a future release. #1287
Experimental Features
VAST features a new plugin framework to support efficient customization points at various places of the data processing pipeline. There exist several base classes that define an interface, e.g., for adding new commands or spawning a new actor that processes the incoming stream of data. The directory examples/plugins/example contains an example plugin. #1208 #1264 #1275 #1282 #1285 #1287 #1302 #1307 #1316
VAST relies on simdjson for JSON parsing. The substantial gains in throughput shift the bottleneck of the ingest path from parsing input to indexing at the node. To use the (yet experimental) feature, use vast import json|suricata|zeek-json --simdjson. #1230 #1246 #1281 #1314 #1315 @ngrodzitski
Features
The new import zeek-json command allows for importing line-delimited Zeek JSON logs as produced by the json-streaming-logs package. Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional _path field to disambiguate the log type. For stock Zeek JSON logs, use the existing import json with the -t flag to specify the log type. #1259
VAST queries also accept nanoseconds, microseconds, milliseconds, seconds and minutes as units for a duration. #1265
The output of vast status contains detailed memory usage information about active and cached partitions. #1297
VAST installations bundle a LICENSE.3rdparty file alongside the regular LICENSE file that lists all embedded code that is under a separate license. #1306
Bug Fixes
Invalid Arrow table slices read from disk no longer trigger a segmentation fault. Instead, the invalid on-disk state is ignored. #1247
Manually specified configuration files may reside in the default location directories. Configuration files can be symlinked. #1248
For relocatable installations, the list of schema loading paths does not include a build-time configured path any more. #1249
Values in JSON fields that can't be converted to the type that is specified in the schema won't cause the containing event to be dropped any longer. #1250
Line based imports correctly handle read timeouts that occur in the middle of a line. #1276
Disk monitor quota settings not ending in a 'B' are no longer silently discarded. #1278
A potential race condition that could lead to a hanging export if a partition was persisted just as it was scanned no longer exists. #1295
2020.12.16
Breaking Changes
The splunk-to-vast script has a new name: taxonomize. The script now also generates taxonomy declarations for Azure Sentinel. #1134
CAF-encoded table slices no longer exist. As such, the option vast.import.batch-encoding now only supports arrow and msgpack as arguments. #1142
The on-disk format for table slices now supports versioning of table slice encodings. This breaking change makes it so that adding further encodings or adding new versions of existing encodings is possible without breaking again in the future. #1143 #1157 #1160 #1165
Archive segments no longer include an additional, unnecessary version identifier. We took the opportunity to clean this up bundled with the other recent breaking changes. #1168
The build configuration of VAST received a major overhaul. Inclusion of libvast in other projects via add_subdirectory(path/to/vast) is now easily possible. The names of all build options were aligned, and the new build summary shows all available options. #1175
The port type is no longer a first-class type. The new way to represent transport-layer ports relies on count instead. In the schema, VAST ships with a new alias type port = count to keep existing schema definitions intact. However, this is a breaking change because the on-disk format and Arrow data representation changed. Queries with :port type extractors no longer work. Similarly, the syntax 53/udp no longer exists; use count syntax 53 instead. Since most port occurrences do not carry a known transport-layer type, and the type information exists typically in a separate field, removing port as native type streamlines the data model. #1187
Changes
VAST no longer requires you to manually remove a stale PID file from a no-longer running vast process. Instead, VAST prints a warning and overwrites the old PID file. #1128
VAST does not produce metrics by default any more. The option --disable-metrics has been renamed to --enable-metrics accordingly. #1137
VAST now processes the schema directory recursively, as opposed to stopping at nested directories. #1154
The default segment size in the archive is now 1 GiB. This reduces fragmentation of the archive meta data and speeds up VAST startup time. #1166
VAST now listens on port 42000 instead of letting the operating system choose the port if the option vast.endpoint specifies an endpoint without a port. To restore the old behavior, set the port to 0 explicitly. #1170
The Suricata schemas received an overhaul: there now exist vlan and in_iface fields in all types. In addition, VAST ships with new types for ikev2, nfs, snmp, tftp, rdp, sip and dcerpc. The tls type gets support for the additional sni and session_resumed fields. #1176 #1180 #1186 #1237 @satta
Installed schema definitions now reside in <datadir>/vast/schema/types, taxonomy definitions in <datadir>/vast/schema/taxonomy, and concept definitions in <datadir>/vast/schema/concepts, as opposed to them all being in the schema directory directly. When overriding an existing installation, you may have to delete the old schema definitions by hand. #1194
The zeek export format now strips off the prefix zeek. to ensure full compatibility with regular Zeek output. For all non-Zeek types, the prefix remains intact. #1205
Experimental Features
VAST now ships with its own taxonomy and basic concept definitions for Suricata, Zeek, and Sysmon. #1135 #1150
The query language now supports models. Models combine a list of concepts into a semantic unit that can be fulfilled by an event. An event fulfills a model if its type contains a field for every concept in the model. Turn to the documentation for more information. #1185 #1228
The expression language gained support for the #field meta extractor. It is the complement for #type and uses suffix matching for field names at the layout level. #1228
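As a rough illustration of suffix matching on field names, the following Python sketch matches on whole dot-separated components. That boundary rule, the function name, and the example field name are assumptions; the entry above only states that #field uses suffix matching at the layout level.

```python
def field_matches(field_name: str, suffix: str) -> bool:
    """True if `suffix` is the full dotted name or a trailing run of its components."""
    return field_name == suffix or field_name.endswith("." + suffix)
```

Under this reading, both id.orig_h and orig_h match a field named zeek.conn.id.orig_h, whereas the partial component h does not.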
Features
The new option vast.client-log-file enables client-side logging. By default, VAST only writes log files for the server process. #1132
The new option --print-bytesizes of lsvast prints information about the size of certain fields of the flatbuffers inside a VAST database directory. #1149
The storage required for indexing IP addresses has been optimized. This should result in significantly reduced memory usage over time, as well as faster restart times and reduced disk space requirements. #1172 #1200 #1216
A new key 'meta-index-bytes' appears in the status output generated by vast status --detailed. #1193
The new dump command prints configuration and schema-related information. The implementation allows for printing all registered concepts and models, via vast dump concepts and vast dump models. The --yaml flag to dump switches from JSON to YAML output, such that it conforms to the taxonomy configuration syntax. #1196 #1233
On Linux, VAST now contains a set of built-in USDT tracepoints that can be used by tools like perf or bpftrace when debugging. Initially, we provide the two tracepoints chunk_make and chunk_destroy, which trigger every time a vast::chunk is created or destroyed. #1206
Low-selectivity string (in)equality queries now run up to 30x faster, thanks to more intelligent selection of relevant index partitions. #1214
Bug Fixes
vast import no longer stalls when it doesn't receive any data for more than 10 seconds. #1136
The vast.yaml.example contained syntax errors. The example config file now works again. #1145
VAST no longer starts if the specified config file does not exist. #1147
The output of vast status --detailed now contains information about running sinks, e.g., vast export <format> <query> processes. #1155
VAST no longer blocks when an invalid query operation is issued. #1189
The type registry now detects and handles breaking changes in schemas, e.g., when a field type changes or a field is dropped from a record. #1195
The index now correctly drops further results when queries finish early, thus improving the performance of queries for a limited number of events. #1209
The index no longer crashes when too many parallel queries are running. #1210
The index no longer causes exporters to deadlock when the meta index produces false positives. #1225
The summary log message of vast export now contains the correct number of candidate events. #1228
The vast status command does not collect status information from sources and sinks any longer. They were often too busy to respond, leading to a long delay before the command completed. #1234
Concepts that reference other concepts are now loaded correctly from their definition. #1236
2020.10.29
Changes
The new option import.read-timeout allows for setting an input timeout for low volume sources. Reaching the timeout causes the current batch to be forwarded immediately. This behavior was previously controlled by import.batch-timeout, which now only controls the maximum buffer time before the source forwards batches to the server. #1096
VAST will now warn if a client command connects to a server that runs on a different version of the vast binary. #1098
Log files are now less verbose because class and function names are not printed on every line. #1107
The default database directory moved to /var/lib/vast for Linux deployments. #1116
Experimental Features
The query language now comes with support for concepts, the first part of taxonomies. Concepts are a mechanism to unify the various naming schemes of different data formats into a single, coherent nomenclature. #1102
A new disk monitor component can now monitor the database size and delete data that exceeds a specified threshold. Once VAST reaches the maximum amount of disk space, the disk monitor deletes the oldest data. The command-line options --disk-quota-high, --disk-quota-low, and --disk-quota-check-interval control the rotation behavior. #1103
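One plausible reading of the disk monitor's rotation is a high/low watermark scheme: once the database exceeds the high quota, the oldest data is erased until the size drops to the low quota. The Python sketch below models that reading only. The watermark semantics, the function name, and the (timestamp, size) representation of partitions are assumptions, not VAST's implementation.

```python
def partitions_to_delete(partitions, high, low):
    """Pick partitions to erase, oldest first.

    partitions: list of (timestamp, size_in_bytes) tuples.
    high: quota that triggers deletion once the total size exceeds it.
    low: size at which deletion stops again.
    """
    total = sum(size for _, size in partitions)
    if total <= high:
        return []  # Below the high quota: nothing to do in this check interval.
    doomed = []
    for partition in sorted(partitions):  # Sorted by timestamp, oldest first.
        if total <= low:
            break
        doomed.append(partition)
        total -= partition[1]
    return doomed
```

A periodic caller would run this once per check interval and erase the returned partitions.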
Features
When running VAST under systemd supervision, it is now possible to use the Type=notify directive in the unit file to let VAST notify the service manager when it becomes ready. #1091
The new options vast.segments and vast.max-segment-size control how the archive generates segments. #1103
The new script splunk-to-vast converts a Splunk CIM model file in JSON to a VAST taxonomy. For example, splunk-to-vast < Network_Traffic.json renders the concept definitions for the Network Traffic data model. The generated taxonomy does not include field definitions, which users should add separately according to their data formats. #1121
The expression language now accepts records without field names. For example, id == <192.168.0.1, 41824, 143.51.53.13, 25, "tcp"> is now valid syntax and instantiates a record with 5 fields. Note: expressions with records currently do not execute. #1129
Bug Fixes
The lookup for schema directories now happens in a fixed order. #1086
Sources that receive no or very little input do not block vast status any longer. #1096
The vast status --detailed command now correctly shows the status of all sources, i.e., vast import or vast spawn source commands. #1109
VAST no longer opens a random public port, which used to be enabled in the experimental VAST cluster mode in order to transparently establish a full mesh. #1110
The lsvast tool failed to print FlatBuffers schemas correctly. The output now renders correctly. #1123
2020.09.30
Breaking Changes
Data exported in the Apache Arrow format now contains the name of the payload record type in the metadata section of the schema. #1072
The persistent storage format of the index now uses FlatBuffers. #863
Changes
The JSON export format now renders duration and port fields using strings as opposed to numbers. This avoids a possible loss of information and enables users to re-use the output in follow-up queries directly. #1034
The delay between the periodic log messages for reporting the current event rates has been increased to 10 seconds. #1035
The global VAST configuration now always resides in <sysconfdir>/vast/vast.conf, and bundled schemas always in <datadir>/vast/schema/. VAST no longer supports reading a vast.conf file in the current working directory. #1036
The proprietary VAST configuration file has changed to the more ops-friendly industry standard YAML. This change also introduced a new dependency: yaml-cpp version 0.6.2 or greater. The top-level vast.yaml.example illustrates what the new YAML config looks like. Please rename existing configuration files from vast.conf to vast.yaml. VAST still reads vast.conf but will soon only look for vast.yaml or vast.yml files in available configuration file paths. #1045 #1055 #1059 #1062
The options that affect batches in the import command received new, more user-facing names: import.table-slice-type, import.table-slice-size, and import.read-timeout are now called import.batch-encoding, import.batch-size, and import.batch-timeout respectively. #1058
All configuration options are now grouped into vast and caf sections, depending on whether they affect VAST itself or are handed through to the underlying actor framework CAF directly. Take a look at the bundled vast.yaml.example file for an explanation of the new layout. #1073
We refactored the index architecture to improve stability and responsiveness. This includes fixes for several shutdown issues. #863
Experimental Features
- The vast get command has been added. It retrieves events from the database directly by their ids. #938
Features
VAST now supports the XDG base directory specification: The vast.conf is now found at ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.conf, and schema files at ${XDG_DATA_HOME:-${HOME}/.local/share}/vast/schema/. The user-specific configuration file takes precedence over the global configuration file in <sysconfdir>/vast/vast.conf. #1036
VAST now merges the contents of all used configuration files instead of using only the most user-specific file. The file specified using --config takes the highest precedence, followed by the user-specific path ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.conf, and the compile-time path <sysconfdir>/vast/vast.conf. #1040
VAST now ships with a new tool lsvast to display information about the contents of a VAST database directory. See lsvast --help for usage instructions. #863
The output of the status command was restructured with a strong focus on usability. The new flags --detailed and --debug add additional content to the output. #995
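The configuration merging from #1040 can be sketched as a layered dictionary merge in Python. The function names are hypothetical, and merging nested sections key by key is an assumption about the granularity; the entry only fixes the precedence order.

```python
def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a section: merge it recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_config(system, user, explicit):
    """Merge from lowest to highest precedence.

    system: contents of the compile-time path (<sysconfdir>/vast/vast.conf)
    user: contents of the user-specific XDG path
    explicit: contents of the file passed via --config
    """
    return deep_merge(deep_merge(system, user), explicit)
```

Settings that appear in only one file survive the merge, while conflicting settings resolve in favor of the more specific file.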
Bug Fixes
- Stalled sources that were unable to generate new events no longer stop import processes from shutting down under rare circumstances. #1058
2020.08.28
Breaking Changes
- We now bundle a patched version of CAF, with a changed ABI. This means that if you're linking against the bundled CAF library, you also need to distribute that library so that VAST can use it at runtime. The versions are API compatible so linking against a system version of CAF is still possible and supported. #1020
Changes
The set type has been removed. Experience with the data model showed that there is no strong use case to separate sets from vectors in the core. While this may be useful in programming languages, VAST deals with immutable data where set constraints have been enforced upstream. This change requires updating existing schemas by changing set<T> to vector<T>. In the query language, the symbol for the empty map changed from {-} to {}, as it now unambiguously identifies map instances. #1010
The vector type has been renamed to list. In an effort to streamline the type system vocabulary, we favor list over vector because it's closer to existing terminology (e.g., Apache Arrow). This change requires updating existing schemas by changing vector<T> to list<T>. #1016
The expression field parser now allows the '-' character. #999
Features
VAST now writes a PID lock file on startup to prevent multiple server processes from accessing the same persistent state. The pid.lock file resides in the vast.db directory. #1001
The default schema for Suricata has been updated to support the suricata.ftp and suricata.ftp_data event types. #1009
VAST now prints the location of the configuration file that is used. #1009
Bug Fixes
The shutdown process of the server process could potentially hang forever. VAST now uses a 2-step procedure that first attempts to terminate all components cleanly. If that fails, it will attempt a hard kill afterwards, and if that fails after another timeout, the process will call abort(3). #1005
When a continuous query in a client process terminated, the node did not clean up the corresponding server-side state. This memory leak no longer exists. #1006
The port encoding for Arrow-encoded table slices is now host-independent and always uses network-byte order. #1007
Importing JSON no longer fails for JSON fields containing null when the corresponding VAST type in the schema is a non-trivial type like vector<string>. #1009
Some file descriptors remained open when they weren't needed any more. This descriptor leak has been fixed. #1018
When running VAST under heavy load, CAF stream slot ids could wrap around after a few days and deadlock the system. As a workaround, we extended the slot id bit width to make the time until this happens unrealistically large. #1020
Incomplete reads were not handled properly, which manifested for files larger than 2GB. On macOS, writing files larger than 2GB may have failed previously. VAST now respects OS-specific constraints on the maximum block size. #1025
VAST would overwrite existing on-disk state data when encountering a partial read during startup. This state-corrupting behavior no longer exists. #1026
VAST did not terminate when a critical component failed during startup. VAST now binds the lifetime of the node to all critical components. #1028
MessagePack-encoded table slices now work correctly for nested container types. #984
A bug in the expression parser prevented the correct parsing of fields starting with either 'F' or 'T'. #999
2020.07.28
Breaking Changes
- FlatBuffers is now a required dependency for VAST. The archive and the segment store use FlatBuffers to store and version their on-disk persistent state. #972
Changes
The suricata schema file contains new type definitions for the stats, krb5, smb, and ssh events. #954 #986
VAST now recognizes /etc/vast/schema as an additional default directory for schema files. #980
Features
Starting with this release, installing VAST on any Linux becomes significantly easier: A static binary will be provided with each release on the GitHub releases page. #966
We open-sourced our MessagePack-based table slice implementation, which provides a compact row-oriented encoding of data. This encoding works well for binary formats (e.g., PCAP) and access patterns that involve materializing entire rows. The MessagePack table slice is the new default when Apache Arrow is unavailable. To enable parsing into MessagePack, you can pass --table-slice-type=msgpack to the import command, or set the configuration option import.table-slice-type to 'msgpack'. #975
Bug Fixes
- The PCAP reader now correctly shows the number of generated events. #954
2020.06.25
Changes
The options system.table-slice-type and system.table-slice-size have been removed, as they duplicated import.table-slice-type and import.table-slice-size respectively. #908 #951
The JSON export format now renders timestamps using strings instead of numbers in order to avoid possible loss of precision. #909
The default table slice type has been renamed to caf. It has not been the default when built with Apache Arrow support for a while now, and the new name more accurately reflects what it is doing. #948
Experimental Features
- VAST now supports aging out existing data. This feature currently only concerns data in the archive. The options system.aging-frequency and system.aging-query configure a query that runs on a regular schedule to determine which events to delete. It is also possible to trigger an aging cycle manually. #929
Features
VAST now has options to limit the number of results produced by an invocation of vast explore. #882
The import json command's type restrictions are more relaxed now, and it can additionally convert from JSON strings to VAST internal data types. #891
VAST now supports /etc/vast/vast.conf as an additional fallback for the configuration file. The following file locations are looked at in order: Path specified on the command line via --config=path/to/vast.conf, vast.conf in current working directory, ${INSTALL_PREFIX}/etc/vast/vast.conf, and /etc/vast/vast.conf. #898
The import command gained a new --read-timeout option that forces data to be forwarded to the importer regardless of the internal batching parameters and table slices being unfinished. This allows for reducing the latency between the import command and the node. The default timeout is 10 seconds. #916
The output format for the explore and pivot commands can now be set using the explore.format and pivot.format options respectively. Both default to JSON. #921
The meta index now uses Bloom filters for equality queries involving IP addresses. This especially accelerates queries where the user wants to know whether a certain IP address exists in the entire database. #931
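To show why Bloom filters help with IP equality queries (#931), here is a small self-contained Python sketch. It is a textbook Bloom filter, not the meta index implementation: the class, the hashing scheme, and the per-partition dictionary are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def candidate_partitions(filters, address):
    """Return the partitions that may contain `address`; all others can be skipped."""
    return [name for name, bf in filters.items() if bf.might_contain(address)]
```

A query for an address that no partition has seen can usually be answered without touching any partition, which matches the speedup scenario described in the entry.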
Bug Fixes
A use after free bug would sometimes crash the node while it was shutting down. #896
A bogus import process that assembled table slices with a greater number of events than expected by the node was able to lead to wrong query results. #908
The export json command now correctly unescapes its output. #910
VAST now correctly checks for control characters in inputs. #910
2020.05.28
Changes
The command line flag for disabling the accountant has been renamed to --disable-metrics to more accurately reflect its intended purpose. The internal vast.statistics event has been renamed to vast.metrics. #870
Spreading a query over multiple command line arguments in commands like explore/export/pivot/etc. has been deprecated. #878
Experimental Features
- Added a new explore command to VAST that can be used to show data records within a certain time from the results of a query. #873 #877
Features
All input parsers now support mixed \n and \r\n line endings. #865
When importing events of a new or updated type, VAST now only requires the type to be specified once (e.g., in a schema file). For consecutive imports, the event type does not need to be specified again. A list of registered types can now be viewed using vast status under the key node.type-registry.types. #875
When importing JSON data without knowing the type of the imported events a priori, VAST now supports automatic event type deduction based on the JSON object keys in the data. VAST selects a type if and only if the set of fields matches a known type. The --type / -t option to the import command restricts the matching to the set of types that share the provided prefix. Omitting -t attempts to match JSON against all known types. If only a single variant of a type is matched, the import falls back to the old behavior and fills in nil for mismatched keys. #875
VAST now prints a message when it is waiting for user input to read a query from a terminal. #878
VAST now ships with a schema suitable for Sysmon import. #886
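The JSON type deduction from #875 can be illustrated with a small Python sketch. It is a simplified reading of the entry, not VAST's code: the deduce_type function and the sample type registry are hypothetical, only top-level keys are compared, a match is taken to mean exact equality of key sets, and "a single variant" is interpreted as the prefix selecting exactly one known type.

```python
from typing import Dict, Optional, Set


def deduce_type(obj: dict, known_types: Dict[str, Set[str]],
                prefix: str = "") -> Optional[str]:
    """Pick the type whose field names match the keys of a JSON object."""
    # --type/-t restricts matching to types sharing the prefix;
    # the empty prefix matches every known type.
    candidates = {name: fields for name, fields in known_types.items()
                  if name.startswith(prefix)}
    if len(candidates) == 1:
        # Single variant: old behavior, mismatched keys become nil later on.
        return next(iter(candidates))
    keys = set(obj)
    matches = [name for name, fields in candidates.items() if fields == keys]
    # Select a type only when the match is unambiguous.
    return matches[0] if len(matches) == 1 else None
```

With a registry containing zeek.conn and zeek.dns, an object whose keys equal the zeek.dns field names would be deduced as zeek.dns, while an object with unknown keys yields no type.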
Bug Fixes
The parser for Zeek tsv data used to ignore attributes that were defined for the Zeek-specific types in the schema files. It has been modified to respect and prefer the specified attributes for the fields that are present in the input data. #847
Fixed a bug that caused vast import processes to produce 'default' table slices, despite having the 'arrow' type as the default. #866
Fixed a bug where setting the logger.file-verbosity in the config file would not have an effect. #866
2020.04.29
Changes
The index-specific options max-partition-size, max-resident-partitions, max-taste-partitions, and max-queries can now be specified on the command line when starting a node. #728
The default bind address has been changed from :: to localhost. #828
The option --skip-candidate-checks / -s for the count command was renamed to --estimate / -e. #843
Features
Packet drop and discard statistics are now reported to the accountant for PCAP import, and are available using the keys pcap-reader.recv, pcap-reader.drop, pcap-reader.ifdrop, pcap-reader.discard, and pcap-reader.discard-rate in the vast.statistics event. If the number of dropped packets exceeds a configurable threshold, VAST additionally warns about packet drops on the command line. #827 #844
Bash autocompletion for vast is now available via the autocomplete script located at scripts/vast-completions.bash in the VAST source tree. #833
Bug Fixes
Archive lookups are now interruptible. This change fixes an issue that caused consecutive exports to slow down the node, which improves the overall performance for larger databases considerably. #825
Fixed a crash when importing data while a continuous export was running for unrelated events. #830
Queries of the form x != 80/tcp were falsely evaluated as x != 80/? && x != ?/tcp. (The syntax in the second predicate does not yet exist; it only illustrates the bug.) Port inequality queries now correctly evaluate x != 80/? || x != ?/tcp. E.g., the result now contains values like 80/udp and 80/?, but also 8080/tcp. #834
Fixed a bug that could cause stalled input streams not to forward events to the index and archive components for the JSON, CSV, and Syslog readers, when the input stopped arriving but no EOF was sent. This is a follow-up to #750. A timeout now ensures that the readers continue when some events were already handled, but the input appears to be stalled. #835
For some queries, the index evaluated only a subset of all relevant partitions in a non-deterministic manner. Fixing a violated evaluation invariant now guarantees deterministic execution. #842
The stop command always returned immediately, regardless of whether it succeeded. It now blocks until the remote node has shut down properly or returns an error exit code upon failure. #849
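The port inequality fix from #834 comes down to which Boolean connective joins the two partial comparisons. The following Python sketch contrasts both variants; the Port tuple and the function names are hypothetical and only model the semantics described in the entry.

```python
from typing import NamedTuple


class Port(NamedTuple):
    number: int
    proto: str  # "tcp", "udp", or "?" when the protocol is unknown


def not_equal_buggy(x: Port, rhs: Port) -> bool:
    # Old behavior: x != 80/? && x != ?/tcp
    return x.number != rhs.number and x.proto != rhs.proto


def not_equal_fixed(x: Port, rhs: Port) -> bool:
    # New behavior: x != 80/? || x != ?/tcp
    return x.number != rhs.number or x.proto != rhs.proto


rhs = Port(80, "tcp")
values = [Port(80, "udp"), Port(80, "?"), Port(8080, "tcp"), Port(80, "tcp")]
# The conjunction misses every value that shares the number or the protocol.
buggy_hits = [v for v in values if not_equal_buggy(v, rhs)]
fixed_hits = [v for v in values if not_equal_fixed(v, rhs)]
```

Here fixed_hits holds 80/udp, 80/?, and 8080/tcp, the values named in the entry, whereas buggy_hits is empty.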
2020.03.26
Changes
The VERBOSE log level has been added between INFO and DEBUG. This level is enabled at build time for all build types, making it possible to get more detailed logging output from release builds. #787
The internal statistics event type vast.account has been renamed to vast.statistics for clarity. #789
The prefix for command line options that change CAF options was changed from --caf# to --caf. (with a trailing dot instead of a hash). #797
The log folder vast.log/ in the current directory will not be created by default any more. Users must explicitly set the system.file-verbosity option if they wish to keep the old behavior. #803
The config option system.log-directory was deprecated and replaced by the new option system.log-file. All logs will now be written to a single file. #806
Features
The new vast import syslog command allows importing Syslog messages as defined in RFC 5424. #770
The option --disable-community-id has been added to the vast import pcap command for disabling the automatic computation of Community IDs. #777
Continuous export processes can now be stopped correctly. Before this change, the node showed an error message and the exporting process exited with a non-zero exit code. #779
The short option -c for setting the configuration file has been removed. The long option --config must now be used instead. This fixed a bug that did not allow for -c to be used for continuous exports. #781
Expressions must now be parsed to the end of input. This fixes a bug that caused malformed queries to be evaluated until the parser failed. For example, the query #type == "suricata.http" && .dest_port == 80 was erroneously evaluated as #type == "suricata.http" instead. #791
The hash index has been re-enabled after it was outfitted with a new high-performance hash map implementation that increased performance to the point where it is on par with the regular index. #796
An under-the-hood change to our parser-combinator framework makes sure that we do not discard possibly invalid input data up to the end of input. This uncovered a bug in our MRT/bgpdump integrations, which have thus been disabled (for now), and will be fixed at a later point in time. #808
2020.02.27
Changes
The build system will from now on try to use the CAF library from the system, if one is provided. If it is not found, the CAF submodule will be used as a fallback. #740
VAST now supports (and requires) Apache Arrow >= 0.16. #751
The option --historical for export commands has been removed, as it was the default already. #754
The option --directory has been replaced by --db-directory and log-directory, which set directories for persistent state and log files respectively. The default log file path has changed from vast.db/log to vast.log. #758
Hash indices have been disabled again due to a performance regression. #765
Features
- For users of the Nix package manager, expressions have been added to generate reproducible development environments with nix-shell. #740
Bug Fixes
- Continuously importing events from a Zeek process with a low rate of emitted events resulted in a long delay until the data would be included in the result set of queries. This is because the import process would buffer up to 10,000 events before sending them to the server as a batch. The algorithm has been tuned to flush its buffers if no data is available for more than 500 milliseconds. #750
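The buffering behavior described in #750 combines a size limit with an idle timeout. A minimal Python sketch of that idea follows; the Batcher class is hypothetical, takes timestamps as arguments to stay deterministic, and does not reflect how VAST's import process is actually structured.

```python
from typing import List, Optional


class Batcher:
    """Collect events and flush when the batch is full or input goes idle."""

    def __init__(self, max_events: int = 10_000, idle_timeout: float = 0.5):
        self.max_events = max_events      # batch size from the changelog entry
        self.idle_timeout = idle_timeout  # 500 milliseconds, in seconds
        self.buffer: List[object] = []
        self.last_arrival: Optional[float] = None

    def add(self, event: object, now: float) -> Optional[List[object]]:
        """Buffer an event; return a full batch once the size limit is hit."""
        self.buffer.append(event)
        self.last_arrival = now
        if len(self.buffer) >= self.max_events:
            return self._flush()
        return None

    def poll(self, now: float) -> Optional[List[object]]:
        """Call periodically; flush if no data arrived for too long."""
        if self.buffer and now - self.last_arrival > self.idle_timeout:
            return self._flush()
        return None

    def _flush(self) -> List[object]:
        batch, self.buffer = self.buffer, []
        return batch
```

A low-rate source then sees its events forwarded roughly half a second after the last arrival instead of waiting for 10,000 events to accumulate.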
2020.01.31
Changes
The import pcap command no longer takes interface names via --read,-r, but instead from a separate option named --interface,-i. This change has been made for consistency with other tools. #641
Record field names can now be entered as quoted strings in the schema and expression languages. This lifts a restriction where JSON fields with whitespace or special characters could not be ingested. #685
Build configuration defaults have been adapted for a better user experience. Installations are now relocatable by default, which can be reverted by configuring with --without-relocatable. Additionally, new sets of defaults named --release and --debug (renamed from --dev-mode) have been added. #695
Two minor modifications were made to the parsing framework: (i) the parsers for enums and records now allow trailing separators, and (ii) the dash (-) was removed from the allowed characters of schema type names. #706
VAST is switching to a calendar-based versioning scheme starting with this release. #739
Features
When a record field has the #index=hash attribute, VAST will choose an optimized index implementation. This new index type only supports (in)equality queries and is therefore intended to be used with opaque types, such as unique identifiers or random strings. #632 #726
Added Apache Arrow as a new export format. This allows users to export query results as Apache Arrow record batches for processing the results downstream, e.g., in Python or Spark. #633
The import pcap command now takes an optional snapshot length via --snaplen. If the snapshot length is set to snaplen, and snaplen is less than the size of a packet that is captured, only the first snaplen bytes of that packet will be captured and provided as packet data. #642
An experimental new Python module enables querying VAST and processing results as pyarrow tables. #685
The long option --config, which sets an explicit path to the VAST configuration file, now also has the short option -c. #689
On FreeBSD, a VAST installation now includes an rc.d script that simplifies spinning up a VAST node. CMake installs the script at PREFIX/etc/rc.d/vast. #693
Bug Fixes
In some cases it was possible that a source would connect to a node before it was fully initialized, resulting in a hanging vast import process. #647
PCAP ingestion failed for traces containing VLAN tags. VAST now strips IEEE 802.1Q headers instead of skipping VLAN-tagged packets. #650
Importing events over UDP with vast import <format> --listen :<port>/udp failed to register the accountant component. This caused an unexpected message warning to be printed on startup and resulted in losing import statistics. VAST now correctly registers the accountant. #655
The import process did not print statistics when importing events over UDP. Additionally, warnings about dropped UDP packets are no longer shown per packet, but rather periodically reported in a readable format. #662
A bug in the quoted string parser caused a parsing failure if an escape character occurred in the last position. #685
A race condition in the index logic could lead to incomplete or empty result sets for vast export. #703
The example configuration file contained an invalid section vast. This has been changed to the correct name system. #705
0.2 - 2019.10.30
Changes
The query language has been extended to support expressions of the form X == /pattern/, where X is a compatible LHS extractor. Previously, patterns only supported the match operator ~. The two operators have the same semantics when one operand is a pattern.
CAF and Broker are no longer required to be installed prior to building VAST. These dependencies are now tracked as git submodules to ensure version compatibility. Specifying a custom build is still possible via the CMake variables CAF_ROOT_DIR and BROKER_ROOT_DIR.
When exporting data in pcap format, it is no longer necessary to manually restrict the query by adding the predicate #type == "pcap.packet" to the expression. This now happens automatically because only this type contains the raw packet data.
When defining schema attributes in key-value pair form, the value no longer requires double-quotes. For example, #foo=x is now the same as #foo="x". The form without double-quotes consumes the input until the next space and does not support escaping. In case an attribute value contains whitespace, double-quotes must be provided, e.g., #foo="x y z".
The PCAP packet type gained the additional field community_id that contains the Community ID flow hash. This identifier facilitates pivoting to a specific flow from data sources with connection-level information, such as Zeek or Suricata logs.
Log files generally have some notion of timestamp for recorded events. To make the query language more intuitive, the syntax for querying time points thus changed from #time to #timestamp. For example, #time > 2019-07-02+12:00:00 now reads #timestamp > 2019-07-02+12:00:00.
Default schema definitions for certain import formats changed from hard-coded to runtime-evaluated. The default location of the schema definition files is $(dirname vast-executable)/../share/vast/schema. Currently this is used for the Suricata JSON log reader.
The default directory name for persistent state changed from vast to vast.db. This makes it possible to run ./vast in the current directory without having to specify a different state directory on the command line.
Nested types are from now on accessed by the .-syntax. This means VAST now has a unified syntax to select nested types and fields. For example, what used to be zeek::http is now just zeek.http.
The (internal) option --node for the import and export commands has been renamed from -n to -N, to allow usage of -n for --max-events.
To make the export option to limit the number of events to be exported more idiomatic, it has been renamed from --events,e to --max-events,n. Now vast export -n 42 generates at most 42 events.
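The relaxed attribute syntax (#foo=x versus #foo="x y z") can be illustrated with a small Python parser sketch. It is not VAST's schema parser: parse_attribute is a hypothetical helper, and the backslash handling inside quotes is an assumption, since the entry only states that the unquoted form does not support escaping.

```python
from typing import Optional, Tuple


def parse_attribute(text: str) -> Tuple[str, Optional[str], str]:
    """Parse one leading attribute; return (key, value, remaining input)."""
    if not text.startswith("#"):
        raise ValueError("attributes start with '#'")
    eq, space = text.find("="), text.find(" ")
    if eq == -1 or (space != -1 and space < eq):
        # Attribute without a value, e.g. "#skip".
        end = len(text) if space == -1 else space
        return text[1:end], None, text[end:]
    key, rest = text[1:eq], text[eq + 1:]
    if rest.startswith('"'):
        # Quoted form: read up to the closing quote (escapes are assumed).
        chars, i = [], 1
        while i < len(rest):
            if rest[i] == "\\" and i + 1 < len(rest):
                chars.append(rest[i + 1])
                i += 2
            elif rest[i] == '"':
                return key, "".join(chars), rest[i + 1:]
            else:
                chars.append(rest[i])
                i += 1
        raise ValueError("unterminated quoted value")
    # Unquoted form: consume until the next space, without escaping.
    end = rest.find(" ")
    end = len(rest) if end == -1 else end
    return key, rest[:end], rest[end:]
```

Under these assumptions, #foo=x and #foo="x" both yield the value x, while #foo=x y stops at the space and leaves " y" unparsed, which is why values containing whitespace need double-quotes.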
Features
The default schema for Suricata has been updated to support the new suricata.smtp event type in Suricata 5.
The export null command retrieves data, but never prints anything. Its main purpose is to make benchmarking VAST easier and faster.
The new pivot command retrieves data of a related type. It inspects each event in a query result to find an event of the requested type. If a common field exists in the schema definition of the requested type, VAST will dynamically create a new query to fetch the contextual data according to the type relationship. For example, if two records T and U share the same field x, and the user requests to pivot via T.x == 42, then VAST will fetch all data for U.x == 42. An example use case would be to pivot from a Zeek or Suricata log entry to the corresponding PCAP packets. VAST uses the field community_id to pivot between the logs and the packets. Pivoting is currently implemented for Suricata, Zeek (with community ID computation enabled), and PCAP.
The new infer command performs schema inference of input data. The command can deduce the input format and creates a schema definition that is suitable to use with the supplied data. Supported input types include Zeek TSV and JSONLD.
The newly added count command allows counting hits for a query without exporting data.
Commands now support a --documentation option, which returns Markdown-formatted documentation text.
A new schema for Argus CSV output has been added. It parses the output of ra(1), which produces CSV output when invoked with -L 0 -c ,.
The schema language now supports comments. A double-slash (//) begins a comment. Comments last until the end of the line, i.e., until a newline character (\n).
The import command now supports CSV formatted data. The type for each column is automatically derived by matching the column names from the CSV header in the input with the available types from the schema definitions.
Configuring how much status information gets printed to STDERR previously required obscure config settings. From now on, users can simply use --verbosity=<level>,-v <level>, where <level> is one of quiet, error, warn, info, debug, or trace. However, debug and trace are only available for debug builds (otherwise they fall back to log level info).
The query expression language now supports data predicates, which are a shorthand for a type extractor in combination with an equality operator. For example, the data predicate 6.6.6.6 is the same as :addr == 6.6.6.6.
The index object in the output from vast status has a new field statistics for a high-level summary of the indexed data. Currently, there exists a nested layouts object with per-layout statistics about the number of events indexed.
The accountant object in the output from vast status has a new field log-file that points to the filesystem path of the accountant log file.
Data extractors in the query language can now contain a type prefix. This enables an easier way to extract data from a specific type. For example, a query to look for Zeek conn log entries with responder IP address 1.2.3.4 had to be written with two terms, #type == zeek.conn && id.resp_h == 1.2.3.4, because the nested id record can occur in other types as well. Such queries can now be written more tersely as zeek.conn.id.resp_h == 1.2.3.4.
VAST gained support for importing Suricata JSON logs. The import command has a new suricata format that can ingest EVE JSON output.
The data parser now supports count and integer values according to the International System of Units (SI). For example, 1k is equal to 1000 and 1Ki equal to 1024.
VAST can now ingest JSON data. The import command gained the json format, which allows for parsing line-delimited JSON (LDJSON) according to a user-selected type with --type. The --schema or --schema-file options can be used in conjunction to supply custom types. The JSON objects in the input must match the selected type, that is, the keys of the JSON object must be equal to the record field names and the object values must be convertible to the record field types.
For symmetry with the export command, the import command gained the --max-events,n option to limit the number of events that will be imported.
The import command gained the --listen,l option to receive input from the network. Currently only UDP is supported. Previously, one had to use a clever netcat pipe with enough receive buffer to achieve the same effect, e.g., nc -I 1500 -p 4200 | vast import pcap. Now this pipe degenerates to vast import pcap -l.
The new --disable-accounting option shuts off periodic gathering of system telemetry in the accountant actor. This also disables output in the accounting.log.
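The suffix handling for count and integer values can be sketched in Python as shown below. The changelog only names 1k and 1Ki; the remaining suffixes in the tables and the parse_si_integer helper are assumptions made for illustration, and VAST's data parser may accept a different set.

```python
import re

# Decimal (SI) and binary suffixes; only "k" and "Ki" come from the changelog.
_FACTORS = {
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
_PATTERN = re.compile(r"([+-]?\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?")


def parse_si_integer(text: str) -> int:
    """Parse an integer with an optional decimal or binary suffix."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an integer with a known suffix: {text!r}")
    number, suffix = match.groups()
    factor = 1 if suffix is None else _FACTORS[suffix]
    return int(number) * factor
```

With this sketch, parse_si_integer("1k") gives 1000 and parse_si_integer("1Ki") gives 1024, matching the two examples in the entry.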
Bug Fixes
The user environment's LDFLAGS were erroneously passed to ar. Instead, the user environment's ARFLAGS are now used.
Exporting data with export -n <count> crashed when count was a multiple of the table slice size. The command now works as expected.
Queries of the form #type ~ /pattern/ used to be rejected erroneously. The validation code has been corrected and such queries are now working as expected.
When specifying enum types in the schema, ingestion failed because there did not exist an implementation for such types. It is now possible to define enumerations in the schema as expected and query them as strings.
Queries with the less < or greater > operators produced off-by-one results for the duration type when the query contained a finer resolution than the index. The operator now works as expected.
Timestamps were always printed in millisecond resolution, which led to loss of precision when the internal representation had a higher resolution. Timestamps are now rendered up to nanosecond resolution, the maximum resolution supported.
All query expressions of the form #type != X were falsely evaluated as #type == X and consequently produced wrong results. These expressions now behave as expected.
Parsers for reading log input that relied on recursive rules leaked memory by creating cyclic references. All recursive parsers have been updated to break such cycles and thus no longer leak memory.
The Zeek reader failed upon encountering logs with a double column, as it occurs in capture_loss.log. The Zeek parser generator has been fixed to handle such types correctly.
Some queries returned duplicate events because the archive did not filter the result set properly. This no longer occurs after fixing the table slice filtering logic.
The map data parser did not parse negative values correctly. It was not possible to parse strings of the form "{-42 -> T}" because the parser attempted to parse the token for the empty map "{-}" instead.
The CSV printer of the export command used to insert 2 superfluous fields when formatting an event: the internal event ID and a deprecated internal timestamp value. Both fields have been removed from the output, bringing it into line with the other output formats.
When a node terminated during an import, the client process remained unaffected and kept processing input. Now the client terminates when a remote node terminates.
Evaluation of predicates with negations returned incorrect results. For example, the expression :addr !in 10.0.0.0/8 created a disjunction of all fields to which :addr resolved, without properly applying De Morgan's laws. The same bug also existed for key extractors. De Morgan's laws are now applied properly for the operations !in and !~.
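The negation bug for type extractors can be reproduced in a few lines of Python. The sketch below models an event as a mapping from field names to IP address strings and treats :addr as resolving to all of them; the function names and the event layout are hypothetical.

```python
import ipaddress
from typing import Dict


def addr_in(event: Dict[str, str], subnet: str) -> bool:
    """:addr in subnet, i.e. a disjunction over all address fields."""
    net = ipaddress.ip_network(subnet)
    return any(ipaddress.ip_address(ip) in net for ip in event.values())


def addr_not_in_buggy(event: Dict[str, str], subnet: str) -> bool:
    """Old behavior: negate each predicate but keep the disjunction."""
    net = ipaddress.ip_network(subnet)
    return any(ipaddress.ip_address(ip) not in net for ip in event.values())


def addr_not_in_fixed(event: Dict[str, str], subnet: str) -> bool:
    """De Morgan: not (a or b) equals (not a) and (not b)."""
    net = ipaddress.ip_network(subnet)
    return all(ipaddress.ip_address(ip) not in net for ip in event.values())


event = {"src": "10.1.2.3", "dst": "8.8.8.8"}
buggy = addr_not_in_buggy(event, "10.0.0.0/8")
fixed = addr_not_in_fixed(event, "10.0.0.0/8")
```

The event above has one address inside 10.0.0.0/8, so the negated query should not match it: fixed is False, while the old disjunction in buggy still reports a match.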
0.1 - 2019.02.28
This is the first official release.