
Changelog

This changelog documents all notable changes to Tenzir and is updated on every release.

Next

Changes

  • Errors from the startup of configured pipelines, including those coming from configured packages, now have improved rendering. #4886

  • The implicit sources and sinks that can be set via command-line options or configuration now use TQL2. #4921

  • The default implicit event sink now writes TQL values instead of JSON. #4921

Features

  • metrics "caf" offers insights into Tenzir's underlying actor system. This is primarily aimed at developers for performance benchmarking. #4897

  • The new merge function combines two records. merge(foo, bar) is a shorthand for {...foo, ...bar}. #4897

  • We added a to_asl operator that can be used to send OCSF-normalized events to Amazon Security Lake. #4911

  • You can use the new string.match_regex(regex:string) function to check whether a string partially matches a regular expression. #4917

  • You can use the new write_tql operator to print events as TQL expressions. #4921

  • We added strip options to write_json and write_ndjson, allowing you to strip null fields as well as empty records or lists. #4921
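To illustrate the semantics of merge and match_regex, here is a minimal Python sketch that models them on plain dictionaries and strings. It is not Tenzir's implementation, and the Python function names are hypothetical stand-ins for the TQL functions. The shallow merge follows from the stated equivalence with {...foo, ...bar}, and "partial match" is modeled with re.search, which matches anywhere in the string, as opposed to re.fullmatch.

```python
import re


def merge(foo: dict, bar: dict) -> dict:
    # Shallow merge: fields from `bar` override same-named fields from `foo`,
    # mirroring the spread expression {...foo, ...bar}.
    return {**foo, **bar}


def match_regex(s: str, pattern: str) -> bool:
    # Partial match: the pattern may match anywhere in the string.
    return re.search(pattern, s) is not None


print(merge({"a": 1, "b": 2}, {"b": 3, "c": 4}))  # {'a': 1, 'b': 3, 'c': 4}
print(match_regex("hello world", "wor"))   # True
print(match_regex("hello world", "^wor"))  # False
```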

v4.25.0

Changes

  • The topic argument for load_kafka and save_kafka is now a positional argument instead of a named argument. #4805

  • The array version of from that allowed you to create multiple events has been removed. Instead, you can now pass multiple records directly to from. #4805

  • Functions can now return values of different types for the same input types. For example, x.otherwise(y) no longer requires that x has the same type as y. #4839

  • The compress and decompress operators have been deprecated in favor of separate operators for each compression algorithm. These new operators expose additional options, such as compress_gzip level=10, format="deflate". #4876

Features

  • We have added a new to_snowflake sink operator that writes events into a Snowflake table. #4589

  • We have added the from operator that allows you to easily onboard data from most sources. For example, you can now write from "https://example.com/file.json.gz" to automatically deduce the load operator, compression, and format. #4805

  • We have added the to operator that allows you to easily send data to most destinations. For example, you can now write to "ftps://example.com/file.json.gz" to automatically deduce the save operator, compression, and format. #4805

  • You can use the new subnet(string) function to parse strings as subnets. #4805

  • Several new options are now available for the load_http operator: data, json, form, skip_peer_verification, skip_hostname_verification, chunked, and multipart. The skip_peer_verification and skip_hostname_verification options are now also available for the save_http operator. #4811

  • The read_csv, read_kv, read_ssv, read_tsv and read_xsv operators now support custom quote characters. #4837

  • The read_csv, read_ssv, read_tsv and read_xsv operators support doubled quote escaping. #4837

  • The read_csv, read_ssv, read_tsv and read_xsv operators now accept multi-character strings as separators. #4837

  • The list_sep option for the read_csv, read_ssv, read_tsv and read_xsv operators can be set to an empty string, which will disable list parsing. #4837

  • The new string.parse_leef() function can be used to parse a string as a LEEF message. #4837

  • Start your Tenzir Node with tenzir-node --tql2 or set the TENZIR_TQL2=true environment variable to enable TQL2-only mode for your node. In this mode, all pipelines will run as TQL2, with the old TQL1 pipelines only being available through the legacy operator. In Q1 2025, this option will be enabled by default, and later in 2025 the legacy operator and TQL1 support will be removed entirely. #4840

  • Whether an IP address is contained in a subnet can now be checked using expressions such as 1.2.3.4 in 1.2.0.0/16. Similarly, to check whether a subnet is included in another subnet, use 1.2.0.0/16 in 1.0.0.0/8. #4841

  • TQL2 now allows writing x not in y as an equivalent to not (x in y) for better readability. #4844

  • The save_email operator now accepts a tls option to specify TLS usage when establishing the SMTP connection. #4848

  • The deduplicate operator is now available in TQL2 to help you remove events with a common key. The operator provides more flexibility than its TQL1 counterpart by letting the common key use any expression, not just a field name. You can also control timeouts with finer granularity. #4850

  • The context::erase operator allows you to selectively remove entries from contexts. #4864

  • A new operator to_opensearch is now available for sending data to OpenSearch-compatible Bulk API providers, including Elasticsearch. #4871

  • The new duration function parses expressions that evaluate to strings as duration values. #4877

  • Numbers and string expressions containing numbers can now be converted into float type values using the float function. #4882

  • User-defined operators can now be written and used in TQL2. To use TQL2, start your definition with the comment // tql2, or use the --tql2 flag to opt into TQL2 as the default. #4884
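The subnet checks described above (1.2.3.4 in 1.2.0.0/16 and 1.2.0.0/16 in 1.0.0.0/8, plus the new not in form) can be modeled with Python's standard ipaddress module. This is a minimal sketch of the semantics only; the helper names ip_in_subnet and subnet_in_subnet are hypothetical and not part of Tenzir.

```python
from ipaddress import ip_address, ip_network


def ip_in_subnet(ip: str, subnet: str) -> bool:
    # Models `1.2.3.4 in 1.2.0.0/16`.
    return ip_address(ip) in ip_network(subnet)


def subnet_in_subnet(inner: str, outer: str) -> bool:
    # Models `1.2.0.0/16 in 1.0.0.0/8`: every address of `inner` lies in `outer`.
    return ip_network(inner).subnet_of(ip_network(outer))


print(ip_in_subnet("1.2.3.4", "1.2.0.0/16"))        # True
print(subnet_in_subnet("1.2.0.0/16", "1.0.0.0/8"))  # True
# Models `x not in y`, i.e. `not (x in y)`.
print(not ip_in_subnet("10.0.0.1", "1.2.0.0/16"))   # True
```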

Bug Fixes

  • Metadata such as @name can now be set to a dynamically computed value that does not have to be a constant. For example, if the field event_name should be used as the event name, @name = event_name now correctly assigns the events their name instead of using the first value. #4839

  • The endpoint argument of the save_email operator was documented as optional but was not parsed as such. This has been fixed, and the argument is now correctly optional. #4848

  • Pipelines that begin with export | where followed by an expression that does not depend on the incoming events, such as export | where 1 == 1, no longer cause an internal error. #4861

  • Warnings that happen very early during pipeline startup now correctly show up in the Tenzir Platform. #4867

  • write_parquet now gracefully handles nested empty records by replacing them with nulls, because the Apache Parquet format fundamentally does not support empty nested records. #4874

  • Operator invocations that directly use parentheses but continue after the closing parenthesis are no longer rejected. For example, where (x or y) and z is now parsed correctly. #4885

v4.24.1

Bug Fixes

  • We fixed a rare crash on startup that occurred when the tenzir-node process started so slowly that it tried to emit metrics before the component handling metrics was ready. #4846

  • The TQL2 nics operator had a bug causing the operator name to be nic. This has now been fixed and works as documented. #4847

  • We fixed the last aggregation function to return the last element. #4855

  • We fixed a bug introduced with v4.24.0 causing crashes on startup when some of the files in the node's state directory were smaller than 12 bytes. #4856

v4.24.0

Changes

  • The topics provided to the publish and subscribe operators now exactly match the topic field in the corresponding metrics. #4738

  • Using publish and subscribe without an explicitly provided topic now uses the topic main as opposed to an implementation-defined special name. #4738

  • The usage string that is reported when an operator or function is being used incorrectly now uses the same format as the documentation. #4740

  • The functions ocsf_category_name, ocsf_category_uid, ocsf_class_name, and ocsf_class_uid are now called ocsf::category_name, ocsf::category_uid, ocsf::class_name, and ocsf::class_uid, respectively. Similarly, the package_add, package_remove, packages, and show pipelines operators are now called package::add, package::remove, package::list, and pipeline::list, respectively. #4741 #4746

  • The cache operator's ttl and max_ttl options are now called read_timeout and write_timeout, respectively. #4758

  • The ndjson option of the write_json operator has been removed in favor of the new write_ndjson operator. #4762

  • The tls_no_verify option of the to_splunk operator is now called skip_peer_verification. #4825

  • The new string function now replaces the str function. The older str name will be available as an alias for some time for compatibility but will be removed in a future release. #4834

Features

  • The new parse_time and format_time functions transform strings into timestamps and vice versa. #4576

  • The following operators are now available in TQL2 for loading and saving: load_amqp, save_amqp, load_ftp, save_ftp, load_nic, load_s3, save_s3, load_sqs, save_sqs, load_udp, save_udp, load_zmq, save_zmq, save_tcp and save_email. #4716 #4807

  • The following new operators are available in TQL2 to convert event streams to byte streams in various formats: write_csv, write_feather, write_json, write_lines, write_ndjson, write_parquet, write_pcap, write_ssv, write_tsv, write_xsv, write_yaml, write_zeek_tsv. #4716 #4807

  • The unroll operator is now available in TQL2. It takes a field of type list, and duplicates the surrounding event for every element of the list. #4736

  • The decapsulate function now handles SLL2 frames (Linux cooked capture encapsulation). #4744

  • The contexts feature is now available in TQL2. It has undergone significant changes to make use of TQL2's more powerful expressions. Contexts are shared between TQL1 and TQL2 pipelines. All operators are grouped in the context module, including the enrich and show contexts operators, which are now called context::enrich and context::list, respectively. To create a new context, use the context::create_lookup_table, context::create_bloom_filter, or context::create_geoip operators. #4753

  • Lookup table contexts now support separate create, write, and read timeouts via the create_timeout, write_timeout, and read_timeout options, respectively. The options are exclusive to contexts updated with TQL2's context::update operator. #4753

  • The --limit option for the TQL1 chart operator controls the previously hardcoded upper limit on the number of events in a chart. The option defaults to 10,000 events. #4757

  • The <list>.map(<capture>, <expression>) function replaces each value from <list> with the value from <expression>. Within <expression>, the elements are available as <capture>. For example, to add 5 to all elements in the list xs, use xs = xs.map(x, x + 5). #4788

  • The <list>.where(<capture>, <predicate>) function removes all elements from <list> for which <predicate> evaluates to false. Within <predicate>, the elements are available as <capture>. For example, to remove all elements smaller than 3 from the list xs, use xs = xs.where(x, x >= 3). #4788

  • The new append, prepend, and concatenate functions add an element to the end of a list, add an element to the front of a list, and merge two lists, respectively. xs.append(y) is equivalent to [...xs, y], xs.prepend(y) is equivalent to [y, ...xs], and concatenate(xs, ys) is equivalent to [...xs, ...ys]. #4792

  • The function otherwise(primary:any, fallback:any) provides a simple way to specify a fallback expression when the primary expression evaluates to null. #4794

  • Indexing records with string expressions is now supported. #4795

  • The split and split_regex functions split a string into a list of strings based on a delimiter or a regular expression, respectively. #4799

  • The join aggregation function concatenates strings into a single string, optionally separated by a delimiter. #4799

  • The zip function merges two lists into a single list of records with two fields, left and right. For example, zip([1, 2], [3, 4]) returns [{left: 1, right: 3}, {left: 2, right: 4}]. #4803

  • The new functions encode_base64 and decode_base64 encode and decode blobs and strings as Base64. #4806

  • The functions encode_hex and decode_hex transform strings and blobs to/from their hexadecimal byte representation. #4815

  • Aggregation functions now work on lists. For example, [1, 2, 3].sum() will return 6, and ["foo", "bar", "baz"].map(x, x == "bar").any() will return true. #4821

  • The to_splunk operator now supports the cacert, certfile, and keyfile options to provide certificates for the TLS connection. #4825

  • The network function returns the network address of a CIDR subnet. For example, 192.168.0.0/16.network() returns 192.168.0.0. #4828

  • The local and remote operators allow for overriding the location of a pipeline. Local operators prefer running at a client tenzir process, and remote operators prefer running at a remote tenzir-node process. These operators are primarily intended for testing purposes. #4835

  • The unordered operator throws away the order of events in a pipeline. This causes some operators to run faster; for example, it allows read_ndjson to parse events out of order. This operator is primarily intended for testing purposes, as most of the time the ordering requirements are inferred from subsequent operators in the pipeline. #4835
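The new list functions above map closely onto familiar list operations. The following Python sketch models their described semantics so the examples from the entries can be checked side by side; it is not Tenzir's implementation, and the helper names tql_map, tql_where, and tql_zip are hypothetical. The sketch only handles equal-length lists in zip, since the entry does not say what happens otherwise.

```python
def tql_map(xs, fn):
    # <list>.map(<capture>, <expression>): replace each element with fn(element).
    return [fn(x) for x in xs]


def tql_where(xs, pred):
    # <list>.where(<capture>, <predicate>): keep elements for which pred is true.
    return [x for x in xs if pred(x)]


def tql_zip(left, right):
    # zip(xs, ys): pair elements into records with fields `left` and `right`.
    if len(left) != len(right):
        raise ValueError("this sketch only handles equal-length lists")
    return [{"left": l, "right": r} for l, r in zip(left, right)]


xs = [1, 2, 3, 4]
print(tql_map(xs, lambda x: x + 5))     # [6, 7, 8, 9]
print(tql_where(xs, lambda x: x >= 3))  # [3, 4]
print([*xs, 5])                         # xs.append(5): [1, 2, 3, 4, 5]
print([0, *xs])                         # xs.prepend(0): [0, 1, 2, 3, 4]
print([*xs, *[5, 6]])                   # concatenate(xs, [5, 6]): [1, 2, 3, 4, 5, 6]
print(tql_zip([1, 2], [3, 4]))          # [{'left': 1, 'right': 3}, {'left': 2, 'right': 4}]
print(sum([1, 2, 3]))                   # [1, 2, 3].sum(): 6
print(any(tql_map(["foo", "bar", "baz"], lambda x: x == "bar")))  # True
```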

Bug Fixes

  • The docs for the sqs connector now correctly reflect the default of 3s for the --poll-time option. #4716

  • context inspect crashed when used to inspect a context that was previously updated with context update with an input containing a field of type enum. This no longer happens. #4746

  • The last metric emitted for each run of the enrich operator was incorrectly named tenzir.enrich.metrics instead of tenzir.metrics.enrich, causing it not to be available via metrics enrich. #4753

  • The enumerate operator now correctly prepends the added index field instead of appending it. #4756

  • It is no longer possible to manually remove contexts that are installed as part of a package. #4768

  • The to_hive operator now correctly writes files relative to the working directory of a tenzir client process instead of relative to the node. #4771

  • The read_ndjson operator no longer uses an error-prone mechanism to continue parsing an NDJSON line that contains an error. Instead, the entire line is skipped. #4801

  • The str function no longer adds extra quotes when given a string. For example, str("") == "\"\"" was changed to str("") == "". #4809

  • The TQL1 and TQL2 sockets operators no longer crash on specific builds. #4816

  • The max_content_length option for the to_splunk operator was incorrectly named send_timeout in an earlier version. This has now been fixed. #4825

  • We fixed an oversight in the syslog parsers that caused them not to yield an event until the next line came in. #4829

  • The TQL2 save_http operator had a bug causing it to fail to connect and get stuck in an infinite loop. This is now fixed and works as expected. #4833

v4.23.1

Bug Fixes

  • The node doesn't try to recreate its cache directory on startup anymore, avoiding permissions issues on systems with strict access control. #4742

  • The docker compose setup now uses separate local volumes for each tenzir directory. This fixes a bug where restarting the container reset installed packages or pipelines. #4749

  • The parquet plugin is now available in the tenzir/tenzir and tenzir/tenzir-node Docker images. #4760

  • We fixed a bug in the kafka plugin so that it no longer wrongly splits config options from the YAML files at the dot character. #4761

  • We fixed a crash in pipelines that use the export operator and a subsequent where filter with certain expressions. #4774

  • We fixed a bug causing the syslog parser to never yield events until the input stream ended. #4777

  • We fixed a bug in TQL2's where operator that made it sometimes return incorrect results for events for which the predicate evaluated to null. Now, the operator consistently warns when this happens and drops the events. #4785

v4.23.0

Changes

  • We renamed the TQL2 azure_log_analytics operator to to_azure_log_analytics. #4726

  • We renamed the TQL2 velociraptor operator to from_velociraptor. #4726

Features

  • The relational operator in now supports checking for existence of an element in a list. For example, where x in ["important", "values"] is functionally equivalent to where x == "important" or x == "values". #4691

  • We've added new hash functions for commonly used algorithms: hash_md5, hash_sha1, hash_sha224, hash_sha256, hash_sha384, hash_sha512, hash_xxh3. #4705

  • ceil and floor join the existing round function for rounding numbers, durations, and timestamps upwards and downwards, respectively. #4712

  • The new to_splunk sink operator writes data to a Splunk HEC endpoint. #4719

  • The new load_balance operator distributes events over a set of subpipelines. #4720

  • New load_kafka and save_kafka operators enable seamless integration with Apache Kafka in TQL2. #4725

  • The spread syntax ... can now be used inside lists to expand one list into another. For example, [1, ...[2, 3]] evaluates to [1, 2, 3]. #4729

  • TQL now supports "universal function call syntax," which means that every method is callable as a function and every function with at least one positional argument is callable as a method. #4730
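For readers more familiar with Python, the list membership check, the list spread syntax, and rounding with ceil and floor described above behave like the following sketch. It models numbers only (not durations or timestamps), and the helper name in_list is hypothetical.

```python
import math


def in_list(x, candidates: list) -> bool:
    # Models `x in [...]`: true if x equals at least one list element,
    # the same as `x == c1 or x == c2 or ...`.
    return any(x == c for c in candidates)


print(in_list("values", ["important", "values"]))  # True
print(in_list("other", ["important", "values"]))   # False

# Models the spread syntax [1, ...[2, 3]] with Python's unpacking operator.
print([1, *[2, 3]])  # [1, 2, 3]

# ceil and floor round upwards and downwards, respectively.
print(math.ceil(2.5), math.floor(2.5))  # 3 2
```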

Bug Fixes

  • TQL2's summarize now returns a single event when used with no groups and no input events just like in TQL1, making from [] | summarize count=count() return {count: 0} instead of nothing. #4709

  • We eliminated a rare crash in the serve operator that was introduced in v4.20.3. #4715

  • The str function no longer returns the numeric index of an enumeration value. Instead, the result is now the actual name associated with that value. #4717

v4.22.2

Features

  • The new value_counts aggregation function returns a list of values and their frequency. #4701

  • The new sort method sorts fields in records by name and lists by values. #4704
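As a rough model of value_counts and sort, the following Python sketch counts values and sorts a record's fields by name. The result shape of value_counts (records with value and count fields) and the ordering of its entries are assumptions of this sketch, as are the helper names; the record sort here only handles top-level fields.

```python
from collections import Counter


def value_counts(values):
    # Hypothetical model: one record per distinct value with its frequency.
    counts = Counter(values)
    return [{"value": v, "count": n} for v, n in counts.items()]


def sort_record(record: dict) -> dict:
    # Sorts a record's top-level fields by name.
    return dict(sorted(record.items()))


print(value_counts(["a", "b", "a"]))  # [{'value': 'a', 'count': 2}, {'value': 'b', 'count': 1}]
print(sort_record({"b": 1, "a": 2}))  # {'a': 2, 'b': 1}
print(sorted([3, 1, 2]))              # lists are sorted by value: [1, 2, 3]
```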

Bug Fixes

  • We fixed a bug that sometimes prevented incoming connections from load_tcp from closing properly. #4674

  • The google-cloud-pubsub connector and the TQL2 operators load_google_cloud_pubsub and save_google_cloud_pubsub are now available in the Docker image. #4690

  • We fixed a bug that caused the mode aggregation function to sometimes ignore some input values. #4701

  • We fixed a bug in the buffer operator that caused it to break when restarting a pipeline or using multiple buffers in a "parallel" context, such as in load_tcp's pipeline argument. #4702

v4.22.1

Features

  • We added three new, TQL2-exclusive aggregation functions: first, last, and mode. The functions return the first, last, and most common non-null value per group, respectively. #4679
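The following Python sketch models first, last, and mode as described above: each ignores nulls (None here) and is applied per group. The helper names are hypothetical, and the tie-breaking in mode (the value encountered first wins, as with Counter.most_common) is an assumption of this sketch rather than documented Tenzir behavior.

```python
from collections import Counter


def first(values):
    # First non-null value, or None if every value is null.
    return next((v for v in values if v is not None), None)


def last(values):
    # Last non-null value, or None if every value is null.
    return next((v for v in reversed(values) if v is not None), None)


def mode(values):
    # Most common non-null value, or None if every value is null.
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else None


values = [None, 1, 2, 2, None]
print(first(values), last(values), mode(values))  # 1 2 2

# Grouping: collect values per group key, then aggregate each group.
events = [{"g": "a", "x": None}, {"g": "a", "x": 1}, {"g": "b", "x": 2}, {"g": "a", "x": 1}]
groups = {}
for e in events:
    groups.setdefault(e["g"], []).append(e["x"])
print({g: mode(xs) for g, xs in groups.items()})  # {'a': 1, 'b': 2}
```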

Bug Fixes

  • The /serve endpoint now returns instantly when its pipeline fails before the endpoint is used for the first time. In the Tenzir Platform this causes the load more button in the Explorer to correctly stop showing for pipelines that fail shortly after starting. #4688

  • The boolean operators and/or now work correctly for the type null. Previously, null and false evaluated to null, and a warning was emitted. Now, it evaluates to false without a warning. #4689

  • Using the tenzir process from multiple users on the same host sometimes failed because the cache directory was not writable for all users. This no longer happens. #4694

v4.22.0

Features

  • The new google-cloud-pubsub connectors allow subscribing to a Google Cloud Pub/Sub subscription and publishing to a Google Cloud Pub/Sub topic. #4656

  • We added low-level actor metrics that help admins track the system health over time. #4668

Bug Fixes

  • We fixed a bug in the HTTP connectors that caused them not to respect HTTP response codes. #4660

  • The node now wipes its cache directory whenever it restarts. #4669

v4.21.1

Features

  • A new sample operator dynamically samples input data based on the frequency of incoming events, allowing relative sampling in situations of varying load. #4645

  • The grok parser now allows better control over the schema inference. #4657

  • The grok parser can now be directly used when reading input, allowing for read grok. #4657

Bug Fixes

  • We fixed a bug that sometimes caused the tenzir-node process to hang on shutdown. This was most likely to happen when the node shut down immediately after starting up, e.g., because of an invalid configuration file. #4613

  • We fixed a bug in the python operator that could cause arbitrary valid file descriptors in the parent process to be closed prematurely. #4646

  • The azure-blob-storage connector is now also available in the static Linux binary distributions. #4649

  • We fixed a bug that caused the context_updates field in metrics lookup to be reported once per field specified in the corresponding lookup operator instead of being reported once per operator in total. #4655

v4.21.0

Changes

  • The JSON parser's --precise option is now deprecated, as the "precise" mode is the new default. Use --merge to get the previous "imprecise" behavior. #4527

  • The JSON parser's --no-infer option has been renamed to --schema-only. The old name is deprecated and will be removed in the future. #4527

  • We removed the unused --snapshot option from the lookup operator. #4613

  • Tenzir's internal wire format bitz is now considered stable. Note that the format underwent significant changes as part of its stabilization, and is incompatible with bitz from Tenzir Node v4.20 and older. #4633

  • The lookup operator now prefers recent data in searches for lookups against historical data instead of using the order in which context updates arrive. #4636

Features

  • The CEF, CSV, GELF, JSON, KV, LEEF, Suricata, Syslog, XSV, YAML and Zeek JSON parsers now properly adhere to the schema of the read data. Previously, parsers would merge heterogeneous input into a single, growing schema, inserting nulls for fields that did not exist in some events. #4527

  • The fluent-bit source now properly adheres to the schema of the read data. #4527

  • The CEF, CSV, GELF, JSON, KV, LEEF, Suricata, Syslog, XSV, YAML and Zeek JSON parsers now all support the --schema and --selector flags to parse their data according to a given schema, as well as various other flags to more precisely control their output schema. #4527

  • metrics tcp shows metrics for TCP connections, emitted once every second per connection. The metrics contain the reads and writes on the socket and the number of bytes transmitted. #4564

  • The JSON parser is now able to also handle extremely large events when not using the NDJSON or GELF mode. #4590

  • The kv parser now allows keys and values to be enclosed in double quotes: separators within quotes are not treated as splits, and the quotes are trimmed from keys and values. For example, "key"="nested = value, fun" now successfully parses as { "key" : "nested = value, fun" }. #4591

  • The buffer operator now works with bytes inputs in addition to the existing support for events inputs. #4594

  • The lines parser can now handle null delimited "lines" with the --null flag. #4603

  • The new azure-blob-storage connector allows reading from and writing to Azure Blob Storage via a URI. #4617

Bug Fixes

  • We fixed various edge cases in parsers where values would not be properly parsed as typed data and were stored as plain text instead. No input data was lost, but no valuable type information was gained either. #4527

  • The import and partitions operators and the tenzir-ctl rebuild command no longer occasionally fail with request timeouts when the node is under high load. #4597

  • We fixed an accidentally quadratic scaling with the number of top-level array elements in read json --arrays-of-objects. As a result, using this option will now be much faster. #4601

  • Pipelines starting with from tcp no longer enter the failed state when an error occurs in one of the connections. #4602

  • We fixed a very rare crash in the zero-copy parser implementation of read feather and read parquet that was caused by releasing shared memory too early. #4633

v4.20.3

Bug Fixes

  • We fixed a bug where the export, metrics, and diagnostics operators were sometimes missing events from up to the last 30 seconds. In the Tenzir Platform, this showed itself as a gap in activity sparkbars upon loading the page. #4583

  • The /serve endpoint now gracefully handles retried requests with the same continuation token, returning the same result for each request. #4585

v4.20.2

Bug Fixes

  • The empty record type is no longer rejected in schema definitions. #4558

  • We fixed a bug that caused the Demo Node package not to be pre-installed correctly when using the tenzir/tenzir-demo Docker image. #4559

  • We fixed a potential crash in the csv, ssv, and tsv parsers for slowly arriving inputs. #4570

  • The azure-log-analytics operator sometimes errored on startup complaining about an unknown window option. This no longer occurs. #4578

  • Restarting pipelines with the udp connector no longer fails to bind to the socket. #4579

  • The systemd unit now allows binding to privileged ports by default via the ambient capability CAP_NET_BIND_SERVICE. #4580

v4.20.1

Bug Fixes

  • We fixed a regression introduced with Tenzir v4.20 that sometimes caused the Tenzir Platform to fail to fetch results from pipelines. #4554

v4.20.0

Changes

  • The previously deprecated legacy metrics system configured via the tenzir.metrics configuration section no longer exists. Use the metrics operator instead. #4381

  • lookup metrics no longer contain the snapshot field; instead, the values now appear in the retro field. #4381

  • The show operator is deprecated. Use the operator <aspect> instead of show <aspect>. The information from show dependencies and show build is now available in the version operator. #4455 #4549

  • The lines printer no longer performs any escaping and is no longer an alias for the ssv printer. Additionally, nulls are skipped instead of being printed as -. #4520

Features

  • The new rebuild metrics contain information about running partition rebuilds. #4381

  • The ingest metrics contain information about all ingested events and their schema. This is slightly different from the existing import metrics, which track only events imported via the import operator, and are separate per pipeline. #4381

  • The new unstoppable flag allows for pipelines to run and repeat indefinitely without the ability to stop or pause. #4513

  • The cache operator is a transformation that passes through events, creating an in-memory cache of events on the first use. On subsequent uses, the operator signals upstream operators not to start at all, and returns the cached events immediately. The operator may also be used as a source for reading from a cache only, or as a sink for writing to a cache only. #4515

  • The /pipeline/launch endpoint features four new parameters: cache_id, cache_capacity, cache_ttl, and cache_max_ttl. If a cache_id is specified, the pipeline's implicit sink uses the cache operator under the hood. At least one of serve_id and cache_id must be specified. #4515

  • The lookup operator is now smarter about retroactive lookups for frequently updated contexts and avoids loading data from disk multiple times for context updates that arrive shortly after one another. #4535

Bug Fixes

  • We fixed a regression introduced in Tenzir v4.19.2 in the azure-log-analytics operator that prevented it from starting correctly. #4516

  • IPv6 addresses with a prefix that is a valid duration, for example 2dff:: with the prefix 2d, now correctly parse as an IP instead of a string. #4523

  • We fixed an issue where the export, metrics, or diagnostics operators crashed the node when started while the node was shutting down or after an unexpected filesystem error occurred. This happened frequently while using the Tenzir Platform, which subscribes to metrics and diagnostics automatically. #4530

  • context inspect <ctx> no longer crashes for lookup table contexts with values of multiple schemas when using subnets as keys. #4531

  • We fixed a bug that sometimes caused the retro.queued_events value in lookup metrics to stop going down again. #4535

v4.19.6

Features

  • The tenzir command-line utility gained a new option --strict, causing it to exit with a non-zero exit code for pipelines that emit at least one warning. #4506

Bug Fixes

  • The slice operator no longer crashes when used with a positive begin and negative end value when operating on fewer events than -end, e.g., when working on a single event and using slice 0:-1. #4505

  • We fixed a bug in the shell operator that could cause the process to crash when breaking its pipe. Now, the operator shuts down with an error diagnostic instead. #4508

  • Pipelines with the python operator now deploy more quickly, as their deployment no longer waits for the virtual environment to be set up successfully. #4508

v4.19.5

Bug Fixes

  • The serve operator no longer uses an excessive amount of CPU. #4499

v4.19.4

Bug Fixes

  • The packages plugin is now available in the static binary release artifacts. #4490

v4.19.3

Bug Fixes

  • Pipelines from packages now correctly remember their last run number and last state when reinstalling the package. #4479

v4.19.2

Changes

  • We've made some changes that optimize Tenzir's memory usage. Pipeline operators that emit very small batches of events or bytes at a high frequency now use less memory. The serve operator's internal buffer is now soft-capped at 1Ki instead of 64Ki events, aligning the buffer size with the default upper limit for the number of events that can be fetched at once from /serve. The export, metrics, and diagnostics operators now handle back pressure better and utilize less memory in situations where the node has many small partitions. For expert users, the new tenzir.demand configuration section allows for controlling how eagerly operators demand input from their upstream operators. Lowering the demand reduces the peak memory usage of pipelines at some performance cost. #4447

Features

  • The throttle operator allows for limiting the bandwidth of a pipeline. #4448

Bug Fixes

  • The subscribe operator now delivers metrics more consistently. #4439

v4.19.1

Bug Fixes

  • Activating heartbeats via -X/--set on an amqp saver triggered socket errors if the interval between sent messages was larger than the heartbeat interval. This has been fixed by handling heartbeat communication correctly in such cases. #4428

v4.19.0

Changes

  • The python operator now resolves dependencies with every fresh pipeline run. Just restart your pipeline to upgrade to the latest available versions of your Python modules. #4336

  • The python operator no longer uses pip but rather uv. If you set custom environment variables for pip, you need to replace them with equivalent settings that work with uv. #4336

  • The /serve endpoint now always uses the simple output format for schema definitions. The option use_simple_format is now ignored. #4411

Features

  • The new package operator allows for adding and removing packages, a combination of pipelines and contexts deployed to a node as a set. Nodes load packages installed to <configdir>/tenzir/package/<package-name>/package.yaml on startup. #4344

  • The buffer operator buffers up to the specified number of events in an in-memory buffer. By default, operators in a pipeline run only when their downstream operators want to receive input. This mechanism is called back pressure. The buffer operator effectively breaks back pressure by storing up to the specified number of events in memory, always requesting more input, which allows upstream operators to run uninterruptedly even in case the downstream operators of the buffer are unable to keep up. This allows pipelines to handle data spikes more easily. #4404

Bug Fixes

  • Metrics emitted towards the end of an operator's runtime were sometimes not recorded correctly. This now works reliably. #4404

v4.18.5

Bug Fixes

  • The unflatten operator now correctly preserves field order and overwrites in case of a name conflict. #4405

v4.18.4

Bug Fixes

  • The subscribe operator no longer propagates back pressure to its corresponding publish operators when part of a pipeline that runs in the background, i.e., is not visible on the overview page on app.tenzir.com. An invisible subscriber should never be able to slow down a publisher. #4399

v4.18.3

Changes

  • metrics export now includes an additional field that shows the number of queued events in the pipeline. #4396

Bug Fixes

  • Fixed an issue where null records were sometimes transformed into non-null records with null fields. #4394

  • We fixed an issue that sometimes caused subscribe to fail when multiple publish operators pushed to the same topic at the exact same time. #4394

  • We fixed a bug that caused a potentially unbounded memory usage in export --live, metrics --live, and diagnostics --live. #4396

v4.18.2

Bug Fixes

  • We fixed a memory leak in export that was introduced with v4.18.1. #4389

v4.18.1

Features

  • Setting the tenzir.endpoint option to false now causes the node not to listen for node-to-node connections. Previously, the port was always exposed for other nodes or tenzir processes to connect. #4380

Bug Fixes

  • We fixed a bug that caused deduplicate <fields...> --distance <distance> to sometimes produce incorrect results when followed by where <expr> with an expression that filters on the deduplicated fields. #4379

  • Pipelines that use the every modifier with the export operator no longer terminate after the first run. #4382

v4.18.0

Changes

  • The deprecated vast symlink for the tenzir-ctl binary, which offered backwards compatibility with versions older than Tenzir v4 (when Tenzir was called VAST), no longer exists. #4343

  • The deprecated tenzir.db-directory option no longer exists. Use tenzir.state-directory instead. #4343

  • Diagnostics from managed pipelines are now deduplicated, showing each diagnostic at most once for each run. #4348

  • Pipeline activity for pipelines starting with subscribe | where <expr> no longer reports ingress that does not match the provided filter expression. #4349

  • The previously deprecated --low-priority option for the export operator no longer exists. The new --parallel <level> option allows tuning how many worker threads the operator uses at most for querying persisted events. #4365

  • We raised the default and maximum long-polling timeouts for /serve from 2s and 5s to 5s and 10s, respectively. #4370

Features

  • The publish, subscribe, import, export, lookup, and enrich operators now deliver their own operator-specific metrics. #4339 #4365

  • The new tenzir.metrics.api metrics record every API call made to a Tenzir Node. #4368

  • The metrics operator now optionally takes a metric name as an argument. For example, metrics cpu only shows CPU metrics. This is functionally equivalent to metrics | where #schema == "tenzir.metrics.cpu". #4369

  • The tenzir.metrics.platform metrics record every second whether the connection to the Tenzir Platform is working as expected from the node's perspective. #4374

Bug Fixes

  • We fixed a rarely occurring issue in the gelf parser that led to parsing errors for some events. #4341

  • We fixed a rare crash when one of multiple subscribe operators for the same topic disconnected while at least one of the other subscribers was overwhelmed and asked for corresponding publishers to throttle. #4346

  • Pipelines of the form export --live | where <expr> failed to filter with type extractors or concepts. This now works as expected. #4349

  • The SQS connector now honors system proxy settings. #4359

  • We fixed a rare bug that caused the lookup operator to exit unexpectedly when using a high value for the operator's --parallel option. #4363

  • The time parser now accepts the %F, %g, %G, %u, %V, %z, and %Z format specifiers. #4366

  • The tcp connector no longer fails in listen mode when you try to restart it directly after stopping it. #4367

v4.17.4

Bug Fixes

  • We fixed a bug that caused a "Bad file descriptor" error from the python operator when multiple instances of it were started simultaneously. #4333

  • Shutting down a node no longer sets managed pipelines to the completed state unintentionally. #4334

  • Configured pipelines with retry on error enabled no longer trigger an assertion when they fail to launch. #4334

v4.17.3

Features

  • The partitions [<expr>] source operator supersedes show partitions (now deprecated) and supports an optional expression as a positional argument for showing only the partitions that would be considered in export | where <expr>. #4329

Bug Fixes

  • We fixed a bug in Tenzir v4.17.2 that sometimes caused the deletion of on-disk state of configured contexts on startup. #4330

v4.17.2

Bug Fixes

  • We fixed a bug that very rarely caused configured pipelines using contexts to fail on startup because the context they use was not yet available, or to fail on shutdown because the context was no longer available before the pipeline shut down. #4295 #4322 #4325

  • We fixed an issue where diagnostics were not properly propagated and thus not available to the diagnostics operator. #4326

v4.17.1

Bug Fixes

  • We fixed a bug in Tenzir v4.17 that caused some nodes to error on startup with an "unreachable" error. #4322

v4.17.0

Changes

  • The built-in type aliases timestamp and port for time and uint64, respectively, no longer exist. They were an artifact of Tenzir from before it supported schema inference in most parsers, and did not play well with many operators when used together with inferred types from other parsers. #4299

  • show pipelines now includes "hidden" pipelines run by the Tenzir Platform or through the API. These pipelines usually run background jobs, so they're intentionally hidden from the /pipeline/list API. #4309

Features

  • The print operator allows for printing record fields as strings with any format. #4265

  • We fixed a bug that caused python-pip to fail when creating the runtime environment for the python operator. #4279

  • The new azure-log-analytics operator makes it possible to upload events to supported or custom tables in Microsoft Azure. #4281

  • Newly created diagnostics returned from the diagnostics operator now contain a rendered field with a rendered form of the diagnostic. To restore the previous behavior, use diagnostics | drop rendered. #4290

  • The enrich operator no longer crashes when it is used to replace a field value with a context value of a different type and the context is not able to provide a substitute for all inputs. #4291

  • The lookup operator gained a new --parallel <level> option controlling the number of partitions the operator is allowed to open at once for retrospective lookups. This can significantly increase performance at the cost of higher resource usage. The option defaults to 3. To restore the previous behavior, set the option to 1. #4300

  • The /pipeline/list API now includes a new ttl field showing the TTL of the pipeline. The remaining TTL moved from ttl_expires_in_ns to a remaining_ttl field, aligning the output of the API with the show pipelines operator. #4314

  • context update <name> for lookup-table contexts now supports per-entry timeouts. The --create-timeout <duration> option sets the time after which lookup table entries expire, and the --update-timeout <duration> option sets the time after which lookup table entries expire if they are not accessed. #5126

Bug Fixes

  • subnet == ip and pattern == string predicates now behave just like ip == subnet and string == pattern predicates. #4280

  • The https and related savers now signal an error when the upload fails. #4281

  • Errors during pipeline startup are properly propagated instead of being replaced by error: failed to run in some situations. #4288

  • The summarize operator no longer crashes when grouping by a field of type null, i.e., a field whose type could not be inferred because all of its values were null. #4289

  • We fixed a regression that caused excess CPU usage for some operators when idle. This was most visible with the subscribe, export, metrics, diagnostics, lookup and enrich operators. #4297

  • The -X option for overriding configuration options for librdkafka now works for the kafka saver as well. Previously, the option was only exposed for the loader, contrary to what the documentation stated. #4317

v4.16.0

Changes

  • The approximate_median aggregation function is now called median. We found the longer name, despite being more accurate, to be rather unintuitive. #4273

Features

  • The publish operator's topics no longer have to be unique. Instead, any number of pipelines may use the publish operator with the same topic. This enables multi-producer, multi-consumer (MPMC) event routing, where streams of events from different pipelines can now be merged back together in addition to being split. #4270

  • Inter-pipeline data transfer with the publish and subscribe operators is now as fast as intra-pipeline data transfer between pipeline operators and utilizes the same amount of memory. #4270

  • Back pressure now propagates from subscribers back to publishers, i.e., if a pipeline with a subscribe operator is too slow then all pipelines with matching publish operators will be slowed down to a matching speed. This limits the memory usage of publish operators and prevents data loss. #4270

  • The p99, p95, p90, p75, and p50 aggregation functions calculate commonly used percentiles of grouped values in the summarize operator. #4273

  • For lookup-table contexts, the new --erase option for context update enables selective deletion of lookup table entries. #4274

  • The context update operator now defaults the --key <field> option to the first field in the input when no field is explicitly specified. #4274

Bug Fixes

  • Configured and non-configured contexts with the same name no longer cause nondeterministic behavior on loading. Instead, the node shuts down. #4224

  • Predicates of the form ip == subnet and ip in [subnet1, subnet2, …] now work as expected. #4268

  • The lookup operator now correctly handles subnet keys when using the --retro or --snapshot options. #4268

v4.15.2

Bug Fixes

  • Some Running pipelines were considered Completed when the node shut down, causing them not to start up again automatically when the node restarted. Now, the node only considers pipelines Completed that entered the state on their own before the node's shutdown. #4261

v4.15.1

Bug Fixes

  • We fixed a regression that caused demo nodes not to start for Tenzir v4.15. #4258

v4.15.0

Features

  • The lookup-table context now performs longest-prefix matches when the table key is of type subnet and the field to be enriched is of type ip. For example, a lookup table with key 10.0.0.0/8 matches when enriching the IP address 10.1.1.1. #4051

  • We now offer an RPM package for RedHat Linux and its derivatives. #4188

  • The /pipeline/update API endpoint now supports updating definitions of existing pipelines. #4196

  • The export, metrics, and diagnostics operators now feature a --retro flag, which makes the operators first export past events, even when --live is set. Specify both options explicitly to first return past events and then immediately switch into live mode. #4203

  • The sort operator now supports sorting by multiple fields. #4242

  • Pipelines configured as code in the tenzir.yaml configuration file may now contain labels. #4247

  • The https connector supports the new options --skip-peer-verification and --skip-hostname-verification to disable verification of the peer's certificate and verification of the certificate hostname. #4248

  • Use write json --arrays-of-objects to write JSON arrays per batch of events instead of JSON objects per event. #4249
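The longest-prefix match described for the lookup-table context above can be sketched in a few lines of Python with the standard ipaddress module. This is an illustration of the matching rule only, not Tenzir's implementation; the helper longest_prefix_match and the table values "corp" and "lab" are hypothetical.

```python
import ipaddress
from typing import Optional


def longest_prefix_match(table: dict, address: str) -> Optional[str]:
    """Return the value stored under the most specific subnet containing address.

    table maps subnet strings such as "10.0.0.0/8" to context values. The
    linear scan keeps the sketch short; large tables would call for a trie.
    """
    ip = ipaddress.ip_address(address)
    best_prefix = -1
    best_value = None
    for key, value in table.items():
        network = ipaddress.ip_network(key)
        # `in` returns False when IP versions differ, so mixed tables are safe.
        if ip in network and network.prefixlen > best_prefix:
            best_prefix = network.prefixlen
            best_value = value
    return best_value


table = {"10.0.0.0/8": "corp", "10.1.0.0/16": "lab"}
print(longest_prefix_match(table, "10.1.1.1"))   # lab
print(longest_prefix_match(table, "10.2.3.4"))   # corp
print(longest_prefix_match(table, "192.0.2.1"))  # None
```

The more specific /16 entry wins for 10.1.1.1, while addresses covered only by 10.0.0.0/8 fall back to the broader entry.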

Bug Fixes

  • export --live no longer buffers the last batch of imported events, and instead immediately returns all imported events. #4203

  • context inspect no longer crashes when encountering contexts that contain multi-schema data. #4236

  • Pipelines configured as code no longer always restart with the node. Instead, just like for other pipelines, they only restart when they were running before the node shut down. #4247

v4.14.0

Changes

  • The slice operator now expects its arguments in the form <begin>:<end>, where either the begin or the end value may be omitted. For example, slice 10: returns all but the first 10 events, slice 10:20 returns events 10 to 20 (exclusive), and slice :-10 returns all but the last 10 events. #4211
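The begin:end notation reads like Python's list slicing. Assuming Tenzir resolves omitted and negative bounds the same way (the three examples above are consistent with that), the following Python lines reproduce those results on a list of event indices:

```python
# Stand-in for a stream of 30 events, numbered 0 through 29.
events = list(range(30))

all_but_first_10 = events[10:]   # like: slice 10:
events_10_to_20 = events[10:20]  # like: slice 10:20 (end is exclusive)
all_but_last_10 = events[:-10]   # like: slice :-10

print(all_but_first_10[0], len(all_but_first_10))  # 10 20
print(events_10_to_20[0], events_10_to_20[-1])     # 10 19
print(all_but_last_10[-1], len(all_but_last_10))   # 19 20
```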

Features

  • The new mean aggregation function computes the mean of grouped numeric values. #4208

  • The new approximate_median aggregation function computes an approximate median of grouped numeric values using the t-digest algorithm. #4208

  • The new stddev and variance aggregation functions compute the standard deviation and variance of grouped numeric values, respectively. #4208

  • The new collect aggregation function collects a list of all non-null grouped values. Unlike distinct, this function does not remove duplicates and the results may appear in any order. #4208

  • The summarize operator gained two new options, timeout and update-timeout, which enable streaming aggregations. They specify the maximum time a bucket in the operator may exist, measured from the arrival of the first and the last event in the bucket, respectively. The timeout guarantees that events are held back no longer than the specified duration. The update-timeout finishes a bucket once no new event has arrived for it within the specified duration, so that results for bursty inputs become visible sooner. #4209

  • The slice operator now supports strides in the form of slice <begin>:<end>:<stride>. Negative strides reverse the event order. The new reverse operator is a short form of slice ::-1 and reverses the event order. #4216
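A minimal sketch of the bucket expiry rule behind the summarize timeout and update-timeout options described above, written in Python with hypothetical names (Bucket, add_event, flush_expired). It assumes a bucket is emitted as soon as either duration has fully elapsed; the real operator's boundary handling and scheduling may differ.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    first_arrival: float  # when the first event entered the bucket
    last_arrival: float   # when the most recent event entered the bucket
    count: int = 0        # stand-in for an arbitrary aggregation


def add_event(buckets: dict, key: str, now: float) -> None:
    """Sort an event into its bucket, creating the bucket on first use."""
    bucket = buckets.get(key)
    if bucket is None:
        bucket = Bucket(first_arrival=now, last_arrival=now)
        buckets[key] = bucket
    bucket.last_arrival = now
    bucket.count += 1


def flush_expired(buckets: dict, now: float, timeout: float, update_timeout: float) -> dict:
    """Remove and return every bucket whose timeout or update-timeout elapsed."""
    expired = {
        key: bucket
        for key, bucket in buckets.items()
        if now - bucket.first_arrival >= timeout
        or now - bucket.last_arrival >= update_timeout
    }
    for key in expired:
        del buckets[key]
    return expired


buckets = {}
add_event(buckets, "a", now=0.0)
add_event(buckets, "a", now=4.0)
add_event(buckets, "b", now=5.0)
# At t=7, bucket "a" has been idle for 3s and hits the update-timeout,
# while bucket "b" has only been idle for 2s.
done = flush_expired(buckets, now=7.0, timeout=60.0, update_timeout=3.0)
print(sorted(done), sorted(buckets))  # ['a'] ['b']
```

The timeout bounds how long any bucket can live, while the update-timeout closes a bucket early once its burst of events has ended.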

Bug Fixes

  • The s3 and gs connectors no longer break paths when loading or saving. #4222

  • The syslog parser incorrectly identified a message without hostname and tag as one with hostname and no tag. This resulted in a hostname with a trailing colon, e.g., zscaler-nss:. In such messages, the parser now correctly sets the hostname to null and assigns zscaler-nss as tag/app, without the trailing colon. #4225

v4.13.1

Bug Fixes

  • The slice operator no longer waits for all input to arrive when used with a positive begin and a negative (or missing) end value, which rendered it unusable with infinite inputs. Instead, the operator now yields results earlier. #4210

  • The amqp connector now correctly signals more errors, for example, those caused by networking issues. This enables pipelines using this connector to trigger their retry behavior. #4212 @satta

  • The node's CPU usage increased ever so slightly with every persisted partition, eventually causing imports and exports to get stuck. This no longer happens. #4214

  • The enrich, drop, extend, replace, and deduplicate operators failed for empty input events. This no longer happens. #4215

v4.13.0

Changes

  • The --clear parameter for clearing lookup table contexts during an update no longer exists. It has been superseded by the more robust context reset operator. #4179

  • The deprecated matcher plugin no longer exists. Use the superior lookup operator and contexts instead. #4187

  • The deprecated tenzir-ctl import and tenzir-ctl export commands no longer exist. They have been fully superseded by pipelines in the form … | import and export | …, respectively. #4187

Features

  • The geoip context now supports loading in a MaxMind database with context load <ctx>. For example, load s3://my-bucket/file.mmdb | context load my-ctx makes the GeoIP context use a remotely stored database. #4158

  • The json parser has a new --precise flag, which ensures that the layout of the emitted events precisely matches the input. For example, it guarantees that no additional null fields will be added. This mode is implicitly enabled when using read gelf. #4169

  • The new leef parser supports parsing Log Event Extended Format (LEEF) version 1.0 and 2.0 events, e.g., LEEF:1.0|Microsoft|MSExchange|4.0 SP1|15345|src=192.0.2.0\tdst=172.50.123.1. #4178

  • The cron "<cron expression>" operator modifier executes an operator on a schedule. For example, cron "* */10 * * * MON-FRI" from https://example.org/api queries an endpoint on every 10th minute, Monday through Friday. #4192
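To show what the leef parser has to split apart, here is a simplified Python sketch for the LEEF 1.0 example above. It is not Tenzir's parser: parse_leef_v1 and its output field names are hypothetical, and it assumes tab-separated attributes and ignores LEEF 2.0's custom delimiters and any escaping.

```python
def parse_leef_v1(line: str) -> dict:
    """Split a LEEF 1.0 event into its header fields and attributes.

    Simplifications: only LEEF 1.0, tab-separated attributes, and no
    handling of escaped characters.
    """
    if not line.startswith("LEEF:1.0|"):
        raise ValueError("not a LEEF 1.0 event")
    # Five pipe-separated header fields precede the attribute section;
    # maxsplit=5 keeps any '|' inside the attributes intact.
    version, vendor, product, product_version, event_id, rest = line.split("|", 5)
    attributes = {}
    for pair in rest.split("\t"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return {
        "leef_version": version[len("LEEF:"):],
        "vendor": vendor,
        "product_name": product,
        "product_version": product_version,
        "event_id": event_id,
        "attributes": attributes,
    }


event = parse_leef_v1(
    "LEEF:1.0|Microsoft|MSExchange|4.0 SP1|15345|src=192.0.2.0\tdst=172.50.123.1"
)
print(event["vendor"], event["event_id"])  # Microsoft 15345
print(event["attributes"])  # {'src': '192.0.2.0', 'dst': '172.50.123.1'}
```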

Bug Fixes

  • The syslog parser no longer crops messages at unprintable characters, such as tab (\t). #4178

  • The syslog parser no longer eagerly attempts to grab an application name from the content, fixing issues when combined with CEF and LEEF. #4178

  • Some pipelines did not restart on failure. The retry mechanism now works for all kinds of failures. #4184

  • Pipelines that are configured to automatically restart on failure can now be stopped explicitly. Stopping a failed pipeline now always changes its state to the stopped state. #4184

  • Startup failures caused by invalid pipelines or contexts deployed as code in the configuration file sometimes caused the node to hang instead of shutting down with an error message. The node now shuts down as expected when this happens. #4187

  • A permission error caused the python operator to fail when it was previously used by another system user with the same set of requirements. There is now one Python environment per user and set of requirements. #4189

  • The CSV, TSV, and SSV printers no longer erroneously print the header multiple times when more than one batch of events arrives. #4195

v4.12.2

Features

  • The chart operator now accepts the flags --x-axis-type and --y-axis-type for bar, line, and area charts, with the possible values being log and linear, with linear as the default value. Setting these flags defines the scale (logarithmic or linear) on the Tenzir App chart visualization. #4147

Bug Fixes

  • The python operator now checks for syntax errors on operator startup. #4139

  • Transformations or sinks used with the every operator modifier did not shut down correctly when exhausting their input. This now works as expected. #4166

  • We fixed a bug that prevented restarts of pipelines containing a listening connector under specific circumstances. #4170

  • The retry delay now works for pipelines that fail during startup. #4171

  • The chart operator failed to render a chart when the y-axis was not specified explicitly and the events contained more than two top-level fields. This no longer happens. #4173

  • We accidentally removed the implicit read json from from tcp in Tenzir v4.12. The short form now works as expected again. #4175

v4.12.1

Bug Fixes

  • We fixed a misconfiguration that caused the publish and subscribe operators not to be available in the statically linked Linux builds. #4149

  • We fixed a crash on startup when selectively enabling or disabling plugins when at least two plugins with dependent plugins were disabled. #4149

v4.12.0

Changes

  • Lines of input containing an invalid syslog message are now assumed to be a continuation of a message on a previous line, if there is one. #4080

  • The feather format now reads and writes Arrow IPC streams in addition to Feather files, and no longer requires random access to a file to function, making the format streamable with both read feather and write feather. #4089

  • The tenzir-ctl count <expr> command no longer exists. It has long been deprecated and superseded by pipelines of the form export | where <expr> | summarize count(.). #4103

  • The deprecated tenzir-ctl status command and the corresponding /status endpoint no longer exist. They have been superseded by the show and metrics operators that provide more detailed insight. #4103

  • The deprecated tenzir.aging-frequency and tenzir.aging-query options no longer exist. We recommend using the compaction or disk monitor mechanisms instead to delete persisted events. #4103

  • The show pipelines operator and /pipeline/list endpoint no longer include pipeline metrics. We recommend using the metrics operator instead, which offers the same data in a more flexible way. #4114

  • The parquet format more efficiently reads and writes Parquet files. The format is streamable for write parquet. #4116

  • The 0mq connector no longer automatically monitors TCP sockets to wait until at least one remote peer is present. Explicitly pass --monitor for this behavior. #4117

  • In the chart operator, unless otherwise specified, every field but the first one is taken to be a value for the Y-axis, instead of just the second one. #4119

  • If the value for -x/--name or -y/--value is explicitly specified, the other one must now be too. #4119

  • The --title option is removed from chart. Titles can instead be set directly in the web interface. #4119

  • The context create, context reset, context update, and context load operators no longer return information about the context. Pipelines ending with these operators are now considered closed, and you will be asked to deploy them in the Explorer. Previously, users commonly added discard after these operators to force this behavior. #4143

Features

  • The new udp connector comes with a loader and saver to read bytes from and write bytes to a UDP socket. #4067

  • The deduplicate operator allows removing duplicate events based on specific fields. #4068

  • The unroll operator transforms an event that contains a list into a sequence of events where each output event contains one of the list elements. #4078

  • The bitz format resembles Tenzir's internal wire format. It enables lossless and quick transfer of events between Tenzir nodes through any connector. #4079

  • Syslog messages spanning multiple lines are now supported. #4080

  • The batch operator gained a new --timeout <duration> option that controls the maximum latency for withholding events for batching. #4095

  • Stopping a failed pipeline now moves it into the stopped state in the app and through the /pipeline/update API, stopping automatic restarts on failure. #4108

  • Pipelines now restart on failure at most every minute. The new API parameter retry_delay is available in the /pipeline/create, /pipeline/launch, and /pipeline/update APIs to customize this value. For configured pipelines, the new restart-on-error option supersedes the previous autostart.failed option and may be set either to a boolean or to a duration, with the former using the default retry delay and the latter using a custom one. #4108

  • The output of show pipelines and the /pipeline/list API now includes the start time of the pipeline in the field start_time, the newly added retry delay in the field retry_delay, and whether the pipeline is hidden from the overview page on app.tenzir.com in the field hidden. #4108

  • The every <duration> operator modifier now supports all operators, turning blocking operators like tail, sort or summarize into operators that emit events every <duration>. #4109

  • The 0mq connector now supports inproc socket endpoint URLs, allowing you to create arbitrary publish/subscribe topologies within a node. For example, save zmq inproc://foo writes messages to the in-process socket named foo. #4117

  • Some charts supported by the chart operator (bar, line, and area) now have a --position argument, with the possible values of grouped and stacked. #4119

  • You can now define contexts and their creation parameters in the tenzir.contexts section of the configuration file. #4126

  • The show schemas operator lists all unique schemas of events stored at the node. #4131

  • The suricata parser's schema now more accurately reflects Suricata's Eve JSON output, adding many fields that were previously missing. #4133 #4138 @satta
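To make the unroll semantics above concrete, here is a Python sketch over plain dictionaries (unroll is a hypothetical helper, not Tenzir code). It only handles top-level fields and assumes an empty list produces no output events; how Tenzir treats empty or null lists is not covered by the entry above.

```python
def unroll(event: dict, field: str) -> list:
    """Return one copy of event per element of the list stored in field."""
    # {**event, field: element} keeps the original field order and replaces
    # the list with a single element. An empty list yields no events here.
    return [{**event, field: element} for element in event[field]]


for out in unroll({"host": "a", "ports": [22, 80, 443]}, "ports"):
    print(out)
# {'host': 'a', 'ports': 22}
# {'host': 'a', 'ports': 80}
# {'host': 'a', 'ports': 443}
```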

Bug Fixes

  • The schema name of events returned by show contexts sometimes did not match the type of the context. This now works reliably. #4082

  • The tcp connector now supports accepting multiple connections in parallel when used with the from operator, parsing data separately per connection. #4084

  • The python operator no longer discards fields that start with an underscore. #4085

  • The python operator no longer deadlocks when given an empty program. #4086

  • The JSON printer previously printed invalid JSON for inf and nan, which meant that serve could sometimes emit invalid JSON that the platform and app did not handle well. The printer now emits null for these values instead. #4087

  • We fixed a bug in the http saver that prevented sending HTTP PUT requests with an empty request body. #4092

  • Pipelines run with the tenzir binary that connected to a Tenzir Node sometimes did not shut down correctly when the node shut down. This now happens reliably. #4093

  • Nodes now shut down with a non-zero exit code when pipelines configured as part of the tenzir.yaml file fail to start, making such configuration errors easier to spot. #4097

  • Tenzir Docker images no longer expose 5158/tcp by default, as this prevented running multiple containers in the same network or in host mode. #4099

  • Empty records and null values of record type are now correctly unflattened. #4104

  • We fixed a bug that caused the explorer to sometimes show 504 Gateway Timeout errors for pipelines where the first result took over two seconds to arrive. #4123

  • The http saver now correctly sets the Content-Length header value for HTTP POST requests. #4134

  • Lookup tables with more than 1M entries failed to load after the node was restarted. This no longer happens. #4137

  • The enrich operator sometimes stopped working when it encountered an event for which the specified fields did not exist. This no longer happens. #4143
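Python's json module shows the same pitfall as the JSON printer fix above: by default, json.dumps writes the non-standard tokens NaN and Infinity, which strict JSON parsers reject. A minimal workaround in the spirit of that fix, using a hypothetical helper finite_or_none, maps non-finite floats to None (serialized as null) before serializing:

```python
import json
import math


def finite_or_none(value):
    """Recursively replace NaN and +/-infinity with None (serialized as null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: finite_or_none(v) for k, v in value.items()}
    if isinstance(value, list):
        return [finite_or_none(v) for v in value]
    return value


record = {"ratio": float("nan"), "limit": float("inf"), "ok": 1.5}
# json.dumps(record) alone would emit NaN and Infinity; allow_nan=False makes
# json.dumps raise ValueError if any non-finite value slipped through.
print(json.dumps(finite_or_none(record), allow_nan=False))
# {"ratio": null, "limit": null, "ok": 1.5}
```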

v4.11.2

Changes

  • The python operator now requires Python 3.9 (down from Python 3.10) or newer, making it available on more systems. #4073 @satta

Bug Fixes

  • The python operator often failed with a 504 Gateway Timeout error on app.tenzir.com when first run. This no longer happens. #4066

v4.11.0

Changes

  • The enrich and lookup operators now include the metadata in every context object to accommodate the new --replace and --separate options. Previously, the metadata was available once in the output field. #4040

  • The mode field in the enrichments returned from the lookup operator is now lookup.retro, lookup.live, or lookup.snapshot depending on the mode. #4040

  • The bloom-filter context now always returns true or null for the context instead of embedding the result in a record with a single data field. #4040

Features

  • The new sqs connector makes it possible to read from and write to Amazon SQS queues. #3819

  • The new files source lists file information for a given directory. #4035

  • The --replace option for the enrich operator causes the input values to be replaced with their context instead of extending the event with the context, resulting in a leaner output. #4040

  • The --separate option makes the enrich and lookup operators handle each field individually, duplicating the event for each relevant field, and returning at most one context per output event. #4040

  • The --yield <field> option allows for adding only a part of a context with the enrich and lookup operators. For example, with a geoip context with a MaxMind country database, --yield registered_country.iso_code will cause the enrichment to only consist of the country's ISO code. #4040

  • The new email saver allows for sending pipeline data via email by connecting to a mail server via SMTP or SMTPS. #4041

  • The every <interval> operator modifier executes a source operator repeatedly. For example, every 1h from http://foo.com/bar polls an endpoint every hour. #4050

  • The new set operator upserts fields, i.e., acts like replace for existing fields and like extend for new fields. It also supports setting the schema name explicitly via set #schema="new-name". #4057

  • The put operator now supports setting the schema name explicitly via put #schema="new-name". #4057
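The difference between replace, extend, and the new set operator above can be illustrated with plain Python dictionaries. This sketch only models the upsert semantics described in the entry; the helper names are hypothetical, it does not model #schema renaming, and the real replace and extend operators may warn or fail on conflicting fields rather than silently ignoring them.

```python
def replace_fields(event: dict, updates: dict) -> dict:
    """Like replace: overwrite fields that already exist, ignore the rest."""
    return {**event, **{k: v for k, v in updates.items() if k in event}}


def extend_fields(event: dict, updates: dict) -> dict:
    """Like extend: add fields that do not exist yet, ignore the rest."""
    return {**event, **{k: v for k, v in updates.items() if k not in event}}


def set_fields(event: dict, updates: dict) -> dict:
    """Like set: upsert, i.e., overwrite existing fields and add new ones."""
    return {**event, **updates}


event = {"src": "10.0.0.1", "port": 22}
updates = {"port": 2222, "tag": "ssh"}
print(replace_fields(event, updates))  # {'src': '10.0.0.1', 'port': 2222}
print(extend_fields(event, updates))   # {'src': '10.0.0.1', 'port': 22, 'tag': 'ssh'}
print(set_fields(event, updates))      # {'src': '10.0.0.1', 'port': 2222, 'tag': 'ssh'}
```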

Bug Fixes

  • Source operators that do not quit on their own only freed their resources after they had emitted an additional output, even after the pipeline had already exited. This sometimes caused errors when restarting pipelines, and in rare cases caused Tenzir nodes to hang on shutdown. This no longer happens, and the entire pipeline shuts down at once. #3819

  • drop and select silently ignored all but the first match of the specified type extractors and concepts. This no longer happens. For example, drop :time drops all fields with type time from events. #4040

  • Enriching a field in adjacent events in lookup and enrich with a lookup-table context sometimes crashed when the lookup-table referred to values of different types. #4040

  • The geoip context sometimes returned incorrect values. This no longer happens. #4040

  • from <url> now also works when the url specifies username and password. #4043

  • We fixed a bug that caused every second context to become unavailable after restarting the node. #4045

  • The compress and to operators no longer fail when compression is unable to further reduce the size of a batch of bytes. #4048

  • Disk metrics now work correctly for deployments with a customized state directory. #4058

v4.10.4

Bug Fixes

  • The http saver now correctly sets the Content-Length header when issuing HTTP requests. #4031

  • Using context load with large context files no longer causes a crash. #4033

  • The sigma operator crashed for some rules when trying to attach the rule to the matched event. This no longer happens. #4034

  • The code passed to the python operator no longer fails to resolve names when the local and global scope are both used. #4036

v4.10.3

Changes

  • Tenzir nodes no longer attempt reconnecting to app.tenzir.com immediately upon failure, but rather wait before reconnecting. #3997

Bug Fixes

  • The lookup operator no longer tries to match internal metrics and diagnostics events. #4028

  • The lookup operator no longer returns events for which none of the provided fields exist. #4028

v4.10.1

Bug Fixes

  • When upgrading from a previous version to Tenzir v4.10 and using configured pipelines for the first time, the node sometimes crashed on startup. This no longer happens. #4020

v4.10.0

Changes

  • We've replaced the tenzir.allow-unsafe-pipelines option with the tenzir.no-location-overrides option with an inverted default. The new option is a less confusing default for new users and more accurately describes what the option does, namely preventing operator locations from being overridden. #3978

  • Nodes now collect CPU, disk, memory, and process metrics every second instead of every ten seconds, improving the usability of metrics with the chart operator. Memory metrics now work as expected on macOS. #3982

Features

  • The enrich and lookup operators now support type extractors, concepts, and comma-separated lists of fields as arguments to --field. #3968

  • The tenzir/tenzir and tenzir/tenzir-node Docker images now run natively on arm64 in addition to amd64. #3989

  • The where operator now supports using and, or, and not as alternatives to &&, ||, and ! in expressions. #3993

  • S3 access and secret keys can now be specified in the S3 plugin's configuration file. #4001

  • We made it possible to set pipelines declaratively in the tenzir.yaml configuration file. #4006

Bug Fixes

  • The top and rare operators now correctly count null and absent values. Previously, they emitted a single event with a count of zero when any null or absent values were included in the input. #3990

  • Tenzir nodes sometimes failed when trying to canonicalize file system paths before opening them when the disk-monitor or compaction rotated them out. This is now handled gracefully. #3994

  • We fixed a problem with the TCP connector that caused pipeline restarts on the same port to fail if running shell or python operators were present. #3998

  • The python operator now works when using the remote location override. #3999

  • The S3 connector no longer ignores the default credentials provider for the current user when any arguments are specified in the URI explicitly. #4001

  • The sigma operator sometimes crashed when pointed to a non-existent file or directory. This no longer happens. #4010

  • Parsing an invalid syslog message (using the schema syslog.unknown) no longer causes a crash. #4012

v4.9.0

Changes

  • Plugins may now depend on other plugins. Plugins with unmet dependencies are automatically disabled. For example, the lookup and enrich plugins now depend on the context plugin. Run show plugins to see all available plugins and their dependencies. #3877

  • The option tenzir.db-directory is deprecated in favor of the tenzir.state-directory option and will be removed in the future. #3889

  • We removed the tenzir-ctl start subcommand. Users should switch to the tenzir-node command instead, which accepts the same arguments and presents the same command-line interface. #3899

  • The binary format that contexts use for saving to disk on node shutdown is now versioned. A node can load multiple different versions and automatically migrate between them. #3945

  • Color escape codes are no longer emitted if NO_COLOR is set to a non-empty value, or when the output device is not a terminal. #3952

Features

  • The new bloom-filter context represents large sets in a space-efficient manner. #3834

  • The lines printer enables simple line-delimited formatting of events. #3847

  • The chart operator adds metadata to the schema of the input events, enabling rendering events as bar, area, line, or pie charts on app.tenzir.com. #3866

  • show pipelines and the /pipeline API endpoints now include created_at and last_modified fields that track the pipeline's creation and last manual modification time, respectively. Pipelines created with older versions of Tenzir will use the start time of the node as their creation time. #3869

  • The structured_data field in RFC 5424-style syslog messages is now parsed and included in the output. #3871

  • Managed pipelines now contain a new total_runs parameter that counts all started runs. The new run field is available in the events delivered by the metrics and diagnostics operators. #3883

  • The new context inspect <context-name> command dumps a specific context's user-provided data, usually the context's content. #3893

  • The openapi source operator generates Tenzir's OpenAPI specification. Use openapi | to ./openapi.yaml to generate a file with the canonical format. #3898

  • The --selector option of the json parser now works with nested fields and integer fields. #3900

  • The python operator gained a new --file flag that allows loading Python code from a file instead of providing it as part of the pipeline definition. #3901

  • The context reset operator allows for clearing the state of a context. #3908

  • The context save and context load operators allow serializing and deserializing the state of a context to/from bytes. #3908

  • The export operator gained a --low-priority option, which causes it to interfere less with regular priority exports at the cost of potentially running slower. #3909

  • The context match events now contain a new field mode that states the lookup mode of this particular match. #3920

  • The enrich operator gained a --filter option, which causes it to exclude enriched events that do not contain a context. #3920

  • When specifying a schema with a field typed as time #unit=<unit>, numeric values will be interpreted as offsets from the epoch. #3927

  • Operator metrics now separately track the time that an operator was paused or running in the time_paused and time_running values in addition to the wall-clock time in time_total. Throughput rates now exclude the paused time from their calculation. #3940
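The bloom-filter context listed above is based on a probabilistic set representation. For readers unfamiliar with the data structure, here is a minimal, generic Python sketch of a Bloom filter showing why it is space-efficient: membership queries can return false positives but never false negatives. This is not Tenzir's implementation; the class name, the sizing parameters (capacity and false-positive rate), and the double-hashing scheme over SHA-256 are assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Generic Bloom filter sketch: a bit array plus k derived hash functions.

    Illustrative only; not how Tenzir's bloom-filter context is implemented.
    """

    def __init__(self, capacity: int, fp_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.m = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions via double hashing over one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every derived bit is set; may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(capacity=1000, fp_rate=0.01)
bf.add("10.0.0.1")
print("10.0.0.1" in bf)
```

With 1,000 expected items and a 1% false-positive target, the filter needs about 9,586 bits (roughly 1.2 KiB) regardless of how long the stored strings are, which is the space saving the entry refers to.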

Bug Fixes

  • The xsv parser (and by extension the csv, tsv, and ssv parsers) previously skipped lines that had a mismatch between the number of values contained and the number of fields defined in the header. It now fills in null values for missing values and, if the new --auto-expand option is set, also adds new header fields for excess values. #3874

  • The /serve API sometimes returned an empty string for the next continuation token instead of null when there are no further results to fetch. It now consistently returns null. #3885

  • Commas are now allowed as subsecond separators in timestamps in TQL. Previously, only dots were allowed, but ISO 8601 allows for both. #3903

  • We fixed a bug that under rare circumstances led to an indefinite hang when using a high-volume source followed by a slow transformation and a fast sink. #3909

  • Retroactive lookups will now properly terminate when they have finished. #3910

  • We fixed a rare deadlock by changing the internal logger behavior from blocking until the oldest messages were consumed to overwriting them. #3911

  • Invalid schema definitions, where a record contains the same key multiple times, are now detected and rejected. #3929

  • The option to automatically restart on failure did not trigger for pipelines that failed because an operator emitted an error diagnostic, a mechanism for improved error messages introduced in Tenzir v4.8. Such pipelines now restart automatically as expected. #3947
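The xsv row-alignment fix described above (null-filling missing values and optionally expanding the header for excess ones) can be sketched in Python as follows. This is an illustration of the described behavior, not Tenzir's parser: the function name `align_row` and the `unnamedN` names for added header fields are hypothetical, and dropping excess values when auto-expansion is off is an assumption, since the entry does not say what happens in that case.

```python
def align_row(header, values, auto_expand=False):
    """Map one xsv row onto the header fields (illustrative sketch).

    Missing trailing values become None (null). Excess values are dropped
    (an assumption), unless auto_expand is set, in which case extra header
    fields are added.
    """
    header = list(header)
    if len(values) > len(header):
        if auto_expand:
            extra = len(values) - len(header)
            # Hypothetical naming scheme for the added header fields.
            header += [f"unnamed{n}" for n in range(1, extra + 1)]
        else:
            values = values[: len(header)]
    # Pad short rows with None so every header field gets a value.
    padded = list(values) + [None] * (len(header) - len(values))
    return header, dict(zip(header, padded))


print(align_row(["a", "b", "c"], ["1", "2"]))
print(align_row(["a", "b"], ["1", "2", "3"], auto_expand=True))
```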

v4.8.2

Bug Fixes

  • The unflatten operator no longer ignores fields that begin or end with the separator. #3814

  • Some idle source operators and loaders, e.g., from tcp://localhost:3000 where no data arrives via TCP, consumed excessive amounts of CPU. This no longer happens. #3865

v4.8.1

Features

  • The velociraptor operator gained a new --profile <profile> option to support multiple configured Velociraptor instances. To opt into using profiles, move your Velociraptor configuration in <configdir>/tenzir/plugin/velociraptor.yaml from <config> to profiles.<profile>.<config>. #3848

Bug Fixes

  • The amqp connector plugin was incorrectly packaged and unavailable in some build configurations. The plugin is now available in all builds. #3854

  • Failing to create the virtualenv of the python operator caused subsequent uses of the python operator to silently fail. This no longer happens. #3854

  • The Debian package now depends on python3-venv, which is required for the python operator to create its virtualenv. #3854

v4.8.0

Changes

  • The fluent-bit source operator no longer performs JSON conversion from Fluent Bit prior to processing an event. Instead, it directly processes the MsgPack data that Fluent Bit uses internally for more robust and quicker event delivery. #3770

Features

  • The http and https loaders now also have savers to send data from a pipeline to a remote API. #3539

  • The http and https connectors have a new flag --form to submit the request body URL-encoded. This also changes the Content-Type header to application/x-www-form-urlencoded. #3539

  • The new timeshift operator adjusts timestamps relative to a given start time, with an optional speedup. #3701

  • The new delay operator delays events relative to a given start time, with an optional speedup. #3701

  • The new lookup operator performs live filtering of the import feed using a context, and translates context updates into historical queries. This effectively enables live and retro matching in a single operator. #3721

  • A Tenzir node now automatically collects and stores metrics about disk, CPU, and memory usage of the host machine. #3736

  • The time parser allows parsing datetimes and timestamps from arbitrary strings using a strptime-like format string. #3738

  • The new gelf parser reads a stream of NULL-byte terminated messages in Graylog Extended Log Format (GELF). #3768

  • The csv, tsv, ssv and xsv parsers now support setting the header line manually with the --header option. #3778

  • On Linux systems, the process metrics now have an additional value open_fds showing the number of file descriptors opened by the node. #3784

  • Pipeline states in the /pipeline API no longer change upon node shutdown. When a node restarts afterwards, previously running pipelines continue to run, while paused pipelines load in a stopped state. #3785

  • The metrics operator returns internal metrics events generated in a Tenzir node. Use metrics --live to get a feed of metrics as they are being generated. #3790

  • Concepts are now supported in more places than just the where operator: All operators and concepts that reference fields in events now support them transparently. For example, it is now possible to enrich with a lookup table against all source IP addresses defined in the concept net.src.ip, or to group by destination ports across different schemas with the concept net.dst.port. #3812

  • The csv, tsv, ssv and xsv printers now support not printing a header line with the --no-header option. #3821

  • The new diagnostics operator provides information about diagnostics that a pipeline may encounter during its lifetime. #3828

  • The RFC 3164 syslog parser now supports years in the message timestamp. #3833
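The gelf parser entry above describes a stream of NULL-byte terminated messages, which is how GELF is framed over TCP. The following Python sketch illustrates only that framing step, splitting a byte stream on NULL bytes and decoding each frame as a JSON object. It is not Tenzir's parser; the function name is hypothetical, and treating an unterminated trailing fragment as incomplete (and skipping it) is an assumption of this sketch.

```python
import json


def split_gelf_stream(stream: bytes) -> list:
    """Split NULL-byte terminated GELF messages and decode each as JSON.

    Illustrative sketch: a trailing fragment without a terminating NULL
    byte is treated as incomplete and ignored.
    """
    *complete, _incomplete = stream.split(b"\0")
    return [json.loads(frame) for frame in complete if frame]


data = (
    b'{"version":"1.1","host":"example.org","short_message":"hello"}\0'
    b'{"version":"1.1","host":"example.org","short_message":"world"}\0'
)
for message in split_gelf_stream(data):
    print(message["short_message"])
```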

Bug Fixes

  • The tenzir/tenzir:latest-slim Docker image now sets a default TENZIR_CACHE_DIRECTORY automatically. #3764

  • When reading Base64-encoded JSON strings with the blob type, = padding is now accepted. #3765

  • The /serve API now displays why a pipeline became unavailable in an error case instead of showing a generic error message. This causes runtime errors in pipelines to show up in the Explorer on app.tenzir.com. #3788

  • export --live sometimes got stuck, failing to deliver events. This no longer happens. #3790

  • The /pipeline/launch endpoint now optimizes the pipeline before starting it. #3801

  • Updating entries of a lookup-table context now overrides values with duplicate keys instead of ignoring them. #3808

  • The zeek-tsv printer incorrectly emitted metadata too frequently. It now only writes opening and closing tags when it encounters a new schema. #3836

  • Failing transfers using the http(s) and ftp(s) connectors now properly return an error when the transfer breaks. For example, from http://does.not.exist no longer silently returns success. #3842

v4.7.1

Bug Fixes

  • We fixed a bug that caused increased memory usage for pipelines with slow operators immediately after a faster operator. #3758

  • We fixed a bug that caused short-running pipelines to sometimes hang. #3758

v4.7.0

Changes

  • The show operator now always connects to and runs at a node. Consequently, the version and nics aspects moved into operators of their own. #3521

  • The events created by the RFC 3164 syslog parser no longer have a tag field, but instead have app_name and process_id fields. #3692

  • Records can now have fields where the name is empty. #3742

Features

  • With the new processes and sockets source operators, you can now get a snapshot of the operating system processes and sockets as pipeline input. #3521

  • The kv parser splits strings into key-value pairs. #3646

  • show partitions now contains the location and size of the store, index, and sketch files of a partition, as well as their aggregate size in diskusage. #3675

  • The grok parser, for use with the parse operator, enables powerful regex-based string dissection. #3683

  • The syslog parser now supports macOS-style syslog messages. #3692

  • The slice operator keeps a range of events within a half-closed interval. Begin and end of the interval can be specified relative to the first or last event. #3703

  • show operators now shows user-defined operators in addition to operators that ship with Tenzir or as plugins. #3723

  • The tcp connector is now also a saver in addition to a loader. #3727

  • The new geoip context is a built-in that reads MaxMind DB files and uses IP values in events to enrich them with the MaxMind DB geolocation data. #3731
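The slice operator described above keeps events in a half-open interval [begin, end), where each bound may count from the first or the last event. Python's own slice semantics happen to match that description, so the sketch below reuses them. It is an illustration, not Tenzir's implementation; the function name is hypothetical. The design point it shows is that non-negative bounds can be applied while streaming, whereas a negative bound requires knowing where the input ends and therefore forces buffering.

```python
from itertools import islice


def slice_events(events, begin=None, end=None):
    """Return the events in the half-open interval [begin, end).

    Non-negative bounds count from the first event; negative bounds count
    back from the last event, mirroring Python's slice semantics.
    """
    begin = 0 if begin is None else begin
    if begin >= 0 and (end is None or end >= 0):
        # Both bounds are relative to the start: stream without buffering.
        return list(islice(events, begin, end))
    # A negative bound depends on where the input ends, so buffer everything.
    return list(events)[begin:end]


print(slice_events(range(10), 2, 5))
print(slice_events(range(10), -3))
```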

Bug Fixes

  • Pipeline operators blocking in their execution sometimes caused results to be delayed. This is no longer the case. This bug fix also reduces the time to first result for pipelines with many operators. #3743

v4.6.4

Changes

  • When selecting default paths, the tenzir-node will now respect the systemd-provided variables STATE_DIRECTORY, CACHE_DIRECTORY and LOGS_DIRECTORY before falling back to $PWD/tenzir.db. #3714

Features

  • The tenzir.metrics.operator metric now contains an additional duration field with the timespan over which the metric was collected. #3713

Bug Fixes

  • The RFC 3164 syslog parser no longer requires a whitespace after the PRI-field (part in angle brackets in the beginning of a message). #3718

  • The yaml printer no longer crashes when receiving enums. #3719

v4.6.3

Bug Fixes

  • The Debian package sometimes failed to install, and the bundled systemd unit failed to start with Tenzir v4.6.2. This issue no longer exists. #3705

v4.6.0

Changes

  • Ingress and egress metrics for pipelines now indicate whether the pipeline sent/received events to/from outside of the node with a new internal flag. For example, when using the export operator, data is entering the pipeline from within the node, so its ingress is considered internal. #3658

  • We renamed our Python package from pytenzir to tenzir. #3660

  • We renamed the --bind option of the zmq connector to --listen. #3664

Features

  • The python operator adds the ability to perform arbitrary event to event transformations with the full power of Python 3. #3592

  • The operators from, to, load, and save support using URLs and file paths directly as their argument. For example, load https://example.com means load https https://example.com, and save local-file.json means save file local-file.json. #3608

  • The new --internal flag for the export operator returns internal events collected by the system, for example pipeline metrics. #3619

  • The syslog parser allows reading both RFC 5424 and RFC 3164 syslog messages. #3645

  • Use show without an aspect to return information about all aspects of a node. #3650

  • The new yield operator extracts nested records with the ability to unfold lists. #3651

  • When using from <URL> and to <URL> without specifying the format explicitly using a read/write argument, the default format is determined by the file extension for all loaders and savers, if possible. Previously, that was only done when using the file loader/saver. Additionally, if the file name indicates some sort of compression (e.g., .gz), compression and decompression are performed automatically. For example, from https://example.com/myfile.yml.gz is expanded to load https://example.com/myfile.yml.gz | decompress gzip | read yaml automatically. #3653

  • We added a new tcp connector that allows reading raw bytes from TCP or TLS connections. #3664

  • The new, experimental parse operator applies a parser to the string stored in a given field. #3665

  • We optimized the behavior of the 'serve' operator to respond quicker and cause less system load for pipelines that take a long time to generate the first result. The new min_events parameter can be used to implement long-polling behavior for clients of /serve. #3666

  • The new apply operator includes pipelines defined in other files. #3677

  • Use --allow-comments with the xsv parser (incl. csv, tsv, and ssv) to treat lines beginning with '#' as comments. #3681

  • The closed-source context plugin offers backend functionality for finding matches between data sets. #3684

  • The new lookup-table built-in is a hashtable-based contextualization algorithm that enriches events based on a unique value. #3684

  • The JSON format has a new --arrays-of-objects parameter that allows for parsing a JSON array of JSON objects into an event for each object. #3684
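The extension-based expansion of from <URL> described above can be pictured as a small string rewrite. The Python sketch below reproduces the changelog's own example (myfile.yml.gz becoming load | decompress gzip | read yaml). It is not how Tenzir implements the feature: the function name and both extension tables are hypothetical and deliberately incomplete, and in this sketch an unknown format extension simply yields no read stage.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical, incomplete extension tables for illustration only.
COMPRESSION = {".gz": "gzip", ".bz2": "bz2", ".zst": "zstd", ".lz4": "lz4", ".br": "brotli"}
FORMATS = {".json": "json", ".yml": "yaml", ".yaml": "yaml", ".csv": "csv"}


def expand_from(url: str) -> str:
    """Expand `from <URL>` into an explicit load/decompress/read pipeline."""
    suffixes = PurePosixPath(urlparse(url).path).suffixes  # e.g. ['.yml', '.gz']
    stages = [f"load {url}"]
    if suffixes and suffixes[-1] in COMPRESSION:
        # The outermost extension names the compression codec.
        codec = COMPRESSION[suffixes.pop()]
        stages.append(f"decompress {codec}")
    if suffixes and suffixes[-1] in FORMATS:
        # The next extension names the data format.
        stages.append(f"read {FORMATS[suffixes[-1]]}")
    return " | ".join(stages)


print(expand_from("https://example.com/myfile.yml.gz"))
```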

Bug Fixes

  • export --live now respects a subsequent where <expr> instead of silently discarding the filter expression. #3619

  • Using the sort operator with polymorphic inputs no longer leads to a failing assertion under some circumstances. #3655

  • The csv, ssv, and tsv parsers now correctly support empty strings, lists, and null values. #3687

  • The tail operator no longer hangs occasionally. #3687

v4.5.0

Changes

  • The operators drop, pseudonymize, put, extend, replace, rename and select were converted from suffix matching to prefix matching and can therefore address records now. #3616

  • Sparse indexes for time and bool fields are now always enabled, accelerating lookups against them. #3639

Features

  • The api source operator interacts with Tenzir's REST API without needing to spin up a web server, making all APIs accessible from within pipelines. #3630

  • In where <expression>, the types of numeric literals and numeric fields in an equality or relational comparison no longer need to match exactly. The literals +42, 42 or 42.0 now compare against fields of types int64, uint64, and double as expected. #3634

  • The import operator now flushes events to disk automatically before returning, ensuring that they are available immediately for subsequent uses of the export operator. #3638

  • Lookups against uint64, int64, double, and duration fields now always use sparse indexes, which improves the performance of export | where <expression> for some expressions. #3639

  • If the summarize operator has no by clause, it now returns a result even if there is no input. For example, summarize num=count(.) returns an event with {"num": 0}. Aggregation functions which do not have a single default value, for example because it would depend on the input type, return null. #3640

  • The tenzir.disable-plugins option is a list of names of plugins and builtins to explicitly forbid from being used in Tenzir. For example, adding shell will prohibit use of the shell operator builtin, and adding kafka will prohibit use of the kafka connector plugin. This allows for a more fine-grained control than the tenzir.allow-unsafe-pipelines option. #3642

Bug Fixes

  • The long option --append for the file and directory savers now works as documented. Previously, only the short option worked correctly. #3629

  • The exporter.* metrics are now emitted even if the exporter finishes early. #3633

v4.4.0

Changes

  • The string type is now restricted to valid UTF-8 strings. Use blob for arbitrary binary data. #3581

  • The new autostart and autodelete parameters for the pipeline manager supersede the start_when_created and restart_with_node parameters and extend restarting and deletion possibilities for pipelines. #3585

Features

  • The new amqp connector enables interaction with an AMQP 0-9-1 exchange, supporting working with messages as producer (saver) and consumer (loader). #3546

  • The new completed pipeline state in the pipeline manager shows when a pipeline has finished execution. #3554

  • If the node with running pipelines crashes, they will be marked as failed upon restarting. #3554

  • The new velociraptor source supports submitting VQL queries to a Velociraptor server. The operator communicates with the server via gRPC using a mutually authenticated and encrypted connection with client certificates. For example, velociraptor -q "select * from pslist()" lists processes and their running binaries. #3556

  • The output of show partitions includes a new events field that shows the number of events kept in that partition. E.g., the pipeline show partitions | summarize events=sum(events) by schema shows the number of events per schema stored at the node. #3580

  • The new blob type can be used to represent arbitrary binary data. #3581

  • The new ttl_expires_in_ns field shows the remaining time to live for a pipeline in the pipeline manager. #3585

  • The new yara operator matches Yara rules on byte streams, producing structured events when rules match. #3594

  • show serves displays all currently active serve IDs in the /serve API endpoint, showing an overview of active pipelines with an on-demand API. #3596

  • The export operator now has a --live option to continuously emit events as they are imported instead of those that already reside in the database. #3612

Bug Fixes

  • Pipelines ending with the serve operator no longer incorrectly exit 60 seconds after transferring all events to the /serve endpoint, but rather wait until all events were fetched from the endpoint. #3562

  • Shutting down a node immediately after starting it no longer waits for all partitions to be loaded. #3562

  • When using read json, incomplete objects (e.g., due to truncated files) are now reported as an error instead of being silently discarded. #3570

  • Having duplicate field names in zeek-tsv data no longer causes a crash, but rather errors out gracefully. #3578

  • The csv parser (or more generally, the xsv parser) now attempts to parse fields in order to infer their types. #3582

  • A regression in Tenzir v4.3 caused exports to often consider all partitions as candidates. Pipelines of the form export | where <expr> now work as expected again and only load relevant partitions from disk. #3599

  • The long option --skip-empty for read lines now works as documented. #3599

  • The zeek-tsv parser is now able to handle fields of type subnet correctly. #3606

v4.3.0

Changes

  • We made it easier to reuse the default zmq socket endpoint by disabling socket lingering, and thereby immediately relinquishing resources when terminating a ZeroMQ pipeline. Changing the linger period from infinite to 0 no longer buffers pending messages in memory after closing a ZeroMQ socket. #3536

  • Tenzir no longer builds dense indexes for imported events. Dense indexes improved query performance at the cost of a higher memory usage. However, over time the performance improvement became smaller due to other improvements in the underlying storage engine. #3552

  • Tenzir no longer supports models in taxonomies. Since Tenzir v4.0 they were only supported in the deprecated tenzir-ctl export and tenzir-ctl count commands. We plan to bring the functionality back in the future with more powerful expressions in TQL. #3552

Features

  • The yaml format supports reading and writing YAML documents and streams. #3456

  • The new fluent-bit source and sink operators provide an interface to the Fluent Bit ecosystem. The source operator maps to a Fluent Bit input and the sink operator to a Fluent Bit output. #3461

  • The performance of the json, suricata and zeek-json parsers was improved. #3503

  • The json parser has a new --raw flag, which uses the raw type of JSON values instead of trying to infer one. For example, strings with IP addresses are given the type string instead of ip. #3503

  • A dedicated null type was added. #3503

  • Empty records are now allowed. Operators that previously discarded empty records (for example, drop) now preserve them. #3503

  • The pipeline manager now supports user-provided labels for pipelines. #3541

Bug Fixes

  • The json, suricata and zeek-json parsers are now more stable and should now parse all inputs correctly. #3503

  • null records are no longer incorrectly transformed into records with null fields. #3503

  • The type of the quic.version field in the built-in suricata.quic schema was fixed. It now is a string instead of an integer. #3533

  • The http loader no longer ignores the values of user-provided custom headers. #3535

  • The parquet and feather formats no longer trigger assertions during normal usage. #3537

  • The zeek.software schema no longer contains an incomplete version record type. #3538

  • The version.minor type in the zeek.software schema is now a uint64 instead of a double to comply with Zeek's version structure. #3538

  • The web server no longer crashes when receiving requests during shutdown. #3553

v4.2.0

Changes

  • The long option name --emit-file-header of the pcap parser is now called --emit-file-headers (plural) to streamline it with the nic loader and the new capability to process concatenated PCAP files. #3513

  • The decapsulate operator no longer drops the PCAP packet data in incoming events. #3515

Features

  • The new s3 connector enables the user to import/export file data from/to S3 buckets. #3496

  • The new zmq connector ships with a saver and loader for interacting with ZeroMQ. The loader (source) implements a connecting SUB socket and the saver (sink) a binding PUB socket. The --bind or --connect flags make it possible to control the direction of connection establishment. #3497

  • The new gcs connector enables the user to import/export file data from/to GCS buckets. #3498

  • The new connectors http, https, ftp, and ftps simplify using remote files in pipelines via HTTP(S) and FTP(S). #3499

  • The new lines parser splits its input at newline characters and produces events with a single field containing the line. #3511

  • The pcap parser can now process a stream of concatenated PCAP files. On the command line, you can now parse traces with cat *.pcap | tenzir 'read pcap'. When providing --emit-file-headers, each intermediate file header yields a separate event. #3513

  • The nic loader has a new option --emit-file-headers that prepends a PCAP file header for every batch of bytes that the loader produces, yielding a stream of concatenated PCAP files. #3513

  • You can now write show nics to get a list of network interfaces. Use show nics | select name to get a list of possible interface names for from nic. #3517
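The pcap parser and nic loader entries above rely on the fact that a stream of concatenated PCAP files is just a sequence of 24-byte global file headers, each followed by 16-byte record headers and packet data. The Python sketch below walks such a stream (as produced by, e.g., cat *.pcap) and reports each intermediate file header separately, loosely analogous to --emit-file-headers. It is not Tenzir's parser: the function name is hypothetical, it only handles the classic libpcap format (not pcapng), and it uses the heuristic assumption that a record boundary starting with a PCAP magic number begins a new file. Truncated input is not handled gracefully.

```python
import struct

# Classic libpcap magic numbers, as they appear on disk, mapped to byte order.
MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def read_concatenated_pcap(data: bytes):
    """Yield ('header', linktype) and ('packet', payload) items in stream order."""
    offset, endian = 0, None
    while offset < len(data):
        magic = data[offset:offset + 4]
        if magic in MAGICS:
            # A (possibly intermediate) 24-byte global header; linktype is at byte 20.
            endian = MAGICS[magic]
            (linktype,) = struct.unpack_from(endian + "I", data, offset + 20)
            yield ("header", linktype)
            offset += 24
            continue
        if endian is None:
            raise ValueError("stream does not start with a PCAP file header")
        # 16-byte record header: ts_sec, ts_frac, incl_len, orig_len.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield ("packet", data[offset:offset + incl_len])
        offset += incl_len
```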

Bug Fixes

  • Pipelines now show up in the "stopped" instead of the "created" state after the node restarted. #3487

v4.1.0

Changes

  • The version operator no longer exists. Use show version to get the Tenzir version instead. The additional information that version produced is now available as show build, show dependencies, and show plugins. #3442

Features

  • The new sigma operator filters its input with Sigma rules and outputs matching events alongside the matched rule. #3138

  • The compress [--level <level>] <codec> and decompress <codec> operators enable streaming compression and decompression in pipelines for brotli, bz2, gzip, lz4, and zstd. #3443

  • The show config aspect returns the configuration currently in use, combining options set in the configuration file, on the command line, and via environment variables. #3455

  • The new show pipelines aspect displays a list of all managed pipelines. #3457

  • The pause action in the /pipeline/update endpoint suspends a pipeline and sets its state to paused. Resume it with the start action. #3471

  • Newly created pipelines are now in a new created rather than stopped state. #3471

  • The rendered field in the pipeline manager diagnostics delivers a displayable version of the diagnostic's error message. #3479

  • Pipelines that encounter an error during execution are now in a new failed rather than stopped state. #3479
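The compress and decompress operators above work on streams rather than whole files. As a generic illustration of what streaming compression means, here is a Python sketch using the standard zlib module, limited to gzip. It does not reflect how Tenzir implements these operators; the function names and the default level of 6 are assumptions. Each chunk is fed to an incremental compressor or decompressor, so memory use stays bounded regardless of total input size.

```python
import zlib


def compress_stream(chunks, level=6):
    """Incrementally gzip-compress an iterable of byte chunks."""
    # wbits=31 (16 + 15) selects the gzip container with a 32 KiB window.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer input and emit nothing yet
            yield out
    yield compressor.flush()  # emit remaining data and the gzip trailer


def decompress_stream(chunks):
    """Incrementally decompress an iterable of gzip-compressed byte chunks."""
    decompressor = zlib.decompressobj(31)
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail
```

Because the output is a standard gzip stream, it can be read by any gzip-compatible tool, and the chunk boundaries on the decompression side need not match those used during compression.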

Bug Fixes

  • Pipeline operators that create output independent of their input now emit their output instantly instead of waiting to receive further input. This makes the shell operator more reliable. #3470

  • The show <aspect> operator wrongfully required unsafe pipelines to be allowed for some aspects. This is now fixed. #3470

v4.0.1

Features

  • It is now possible to replace the schema name with replace #schema="new_name". #3451

v4.0.0

Breaking Changes

  • The stop command no longer exists. Shut down VAST nodes using CTRL-C instead. #3166

  • The version command no longer exists. Use the more powerful version pipeline operator instead. #3166

  • The spawn source and spawn sink commands no longer exist. To import data remotely, run a pipeline in the form of remote from … | … | import, and to export data remotely, run a pipeline in the form of export | … | remote to …. #3166

  • The lower-level peer, kill, and send commands no longer exist. #3166

  • The #type meta extractor was renamed to #schema. #3183

  • VAST is now called Tenzir. The tenzir binary replaces vast exec to execute a pipeline. The tenzird binary replaces vast start to start a node. The tenzirctl binary continues to offer all functionality that vast previously offered until all commands have been migrated to pipeline operators. #3187

  • The Debian package for Tenzir replaces previous VAST installations and attempts to migrate existing data from VAST to Tenzir in the process. You can opt-out of this migration by creating the file /var/lib/vast/disable-migration. #3203

  • We removed the rest_endpoint_plugin::prefix() function from the public API of the rest_endpoint_plugin class. For a migration, existing users should prepend the prefix manually to all endpoints defined by their plugin. #3221

  • We changed the default connector of read <format> and write <format> for all formats to stdin and stdout, respectively. #3223

  • We removed language plugins in favor of operator-based integrations. #3223

  • The interface of the operator, loader, parser, printer and saver plugins was changed. #3223

  • The aggregation functions in a summarize operator can now receive only a single extractor instead of multiple ones. #3250

  • The behavior for absent columns and aggregations across multiple schemas was changed. #3250

  • We reimplemented the old pcap plugin as a format. The command tenzir-ctl import pcap no longer works. Instead, the new pcap plugin provides a parser that emits pcap.packet events, as well as a printer that generates a PCAP file when provided with these events. #3263

  • The delete_when_stopped flag was removed from the pipeline manager REST API. #3292

  • We removed the --pretty option from the json printer. This option is now the default. To switch to NDJSON, use -c|--compact-output. #3343

  • The previously deprecated options tenzir.pipelines (replaced with tenzir.operators) and tenzir.pipeline-triggers (no replacement) no longer exist. #3358

  • The previously deprecated types addr, count, int, and real (replaced with ip, uint64, int64, and double, respectively) no longer exist. #3358

  • The parse and print operators have been renamed to read and write, respectively. The read ... [from ...] and write ... [to ...] operators are not available anymore. If you did not specify a connector, you can continue using read ... and write ... in many cases. Otherwise, use from ... [read ...] and to ... [write ...] instead. #3365

Changes

  • The default port of the web plugin changed from 42001 to 5160. This change avoids collisions from dynamic port allocation on Linux systems. #3180

  • The HTTP method of the status endpoint in the experimental REST API is now POST. #3194

  • We now register extension types as tenzir.ip, tenzir.subnet, and tenzir.enumeration instead of vast.address, vast.subnet, and vast.enumeration, respectively. Arrow schema metadata now has a TENZIR: prefix instead of a VAST: prefix. #3208

  • The debugging utility lsvast no longer exists. Pipelines replace most of its functionality. #3211

  • The default database directory moved from vast.db to tenzir.db. Use the option tenzir.db-directory to manually set the database directory path. #3212

  • We reduced the default batch-timeout from ten seconds to one second to improve the user experience of interactive pipelines with data acquisition. #3320

  • We reduced the default active-partition-timeout from 5 minutes to 30 seconds to reduce the time until data is persisted. #3320

  • The default interval between two automatic rebuilds is now set to 2 hours and can be configured with the rebuild-interval option. #3377

Features

  • The flatten [<separator>] operator flattens nested data structures by joining nested records with the specified separator (defaults to .) and merging lists. #3018

  • The sink operator import persists events in a VAST node. #3128 #3173 #3193

  • The source operator export retrieves events from a VAST node. #3128 #3173 #3193

  • The repeat operator repeats its input a given number of times. #3128 #3173 #3193

  • The new enumerate operator prepends a column with the row number of the input records. #3142

  • The new sort operator allows for arranging events by field, in ascending and descending order. The current version is still "beta" and has known limitations. #3155

  • The measure operator now returns running totals with the --cumulative option. #3156

  • The --timeout option for the vast status command allows for defining how long VAST waits for components to report their status. The option defaults to 10 seconds. #3162

  • The new pipeline-manager is a proprietary plugin that allows for creating, updating and persisting pipelines. The included RESTful interface allows for easy access and modification of these pipelines. #3164

  • The top <field> operator makes it easy to find the most common values for the given field. Likewise, rare <field> returns the least common values for the given field. #3176

  • The serve operator and /serve endpoint supersede the experimental /query endpoint. The operator is a sink for events, and bridges a pipeline into a RESTful interface from which events can be pulled incrementally. #3180

  • The new #schema_id meta extractor returns a unique fingerprint for the schema. #3183

  • In addition to tenzir "<pipeline>", there now is tenzir -f <file>, which loads and executes the pipeline defined in the given file. #3223

  • The pipeline parser now emits helpful and visually pleasing diagnostics. #3223

  • The summarize operator now works across multiple schemas and can combine events of different schemas into one group. It now also treats missing columns as having null values. #3250

  • The by clause of summarize is now optional. If it is omitted, all events are assigned to the same group. #3250

  • The new nic plugin provides a loader that acquires packets from a network interface card using libpcap. It emits chunks of data in the PCAP file format so that the pcap parser can process them as if packets come from a trace file. #3263

  • The new decapsulate operator processes events of type pcap.packet and emits new events of type tenzir.packet that contain the decapsulated PCAP packet with packet header fields from the link, network, and transport layer. The operator also computes a Community ID. #3263

  • The pipeline manager now accepts empty strings for the optional name. The /create endpoint returns a list of diagnostics if pipeline creation fails, and if start_when_created is set, the endpoint now returns only after the pipeline execution has been fully started. The /list endpoint now returns the diagnostics collected for every pipeline so far. The /delete endpoint now returns an empty object if the request is successful. #3264

  • The zeek-tsv parser sometimes failed to parse Zeek TSV logs, wrongly reporting that the header ended too early. This bug no longer exists. #3291

  • The --schema option for the JSON parser allows for setting the target schema explicitly by name. #3295

  • The unflatten [<separator>] operator unflattens data structures by creating nested records out of fields whose names contain a <separator>. #3304

  • Pipelines executed locally with tenzir now use load - and read json as implicit sources. This complements save - and write json --pretty as implicit sinks. #3329

  • The json printer can now colorize its output by providing the -C|--color-output option, and explicitly disable coloring via -M|--monochrome-output. #3343

  • Pipeline metrics (total ingress/egress amount and average rate per second) are now visible in the pipeline-manager, via the metrics field in the /pipeline/list endpoint result. #3376

  • The directory saver now supports the two arguments -a|--append and -r|--realtime that have the same semantics as they have for the file saver: open files in the directory in append mode (instead of overwriting) and flush the output buffers on every update. #3379

  • The sort operator now also works for ip and enum fields. #3390

  • tenzir --dump-metrics '<pipeline>' prints a performance overview of the executed pipeline on stderr at the end. #3390

  • The batch <limit> operator allows expert users to control batch sizes in pipelines explicitly. #3391

  • The new show source operator makes it possible to gather meta information about Tenzir. For example, the provided introspection capabilities allow for emitting existing formats, connectors, and operators. #3414

  • The json parser now serves as a fallback parser for all files whose extension does not have a default parser in Tenzir. #3422
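The flatten and unflatten entries above describe inverse transformations on nested records. The following Python sketch illustrates only the record-joining part on plain dictionaries. It assumes events are dicts and that field names never conflict (e.g., a field a next to a field a.b). The helper names flatten and unflatten are hypothetical, the sketch does not model the list merging that the flatten operator also performs, and it is not Tenzir's implementation.

```python
from __future__ import annotations

from typing import Any


def flatten(record: dict[str, Any], separator: str = ".") -> dict[str, Any]:
    """Join nested record field names with `separator` (hypothetical helper)."""
    out: dict[str, Any] = {}

    def walk(prefix: str, value: Any) -> None:
        if isinstance(value, dict) and value:
            # Descend into non-empty nested records, extending the field path.
            for key, inner in value.items():
                walk(f"{prefix}{separator}{key}", inner)
        else:
            # Leaves (including lists and empty records) keep their value as-is;
            # this sketch does NOT merge lists.
            out[prefix] = value

    for key, value in record.items():
        walk(key, value)
    return out


def unflatten(record: dict[str, Any], separator: str = ".") -> dict[str, Any]:
    """Rebuild nested records from field names that contain `separator`."""
    out: dict[str, Any] = {}
    for key, value in record.items():
        *parents, leaf = key.split(separator)
        node = out
        for part in parents:
            # Assumption: no conflicting fields such as "a" alongside "a.b".
            node = node.setdefault(part, {})
        node[leaf] = value
    return out


# Example: a Zeek-style connection record.
event = {"id": {"orig_h": "10.0.0.1", "resp_p": 443}, "proto": "tcp"}
flat = flatten(event)
# flat == {"id.orig_h": "10.0.0.1", "id.resp_p": 443, "proto": "tcp"}
assert unflatten(flat) == event
```

With a custom separator, flatten(event, "_") yields id_orig_h and id_resp_p instead.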

Bug Fixes

  • Using transformation operators like summarize, sort, put, extend, or replace no longer sometimes crashes after a preceding head or tail operator when referencing a nested field. #3171

  • The tail operator sometimes returned more events than specified. This no longer happens. #3171

  • We fixed a bug in the compaction plugin that prevented it from applying the configured weights when it was used for the first time on a database. #3185

  • Starting a remote pipeline with vast exec failed when the node was not reachable yet. Like other commands, executing a pipeline now waits until the node is reachable before starting. #3188

  • Import processes sometimes failed to shut down automatically when the node exited. They now shut down reliably. #3207

v3.1.0

Changes

  • The /query REST endpoint no longer accepts an expression at the start of the query. Instead, use where <expr> | .... #3015

  • As already announced with the VAST v3.0 release, the vast.pipeline-triggers option no longer functions. The feature will be replaced with node ingress/egress pipelines, which fit a multi-node model better than the previous feature, which was built under the assumption of a client/server model with a single server. #3052

  • The bundled systemd service is now configured to restart VAST in case of a failure. #3058

  • The vast.operators section in the configuration file supersedes the now deprecated vast.pipelines section and more generally enables user-defined operators. Defined operators now must use the new, textual format introduced with VAST v3.0, and are available for use in all places where pipelines are supported. #3067

  • The exporter.* metrics no longer exist, and will return in a future release as a more generic instrumentation mechanism for all pipelines. #3076

Features

  • The put operator is the new companion to the existing extend and replace operators. It specifies the output fields exactly, referring either to input fields with an extractor, metadata with a selector, or a fixed value. #3036 #3039 #3089

  • The extend and replace operators now support assigning extractors and selectors in addition to just fixed values. #3036 #3039 #3089

  • The new tail pipeline operator limits the output to the specified number of most recent events. The operator takes the limit as an optional argument, with the default value being 10. #3050

  • The newly-added unique operator removes adjacent duplicates. #3051

  • User-defined operator aliases make pipelines easier to use by enabling users to encapsulate a pipeline into a new operator. To define a user-defined operator alias, add an entry to the vast.operators section of your configuration. #3064

  • Compaction now makes use of the new pipeline operators, and allows pipelines to be defined inline in addition to the now deprecated vast.pipelines configuration section. #3064

  • The count_distinct aggregation function returns the number of distinct, non-null values. #3068

  • The vast export command now accepts the new pipelines as input. Furthermore, vast export <expr> is now deprecated in favor of vast export 'where <expr>'. #3076

  • The new from <connector> [read <format>], read <format> [from <connector>], write <format> [to <connector>], and to <connector> [write <format>] operators bring together a connector and a format to produce and consume events, respectively. Their lower-level building blocks load <connector>, parse <format>, print <format>, and save <connector> enable expert users to operate on raw byte streams directly. #3079

  • The new file connector enables the user to process file input/output as data in a pipeline. This includes regular files, UDS files as well as stdin/stdout. #3085 #3088 #3097

  • The inspect operator replaces the events or bytes it receives with incremental metrics describing the input. #3093

  • The new directory sink creates a directory with a file for each schema in the specified format. #3098

  • The feather and parquet formats allow for reading and writing events from and to Apache Feather V2 and Apache Parquet files, respectively. #3103

  • The xsv format enables the user to parse and print character-separated values, with the additional csv, tsv and ssv formats as sane defaults. #3104

  • The cef parser allows for using the CEF format with the new pipelines. #3110

  • The zeek-tsv format parses and prints Zeek's native tab-separated value (TSV) representation of logs. #3114

  • Pipelines may now span across multiple processes. This will enable upcoming operators that do not just run locally in the vast exec process, but rather connect to a VAST node and partially run in that node. The new operator modifiers remote and local allow expert users to control where parts of their pipeline run explicitly, e.g., to offload compute to a more powerful node. Potentially unsafe use of these modifiers requires setting vast.allow-unsafe-pipelines to true in the configuration file. #3119

  • The vast exec command now supports implicit sinks for pipelines that end in events or bytes: write json --pretty and save file -, respectively. #3123

  • The --pretty option for the JSON printer enables multi-line output. #3123

  • The new version source operator yields a single event containing VAST's version and a list of enabled plugins. #3123
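The unique, tail, and count_distinct entries above each describe a small, precise semantic: unique drops only adjacent duplicates, tail keeps the last N events (default 10), and count_distinct counts distinct non-null values. A minimal Python sketch of those semantics over plain values, assuming nulls are represented as None and values are hashable; the function names mirror the operators but are hypothetical helpers, not Tenzir code.

```python
from __future__ import annotations

from collections import deque
from itertools import groupby
from typing import Any, Iterable


def unique(events: Iterable[Any]) -> list[Any]:
    """Drop adjacent duplicates only, similar to the Unix `uniq` tool."""
    # groupby starts a new group whenever the value changes, so non-adjacent
    # repeats survive.
    return [value for value, _ in groupby(events)]


def tail(events: Iterable[Any], limit: int = 10) -> list[Any]:
    """Keep the last `limit` events using memory bounded by `limit`."""
    return list(deque(events, maxlen=limit))


def count_distinct(values: Iterable[Any]) -> int:
    """Count distinct values, ignoring nulls (None). Assumes hashable values."""
    return len({value for value in values if value is not None})
```

For example, unique([1, 1, 2, 2, 1]) keeps the final 1 because it is not adjacent to the first run of 1s.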

Bug Fixes

  • VAST incorrectly handled subnets using IPv6 addresses for which an equivalent IPv4 address existed. This is now done correctly. For example, the query where :ip !in ::ffff:0:0/96 now returns all events containing an IP address that cannot be represented as an IPv4 address. As an additional safeguard, the VAST language no longer allows for constructing subnets for IPv4 addresses with lengths greater than 32. #3060

  • The distinct function silently performed a different operation on lists, returning the distinct non-null elements in the list rather than operating on the list itself. This special-casing no longer exists, and instead the function now operates on the list itself. This feature will return in the future as unnesting on the extractor level via distinct(field[]), but for now it has to go to make the distinct aggregation function work consistently. #3068

  • Tokens created with vast web generate-token now persist correctly, and work across restarts of VAST. #3086

  • The matcher plugin no longer causes deadlocks through detached matcher clients. #3115

  • The tenzir/vast image now listens on 0.0.0.0:5158 instead of 127.0.0.1:5158 by default, which aligns the behavior with the tenzir/vast-slim image. #3137

  • Some pipelines in compaction caused transformed partitions to be treated as if they were older than they were supposed to be, causing them to be picked up again for deletion too early. This bug no longer exists, and compacted partitions are now considered at most as old as the oldest event before compaction. #3141

  • The rebuilder.partitions.remaining metric sometimes reported wrong values when partitions for at least one schema did not need to be rebuilt. We aligned the metrics with the actual functionality. #3147
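The IPv6 subnet fix above relies on the IPv4-mapped range ::ffff:0:0/96: an address outside it (and not plain IPv4) cannot be represented as IPv4, which is what the query :ip !in ::ffff:0:0/96 selects. The following Python sketch uses the standard ipaddress module to show the same classification and the prefix-length safeguard. The helper names are hypothetical and the sketch is not VAST's implementation.

```python
import ipaddress

# IPv4-mapped IPv6 addresses (RFC 4291, section 2.5.5.2) live in ::ffff:0:0/96.
IPV4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def representable_as_ipv4(address: str) -> bool:
    """True for plain IPv4 addresses and IPv4-mapped IPv6 addresses."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return True
    # Equivalent to `ip.ipv4_mapped is not None` for IPv6 addresses.
    return ip in IPV4_MAPPED


def is_valid_ipv4_subnet(text: str) -> bool:
    """Mirror the safeguard: IPv4 prefix lengths above 32 are rejected."""
    try:
        # strict=False tolerates host bits, so only the prefix length is checked.
        ipaddress.IPv4Network(text, strict=False)
    except ValueError:
        return False
    return True
```

For example, ::ffff:192.0.2.1 counts as representable, whereas 2001:db8::1 does not.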

v3.0.4

Bug Fixes

  • Automatic rebuilds now correctly consider only outdated or undersized partitions. #3083

  • The --all flag for the rebuild command now consistently causes all partitions to be rebuilt, aligning its functionality with its documentation. #3083

v3.0.3

Changes

  • VAST now depends on the Boost C++ libraries. #3043

  • VAST's rebuilding and compaction features now interfere less with queries. This patch was also backported as VAST v2.4.2 to enable a smoother upgrade to VAST v3.x. #3047

Features

  • The new vast exec command executes a pipeline locally. It takes a single argument representing a closed pipeline, and immediately executes it. This is the foundation for a new, pipeline-first VAST, in which most operations are expressed as pipelines. #3004 #3010

v3.0.2

Bug Fixes

  • VAST no longer miscalculates the rebuild metrics. #3026

v3.0.1

Features

  • The VAST language now supports comments using the familiar /* comment */ notation. This makes it easy to document multi-line pipelines inline. #3011

Bug Fixes

  • VAST no longer crashes when reading an unsupported partition from VAST v1.x. Instead, the partition is ignored correctly. Since v2.2 VAST automatically rebuilds partitions in the background to ensure compatibility. #3018

  • Automatic partition rebuilding both updates partitions with an outdated storage format and merges undersized partitions continuously in the background. This now also works as expected for outdated but not undersized partitions. #3020

v3.0.0

Breaking Changes

  • The match operator ~, its negation !~, and the pattern type no longer exist. Use queries of the forms lhs == /rhs/ and lhs != /rhs/ instead for queries using regular expressions. #2769 #2873

  • vast status does not work anymore with an embedded node (i.e., spawned with the -N parameter). #2771

  • The #field meta extractor no longer exists. Use X != null instead of #field == "X" to check for the existence of the field X. #2776

  • VAST no longer supports reading partitions created with VAST versions older than VAST v2.2. Since VAST v2.2, VAST continuously upgrades old partitions to the most recent internal format while running. #2778 #2797 #2798

  • We removed the broker plugin that enabled direct Zeek 3.x log transfer to VAST. The plugin will return in the future rewritten for Zeek 5+. #2796

  • VAST now ignores the previously deprecated options vast.meta-index-fp-rate, vast.catalog-fp-rate, vast.transforms and vast.transform-triggers. Similarly, setting vast.store-backend to segment-store now results in an error rather than a graceful fallback to the default store. #2832

  • Boolean literals in expressions have a new syntax: true and false replace the old representations T and F. For example, the query suricata.alert.alerted == T is no longer valid; use suricata.alert.alerted == true instead. #2844

  • The builtin types count, int, real, and addr were renamed to uint64, int64, double, and ip respectively. For backwards-compatibility, VAST still supports parsing the old type tokens in schema files. #2864

  • The explore and pivot commands are now unavailable. They will be reintroduced as pipeline operators in the future. #2898

  • For the experimental REST API, the result format of the /export endpoint was modified: The num_events key was renamed to num-events, and the version key was removed. #2899

  • The map type no longer exists: instead of map<T, U>, use the equivalent list<record{ key: T, value: U }>. #2976

  • We renamed the identity operator to pass. #2980

  • The REST API does not contain the /export and /export/with-schemas endpoints anymore. Any previous queries using those endpoints have to be sent to the /query endpoint now. #2990

  • From now on, VAST uses TCP port 5158 for its native inter-process communication. This change avoids collisions from dynamic port allocation on Linux systems. #2998

  • The non-value literal in expressions has a new syntax: null replaces its old representation nil. For example, the query x != nil is no longer valid; use x != null instead. #2999

  • The vast.pipeline-triggers option is deprecated; while it continues to work as-is, support for it will be removed in the next release. Use the new inline import and export pipelines instead. They will return as more generally applicable node ingress and egress pipelines in the future. #3008
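The removal of the map type above comes with a mechanical migration: each map<T, U> becomes a list<record{ key: T, value: U }>. A Python sketch of that conversion on plain data, assuming maps are represented as dicts; both helper names are hypothetical. Note that the list form can hold duplicate keys, which the reverse conversion collapses (the last entry wins).

```python
from __future__ import annotations

from typing import Any


def map_to_list(mapping: dict[Any, Any]) -> list[dict[str, Any]]:
    """Convert a map value into the equivalent list of {key, value} records."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


def list_to_map(entries: list[dict[str, Any]]) -> dict[Any, Any]:
    """Reverse conversion; with duplicate keys, the last entry wins."""
    return {entry["key"]: entry["value"] for entry in entries}
```

For example, a former map<string, uint64> value {"tcp": 3, "udp": 1} becomes two records with the fields key and value.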

Changes

  • VAST now comes with a role definition for Ansible. You can find it directly in the ansible subdirectory. #2604

  • Building VAST now requires CAF 0.18.7. VAST supports setting advanced options for CAF directly in its configuration file under the caf section. If you were using any of these, compare them against the bundled vast.yaml.example file to see if you need to make any changes. The change has (mostly positive) performance and stability implications throughout VAST, especially in high-load scenarios. #2693 #2923

  • OpenSSL is now a required dependency. #2719

  • vast status no longer shows type registry-related information. Instead, refer to vast show for detailed type metadata information. #2745

  • Blocking imports now imply that ingested data gets persisted to disk before the vast import process exits. #2807 #2848

  • Plugin names are now case-insensitive. #2832

  • The per-schema event distribution moved from index.statistics.layouts to catalog.schemas, and additionally includes information about the import time range and the number of partitions VAST knows for the schema. The number of events per schema no longer includes events that are not yet persisted. #2852

  • The bundled Zeek schema no longer includes the _path field included in Zeek JSON. Use #type == "zeek.foo" instead of _path == "foo" for querying data ingested using vast import zeek-json. #2887

  • We removed the frontend prototype bundled with the web plugin. Some parts of the frontend that we have in development are designed to be closed-source, and at the current development stage it is easier to develop them in a single repository that is not bound to the release process of VAST itself. An open-source version of the frontend may return in the future. #2922 #2927

Features

  • The cef import format allows for reading events in the Common Event Format (CEF) via vast import cef < cef.log. #2216

  • VAST installations and packages now include Python bindings in a site-package under <install-prefix>/lib/python*/site-packages/vast. #2636

  • VAST now imports Arrow IPC data, which is the same format it already supports for export. #2707

  • The new pseudonymize pipeline operator pseudonymizes IP addresses in user-specified fields. #2719

  • We now offer a tenzir/vast-slim image as an alternative to the tenzir/vast image. The image is minimal in size and supports the same features as the regular image, but does not support building additional plugins against it or mounting in additional plugins. #2742

  • The new /query endpoint for the experimental REST API allows users to receive query data in multiple steps, as opposed to a one-shot export. #2766

  • Queries of the forms :string == /pattern/, field == /pattern/, #type == /pattern/, and their respective negations now work as expected. #2769

  • The /export family of endpoints now accepts an optional pipeline parameter to specify an ad-hoc pipeline that should be applied to the exported data. #2773

  • We changed VAST client processes to attempt connecting to a VAST server multiple times until the configured connection timeout (vast.connection-timeout, defaults to 5 minutes) runs out. A fixed delay between connection attempts (vast.connection-retry-delay, defaults to 3 seconds) ensures that clients do not stress the server too much. Set the connection timeout to zero to let VAST clients attempt connecting indefinitely, and the delay to zero to disable the retry mechanism. #2835

  • The JSON export format gained the options --omit-empty-records, --omit-empty-lists, and --omit-empty-maps, which cause empty records, lists, and maps, respectively, not to be rendered. The options may be combined with the existing --omit-nulls option. Use --omit-empty to set all four flags at once. #2856

  • The export and import commands now support an optional pipeline string that allows for chaining pipeline operators together and executing such a pipeline on outgoing and incoming data. This feature is experimental and the syntax is subject to change without notice. New operators are only available in the new pipeline syntax, and the old YAML syntax is deprecated. #2877 #2904 #2907

  • The new head and taste operators limit results to the specified number of events. The head operator applies this limit for all events, and the taste operator applies it per schema. Both operators take the limit as an optional argument, with the default value being 10. #2891

  • The experimental web frontend now correctly responds to CORS preflight requests. To configure CORS behavior, the new vast.web.cors-allowed-origin config option can be used. #2944

  • Patterns now support case insensitivity by adding i to the pattern string, e.g. /^\w{3}$/i. #2951

  • The sigma plugin now treats Sigma strings as case-insensitive patterns during the transpilation process. #2974

  • The experimental web plugin now serves its own API specification at the new '/openapi.json' endpoint. #2981

  • Extractors such as x and :T can now expand to the predicates x != null and :T != null, respectively. #2984
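The client connection entry above (#2835) combines an overall timeout with a fixed delay between attempts, where a zero timeout means "retry forever" and a zero delay means "do not retry". A Python sketch of that control flow, assuming failures surface as ConnectionError and that no attempt starts after the deadline; connect_with_retry is a hypothetical helper, not VAST's client code. The clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    timeout: float = 300.0,  # like vast.connection-timeout (default 5 minutes)
    retry_delay: float = 3.0,  # like vast.connection-retry-delay (default 3 seconds)
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `connect` until it succeeds or the overall timeout runs out.

    Assumptions (not taken from VAST): failures raise ConnectionError,
    and no new attempt starts after the deadline.
    """
    # A zero timeout means "no deadline": keep trying indefinitely.
    deadline: Optional[float] = None if timeout == 0 else clock() + timeout
    while True:
        try:
            return connect()
        except ConnectionError:
            if retry_delay == 0:
                raise  # a zero delay disables retrying: fail after one attempt
            if deadline is not None and clock() + retry_delay > deadline:
                raise  # the next attempt would start after the deadline
            sleep(retry_delay)  # fixed delay so clients do not hammer the server
```

With timeout=10 and retry_delay=3, this sketch attempts at t = 0, 3, 6, and 9 seconds before giving up.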

Bug Fixes

  • Attempting to connect with thousands of clients around the same time sometimes crashed the VAST server. This no longer occurs. #2693

  • The replace and extend pipeline operators wrongly inferred IP address, subnet, pattern, and map values as strings. They are now inferred correctly. To force a value to be inferred as a string, wrap it inside double quotes. #2768

  • VAST now shuts down instantly when metrics are enabled instead of being held alive for up to the duration of the telemetry interval (10 seconds). #2832

  • The web plugin now reacts correctly to CTRL-C by stopping itself. #2860

  • VAST no longer ignores existing PID lock files on Linux. #2861

  • The start commands specified with the vast.start.commands option are now run asynchronously. This means that commands that block indefinitely will no longer prevent execution of subsequent commands, and allow for correct signal handling. #2868

  • The Zeek TSV reader now respects the schema files in the bundled zeek.schema file, and produces data of the same schema as the Zeek JSON reader. E.g., instead of producing a top-level ip field id.orig_h, the reader now produces a top-level record field id that contains the ip field orig_h, effectively unflattening the data. #2887

  • Pipelines that reduce the number of events no longer prevent vast export processes that have a max-events limit from terminating. #2896

  • We fixed incorrect printing of human-readable durations in some edge cases. E.g., the value 1.999s was rendered as 1.1s instead of the expected 2.0s. This bug affected the JSON and CSV export formats, and all durations printed in log messages or the status command. #2906

  • Options passed in the caf.openssl section in the configuration file or as VAST_CAF__OPENSSL__* environment variables are no longer ignored. #2908

  • The VAST client will now terminate properly when using the count command with a query which delivers zero results. #2924

  • VAST no longer crashes when it encounters an invalid type expression in a schema. #2977

  • Compaction now retries immediately on failure instead of waiting for the configured scan interval to expire again. #3006
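The duration-printing fix above (1.999s now renders as 2.0s) comes down to rounding the whole value once, so that a carry out of the fractional digit reaches the integer part. A Python sketch of that idea for seconds with one decimal place, assuming a non-negative integer nanosecond input and round-half-up; format_seconds is a hypothetical helper, not VAST's printer.

```python
def format_seconds(duration_ns: int) -> str:
    """Render a non-negative nanosecond duration as seconds with one decimal."""
    if duration_ns < 0:
        raise ValueError("this sketch only handles non-negative durations")
    # Round once, on the whole value, to integer tenths of a second (half up).
    tenths = (duration_ns + 50_000_000) // 100_000_000
    # Split only after rounding, so 1.999 s -> 20 tenths -> "2.0s", not "1.1s".
    whole, fraction = divmod(tenths, 10)
    return f"{whole}.{fraction}s"
```

Working in integer nanoseconds also sidesteps binary floating-point surprises at the rounding boundary.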

v2.4.2

Changes

  • VAST's rebuilding and compaction features now interfere less with queries. #3047

v2.4.1

Features

  • VAST's Feather store now yields initial results much faster and performs better when running queries affecting a large number of partitions by doing smaller incremental disk reads as needed rather than one large disk read upfront. #2805

v2.4.0

Changes

  • VAST now emits per-component memory usage metrics under the keys index.memory-usage and catalog.memory-usage. #2471

  • We changed the default VAST endpoint from localhost to 127.0.0.1. This ensures the listening address is deterministic and not dependent on the host-specific IPv4 and IPv6 resolution. For example, resolving localhost yields a list of addresses, and if VAST fails to bind on the first (e.g., due to a lingering socket) it would silently go to the next. Taking name resolution out of the equation fixes such issues. Set the option vast.endpoint to override the default endpoint. #2512

  • Building VAST from source now requires CMake 3.19 or greater. #2582

  • The default store backend of VAST is now feather. Reading from VAST's custom segment-store backend is still transparently supported, but new partitions automatically write to the Apache Feather V2 backend instead. #2587

  • We removed PyVAST from the code base in favor of the new Python bindings. PyVAST continues to work as a thin wrapper around the VAST binary, but will no longer be released alongside VAST. #2674

  • Building VAST from source now requires Apache Arrow 10.0 or newer. #2685

  • The vast dump command is now called vast show. #2686

  • VAST now loads all plugins by default. To revert to the old behavior, explicitly set the vast.plugins option to have no value. #2689

Features

  • We now also distribute VAST as a Debian package with every new release. The Debian package automatically installs a systemd service and creates a vast user for the VAST process. #2513 #2738

  • VAST Cloud now has a MISP plugin that adds a MISP instance to the cloud stack. #2548

  • The new experimental web plugin offers a RESTful API to VAST and a bundled web user interface in Svelte. #2567 #2614 #2638 #3681

  • VAST now emits metrics for filesystem access under the keys posix-filesystem.{checks,writes,reads,mmaps,erases,moves}.{successful,failed,bytes}. #2572

  • VAST now ships a Docker Compose file. In particular, the Docker Compose stack now has a TheHive integration that can run VAST queries as a Cortex Analyzer. #2574 #2652

  • VAST Cloud can now expose HTTP services using Cloudflare Access. #2578

  • Rebuilding partitions now additionally rebatches the contained events to vast.import.batch-size events per batch, which accelerates queries against partitions that previously had undersized batches. #2583

  • VAST has a new configuration setting, vast.zstd-compression-level, to control the compression level of the Zstd algorithm used in both the Feather and Parquet store backends. The default level is set by the Apache Arrow library, and for Parquet is no longer explicitly defaulted to 9. #2623

  • VAST has three new metrics: catalog.num-partitions-total, catalog.num-events-total, and ingest-total, each of which sums up its respective per-schema metric across all schemas. #2682

  • Queries without acceleration from a dense index run significantly faster, e.g., initial tests show a 2x performance improvement for substring queries. #2730

Bug Fixes

  • VAST now skips unreadable partitions while starting up, instead of aborting the initialization routine. #2515

  • Rebuilding heterogeneous partitions no longer freezes the entire rebuilder on pipeline failures. #2530

  • VAST no longer attempts to hard-kill itself if the shutdown did not finish within the configured grace period. The option vast.shutdown-grace-period no longer exists. We recommend setting TimeoutStopSec=180 in the VAST systemd service definition to restore the previous behavior. #2568

  • The error message on connection failure now contains a correctly formatted target endpoint. #2609

  • The UDS metrics sink no longer deadlocks due to suspended listeners. #2635

  • VAST now ejects partitions from the LRU cache if they fail to load with an I/O error. #2642

  • The systemd service no longer fails if the home directory of the vast user is not in /var/lib/vast. #2734

v2.3.1

Bug Fixes

  • We fixed an indefinite hang that occurred when attempting to apply a pipeline to a partition that is not a valid flatbuffer. #2624

  • VAST now properly regenerates any corrupted, oversized partitions it encounters during startup, provided that the corresponding store files are available. These files could be produced by versions up to and including VAST v2.2, when using configurations with an increased maximum partition size. #2631

v2.3.0

Changes

  • We improved the operability of VAST servers under high load from automated low-priority queries. VAST now considers queries issued with --low-priority, such as automated retro-match queries, with even less priority compared to regular queries (down from 33.3% to 4%) and internal high-priority queries used for rebuilding and compaction (down from 12.5% to 1%). #2484

  • The default value for vast.active-partition-timeout is now 5 minutes (down from 1 hour), causing VAST to persist underfull partitions earlier. #2493

  • We split the vast rebuild command into two: vast rebuild start and vast rebuild stop. Rebuild orchestration now runs server-side, and only a single rebuild may run at a given time. We also made it more intuitive to use: --undersized now implies --all, and a new --detached option allows for running rebuilds in the background. #2493

Features

  • VAST's partition indexes are now optional, allowing operators to control the trade-off between disk-usage and query performance for every field. #2430

  • Matchers can now run in AWS via the vast-cloud CLI matcher plugin. #2473

  • VAST now continuously rebuilds outdated and merges undersized partitions in the background. The new option vast.automatic-rebuild controls how many resources to spend on this. To disable this behavior, set the option to 0; the default is 1. #2493

  • Rebuilding now emits metrics under the keys rebuilder.partitions.{remaining,rebuilding,completed}. The vast status rebuild command additionally shows information about the ongoing rebuild. #2493

  • The new vast.connection-timeout option allows for configuring the timeout VAST clients use when connecting to a VAST server. The value defaults to 10s; setting it to a zero duration produces an infinite timeout. #2499

Bug Fixes

  • VAST now properly processes queries for fields with the skip attribute. #2430

  • VAST can now store data in segments bigger than 2GiB in size each. #2449

  • VAST can now store column indexes that are bigger than 2GiB. #2449

  • VAST no longer occasionally prints warnings about no longer available partitions when queries run concurrently to imports. #2500

  • Configuration options representing durations with an associated command-line option like vast.connection-timeout and --connection-timeout were not picked up from configuration files or environment variables. This now works as expected. #2503

  • Partitions now fail early when their stores fail to load from disk, detailing what went wrong in an error message. #2507

  • We changed the way vast-cloud loads its cloud plugins to make it more explicit. This avoids inconsistent defaults assigned to variables when using core commands on specific plugins. #2510

  • The rebuild command, automatic rebuilds, and compaction are now much faster, and match the performance of the import command for building indexes. #2515

  • Fixed a race condition where the output of a partition transform could be reused before it was fully written to disk, for example when running vast rebuild. #2543

v2.2.0

Changes

  • Metrics for VAST's store lookups now use the keys {active,passive}-store.lookup.{runtime,hits}. The store type metadata field now distinguishes between the various supported store types, e.g., parquet, feather, or segment-store, rather than containing active or passive. #2413

  • The summarize pipeline operator is now a builtin; the previously bundled summarize plugin no longer exists. Aggregation functions in the summarize operator are now plugins, which makes them easily extensible. The syntax of summarize now supports specification of output field names, similar to SQL's AS in SELECT f(x) AS name. #2417

  • The undocumented count pipeline operator no longer exists. #2417

  • The put pipeline operator is now called select, as we've abandoned plans to integrate the functionality of replace into it. #2423

  • The replace pipeline operator now supports multiple replacements in one configuration, which aligns the behavior with other operators. #2423

  • Transforms are now called pipelines. In your configuration, replace transform with pipeline in all keys. #2429

  • An init command was added to vast-cloud to help recover from inconsistent Terraform states. #2435
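The brace notation in the store lookup metric keys above abbreviates four concrete keys. A minimal Python sketch that expands it, purely to illustrate the notation (the helper name store_lookup_metric_keys is hypothetical and not part of VAST):

```python
from itertools import product

def store_lookup_metric_keys() -> list[str]:
    """Expand {active,passive}-store.lookup.{runtime,hits} into concrete metric keys."""
    return [
        f"{store}-store.lookup.{metric}"
        for store, metric in product(["active", "passive"], ["runtime", "hits"])
    ]

for key in store_lookup_metric_keys():
    print(key)
```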

Features

  • The new flush command causes VAST to decommission all currently active partitions, i.e., write all active partitions to disk immediately regardless of their size or the active partition timeout. This is particularly useful for testing, or in automated scripts that need to guarantee that input is available for operations that only work on persisted passive partitions. The flush command returns only after all active partitions have been flushed to disk. #2396

  • The summarize operator supports three new aggregation functions: sample takes the first value in every group, distinct filters out duplicate values, and count yields the number of values. #2417

  • The drop pipeline operator now drops entire schemas specified by name in the schemas configuration key in addition to dropping fields by extractors in the fields configuration key. #2419

  • The new extend pipeline operator allows for adding new fields with fixed values to data. #2423

  • The cloud execution commands (run-lambda and execute-command) now accept scripts from file-like handles. To improve the usability of this feature, the whole host file system is now mounted into the CLI container. #2446
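The semantics of the three new summarize aggregation functions (sample, distinct, count) can be sketched in plain Python over a list of dicts. This is an illustration under stated assumptions, not VAST's implementation or configuration syntax: the summarize helper is hypothetical, null handling is not modeled, and keeping the first-seen order in distinct is a choice of this sketch.

```python
from collections import defaultdict

def summarize(rows, group_by, field):
    """Group rows by one field and aggregate another with sample, distinct, and count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[field])
    return {
        key: {
            "sample": values[0],                      # first value in the group
            "distinct": list(dict.fromkeys(values)),  # duplicates removed, first-seen order kept
            "count": len(values),                     # number of values in the group
        }
        for key, values in groups.items()
    }

rows = [
    {"proto": "tcp", "port": 80},
    {"proto": "tcp", "port": 443},
    {"proto": "tcp", "port": 80},
    {"proto": "udp", "port": 53},
]
print(summarize(rows, "proto", "port"))
```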

Bug Fixes

  • VAST now consistently exports real values in JSON with at least one decimal place. #2393

  • VAST is now able to detect corrupt index files and will attempt to repair them on startup. #2431

  • The JSON export with --omit-nulls now correctly handles nested records whose first field is null instead of dropping them entirely. #2447

  • We fixed a race condition that led to data duplication when VAST crashed while applying a partition transform. #2465

  • The rebuild command no longer crashes on failure, and displays the encountered error instead. #2466

  • Missing arguments for the --plugins, --plugin-dirs, and --schema-dirs command line options no longer cause VAST to crash occasionally. #2470
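A minimal Python sketch of what --omit-nulls does to an exported event, assuming events are plain nested dicts (the function name omit_nulls is hypothetical). Fields whose value is null are dropped at every nesting level, while a nested record is kept even when its first field is null, which is the behavior the JSON export fix in this release restores. Null elements inside lists are left alone here, since the option is described in terms of object fields.

```python
import json

def omit_nulls(value):
    """Recursively drop object fields whose value is null; keep nested records and list elements."""
    if isinstance(value, dict):
        return {k: omit_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [omit_nulls(v) for v in value]
    return value

# The nested record "conn" starts with a null field but must survive.
event = {"id": 1, "conn": {"duration": None, "bytes": 42}, "note": None}
print(json.dumps(omit_nulls(event)))
```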

v2.1.0

Changes

  • The mdx-regenerate tool is no longer part of VAST binary releases. #2260

  • Partition transforms now always emit homogeneous partitions, i.e., one schema per partition. This makes compaction and aging more efficient. #2277

  • VAST now requires Arrow >= v8.0.0. #2284

  • The vast.store-backend configuration option no longer supports archive and instead always uses the superior segment-store. Events stored in the archive will continue to be available in queries. #2290

  • The vast.use-legacy-query-scheduler option is now ignored because the legacy query scheduler has been removed. #2312

  • VAST will from now on always format time and timestamp values with six decimal places (microsecond precision). The old behavior used a precision that depended on the actual value. This may require action for downstream tooling like metrics collectors that expect nanosecond granularity. #2380
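To illustrate fixed microsecond precision, here is a minimal Python sketch that renders a nanosecond-resolution timestamp with exactly six fractional digits. The function name is hypothetical, and both the layout (no time zone suffix) and the choice to truncate rather than round the extra nanosecond digits are assumptions of this sketch, not documented VAST behavior.

```python
from datetime import datetime, timezone

def format_time_ns(ns_since_epoch: int) -> str:
    """Render nanoseconds since the epoch with exactly six fractional digits."""
    seconds, ns = divmod(ns_since_epoch, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    # Truncate nanoseconds to microseconds and always pad to six digits.
    return f"{base}.{ns // 1_000:06d}"

print(format_time_ns(1_654_735_756_123_456_789))  # 2022-06-09T00:49:16.123456
```

Downstream tools that expect nanosecond granularity would see the last three digits disappear, which is why the entry above flags possible action for metrics collectors.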

Features

  • The lsvast tool can now print contents of individual .mdx files. It now has an option to print raw Bloom filter contents of string and IP address synopses. #2260

  • The mdx-regenerate tool was renamed to vast-regenerate and can now also regenerate an index file from a list of partition UUIDs. #2260

  • VAST now compresses data with Zstd. When persisting data to the segment store, the default configuration achieves over 2x space savings. When transferring data between client and server processes, compression reduces the amount of transferred data by up to 5x. This allowed us to increase the default partition size from 1,048,576 to 4,194,304 events, and the default number of events in a single batch from 1,024 to 65,536. The performance increase comes at the cost of a ~20% memory footprint increase at peak load. Use the option vast.max-partition-size to tune this space-time tradeoff. #2268

  • VAST now produces additional metrics under the keys ingest.events, ingest.duration and ingest.rate. Each of those gets issued once for every schema that VAST ingested during the measurement period. Use the metadata_schema key to disambiguate the metrics. #2274

  • A new parquet store plugin allows VAST to store its data as parquet files, increasing storage efficiency at the expense of higher deserialization costs. Storage requirements for the VAST database are reduced by another 15-20% compared to the existing segment store with Zstd compression enabled. CPU usage for suricata import increases by about 10%, mostly due to the more expensive serialization. Deserialization (reading) of a partition is significantly more expensive, increasing CPU utilization by about 100%, and should be carefully considered and compared to the potential reduction in storage cost and I/O operations. #2284

  • The status command now supports filtering by component name. E.g., vast status importer index only shows the status of the importer and index components. #2288

  • VAST emits the new metric partition.events-written when writing a partition to disk. The metric's value is the number of events written, and the metadata_schema field contains the name of the partition's schema. #2302

  • The new rebuild command rebuilds old partitions to take advantage of improvements in newer VAST versions. Rebuilding takes place in the VAST server in the background. This process merges partitions up to the configured max-partition-size, turns VAST v1.x's heterogeneous partitions into VAST v2.x's homogeneous partitions, migrates all data to the currently configured store-backend, and upgrades to the most recent internal batch encoding and indexes. #2321

  • PyVAST now supports running client commands for VAST servers running in a container environment, if no local VAST binary is available. Specify the container keyword to customize this behavior. It defaults to {"runtime": "docker", "name": "vast"}. #2334 @KaanSK

  • The csv import gained a new --seperator='x' option that defaults to ','. Set it to '\t' to import tab-separated values, or ' ' to import space-separated values. #2336

  • VAST now compresses on-disk indexes with Zstd, resulting in a 50-80% size reduction depending on the type of indexes used, and reducing the overall index size to below the raw data size. This improves retention spans significantly. For example, using the default configuration, the indexes for suricata.ftp events now use 75% less disk space, and suricata.flow 30% less. #2346

  • The index statistics in vast status --detailed now show the event distribution per schema as a percentage of the total number of events in addition to the per-schema number, e.g., for suricata.flow events under the key index.statistics.layouts.suricata.flow.percentage. #2351

  • The output of vast status --detailed now shows metadata from all partitions under the key .catalog.partitions. Additionally, the catalog emits metrics under the key catalog.num-events and catalog.num-partitions containing the number of events and partitions respectively. The metrics contain the schema name in the field metadata_schema and the (internal) partition version in the field metadata_partition-version. #2360 #2363

  • The VAST Cloud CLI can now authenticate to the Tenzir private registry and download the vast-pro image (including plugins such as Matcher). The deployment script can now be configured to use a specific image and can thus be set to use vast-pro. #2415

Bug Fixes

  • VAST no longer crashes when importing map or pattern data annotated with the #skip attribute. #2286

  • The command-line options --plugins, --plugin-dirs, and --schema-dirs now correctly overwrite their corresponding configuration options. #2289

  • VAST no longer crashes when a query arrives at a newly created active partition in the time window between the partition creation and the first event arriving at the partition. #2295

  • Setting the environment variable VAST_ENDPOINT to a host:port pair no longer fails on startup with a parse error. #2305

  • VAST no longer hangs when it is shut down while still importing events. #2324

  • VAST now reads the default false-positive rate for sketches correctly. This broke accidentally with the v2.0 release. The option moved from vast.catalog-fp-rate to vast.index.default-fp-rate. #2325

  • The parser for real values now understands scientific notation, e.g., 1.23e+42. #2332

  • The csv import no longer crashes when the CSV file contains columns not present in the selected schema. Instead, it imports these columns as strings. #2336

  • vast export csv now renders enum columns in their string representation instead of their internal numerical representation. #2336

  • The JSON import now treats time and duration fields correctly for JSON strings containing a number, i.e., the JSON string "1654735756" now behaves just like the JSON number 1654735756 and for a time field results in the value 2022-06-09T00:49:16.000Z. #2340

  • VAST will no longer terminate when it can't write any more data to disk. Incoming data will still be accepted but discarded. We encourage all users to enable the disk-monitor or compaction features as a proper solution to this problem. #2376

  • VAST no longer ignores environment variables for plugin-specific options. E.g., the environment variable VAST_PLUGINS__FOO__BAR now correctly refers to the bar option of the foo plugin, i.e., plugins.foo.bar. #2390

  • We improved the mechanism to recover the database state after an unclean shutdown. #2394
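The JSON import fix for time fields treats a string containing a number the same as the number itself. A minimal Python sketch of that rule, assuming the number is interpreted as seconds since the Unix epoch in UTC (the function name parse_time is hypothetical, and non-numeric time strings are out of scope for this sketch):

```python
from datetime import datetime, timezone

def parse_time(value) -> datetime:
    """Interpret a JSON number, or a JSON string holding a number, as seconds since the epoch."""
    if isinstance(value, str):
        value = float(value)  # raises ValueError for non-numeric strings in this sketch
    return datetime.fromtimestamp(value, tz=timezone.utc)

print(parse_time("1654735756").isoformat())  # 2022-06-09T00:49:16+00:00
```

The arithmetic matches the example in the entry above: 1654735756 seconds is 19152 days plus 2956 seconds after the epoch, i.e., 2022-06-09 at 00:49:16 UTC.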

v2.0.0

Breaking Changes

  • We removed the experimental vast get command. It relied on an internal unique event ID that was only exposed to the user in debug messages. This removal is a preparatory step towards a simplification of some of the internal workings of VAST. #2121

  • The meta-index is now called the catalog. This affects multiple metrics and entries in the output of vast status, and the configuration option vast.meta-index-fp-rate, which is now called vast.catalog-fp-rate. #2128

  • The command line option --verbosity has the new name --console-verbosity. This aligns the CLI with the configuration file, which only understands the option vast.console-verbosity. #2178

  • Multiple transform steps now have new names: select is now called where, delete is now called drop, project is now called put, and aggregate is now called summarize. This breaking change is in preparation for an upcoming feature that improves the capability of VAST's query language. #2228

  • The layout-names option of the rename transform step was renamed schemas. The step now additionally supports renaming fields. #2228

Changes

  • VAST ships experimental Terraform scripts to deploy on AWS Lambda and Fargate. #2108

  • We revised the query scheduling logic to exploit synergies when multiple queries run at the same time. In that vein, we updated the related metrics with more accurate names to reflect the new mechanism. The new keys scheduler.partition.materializations, scheduler.partition.scheduled, and scheduler.partition.lookups provide periodic counts of partitions loaded from disk and scheduled for lookup, and the overall number of queries issued to partitions, respectively. The keys query.workers.idle and query.workers.busy were renamed to scheduler.partition.remaining-capacity and scheduler.partition.current-lookups. Finally, the key scheduler.partition.pending counts the number of currently pending partitions. It is still possible to opt out of the new scheduling algorithm with the (deprecated) option --use-legacy-query-scheduler. #2117

  • VAST now requires Apache Arrow >= v7.0.0. #2122

  • VAST's internal data model now completely preserves the nesting of the stored data when using the arrow encoding, and maps the pattern, address, subnet, and enumeration types onto Arrow extension types rather than using the underlying representation directly. This change enables use of the export arrow command without needing information about VAST's type system. #2159

  • Transform steps that add or modify columns now transform the columns in-place rather than at the end, preserving the nesting structure of the original data. #2159

  • The deprecated msgpack encoding no longer exists. Data imported using the msgpack encoding can still be accessed, but new data will always use the arrow encoding. #2159

  • Client commands such as vast export or vast status now create fewer threads at runtime, reducing the risk of hitting system resource limits. #2193

  • The index section in the status output no longer contains the catalog and catalog-bytes keys. The information is already present in the top-level catalog section. #2233

Features

  • The new vast.index section in the configuration supports adjusting the false-positive rate of first-stage lookups for individual fields, allowing users to optimize the time/space trade-off for expensive queries. #2065

  • VAST now creates one active partition per layout, rather than having a single active partition for all layouts. #2096

  • The new option vast.active-partition-timeout controls the time after which an active partition is flushed to disk. The timeout may hit before the partition size reaches vast.max-partition-size, allowing for an additional temporal control for data freshness. The active partition timeout defaults to 1 hour. #2096

  • The output of vast status now displays the total number of events stored under the key index.statistics.events.total. #2133

  • The disk monitor has new status entries blacklist and blacklist-size containing information about partitions that failed to be erased. #2160

  • VAST now fully supports environment variables as an alternative to configuration files. Environment variables have lower precedence than CLI arguments and higher precedence than config files. Variable names of the form VAST_FOO__BAR_BAZ map to vast.foo.bar-baz, i.e., __ is a record separator and _ translates to -. This does not apply to the prefix VAST_, which is considered the application identifier. Only variables with non-empty values are considered. #2162

  • VAST v1.0 deprecated the experimental aging feature. Given popular demand we've decided to un-deprecate it, and to actually implement it on top of the same building blocks the compaction mechanism uses. This means that it is now fully working and no longer considered experimental. #2186

  • The replace transform step now allows for setting values of complex types, e.g., lists or records. #2228

  • The lsvast tool now prints the whole store contents when given a store file as an argument. #2247
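The interplay of vast.max-partition-size and vast.active-partition-timeout amounts to a "whichever comes first" flush rule. A minimal Python sketch of that rule, under the assumption that the timeout is measured from the moment the active partition is created; the ActivePartition class is hypothetical and not VAST's internal API. With one active partition per layout, each layout would carry its own tracker like this.

```python
import time

class ActivePartition:
    """Tracks when an active partition should be flushed to disk."""

    def __init__(self, max_partition_size, timeout_seconds=3600.0, clock=time.monotonic):
        self.max_partition_size = max_partition_size
        self.timeout_seconds = timeout_seconds  # 1 hour, the documented default
        self.clock = clock
        self.created = clock()
        self.num_events = 0

    def add(self, num_events):
        self.num_events += num_events

    def should_flush(self):
        # Flush when the partition is full OR when it has been open for too long.
        full = self.num_events >= self.max_partition_size
        expired = self.clock() - self.created >= self.timeout_seconds
        return full or expired

now = [0.0]  # fake clock for the example
partition = ActivePartition(max_partition_size=1_000, clock=lambda: now[0])
partition.add(10)
print(partition.should_flush())  # False: neither full nor expired
now[0] = 3600.0
print(partition.should_flush())  # True: the timeout hit before the size limit
```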

Bug Fixes

  • The explore command now properly terminates after the requested number of results has been delivered. #2120

  • The count --estimate command erroneously materialized store files from disk, resulting in an unnecessary performance penalty. VAST now answers approximate count queries by solely consulting the relevant index files. #2146

  • The import zeek command now correctly marks the event timestamp using the timestamp type alias for all inferred schemas. #2155

  • Some queries could get stuck when an importer would time out during the meta index lookup. This race condition no longer exists. #2167

  • We optimized the queue size of the logger for commands other than vast start. Client commands now show a significant reduction in memory usage and startup time. #2176

  • The CSV parser no longer fails when it encounters integers where floating-point values are expected. #2184

  • The vast(1) man-page is no longer empty for VAST distributions with static binaries. #2190

  • VAST servers no longer accept queries after initiating shutdown. This fixes a potential infinite hang if new queries were coming in faster than VAST was able to process them. #2215

  • VAST no longer occasionally crashes when aging or compaction erases whole partitions. #2227

  • Environment variables for options that specify lists now consistently use comma-separators and respect escaping with backslashes. #2236

  • The JSON import no longer rejects non-string selector fields. Instead, it always uses the textual JSON representation as a selector. E.g., the JSON object {id:1,...} imported via vast import json --selector=id:mymodule now matches the schema named mymodule.1 rather than erroring because the id field is not a string. #2255

  • Transform steps removing all nested fields from a record leaving only empty nested records no longer cause VAST to crash. #2258

  • The query optimizer incorrectly transformed queries with conjunctions or disjunctions with several operands testing against the same string value, leading to missing results. This was rarely an issue in practice before the introduction of homogeneous partitions with the v2.0 release. #2264
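The environment variable rules in this release (the VAST_FOO__BAR_BAZ naming scheme, ignoring empty values, and comma-separated list values with backslash escaping) can be sketched in Python as follows. Both function names are hypothetical; lowercasing is inferred from the documented example, and treating a backslash as escaping any following character is an assumption of this sketch. Plugin options follow a different mapping (VAST_PLUGINS__FOO__BAR refers to plugins.foo.bar, per the v2.1.0 fix), which this sketch does not handle.

```python
def env_to_option(name: str, value: str):
    """Map VAST_FOO__BAR_BAZ to vast.foo.bar-baz; return None if the variable does not apply."""
    prefix = "VAST_"
    if not name.startswith(prefix) or value == "":
        return None  # not a VAST variable, or empty values are ignored
    parts = name[len(prefix):].split("__")  # "__" separates record levels
    return "vast." + ".".join(p.lower().replace("_", "-") for p in parts)


def split_list(value: str) -> list[str]:
    """Split a list-valued variable on unescaped commas; a backslash escapes the next character."""
    items, current = [], []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, None)
            if escaped is not None:
                current.append(escaped)
        elif ch == ",":
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items


print(env_to_option("VAST_FOO__BAR_BAZ", "1"))  # vast.foo.bar-baz
print(split_list(r"/opt/schemas,/tmp/a\,b"))    # ['/opt/schemas', '/tmp/a,b']
```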

v1.1.2

Bug Fixes

  • Terminating or timing out exports during the catalog lookup no longer causes query workers to become stuck indefinitely. #2165

v1.1.1

Bug Fixes

  • The disk monitor now correctly continues deleting until below the low water mark after a partition failed to delete. #2160

  • We fixed a rarely occurring race condition that caused query workers to become stuck after delivering all results until the corresponding client process terminated. #2160

  • Queries that timed out or were externally terminated while in the query backlog and with more than five unhandled candidate partitions no longer permanently get stuck. #2160

v1.1.0

Changes

  • VAST no longer attempts to interpret query expressions as Sigma rules automatically. Instead, this functionality moved to a dedicated sigma query language plugin that must explicitly be enabled at build time. #2074

  • The msgpack encoding option is now deprecated. VAST issues a warning on startup and automatically uses the arrow encoding instead. A future version of VAST will remove this option entirely. #2087

  • The experimental aging feature is now deprecated. The compaction plugin offers a superset of the aging functionality. #2087

  • Actor names in log messages now have an -ID suffix to make it easier to tell multiple instances of the same actor apart, e.g., exporter-42. #2119

  • We fixed an issue where partition transforms that erase complete partitions triggered an internal assertion failure. #2123

Features

  • The built-in select and project transform steps now correctly handle dropping all rows and columns respectively, effectively deleting the input data. #2064 #2082

  • VAST has a new query language plugin type that allows for adding additional query language frontends. The plugin performs one function: compile user input into a VAST expression. The new sigma plugin demonstrates usage of this plugin type. #2074

  • The new built-in rename transform step allows for renaming event types during a transformation. This is useful when you want to ensure that a repeatedly triggered transformation does not affect already transformed events. #2076

  • The new aggregate transform plugin allows for flexibly grouping and aggregating events. We recommend using it alongside the compaction plugin, e.g., for rolling up events into a more space-efficient representation after a certain amount of time. #2076

Bug Fixes

  • A performance bug in the first stage of query evaluation caused VAST to return too many candidate partitions when querying for a field suffix. For example, a query for the ts field commonly used in Zeek logs also included partitions for netflow.pkts from suricata.netflow events. This bug no longer exists, resulting in a considerable speedup of affected queries. #2086

  • VAST no longer loses query capacity when backlogged queries are cancelled. #2092

  • VAST now correctly adjusts the index statistics when applying partition transforms. #2097

  • We fixed a bug that potentially caused the wrong subset of partitions to be considered during query evaluation. #2103

v1.0.0

Changes

  • Building VAST now requires Arrow >= 6.0. #2033

  • VAST no longer uses calendar-based versioning. Instead, it uses a semantic versioning scheme. A new VERSIONING.md document installed alongside VAST explores the semantics in-depth. #2035

  • Plugins now have a separate version. The build scaffolding installs README.md and CHANGELOG.md files in the plugin source tree root automatically. #2035

Features

  • VAST has a new transform step: project, which keeps the fields with configured key suffixes and removes the rest from the input. At the same time, the delete transform step can remove not only one but multiple fields from the input based on the configured key suffixes. #2000

  • The new --omit-nulls option to the vast export json command causes VAST to skip over fields in JSON objects whose value is null when rendering them. #2004

  • VAST has a new transform step: select, which keeps rows matching the configured expression and removes the rest from the input. #2014

  • The #import_time meta extractor allows for querying events based on the time they arrived at the VAST server process. It may only be used for comparisons with time value literals, e.g., vast export json '#import_time > 1 hour ago' exports all events that were imported within the last hour as NDJSON. #2019
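The #import_time comparison in the example above keeps events whose arrival time is later than "now minus one hour". A minimal Python sketch of that filter, assuming each event carries its import time as a timezone-aware datetime under a hypothetical import_time key (in VAST, the import time is metadata reached through the meta extractor rather than a regular field, and imported_since is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

def imported_since(events, window: timedelta, now: datetime):
    """Keep events whose import time lies after now - window (cf. '#import_time > 1 hour ago')."""
    cutoff = now - window
    return [e for e in events if e["import_time"] > cutoff]

now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    {"id": 1, "import_time": now - timedelta(minutes=30)},
    {"id": 2, "import_time": now - timedelta(hours=2)},
]
print([e["id"] for e in imported_since(events, timedelta(hours=1), now)])  # [1]
```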

Bug Fixes

  • The index now emits the metrics query.backlog.{low,normal} and query.workers.{idle,busy} reliably. #2032

  • VAST no longer ignores the --schema-dirs option when using --bare-mode. #2046

  • Starting VAST no longer fails if creating the database directory requires creating intermediate directories. #2046

2021.12.16

Changes

  • VAST's internal type system has a new on-disk data representation. While we still support reading older databases, reverting to an older version of VAST will not be possible after this change. Alongside this change, we've implemented numerous fixes and streamlined handling of field name lookups, which now more consistently handles the dot-separator. E.g., the query #field == "ip" still matches the field source.ip, but no longer the field source_ip. The change is also performance-relevant in the long-term: For data persisted from previous versions of VAST we convert to the new type system on the fly, and for newly ingested data we now have near zero-cost deserialization for types, which should result in an overall speedup once the old data is rotated out by the disk monitor. #1888
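The dot-separator rule for field name lookups above can be sketched as a suffix match that must start at a component boundary. This minimal Python sketch (the function name is hypothetical) shows why "ip" matches source.ip but not source_ip, and likewise why a query for ts should not match netflow.pkts, the case described in the v1.1.0 bug fixes:

```python
def matches_field(field_name: str, query: str) -> bool:
    """True if query names the whole field or a suffix that begins at a dot boundary."""
    return field_name == query or field_name.endswith("." + query)

for name in ["source.ip", "source_ip", "zeek.conn.ts", "netflow.pkts"]:
    print(name, matches_field(name, "ip"), matches_field(name, "ts"))
```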

Features

  • All metrics events now contain the version of VAST. Additionally, VAST now emits startup and shutdown metrics at the start and stop of the VAST server. #1973

  • JSON field selectors are now configurable instead of being hard-coded for Suricata Eve JSON and Zeek Streaming JSON. E.g., vast import json --selector=event_type:suricata is now equivalent to vast import suricata. This allows for easier integration of JSONL data containing a field that indicates its type. #1974

  • Metrics events now optionally contain a metadata field that is a key-value mapping of string to string, allowing for finer-grained introspection. For now this enables correlation of metrics events and individual queries. A set of new metrics for query lookup use this feature to include the query ID. #1987 #1992
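A JSON field selector of the form field:prefix derives the schema name from the value of the named field. A minimal Python sketch, assuming a top-level selector field and following the v2.0.0 fix, where non-string values contribute their textual JSON representation. The function name is hypothetical, and returning None for events without the selector field is a choice of this sketch rather than documented behavior.

```python
import json

def select_schema(event: dict, selector: str):
    """Derive a schema name from a 'field:prefix' selector, e.g. event_type:suricata."""
    field, prefix = selector.split(":", 1)
    if field not in event:
        return None  # no selector field: this sketch leaves the event unmatched
    value = event[field]
    # Strings contribute their content; other values their textual JSON form.
    text = value if isinstance(value, str) else json.dumps(value)
    return f"{prefix}.{text}"

print(select_schema({"event_type": "alert"}, "event_type:suricata"))  # suricata.alert
print(select_schema({"id": 1}, "id:mymodule"))                        # mymodule.1
```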

Bug Fixes

  • The field-based default selector of the JSON import now correctly matches types with nested record types. #1988

2021.11.18

Changes

  • The max-queries configuration option now works at a coarser granularity. It used to limit the number of queries that could simultaneously retrieve data, but it now sets the number of queries that can be processed at the same time. #1896

  • VAST no longer vendors xxHash, which is now a regular required dependency. Internally, VAST switched its default hash function to XXH3, providing a speedup of up to 3x. #1905

  • Building VAST from source now requires CMake 3.18+. #1914

  • A recently added feature allows for exporting everything when no query is provided. We've restricted this to prefer reading a query from stdin if available. Additionally, conflicting ways to read the query now trigger errors. #1917

Features

  • A new 'apply' handler in the index gives plugin authors the ability to apply transforms over entire partitions. Previously, transforms were limited to streams of table slices during import or export. #1887

  • The export command now has a --low-priority option to reduce the priority of the request while query backlogs are being worked down. #1929 #1947

  • The keys query.backlog.normal and query.backlog.low have been added to the metrics output. The values indicate the number of queries that are currently in the backlog. #1942

Bug Fixes

  • The timeout duration to delete partitions has been increased to one minute, reducing the frequency of warnings for hitting this timeout significantly. #1897

  • When reading IPv6 addresses from PCAP data, only the first 4 bytes were considered. VAST now stores all 16 bytes. #1905

  • Store files now get deleted correctly if the database directory differs from the working directory. #1912

  • Debug builds of VAST no longer segfault on a status request with the --debug option. #1915

  • The suricata.dns schema has been updated to match the currently used EVE-JSON structure output by recent Suricata versions. #1919

  • VAST no longer tries to create indexes for fields of type list<record{...}> as that wasn't supported in the first place. #1933

  • Static plugins are no longer always loaded, but rather need to be explicitly enabled as documented. To restore the behavior from before this bug fix, set vast.plugins: [bundled] in your configuration file. #1959

2021.09.30

Changes

  • The default store backend is now segment-store in order to enable the use of partition transforms in the future. To continue using the (now deprecated) legacy store backend, set vast.store-backend to archive. #1876

  • Example configuration files are now installed to the datarootdir as opposed to the sysconfdir in order to avoid overriding previously installed configuration files. #1880

Features

  • If present in the plugin source directory, the build scaffolding now automatically installs <plugin>.yaml.example files, commenting out every line so the file has no effect. This serves as documentation for operators that can modify the installed file in-place. #1860

  • The broker plugin is now also a writer plugin on top of already being a reader plugin. The new plugin enables exporting query results directly into a Zeek process, e.g., to write Zeek scripts that incorporate context from the past. Run vast export broker <expr> to ship events via Broker that Zeek dispatches under the event VAST::data(layout: string, data: any). #1863

  • The new tool mdx-regenerate allows operators to re-create all .mdx files in a database directory to the latest file format version while VAST is running. This is useful for advanced users in preparation for version upgrades that bump the format version. #1866

  • Running vast status --detailed now lists all loaded configuration files under system.config-files. #1871

  • The query argument to the export and count commands may now be omitted, which causes the commands to operate on all data. Note that this may be a very expensive operation, so use with caution. #1879

  • The output of vast status --detailed now contains information about queries that are currently processed in the index. #1881

Bug Fixes

  • The status command no longer occasionally contains garbage keys when the VAST server is under high load. #1872

  • Remote sources and sinks are no longer erroneously included in the output of vast status. #1873

  • The index now correctly cancels pending queries when the requester dies. #1884

  • Import filter expressions now work correctly with queries using field extractors, e.g., vast import suricata 'event_type == "alert"' < path/to/eve.json. #1885

  • Expression predicates of the #field type now produce error messages instead of empty result sets for operations that are not supported. #1886

  • The disk monitor no longer fails to delete segments of particularly busy partitions with the segment-store store backend. #1892

2021.08.26

Changes

  • VAST no longer strips link-layer framing when ingesting PCAPs. The stored payload is the raw PCAP packet. Similarly, vast export pcap now includes an Ethernet link-layer framing, per libpcap's DLT_EN10MB link type. #1797

  • Strings in error or warning log messages are no longer escaped, greatly improving readability of messages containing nested error contexts. #1842

  • VAST now supports building against {fmt} 8 and spdlog 1.9.2, and now requires at least {fmt} 7.1.3. #1846

  • VAST now ships with an updated schema type for the suricata.dhcp event, covering all fields of the extended output. #1854

Features

  • The segment-store store backend works correctly with vast get and vast explore. #1805

  • VAST can now process Eve JSON events of type suricata.packet that Suricata emits when the config option tagged-packets is set and a rule tags a packet using, e.g., tag:session,5,packets;. #1819 #1833

Bug Fixes

  • Previously missing fields of suricata event types are now part of the concept definitions of net.src.ip, net.src.port, net.dst.ip, net.dst.port, net.app, net.proto, net.community_id, net.vlan, and net.packets. #1798

  • Invalid segment files will no longer crash VAST at startup. #1820

  • Plugins in the prebuilt Docker images no longer show unspecified as their version. #1828

  • The configuration options vast.metrics.{file,uds}-sink.path now correctly specify paths relative to the database directory of VAST, rather than the current working directory of the VAST server. #1848

  • The segment-store store backend and built-in transform steps (hash, replace, and delete) now function correctly in static VAST binaries. #1850

  • The output of vast status now includes status information for sources and sinks spawned in the VAST node, i.e., via vast spawn source|sink <format> rather than vast import|export <format>. #1852

  • In order to align with the GNU Coding Standards, the static binary (and other relocatable binaries) now uses /etc as sysconfdir for installations to /usr/bin/vast. #1856

  • VAST now only switches to journald style logging by default when it is actually supported. #1857

  • The CSV parser now correctly parses quoted fields in non-string types. E.g., "127.0.0.1" in CSV now successfully parses when a matching schema contains an address type field. #1858

  • The memory counts in the output of vast status now represent bytes consistently, as opposed to a mix of bytes and kilobytes. #1862

2021.07.29

Changes

  • VAST no longer officially supports Debian Buster with GCC-8. In CI, VAST now runs on Debian Bullseye with GCC-10. The provided Docker images now use debian:bullseye-slim as base image. Users that require Debian Buster support should use the provided static builds instead. #1765

  • From now on VAST is compiled with the C++20 language standard. Minimum compiler versions have increased to GCC 10, Clang 11, and AppleClang 12.0.5. #1768

  • The vast binaries in our prebuilt Docker images no longer contain AVX instructions for increased portability. Building the image locally continues to add supported auto-vectorization flags automatically. #1778

  • The following new build options exist: VAST_ENABLE_AUTO_VECTORIZATION enables/disables all auto-vectorization flags, and VAST_ENABLE_SSE_INSTRUCTIONS enables -msse; similar options exist for SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, and AVX2. #1778

Features

  • VAST has a new store_plugin type for custom store backends that hold the raw data of a partition. The new setting vast.store-backend controls the selection of the store implementation, whose default value is segment-store. This is still an opt-in feature: unless the configuration value is set, VAST defaults to the old implementation. #1720 #1762 #1802

  • VAST now supports import filter expressions. They act as the dual to export query expressions: vast import suricata '#type == "suricata.alert"' < eve.json will import only suricata.alert events, discarding all other events. #1742

  • VAST now comes with a tenzir/vast-dev Docker image in addition to the regular tenzir/vast. The vast-dev image targets development contexts, e.g., when building additional plugins. The image contains all build-time dependencies of VAST and runs as root rather than the vast user. #1749

  • lsvast now prints extended information for hash indexes. #1755

  • The new Broker plugin enables seamless log ingestion from Zeek to VAST via a TCP socket. Broker is Zeek's messaging library and the plugin turns VAST into a Zeek logger node. Use vast import broker to establish a connection to a Zeek node and acquire logs. #1758

  • Plugin versions are now unique to facilitate debugging. They consist of three optional parts: (1) the CMake project version of the plugin, (2) the Git revision of the last commit that touched the plugin, and (3) a dirty suffix for uncommitted changes to the plugin. Plugin developers no longer need to specify the version manually in the plugin entrypoint. #1764

  • VAST now supports the arm64 architecture. #1773

  • Installing VAST now includes a vast.yaml.example configuration file listing all available options. #1777

  • VAST now exports per-layout import metrics under the key <reader>.events.<layout-name> in addition to the regular <reader>.events. This makes it easier to understand the event type distribution. #1781

  • The static binary now bundles the Broker plugin. #1789
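The relationship between the aggregate counter and the new per-layout counters can be pictured with a small Python sketch. It is not VAST's metrics code. The helper import_metrics is hypothetical, and the layout names in the usage line are only examples. The sketch shows how <reader>.events relates to the <reader>.events.<layout-name> keys.

```python
from collections import Counter
from typing import Dict, Iterable


def import_metrics(reader: str, layouts: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: count imported events per layout.

    Produces the aggregate <reader>.events key plus one
    <reader>.events.<layout-name> key per observed layout.
    """
    per_layout = Counter(layouts)
    metrics = {f"{reader}.events": sum(per_layout.values())}
    for layout, count in per_layout.items():
        metrics[f"{reader}.events.{layout}"] = count
    return metrics


# Example: two conn events and one dns event from a Zeek import.
example = import_metrics("zeek-reader", ["zeek.conn", "zeek.dns", "zeek.conn"])
```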

Bug Fixes

  • Configuring VAST to use CAF's built-in OpenSSL module via the caf.openssl.* options now works again as expected. #1740

  • The status command now prints information about input and output transformations. #1748

  • A [*** LOG ERROR #0001 ***] error message on startup under Linux no longer occurs. #1754

  • Queries against fields using a #index=hash attribute could have missed some results. Fixing a bug in the offset calculation during bitmap processing resolved the issue. #1755

  • A regression caused VAST's plugins to be loaded in random order, which printed a warning about mismatching plugins between client and server. The order is now deterministic. #1756

  • VAST does not abort JSON imports anymore when encountering something other than a JSON object, e.g., a number or a string. Instead, VAST skips the offending line. #1759

  • Import processes now respond quicker. Shutdown requests are no longer delayed when the server process has busy imports, and metrics reports are now written in a timely manner. #1771

  • Particularly busy imports caused the server process to hang on shutdown if import processes were still running or had not yet flushed all data. The server now shuts down correctly in these cases. #1771

  • The static binary no longer behaves differently than the regular build with regard to its configuration directories: system-wide configuration files now reside in <prefix>/etc/vast/vast.yaml rather than /etc/vast/vast.yaml. #1777

  • The VAST_ENABLE_JOURNALD_LOGGING CMake option is no longer ignored. #1780

  • Plugins built against an external libvast no longer require the CMAKE_INSTALL_LIBDIR to be specified as a path relative to the configured CMAKE_INSTALL_PREFIX. This fixes an issue with plugins in separate packages for some package managers, e.g., Nix. #1786

  • The official Docker image and static binary distribution of VAST now produce the correct version output for plugins from the vast version command. #1799

  • The disk budget feature no longer triggers a rare segfault while deleting partitions. #1804 #1809

2021.06.24

Breaking Changes

  • Apache Arrow is now a required dependency. The previously deprecated build option -DVAST_ENABLE_ARROW=OFF no longer exists. #1683

  • VAST no longer loads static plugins by default. Generally, VAST now treats static plugins and bundled dynamic plugins equally, allowing users to enable or disable static plugins as needed for their deployments. #1703

Changes

  • The VAST community chat moved from Gitter to Slack. Join us in the #vast channel for vibrant discussions. #1696

  • The tenzir/vast Docker image bundles the PCAP plugin. #1705

  • VAST merges lists from configuration files. E.g., running VAST with --plugins=some-plugin and vast.plugins: [other-plugin] in the configuration now results in both some-plugin and other-plugin being loaded (sorted by the usual precedence), instead of just some-plugin. #1721 #1734

Features

  • The new option vast.start.commands allows for specifying an ordered list of VAST commands that run after successful startup. The effect is the same as first starting a node, and then using another VAST client to issue commands. This is useful for commands that have side effects that cannot be expressed through the config file, e.g., starting a source inside the VAST server that listens on a socket or reads packets from a network interface. #1699

  • The options vast.plugins and vast.plugin-dirs may now be specified on the command line as well as the configuration. Use the options --plugins and --plugin-dirs respectively. #1703

  • Add the reserved plugin name bundled to vast.plugins to load all bundled plugins, i.e., static or dynamic plugins built alongside VAST, or use --plugins=bundled on the command line. The reserved plugin name all causes all bundled and external plugins to be loaded, i.e., all shared libraries matching libvast-plugin-* from the configured vast.plugin-dirs. #1703

  • It's now possible to configure the VAST endpoint as an environment variable by setting VAST_ENDPOINT. This has higher precedence than setting vast.endpoint in configuration files, but lower precedence than passing --endpoint= on the command-line. #1714

  • Plugins load their respective configuration from <configdir>/vast/plugin/<plugin-name>.yaml in addition to the regular configuration file at <configdir>/vast/vast.yaml. The new plugin-specific file does not require putting configuration under the key plugins.<plugin-name>. This allows for deploying plugins without needing to touch the <configdir>/vast/vast.yaml configuration file. #1724
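A short Python sketch can illustrate the endpoint precedence described above. The command line wins, followed by VAST_ENDPOINT and then the configuration file. The function resolve_endpoint and its fallback parameter are hypothetical, and the sketch does not reflect how VAST resolves options internally.

```python
import os
from typing import Mapping, Optional


def resolve_endpoint(
    cli_value: Optional[str],
    config_value: Optional[str],
    fallback: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: pick the endpoint by the documented precedence."""
    # 1. --endpoint= on the command line has the highest precedence.
    if cli_value:
        return cli_value
    # 2. The VAST_ENDPOINT environment variable beats configuration files.
    env_value = environ.get("VAST_ENDPOINT")
    if env_value:
        return env_value
    # 3. vast.endpoint from a configuration file.
    if config_value:
        return config_value
    # 4. Nothing set explicitly: use the caller-supplied fallback.
    return fallback
```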

Bug Fixes

  • VAST no longer crashes when querying for string fields with non-string values. Instead, an error message warns the user about an invalid query. #1685

  • Building plugins against an installed VAST no longer requires manually specifying -DBUILD_SHARED_LIBS=ON. The option is now correctly enabled by default for external plugins. #1697

  • The UDS metrics sink continues to send data when the receiving socket is recreated. #1702

  • The vast.log-rotation-threshold option was silently ignored, causing VAST to always use the default log rotation threshold of 10 MiB. The option works as expected now. #1709

  • The tenzir/vast Docker image now has additional tags for release versions, e.g., tenzir/vast:2021.05.27. #1711

  • The import csv command handles quoted fields correctly. Previously, the quotes were part of the parsed value, and field separators in quoted strings caused the parser to fail. #1712

  • Import processes no longer hang on receiving SIGINT or SIGKILL. Instead, they shut down properly after flushing data that has yet to be processed. #1718

2021.05.27

Breaking Changes

  • Schemas are no longer implicitly shared between sources, i.e., an import process importing data with a custom schema will no longer affect other sources started at a later point in time. Schemas known to the VAST server process are still available to all import processes. We do not expect this change to have a real-world impact, but it could break setups where some sources have been installed on hosts without their own schema files, the VAST server did not have up-to-date schema files, and other sources were (ab)used to provide the latest type information. #1656

  • The configure script was removed. This was a custom script that mimicked the functionality of an autotools-based configure script by writing directly to the cmake cache. Instead, users now must use the cmake and/or ccmake binaries directly to configure VAST. #1657

Changes

  • Building VAST without Apache Arrow via -DVAST_ENABLE_ARROW=OFF is now deprecated, and support for the option will be removed in a future release. As the Arrow ecosystem and libraries matured, we feel confident in making it a required dependency and plan to build upon it more in the future. #1682

Features

  • The new transforms feature allows VAST to apply transformations to incoming and outgoing data. A transform consists of a sequence of steps that execute sequentially, e.g., to remove, overwrite, hash, or encrypt data. A new plugin type makes it easy to write custom transforms. #1517 #1656

  • Plugin schemas are now installed to <datadir>/vast/plugin/<plugin>/schema, while VAST's built-in schemas reside in <datadir>/vast/schema. The load order guarantees that plugins are able to reliably override the schemas bundled with VAST. #1608

  • The new option vast export --timeout=<duration> allows for setting a timeout for VAST queries. Cancelled exports result in a non-zero exit code. #1611

  • To enable easier post-processing, the new option vast.export.json.numeric-durations switches JSON output of duration types from human-readable strings (e.g., "4.2m") to numeric (e.g., 252.15) in fractional seconds. #1628

  • The status command now prints the VAST server version information under the version key. #1652

  • The new setting vast.disk-monitor-step-size enables the disk monitor to remove N partitions at once before re-checking if the new size of the database directory is now small enough. This is useful when checking the size of a directory is an expensive operation itself, e.g., on compressed filesystems. #1655
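The following minimal Python sketch shows the batched deletion that vast.disk-monitor-step-size enables. It is not VAST's disk monitor. It assumes that partitions are erased oldest first and that a single callable reports the database size. The function enforce_disk_budget and its parameters are hypothetical names.

```python
from typing import Callable, List


def enforce_disk_budget(
    partitions: List[str],
    db_size: Callable[[], int],
    erase: Callable[[str], None],
    budget: int,
    step_size: int,
) -> List[str]:
    """Erase partitions in batches of step_size until db_size() <= budget.

    Assumes `partitions` is ordered oldest first; all names are hypothetical.
    """
    if step_size < 1:
        raise ValueError("step_size must be at least 1")
    erased: List[str] = []
    remaining = list(partitions)
    # The (possibly expensive) size check runs once per batch,
    # not once per erased partition.
    while remaining and db_size() > budget:
        batch, remaining = remaining[:step_size], remaining[step_size:]
        for partition in batch:
            erase(partition)
            erased.append(partition)
    return erased
```

With step_size set to 1, the size is rechecked after every single partition. Larger values trade a little over-deletion for fewer size checks.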

Bug Fixes

  • VAST now correctly refuses to run when loaded plugins fail their initialization, i.e., are in a state that cannot be reasoned about. #1618

  • A recent change caused imports over UDP not to forward their events to the VAST server process. Running vast import -l :<port>/udp <format> now works as expected again. #1622

  • Non-relocatable VAST binaries no longer look for configuration, schemas, and plugins in directories relative to the binary location. Conversely, relocatable VAST binaries no longer look for configuration, schemas, and plugins in their original install directory, and instead always use paths relative to their binary location. On macOS, we now always build relocatable binaries. Relocatable binaries now work correctly on systems where the library install directory is lib64 instead of lib. #1624

  • VAST no longer erroneously skips the version mismatch detection between client and server. The check now additionally compares running plugins. #1652

  • Executing VAST's unit test suite in parallel no longer fails. #1659

  • VAST and transform plugins now build without Arrow support again. #1673

  • The delete transform step correctly deletes fields from the layout when running VAST with Arrow disabled. #1673

  • VAST no longer erroneously warns about a version mismatch between client and server when their plugin load order differs. #1679

2021.04.29

Breaking Changes

  • The previously deprecated (#1409) option vast.no-default-schema no longer exists. #1507

  • Plugins configured via vast.plugins in the configuration file can now be specified using either the plugin name or the full path to the shared plugin library. We no longer allow omitting the extension from specified plugin files, and recommend using the plugin name as a more portable solution, e.g., example over libexample and /path/to/libexample.so over /path/to/libexample. #1527

  • The previously deprecated usage (#1354) of format-independent options after the format in commands is now no longer possible. This affects the options listen, read, schema, schema-file, type, and uds for import commands and the write and uds options for export commands. #1529

  • Plugins must define a separate entrypoint in their build scaffolding using the argument ENTRYPOINT to the CMake function VASTRegisterPlugin. If only a single value is given to the argument SOURCES, it is interpreted as the ENTRYPOINT automatically. #1549

  • To avoid confusion between the PCAP plugin and libpcap, which both have a library file named libpcap.so, we now generally prefix the plugin library output names with vast-plugin-. For example, the PCAP plugin library file is now named libvast-plugin-pcap.so. Plugins specified with a full path in the configuration under vast.plugins must be adapted accordingly. #1593

Changes

  • The metrics for Suricata Eve JSON and Zeek Streaming JSON imports are now under the categories suricata-reader and zeek-reader respectively so they can be distinguished from the regular JSON import, which is still under json-reader. #1498

  • VAST now ships with a schema record type for Suricata's rfb event type. #1499 @satta

  • The exporter.hits metric has been removed. #1514 #1574

  • We upstreamed the Debian patches provided by @satta. VAST now prefers an installed tsl-robin-map>=0.6.2 to the bundled one unless configured with --with-bundled-robin-map, and we provide a manpage for lsvast if pandoc is installed. #1515

  • The Suricata dns schema type now defines the dns.grouped.A field containing a list of all returned addresses. #1531

  • The status output of Analyzer Plugins moved from the importer.analyzers key into the top-level record. #1544

  • The new option --disable-default-config-dirs disables the loading of user and system configuration, schema, and plugin directories. We use this option internally when running integration tests. #1557

  • Building VAST now requires CMake >= 3.15. #1559

  • The VAST community chat moved from Element to Gitter. Join us at gitter.im/tenzir/vast or via Matrix at #tenzir_vast:gitter.im. #1591

Features

  • The disk monitor gained a new vast.start.disk-budget-check-binary option that can be used to specify an external binary to determine the size of the database directory. This can be useful in cases where stat() does not give the correct answer, e.g., on compressed filesystems. #1453

  • The VAST_PLUGIN_DIRS and VAST_SCHEMA_DIRS environment variables allow for setting additional plugin and schema directories separated with : with higher precedence than other plugin and schema directories. #1532 #1541

  • It is now possible to build plugins against an installed VAST. This requires a slight adaptation to every plugin's build scaffolding. The example plugin was updated accordingly. #1532

  • Component Plugins are a new category of plugins that execute code within the VAST server process. Analyzer Plugins are now a specialization of Component Plugins, and their API remains unchanged. #1544 #1547 #1588

  • Reader Plugins and Writer Plugins are a new family of plugins that add import/export formats. The previously optional PCAP format moved into a dedicated plugin. Configure with --with-pcap-plugin and add pcap to vast.plugins to enable the PCAP plugin. #1549
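The precedence rule for VAST_PLUGIN_DIRS and VAST_SCHEMA_DIRS can be sketched in Python under two assumptions. Precedence is modeled as search order, so earlier directories win, and duplicate directories are dropped. The function search_dirs and the example paths in the usage line are hypothetical.

```python
import os
from typing import List, Mapping


def search_dirs(
    env_var: str,
    configured: List[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Prepend the colon-separated directories from env_var to `configured`.

    Earlier entries are assumed to take precedence; duplicates are dropped.
    """
    from_env = [d for d in environ.get(env_var, "").split(":") if d]
    result: List[str] = []
    for d in from_env + configured:
        if d not in result:
            result.append(d)
    return result


# Hypothetical example: two extra plugin directories ahead of the configured one.
dirs = search_dirs("VAST_PLUGIN_DIRS", ["/usr/lib/vast/plugins"], {"VAST_PLUGIN_DIRS": "/opt/a:/opt/b"})
```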

Bug Fixes

  • VAST no longer erroneously tries to load explicitly specified plugins dynamically that are linked statically. #1528

  • Custom commands from plugins ending in start no longer try to write to the server log file instead of the client log file. #1530

  • Linking against an installed VAST via CMake now correctly resolves VAST's dependencies. #1532

  • VAST no longer refuses to start when any of the configuration file directories is unreadable, e.g., because VAST is running in a sandbox. #1533

  • The CSV reader no longer crashes when encountering nested type aliases. #1534

  • The command-line parser no longer crashes when encountering a flag with missing value in the last position of a command invocation. #1536

  • A bug in the parsing of ISO8601 formatted dates that incorrectly adjusted the time to the UTC timezone has been fixed. #1537

  • Plugin unit tests now correctly load and initialize their respective plugins. #1549

  • The shutdown logic contained a bug that would make the node fail to terminate in case a plugin actor is registered at said node. #1563

  • A race condition in the shutdown logic that caused an assertion was fixed. #1563

  • VAST now correctly builds within shallow clones of the repository. If the build system is unable to determine the correct version from git-describe, it now always falls back to the version of the last release. #1570

  • We fixed a regression that made it impossible to build static binaries from outside of the repository root directory. #1573

  • The VASTRegisterPlugin CMake function now correctly removes the ENTRYPOINT from the given SOURCES, allowing plugin developers to easily glob for sources again. #1573

  • The exporter.selectivity metric is now 1.0 instead of NaN for idle periods. #1574

  • VAST no longer renders non-finite JSON numbers as NaN, -NaN, inf, or -inf, which resulted in invalid JSON output. Instead, such numbers are now rendered as null. #1574

  • Specifying relative CMAKE_INSTALL_*DIR in the build configuration no longer causes VAST not to pick up system-wide installed configuration files, schemas, and plugins. The configured install prefix is now used correctly. The defunct VAST_SYSCONFDIR, VAST_DATADIR, and VAST_LIBDIR CMake options no longer exist. Use a combination of CMAKE_INSTALL_PREFIX and CMAKE_INSTALL_*DIR instead. #1580

  • Spaces before SI prefixes in command line arguments and configuration options are now generally ignored, e.g., it is now possible to set the disk monitor budgets to 2 GiB rather than 2GiB. #1590
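JSON has no representation for NaN or infinities, which is why non-finite numbers must be mapped to null. Python's json.dumps, for instance, emits the non-standard tokens NaN and Infinity by default. The Python sketch below is not VAST's renderer. It replaces such floats with None before serializing. It also passes allow_nan=False so that any non-finite value it missed raises an error instead of producing invalid JSON.

```python
import json
import math
from typing import Any


def sanitize(value: Any) -> Any:
    """Recursively replace NaN and +/-inf floats with None (rendered as null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value


def to_json(value: Any) -> str:
    # allow_nan=False makes json.dumps raise instead of emitting NaN/Infinity.
    return json.dumps(sanitize(value), allow_nan=False)
```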

2021.03.25

Breaking Changes

  • The previously deprecated #timestamp extractor has been removed from the query language entirely. Use :timestamp instead. #1399

  • Plugins can now be linked statically against VAST. A new VASTRegisterPlugin CMake function enables easy setup of the build scaffolding required for plugins. Configure with --with-static-plugins or build a static binary to link all plugins built alongside VAST statically. All plugin build scaffoldings must be adapted; older plugins no longer work. #1445 #1452

Changes

  • The default size of table slices (event batches) created by vast import processes has been changed from 1,000 to 1,024. #1396

  • VAST now ships with schema record types for Suricata's mqtt and anomaly event types. #1408 @satta

  • The option vast.no-default-schema is deprecated, as it is no longer needed to override types from bundled schemas. #1409

  • Query latency for expressions that contain concept names has improved substantially. For DB sizes in the TB region, and with a large variety of event types, queries with a high selectivity experience speedups of up to 5x. #1433

  • The zeek-to-vast utility was moved to the tenzir/zeek-vast repository. All options related to zeek-to-vast and the bundled Broker submodule were removed. #1435

  • The type extractor in the expression language now works with type aliases. For example, given the type definition for port from the base schema type port = count, a search for :count will also consider fields of type port. #1446
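A type extractor can be resolved through aliases by following the alias chain until it reaches the queried type. This Python sketch models the schema as a hypothetical mapping from alias name to target type, such as {"port": "count"} for type port = count. It illustrates the idea and is not VAST's query evaluator. The function name is hypothetical.

```python
from typing import Dict


def matches_type_extractor(field_type: str, extractor: str, aliases: Dict[str, str]) -> bool:
    """Return True if field_type is the extractor's type or an alias chain leads to it."""
    seen = set()
    current = field_type
    while current not in seen:
        if current == extractor:
            return True
        seen.add(current)
        if current not in aliases:
            return False  # reached a type that is not an alias
        current = aliases[current]
    return False  # the alias chain contains a cycle
```

In this model, :count also considers fields of type port, while :port only matches fields declared as port.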

Features

  • The schema language now supports 4 operations on record types: + combines the fields of 2 records into a new record. <+ and +> are variations of + that give precedence to the left and right operand respectively. - creates a record with the field specified as its right operand removed. #1407 #1487 #1490

  • VAST now supports nested records in Arrow table slices and in the JSON import, e.g., data of type list<record<name: string, age: count>>. While nested record fields are not yet queryable, ingesting such data will no longer cause VAST to crash. MessagePack table slices don't support records in lists yet. #1429
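The four record operations can be illustrated on flat records in Python. This is a sketch under several assumptions and is not VAST's schema parser. Records are modeled as dicts from field name to type name. Plain + is assumed to reject duplicate field names, and - is assumed to leave a record unchanged when the field is absent. All function names are hypothetical.

```python
from typing import Dict

# A flat record type: field name -> type name. Nested records are out of scope.
Record = Dict[str, str]


def combine(left: Record, right: Record) -> Record:
    """`+`: all fields of both records; this sketch rejects duplicate names."""
    clash = sorted(left.keys() & right.keys())
    if clash:
        raise ValueError(f"duplicate fields: {clash}")
    return {**left, **right}


def combine_left(left: Record, right: Record) -> Record:
    """`<+`: on a name clash, keep the left operand's field."""
    result = dict(left)
    for name, type_ in right.items():
        result.setdefault(name, type_)
    return result


def combine_right(left: Record, right: Record) -> Record:
    """`+>`: on a name clash, take the right operand's field."""
    return {**left, **right}


def remove_field(record: Record, name: str) -> Record:
    """`-`: a copy of `record` without the field `name`."""
    return {k: v for k, v in record.items() if k != name}
```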

Bug Fixes

  • Some non-null pointers were incorrectly rendered as *nullptr in log messages. #1430

  • Data that was ingested before the deprecation of the #timestamp attribute wasn't exported correctly with newer versions. This is now corrected. #1432

  • The JSON parser now accepts data with numerical or boolean values in fields that expect strings according to the schema. VAST converts these values into string representations. #1439

  • A query for a field or field name suffix that matches multiple fields of different types would erroneously return no results. #1447

  • The disk monitor now correctly erases partition synopses from the meta index. #1450

  • The archive, index, source, and sink components now report metrics when idle instead of omitting them entirely. This allows for distinguishing between idle and non-running components from the metrics. #1451

  • VAST no longer crashes when the disk monitor tries to calculate the size of the database while files are being deleted. Instead, it will retry after the configured scan interval. #1458

  • Insufficient permissions for one of the paths in the schema-dirs option would lead to a crash in vast start. #1472

  • A race condition during server shutdown could lead to an invariant violation, resulting in a firing assertion. Streamlining the shutdown logic resolved the issue. #1473 #1485

  • Enabling the disk budget feature no longer prevents the server process from exiting after it was stopped. #1495

2021.02.24

Breaking Changes

  • VAST switched to spdlog >= 1.5.0 for logging. For users, this means that vast.console-format and vast.file-format must now be specified using the spdlog pattern syntax, as described in the spdlog documentation. All settings under caf.logger.* are now ignored by VAST, and only the vast.* counterparts are used for logger configuration. #1223 #1328 #1334 #1390 @a4z

  • VAST now requires {fmt} >= 5.2.1 to be installed. #1330

  • All options in vast.metrics.* had underscores in their names replaced with dashes to align with other options. For example, vast.metrics.file_sink is now vast.metrics.file-sink. The old options no longer work. #1368

  • User-supplied schema files are now picked up from <SYSCONFDIR>/vast/schema and <XDG_CONFIG_HOME>/vast/schema instead of <XDG_DATA_HOME>/vast/schema. #1372

  • The previously deprecated options vast.spawn.importer.ids and vast.schema-paths no longer work. Furthermore, queries spread over multiple arguments are now disallowed instead of triggering a deprecation warning. #1374

  • The special meaning of the #timestamp attribute has been removed from the schema language. Timestamps can from now on be marked as such by using the timestamp type instead. Queries of the form #timestamp <op> value remain operational but are deprecated in favor of :timestamp. Note that this change also affects :time queries, which aren't supersets of #timestamp queries any longer. #1388

Changes

  • Schema parsing now uses a 2-pass loading phase so that type aliases can reference other types that are later defined in the same directory. Additionally, type definitions from already parsed schema dirs can be referenced from schema types that are parsed later. Types can also be redefined in later directories, but a type cannot be defined twice in the same directory. #1331

  • The infer command has an improved heuristic for the number types int, count, and real. #1343 #1356 @ngrodzitski

  • The options listen, read, schema, schema-file, type, and uds can from now on be supplied to the import command directly. Similarly, the options write and uds can be supplied to the export command. All options can still be used after the format subcommand, but that usage is deprecated. #1354

  • The query normalizer interprets value predicates of type subnet more broadly: given a subnet S, the parser expands this to the expression :subnet == S || :addr in S. This change makes it easier to search for IP addresses belonging to a specific subnet. #1373

  • The output of vast help and vast documentation now goes to stdout instead of to stderr. Erroneous invocations of vast also print the helptext, but in this case the output still goes to stderr to avoid interference with downstream tooling. #1385
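The expanded predicate :subnet == S || :addr in S can be evaluated against a single value with Python's ipaddress module. The sketch below is illustrative only and is not VAST's query engine. The name subnet_predicate_matches is hypothetical.

```python
import ipaddress
from typing import Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def subnet_predicate_matches(value: Union[Address, Network], s: Network) -> bool:
    """Evaluate `:subnet == S || :addr in S` for one field value."""
    if isinstance(value, (ipaddress.IPv4Network, ipaddress.IPv6Network)):
        return value == s  # :subnet == S compares whole subnets
    if isinstance(value, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return value in s  # :addr in S tests membership
    return False  # values of other types do not match in this sketch
```

Under this expansion, a subnet value such as 10.0.0.0/16 does not match S = 10.0.0.0/8, because only equal subnets satisfy :subnet == S.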

Experimental Features

  • Sigma rules are now a valid format to represent query expressions. VAST parses the detection attribute of a rule and translates it into a native query expression. To run a query using a Sigma rule, pass it on standard input, e.g., vast export json < rule.yaml. #1379

Features

  • VAST rotates server logs by default. The new config options vast.disable-log-rotation and vast.log-rotation-threshold can be used to control this behaviour. #1223 #1362

  • The meta index now stores partition synopses in separate files. This will decrease restart times for systems with large databases, slow disks and aggressive readahead settings. A new config setting vast.meta-index-dir allows storing the meta index information in a separate directory. #1330 #1376

  • The JSON import now always relies upon simdjson. The previously experimental --simdjson option to the vast import json|suricata|zeek-json commands no longer exists, as the feature is considered stable. #1343 #1356 @ngrodzitski

  • The new options vast.metrics.file-sink.real-time and vast.metrics.uds-sink.real-time enable real-time metrics reporting for the file sink and UDS sink respectively. #1368

  • The type extractor in the expression language now works with user-defined types. For example, the type port is defined as type port = count in the base schema. This type can now be queried with an expression like :port == 80. #1382

Bug Fixes

  • An ordering issue introduced in #1295 that could lead to a segfault with long-running queries was reverted. #1381

  • A bug in the new simdjson based JSON reader introduced in #1356 could trigger an assertion in the vast import process if an input field could not be converted to the field type in the target layout. This is no longer the case. #1386

2021.01.28

Breaking Changes

  • The new short options -v, -vv, -vvv, -q, -qq, and -qqq map onto the existing verbosity levels. The existing short syntax, e.g., -v debug, no longer works. #1244

  • The GitHub CI changed to Debian Buster and produces Debian artifacts instead of Ubuntu artifacts. Similarly, the Docker images we provide on Docker Hub use Debian Buster as base image. To build Docker images locally, users must set DOCKER_BUILDKIT=1 in the build environment. #1294

Changes

  • VAST preserves nested JSON objects in events instead of formatting them in a flattened form when exporting data with vast export json. The old behavior can be enabled with vast export json --flatten. #1257 #1289

  • vast start prints the endpoint it is listening on when providing the option --print-endpoint. #1271

  • The option vast.schema-paths is renamed to vast.schema-dirs. The old option is deprecated and will be removed in a future release. #1287

Experimental Features

  • VAST features a new plugin framework to support efficient customization points at various places of the data processing pipeline. There exist several base classes that define an interface, e.g., for adding new commands or spawning a new actor that processes the incoming stream of data. The directory examples/plugins/example contains an example plugin. #1208 #1264 #1275 #1282 #1285 #1287 #1302 #1307 #1316

  • VAST relies on simdjson for JSON parsing. The substantial gains in throughput shift the bottleneck of the ingest path from parsing input to indexing at the node. To use the (still experimental) feature, use vast import json|suricata|zeek-json --simdjson. #1230 #1246 #1281 #1314 #1315 @ngrodzitski

Features

  • The new import zeek-json command allows for importing line-delimited Zeek JSON logs as produced by the json-streaming-logs package. Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional _path field to disambiguate the log type. For stock Zeek JSON logs, use the existing import json with the -t flag to specify the log type. #1259

  • VAST queries also accept nanoseconds, microseconds, milliseconds, seconds, and minutes as units for a duration. #1265

  • The output of vast status contains detailed memory usage information about active and cached partitions. #1297

  • VAST installations bundle a LICENSE.3rdparty file alongside the regular LICENSE file that lists all embedded code that is under a separate license. #1306

Bug Fixes

  • Invalid Arrow table slices read from disk no longer trigger a segmentation fault. Instead, the invalid on-disk state is ignored. #1247

  • Manually specified configuration files may reside in the default location directories. Configuration files can be symlinked. #1248

  • For relocatable installations, the list of schema loading paths does not include a build-time configured path any more. #1249

  • Values in JSON fields that can't be converted to the type that is specified in the schema won't cause the containing event to be dropped any longer. #1250

  • Line based imports correctly handle read timeouts that occur in the middle of a line. #1276

  • Disk monitor quota settings not ending in a 'B' are no longer silently discarded. #1278

  • A potential race condition that could lead to a hanging export if a partition was persisted just as it was scanned no longer exists. #1295

2020.12.16

Breaking Changes

  • The splunk-to-vast script has a new name: taxonomize. The script now also generates taxonomy declarations for Azure Sentinel. #1134

  • CAF-encoded table slices no longer exist. As such, the option vast.import.batch-encoding now only supports arrow and msgpack as arguments. #1142

  • The on-disk format for table slices now supports versioning of table slice encodings. This breaking change makes it so that adding further encodings or adding new versions of existing encodings is possible without breaking again in the future. #1143 #1157 #1160 #1165

  • Archive segments no longer include an additional, unnecessary version identifier. We took the opportunity to clean this up bundled with the other recent breaking changes. #1168

  • The build configuration of VAST received a major overhaul. Including libvast in other projects via add_subdirectory(path/to/vast) is now easily possible. The names of all build options were aligned, and the new build summary shows all available options. #1175

  • The port type is no longer a first-class type. The new way to represent transport-layer ports relies on count instead. In the schema, VAST ships with a new alias type port = count to keep existing schema definitions intact. However, this is a breaking change because the on-disk format and Arrow data representation changed. Queries with :port type extractors no longer work. Similarly, the syntax 53/udp no longer exists; use count syntax 53 instead. Since most port occurrences do not carry a known transport-layer type, and the type information exists typically in a separate field, removing port as native type streamlines the data model. #1187

Changes

  • VAST no longer requires you to manually remove a stale PID file from a no-longer running vast process. Instead, VAST prints a warning and overwrites the old PID file. #1128

  • VAST does not produce metrics by default any more. The option --disable-metrics has been renamed to --enable-metrics accordingly. #1137

  • VAST now processes the schema directory recursively, as opposed to stopping at nested directories. #1154

  • The default segment size in the archive is now 1 GiB. This reduces fragmentation of the archive metadata and speeds up VAST startup time. #1166

  • VAST now listens on port 42000 instead of letting the operating system choose the port if the option vast.endpoint specifies an endpoint without a port. To restore the old behavior, set the port to 0 explicitly. #1170

  • The Suricata schemas received an overhaul: there now exist vlan and in_iface fields in all types. In addition, VAST ships with new types for ikev2, nfs, snmp, tftp, rdp, sip and dcerpc. The tls type gets support for the additional sni and session_resumed fields. #1176 #1180 #1186 #1237 @satta

  • Installed schema definitions now reside in <datadir>/vast/schema/types, taxonomy definitions in <datadir>/vast/schema/taxonomy, and concept definitions in <datadir>/vast/schema/concepts, as opposed to them all being in the schema directory directly. When installing over an existing installation, you may have to delete the old schema definitions by hand. #1194

  • The zeek export format now strips off the prefix zeek. to ensure full compatibility with regular Zeek output. For all non-Zeek types, the prefix remains intact. #1205

Experimental Features

  • VAST now ships with its own taxonomy and basic concept definitions for Suricata, Zeek, and Sysmon. #1135 #1150

  • The query language now supports models. Models combine a list of concepts into a semantic unit that an event fulfills if its type contains a field for every concept in the model. Turn to the documentation for more information. #1185 #1228

  • The expression language gained support for the #field meta extractor. It is the complement to #type and uses suffix matching for field names at the layout level. #1228
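
The model rule above can be sketched in a few lines of Python. This is an illustration only, not VAST's implementation: the CONCEPTS mapping and the field names in it are hypothetical, and a model is reduced to a plain list of concept names.

```python
# Hypothetical concept definitions: each concept maps to the concrete
# fields (across data formats) that represent it.
CONCEPTS = {
    "net.src.ip": {"zeek.conn.id.orig_h", "suricata.flow.src_ip"},
    "net.dst.ip": {"zeek.conn.id.resp_h", "suricata.flow.dest_ip"},
}

def fulfills(model, type_fields):
    """Return True if the type has a field for every concept in the model."""
    return all(CONCEPTS[concept] & type_fields for concept in model)

connection = ["net.src.ip", "net.dst.ip"]
print(fulfills(connection, {"zeek.conn.id.orig_h", "zeek.conn.id.resp_h"}))  # True
print(fulfills(connection, {"zeek.conn.id.orig_h"}))  # False
```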

Features

  • The new option vast.client-log-file enables client-side logging. By default, VAST only writes log files for the server process. #1132

  • The new option --print-bytesizes of lsvast prints information about the size of certain fields of the flatbuffers inside a VAST database directory. #1149

  • The storage required for indexing IP addresses has been optimized. This should result in significantly reduced memory usage over time, as well as faster restart times and reduced disk space requirements. #1172 #1200 #1216

  • A new key 'meta-index-bytes' appears in the status output generated by vast status --detailed. #1193

  • The new dump command prints configuration and schema-related information. The implementation allows for printing all registered concepts and models, via vast dump concepts and vast dump models. The flag --yaml to dump switches from JSON to YAML output, such that it conforms to the taxonomy configuration syntax. #1196 #1233

  • On Linux, VAST now contains a set of built-in USDT tracepoints that can be used by tools like perf or bpftrace when debugging. Initially, we provide the two tracepoints chunk_make and chunk_destroy, which trigger every time a vast::chunk is created or destroyed. #1206

  • Low-selectivity string (in)equality queries now run up to 30x faster, thanks to more intelligent selection of relevant index partitions. #1214

Bug Fixes

  • vast import no longer stalls when it doesn't receive any data for more than 10 seconds. #1136

  • The vast.yaml.example contained syntax errors. The example config file now works again. #1145

  • VAST no longer starts if the specified config file does not exist. #1147

  • The output of vast status --detailed now contains information about running sinks, e.g., vast export <format> <query> processes. #1155

  • VAST no longer blocks when an invalid query operation is issued. #1189

  • The type registry now detects and handles breaking changes in schemas, e.g., when a field type changes or a field is dropped from a record. #1195

  • The index now correctly drops further results when queries finish early, thus improving the performance of queries for a limited number of events. #1209

  • The index no longer crashes when too many parallel queries are running. #1210

  • The index no longer causes exporters to deadlock when the meta index produces false positives. #1225

  • The summary log message of vast export now contains the correct number of candidate events. #1228

  • The vast status command does not collect status information from sources and sinks any longer. They were often too busy to respond, leading to a long delay before the command completed. #1234

  • Concepts that reference other concepts are now loaded correctly from their definition. #1236

2020.10.29

Changes

  • The new option import.read-timeout allows for setting an input timeout for low volume sources. Reaching the timeout causes the current batch to be forwarded immediately. This behavior was previously controlled by import.batch-timeout, which now only controls the maximum buffer time before the source forwards batches to the server. #1096

  • VAST will now warn if a client command connects to a server that runs on a different version of the vast binary. #1098

  • Log files are now less verbose because class and function names are not printed on every line. #1107

  • The default database directory moved to /var/lib/vast for Linux deployments. #1116

Experimental Features

  • The query language now comes with support for concepts, the first part of taxonomies. Concepts are a mechanism to unify the various naming schemes of different data formats into a single, coherent nomenclature. #1102

  • A new disk monitor component can now monitor the database size and delete data that exceeds a specified threshold. Once VAST reaches the maximum amount of disk space, the disk monitor deletes the oldest data. The command-line options --disk-quota-high, --disk-quota-low, and --disk-quota-check-interval control the rotation behavior. #1103
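
The rotation behavior can be pictured as a high/low watermark scheme. The Python sketch below rests on an assumption: it treats --disk-quota-high as the trigger and --disk-quota-low as the target, and it models the database as a queue of partitions ordered oldest first. The partition names and sizes are hypothetical, and the sketch does not reproduce the disk monitor's actual code.

```python
from collections import deque

def enforce_quota(partitions, high, low):
    """Assumed high/low-watermark rotation.

    partitions: deque of (name, size_bytes), oldest first; modified in place.
    Returns the names of the deleted partitions.
    """
    total = sum(size for _, size in partitions)
    deleted = []
    if total <= high:
        return deleted  # below the high watermark: nothing to do
    while partitions and total > low:
        name, size = partitions.popleft()  # the oldest data goes first
        total -= size
        deleted.append(name)
    return deleted

parts = deque([("p1", 40), ("p2", 30), ("p3", 20), ("p4", 20)])
print(enforce_quota(parts, high=100, low=60))  # ['p1', 'p2']
```

Deleting down to a lower watermark, rather than just below the high one, avoids triggering a new rotation on every small write.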

Features

  • When running VAST under systemd supervision, it is now possible to use the Type=notify directive in the unit file to let VAST notify the service manager when it becomes ready. #1091

  • The new options vast.segments and vast.max-segment-size control how the archive generates segments. #1103

  • The new script splunk-to-vast converts a Splunk CIM model file in JSON to a VAST taxonomy. For example, splunk-to-vast < Network_Traffic.json renders the concept definitions for the Network Traffic datamodel. The generated taxonomy does not include field definitions, which users should add separately according to their data formats. #1121

  • The expression language now accepts records without field names. For example, id == <192.168.0.1, 41824, 143.51.53.13, 25, "tcp"> is now valid syntax and instantiates a record with 5 fields. Note: expressions with records currently do not execute. #1129

Bug Fixes

  • The lookup for schema directories now happens in a fixed order. #1086

  • Sources that receive no or very little input do not block vast status any longer. #1096

  • The vast status --detailed command now correctly shows the status of all sources, i.e., vast import or vast spawn source commands. #1109

  • VAST no longer opens a random public port, which used to be enabled in the experimental VAST cluster mode in order to transparently establish a full mesh. #1110

  • The lsvast tool failed to print FlatBuffers schemas correctly. The output now renders correctly. #1123

2020.09.30

Breaking Changes

  • Data exported in the Apache Arrow format now contains the name of the payload record type in the metadata section of the schema. #1072

  • The persistent storage format of the index now uses FlatBuffers. #863

Changes

  • The JSON export format now renders duration and port fields using strings as opposed to numbers. This avoids a possible loss of information and enables users to re-use the output in follow-up queries directly. #1034

  • The delay between the periodic log messages for reporting the current event rates has been increased to 10 seconds. #1035

  • The global VAST configuration now always resides in <sysconfdir>/vast/vast.conf, and bundled schemas always in <datadir>/vast/schema/. VAST no longer supports reading a vast.conf file in the current working directory. #1036

  • The proprietary VAST configuration file has changed to the more ops-friendly industry standard YAML. This change also introduced a new dependency: yaml-cpp version 0.6.2 or greater. The top-level vast.yaml.example illustrates what the new YAML config looks like. Please rename existing configuration files from vast.conf to vast.yaml. VAST still reads vast.conf but will soon only look for vast.yaml or vast.yml files in available configuration file paths. #1045 #1055 #1059 #1062

  • The options that affect batches in the import command received new, more user-facing names: import.table-slice-type, import.table-slice-size, and import.read-timeout are now called import.batch-encoding, import.batch-size, and import.batch-timeout respectively. #1058

  • All configuration options are now grouped into vast and caf sections, depending on whether they affect VAST itself or are handed through to the underlying actor framework CAF directly. Take a look at the bundled vast.yaml.example file for an explanation of the new layout. #1073

  • We refactored the index architecture to improve stability and responsiveness. This includes fixes for several shutdown issues. #863

Experimental Features

  • The vast get command has been added. It retrieves events from the database directly by their ids. #938

Features

  • VAST now supports the XDG base directory specification: The vast.conf is now found at ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.conf, and schema files at ${XDG_DATA_HOME:-${HOME}/.local/share}/vast/schema/. The user-specific configuration file takes precedence over the global configuration file in <sysconfdir>/vast/vast.conf. #1036

  • VAST now merges the contents of all used configuration files instead of using only the most user-specific file. The file specified using --config takes the highest precedence, followed by the user-specific path ${XDG_CONFIG_HOME:-${HOME}/.config}/vast/vast.conf, and the compile-time path <sysconfdir>/vast/vast.conf. #1040

  • VAST now ships with a new tool lsvast to display information about the contents of a VAST database directory. See lsvast --help for usage instructions. #863

  • The output of the status command was restructured with a strong focus on usability. The new flags --detailed and --debug add additional content to the output. #995
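
The lookup and merge rules above can be sketched in Python. This is an illustration under two assumptions, not VAST's code: sysconfdir defaults to /etc here, and nested sections are assumed to merge key by key. The fallback mirrors the shell expansion ${XDG_CONFIG_HOME:-${HOME}/.config}, which also applies when the variable is set but empty.

```python
import os

def config_paths(sysconfdir="/etc", explicit=None):
    """Candidate vast.conf paths, lowest precedence first."""
    xdg = os.environ.get("XDG_CONFIG_HOME") or os.path.join(
        os.path.expanduser("~"), ".config")
    paths = [os.path.join(sysconfdir, "vast", "vast.conf"),
             os.path.join(xdg, "vast", "vast.conf")]
    if explicit:  # a file given via --config takes the highest precedence
        paths.append(explicit)
    return paths

def merge(*configs):
    """Merge parsed configs; later (more specific) ones win per key."""
    result = {}
    for cfg in configs:
        for key, value in cfg.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge(result[key], value)  # merge nested sections
            else:
                result[key] = value
    return result

system = {"vast": {"endpoint": "localhost", "db-directory": "vast.db"}}
user = {"vast": {"endpoint": "0.0.0.0"}}
print(merge(system, user))
```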

Bug Fixes

  • Stalled sources that were unable to generate new events no longer stop import processes from shutting down under rare circumstances. #1058

2020.08.28

Breaking Changes

  • We now bundle a patched version of CAF, with a changed ABI. This means that if you're linking against the bundled CAF library, you also need to distribute that library so that VAST can use it at runtime. The versions are API compatible so linking against a system version of CAF is still possible and supported. #1020

Changes

  • The set type has been removed. Experience with the data model showed that there is no strong use case to separate sets from vectors in the core. While this may be useful in programming languages, VAST deals with immutable data where set constraints have been enforced upstream. This change requires updating existing schemas by changing set<T> to vector<T>. In the query language, the new symbol for the empty map changed from {-} to {}, as it now unambiguously identifies map instances. #1010

  • The vector type has been renamed to list. In an effort to streamline the type system vocabulary, we favor list over vector because it's closer to existing terminology (e.g., Apache Arrow). This change requires updating existing schemas by changing vector<T> to list<T>. #1016

  • The expression field parser now allows the '-' character. #999

Features

  • VAST now writes a PID lock file on startup to prevent multiple server processes from accessing the same persistent state. The pid.lock file resides in the vast.db directory. #1001

  • The default schema for Suricata has been updated to support the suricata.ftp and suricata.ftp_data event types. #1009

  • VAST now prints the location of the configuration file that is used. #1009

Bug Fixes

  • The shutdown process of the server process could potentially hang forever. VAST now uses a 2-step procedure that first attempts to terminate all components cleanly. If that fails, it will attempt a hard kill afterwards, and if that fails after another timeout, the process will call abort(3). #1005

  • When a continuous query in a client process terminated, the node did not clean up the corresponding server-side state. This memory leak no longer exists. #1006

  • The port encoding for Arrow-encoded table slices is now host-independent and always uses network byte order. #1007

  • Importing JSON no longer fails for JSON fields containing null when the corresponding VAST type in the schema is a non-trivial type like vector<string>. #1009

  • Some file descriptors remained open when they weren't needed any more. This descriptor leak has been fixed. #1018

  • When running VAST under heavy load, CAF stream slot ids could wrap around after a few days and deadlock the system. As a workaround, we extended the slot id bit width to make the time until this happens unrealistically large. #1020

  • Incomplete reads were not handled properly, which manifested for files larger than 2GB. On macOS, writing files larger than 2GB may have failed previously. VAST now respects OS-specific constraints on the maximum block size. #1025

  • VAST would overwrite existing on-disk state data when encountering a partial read during startup. This state-corrupting behavior no longer exists. #1026

  • VAST did not terminate when a critical component failed during startup. VAST now binds the lifetime of the node to all critical components. #1028

  • MessagePack-encoded table slices now work correctly for nested container types. #984

  • A bug in the expression parser prevented the correct parsing of fields starting with either 'F' or 'T'. #999
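
The following minimal Python illustration shows what "network byte order" means for a 16-bit port. It is unrelated to VAST's Arrow code. The format string "!H" always produces big-endian bytes, so the encoding no longer depends on the host's native byte order.

```python
import struct

def encode_port(port: int) -> bytes:
    # "!" selects network byte order (big-endian); "H" is an unsigned 16-bit int.
    return struct.pack("!H", port)

def decode_port(data: bytes) -> int:
    return struct.unpack("!H", data)[0]

print(encode_port(53))  # b'\x005' on every host, little- or big-endian
```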

2020.07.28

Breaking Changes

  • FlatBuffers is now a required dependency for VAST. The archive and the segment store use FlatBuffers to store and version their on-disk persistent state. #972

Changes

  • The suricata schema file contains new type definitions for the stats, krb5, smb, and ssh events. #954 #986

  • VAST now recognizes /etc/vast/schema as an additional default directory for schema files. #980

Features

  • Starting with this release, installing VAST on any Linux becomes significantly easier: A static binary will be provided with each release on the GitHub releases page. #966

  • We open-sourced our MessagePack-based table slice implementation, which provides a compact row-oriented encoding of data. This encoding works well for binary formats (e.g., PCAP) and access patterns that involve materializing entire rows. The MessagePack table slice is the new default when Apache Arrow is unavailable. To enable parsing into MessagePack, you can pass --table-slice-type=msgpack to the import command, or set the configuration option import.table-slice-type to 'msgpack'. #975

Bug Fixes

  • The PCAP reader now correctly shows the number of generated events. #954

2020.06.25

Changes

  • The options system.table-slice-type and system.table-slice-size have been removed, as they duplicated import.table-slice-type and import.table-slice-size respectively. #908 #951

  • The JSON export format now renders timestamps using strings instead of numbers in order to avoid possible loss of precision. #909

  • The default table slice type has been renamed to caf. It has not been the default when built with Apache Arrow support for a while now, and the new name more accurately reflects what it is doing. #948

Experimental Features

  • VAST now supports aging out existing data. This feature currently only concerns data in the archive. The options system.aging-frequency and system.aging-query configure a query that runs on a regular schedule to determine which events to delete. It is also possible to trigger an aging cycle manually. #929

Features

  • VAST now has options to limit the number of results produced by an invocation of vast explore. #882

  • The import json command's type restrictions are more relaxed now, and can additionally convert from JSON strings to VAST internal data types. #891

  • VAST now supports /etc/vast/vast.conf as an additional fallback for the configuration file. The following file locations are looked at in order: Path specified on the command line via --config=path/to/vast.conf, vast.conf in current working directory, ${INSTALL_PREFIX}/etc/vast/vast.conf, and /etc/vast/vast.conf. #898

  • The import command gained a new --read-timeout option that forces data to be forwarded to the importer regardless of the internal batching parameters and table slices being unfinished. This allows for reducing the latency between the import command and the node. The default timeout is 10 seconds. #916

  • The output format for the explore and pivot commands can now be set using the explore.format and pivot.format options respectively. Both default to JSON. #921

  • The meta index now uses Bloom filters for equality queries involving IP addresses. This especially accelerates queries where the user wants to know whether a certain IP address exists in the entire database. #931
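
Bloom filters help here because they never produce false negatives. If the filter of a partition says an address is absent, the index can skip that partition entirely. The sketch below is a generic Bloom filter in Python, not VAST's meta index. The bit count, the number of hash functions, and the use of salted SHA-256 are arbitrary choices made for illustration.

```python
import hashlib
import ipaddress

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# One filter per partition: only partitions answering "maybe" get scanned.
partition = BloomFilter()
partition.add(ipaddress.ip_address("10.0.0.1").packed)
print(partition.might_contain(ipaddress.ip_address("10.0.0.1").packed))  # True
```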

Bug Fixes

  • A use-after-free bug would sometimes crash the node while it was shutting down. #896

  • A bogus import process that assembled table slices with a greater number of events than expected by the node could lead to wrong query results. #908

  • The export json command now correctly unescapes its output. #910

  • VAST now correctly checks for control characters in inputs. #910

2020.05.28

Changes

  • The command line flag for disabling the accountant has been renamed to --disable-metrics to more accurately reflect its intended purpose. The internal vast.statistics event has been renamed to vast.metrics. #870

  • Spreading a query over multiple command line arguments in commands like explore/export/pivot/etc. has been deprecated. #878

Experimental Features

  • Added a new explore command to VAST that can be used to show data records within a certain time from the results of a query. #873 #877

Features

  • All input parsers now support mixed \n and \r\n line endings. #865

  • When importing events of a new or updated type, VAST now only requires the type to be specified once (e.g., in a schema file). For consecutive imports, the event type does not need to be specified again. A list of registered types can now be viewed using vast status under the key node.type-registry.types. #875

  • When importing JSON data without knowing the type of the imported events a priori, VAST now supports automatic event type deduction based on the JSON object keys in the data. VAST selects a type if and only if the set of fields matches a known type. The --type / -t option to the import command restricts the matching to the set of types that share the provided prefix. Omitting -t attempts to match JSON against all known types. If only a single variant of a type is matched, the import falls back to the old behavior and fills in nil for mismatched keys. #875

  • VAST now prints a message when it is waiting for user input to read a query from a terminal. #878

  • VAST now ships with a schema suitable for Sysmon import. #886
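
The key-based type deduction described above can be sketched roughly in Python. The sketch makes several assumptions. Types are flat sets of field names, and the type names in the example are hypothetical. An ambiguous match is treated as "no type". The nil-filling fallback for a single matched variant is not modeled.

```python
def deduce_type(keys, known_types, prefix=""):
    """Return the known type whose field set equals the JSON keys, or None.

    known_types maps type names to sets of field names; prefix mimics -t.
    """
    candidates = [name for name, fields in known_types.items()
                  if name.startswith(prefix) and fields == set(keys)]
    # Assumption: no match or several matches both mean "cannot deduce".
    return candidates[0] if len(candidates) == 1 else None

known = {"zeek.conn": {"ts", "uid", "proto"}, "zeek.dns": {"ts", "uid", "query"}}
print(deduce_type(["ts", "uid", "query"], known))  # zeek.dns
```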

Bug Fixes

  • The parser for Zeek tsv data used to ignore attributes that were defined for the Zeek-specific types in the schema files. It has been modified to respect and prefer the specified attributes for the fields that are present in the input data. #847

  • Fixed a bug that caused vast import processes to produce 'default' table slices, despite having the 'arrow' type as the default. #866

  • Fixed a bug where setting the logger.file-verbosity in the config file would not have an effect. #866

2020.04.29

Changes

  • The index-specific options max-partition-size, max-resident-partitions, max-taste-partitions, and max-queries can now be specified on the command line when starting a node. #728

  • The default bind address has been changed from :: to localhost. #828

  • The option --skip-candidate-checks / -s for the count command was renamed to --estimate / -e. #843

Features

  • Packet drop and discard statistics are now reported to the accountant for PCAP import, and are available using the keys pcap-reader.recv, pcap-reader.drop, pcap-reader.ifdrop, pcap-reader.discard, and pcap-reader.discard-rate in the vast.statistics event. If the number of dropped packets exceeds a configurable threshold, VAST additionally warns about packet drops on the command line. #827 #844

  • Bash autocompletion for vast is now available via the autocomplete script located at scripts/vast-completions.bash in the VAST source tree. #833

Bug Fixes

  • Archive lookups are now interruptible. This change fixes an issue that caused consecutive exports to slow down the node, which improves the overall performance for larger databases considerably. #825

  • Fixed a crash when importing data while a continuous export was running for unrelated events. #830

  • Queries of the form x != 80/tcp were falsely evaluated as x != 80/? && x != ?/tcp. (The syntax in the second predicate does not yet exist; it only illustrates the bug.) Port inequality queries now correctly evaluate x != 80/? || x != ?/tcp. E.g., the result now contains values like 80/udp and 80/?, but also 8080/tcp. #834

  • Fixed a bug that could cause stalled input streams not to forward events to the index and archive components for the JSON, CSV, and Syslog readers, when the input stopped arriving but no EOF was sent. This is a follow-up to #750. A timeout now ensures that the readers continue when some events were already handled, but the input appears to be stalled. #835

  • For some queries, the index evaluated only a subset of all relevant partitions in a non-deterministic manner. Fixing a violated evaluation invariant now guarantees deterministic execution. #842

  • The stop command always returned immediately, regardless of whether it succeeded. It now blocks until the remote node has shut down properly or returns an error exit code upon failure. #849
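
The port inequality fix above is an application of De Morgan's law: not (a and b) equals (not a) or (not b). The Python sketch below models a port as a (number, protocol) pair, with "?" standing for an unknown protocol. It contrasts the old evaluation with the corrected one and is an illustration only, not VAST's evaluator.

```python
def port_ne_old(port, proto, q_port=80, q_proto="tcp"):
    # Old (buggy) evaluation: both components had to differ.
    return port != q_port and proto != q_proto

def port_ne_fixed(port, proto, q_port=80, q_proto="tcp"):
    # Correct negation of (port == q_port and proto == q_proto).
    return port != q_port or proto != q_proto

for value in [(80, "udp"), (80, "?"), (8080, "tcp"), (80, "tcp")]:
    print(value, port_ne_old(*value), port_ne_fixed(*value))
```

Only 80/tcp itself is excluded by the fixed version, whereas the old version also dropped 80/udp, 80/?, and 8080/tcp.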

2020.03.26

Changes

  • The VERBOSE log level has been added between INFO and DEBUG. This level is enabled at build time for all build types, making it possible to get more detailed logging output from release builds. #787

  • The internal statistics event type vast.account has been renamed to vast.statistics for clarity. #789

  • The command line options prefix for changing CAF options was changed from --caf# to --caf.. #797

  • The log folder vast.log/ in the current directory will not be created by default any more. Users must explicitly set the system.file-verbosity option if they wish to keep the old behavior. #803

  • The config option system.log-directory was deprecated and replaced by the new option system.log-file. All logs will now be written to a single file. #806

Features

  • The new vast import syslog command allows importing Syslog messages as defined in RFC5424. #770

  • The option --disable-community-id has been added to the vast import pcap command for disabling the automatic computation of Community IDs. #777

  • Continuous export processes can now be stopped correctly. Before this change, the node showed an error message and the exporting process exited with a non-zero exit code. #779

  • The short option -c for setting the configuration file has been removed. The long option --config must now be used instead. This fixed a bug that did not allow for -c to be used for continuous exports. #781

  • Expressions must now be parsed to the end of input. This fixes a bug that caused malformed queries to be evaluated until the parser failed. For example, the query #type == "suricata.http" && .dest_port == 80 was erroneously evaluated as #type == "suricata.http" instead. #791

  • The hash index has been re-enabled after it was outfitted with a new high-performance hash map implementation that increased performance to the point where it is on par with the regular index. #796

  • An under-the-hood change to our parser-combinator framework makes sure that we do not discard possibly invalid input data up to the end of input. This uncovered a bug in our MRT/bgpdump integrations, which have thus been disabled (for now), and will be fixed at a later point in time. #808

2020.02.27

Changes

  • The build system will from now on try to use the CAF library from the system, if one is provided. If it is not found, the CAF submodule will be used as a fallback. #740

  • VAST now supports (and requires) Apache Arrow >= 0.16. #751

  • The option --historical for export commands has been removed, as it was the default already. #754

  • The option --directory has been replaced by --db-directory and log-directory, which set directories for persistent state and log files respectively. The default log file path has changed from vast.db/log to vast.log. #758

  • Hash indices have been disabled again due to a performance regression. #765

Features

  • For users of the Nix package manager, expressions have been added to generate reproducible development environments with nix-shell. #740

Bug Fixes

  • Continuously importing events from a Zeek process with a low rate of emitted events resulted in a long delay until the data would be included in the result set of queries. This is because the import process would buffer up to 10,000 events before sending them to the server as a batch. The algorithm has been tuned to flush its buffers if no data is available for more than 500 milliseconds. #750
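
The tuned flushing rule can be sketched as a batcher that flushes either when the batch is full or when no input has arrived for longer than an idle timeout. In the Python sketch below, the class name and methods are hypothetical. The flushed list stands in for sending a batch to the server. The injectable clock exists only to make the behavior easy to demonstrate.

```python
import time

class Batcher:
    """Flush when the batch is full or when input has been idle too long."""

    def __init__(self, max_size=10_000, idle_timeout=0.5, clock=time.monotonic):
        self.max_size = max_size
        self.idle_timeout = idle_timeout  # seconds
        self.clock = clock
        self.buffer = []
        self.last_input = clock()
        self.flushed = []  # stands in for batches sent to the server

    def add(self, event):
        self.buffer.append(event)
        self.last_input = self.clock()
        if len(self.buffer) >= self.max_size:
            self.flush()

    def poll(self):
        """Call periodically; flushes a partial batch after the idle timeout."""
        if self.buffer and self.clock() - self.last_input > self.idle_timeout:
            self.flush()

    def flush(self):
        self.flushed.append(self.buffer)
        self.buffer = []
```

Usage with a fake clock: after two events and 0.6 seconds of silence, poll() forwards the partial batch instead of waiting for 10,000 events.

```python
now = [0.0]
b = Batcher(clock=lambda: now[0])
b.add("e1"); b.add("e2")
now[0] = 0.6
b.poll()
print(b.flushed)  # [['e1', 'e2']]
```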

2020.01.31

Changes

  • The import pcap command no longer takes interface names via --read,-r, but instead from a separate option named --interface,-i. This change has been made for consistency with other tools. #641

  • Record field names can now be entered as quoted strings in the schema and expression languages. This lifts a restriction where JSON fields with whitespaces or special characters could not be ingested. #685

  • Build configuration defaults have been adapted for a better user experience. Installations are now relocatable by default, which can be reverted by configuring with --without-relocatable. Additionally, new sets of defaults named --release and --debug (renamed from --dev-mode) have been added. #695

  • Two minor modifications were done in the parsing framework: (i) the parsers for enums and records now allow trailing separators, and (ii) the dash (-) was removed from the allowed characters of schema type names. #706

  • VAST is switching to a calendar-based versioning scheme starting with this release. #739

Features

  • When a record field has the #index=hash attribute, VAST will choose an optimized index implementation. This new index type only supports (in)equality queries and is therefore intended to be used with opaque types, such as unique identifiers or random strings. #632 #726

  • Added Apache Arrow as new export format. This allows users to export query results as Apache Arrow record batches for processing the results downstream, e.g., in Python or Spark. #633

  • The import pcap command now takes an optional snapshot length via --snaplen. If the snapshot length is set to snaplen, and snaplen is less than the size of a packet that is captured, only the first snaplen bytes of that packet will be captured and provided as packet data. #642

  • An experimental new Python module enables querying VAST and processing results as pyarrow tables. #685

  • The long option --config, which sets an explicit path to the VAST configuration file, now also has the short option -c. #689

  • On FreeBSD, a VAST installation now includes an rc.d script that simplifies spinning up a VAST node. CMake installs the script at PREFIX/etc/rc.d/vast. #693

Bug Fixes

  • In some cases it was possible that a source would connect to a node before it was fully initialized, resulting in a hanging vast import process. #647

  • PCAP ingestion failed for traces containing VLAN tags. VAST now strips IEEE 802.1Q headers instead of skipping VLAN-tagged packets. #650

  • Importing events over UDP with vast import <format> --listen :<port>/udp failed to register the accountant component. This caused an unexpected message warning to be printed on startup and resulted in losing import statistics. VAST now correctly registers the accountant. #655

  • The import process did not print statistics when importing events over UDP. Additionally, warnings about dropped UDP packets are no longer shown per packet, but rather periodically reported in a readable format. #662

  • A bug in the quoted string parser caused a parsing failure if an escape character occurred in the last position. #685

  • A race condition in the index logic was able to lead to incomplete or empty result sets for vast export. #703

  • The example configuration file contained an invalid section vast. This has been changed to the correct name system. #705

0.2 - 2019.10.30

Changes

  • The query language has been extended to support expressions of the form X == /pattern/, where X is a compatible LHS extractor. Previously, patterns only supported the match operator ~. The two operators have the same semantics when one operand is a pattern.

  • CAF and Broker are no longer required to be installed prior to building VAST. These dependencies are now tracked as git submodules to ensure version compatibility. Specifying a custom build is still possible via the CMake variables CAF_ROOT_DIR and BROKER_ROOT_DIR.

  • When exporting data in pcap format, it is no longer necessary to manually restrict the query by adding the predicate #type == "pcap.packet" to the expression. This now happens automatically because only this type contains the raw packet data.

  • When defining schema attributes in key-value pair form, the value no longer requires double-quotes. For example, #foo=x is now the same as #foo="x". The form without double-quotes consumes the input until the next space and does not support escaping. In case an attribute value contains whitespace, double-quotes must be provided, e.g., #foo="x y z".

  • The PCAP packet type gained the additional field community_id that contains the Community ID flow hash. This identifier facilitates pivoting to a specific flow from data sources with connection-level information, such as Zeek or Suricata logs.

  • Log files generally have some notion of timestamp for recorded events. To make the query language more intuitive, the syntax for querying time points thus changed from #time to #timestamp. For example, #time > 2019-07-02+12:00:00 now reads #timestamp > 2019-07-02+12:00:00.

  • Default schema definitions for certain import formats changed from hard-coded to runtime-evaluated. The default location of the schema definition files is $(dirname vast-executable)/../share/vast/schema. Currently this is used for the Suricata JSON log reader.

  • The default directory name for persistent state changed from vast to vast.db. This makes it possible to run ./vast in the current directory without having to specify a different state directory on the command line.

  • Nested types are from now on accessed by the .-syntax. This means VAST now has a unified syntax to select nested types and fields. For example, what used to be zeek::http is now just zeek.http.

  • The (internal) option --node for the import and export commands has been renamed from -n to -N, to allow usage of -n for --max-events.

  • To make the export option to limit the number of events to be exported more idiomatic, it has been renamed from --events,-e to --max-events,-n. Now vast export -n 42 generates at most 42 events.

Features

  • The default schema for Suricata has been updated to support the new suricata.smtp event type in Suricata 5.

  • The export null command retrieves data, but never prints anything. Its main purpose is to make benchmarking VAST easier and faster.

  • The new pivot command retrieves data of a related type. It inspects each event in a query result to find an event of the requested type. If a common field exists in the schema definition of the requested type, VAST will dynamically create a new query to fetch the contextual data according to the type relationship. For example, if two records T and U share the same field x, and the user requests to pivot via T.x == 42, then VAST will fetch all data for U.x == 42. An example use case would be to pivot from a Zeek or Suricata log entry to the corresponding PCAP packets. VAST uses the field community_id to pivot between the logs and the packets. Pivoting is currently implemented for Suricata, Zeek (with community ID computation enabled), and PCAP.

  • The new infer command performs schema inference on input data. It deduces the input format and creates a schema definition that is suitable for use with the supplied data. Supported input types include Zeek TSV and JSONLD.

  • The newly added count command counts the hits for a query without exporting data.

  • Commands now support a --documentation option, which returns Markdown-formatted documentation text.

  • A new schema for Argus CSV output has been added. It parses the output of ra(1), which produces CSV output when invoked with -L 0 -c ,.

  • The schema language now supports comments. A double-slash (//) begins a comment. Comments last until the end of the line, i.e., until a newline character (\n).

  • The import command now supports CSV formatted data. The type for each column is automatically derived by matching the column names from the CSV header in the input with the available types from the schema definitions.

  • Configuring how much status information gets printed to STDERR previously required obscure config settings. From now on, users can simply use --verbosity=<level>,-v <level>, where <level> is one of quiet, error, warn, info, debug, or trace. However, debug and trace are only available for debug builds (otherwise they fall back to log level info).

  • The query expression language now supports data predicates, which are a shorthand for a type extractor in combination with an equality operator. For example, the data predicate 6.6.6.6 is the same as :addr == 6.6.6.6.

  • The index object in the output from vast status has a new field statistics that gives a high-level summary of the indexed data. Currently, it contains a nested layouts object with per-layout statistics about the number of indexed events.

  • The accountant object in the output from vast status has a new field log-file that points to the filesystem path of the accountant log file.

  • Data extractors in the query language can now contain a type prefix, which makes it easier to extract data from a specific type. For example, a query for Zeek conn log entries with responder IP address 1.2.3.4 previously required two terms, #type == zeek.conn && id.resp_h == 1.2.3.4, because the nested id record can occur in other types as well. Such queries can now be written more tersely as zeek.conn.id.resp_h == 1.2.3.4.

  • VAST gained support for importing Suricata JSON logs. The import command has a new suricata format that can ingest EVE JSON output.

  • The data parser now supports count and integer values with unit suffixes from the International System of Units (SI) as well as binary (IEC) suffixes. For example, 1k is equal to 1000 and 1Ki is equal to 1024.

  • VAST can now ingest JSON data. The import command gained the json format, which allows for parsing line-delimited JSON (LDJSON) according to a user-selected type with --type. The --schema or --schema-file options can be used in conjunction to supply custom types. The JSON objects in the input must match the selected type, that is, the keys of the JSON object must be equal to the record field names and the object values must be convertible to the record field types.

  • For symmetry to the export command, the import command gained the --max-events,n option to limit the number of events that will be imported.

  • The import command gained the --listen,l option to receive input from the network. Currently, only UDP is supported. Previously, one had to use a clever netcat pipe with a large enough receive buffer to achieve the same effect, e.g., nc -I 1500 -p 4200 | vast import pcap. Now this pipe reduces to vast import pcap -l.

  • The new --disable-accounting option shuts off periodic gathering of system telemetry in the accountant actor. This also disables output in the accounting.log.
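The suffix handling described for the data parser can be illustrated with a small Python sketch. This is not VAST's actual parser (VAST is written in C++); the function name parse_si and the exact suffix set (k through E for decimal, Ki through Ei for binary) are assumptions chosen for illustration.

```python
import re

# Hypothetical suffix table: decimal SI suffixes and binary (IEC) suffixes.
# The exact set that VAST accepts is an assumption here.
_SUFFIXES = {
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}

# Optional sign, digits, optional suffix. Binary suffixes are listed first so
# that "Ki" is tried before any single-letter suffix.
_PATTERN = re.compile(r"([+-]?)(\d+)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?")


def parse_si(text: str) -> int:
    """Parse an integer with an optional suffix, e.g. '1k' -> 1000, '1Ki' -> 1024."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a suffixed integer: {text!r}")
    sign, digits, suffix = match.groups()
    value = int(digits) * (_SUFFIXES[suffix] if suffix else 1)
    return -value if sign == "-" else value
```

For example, parse_si("1k") yields 1000, parse_si("1Ki") yields 1024, and a plain "42" passes through unchanged. Using integer arithmetic throughout avoids the rounding issues a floating-point multiplier would introduce for large binary suffixes.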

Bug Fixes

  • The user environment's LDFLAGS were erroneously passed to ar. The user environment's ARFLAGS are now used instead.

  • Exporting data with export -n <count> crashed when count was a multiple of the table slice size. The command now works as expected.

  • Queries of the form #type ~ /pattern/ used to be rejected erroneously. The validation code has been corrected and such queries are now working as expected.

  • When specifying enum types in the schema, ingestion failed because no implementation existed for such types. It is now possible to define enumerations in schemas as expected and query them as strings.

  • Queries with the less-than < or greater-than > operators produced off-by-one results for durations when the query value had a finer resolution than the index. The operators now work as expected.

  • Timestamps were always printed with millisecond resolution, which led to a loss of precision when the internal representation had a higher resolution. Timestamps are now rendered with up to nanosecond resolution, the maximum supported resolution.

  • All query expressions in the form #type != X were falsely evaluated as #type == X and consequently produced wrong results. These expressions now behave as expected.

  • Parsers for reading log input that relied on recursive rules leaked memory by creating cyclic references. All recursive parsers have been updated to break such cycles and thus no longer leak memory.

  • The Zeek reader failed upon encountering logs with a double column, as it occurs in capture_loss.log. The Zeek parser generator has been fixed to handle such types correctly.

  • Some queries returned duplicate events because the archive did not filter the result set properly. This no longer occurs after fixing the table slice filtering logic.

  • The map data parser did not parse negative values correctly. It was not possible to parse strings of the form "{-42 -> T}" because the parser attempted to parse the token for the empty map "{-}" instead.

  • The CSV printer of the export command used to insert 2 superfluous fields when formatting an event: The internal event ID and a deprecated internal timestamp value. Both fields have been removed from the output, bringing it into line with the other output formats.

  • Previously, when a node terminated during an import, the client process remained unaffected and kept processing input. Now the client terminates when a remote node terminates.

  • Evaluation of predicates with negations returned incorrect results. For example, the expression :addr !in 10.0.0.0/8 created a disjunction over all fields to which :addr resolved, without properly applying De Morgan's laws. The same bug also existed for key extractors. De Morgan's laws are now applied properly for the operators !in and !~.
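The De Morgan fix for negated type-extractor predicates can be sketched in Python. This is a simplified model, not VAST's query engine: the class names, the expand_type_extractor helper, and the field names src and dst are hypothetical. It assumes that :addr in X holds when any addr field matches, so its negation must require that no addr field matches.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Pred:
    """A field-level predicate: <field> in <subnet> or <field> !in <subnet>."""
    field: str
    op: str  # "in" or "!in"
    subnet: str

    def evaluate(self, event: Dict[str, str]) -> bool:
        inside = ipaddress.ip_address(event[self.field]) in ipaddress.ip_network(self.subnet)
        return inside if self.op == "in" else not inside


@dataclass(frozen=True)
class Junction:
    """A conjunction ("and") or disjunction ("or") of field-level predicates."""
    kind: str  # "and" or "or"
    operands: Tuple[Pred, ...]

    def evaluate(self, event: Dict[str, str]) -> bool:
        results = (p.evaluate(event) for p in self.operands)
        return all(results) if self.kind == "and" else any(results)


def expand_type_extractor(op: str, subnet: str, fields: List[str]) -> Junction:
    """Expand `:addr <op> <subnet>` over every field whose type is addr."""
    preds = tuple(Pred(f, op, subnet) for f in fields)
    if op == "!in":
        # De Morgan: !(f1 in X || f2 in X) == (f1 !in X && f2 !in X).
        # The buggy version built a disjunction here instead.
        return Junction("and", preds)
    # Positive form: the predicate holds if any addr field matches.
    return Junction("or", preds)
```

For an event {"src": "10.1.2.3", "dst": "8.8.8.8"}, :addr in 10.0.0.0/8 is true because src matches, so :addr !in 10.0.0.0/8 must be false. A disjunction of the negated predicates would wrongly return true, since dst lies outside the subnet; the conjunction returns false as expected.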

0.1 - 2019.02.28

This is the first official release.