Download the release on GitHub.
Features
Section titled “Features”Clean up transform steps (and native plugins generally)
Section titled “Clean up transform steps (and native plugins generally)”The replace
transform step now allows for setting values of complex types,
e.g., lists or records.
By @dominiklohmann in #2228.
Print segment contents with lsvast
Section titled “Print segment contents with lsvast”The lsvast
tool now prints the whole store contents when given a store file as
an argument.
Use dedicated partitions for each layout
Section titled “Use dedicated partitions for each layout”VAST now creates one active partition per layout, rather than having a single active partition for all layouts.
The new option vast.active-partition-timeout
controls the time after which an
active partition is flushed to disk. The timeout may hit before the partition
size reaches vast.max-partition-size
, allowing for an additional temporal
control for data freshness. The active partition timeout defaults to 1 hour.
Allow fine-grained meta index configuration
Section titled “Allow fine-grained meta index configuration”The new vast.index
section in the configuration supports adjusting the
false-positive rate of first-stage lookups for individual fields, allowing
users to optimize the time/space trade-off for expensive queries.
Add a grand total event counter to the status output
Section titled “Add a grand total event counter to the status output”The output of vast status
now displays the total number of events stored
under the key index.statistics.events.total
.
Backport bug fixes for a v1.1.1 release
Section titled “Backport bug fixes for a v1.1.1 release”The disk monitor has new status entries blacklist
and `blacklist
- size` containing information about partitions failed to be erased.
By @dominiklohmann in #2160.
Support environment variables as alternate config mechanism
Section titled “Support environment variables as alternate config mechanism”VAST has now complete support for passing environment variables as alternate
path to configuration files. Environment variables have lower precedence than
CLI arguments and higher precedence than config files. Variable names of the
form VAST_FOO__BAR_BAZ
map to vast.foo.bar-baz
, i.e., __
is a record
separator and _
translates to -
. This does not apply to the prefix VAST_
,
which is considered the application identifier. Only variables with non-empty
values are considered.
Implement support for transforms that apply to every type and use compaction for aging
Section titled “Implement support for transforms that apply to every type and use compaction for aging”VAST v1.0 deprecated the experimental aging feature. Given popular demand we’ve decided to un-deprecate it, and to actually implement it on top of the same building blocks the compaction mechanism uses. This means that it is now fully working and no longer considered experimental.
Changes
Section titled “Changes”Remove the get subcommand
Section titled “Remove the get subcommand”We removed the experimental vast get
command. It relied on an internal unique
event ID that was only exposed to the user in debug messages. This removal is a
preparatory step towards a simplification of some of the internal workings of
VAST.
Mark experimental
encoding as arrow.v2
Section titled “Mark experimental encoding as arrow.v2”VAST’s internal data model now completely preserves the nesting of the stored
data when using the arrow
encoding, and maps the pattern, address,
subnet, and enumeration types onto Arrow extension types rather than using the
underlying representation directly. This change enables use of the export arrow
command without needing information about VAST’s type system.
Transform steps that add or modify columns now transform the columns in-place rather than at the end, preserving the nesting structure of the original data.
The deprecated msgpack
encoding no longer exists. Data imported using the
msgpack
encoding can still be accessed, but new data will always use the
arrow
encoding.
By @dominiklohmann in #2159.
Minimize the threadpool for client commands
Section titled “Minimize the threadpool for client commands”Client commands such as vast export
or vast status
now create less threads
at runtime, reducing the risk of hitting system resource limits.
Deploy VAST to AWS Lambda
Section titled “Deploy VAST to AWS Lambda”VAST ships experimental Terraform scripts to deploy on AWS Lambda and Fargate.
Fix CLI verbosity shorthands
Section titled “Fix CLI verbosity shorthands”The command line option --verbosity
has the new name --console-verbosity
.
This synchronizes the CLI interface with the configuration file that solely
understands the option vast.console-verbosity
.
Remove the “catalog” and “catalog-bytes” keys from the index status
Section titled “Remove the “catalog” and “catalog-bytes” keys from the index status”The index
section in the status output no longer contains the catalog
and
catalog-bytes
keys. The information is already present in the top-level
catalog
section.
Rename meta-index
to catalog
Section titled “Rename meta-index to catalog”The meta-index
is now called the catalog
. This affects multiple metrics and
entries in the output of vast status
, and the configuration option
vast.meta-index-fp-rate
, which is now called vast.catalog-fp-rate
.
By @dominiklohmann in #2128.
Eploit synergies when evaluating many queries at the same time
Section titled “Eploit synergies when evaluating many queries at the same time”We revised the query scheduling logic to exploit synergies when multiple
queries run at the same time. In that vein, we updated the related metrics with
more accurate names to reflect the new mechanism. The new keys
scheduler.partition.materializations
, scheduler.partition.scheduled
, and
scheduler.partition.lookups
provide periodic counts of partitions loaded from
disk and scheduled for lookup, and the overall number of queries issued to
partitions, respectively. The keys query.workers.idle
, and
query.workers.busy
were renamed to scheduler.partition.remaining-capacity
,
and scheduler.partition.current-lookups
. Finally, the key
scheduler.partition.pending
counts the number of currently pending
partitions. It is still possible to opt-out of the new scheduling algorithm
with the (deprecated) option --use-legacy-query-scheduler
.
Bump the minimum version of Apache Arrow to 7.0
Section titled “Bump the minimum version of Apache Arrow to 7.0”VAST now requires Apache Arrow >= v7.0.0.
Clean up transform steps (and native plugins generally)
Section titled “Clean up transform steps (and native plugins generally)”Multiple transform steps now have new names: select
is now called where
,
delete
is now called drop
, project
is now called put
, and aggregate
is
now called summarize
. This breaking change is in preparation for an upcoming
feature that improves the capability of VAST’s query language.
The layout-names
option of the rename
transform step was renamed schemas
.
The step now additonally supports renaming fields
.
By @dominiklohmann in #2228.
Bug Fixes
Section titled “Bug Fixes”Make man-page creation more robust
Section titled “Make man-page creation more robust”The vast(1)
man-page is no longer empty for VAST distributions with static
binaries.
By @dominiklohmann in #2190.
Treat list options in env variables consistently
Section titled “Treat list options in env variables consistently”Environment variables for options that specify lists now consistently use comma-separators and respect escaping with backslashes.
By @dominiklohmann in #2236.
Reduce the default log queue size for client commands
Section titled “Reduce the default log queue size for client commands”We optimized the queue size of the logger for commands other than vast start
.
Client commands now show a significant reduction in memory usage and startup
time.
Lift selector field requirements for JSON import
Section titled “Lift selector field requirements for JSON import”The JSON import no longer rejects non-string selector fields. Instead, it always
uses the textual JSON representation as a selector. E.g., the JSON object
{id:1,...}
imported via vast import json --selector=id:mymodule
now matches
the schema named mymodule.1
rather than erroring because the id
field is not
a string.
By @dominiklohmann in #2255.
Correctly terminate the explore command
Section titled “Correctly terminate the explore command”The explore
command now properly terminates after the requested number of
results are delivered.
Load stores lazily
Section titled “Load stores lazily”The count --estimate
erroneously materialized store files from disk,
resulting in an unneeded performance penalty. VAST now answers approximate
count queries by solely consulting the relevant index files.
By @dominiklohmann in #2146.
Add support for reals in CSV without dot
Section titled “Add support for reals in CSV without dot”The CSV parser no longer fails when encountering integers when floating point values were expected.
By @dominiklohmann in #2184.
Fix query pruning in the catalog
Section titled “Fix query pruning in the catalog”The query optimizer incorrectly transformed queries with conjunctions or disjunctions with several operands testing against the same string value, leading to missing result. This was rarely an issue in practice before the introduction of homogenous partitions with the v2.0 release.
Don’t send null pointers when erasing whole partitions
Section titled “Don’t send null pointers when erasing whole partitions”VAST no longer sometimes crashes when aging or compaction erase whole partitions.
Ignore types unrelated to the configuration in the summarize plugin
Section titled “Ignore types unrelated to the configuration in the summarize plugin”Transform steps removing all nested fields from a record leaving only empty nested records no longer cause VAST to crash.
By @dominiklohmann in #2258.
Fix race condition with exporter timeouts
Section titled “Fix race condition with exporter timeouts”Some queries could get stuck when an importer would time out during the meta index lookup. This race condition no longer exists.
Stop accepting new queries after initiating shutdown
Section titled “Stop accepting new queries after initiating shutdown”VAST servers no longer accept queries after initiating shutdown. This fixes a potential infinite hang if new queries were coming in faster than VAST was able to process them.
By @dominiklohmann in #2215.
Use the timestamp type for inferred event timestamp fields in the Zeek reader
Section titled “Use the timestamp type for inferred event timestamp fields in the Zeek reader”The import zeek
command now correctly marks the event timestamp using the
timestamp
type alias for all inferred schemas.