VAST processes events in batches. Because the event data has the shape of a table, we call the batches table slices. The following options control their parameters.
vast.import.batch-encoding selects the encoding of table slices.
Available values are:
The default encoding for table slices is
arrow, which is designed for fast
columnar access and enables direct export as Apache
Arrow without re-encoding.
MessagePack is a space-efficient binary serialization format. We recommend it especially for binary data, like PCAP or NetFlow/IPFIX with varying layouts, and when exporting to Arrow is not required. MessagePack uses less disk space than Arrow, but the exact savings depend on the type of data being encoded.
Most components in VAST operate on table slices, which makes the table slice size a fundamental tuning knob on the spectrum of throughput and latency. Small table slices allow for shorter processing times, resulting in more scheduler context switches and a more balanced workload. However, the increased pressure on the scheduler comes at the cost of throughput. A large table slice size allows actors to spend more time processing a block of data, but makes them yield less frequently to the scheduler. As a result, other actors scheduled on the same thread may have to wait a little longer.
vast.import.batch-size sets an upper bound for the number of events
per table slice.
The option merely controls the number of events per table slice, not necessarily the number of events a component buffers before forwarding a batch to the next stage in a stream. The CAF streaming framework uses a credit-based flow-control mechanism to determine the buffering of table slices.
Setting vast.import.batch-size to 0 causes the table slice size to be
unbounded and leaves it to other parameters to determine the actual table slice
size.
The vast.import.batch-timeout option sets a timeout for forwarding buffered
table slices to the importer. If the timeout fires before a table slice reaches
vast.import.batch-size, then the table slice is forwarded with fewer events.
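The interplay of the size bound and the timeout can be illustrated with a small Python sketch. This is not VAST's implementation: the function name `slice_events` and the use of explicit arrival timestamps are assumptions made for illustration, and the sketch only checks the timeout when a new event arrives, whereas a real timer fires on its own.

```python
from typing import Iterable, Iterator, List, Tuple


def slice_events(
    events: Iterable[Tuple[float, str]],
    batch_size: int,
    batch_timeout: float,
) -> Iterator[List[str]]:
    """Group (arrival_time, event) pairs into batches.

    Hypothetical helper: a batch is emitted when it holds `batch_size`
    events, or when `batch_timeout` seconds have passed since its first
    event. A `batch_size` of 0 means "unbounded", so only the timeout
    (or the end of input) closes a batch.
    """
    batch: List[str] = []
    started = 0.0
    for arrival, event in events:
        # Timeout fired before the batch was full: emit a short batch.
        if batch and arrival - started >= batch_timeout:
            yield batch
            batch = []
        if not batch:
            started = arrival
        batch.append(event)
        # Size bound reached: emit a full batch.
        if batch_size > 0 and len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


stream = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (5.0, "d")]
bounded = list(slice_events(stream, batch_size=2, batch_timeout=1.0))
unbounded = list(slice_events(stream, batch_size=0, batch_timeout=1.0))
```

With the bounded setting, the first batch is closed by the size limit and the second by the timeout. With a size of 0, only the timeout splits the stream.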
The amount of memory that a VAST server process is allowed to use cannot currently be configured directly through a config file option. Instead, memory usage can be influenced through the configuration of the caching, meta index, and disk monitor features.
As illustrated on the [architecture.md] page, VAST splits its indexes and raw
storage into separate top-level components called INDEX and ARCHIVE. These
components bucket their data into
partitions and segments respectively, and
each maintains an LRU cache to improve query responsiveness.
The segment cache can be controlled with the
segments key, which together with
max-segment-size determines the memory requirements for the segment
cache.
The partition cache works analogously, but the size of a partition is specified as a number of events instead of bytes. The size of a partition in bytes depends on the data that is ingested and should be measured for a particular deployment.
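As a rough illustration of how the two segment settings combine, the following Python sketch computes an upper bound for the segment cache. The function name is hypothetical, the example values are made up rather than VAST defaults, and the sketch assumes that max-segment-size is given in bytes and that each cached segment is at most that large.

```python
GIB = 1024 ** 3


def segment_cache_upper_bound(segments: int, max_segment_size_bytes: int) -> int:
    """Worst-case memory held by the segment cache, in bytes.

    Hypothetical helper: assumes every cached segment is completely full.
    """
    if segments < 0 or max_segment_size_bytes < 0:
        raise ValueError("both arguments must be non-negative")
    return segments * max_segment_size_bytes


# Made-up example values, not VAST defaults.
bound = segment_cache_upper_bound(segments=10, max_segment_size_bytes=1 * GIB)
print(f"segment cache upper bound: {bound / GIB:.1f} GiB")
```

The same calculation does not transfer directly to the partition cache, because the byte size of a partition has to be measured for the data at hand.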
The Meta Index is responsible for deciding whether a partition qualifies for a certain query. It does so by maintaining a probabilistic data structure (sketch) for each partition that allows membership queries with a small tolerance for false positives. Due to this characteristic, sketches grow sub-linearly: doubling the number of events in a sketch does not double its memory requirement.
Because the meta index must be traversed in full for a given query, it needs to be kept in active memory to provide high responsiveness.
As a consequence, the overall amount of data in a VAST database instance and the
max-partition-size determine the memory requirements of the meta index. The
max-partition-size is inversely linked to the number of sketches in the Meta
Index. That means increasing the
max-partition-size is an effective method to
reduce the memory requirements for the Meta Index.
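The inverse relationship can be made concrete with a short Python sketch. It assumes one sketch per partition, as described above, and that all partitions except possibly the last are filled up to max-partition-size. The function name and the event count are illustrative only.

```python
def sketch_count(total_events: int, max_partition_size: int) -> int:
    """Number of partitions, and thus sketches, needed for `total_events`.

    Hypothetical helper: assumes partitions are filled to the maximum size.
    """
    if max_partition_size <= 0:
        raise ValueError("max_partition_size must be positive")
    # Ceiling division without floating point.
    return -(-total_events // max_partition_size)


MI = 1024 * 1024
total = 1024 * MI  # illustrative number of stored events

small_partitions = sketch_count(total, 1 * MI)
large_partitions = sketch_count(total, 8 * MI)
print(small_partitions, large_partitions)
```

Increasing the partition size by a factor of eight reduces the number of sketches by the same factor. Because sketches grow sub-linearly, the fewer and larger sketches need less memory in total.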
A rough formula for estimating the memory requirements takes into account the configuration, the input data rate, and the number of unique IP address and string values in the input data stream.
sketch-factor depends on the rate of unique string or address values in
the input data stream and the
meta-index-fp-rate option. This has been
measured to 2.0% at a false positive rate of
avg-partition-disk-size is in turn a function of the input data and the
max-partition-size config option. It has been measured at
about 163 MiB for a typical Suricata eve.log.
avg-partition-mem-size can be conservatively calculated as twice
the avg-partition-disk-size. Multiplying it by
max-resident-partitions + 1 from the configuration file gives the maximum
memory occupancy of the partitions.
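The terms described above can be combined in a small Python sketch. The partition term follows the description directly. The sketch term is an assumption: it treats sketch-factor as the fraction of the on-disk data volume that the meta index occupies in memory. The function names and the max-resident-partitions value of 10 are hypothetical and chosen only for illustration.

```python
def partition_occupancy_mib(avg_partition_disk_mib: float,
                            max_resident_partitions: int) -> float:
    """Maximum memory held by resident partitions, in MiB.

    Uses the conservative rule from the text: in-memory size is twice the
    on-disk size, multiplied by max-resident-partitions + 1.
    """
    avg_partition_mem_mib = 2 * avg_partition_disk_mib
    return avg_partition_mem_mib * (max_resident_partitions + 1)


def sketch_memory_gib(sketch_factor: float, disk_budget_gib: float) -> float:
    """ASSUMPTION: sketch memory is sketch_factor times the data on disk."""
    return sketch_factor * disk_budget_gib


# 163 MiB is the measured value quoted above; 10 resident partitions is a
# hypothetical setting, not a documented default.
partitions_mib = partition_occupancy_mib(163, max_resident_partitions=10)

# 2.0% sketch factor and a 2000 GiB disk budget, under the assumption above.
sketches_gib = sketch_memory_gib(0.02, 2000)

total_gib = sketches_gib + partitions_mib / 1024
print(f"partitions: {partitions_mib:.0f} MiB, sketches: {sketches_gib:.1f} GiB, "
      f"total: {total_gib:.2f} GiB")
```

Treat the result as an order-of-magnitude estimate. The actual values should be measured for a particular deployment.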
Putting in the measured example values and a
disk-budget-high setting of
2000 GiB we can calculate the estimated memory consumption to
As you can see, depending on the disk budget and the entropy in the data the
sketches that make up the meta index can quickly become the largest contributor
to the memory requirement. Increasing the
max-partition-size from the default
1 Mib to 8 Mib reduces the sketch factor to
0.016 for our sample data, if we
2 for a fair comparison, we get the
following new composition:
The stop command gracefully brings down a VAST server that has been started
It is also possible to send a signal
SIGINT(2) to the
vast process instead of using
vast stop, but this only works on the same machine that runs the
server process. We recommend using
vast stop, as it also works over the wire.
The stop command blocks until the server process has terminated, and returns
a zero exit code upon success, making it suitable for use in launch system
The configuration option
vast.shutdown-grace-period sets the time to wait
until component shutdown finishes cleanly before inducing a hard kill.
The server waits for ongoing import processes to terminate before shutting itself down. If an import process hangs, you can always terminate it manually to let the server shut down.
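Because the stop command blocks and reports success through its exit code, it can be wrapped in a script. The following Python sketch shows one way to do that. The helper name `stop_server` and its parameters are hypothetical, and the sketch assumes a vast binary on the PATH that reaches the server without further arguments.

```python
import subprocess
from typing import Optional, Sequence


def stop_server(command: Sequence[str] = ("vast", "stop"),
                timeout: Optional[float] = None) -> bool:
    """Run the stop command and report whether it exited with code 0.

    Hypothetical helper. Raises FileNotFoundError if the binary is missing
    and subprocess.TimeoutExpired if `timeout` seconds pass first.
    """
    # The call blocks until the command returns, mirroring the blocking
    # behavior of the stop command itself.
    result = subprocess.run(list(command), timeout=timeout, check=False)
    return result.returncode == 0


if __name__ == "__main__":
    try:
        print("server stopped" if stop_server() else "stop command failed")
    except FileNotFoundError:
        print("vast binary not found on PATH")
```

A launch script can use the return value to decide whether it is safe to proceed, for example before restarting the server.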
The VAST server writes log files into a file named
server.log in the database
directory by default. Set the option
vast.log-file to change the location of
the log file.
VAST client processes do not write logs by default. Set the option
vast.client-log-file to enable logging. Note that relative paths are
interpreted relative to the current working directory of the client process.
Server log files rotate automatically after 10 MiB. The option
vast.disable-log-rotation disables log rotation entirely, and the option
vast.log-rotation-threshold sets the size limit at which a log file should
be rotated.
VAST processes log messages in a dedicated thread, which buffers up to 1M
messages by default. The option
vast.log-queue-size controls this setting.