Tuning

Batching: Table Slices

VAST processes events in batches. Because the event data has the shape of a table, we call the batches table slices. The following options control their parameters.

Encoding

The option vast.import.batch-encoding selects the encoding of table slices. Available values are:

  • msgpack (row-based)
  • arrow (column-based)

The default encoding for table slices is arrow, which is designed for fast columnar access and enables direct export as Apache Arrow without re-encoding.

MessagePack is a space-efficient binary serialization format. We recommend it especially for binary data, such as PCAP or NetFlow/IPFIX with varying layouts, when exporting to Arrow is not required. MessagePack uses less disk space than Arrow, but the exact savings depend on the type of data being encoded.
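As a minimal sketch, the encoding could be selected in a YAML configuration file as shown here. The file layout is an assumption: it presumes that nested keys mirror the dotted option name vast.import.batch-encoding.

```yaml
# Sketch only: assumes nested YAML keys map to the dotted option name
# vast.import.batch-encoding.
vast:
  import:
    # Row-based encoding; set to "arrow" for the column-based default.
    batch-encoding: msgpack
```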

Size

Most components in VAST operate on table slices, which makes the table slice size a fundamental tuning knob on the spectrum of throughput and latency. Small table slices take less time to process individually, resulting in more scheduler context switches and a more balanced workload. However, the increased pressure on the scheduler comes at the cost of throughput. A large table slice size allows actors to spend more time processing a block of data, but makes them yield less frequently to the scheduler. As a result, other actors scheduled on the same thread may have to wait a little longer.

The option vast.import.batch-size sets an upper bound for the number of events per table slice.

The option merely controls the number of events per table slice, but not necessarily the number of events until a component forwards a batch to the next stage in a stream. The CAF streaming framework uses a credit-based flow-control mechanism to determine the buffering of table slices.
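A hedged example of bounding the table slice size in a YAML configuration file follows. The value 1024 is purely illustrative and not a recommendation, and the nested key layout is an assumption that mirrors the dotted option name.

```yaml
# Sketch only: the value below is illustrative, not a tuned recommendation.
vast:
  import:
    # Upper bound for the number of events per table slice.
    # Smaller values favor latency; larger values favor throughput.
    batch-size: 1024
```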

tip

Setting vast.import.batch-size to 0 causes the table slice size to be unbounded and leaves it to other parameters to determine the actual table slice size.

Import Timeout

The vast.import.batch-timeout option sets a timeout for forwarding buffered table slices to the importer. If the timeout fires before a table slice reaches vast.import.batch-size, then the table slice will contain fewer events and ship immediately.
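The interplay between the size bound and the timeout can be sketched in a YAML configuration file. Both values are illustrative, and the duration syntax ("1s") as well as the nested key layout are assumptions rather than confirmed defaults.

```yaml
# Sketch only: illustrative values; the duration syntax is an assumption.
vast:
  import:
    # A table slice ships once it holds this many events ...
    batch-size: 1024
    # ... or once this much time has passed, whichever happens first.
    # In the timeout case, the slice contains fewer events.
    batch-timeout: 1s
```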

Shutdown

The stop command gracefully brings down a VAST server that has been started with the start command.

It is also possible to send the signal SIGINT(2) to the vast process instead of using vast stop, but this only works on the same machine that runs the server process. We recommend using vast stop, as it also works over the wire.

The stop command blocks until the server process has terminated, and returns a zero exit code upon success, making it suitable for use in launch system scripts.

The configuration option vast.shutdown-grace-period sets the time to wait until component shutdown finishes cleanly before inducing a hard kill.
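For illustration, the grace period could be set in a YAML configuration file like this. The value and its duration syntax ("3m") are assumptions made for the sake of the example, as is the nested key layout.

```yaml
# Sketch only: the value and duration syntax are assumptions.
vast:
  # Time to wait for components to shut down cleanly before a hard kill.
  shutdown-grace-period: 3m
```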

note

The server waits for ongoing import processes to terminate before shutting itself down. If an import process hangs, you can always terminate it manually to shut down the server.