VAST offers a flexible way of distributing its key components over processes, thanks to the actor model. This flexibility accommodates a wide range of possible deployment scenarios, ranging from highly integrated single-process applications to distributed setups.
When ingesting data, there are two options for spinning up sources: either run them as a separate client process via vast import, or spawn them directly inside the server process via vast spawn. We discuss both options below.
By default, starting VAST only spawns the necessary server components, such as archive and index. This works well for ad-hoc scenarios where users perform manual imports, or for low-volume continuous data sources. The VAST client process connects to the server and sends the parsed data over a TCP connection. The setup looks like this:
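As a sketch of this client/server setup, the following commands start a server node and then ship data to it from a separate client process. The endpoint address is an assumption; adjust it to your deployment.

```shell
# Terminal 1: start a VAST server node. The default components,
# such as archive and index, are spawned automatically.
vast start

# Terminal 2: run a client that parses input locally and sends the
# resulting events to the server over TCP. The endpoint and input
# file below are placeholders.
vast -e 127.0.0.1:42000 import zeek < conn.log
```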
The import command implements this form of ingestion. For example, to import a bunch of zipped Zeek logs, you can invoke VAST like this:
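A minimal sketch of such an import, assuming the logs sit as gzipped Zeek TSV files in the current directory:

```shell
# Decompress the Zeek logs and pipe them into a VAST client process,
# which parses them and forwards the events to the server.
# The file glob is a placeholder for your actual log files.
gunzip -c *.log.gz | vast import zeek
```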
The second method of ingesting data involves spawning a source actor directly inside the server process. The advantage is performance: it cuts out any IPC. Sending parsed data from the source is equivalent to passing a pointer in memory. For high-volume formats, like PCAP or NetFlow, avoiding stress on the I/O path can make a major difference.
The spawn and kill commands add and remove components inside a remote VAST node.
For example, to spawn a PCAP source that listens on a network interface, you would do this:
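A sketch of such an invocation follows; the interface name is a placeholder, and the exact option spelling may differ across VAST versions, so consult the help output of the spawn command for your build.

```shell
# Spawn a PCAP source inside the running server process, reading
# packets from a network interface. "en0" is a placeholder; replace
# it with the interface you want to capture on.
vast spawn source pcap -i en0
```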
Most operating systems don't allow unprivileged processes to access network interfaces. Use your system's mechanism to grant the necessary privileges to VAST. On Linux, this can be done with setcap cap_net_raw+ep <path/to/vast>.
Keep in mind that all subsequent invocations of the vast binary will then have these capabilities, independent of the user executing it. That includes import and export commands.
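For reference, granting the capability and verifying it afterwards might look like this; the binary path is a placeholder, and setcap requires root privileges.

```shell
# Grant the raw-socket capability to the VAST binary (requires root).
sudo setcap cap_net_raw+ep /path/to/vast

# Verify that the capability is now attached to the file.
getcap /path/to/vast
```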