System Structure

Using the actor model, VAST represents its individual components as actors. The following figure highlights the top-level actors:

Components as actors

The actor model allows for flexible modes of distribution because actors can be replicated and moved across process boundaries. In its most basic form, VAST spawns one server process that contains all core actors managing the persistent state, i.e., the archive and the index. This process spawns exactly one "container" actor that we call a node:

Actors in processes

Implementation Detail

The node exposes itself on the network at a TCP endpoint. In CAF, every actor can bind to a socket and accept remote connections. This allows other processes to establish a direct connection to an individual actor and thereafter interact with it by sending and receiving messages. When multiple remote connections exist between two processes, CAF multiplexes the messages over a single TCP socket.

A client process can now spawn its own actors, connect to the remote node, and request actor handles to communicate with specific components. For example, when importing data, a client process spawns a source actor, which sends its parsed data to the importer in the remote node. Likewise, a one-shot query on the command line spawns a new process with a sink actor that holds a reference to an exporter actor in the remote node.
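The handle lookup described above can be illustrated with a small, self-contained sketch. It does not use CAF or any VAST code: the `mailbox`, `handle`, `node`, `run_source`, and `import_demo` names are hypothetical stand-ins chosen for this example, and everything runs in a single process with no networking. The only point is the shape of the interaction: a container owns the components, a client asks it for a handle by name, and then sends messages through that handle.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an actor: just a mailbox of string messages.
struct mailbox {
  std::deque<std::string> messages;
};

// Hypothetical stand-in for an actor handle: a shared reference to a mailbox.
using handle = std::shared_ptr<mailbox>;

// Toy "node": a container that owns components and hands out handles by name.
class node {
public:
  // Registers a new component under the given label and returns its handle.
  handle spawn(const std::string& label) {
    auto h = std::make_shared<mailbox>();
    components_[label] = h;
    return h;
  }

  // Returns the handle for a label, or nullptr if no such component exists.
  handle lookup(const std::string& label) const {
    auto i = components_.find(label);
    return i == components_.end() ? nullptr : i->second;
  }

private:
  std::map<std::string, handle> components_;
};

// Toy "source": forwards two parsed events to the importer it was given.
void run_source(const handle& importer) {
  importer->messages.push_back("event 1");
  importer->messages.push_back("event 2");
}

// Wires a client-side source to the importer obtained from the node and
// returns the number of events that arrived in the importer's mailbox.
std::size_t import_demo() {
  node n;
  n.spawn("importer");
  n.spawn("exporter");
  auto importer = n.lookup("importer");
  assert(importer != nullptr);
  run_source(importer);
  return importer->messages.size();
}
```

Calling `import_demo()` from a `main` function exercises the whole path. In a real deployment the lookup and the message sends would cross a process boundary, which this sketch deliberately leaves out.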

Because using multiple processes incurs extra I/O costs as well as CPU overhead for serialization, a node can also accommodate source actors so that sending messages results in mere pointer passing. For example, when a source acquires a high-volume stream of packets, it is important to keep overhead minimal to guarantee the needed event bandwidth. The figure below shows this form of actor deployment:

Actors in processes
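To make the cost difference between the two deployments concrete, the following sketch contrasts same-process delivery with a simulated cross-process delivery. It is an illustration under stated assumptions, not CAF's or VAST's actual mechanism: `batch`, `send_local`, `send_remote`, `serialize`, and `deserialize` are hypothetical names, and the newline-delimited wire format is invented purely for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical batch of parsed events; not VAST's actual data type.
using batch = std::vector<std::string>;

// Same-process delivery: the receiver gets another reference to the same
// heap object, so no event data is copied.
std::shared_ptr<const batch>
send_local(const std::shared_ptr<const batch>& msg) {
  return msg;
}

// Flattens a batch into a byte string, one event per line.
// Assumption: events contain no newline characters.
std::string serialize(const batch& b) {
  std::string wire;
  for (const auto& event : b) {
    wire += event;
    wire += '\n';
  }
  return wire;
}

// Rebuilds a batch from the newline-delimited byte string.
batch deserialize(const std::string& wire) {
  batch result;
  std::string current;
  for (char c : wire) {
    if (c == '\n') {
      result.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  return result;
}

// Simulated cross-process delivery: the batch is serialized and then
// rebuilt, so the receiver ends up with a separate copy of the data.
std::shared_ptr<const batch>
send_remote(const std::shared_ptr<const batch>& msg) {
  assert(msg != nullptr);
  return std::make_shared<batch>(deserialize(serialize(*msg)));
}
```

With `send_local`, sender and receiver share one object, which corresponds to the pointer passing mentioned above. With `send_remote`, every event is traversed and copied twice, which is the per-message CPU cost that a source running inside the node avoids.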