VAST uses the actor model to structure the system into individual components. This results in a modular architecture where a separate runtime maps the application logic onto OS processes or remote actors in the network. The actor model simplifies the design of a distributed system because it allows for easier reasoning about behavior, while providing a lightweight concurrency primitive that scales well from single-machine to distributed deployments.
An actor defines a sequential unit of processing, while all actors conceptually run in parallel. Because actors communicate solely via message passing, data races do not occur by design. As long as the application exhibits enough overdecomposition (i.e., more runnable actors than available CPU cores), there exists enough "work" that the actor runtime can schedule on OS-level threads. The figure below illustrates the separation of application logic, actor runtime, and underlying hardware. The programmer only thinks in actors (circles) and the messages sent between them (arrows), whereas the runtime takes care of scheduling the actor execution.
VAST is written in C++. We evaluated multiple actor model library implementations and found that the C++ Actor Framework (CAF) best suits our needs because of the following unique features:
- Efficient copy-on-write message passing
- A configurable and exchangeable scheduler
- Typed actor interfaces for compile-time message contract checking
VAST has a client-server deployment model where a continuously running server manages the persistent state, and clients send telemetry for ingestion or request data via queries. All server actors run in a node, which is effectively a container actor that organizes other actors. The node exposes a listening TCP socket for clients to connect to. When a client connects to a node, it opens exactly one TCP connection, even though it may interact with multiple remote actors. The actor runtime takes care of multiplexing the message stream over the TCP socket.
The figure below shows the node as grey background container with the major components as circles.
The source and sink actors live outside the node because they typically run in the context of the client process. The next figure illustrates the split of actors across OS-level client and server processes.
The actor runtime transparently serializes messages when they cross process boundaries; within a single process, all actors send messages via efficient pointer passing. In the above example of the import process, the source actor runs in its own process, parses input, converts it to an internal batched representation, and ships the batches to the importer running in the node on the server.