Overview

VAST supports a wide range of network telemetry formats, such as IDS logs (Zeek and Suricata) and raw network data (PCAP and NetFlow), along with generic adapters for JSON and CSV. Internally, VAST normalizes all data into a standardized semi-structured data model.

Portable Representation

VAST uses established framing and encoding methods to avoid silo behavior with limited data access:

At the physical level, FlatBuffers defines the file system layout and the message framing on the wire. Inside the framing, data is encoded either row-oriented via MessagePack or column-oriented via Apache Arrow.
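To make the difference between the two orientations concrete, here is a minimal sketch in plain Python. It does not use MessagePack or Apache Arrow and does not reflect VAST's actual encoding; the field names and the helpers `to_columns` and `to_rows` are hypothetical and only illustrate how the same events can be laid out by row or by column.

```python
# Hypothetical events; the field names are illustrative, not a VAST schema.
rows = [
    {"ts": 1.0, "src": "10.0.0.1", "bytes": 512},
    {"ts": 2.0, "src": "10.0.0.2", "bytes": 2048},
    {"ts": 3.0, "src": "10.0.0.1", "bytes": 128},
]


def to_columns(rows):
    """Pivot row dicts into a dict of column lists.

    Assumes every row has the same fields in the same order.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def to_rows(columns):
    """Pivot a dict of equally long column lists back into row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = to_columns(rows)

# A columnar layout lets an aggregate touch a single column
# instead of visiting every field of every row.
total_bytes = sum(columns["bytes"])
```

The row-oriented form keeps each event together, which suits appending and shipping individual events. The column-oriented form keeps each field together, which suits scans and aggregates over one field.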

Tables and Slices

VAST models data as tables that consist of rows and columns. A table slice refers to a horizontal partition of a table. The following figure illustrates the key concepts.

A table row is an event, e.g., a data record, log line, or packet. All data is strongly typed according to VAST's type system. Table slices with the same layout can form a logical table to represent a larger unit of analysis.
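The relationship between tables, slices, and layouts can be sketched as follows. This is an illustrative model only, not VAST's internal representation: the `TableSlice` class, the `partition` and `logical_table` helpers, and the layout shown are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSlice:
    """A horizontal partition of a table (hypothetical model)."""
    layout: tuple  # ordered (column name, type name) pairs
    rows: tuple    # each row is a tuple with one value per column


def partition(layout, rows, slice_size):
    """Cut rows into slices holding at most slice_size rows each."""
    return [
        TableSlice(layout, tuple(rows[i:i + slice_size]))
        for i in range(0, len(rows), slice_size)
    ]


def logical_table(slices):
    """Concatenate slices that share one layout into a single row list."""
    layouts = {s.layout for s in slices}
    if len(layouts) > 1:
        raise ValueError("slices must share the same layout")
    return [row for s in slices for row in s.rows]


# Illustrative layout; the column and type names are assumptions.
layout = (("ts", "time"), ("src", "address"), ("bytes", "count"))
events = [
    (1.0, "10.0.0.1", 512),
    (2.0, "10.0.0.2", 2048),
    (3.0, "10.0.0.1", 128),
]

slices = partition(layout, events, slice_size=2)
```

Here `partition` yields two slices (two rows, then one row), and `logical_table(slices)` reassembles the original three events. Passing slices with different layouts raises an error, mirroring the requirement that only slices with the same layout form a logical table.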

Data Sets

VAST's system architecture is inherently asynchronous because it uses the actor model to implement its various components. The asynchrony has the advantage of delivering exceptionally low query latencies, but comes at the cost of non-determinism in query processing.

For the user, this mainly manifests as a different order of results as they come in. But when requesting specific subsets of a result, e.g., for rendering in a frontend, this can be undesirable. The data set abstraction makes the result of a query deterministic and therefore uniquely representable. Conceptually, a data set is a set of tables with different types.

Note: Data Sets are still under development and only available in an experimental stage.
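The idea behind a data set can be illustrated with a small sketch: results that arrive in arbitrary order are grouped into one table per type and given a stable order, so that requesting a subset always returns the same rows. This is not VAST's implementation; the helpers `make_data_set` and `page`, the choice of sorting rows by value, and the sample events are assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_data_set(results):
    """Group (type name, row) pairs by type and sort the rows of each type.

    Sorting is one hypothetical way to obtain a canonical order.
    """
    tables = defaultdict(list)
    for type_name, row in results:
        tables[type_name].append(row)
    return {name: sorted(tables[name]) for name in sorted(tables)}


def page(data_set, type_name, offset, limit):
    """Return a stable window of rows, e.g., for rendering in a frontend."""
    return data_set[type_name][offset:offset + limit]


# Illustrative results; type names and values are made up for the example.
results = [
    ("zeek.conn", (1.0, "10.0.0.1")),
    ("suricata.alert", (1.5, "10.0.0.9")),
    ("zeek.conn", (2.0, "10.0.0.2")),
    ("zeek.conn", (3.0, "10.0.0.1")),
]

# Simulate asynchronous arrival by shuffling the order of results.
shuffled = results[:]
random.Random(42).shuffle(shuffled)
```

Both `make_data_set(results)` and `make_data_set(shuffled)` produce the same tables, so a call such as `page(make_data_set(shuffled), "zeek.conn", 1, 2)` returns the same two rows regardless of arrival order.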

In the next section, we take a closer look at the different existing types and how to compose them to describe structured data.