Data Model

VAST features a type-rich data model to retain as much semantics as possible during import and export.

The data model has a notion of data and type, which together form a value. Data is an instance of a type. We’ll introduce the types first and then show how they relate to data.

Types

The figure below illustrates the type hierarchy:

There exist two major type classes: recursive types, which can contain other types, and basic types, which have a fixed structure. Recursive types are either container types, such as vectors, sets, and maps, or compound types in the form of records.

Basic Types

  • bool: a boolean value
  • int: a 64-bit signed integer
  • count: a 64-bit unsigned integer
  • real: a 64-bit double (IEEE 754)
  • duration: a time duration (nanosecond granularity)
  • time: a time point (nanosecond granularity)
  • string: a fixed-length string optimized for short strings
  • pattern: a regular expression
  • addr: an IPv4 or IPv6 address
  • subnet: an IPv4 or IPv6 subnet
  • port: a transport-layer port

Container Types

There exist three container types:

  • vector<T>: an (ordered) sequence of values where each element has type T.

  • set<T>: a set of (unique) values where each element has type T.

  • map<K, V>: an associative array that maps keys of type K to values of type V.

Compound Types

The record type enables composition of various other types into structures. Records consist of one or more fields, each of which has a name and a type.
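
The classification above can be summarized in a minimal Python sketch. The class names (BasicType, VectorType, SetType, MapType, RecordType) and the is_recursive helper are hypothetical and only mirror the prose; they are not VAST's own (C++) API. The final line builds the record type bar from the Schema section, with the alias foo replaced by its underlying type count.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical Python model of the type hierarchy; not VAST's actual API.
BASIC_TYPE_NAMES = frozenset({
    "bool", "int", "count", "real", "duration", "time",
    "string", "pattern", "addr", "subnet", "port",
})


@dataclass(frozen=True)
class BasicType:
    """A type with a fixed structure, such as count or addr."""
    name: str

    def __post_init__(self) -> None:
        if self.name not in BASIC_TYPE_NAMES:
            raise ValueError(f"unknown basic type: {self.name}")


@dataclass(frozen=True)
class VectorType:
    element: Type


@dataclass(frozen=True)
class SetType:
    element: Type


@dataclass(frozen=True)
class MapType:
    key: Type
    value: Type


@dataclass(frozen=True)
class RecordType:
    """A compound type: an ordered sequence of (name, type) fields."""
    fields: Tuple[Tuple[str, Type], ...]

    def __post_init__(self) -> None:
        if not self.fields:
            raise ValueError("a record needs at least one field")


Type = Union[BasicType, VectorType, SetType, MapType, RecordType]


def is_recursive(t: Type) -> bool:
    """Container and record types can contain other types; basic types cannot."""
    return not isinstance(t, BasicType)


# The record type bar, with the alias foo resolved to count.
bar = RecordType((
    ("x", BasicType("count")),
    ("y", BasicType("real")),
    ("z", BasicType("string")),
))
```

Using frozen dataclasses keeps type objects immutable and hashable, which makes them convenient as dictionary keys, for example when grouping data by type.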

Type Attributes

A type can have additional attributes to enrich its semantics or to specify how VAST should handle instances of it. An attribute is a key plus an optional value, e.g., #foo or #foo="bar".

When you define a new event type whose record contains a field that represents the event timestamp, give this field the #timestamp attribute. Otherwise, VAST uses the current time when ingesting events of this type.
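
The attribute syntax and the timestamp rule can be sketched in Python. This assumes a simplified, hypothetical in-memory form: a dict from each field name to its attribute strings (such as #timestamp or #foo="bar"), and an event as a dict from field name to value. The helpers parse_attribute and event_time are illustrative only; they are not part of VAST.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical attribute syntax: '#key' or '#key="value"'.
_ATTRIBUTE = re.compile(r'#(?P<key>\w+)(?:="(?P<value>[^"]*)")?')


def parse_attribute(text: str) -> tuple[str, Optional[str]]:
    """Split '#foo' into ('foo', None) and '#foo="bar"' into ('foo', 'bar')."""
    match = _ATTRIBUTE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an attribute: {text!r}")
    return match.group("key"), match.group("value")


def event_time(field_attributes: dict[str, list[str]],
               event: dict[str, Any]) -> datetime:
    """Use the value of the field marked #timestamp, else the current time."""
    for name, attributes in field_attributes.items():
        if any(parse_attribute(a)[0] == "timestamp" for a in attributes):
            return event[name]
    # No field carries #timestamp: fall back to the ingestion time.
    return datetime.now(timezone.utc)
```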

Schema

A schema is a collection of types. VAST uses a Zeek-inspired syntax to define a schema. Consider this example:

# A type alias with an attribute.
type foo = count #skip

# A record using the above type alias.
type bar = record {
  x: foo,
  y: real,
  z: string
}

This schema defines two types: a type alias foo with the #skip attribute, and a record type bar with three fields, whose first field x has the previously defined type foo.

Records naturally support nesting:

# A record with a nested record field.
type flow = record {
  timestamp: time,
  id: record {
    src: addr,
    dst: addr
  },
  data: string
}

The flow record type contains a field id that is also a record type.
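
A nested record can be viewed as a flat list of leaf fields. The sketch below assumes the common convention of joining field names along the path with dots (as in id.src) and a hypothetical encoding of record types as Python dicts, where a nested dict stands for a nested record. It enumerates the leaf fields of flow in declaration order.

```python
from __future__ import annotations

from typing import Iterator

# Hypothetical encoding: a record type maps each field name to a basic
# type name (str) or to a nested record type (dict).
FLOW = {
    "timestamp": "time",
    "id": {"src": "addr", "dst": "addr"},
    "data": "string",
}


def leaf_fields(record: dict, prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (dotted_name, basic_type) pairs in field order, depth-first."""
    for name, field_type in record.items():
        path = f"{prefix}{name}"
        if isinstance(field_type, dict):
            # Descend into the nested record and extend the dotted path.
            yield from leaf_fields(field_type, prefix=f"{path}.")
        else:
            yield path, field_type


print(list(leaf_fields(FLOW)))
# [('timestamp', 'time'), ('id.src', 'addr'), ('id.dst', 'addr'), ('data', 'string')]
```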

Data

While types define the representation of information, data represents the concrete instances. For performance reasons, VAST treats types and data separately.

For example, 42 would be an instance of type count and 1.2.3.4 an instance of type addr. An instance of the record type bar defined above would be <7, 4.2, "x">.

Every data instance is optional and can also be nil. For a given type, nil is always a valid data instance, but it’s impossible to deduce a type from the literal nil. For example, both <nil, 4.2, nil> and nil itself are valid instances of the record type bar.
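
These rules can be expressed as a small type checker. The sketch assumes a hypothetical Python encoding: basic types are names such as "count", a record type is a list of (field name, type) pairs, a record instance is a tuple, and None stands for nil. Only a handful of basic types are covered, and the alias foo is written as its underlying type count.

```python
from __future__ import annotations

import ipaddress
from typing import Any

# Hypothetical checks for a few basic types; None stands for nil.
_BASIC_CHECKS = {
    "bool": lambda v: isinstance(v, bool),
    "int": lambda v: type(v) is int and -(2**63) <= v < 2**63,
    "count": lambda v: type(v) is int and 0 <= v < 2**64,
    "real": lambda v: isinstance(v, float),
    "string": lambda v: isinstance(v, str),
    "addr": lambda v: isinstance(v, (ipaddress.IPv4Address, ipaddress.IPv6Address)),
}


def is_instance(value: Any, type_: Any) -> bool:
    """Return True if value is a valid data instance of type_."""
    if value is None:
        # nil is a valid instance of every type.
        return True
    if isinstance(type_, list):
        # Record type: the instance needs exactly one value per field.
        if not isinstance(value, tuple) or len(value) != len(type_):
            return False
        return all(is_instance(v, t) for v, (_, t) in zip(value, type_))
    return _BASIC_CHECKS[type_](value)


# The record type bar, with the alias foo resolved to count.
BAR = [("x", "count"), ("y", "real"), ("z", "string")]
```

Because None passes the check for every type, a nil value on its own cannot identify the type it belongs to.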

Grammar

Data instances occur in query expressions and in the ASCII format. VAST performs type inference based on the string representation of a data literal. The examples below illustrate how to specify data instances:

Basic Data

Type      Data
bool      T, F
int       -42, +0, +8
count     0, 42
real      -4.2, 0.0, 4.2
duration  -3ns, 4 secs, 10h
time      now, now - 7h, 7h ago, 2042-01-01
string    "foo", ""
pattern   /a.*b+c/
addr      1.2.3.4, ::1, 2001:db8::
subnet    10.0.0.0/8, ::1/128, 2001:db8::/32
port      53/udp, 80/tcp, 8/icmp, 1337/?
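
To make the idea of inferring a type from a literal's spelling concrete, here is a rough Python sketch covering a subset of the table: bool, int, count, real, string, addr, subnet, and port. The function infer_basic_type and its rule order are hypothetical; durations, times, and patterns are left out. Following the table's examples, it treats a leading sign as the mark of an int and an unsigned integer as a count.

```python
import ipaddress
import re

# Hypothetical inference rules, tried in order; the first full match wins.
_RULES = [
    ("bool", re.compile(r"[TF]")),
    ("port", re.compile(r"\d+/(tcp|udp|icmp|\?)")),
    ("real", re.compile(r"[+-]?\d+\.\d+")),
    ("int", re.compile(r"[+-]\d+")),    # explicit sign: signed integer
    ("count", re.compile(r"\d+")),      # no sign: unsigned integer
    ("string", re.compile(r'"[^"]*"')),
]


def infer_basic_type(literal: str) -> str:
    """Return the name of the basic type that the literal denotes."""
    for name, pattern in _RULES:
        if pattern.fullmatch(literal):
            return name
    # Fall back to the standard library's IP address and network parsers.
    try:
        if "/" in literal:
            ipaddress.ip_network(literal)
            return "subnet"
        ipaddress.ip_address(literal)
        return "addr"
    except ValueError:
        raise ValueError(f"cannot infer a type for {literal!r}") from None
```

The rule order matters: the real pattern must reject 1.2.3.4 so that the literal reaches the address parser, which is why it requires exactly one decimal point.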

SI Literals

To simplify expressing large integer values, VAST supports literals with suffixes according to the International System of Units (SI). These suffixes apply to the types int and count:

Suffix  10^X  Value
k       3     1 000
M       6     1 000 000
G       9     1 000 000 000
P       15    1 000 000 000 000 000
E       18    1 000 000 000 000 000 000

For example, 42k is equal to 42000, and -9G is equal to -9000000000.

In addition to these powers of 10, there is a parallel set of binary (base-2) suffixes, commonly used to express byte quantities:

Suffix  2^X  Value
Ki      10   1 024
Mi      20   1 048 576
Gi      30   1 073 741 824
Pi      50   1 125 899 906 842 624
Ei      60   1 152 921 504 606 846 976

For example, 8Ki is equal to 8192, and 64Mi is equal to 67108864.

We purposefully chose IEC-style binary suffixes that end in i to avoid the ambiguity of common abbreviations such as “64K”, which colloquially means 64 kilobytes in the base-2 sense (64 × 1024 bytes) but more closely resembles the power-of-10 suffix k.

Container Data

In the following, x, y, and z are data instances of type T, and a, b, and c are instances of type U.

Type       Data
vector<T>  [x, y, z], []
set<T>     {x, y, z}, {}
map<T, U>  {x -> a, y -> b, z -> c}, {-}

Type inference for container instances is tricky, because it’s impossible to infer the element type from an empty container.
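
The limitation shows up in a sketch that derives a container type from the types of its elements. The helpers unify, container_type, and map_type are hypothetical, and they assume each element's type has already been inferred (for example, literal by literal); None signals that the element type is unknown.

```python
from __future__ import annotations

from typing import Optional


def unify(types: list[str]) -> Optional[str]:
    """Return the single type shared by all elements, or None if there are none."""
    distinct = set(types)
    if len(distinct) > 1:
        raise ValueError(f"elements have different types: {sorted(distinct)}")
    return distinct.pop() if distinct else None


def container_type(kind: str, element_types: list[str]) -> Optional[str]:
    """Infer 'vector<T>' or 'set<T>'; None means the type is unknown."""
    element = unify(element_types)
    return None if element is None else f"{kind}<{element}>"


def map_type(entries: list[tuple[str, str]]) -> Optional[str]:
    """Infer 'map<K, V>' from (key type, value type) pairs."""
    key = unify([k for k, _ in entries])
    value = unify([v for _, v in entries])
    return None if key is None else f"map<{key}, {value}>"
```

A caller that gets None back needs the element type from some other source, such as an explicitly declared type.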

Record Data

A record instance lists its field values in field order, enclosed in angle brackets:

Type                                   Data
record {a: count, b: string}           <42, "foo">
record {a: count, b: string}           <nil, nil>
record {x: real, y: record {z: time}}  <4.2, <now>>
record {a: addr, b: port, c: bool}     <1.2.3.4, 22/tcp, T>

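Type inference for records suffers from the same kind of issue as containers: a record literal contains no field names, and a nil field value reveals nothing about the field’s type.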
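
A printer for record literals illustrates both the syntax and what it leaves out. The function to_literal is hypothetical and handles only a few value kinds: None renders as nil, booleans as T and F, tuples as records, and lists as vectors. String escaping is out of scope for this sketch. Because the output lists only values, field names and the types of nil fields cannot be recovered from it.

```python
from __future__ import annotations

from typing import Any


def to_literal(value: Any) -> str:
    """Render a Python value in the data literal syntax shown in the tables."""
    if value is None:
        return "nil"
    if isinstance(value, bool):
        return "T" if value else "F"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # No escaping: strings containing quotes are not handled here.
        return f'"{value}"'
    if isinstance(value, tuple):
        # Records list their field values in order, without field names.
        return "<" + ", ".join(to_literal(v) for v in value) + ">"
    if isinstance(value, list):
        return "[" + ", ".join(to_literal(v) for v in value) + "]"
    raise TypeError(f"unsupported value: {value!r}")
```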

Tables and Table Slices

VAST processes data in batches, and we call each batch a table slice. The layout of a table slice is a record type whose fields describe the columns. The rows of a table slice are instances of that record type. Table slices that have the same layout can form a logical table.
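
The relationship between table slices, layouts, and logical tables can be sketched in Python. TableSlice and logical_tables are hypothetical names; VAST's actual table slices are C++ objects with their own encodings, so this models only the relationships described above. A layout is written as ordered (field name, type name) pairs, one per column.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSlice:
    """Hypothetical table slice: a layout (record type) plus rows."""
    # Ordered (field name, type name) pairs; each pair describes one column.
    layout: tuple[tuple[str, str], ...]
    # Each row is one record instance with one value per column.
    rows: tuple[tuple, ...]

    def __post_init__(self) -> None:
        for row in self.rows:
            if len(row) != len(self.layout):
                raise ValueError("row width does not match the layout")


def logical_tables(slices: list[TableSlice]) -> dict[tuple, list[TableSlice]]:
    """Group slices that share the same layout into one logical table."""
    tables: dict[tuple, list[TableSlice]] = defaultdict(list)
    for table_slice in slices:
        tables[table_slice.layout].append(table_slice)
    return dict(tables)
```

Grouping by the layout itself, rather than by a layout name, means two slices join the same logical table only when their columns match exactly.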