Schemas

A schema is a collection of type definitions that describe the structure of events. More generally, schemas include type aliases that make it easier to create semantic types, e.g., to describe domains, URLs, or hashes, instead of defaulting to a string type.

VAST ships with type definitions for common types and a variety of tools. But you can also write your own schemas and adapt existing types.

Syntax

VAST uses a Zeek-inspired syntax to define a schema. It consists of whitespace-separated type definitions that have the following form:

type X = T

where X is a new identifier for an existing type T. The type T can either be a type alias or a built-in type definition according to the type system.

A basic type is the simplest of all types and represents a single value, such as a string or number. For example, you can create a new string type like this:

type domain = string

This defines a type alias with name domain and the representation string. An alias is always a refinement of the type on the right-hand side of the assignment. For example, the predicate :domain == "evil.com" matches only fields of type domain, whereas :string == "evil.com" matches plain strings and domains alike.
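The refinement rule can be pictured as a walk up the alias chain. The following Python sketch only illustrates the matching rule described above; it is not VAST's implementation, and the ALIASES table and the matches_type_extractor helper are hypothetical names introduced for this example.

```python
# Hypothetical alias table: each alias maps to the type on its right-hand side.
ALIASES = {"domain": "string", "url": "string"}


def matches_type_extractor(field_type: str, queried: str) -> bool:
    """Return True if a field of `field_type` is selected by the extractor `:queried`."""
    current = field_type
    while True:
        if current == queried:
            return True
        if current not in ALIASES:
            return False  # reached a built-in type without finding a match
        current = ALIASES[current]  # step to the type the alias refines


print(matches_type_extractor("domain", "domain"))  # True
print(matches_type_extractor("domain", "string"))  # True
print(matches_type_extractor("string", "domain"))  # False
```

A field of type domain is selected by both extractors, while a plain string field is selected only by :string.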

Attributes

Any type can be augmented with attributes, which are a list of key-value pairs that convey additional type semantics or details on how VAST should treat the data.

For example, we could write our above alias as follows:

type domain = string #index=hash

In this case, VAST would create a more space-efficient index for domain that only supports equality queries.

Containers

The list<T> type is a container type that contains a variable number of values. It corresponds to a typed JSON array. For example, list<string> represents a list of string values with 0 or more entries.

The map<K, V> type is effectively a list of key-value pairs with fixed key type K and value type V.

Records

The record type represents named tuples with 1 or more fields. It corresponds to a typed JSON object. For example, a log event may look as follows:

type log = record {
  source: addr,
  content: record {
    severity: int,
    msg: string,
  },
}

This example contains two records: log is a type alias and log.content an anonymous record inside log. "Anonymous" means that the scope is local to the log record, requiring explicit field reference in queries, e.g., log.content.msg == "foo".
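To make the dotted field references concrete, the sketch below flattens a nested record into fully qualified field names. Modeling records as Python dictionaries is an assumption made for illustration, and field_paths is a hypothetical helper, not part of VAST.

```python
def field_paths(name, record):
    """Yield (dotted path, type) pairs for every leaf field of a nested record."""
    for field, typ in record.items():
        path = f"{name}.{field}"
        if isinstance(typ, dict):  # nested (anonymous) record: descend into it
            yield from field_paths(path, typ)
        else:
            yield path, typ


# The log record from above, with type names kept as plain strings.
log = {"source": "addr", "content": {"severity": "int", "msg": "string"}}

for path, typ in field_paths("log", log):
    print(path, typ)
```

The leaf field msg is reachable only through its full path log.content.msg, because the nested record has no name of its own.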

It is also possible to extract the anonymous record into a named type of its own, splitting the definition into two types:

type log = record {
  source: addr,
  content: log_msg,
}
type log_msg = record {
  severity: int,
  msg: string,
}

Events

In VAST's data model, an event is always an instance of a record type alias, because VAST models every batch of data as a table where columns correspond to the record fields and rows to the event instances.

Consequently, every record type alias is a valid event type. Using the example from above, the log record definition can be used as an event, likewise the log_msg record, but not the local log.content record because it lacks a global type name.

Type Algebra

The schema language supports a few operations on record types to make it easier to adapt to the dynamic nature of events. This comes in handy when data sources combine multiple JSON objects into a single event, such as Suricata's EVE JSON output.

Composing Records

Three operators combine records:

  • +: concatenate the fields of two records
  • <+: like + but prefer the left record for duplicate fields
  • +>: like + but prefer the right record for duplicate fields

Here is an example:

type common = record {
  timestamp: time,
  id: string,
}
type alert = common + record {
  message: string,
}
type advanced = record {
  timestamp: timestamp,
  community_id: string,
}
type time_event = common <+ advanced
type timestamp_event = common +> advanced

The alert record type contains the fields timestamp, id, and message. The time_event type contains the fields timestamp, id, and community_id. Because the <+ operator uses the field definition from the left operand when both operands contain a field of the same name, its timestamp field has type time. The timestamp_event type is created with the +> operator, which gives precedence to the right operand instead, so its timestamp field has type timestamp.
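The three operators can be modeled as dictionary merges. This Python sketch is an illustration under stated assumptions, not VAST's implementation: records are plain dictionaries from field name to type name, the function names are hypothetical, and raising an error when + meets a duplicate field is an assumption, since the text above does not say what happens in that case.

```python
def concat(left, right):
    """Model of `+`: concatenate fields (duplicates rejected; this is an assumption)."""
    duplicates = left.keys() & right.keys()
    if duplicates:
        raise ValueError(f"duplicate fields: {sorted(duplicates)}")
    return {**left, **right}


def prefer_left(left, right):
    """Model of `<+`: keep the left definition for duplicate fields."""
    merged = dict(left)
    for name, typ in right.items():
        merged.setdefault(name, typ)  # only add fields the left side lacks
    return merged


def prefer_right(left, right):
    """Model of `+>`: keep the right definition for duplicate fields."""
    merged = dict(left)
    merged.update(right)  # right-hand definitions overwrite left-hand ones
    return merged


common = {"timestamp": "time", "id": "string"}
advanced = {"timestamp": "timestamp", "community_id": "string"}

alert = concat(common, {"message": "string"})
time_event = prefer_left(common, advanced)
timestamp_event = prefer_right(common, advanced)

print(time_event["timestamp"])       # time
print(timestamp_event["timestamp"])  # timestamp
```

Both time_event and timestamp_event end up with the fields timestamp, id, and community_id; they differ only in which definition of timestamp survives.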

Removing Record Fields

Sometimes an existing record definition contains fields that are not relevant. While extra fields can be null without consuming noticeable extra space, the ability to remove fields from existing records makes schema management more convenient.

The - operation removes fields from a record:

type foo = record {
  a: count,
  b: real,
  c: record {
    d: string
  },
}
type bar = foo - c.d

The bar record contains only the fields a and b. Note that empty records are not allowed, so c is implicitly removed from bar.
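The implicit removal of emptied records can be sketched as follows. This is a model of the described behavior, not VAST's implementation; records are represented as nested dictionaries, and remove_field is a hypothetical helper.

```python
import copy


def remove_field(record, path):
    """Return a copy of `record` without the field at dotted `path`.

    Nested records that become empty are removed as well.
    """
    result = copy.deepcopy(record)  # leave the original definition untouched
    _remove(result, path.split("."))
    return result


def _remove(record, keys):
    head, rest = keys[0], keys[1:]
    if head not in record:
        raise KeyError(head)
    if not rest:
        del record[head]
        return
    _remove(record[head], rest)
    if not record[head]:  # the nested record is now empty, so drop it too
        del record[head]


foo = {"a": "count", "b": "real", "c": {"d": "string"}}
bar = remove_field(foo, "c.d")
print(bar)  # {'a': 'count', 'b': 'real'}
```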

Type Definition Rules

All defined type names and aliases share one global identifier namespace. Introducing a new type definition or alias adds a symbol to this namespace. The following rules exist to make manipulation of the namespace manageable:

  • VAST processes all directories of the vast.schema-dirs option in order, creating a union of all type definitions.

  • Within a specified schema directory, all type definitions must be unique, i.e., no types can have the same name.

  • Across directories, later definitions can override existing ones from previous directories. This allows users to adapt existing types by providing an alternate definition in a separate schema directory.

  • Resolving aliases to custom types follows a 2-phase lookup, which makes it possible to use a custom type and define it afterwards in the schema file. The 2-phase lookup only works within a schema directory.
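The first three rules can be sketched as a merge over an ordered list of directories. The Python below is a simplified model, not VAST's implementation: each directory is represented as a list of (name, definition) pairs rather than files on disk, merge_schema_dirs is a hypothetical name, and the 2-phase lookup is not modeled.

```python
def merge_schema_dirs(directories):
    """Merge type definitions from schema directories, processed in order.

    `directories` holds one list of (name, definition) pairs per directory.
    """
    merged = {}
    for definitions in directories:
        local = {}
        for name, definition in definitions:
            if name in local:
                # Names must be unique within a single schema directory.
                raise ValueError(f"duplicate type {name!r} in one schema directory")
            local[name] = definition
        merged.update(local)  # later directories override earlier ones
    return merged


bundled = [("domain", "string"), ("url", "string")]
user = [("domain", "string #index=hash")]

types = merge_schema_dirs([bundled, user])
print(types["domain"])  # string #index=hash
print(types["url"])     # string
```

The user directory replaces the bundled definition of domain, while url is taken over unchanged.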

Schema Directory Lookup

VAST ships with type definitions and aliases for common formats, such as Zeek or Suricata logs. Preinstalled schemas reside in <datadir>/vast/schema, and additional search paths for user-provided schemas can be set in the configuration file vast.yaml by adjusting the vast.schema-dirs option.

VAST looks at schema directories in the following order:

  1. <datadir>/vast/schema for system-wide schema files bundled with VAST, where <datadir> is the platform-specific directory for data files, e.g., /usr/share.

  2. <sysconfdir>/vast/schema for system-wide configuration, where <sysconfdir> is the platform-specific directory for configuration files, e.g., /etc.

  3. ~/.config/vast/schema for user-specific configuration. VAST respects the XDG base directory specification and its environment variables.

  4. An ordered, comma-separated list of directories passed using --schema-dirs=path/to/schemas on the command line. This corresponds to the option vast.schema-dirs.
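The lookup order above can be sketched as a function that assembles the search path. This is an illustration only: schema_search_path is a hypothetical helper, and resolving the user directory through XDG_CONFIG_HOME with a fallback to ~/.config follows the XDG base directory specification, which is assumed here to match how VAST locates it.

```python
import os
from pathlib import Path


def schema_search_path(datadir, sysconfdir, cli_dirs="", env=None):
    """Return schema directories in the order they are processed."""
    env = os.environ if env is None else env
    # XDG rule: use $XDG_CONFIG_HOME if set and non-empty, else ~/.config.
    config_home = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    dirs = [
        Path(datadir) / "vast" / "schema",      # 1. bundled schemas
        Path(sysconfdir) / "vast" / "schema",   # 2. system-wide configuration
        Path(config_home) / "vast" / "schema",  # 3. user-specific configuration
    ]
    # 4. comma-separated list as passed via --schema-dirs, in the given order
    dirs += [Path(d) for d in cli_dirs.split(",") if d]
    return dirs


example_env = {"XDG_CONFIG_HOME": "/home/alice/.config"}  # hypothetical user
for directory in schema_search_path("/usr/share", "/etc", "extra/a,extra/b", env=example_env):
    print(directory)
```

Because later directories are processed last, definitions from the command-line directories take precedence over all others.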

We recommend against changing schema files in <datadir>/vast/schema, as this can break updates to VAST. If you need to adapt built-in types, modify them in your own schema directory with the help of type operations. For example:

type suricata.alert = suricata.alert + record {
  _custom: string
}

Note: VAST processes all directories recursively. This means you are free to split the content over a directory structure of your choice.

Import Type Filtering

For the following reasons, users may want to restrict the types considered when importing data:

  1. Resolve ambiguity when there exists no 1-to-1 mapping from parsed data to type, and the type must be inferred.

  2. Discard parsed data that does not match the list of restricted types.

  3. Improve performance: VAST does not have to consider all possible types when all incoming data is of a single type, and the filter restricts the list of known types to one type only.

The import command filters known types by prefix when the --type=<filter> option is specified. For example, vast import --type=sysmon json only considers types whose name begins with sysmon.
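Prefix filtering amounts to a simple string test on type names. The sketch below illustrates the idea; filter_types is a hypothetical helper, and the sysmon and zeek type names are made up for the example.

```python
def filter_types(known_types, prefix):
    """Keep only the type names that begin with `prefix`."""
    return [name for name in known_types if name.startswith(prefix)]


# Hypothetical list of known type names.
known = [
    "sysmon.ProcessCreation",
    "sysmon.NetworkConnection",
    "suricata.alert",
    "zeek.conn",
]

print(filter_types(known, "sysmon"))
# ['sysmon.ProcessCreation', 'sysmon.NetworkConnection']
```

A more specific prefix narrows the candidates further, down to a single type if needed.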