A schema is a collection of type definitions that describe the structure of events. More generally, schemas include type aliases that make it easier to create semantic types, e.g., to describe domains, URLs, or hashes, instead of defaulting to a string type.
VAST ships with type definitions for common types and a variety of tools. But you can also write your own schemas and adapt existing types.
VAST uses a Zeek-inspired syntax to define a schema. It consists of whitespace-separated type definitions that have the following form:
X is a new identifier for an existing type T. The type T can be either a type alias or a built-in type definition.
A basic type is the simplest of all types and represents a single value, such as a string or number. For example, you can create a new string type like this:
This defines a type alias with the name domain and the representation string.
An alias is always a refinement of the type on the right-hand side of the assignment. For example, the predicate :domain == "evil.com" queries only domain types, whereas :string == "evil.com" includes domains as well.
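The refinement relationship can be illustrated with a small Python model. This is only a sketch of the semantics described above, not VAST's implementation: the ALIASES table, the helper names, and the sample event are hypothetical.

```python
# Toy model of alias refinement; not VAST's implementation.
# Each alias maps to the type it refines; built-in types have no entry.
ALIASES = {"domain": "string"}  # models the alias domain over string


def is_refinement_of(type_name, target):
    """Return True if type_name equals target or refines it via aliases."""
    current = type_name
    while current is not None:
        if current == target:
            return True
        current = ALIASES.get(current)
    return False


def match_type_query(fields, target, value):
    """Return the names of fields that match :target == value."""
    return [
        name
        for name, (type_name, field_value) in fields.items()
        if is_refinement_of(type_name, target) and field_value == value
    ]


# Hypothetical event with one domain-typed field and one plain string field.
event = {
    "query": ("domain", "evil.com"),
    "note": ("string", "evil.com"),
}

print(match_type_query(event, "domain", "evil.com"))  # ['query']
print(match_type_query(event, "string", "evil.com"))  # ['query', 'note']
```

The :domain query sees only the refined field, while the :string query also reaches it through the alias chain.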
Any type can be augmented with attributes, which are a list of key-value pairs that convey additional type semantics or details on how VAST should treat the data.
For example, we could write our above alias as follows:
In this case, VAST would create a more space-efficient index for domain that only supports equality queries.
The list<T> type is a container type that holds a variable number of values. It corresponds to a typed JSON array. For example, list<string> represents a list of string values with 0 or more entries.
The map<K, V> type is effectively a list of key-value pairs with fixed key type K and value type V.
The record type represents named tuples with 1 or more fields. It corresponds to a typed JSON object. For example, a log event may look as follows:
This example contains two records: log is a type alias, and content is an anonymous record inside log. "Anonymous" means that the scope is local to the log record, requiring explicit field reference in queries, e.g., log.content.msg == "foo".
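To see how nested records lead to qualified field names such as log.content.msg, consider the following Python sketch. It models a record as a nested dictionary. Only the names log, content, and msg come from the text above; the ts and id fields, all field types, and the helper function are assumptions for illustration.

```python
def qualified_fields(record_name, fields):
    """Yield (qualified_name, type_name) for every leaf field of a record."""
    for name, field_type in fields.items():
        path = f"{record_name}.{name}"
        if isinstance(field_type, dict):  # nested (anonymous) record
            yield from qualified_fields(path, field_type)
        else:
            yield path, field_type


# Hypothetical layout of the log record; ts, id, and the types are made up.
log = {"ts": "time", "content": {"id": "count", "msg": "string"}}

for name, type_name in qualified_fields("log", log):
    print(name, type_name)
```

Fields of the anonymous record are reachable only through their enclosing record, which is why a query has to spell out the full path.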
It is also possible to extract the anonymous record, splitting the definition into two types:
In VAST's data model, an event is always an instance of a record type alias, because VAST models every batch of data as a table where columns correspond to the record fields and rows to the events. Any record type alias is a valid event type. Using the example from above, the log record definition can be used as an event, as can the log_msg record, but not the local log.content record because it lacks a global type name.
The schema language supports a few operations on record types to make it easier to adapt to the dynamic nature of events. This comes in handy when data sources combine multiple JSON objects into a single event, such as Suricata's EVE JSON output.
There exist 3 operators to combine records:
+: concatenate the fields of two records
<+: like +, but prefer the left record for duplicate fields
+>: like +, but prefer the right record for duplicate fields
Here is an example:
alert record type contains the fields
time_event type contains the fields
The <+ operator uses the field definition from the left operand when both operands contain a field of the same name. The timestamp_event type is created with the +> operator, which gives precedence to the right operand.
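The precedence rules of the three operators can be modeled with plain Python dictionaries. This is a sketch of the semantics only, not VAST's implementation. The function names and the sample records are hypothetical, and the behavior of + on duplicate fields (raising an error) is an assumption, since the text does not specify it.

```python
def concat(left, right):
    """Model of +: concatenate fields; duplicates are rejected (assumption)."""
    duplicates = left.keys() & right.keys()
    if duplicates:
        raise ValueError(f"duplicate fields: {sorted(duplicates)}")
    return {**left, **right}


def concat_prefer_left(left, right):
    """Model of <+: on duplicate names, keep the left definition."""
    merged = dict(left)
    for name, field_type in right.items():
        merged.setdefault(name, field_type)
    return merged


def concat_prefer_right(left, right):
    """Model of +>: on duplicate names, keep the right definition."""
    merged = dict(left)
    merged.update(right)
    return merged


# Hypothetical records that share the field name ts with different types.
a = {"ts": "time", "msg": "string"}
b = {"ts": "timestamp", "id": "count"}

print(concat_prefer_left(a, b))   # ts keeps the type from a
print(concat_prefer_right(a, b))  # ts takes the type from b
```

Field order in the merged record is not part of this model; only the choice of definition for duplicate names is.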
Removing Record Fields
Sometimes an existing record definition contains fields that are not relevant. While extra fields can be null without consuming noticeable extra space, the ability to remove fields from existing records makes schema management more convenient.
The - operation removes fields from a record:
bar record contains only the fields
b. Note that empty records
are not allowed, so
c is implicitly removed from
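The following Python sketch models field removal, including the rule that a nested record which becomes empty disappears along with its last field. It is not VAST's implementation and does not reproduce the example above: the record foo, its fields, and the dotted-path notation for naming a nested field are assumptions made for illustration.

```python
import copy


def remove_field(record, path):
    """Return a copy of record without the field at the dotted path.

    Nested records that become empty are removed as well, mirroring the
    rule that empty records are not allowed.
    """
    result = copy.deepcopy(record)
    _remove(result, path.split("."))
    return result


def _remove(record, keys):
    head, rest = keys[0], keys[1:]
    if not isinstance(record, dict) or head not in record:
        raise KeyError(".".join(keys))
    if rest:
        _remove(record[head], rest)
        if not record[head]:  # nested record is now empty: drop it too
            del record[head]
    else:
        del record[head]


# Hypothetical record with a nested record that holds a single field.
foo = {"a": "string", "b": "count", "c": {"d": "string"}}
bar = remove_field(foo, "c.d")
print(bar)  # {'a': 'string', 'b': 'count'}
```

Removing the only field of the nested record also removes the nested record itself, while the original definition stays untouched.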
Type Definition Rules
All defined type names and aliases share one global identifier namespace. Introducing a new type definition or alias adds a symbol to this namespace. The following rules exist to make manipulation of the namespace manageable:
VAST processes all directories of the vast.schema-dirs option in order, creating a union of all type definitions.
Within a specified schema directory, all type definitions must be unique, i.e., no types can have the same name.
Across directories, later definitions can override existing ones from previous directories. This allows users to adapt existing types by providing an alternate definition in a separate schema directory.
Resolving aliases to custom types follows a 2-phase lookup, which makes it possible to use a custom type and define it afterwards in the schema file. The 2-phase lookup only works within a schema directory.
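The first three rules can be sketched in Python as an ordered merge. This is a simplified model, not VAST's implementation: the function name, the directory labels, and the opaque definition strings are hypothetical, and the 2-phase alias lookup is not modeled.

```python
def merge_schema_dirs(schema_dirs):
    """Merge per-directory type definitions in order.

    schema_dirs is an ordered list of (directory, definitions) pairs, where
    definitions is a list of (type_name, definition) pairs found in that
    directory.
    """
    merged = {}
    for directory, definitions in schema_dirs:
        local = {}
        for name, definition in definitions:
            if name in local:  # names must be unique within one directory
                raise ValueError(f"duplicate type {name!r} in {directory}")
            local[name] = definition
        merged.update(local)  # later directories override earlier ones
    return merged


# Hypothetical directories; the definitions are opaque placeholder strings.
dirs = [
    ("bundled", [("domain", "bundled definition"), ("port", "bundled definition")]),
    ("user", [("domain", "user definition")]),
]

print(merge_schema_dirs(dirs))
# {'domain': 'user definition', 'port': 'bundled definition'}
```

Because the user directory comes later, its definition of domain replaces the bundled one, while port is kept unchanged.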
Schema Directory Lookup
VAST ships with type definitions and aliases for common formats, such as Zeek or
Suricata logs. Preinstalled schemas reside in
additional search paths for user-provided schemas can be set in the
vast.yaml by adjusting the
VAST looks at schema directories in the following order:
<datadir>/vast/schema for system-wide schema files bundled with VAST, where <datadir> is the platform-specific directory for data files, e.g.,
<sysconfdir>/vast/schema for system-wide configuration, where <sysconfdir> is the platform-specific directory for configuration files, e.g.,
~/.config/vast/schema for user-specific configuration. VAST respects the XDG base directory specification and its environment variables.
An ordered, comma-separated list of directories passed using --schema-dirs=path/to/schemas on the command line. This corresponds to the option
We recommend against changing schema files in
as this can break updates to VAST. If you need to adapt built-in types, you can modify them in your own schema directory with the help of type operations. For example:
VAST processes all directories recursively. This means you are free to split the content over a directory structure of your choice.
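A recursive traversal of this kind can be sketched in Python as follows. This is an illustration, not VAST's loader: the function name is hypothetical, and the .schema file suffix as well as the sorted result order are assumptions.

```python
import os
import tempfile


def find_schema_files(root, suffix=".schema"):
    """Return all files below root that end with suffix, in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)


# Demo with a throwaway directory tree; all file names are made up.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "network", "custom"))
    for rel in ("base.schema",
                os.path.join("network", "dns.schema"),
                os.path.join("network", "custom", "proxy.schema"),
                os.path.join("network", "README.txt")):
        open(os.path.join(root, rel), "w").close()
    for path in find_schema_files(root):
        print(os.path.relpath(path, root))
```

Files at any depth are picked up, so a schema directory can be organized into subdirectories freely.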
Import Type Filtering
For the following reasons, users may want to restrict the types considered when importing data:
Resolve ambiguity when there exists no 1-to-1 mapping from parsed data to type, and the type must be inferred.
Discard parsed data that does not match the list of restricted types.
Improve performance: VAST does not have to consider all possible types when all incoming data is of a single type, and the filter restricts the list of known types to one type only.
The import command filters known types by prefix when the --type option is specified. For example, vast import --type=sysmon json only considers types whose names begin with sysmon.
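Prefix filtering itself is simple to model. The Python sketch below shows the idea. It is not VAST's implementation, the function name is hypothetical, and the type names in the list are made up for illustration.

```python
def filter_types(known_types, prefix):
    """Return the known type names that start with the given prefix."""
    return [name for name in known_types if name.startswith(prefix)]


# Hypothetical type names for illustration.
known = [
    "sysmon.ProcessCreation",
    "sysmon.NetworkConnection",
    "suricata.alert",
    "zeek.conn",
]

print(filter_types(known, "sysmon"))
# ['sysmon.ProcessCreation', 'sysmon.NetworkConnection']
```

A more specific prefix narrows the candidate list further, down to a single type if the full name is given.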