Schemas
A schema is a collection of type definitions that describe the structure of events. More generally, schemas include type aliases that make it easier to create semantic types, e.g., to describe domains, URLs, or hashes, instead of defaulting to a string type.
VAST ships with type definitions for common types and a variety of tools. But you can also write your own schemas and adapt existing types.
Syntax
VAST uses a Zeek-inspired syntax to define a schema. A schema consists of whitespace-separated type definitions that have the following form:
where X is a new identifier for an existing type T. The type T can be either a type alias or a built-in type definition according to the type system.
A basic type is the simplest of all types and represents a single value, such as a string or number. For example, you can create a new string type like this:
This defines a type alias with the name domain and the representation string.
An alias is always a refinement of the type on the right-hand side of the assignment. For example, you can query domain types only with the predicate :domain == "evil.com", but :string == "evil.com" will include domains as well.
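The refinement rule can be illustrated with a small Python model. This is only a sketch of the semantics described above, not VAST's implementation; the ALIASES table and the function names are hypothetical.

```python
# Hypothetical alias table: each alias name maps to the type it refines.
ALIASES = {"domain": "string"}


def refines(type_name, target):
    """Return True if type_name is target or an alias that resolves to it."""
    seen = set()
    current = type_name
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)
        current = ALIASES.get(current)  # follow the alias chain, if any
    return False


def matches(predicate_type, field_type, value, operand):
    """Model a type predicate such as :domain == "evil.com"."""
    return refines(field_type, predicate_type) and value == operand
```

In this model, a :string predicate matches a field of type domain because domain resolves to string, while a :domain predicate does not match a plain string field.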
Attributes
Any type can be augmented with attributes, which are a list of key-value pairs that convey additional type semantics or details on how VAST should treat the data.
For example, we could write our above alias as follows:
In this case, VAST would create a more space-efficient index for domain that only supports equality queries.
Containers
The list<T> type is a container type that contains a variable number of values. It corresponds to a typed JSON array. For example, list<string> represents a list of string values with 0 or more entries.
The map<K, V> type is effectively a list of key-value pairs with fixed key type K and value type V.
Records
The record type represents named tuples with 1 or more fields. It corresponds to a typed JSON object. For example, a log event may look as follows:
This example contains two records: log is a type alias and log.content an anonymous record inside log. "Anonymous" means that the scope is local to the log record, requiring explicit field reference in queries, e.g., log.content.msg == "foo".
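To make the field reference log.content.msg concrete, the following Python sketch flattens a nested record into fully qualified field names. The nested dictionary is a hypothetical stand-in for a record type, and only the field named in the text is modeled.

```python
# Hypothetical stand-in for the log record: nested dicts model nested records.
LOG_TYPE = {"content": {"msg": "string"}}


def flatten(record, prefix):
    """Map fully qualified field names to their leaf types."""
    fields = {}
    for name, typ in record.items():
        path = f"{prefix}.{name}"
        if isinstance(typ, dict):
            # An anonymous record has no global name; it is reachable
            # only through the enclosing record's field path.
            fields.update(flatten(typ, path))
        else:
            fields[path] = typ
    return fields


qualified = flatten(LOG_TYPE, "log")
```

Here the only leaf field is log.content.msg, which is why a query has to spell out the full path through the enclosing record.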
It is also possible to extract the anonymous record into a named type of its own, splitting the definition into two types:
Events
In VAST's data model, an event is always an instance of a record type alias, because VAST models every batch of data as a table where columns correspond to the record fields and rows to the event instances.
Consequently, every record type alias is a valid event type. Using the example from above, the log record definition can be used as an event, likewise the log_msg record, but not the local log.content record because it lacks a global type name.
Type Algebra
The schema language supports a few operations on record types to make it easier to adapt to the dynamic nature of events. This comes in handy when data sources combine multiple JSON objects into a single event, such as Suricata's EVE JSON output.
Composing Records
There exist 3 operators to combine records:
+: concatenate the fields of two records
<+: like + but prefer the left record for duplicate fields
+>: like + but prefer the right record for duplicate fields
Here is an example:
The alert record type contains the fields timestamp, id, and message. The time_event type contains the fields timestamp, id, and message. The <+ operator uses the field definition from the left in case both left and right operands contain a field of the same name. The timestamp_event type is created with the +> operator, which gives precedence to the right operand instead.
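The three operators can be modeled in Python with dictionaries that map field names to types. This is a sketch of the semantics described above, not VAST's implementation: the combine function, the base record, and its field types are hypothetical, and treating a duplicate field under plain + as an error is an assumption.

```python
def combine(left, right, prefer=None):
    """Model +, <+ and +> on records represented as name -> type dicts.

    prefer=None    models +   (assumption: duplicate fields are an error)
    prefer="left"  models <+  (keep the left definition of a duplicate)
    prefer="right" models +>  (keep the right definition of a duplicate)
    """
    result = dict(left)
    for name, typ in right.items():
        if name in result:
            if prefer is None:
                raise ValueError(f"duplicate field: {name}")
            if prefer == "left":
                continue  # the left definition wins
        result[name] = typ  # new field, or the right definition wins
    return result


# Hypothetical field types, chosen only for illustration.
base = {"timestamp": "time", "id": "count"}
alert = combine(base, {"message": "string"})
left_wins = combine(base, {"timestamp": "string"}, prefer="left")
right_wins = combine(base, {"timestamp": "string"}, prefer="right")
```

In this model, left_wins keeps timestamp as time, whereas right_wins replaces it with string.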
Removing Record Fields
Sometimes an existing record definition contains fields that are not relevant. While extra fields can be null without consuming noticeable extra space, the ability to remove fields from existing records makes schema management more convenient.
The - operation removes fields from a record:
The bar record contains only the fields a and b. Note that empty records are not allowed, so c is implicitly removed from bar.
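The implicit removal of an emptied record can be sketched in Python as follows. The source record foo and the nested field name d are hypothetical, chosen so that removing the nested field leaves c empty; this models the described behavior and is not VAST's implementation.

```python
def remove_field(record, path):
    """Return a copy of record without the field at the dotted path."""
    head, _, rest = path.partition(".")
    result = {}
    for name, typ in record.items():
        if name != head:
            result[name] = typ
        elif rest:
            if isinstance(typ, dict):
                inner = remove_field(typ, rest)
                if inner:  # a record that became empty is dropped as well
                    result[name] = inner
            else:
                result[name] = typ  # path names no nested field; keep as is
        # name == head without a remaining path: the field itself is removed
    return result


# Hypothetical source record with a nested record c.
foo = {"a": "string", "b": "string", "c": {"d": "string"}}
bar = remove_field(foo, "c.d")
```

Removing c.d leaves c without fields, so c disappears from bar along with it.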
Type Definition Rules
All defined type names and aliases share one global identifier namespace. Introducing a new type definition or alias adds a symbol to this namespace. The following rules exist to make manipulation of the namespace manageable:
VAST processes all directories of the vast.schema-dirs option in order, creating a union of all type definitions.
Within a specified schema directory, all type definitions must be unique, i.e., no types can have the same name.
Across directories, later definitions can override existing ones from previous directories. This allows users to adapt existing types by providing an alternate definition in a separate schema directory.
Resolving aliases to custom types follows a 2-phase lookup, which makes it possible to use a custom type and define it afterwards in the schema file. The 2-phase lookup only works within a schema directory.
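The uniqueness and override rules can be modeled with a short Python sketch. Each directory is represented as a list of (name, definition) pairs; this representation and the function name are hypothetical, and the 2-phase alias lookup is not modeled here.

```python
def merge_schema_dirs(directories):
    """Merge per-directory type definitions in processing order.

    directories is an ordered list; each entry is a list of
    (type name, definition) pairs found in one schema directory.
    """
    merged = {}
    for index, definitions in enumerate(directories):
        local = {}
        for name, definition in definitions:
            if name in local:
                # Names must be unique within a single directory.
                raise ValueError(f"duplicate type {name!r} in directory #{index}")
            local[name] = definition
        # Definitions from later directories override earlier ones.
        merged.update(local)
    return merged


bundled = [("domain", "string"), ("port", "count")]
user = [("port", "string")]
types = merge_schema_dirs([bundled, user])
```

In this example, the user directory overrides port while domain is taken from the bundled directory unchanged.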
Schema Directory Lookup
VAST ships with type definitions and aliases for common formats, such as Zeek or Suricata logs. Preinstalled schemas reside in <datadir>/vast/schema, and additional search paths for user-provided schemas can be set in the configuration file vast.yaml by adjusting the vast.schema-dirs option.
VAST looks at schema directories in the following order:
<datadir>/vast/schema for system-wide schema files bundled with VAST, where <datadir> is the platform-specific directory for data files, e.g., /usr/share.
<sysconfdir>/vast/schema for system-wide configuration, where <sysconfdir> is the platform-specific directory for configuration files, e.g., /etc.
~/.config/vast/schema for user-specific configuration. VAST respects the XDG base directory specification and its environment variables.
An ordered, comma-separated list of directories passed using --schema-dirs=path/to/schemas on the command line. This corresponds to the option vast.schema-dirs.
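The lookup order above can be sketched as a Python function that builds the ordered directory list. The function and its parameters are hypothetical, and the handling of XDG_CONFIG_HOME reflects the XDG base directory specification rather than any confirmed detail of VAST's code.

```python
import os


def schema_dirs(datadir, sysconfdir, cli_dirs="", env=None):
    """Return schema directories in the order described above."""
    env = os.environ if env is None else env
    # Per the XDG specification, fall back to ~/.config when
    # XDG_CONFIG_HOME is unset or empty.
    config_home = env.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")
    dirs = [
        os.path.join(datadir, "vast", "schema"),
        os.path.join(sysconfdir, "vast", "schema"),
        os.path.join(config_home, "vast", "schema"),
    ]
    # Directories given on the command line come last, in the given order.
    dirs += [d for d in cli_dirs.split(",") if d]
    return dirs


ordered = schema_dirs(
    "/usr/share",
    "/etc",
    cli_dirs="path/to/schemas,more/schemas",
    env={"XDG_CONFIG_HOME": "/home/user/.config"},
)
```

Because later directories can override earlier definitions, the command-line directories have the final say in this model.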
We recommend not changing schema files in <datadir>/vast/schema, as this can break updates to VAST. If you need to adapt built-in types, you can modify them in your own schema directory with the help of type operations. For example:
Note: VAST processes all directories recursively. This means you are free to split the content over a directory structure of your choice.
Import Type Filtering
For the following reasons, users may want to restrict the types considered when importing data:
Resolve ambiguity when there exists no 1-to-1 mapping from parsed data to type, and the type must be inferred.
Discard parsed data that does not match the list of restricted types.
Improve performance: VAST does not have to consider all possible types when all incoming data is of a single type, and the filter restricts the list of known types to one type only.
The import command filters known types by prefix when the --type=<filter> option is specified. For example, vast import --type=sysmon json only considers types whose name begins with sysmon.
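Prefix filtering amounts to a simple string comparison, as the following Python sketch shows. The type names are made up for illustration, and the function is a model of the described behavior rather than VAST's code.

```python
def filter_types(known_types, prefix):
    """Keep only the type names that begin with the given prefix."""
    return [name for name in known_types if name.startswith(prefix)]


# Hypothetical type names, for illustration only.
known = ["sysmon.ProcessCreation", "sysmon.NetworkConnection", "zeek.conn"]
selected = filter_types(known, "sysmon")
```

With the filter sysmon, only the first two names remain as candidates for incoming data.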