Tenzir's expression language makes it easy to describe a relevant subset of interest over structured data. The "easy" part is that Tenzir expressions operate on multiple different schemas at once, as opposed to traditional expressions that apply to a single, fixed schema. The language captures this heterogeneity with extractors.
Expressions occur in pipeline operators. The where operator is the most prominent example.
An expression is a function over an event that evaluates to true or false, indicating whether the event qualifies as a result. Expression operands are either sub-expressions or predicates, and can be composed via conjunctions (&&), disjunctions (||), and negations (!).
The following diagram shows an example expression in tree form:
When written out, it looks like this:
(dport <= 1024 || :ip in 10.0.0.0/8) && ! (#schema == /zeek.*/)
In this example, the predicate operands dport, :ip, and #schema are extractors that resolve to a set of matching fields at runtime.
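To make the evaluation model concrete, here is a minimal Python sketch of the example expression as a function from an event to true or false. It is not Tenzir's implementation: the flat dictionary event layout, the function name matches, and the choice to match the /zeek.*/ pattern against the whole schema name are all assumptions made for illustration.

```python
import ipaddress
import re


def matches(event: dict, schema: str) -> bool:
    """Evaluate (dport <= 1024 || :ip in 10.0.0.0/8) && ! (#schema == /zeek.*/).

    `event` is a hypothetical flat mapping from field name to value.
    """
    net = ipaddress.ip_network("10.0.0.0/8")
    # Field extractor `dport`: any field named dport whose value is <= 1024.
    low_port = any(
        name.split(".")[-1] == "dport" and isinstance(value, int) and value <= 1024
        for name, value in event.items()
    )
    # Type extractor `:ip`: any field holding an IP address inside the subnet.
    in_subnet = any(
        isinstance(value, ipaddress.IPv4Address) and value in net
        for value in event.values()
    )
    # Meta extractor `#schema`: assumed here to match the full schema name.
    is_zeek = re.fullmatch(r"zeek.*", schema) is not None
    return (low_port or in_subnet) and not is_zeek


event = {"id.orig_h": ipaddress.ip_address("10.1.2.3"), "dport": 8080}
print(matches(event, "suricata.flow"))  # True: the IP is in 10.0.0.0/8
print(matches(event, "zeek.conn"))      # False: the negation rejects zeek schemas
```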
Let's take a look at the expression components in more depth.
There exist three logical connectives that connect sub-expressions:
&&: the logical AND between two expressions
||: the logical OR between two expressions
!: the logical NOT of one expression
A predicate has the form LHS op RHS, where LHS denotes the left-hand side operand and RHS the right-hand side operand. The relational operator op is typed, i.e., only a subset of the cross product of operand types is valid.
The following operators separate two operands:
<: less than
<=: less than or equal
>: greater than
>=: greater than or equal
==: equal to
!=: not equal to
in: in (left to right)
!in: not in (left to right)
ni: in (right to left)
!ni: not in (right to left)
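The direction of the membership operators is easy to mix up, so the following Python sketch models the relational operators as a lookup table of binary functions. It is only an illustration: the names OPERATORS and evaluate are hypothetical, and the sketch ignores the type checking that restricts which operand types each operator accepts.

```python
import operator

# Hypothetical operator table: operator symbol -> function of (lhs, rhs).
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    # `in` reads left to right: is lhs contained in rhs?
    "in": lambda lhs, rhs: lhs in rhs,
    "!in": lambda lhs, rhs: lhs not in rhs,
    # `ni` reads right to left: is rhs contained in lhs?
    "ni": lambda lhs, rhs: rhs in lhs,
    "!ni": lambda lhs, rhs: rhs not in lhs,
}


def evaluate(lhs, op: str, rhs) -> bool:
    """Evaluate a single predicate of the form LHS op RHS."""
    return OPERATORS[op](lhs, rhs)


print(evaluate("evil", "in", "an evil host"))  # True
print(evaluate("an evil host", "ni", "evil"))  # True, same test mirrored
```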
The table below illustrates a partial function over the cross product of available types. Each letter in a cell denotes a set of operators:
- E: equality operators
- R: range operators
- M: membership operators
An extractor retrieves a certain aspect of an event. When looking up an expression, Tenzir binds the extractor to a specific record field, i.e., maps it to the corresponding numeric column offset in the schema. Binding an expression implicitly creates a disjunction of all matching fields. We find that this existential quantification is the natural user experience when "extracting" data declaratively.
Tenzir has the following extractor types:
Field: extracts all fields whose name matches a given record field name.
Type: extracts all event types that have a field of a given type.
Meta: extracts metadata describing the event instead of the actual values contained in it.
The diagram below illustrates how extractors relate to each other:
Field extractors have the form x.y.z, where x, y, and z match on record field names. The dot accesses fields in nested records. Using a type name as the leftmost element before a . is also possible.
A field extractor has suffix semantics. It is possible to just write z instead of x.y.z. In fact, writing z is equivalent to *.z and creates a disjunction of all fields ending in z.
ts > 1 day ago: events with a record field ts from the last 24 hours
zeek.conn.id.orig_h in 192.168.0.0/24: connections with source IP in 192.168.0.0/24
orig_bytes >= 10Ki: events with a field orig_bytes greater than or equal to 10 * 2^10.
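The suffix semantics and the implicit disjunction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tenzir's binding logic: the helper names resolve_field and any_match are hypothetical, events are modeled as flat dictionaries keyed by fully qualified field names, and the suffix is assumed to match whole dot-separated components rather than arbitrary substrings.

```python
def resolve_field(extractor: str, fields: list[str]) -> list[str]:
    """Return all fully qualified field names that end in the extractor."""
    wanted = extractor.split(".")
    resolved = []
    for field in fields:
        parts = field.split(".")
        # Compare the trailing components of the field name with the extractor.
        if parts[-len(wanted):] == wanted:
            resolved.append(field)
    return resolved


def any_match(extractor: str, event: dict, predicate) -> bool:
    """Disjunction: true if the predicate holds for any resolved field."""
    return any(predicate(event[name]) for name in resolve_field(extractor, list(event)))


event = {
    "zeek.conn.id.orig_h": "192.168.0.7",
    "zeek.conn.id.resp_h": "10.0.0.1",
    "zeek.conn.orig_bytes": 20480,
}
print(resolve_field("orig_h", list(event)))     # ['zeek.conn.id.orig_h']
print(resolve_field("id.orig_h", list(event)))  # ['zeek.conn.id.orig_h']
print(any_match("orig_bytes", event, lambda v: v >= 10 * 2**10))  # True
```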
Type extractors have the form :T, where T is the type of a field. Type extractors work for all basic types and type aliases.
A search for type :T includes all aliased types. For example, given the alias port that maps to uint64, the :uint64 type extractor will also consider instances of type port. However, a :port query does not include :uint64 types because an alias is a strict refinement of an existing type.
:timestamp > 1 hour ago: events with a timestamp alias in the last hour
:ip == 18.104.22.168: events with any field of type ip equal to 18.104.22.168
:uint64 > 42M: events where any uint64 value is greater than 42M
"evil" in :string: events where any string field contains the substring evil
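The one-directional alias rule can be illustrated with a short Python sketch. The alias table, the schema layout (field name mapped to type name), and the helper names is_instance_of and resolve_type are hypothetical; only the port to uint64 alias comes from the text above.

```python
# Hypothetical alias table: alias name -> the type it refines.
ALIASES = {"port": "uint64"}


def is_instance_of(field_type: str, wanted: str) -> bool:
    """True if field_type is `wanted` or an alias that refines it."""
    current = field_type
    while True:
        if current == wanted:
            return True
        if current not in ALIASES:
            return False
        # Walk from the alias towards the type it refines, never the reverse.
        current = ALIASES[current]


def resolve_type(wanted: str, schema: dict[str, str]) -> list[str]:
    """Return the names of all fields that a `:wanted` extractor considers."""
    return [name for name, t in schema.items() if is_instance_of(t, wanted)]


schema = {"dport": "port", "orig_bytes": "uint64", "service": "string"}
print(resolve_type("uint64", schema))  # ['dport', 'orig_bytes']
print(resolve_type("port", schema))    # ['dport']
```

The lookup only follows aliases towards the underlying type, which is why :uint64 picks up port fields while :port does not pick up plain uint64 fields.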
Meta extractors have the form #extractor. They work on the event metadata (e.g., the schema of an event) instead of the value domain.
#schema: the human-readable name of the schema
#schema_id: the unique fingerprint for the schema
#import_time: the ingestion time when the event arrived at the server
#schema == "zeek.conn": events of type zeek.conn
"suricata" in #schema: events that have suricata in their schema name
#import_time > 1 hour ago: events that have been imported within the last hour
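Because meta extractors look at metadata rather than field values, a predicate over them needs no access to the event body. The Python sketch below combines two of the examples above with a conjunction. The function name and its parameters are hypothetical, and passing the current time explicitly is a choice made here to keep the example deterministic.

```python
from datetime import datetime, timedelta, timezone


def recent_suricata(schema: str, import_time: datetime, now: datetime) -> bool:
    """Evaluate `"suricata" in #schema && #import_time > 1 hour ago`."""
    # Only metadata is consulted; the event's field values play no role.
    return "suricata" in schema and import_time > now - timedelta(hours=1)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(recent_suricata("suricata.flow", now - timedelta(minutes=5), now))  # True
print(recent_suricata("suricata.flow", now - timedelta(hours=2), now))    # False
print(recent_suricata("zeek.conn", now - timedelta(minutes=5), now))      # False
```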
There are three short forms for defining predicates succinctly. They are merely syntactic sugar and can be used whenever a predicate is expected. The following table shows how values, field extractors and type extractors are expanded.
The first form requires the event to contain a field with the given value. This allows for quick type-based point queries, such as (126.96.36.199 || 10.0.0.0/8) && "evil". The second and third short forms test for the existence of a field or type. They are useful to filter out events with missing information.
Values of type subnet expand more broadly. For example, the subnet 10.0.0.0/8 expands to:
:subnet == 10.0.0.0/8 || :ip in 10.0.0.0/8
This makes it easier to search for IP addresses belonging to a specific subnet.
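As a rough illustration of the subnet expansion, the following Python sketch evaluates :subnet == S || :ip in S over the values of one event using the standard ipaddress module. The function name and the representation of an event as a plain list of values are assumptions made for this example.

```python
import ipaddress

NETWORKS = (ipaddress.IPv4Network, ipaddress.IPv6Network)
ADDRESSES = (ipaddress.IPv4Address, ipaddress.IPv6Address)


def matches_subnet_value(values, subnet: str) -> bool:
    """Evaluate `:subnet == S || :ip in S` over the values of one event."""
    net = ipaddress.ip_network(subnet)
    for value in values:
        # `:subnet == S`: a subnet-typed value must equal S exactly.
        if isinstance(value, NETWORKS) and value == net:
            return True
        # `:ip in S`: an ip-typed value must lie inside S.
        if isinstance(value, ADDRESSES) and value in net:
            return True
    return False


print(matches_subnet_value([ipaddress.ip_address("10.1.2.3")], "10.0.0.0/8"))     # True
print(matches_subnet_value([ipaddress.ip_network("10.0.0.0/8")], "10.0.0.0/8"))   # True
print(matches_subnet_value([ipaddress.ip_address("192.168.0.1")], "10.0.0.0/8"))  # False
```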
Every type has a corresponding value syntax in the expression language.