Skip to main content
Version: Next

Configuration

There exist multiple options to configure a Tenzir deployment:

  1. Command-line arguments
  2. Environment variables
  3. Configuration files
  4. Compile-time defaults

These options apply to the tenzir and tenzir-node executables that ship with a Tenzir package. The options are sorted by precedence, i.e., command-line arguments override environment variables, which override configuration file settings. Compile-time defaults can only be changed by rebuilding Tenzir from source.

Let's discuss the first three options in more detail.

Command Line Arguments

The command line arguments of the executables have the following synopsis:

tenzir [opts] <pipeline>
tenzir-node [opts]

We have both long --long=X and short -s X options. Boolean options do not require explicit specification of a value, and it suffices to write --long and -s to set an option to true.

Environment Variables

You can use environment variables as an alternative method to passing command line options. This comes in handy when working with non-interactive deployments where the command line is hard-coded, such as in Docker containers.

An environment variable has the form KEY=VALUE, and we describe the format of KEY and VALUE below. Tenzir processes only environment variables that have the form TENZIR_{KEY}=VALUE. For example, TENZIR_ENDPOINT=1.2.3.4 translates to the command line option --endpoint=1.2.3.4 and YAML configuration tenzir.endpoint: 1.2.3.4.

Keys

There exists a one-to-one mapping from configuration file keys to environment variable names. Here are two examples:

  • tenzir.import.batch-size 👈 configuration file key
  • TENZIR_IMPORT__BATCH_SIZE 👈 environment variable

A hierarchical key of the form tenzir.x.y.z maps to the environment variable TENZIR_X__Y__Z. More generally, the KEY in TENZIR_{KEY}=VALUE adheres to the following rules:

  1. Double underscores map to the . separator of YAML dictionaries.

  2. Single underscores _ map to a - in the corresponding configuration file key. This is unambiguous because Tenzir does not have any options that include a literal underscore.

From the perspective of the command line, setting the --foo option via tenzir --foo or tenzir-node --foo maps onto the environment variable TENZIR_FOO and the configuration file key tenzir.foo. Here are two examples with identical behavior:

TENZIR_ENDPOINT=0.0.0.0:42000 tenzir-node
tenzir-node --endpoint=0.0.0.0:42000
CAF and plugin Settings

To provide CAF and plugin settings, which have the form caf.x.y.z and plugins.name.x.y.z in the configuration file, the environment variable must have the form TENZIR_CAF__X__Y__Z and TENZIR_PLUGINS__NAME__X__Y__Z respectively.

The configuration file is an exception in this regard: tenzir.caf. and tenzir.plugins. are invalid key prefixes. Instead, CAF and plugin configuration file keys have the prefixes caf. and plugins., i.e., they are hoisted into the global scope.

Values

While all environment variables are strings on the shell, Tenzir parses them into a typed value internally. In general, parsing values from the environment follows the same syntactical rules as command line parsing.

In particular, this applies to lists. For example, TENZIR_PLUGINS="foo,bar" is equivalent to --plugins=foo,bar.

Tenzir ignores environment variables with an empty value because the type cannot be inferred. For example, TENZIR_PLUGINS= will not be considered.

Configuration files

Tenzir's configuration file is in YAML format. On startup, Tenzir attempts to read configuration files from the following places, in order:

  1. <sysconfdir>/tenzir/tenzir.yaml for system-wide configuration, where sysconfdir is the platform-specific directory for configuration files, e.g., <install-prefix>/etc.

  2. ~/.config/tenzir/tenzir.yaml for user-specific configuration. Tenzir respects the XDG base directory specification and its environment variables.

  3. A path to a configuration file passed via --config=/path/to/tenzir.yaml.

If there exist configuration files in multiple locations, options from all configuration files are merged in order, with the latter files receiving a higher precedence than former ones. For lists, merging means concatenating the list elements.

Plugin Configuration Files

In addition to tenzir/tenzir.yaml, Tenzir loads tenzir/plugin/<plugin>.yaml for plugin-specific configuration for a given plugin named <plugin>. The same rules apply as for the regular configuration file directory lookup.

Bare Mode

Sometimes, users may wish to run Tenzir without side effects, e.g., when wrapping Tenzir in their own scripts. Run with --bare-mode to disable looking at all system- and user-specified configuration paths.

Plugins

Tenzir's plugin architecture allows for flexible replacement and enhancement of functionality at various pre-defined customization points. There exist dynamic plugins that ship as shared libraries and static plugins that are compiled into libtenzir.

Install plugins

Dynamic plugins are just shared libraries and can be placed at a location of your choice. We recommend putting them into a single directory and add the path to the tenzir.plugin-dirs configuration option..

Static plugins do not require installation since they are compiled into Tenzir.

Load plugins

The configuration key tenzir.plugins specifies the list of plugins that should load at startup. The all plugin name is reserved. When all is specified Tenzir loads all available plugins in the configured plugin directories. If no tenzir.plugins key is specified, Tenzir will load all plugins by default. To load no plugins at all, specify a tenzir.plugins configuration key with no plugin values, e.g. the configuration file entry plugins: [] or launch parameter --plugins=.

Since dynamic plugins are shared libraries, they must be loaded first into the running Tenzir process. At startup, Tenzir looks for the tenzir.plugins inside the tenzir.plugin-dirs directories configured in tenzir.yaml. For example:

tenzir:
  plugin-dirs:
    - .
    - /opt/foo/lib
  plugins:
    - example
    - /opt/bar/lib/libtenzir-plugin-example.so

Before executing plugin code, Tenzir loads the specified plugins via dlopen(3) and attempts to initialize them as plugins. Part of the initialization is passing configuration options to the plugin. To this end, Tenzir looks for a YAML dictionary under plugins.<name> in the tenzir.yaml file. For example:

# <configdir>/tenzir/tenzir.yaml
plugins:
  example:
    option: 42

Alternatively, you can specify a plugin/<plugin>.yaml file. The example configurations above and below are equivalent. This makes plugin deployments easier, as plugins can be installed and uninstalled alongside their respective configuration.

# <configdir>/tenzir/plugin/example.yaml
option: 42

After initialization with the configuration options, the plugin is fully operational and Tenzir will call its functions at the plugin-specific customization points.

List plugins

You can get the list of available plugins using the plugins operator:

tenzir 'plugins'

Block plugins

As part of your Tenzir deployment, you can selectively disable plugins by name. For example, if you do not want the shell operator and the kafka connector to be available, set this in your configuration:

# <configdir>/tenzir/tenzir.yaml
tenzir:
  disable-plugins:
    - shell
    - kafka

Example Configuration

Tenzir reads a configuration file at startup. Here is an example configuration that you can adapt to your needs.

# This is an example configuration file for Tenzir that shows all available
# options. Options in angle brackets have their default value determined at
# runtime.

# Options that concern Tenzir.
tenzir:
# The host and port to listen at for node-to-node connections to in the form
# `<host>:<port>`. Host or port may be emitted to use their defaults, which
# are localhost and 5158, respectively. Set the port to zero to automatically
# choose a port. Set to false to disable exposing an endpoint.
endpoint: localhost:5158

# The timeout for connecting to a Tenzir server. Set to 0 seconds to wait
# indefinitely.
connection-timeout: 5m

# The delay between two connection attempts. Set to 0s to try connecting
# without retries.
connection-retry-delay: 3s

# The file system path used for persistent state.
# Defaults to one of the following paths, selecting the first that is
# available:
# - $STATE_DIRECTORY
# - $PWD/tenzir.db
#state-directory:

# The file system path used for recoverable state.
# In a node process, defaults to the first of the following paths that is
# available:
# - $CACHE_DIRECTORY
# - $XDG_CACHE_HOME
# - $XDG_HOME_DIR/.cache/tenzir (linux) or $XDG_HOME_DIR/Libraries/caches/tenzir (mac)
# - $HOME/.cache/tenzir (linux) or $HOME/Libraries/caches/tenzir (mac)
# - $TEMPORARY_DIRECTORY/tenzir-cache-<uid>
# To determine $TEMPORARY_DIRECTORY, the values of TMPDIR, TMP, TEMP, TEMPDIR are
# checked in that order, and as a last resort "/tmp" is used.
# In a client process, this setting is ignored and
# `$TEMPORARY_DIRECTORY/tenzir-client-cache-<uid>` is used as cache directory.
#cache-directory:

# The file system path used for log files.
# Defaults to one of the following paths, selecting the first that is
# available:
# - $LOGS_DIRECTORY/server.log
# - <state-directory>/server.log
#log-file:

# The file system path used for client log files relative to the current
# working directory of the client. Note that this is disabled by default.
# If not specified no log files are written for clients at all.
client-log-file: "client.log"

# Format for printing individual log entries to the log-file.
# For a list of valid format specifiers, see spdlog format specification
# at https://github.com/gabime/spdlog/wiki/3.-Custom-formatting.
file-format: "[%Y-%m-%dT%T.%e%z] [%n] [%l] [%s:%#] %v"

# Configures the minimum severity of messages written to the log file.
# Possible values: quiet, error, warning, info, verbose, debug, trace.
# File logging is only available for commands that start a node (e.g.,
# tenzir-node). The levels above 'verbose' are usually not available in
# release builds.
file-verbosity: debug

# Whether to enable automatic log rotation. If set to false, a new log file
# will be created when the size of the current log file exceeds 10 MiB.
disable-log-rotation: false

# The size limit when a log file should be rotated.
log-rotation-threshold: 10MiB

# Maximum number of log messages in the logger queue.
log-queue-size: 1000000

# The sink type to use for console logging. Possible values: stderr,
# syslog, journald. Note that 'journald' can only be selected on linux
# systems, and only if Tenzir was built with journald support.
# The journald sink is used as default if Tenzir is started as a systemd
# service and the service is configured to use the journal for stderr,
# otherwise the default is the unstructured stderr sink.
#console-sink: stderr/journald

# Mode for console log output generation. Automatic renders color only when
# writing to a tty.
# Possible values: always, automatic, never. (default automatic)
console: automatic

# Format for printing individual log entries to the console. For a list
# of valid format specifiers, see spdlog format specification at
# https://github.com/gabime/spdlog/wiki/3.-Custom-formatting.
console-format: "%^[%T.%e] %v%$"

# Configures the minimum severity of messages written to the console.
# For a list of valid log levels, see file-verbosity.
console-verbosity: info

# List of directories to look for schema files in ascending order of
# priority.
schema-dirs: []

# Additional directories to load plugins specified using `tenzir.plugins`
# from.
plugin-dirs: []

# List of paths that contain statically configured packages.
# This setting is ignored unless the package manager plugin is enabled.
package-dirs: []

# The plugins to load at startup. For relative paths, Tenzir tries to find
# the files in the specified `tenzir.plugin-dirs`. The special values
# 'bundled' and 'all' enable autoloading of bundled and all plugins
# respectively. Note: Add `example` or `/path/to/libtenzir-plugin-example.so`
# to load the example plugin.
plugins: []

# Names of plugins and builtins to explicitly forbid from being used in
# Tenzir. For example, adding `shell` will prohibit use of the `shell`
# operator builtin, and adding `kafka` will prohibit use of the `kafka`
# connector plugin.
disable-plugins: []

# The unique ID of this node.
node-id: "node"

# Forbid unsafe location overrides for pipelines with the 'local' and 'remote'
# keywords, e.g., remotely reading from a file.
no-location-overrides: false

# The size of an index shard, expressed in number of events. This should
# be a power of 2.
max-partition-size: 4Mi

# Timeout after which an active partition is forcibly flushed, regardless of
# its size.
active-partition-timeout: 30 seconds

# Automatically rebuild undersized and outdated partitions in the background.
# The given number controls how much resources to spend on it. Set to 0 to
# disable.
automatic-rebuild: 1

# Timeout after which an automatic rebuild is triggered.
rebuild-interval: 2 hours

# Zstd compression level applied to the Feather store backend.
# zstd-compression-level: <default>

# Control how operator's calculate demand from their upstream operator. Note
# that this is an expert feature and should only be changed if you know what
# you are doing. All values may either be set to a number, or to a record
# containing `bytes` and `events` fields with numbers depending on the
# operator's input type.
demand:
# Issue demand only if room for at least this many elements is available.
# Must be greater than zero.
min-elements:
bytes: 128Ki
events: 8Ki
# Controls how many elements may be buffered until the operator stops
# issuing demand. Must be greater or equal to min-elements.
max-elements:
bytes: 4Mi
events: 254Ki
# Controls how many batches of elements may be buffered until the operator
# stops issuing demand. Must be greater than zero.
max-batches: 20

# Context configured as part of the configuration that are always available.
contexts:
# A unique name for the context that's used in the context, enrich, and
# lookup operators to refer to the context.
indicators:
# The type of the context.
type: bloom-filter
# Arguments for creating the context, depending on the type. Refer to the
# documentation of the individual context types to see the arguments they
# require. Note that changes to these arguments to not apply to any
# contexts that were previously created.
arguments:
capacity: 1B
fp-probability: 0.001

# The `index` key is used to adjust the false-positive rate of
# the first-level lookup data structures (called synopses) in the
# catalog. The lower the false-positive rate the more space will be
# required, so this setting can be used to manually tune the trade-off
# of performance vs. space.
index:
# The default false-positive rate for type synopses.
default-fp-rate: 0.01
# rules:
# Every rule adjusts the behaviour of Tenzir for a set of targets.
# Tenzir creates one synopsis per target. Targets can be either types
# or field names.
#
# fp-rate - false positive rate. Has effect on string and address type
# targets
#
# partition-index - Tenzir will not create dense index when set to false
# - targets: [:ip]
# fp-rate: 0.01

# The `tenzir-ctl start` command starts a new Tenzir server process.
start:

# Prints the endpoint for clients when the server is ready to accept
# connections. This comes in handy when letting the OS choose an
# available random port, i.e., when specifying 0 as port value.
print-endpoint: false

# Writes the endpoint for clients when the server is ready to accept
# connections to the specified destination. This comes in handy when letting
# the OS choose an available random port, i.e., when specifying 0 as port
# value, and `print-endpoint` is not sufficient.
#write-endpoint: /tmp/tenzir-node-endpoint

# An ordered list of commands to run inside the node after starting.
# As an example, to configure an auto-starting PCAP source that listens
# on the interface 'en0' and lives inside the Tenzir node, add `spawn
# source pcap -i en0`.
# Note that commands are not executed sequentially but in parallel.
commands: []

# Triggers removal of old data when the disk budget is exceeded.
disk-budget-high: 0GiB

# When the budget was exceeded, data is erased until the disk space is
# below this value.
disk-budget-low: 0GiB

# Seconds between successive disk space checks.
disk-budget-check-interval: 90

# When erasing, how many partitions to erase in one go before rechecking
# the size of the database directory.
disk-budget-step-size: 1

# Binary to use for checking the size of the database directory. If left
# unset, Tenzir will recursively add up the size of all files in the
# database directory to compute the size. Mainly useful for e.g.
# compressed filesystem where raw file size is not the correct metric.
# Must be the absolute path to an executable file, which will get passed
# the database directory as its first and only argument.
#disk-budget-check-binary: /opt/tenzir/libexec/tenzir-df-percent.sh

# User-defined operators.
operators:
# The Zeek operator is an example that takes raw bytes in the form of a
# PCAP and then parses Zeek's output via the `zeek-json` format to generate
# a stream of events.
zeek:
shell "zeek -r - LogAscii::output_to_stdout=T
JSONStreaming::disable_default_logs=T
JSONStreaming::enable_log_rotation=F
json-streaming-logs"
| read zeek-json
# The Suricata operator is analogous to the above Zeek example, with the
# difference that we are using Suricata. The commmand line configures
# Suricata such that it reads PCAP on stdin and produces EVE JSON logs on
# stdout, which we then parse with the `suricata` format.
suricata:
shell "suricata -r /dev/stdin
--set outputs.1.eve-log.filename=/dev/stdout
--set logging.outputs.0.console.enabled=no"
| read suricata

# In addition to running pipelines interactively, you can also deploy
# *Pipelines as Code*. This infrastrucutre-as-code-like method differs from
# pipelines run on the command-line or through app.tenzir.com in two ways:
# 1. Pipelines deployed as code always start alongside the Tenzir node.
# 2. Deletion via the user interface is not allowed for pipelines configured
# as code.
pipelines:
# A unique identifier for the pipeline that's used for metrics, diagnostics,
# and API calls interacting with the pipeline.
publish-suricata:
# An optional user-facing name for the pipeline. Defaults to the id.
name: Import Suricata from TCP
# The definition of the pipeline. Configured pipelines that fail to start
# cause the node to fail to start.
definition: |
from tcp://0.0.0.0:34343 read suricata --no-infer
| where event_type != "stats"
| publish suricata
# Pipelines that encounter an error stop running and show an error state.
# This option causes pipelines to automatically restart when they
# encounter an error instead. The first restart happens immediately, and
# subsequent restarts after the configured delay, defaulting to 1 minute.
# The following values are valid for this option:
# - Omit the option, or set it to null or false to disable.
# - Set the option to true to enable with the default delay of 1 minute.
# - Set the option to a valid duration to enable with a custom delay.
restart-on-error: 1 minute
# Pipelines that are unstoppable will run automatically and indefinitely.
# They are not able to pause or stop.
# If they do complete, they will end up in a failed state.
# If `restart-on-error` is enabled, they will restart after the specified
# duration.
unstoppable: false


# The below settings are internal to CAF, and aren't checked by Tenzir directly.
# Please be careful when changing these options. Note that some CAF options may
# be in conflict with Tenzir options, and are only listed here for completeness.
caf:

# Options affecting the internal scheduler.
scheduler:

# Accepted alternative: "sharing".
policy: stealing

# Configures whether the scheduler generates profiling output.
enable-profiling: false

# Output file for profiler data (only if profiling is enabled).
#profiling-output-file: </dev/null>

# Measurement resolution in milliseconds (only if profiling is enabled).
profiling-resolution: 100ms

# Forces a fixed number of threads if set. Defaults to the number of
# available CPU cores if starting a Tenzir node, or *2* for client commands.
#max-threads: <number of cores>

# Maximum number of messages actors can consume in one run.
max-throughput: 500

# When using "stealing" as scheduler policy.
work-stealing:

# Number of zero-sleep-interval polling attempts.
aggressive-poll-attempts: 100

# Frequency of steal attempts during aggressive polling.
aggressive-steal-interval: 10

# Number of moderately aggressive polling attempts.
moderate-poll-attempts: 500

# Frequency of steal attempts during moderate polling.
moderate-steal-interval: 5

# Sleep interval between poll attempts.
moderate-sleep-duration: 50us

# Frequency of steal attempts during relaxed polling.
relaxed-steal-interval: 1

# Sleep interval between poll attempts.
relaxed-sleep-duration: 10ms

stream:

# Maximum delay for partial batches.
max-batch-delay: 15ms

# Selects an implementation for credit computation.
# Accepted alternative: "token-based".
credit-policy: token-based

# When using "size-based" as credit-policy.
size-based-policy:

# Desired batch size in bytes.
bytes-per-batch: 32

# Maximum input buffer size in bytes.
buffer-capacity: 256

# Frequency of collecting batch sizes.
sampling-rate: 100

# Frequency of re-calibrations.
calibration-interval: 1

# Factor for discounting older samples.
smoothing-factor: 2.5

# When using "token-based" as credit-policy.
token-based-policy:

# Number of elements per batch.
batch-size: 1

# Max. number of elements in the input buffer.
buffer-size: 64

# Collecting metrics can be resource consuming. This section is used for
# filtering what should and what should not be collected
metrics-filters:

# Rules for actor based metrics filtering.
actors:

# List of selected actors for run-time metrics.
includes: []

# List of excluded actors from run-time metrics.
excludes: []