Configuration
There exist multiple options to configure a Tenzir deployment:
- Command-line arguments
- Environment variables
- Configuration files
- Compile-time defaults
These options apply to the tenzir and tenzir-node executables that ship with a Tenzir package. The options are sorted by precedence, i.e., command-line arguments override environment variables, which override configuration file settings. Compile-time defaults can only be changed by rebuilding Tenzir from source.
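The precedence order can be illustrated with a minimal Python sketch. It is not Tenzir's implementation: the function name resolve_option and the dictionaries standing in for each source are hypothetical, and only the ordering (command line, then environment, then configuration file, then compile-time default) is taken from the text above.

```python
def resolve_option(name, cli=None, env=None, config=None, defaults=None):
    """Return the value of `name` from the highest-precedence source that sets it."""
    # Sources are ordered from highest to lowest precedence.
    for source in (cli, env, config, defaults):
        if source is not None and name in source:
            return source[name]
    raise KeyError(name)


# Hypothetical values for the same option coming from three sources.
defaults = {"endpoint": "localhost:5158"}
config = {"endpoint": "10.0.0.1:5158"}
env = {"endpoint": "1.2.3.4"}

# No command-line value is given, so the environment variable wins.
print(resolve_option("endpoint", env=env, config=config, defaults=defaults))  # prints 1.2.3.4
```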
Let's discuss the first three options in more detail.
Command Line Arguments
The command line arguments of the executables have the following synopsis:
tenzir [opts] <pipeline>
tenzir-node [opts]
We have both long --long=X and short -s X options. Boolean options do not require explicit specification of a value, and it suffices to write --long and -s to set an option to true.
Environment Variables
You can use environment variables as an alternative method to passing command line options. This comes in handy when working with non-interactive deployments where the command line is hard-coded, such as in Docker containers.
An environment variable has the form KEY=VALUE, and we describe the format of KEY and VALUE below. Tenzir processes only environment variables that have the form TENZIR_{KEY}=VALUE. For example, TENZIR_ENDPOINT=1.2.3.4 translates to the command line option --endpoint=1.2.3.4 and YAML configuration tenzir.endpoint: 1.2.3.4.
Keys
There exists a one-to-one mapping from configuration file keys to environment variable names. Here is an example:
tenzir.import.batch-size 👈 configuration file key
TENZIR_IMPORT__BATCH_SIZE 👈 environment variable
A hierarchical key of the form tenzir.x.y.z maps to the environment variable TENZIR_X__Y__Z. More generally, the KEY in TENZIR_{KEY}=VALUE adheres to the following rules:
- Double underscores map to the . separator of YAML dictionaries.
- Single underscores _ map to a - in the corresponding configuration file key. This is unambiguous because Tenzir does not have any options that include a literal underscore.
From the perspective of the command line, setting the --foo option via tenzir --foo or tenzir-node --foo maps onto the environment variable TENZIR_FOO and the configuration file key tenzir.foo. Here are two examples with identical behavior:
TENZIR_ENDPOINT=0.0.0.0:42000 tenzir-node
tenzir-node --endpoint=0.0.0.0:42000
To provide CAF and plugin settings, which have the form caf.x.y.z and plugins.name.x.y.z in the configuration file, the environment variable must have the form TENZIR_CAF__X__Y__Z and TENZIR_PLUGINS__NAME__X__Y__Z, respectively.
The configuration file is an exception in this regard: tenzir.caf. and tenzir.plugins. are invalid key prefixes. Instead, CAF and plugin configuration file keys have the prefixes caf. and plugins., i.e., they are hoisted into the global scope.
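The key rules above, including the special treatment of caf. and plugins. keys, can be sketched in Python as follows. The function names config_key_to_env and env_to_config_key are hypothetical, and the assumption that configuration file keys are all lowercase is ours, based only on the examples in this section. This is an illustration, not Tenzir's own code.

```python
def config_key_to_env(key: str) -> str:
    """Map a configuration file key to its environment variable name."""
    # Keys under `tenzir.` drop that prefix. Keys under `caf.` and `plugins.`
    # keep theirs, because they live in the global scope of the config file.
    if key.startswith("tenzir."):
        key = key[len("tenzir."):]
    elif not key.startswith(("caf.", "plugins.")):
        raise ValueError(f"unexpected top-level key: {key}")
    # `.` becomes a double underscore and `-` becomes a single underscore.
    return "TENZIR_" + key.replace(".", "__").replace("-", "_").upper()


def env_to_config_key(name: str) -> str:
    """Map an environment variable name back to its configuration file key."""
    if not name.startswith("TENZIR_"):
        raise ValueError(f"not a Tenzir variable: {name}")
    # Assumption: keys are lowercase. Replace double underscores first so the
    # remaining single underscores can safely become dashes.
    key = name[len("TENZIR_"):].lower().replace("__", ".").replace("_", "-")
    if key.startswith(("caf.", "plugins.")):
        return key
    return "tenzir." + key


print(config_key_to_env("tenzir.import.batch-size"))  # prints TENZIR_IMPORT__BATCH_SIZE
print(env_to_config_key("TENZIR_ENDPOINT"))           # prints tenzir.endpoint
```

Note that TENZIR_PLUGINS (the list of plugins to load) maps to tenzir.plugins, whereas TENZIR_PLUGINS__NAME__X maps to the hoisted key plugins.name.x.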
Values
While all environment variables are strings on the shell, Tenzir parses them into a typed value internally. In general, parsing values from the environment follows the same syntactical rules as command line parsing.
In particular, this applies to lists. For example, TENZIR_PLUGINS="foo,bar" is equivalent to --plugins=foo,bar.
Tenzir ignores environment variables with an empty value because the type cannot be inferred. For example, TENZIR_PLUGINS= will not be considered.
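As a rough Python sketch of these two rules: the helper names tenzir_environment and split_list are hypothetical, and the real parser infers a typed value per option, which this sketch does not attempt. It only shows the TENZIR_ prefix filter, the skipping of empty values, and the comma-separated list syntax.

```python
def tenzir_environment(environ):
    """Select variables that carry the TENZIR_ prefix and a non-empty value."""
    return {
        name: value
        for name, value in environ.items()
        if name.startswith("TENZIR_") and value != ""
    }


def split_list(value):
    """Split a comma-separated list value, as in --plugins=foo,bar."""
    return value.split(",")


# Hypothetical environment: one usable variable, one empty, one unrelated.
environ = {"TENZIR_PLUGINS": "foo,bar", "TENZIR_ENDPOINT": "", "HOME": "/root"}
selected = tenzir_environment(environ)
print(selected)                                # prints {'TENZIR_PLUGINS': 'foo,bar'}
print(split_list(selected["TENZIR_PLUGINS"]))  # prints ['foo', 'bar']
```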
Configuration files
Tenzir's configuration file is in YAML format. On startup, Tenzir attempts to read configuration files from the following places, in order:
1. <sysconfdir>/tenzir/tenzir.yaml for system-wide configuration, where sysconfdir is the platform-specific directory for configuration files, e.g., <install-prefix>/etc.
2. ~/.config/tenzir/tenzir.yaml for user-specific configuration. Tenzir respects the XDG base directory specification and its environment variables.
3. A path to a configuration file passed via --config=/path/to/tenzir.yaml.
If there exist configuration files in multiple locations, options from all configuration files are merged in order, with the latter files receiving a higher precedence than former ones. For lists, merging means concatenating the list elements.
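The merge behavior can be sketched in Python with plain dictionaries standing in for parsed YAML files. The names merge_config and merge_all are hypothetical. Two details are assumptions rather than documented behavior: that nested dictionaries are merged recursively, and that list elements from earlier files come first in the concatenation.

```python
from functools import reduce


def merge_config(base, override):
    """Merge `override` into `base`; `override` has the higher precedence."""
    merged = dict(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Assumption: nested dictionaries are merged key by key.
            merged[key] = merge_config(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Lists are concatenated instead of replaced.
            merged[key] = current + value
        else:
            # Scalars from the later file win.
            merged[key] = value
    return merged


def merge_all(configs):
    """Merge configurations in lookup order, from lowest to highest precedence."""
    return reduce(merge_config, configs, {})


system = {"tenzir": {"endpoint": "localhost:5158", "plugin-dirs": ["/opt/foo/lib"]}}
user = {"tenzir": {"endpoint": "0.0.0.0:42000", "plugin-dirs": ["."]}}
print(merge_all([system, user]))
# prints {'tenzir': {'endpoint': '0.0.0.0:42000', 'plugin-dirs': ['/opt/foo/lib', '.']}}
```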
Plugin Configuration Files
In addition to tenzir/tenzir.yaml, Tenzir loads tenzir/plugin/<plugin>.yaml for plugin-specific configuration for a given plugin named <plugin>. The same rules apply as for the regular configuration file directory lookup.
Bare Mode
Sometimes, users may wish to run Tenzir without side effects, e.g., when wrapping Tenzir in their own scripts. Run with --bare-mode to disable looking at all system- and user-specified configuration paths.
Plugins
Tenzir's plugin architecture allows for flexible replacement and enhancement of functionality at various pre-defined customization points. There exist dynamic plugins that ship as shared libraries and static plugins that are compiled into libtenzir.
Install plugins
Dynamic plugins are just shared libraries and can be placed at a location of your choice. We recommend putting them into a single directory and adding the path to the tenzir.plugin-dirs configuration option.
Static plugins do not require installation since they are compiled into Tenzir.
Load plugins
The configuration key tenzir.plugins specifies the list of plugins that should load at startup. The plugin name "all" is reserved. When "all" is specified, Tenzir loads all available plugins in the configured plugin directories. If no tenzir.plugins key is specified, Tenzir will load all plugins by default. To load no plugins at all, specify a tenzir.plugins configuration key with no plugin values, e.g., the configuration file entry plugins: [] or the launch parameter --plugins=.
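The three cases (key absent, key empty, key containing the reserved name) can be summarized in a small Python sketch. The function plugins_to_load and the available list are hypothetical; in particular, how Tenzir discovers the available plugins is not modeled here.

```python
def plugins_to_load(config, available):
    """Decide which plugins to load from a parsed configuration dictionary."""
    tenzir = config.get("tenzir", {})
    if "plugins" not in tenzir:
        # No tenzir.plugins key at all: load every available plugin.
        return list(available)
    requested = tenzir["plugins"]
    if "all" in requested:
        # The reserved name expands to every available plugin. Any explicitly
        # listed entries that are not already covered are kept as well.
        extras = [p for p in requested if p != "all" and p not in available]
        return list(available) + extras
    # An explicit list, possibly empty, is used as is.
    return list(requested)


available = ["example", "kafka"]
print(plugins_to_load({}, available))                           # prints ['example', 'kafka']
print(plugins_to_load({"tenzir": {"plugins": []}}, available))  # prints []
```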
Since dynamic plugins are shared libraries, they must be loaded first into the running Tenzir process. At startup, Tenzir looks for the plugins listed under tenzir.plugins inside the tenzir.plugin-dirs directories configured in tenzir.yaml. For example:
tenzir:
  plugin-dirs:
    - .
    - /opt/foo/lib
  plugins:
    - example
    - /opt/bar/lib/libtenzir-plugin-example.so
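One way to picture the lookup in the example above is the following Python sketch. It rests on several assumptions that the text does not confirm: that a bare name such as example corresponds to a file named libtenzir-plugin-example.so (inferred from the path in the example, and Linux-specific), that directories are searched in the listed order, and that the first match wins. The function name resolve_plugin is hypothetical.

```python
from pathlib import Path


def resolve_plugin(entry, plugin_dirs):
    """Return the shared library path for a tenzir.plugins entry, or None."""
    path = Path(entry)
    if path.is_absolute():
        # Absolute paths are used directly, without consulting plugin_dirs.
        return path if path.is_file() else None
    # Assumption: a bare plugin name maps to this file name pattern.
    filename = f"libtenzir-plugin-{entry}.so"
    for directory in plugin_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


# Looks for ./libtenzir-plugin-example.so, then /opt/foo/lib/libtenzir-plugin-example.so.
print(resolve_plugin("example", [".", "/opt/foo/lib"]))
```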
Before executing plugin code, Tenzir loads the specified plugins via dlopen(3) and attempts to initialize them as plugins. Part of the initialization is passing configuration options to the plugin. To this end, Tenzir looks for a YAML dictionary under plugins.<name> in the tenzir.yaml file. For example:
# <configdir>/tenzir/tenzir.yaml
plugins:
  example:
    option: 42
Alternatively, you can specify a plugin/<plugin>.yaml file. The example configurations above and below are equivalent. This makes plugin deployments easier, as plugins can be installed and uninstalled alongside their respective configuration.
# <configdir>/tenzir/plugin/example.yaml
option: 42
After initialization with the configuration options, the plugin is fully operational and Tenzir will call its functions at the plugin-specific customization points.
List plugins
You can get the list of available plugins using the show operator:
tenzir 'show plugins'
Block plugins
As part of your Tenzir deployment, you can selectively disable plugins by name. For example, if you do not want the shell operator and the kafka connector to be available, set this in your configuration:
# <configdir>/tenzir/tenzir.yaml
tenzir:
  disable-plugins:
    - shell
    - kafka
Example Configuration
Tenzir reads a configuration file at startup. Here is an example configuration that you can adapt to your needs.
# This is an example configuration file for Tenzir that shows all available
# options. Options in angle brackets have their default value determined at
# runtime.

# Options that concern Tenzir.
tenzir:
  # The host and port to listen at for node-to-node connections in the form
  # `<host>:<port>`. Host or port may be omitted to use their defaults, which
  # are localhost and 5158, respectively. Set the port to zero to automatically
  # choose a port. Set to false to disable exposing an endpoint.
  endpoint: localhost:5158
  # The timeout for connecting to a Tenzir server. Set to 0 seconds to wait
  # indefinitely.
  connection-timeout: 5m
  # The delay between two connection attempts. Set to 0s to try connecting
  # without retries.
  connection-retry-delay: 3s
  # The file system path used for persistent state.
  # Defaults to one of the following paths, selecting the first that is
  # available:
  # - $STATE_DIRECTORY
  # - $PWD/tenzir.db
  #state-directory:
  # The file system path used for persistent state.
  # Defaults to one of the following paths, selecting the first that is
  # available:
  # - $CACHE_DIRECTORY
  # - $XDG_CACHE_HOME/tenzir
  # - $HOME/.cache/tenzir (Linux)
  # - $HOME/Library/Caches/tenzir (macOS)
  #cache-directory:
  # The file system path used for persistent state.
  # Defaults to one of the following paths, selecting the first that is
  # available:
  # - $CACHE_DIRECTORY
  # - $XDG_CACHE_HOME
  # - $XDG_HOME_DIR/.cache/tenzir (linux) or $XDG_HOME_DIR/Libraries/caches/tenzir (mac)
  # - $HOME/.cache/tenzir (linux) or $HOME/Libraries/caches/tenzir (mac)
  # - $TEMPORARY_DIRECTORY/tenzir/cache
  # To determine $TEMPORARY_DIRECTORY, the values of TMPDIR, TMP, TEMP, TEMPDIR are
  # checked in that order, and as a last resort "/tmp" is used.
  #cache-directory:
  # The file system path used for log files.
  # Defaults to one of the following paths, selecting the first that is
  # available:
  # - $LOGS_DIRECTORY/server.log
  # - <state-directory>/server.log
  #log-file:
  # The file system path used for client log files relative to the current
  # working directory of the client. Note that this is disabled by default.
  # If not specified no log files are written for clients at all.
  client-log-file: "client.log"
  # Format for printing individual log entries to the log-file.
  # For a list of valid format specifiers, see spdlog format specification
  # at https://github.com/gabime/spdlog/wiki/3.-Custom-formatting.
  file-format: "[%Y-%m-%dT%T.%e%z] [%n] [%l] [%s:%#] %v"
  # Configures the minimum severity of messages written to the log file.
  # Possible values: quiet, error, warning, info, verbose, debug, trace.
  # File logging is only available for commands that start a node (e.g.,
  # tenzir-node). The levels above 'verbose' are usually not available in
  # release builds.
  file-verbosity: debug
  # Whether to disable automatic log rotation. If set to false, a new log file
  # will be created when the size of the current log file exceeds 10 MiB.
  disable-log-rotation: false
  # The size limit when a log file should be rotated.
  log-rotation-threshold: 10MiB
  # Maximum number of log messages in the logger queue.
  log-queue-size: 1000000
  # The sink type to use for console logging. Possible values: stderr,
  # syslog, journald. Note that 'journald' can only be selected on linux
  # systems, and only if Tenzir was built with journald support.
  # The journald sink is used as default if Tenzir is started as a systemd
  # service and the service is configured to use the journal for stderr,
  # otherwise the default is the unstructured stderr sink.
  #console-sink: stderr/journald
  # Mode for console log output generation. Automatic renders color only when
  # writing to a tty.
  # Possible values: always, automatic, never. (default automatic)
  console: automatic
  # Format for printing individual log entries to the console. For a list
  # of valid format specifiers, see spdlog format specification at
  # https://github.com/gabime/spdlog/wiki/3.-Custom-formatting.
  console-format: "%^[%T.%e] %v%$"
  # Configures the minimum severity of messages written to the console.
  # For a list of valid log levels, see file-verbosity.
  console-verbosity: info
  # List of directories to look for schema files in ascending order of
  # priority.
  schema-dirs: []
  # Additional directories to load plugins specified using `tenzir.plugins`
  # from.
  plugin-dirs: []
  # List of paths that contain statically configured packages.
  # This setting is ignored unless the package manager plugin is enabled.
  package-dirs: []
  # The plugins to load at startup. For relative paths, Tenzir tries to find
  # the files in the specified `tenzir.plugin-dirs`. The special values
  # 'bundled' and 'all' enable autoloading of bundled and all plugins
  # respectively. Note: Add `example` or `/path/to/libtenzir-plugin-example.so`
  # to load the example plugin.
  plugins: []
  # Names of plugins and builtins to explicitly forbid from being used in
  # Tenzir. For example, adding `shell` will prohibit use of the `shell`
  # operator builtin, and adding `kafka` will prohibit use of the `kafka`
  # connector plugin.
  disable-plugins: []
  # The unique ID of this node.
  node-id: "node"
  # Forbid unsafe location overrides for pipelines with the 'local' and 'remote'
  # keywords, e.g., remotely reading from a file.
  no-location-overrides: false
  # The size of an index shard, expressed in number of events. This should
  # be a power of 2.
  max-partition-size: 4Mi
  # Timeout after which an active partition is forcibly flushed, regardless of
  # its size.
  active-partition-timeout: 30 seconds
  # Automatically rebuild undersized and outdated partitions in the background.
  # The given number controls how much resources to spend on it. Set to 0 to
  # disable.
  automatic-rebuild: 1
  # Timeout after which an automatic rebuild is triggered.
  rebuild-interval: 2 hours
  # Zstd compression level applied to the Feather store backend.
  # zstd-compression-level: <default>
  # Control how operators calculate demand from their upstream operator. Note
  # that this is an expert feature and should only be changed if you know what
  # you are doing. All values may either be set to a number, or to a record
  # containing `bytes` and `events` fields with numbers depending on the
  # operator's input type.
  demand:
    # Issue demand only if room for at least this many elements is available.
    # Must be greater than zero.
    min-elements:
      bytes: 128Ki
      events: 8Ki
    # Controls how many elements may be buffered until the operator stops
    # issuing demand. Must be greater than or equal to min-elements.
    max-elements:
      bytes: 4Mi
      events: 254Ki
    # Controls how many batches of elements may be buffered until the operator
    # stops issuing demand. Must be greater than zero.
    max-batches: 20
  # Contexts configured as part of the configuration that are always available.
  contexts:
    # A unique name for the context that's used in the context, enrich, and
    # lookup operators to refer to the context.
    indicators:
      # The type of the context.
      type: bloom-filter
      # Arguments for creating the context, depending on the type. Refer to the
      # documentation of the individual context types to see the arguments they
      # require. Note that changes to these arguments do not apply to any
      # contexts that were previously created.
      arguments:
        capacity: 1B
        fp-probability: 0.001
  # The `index` key is used to adjust the false-positive rate of
  # the first-level lookup data structures (called synopses) in the
  # catalog. The lower the false-positive rate the more space will be
  # required, so this setting can be used to manually tune the trade-off
  # of performance vs. space.
  index:
    # The default false-positive rate for type synopses.
    default-fp-rate: 0.01
    # rules:
    #   Every rule adjusts the behaviour of Tenzir for a set of targets.
    #   Tenzir creates one synopsis per target. Targets can be either types
    #   or field names.
    #
    #   fp-rate - false positive rate. Has effect on string and address type
    #             targets
    #
    #   partition-index - Tenzir will not create dense index when set to false
    #   - targets: [:ip]
    #     fp-rate: 0.01
  # The `tenzir-ctl start` command starts a new Tenzir server process.
  start:
    # Prints the endpoint for clients when the server is ready to accept
    # connections. This comes in handy when letting the OS choose an
    # available random port, i.e., when specifying 0 as port value.
    print-endpoint: false
    # Writes the endpoint for clients when the server is ready to accept
    # connections to the specified destination. This comes in handy when letting
    # the OS choose an available random port, i.e., when specifying 0 as port
    # value, and `print-endpoint` is not sufficient.
    #write-endpoint: /tmp/tenzir-node-endpoint
    # An ordered list of commands to run inside the node after starting.
    # As an example, to configure an auto-starting PCAP source that listens
    # on the interface 'en0' and lives inside the Tenzir node, add `spawn
    # source pcap -i en0`.
    # Note that commands are not executed sequentially but in parallel.
    commands: []
    # Triggers removal of old data when the disk budget is exceeded.
    disk-budget-high: 0GiB
    # When the budget was exceeded, data is erased until the disk space is
    # below this value.
    disk-budget-low: 0GiB
    # Seconds between successive disk space checks.
    disk-budget-check-interval: 90
    # When erasing, how many partitions to erase in one go before rechecking
    # the size of the database directory.
    disk-budget-step-size: 1
    # Binary to use for checking the size of the database directory. If left
    # unset, Tenzir will recursively add up the size of all files in the
    # database directory to compute the size. Mainly useful for e.g.
    # compressed filesystem where raw file size is not the correct metric.
    # Must be the absolute path to an executable file, which will get passed
    # the database directory as its first and only argument.
    #disk-budget-check-binary: /opt/tenzir/libexec/tenzir-df-percent.sh
  # User-defined operators.
  operators:
    # The Zeek operator is an example that takes raw bytes in the form of a
    # PCAP and then parses Zeek's output via the `zeek-json` format to generate
    # a stream of events.
    zeek:
      shell "zeek -r - LogAscii::output_to_stdout=T
             JSONStreaming::disable_default_logs=T
             JSONStreaming::enable_log_rotation=F
             json-streaming-logs"
      | read zeek-json
    # The Suricata operator is analogous to the above Zeek example, with the
    # difference that we are using Suricata. The command line configures
    # Suricata such that it reads PCAP on stdin and produces EVE JSON logs on
    # stdout, which we then parse with the `suricata` format.
    suricata:
      shell "suricata -r /dev/stdin
             --set outputs.1.eve-log.filename=/dev/stdout
             --set logging.outputs.0.console.enabled=no"
      | read suricata
  # In addition to running pipelines interactively, you can also deploy
  # *Pipelines as Code*. This infrastructure-as-code-like method differs from
  # pipelines run on the command-line or through app.tenzir.com in two ways:
  # 1. Pipelines deployed as code always start alongside the Tenzir node.
  # 2. Deletion via the user interface is not allowed for pipelines configured
  #    as code.
  pipelines:
    # A unique identifier for the pipeline that's used for metrics, diagnostics,
    # and API calls interacting with the pipeline.
    publish-suricata:
      # An optional user-facing name for the pipeline. Defaults to the id.
      name: Import Suricata from TCP
      # The definition of the pipeline. Configured pipelines that fail to start
      # cause the node to fail to start.
      definition: |
        from tcp://0.0.0.0:34343 read suricata --no-infer
        | where event_type != "stats"
        | publish suricata
      # Pipelines that encounter an error stop running and show an error state.
      # This option causes pipelines to automatically restart when they
      # encounter an error instead. The first restart happens immediately, and
      # subsequent restarts after the configured delay, defaulting to 1 minute.
      # The following values are valid for this option:
      # - Omit the option, or set it to null or false to disable.
      # - Set the option to true to enable with the default delay of 1 minute.
      # - Set the option to a valid duration to enable with a custom delay.
      restart-on-error: 1 minute
      # Pipelines that are unstoppable will run automatically and indefinitely.
      # They are not able to pause or stop.
      # If they do complete, they will end up in a failed state.
      # If `restart-on-error` is enabled, they will restart after the specified
      # duration.
      unstoppable: false

# The below settings are internal to CAF, and aren't checked by Tenzir directly.
# Please be careful when changing these options. Note that some CAF options may
# be in conflict with Tenzir options, and are only listed here for completeness.
caf:
  # Options affecting the internal scheduler.
  scheduler:
    # Accepted alternative: "sharing".
    policy: stealing
    # Configures whether the scheduler generates profiling output.
    enable-profiling: false
    # Output file for profiler data (only if profiling is enabled).
    #profiling-output-file: </dev/null>
    # Measurement resolution in milliseconds (only if profiling is enabled).
    profiling-resolution: 100ms
    # Forces a fixed number of threads if set. Defaults to the number of
    # available CPU cores if starting a Tenzir node, or *2* for client commands.
    #max-threads: <number of cores>
    # Maximum number of messages actors can consume in one run.
    max-throughput: 500
  # When using "stealing" as scheduler policy.
  work-stealing:
    # Number of zero-sleep-interval polling attempts.
    aggressive-poll-attempts: 100
    # Frequency of steal attempts during aggressive polling.
    aggressive-steal-interval: 10
    # Number of moderately aggressive polling attempts.
    moderate-poll-attempts: 500
    # Frequency of steal attempts during moderate polling.
    moderate-steal-interval: 5
    # Sleep interval between poll attempts.
    moderate-sleep-duration: 50us
    # Frequency of steal attempts during relaxed polling.
    relaxed-steal-interval: 1
    # Sleep interval between poll attempts.
    relaxed-sleep-duration: 10ms
  stream:
    # Maximum delay for partial batches.
    max-batch-delay: 15ms
    # Selects an implementation for credit computation.
    # Accepted alternative: "size-based".
    credit-policy: token-based
    # When using "size-based" as credit-policy.
    size-based-policy:
      # Desired batch size in bytes.
      bytes-per-batch: 32
      # Maximum input buffer size in bytes.
      buffer-capacity: 256
      # Frequency of collecting batch sizes.
      sampling-rate: 100
      # Frequency of re-calibrations.
      calibration-interval: 1
      # Factor for discounting older samples.
      smoothing-factor: 2.5
    # When using "token-based" as credit-policy.
    token-based-policy:
      # Number of elements per batch.
      batch-size: 1
      # Max. number of elements in the input buffer.
      buffer-size: 64
  # Collecting metrics can be resource consuming. This section is used for
  # filtering what should and what should not be collected
  metrics-filters:
    # Rules for actor based metrics filtering.
    actors:
      # List of selected actors for run-time metrics.
      includes: []
      # List of excluded actors from run-time metrics.
      excludes: []
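The disk-budget-high, disk-budget-low, and disk-budget-step-size options in the example configuration read like a high/low watermark scheme. The following Python sketch shows one possible interpretation, not the node's actual algorithm. The function name enforce_disk_budget is hypothetical, and three points are assumptions: that a high value of 0 disables the check, that partitions are erased oldest first, and that sizes are compared in plain bytes.

```python
def enforce_disk_budget(partition_sizes, high, low, step_size=1):
    """Erase partitions once usage exceeds `high`, until it drops below `low`.

    `partition_sizes` holds partition sizes in bytes, oldest first.
    Returns the sizes of the partitions that remain.
    """
    remaining = list(partition_sizes)
    # Assumption: a budget of 0 disables the check. Otherwise nothing happens
    # while usage stays at or under the high watermark.
    if high <= 0 or sum(remaining) <= high:
        return remaining
    # Erase `step_size` partitions at a time and recheck the total size,
    # stopping as soon as usage is below the low watermark.
    while remaining and sum(remaining) >= low:
        del remaining[:step_size]
    return remaining


# 100 bytes in use exceeds the high watermark of 90, so the oldest partitions
# are erased until usage falls below the low watermark of 50.
print(enforce_disk_budget([40, 30, 20, 10], high=90, low=50))  # prints [20, 10]
```

A larger step size erases more per iteration and therefore rechecks the directory size less often, at the cost of possibly removing more data than strictly needed.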