Tenzir vs. Cribl
We get a lot of questions about Cribl from our users: How do Tenzir pipelines differ? What is the equivalent of a Cribl source and a sink? Does Tenzir have routes? How does Tenzir break events? Does Tenzir have packs? To answer all these questions and quench the thirst of your inquisitive minds, we put together this side-by-side comparison of Cribl and Tenzir.
Product
Cribl
- Cribl has several products:
- Cribl Stream: runs pipelines that process data in motion using a JavaScript-based pipeline engine.
- Cribl Edge: agent to collect data for forwarding to other Cribl products.
- Cribl Search: cloud-based federated search over remote data sources. Microsoft's Kusto Query Language (KQL) is the pipeline language for running queries over data at rest.
- Cribl Lake: a data lake running on top of public cloud providers.
- Cribl's product suite is closed source.
Tenzir
- Tenzir has a single, unified product. The Tenzir Query Language (TQL) is a unified language to process historical and streaming data. Users deploy nodes that can be managed through the platform at app.tenzir.com.
- Tenzir is an open-core product, with an open-source project and a commercial platform for enterprise needs.
Architecture
Deployment
Cribl
- Concepts:
- Leader Node: a Cribl Stream instance in leader mode to manage configurations and watch Worker Nodes.
- Worker Node: a Cribl Stream instance in worker mode, managed by a leader node.
- Worker Group: a collection of worker nodes with the same configuration.
- Mapping Ruleset: maps nodes to worker groups.
- The Enterprise Edition supports on-premises hosting of leader and worker instances.
Tenzir
- Concepts:
- Node: manages pipelines and optional storage.
- Platform: centrally manages nodes.
- You deploy nodes in your infrastructure.
- Users manage nodes and pipelines through the platform.
- Nodes connect to the platform on startup.
- Nodes can run in the cloud and on premises.
- Tenzir hosts an instance of the platform at app.tenzir.com for the Community Edition, Professional Edition, and Enterprise Edition.
- The Sovereign Edition allows for an on-premise, air-gapped deployment of the platform.
Pipelines
Cribl
Cribl Stream has the following pipeline concepts:
- Sources: configurations to collect data from remote resources
- Pipelines: a series of functions that process data, attached to routes
- Pre-processing Pipelines: attached to sources, e.g., to apply a function to all input events
- Post-processing Pipelines: attached to destinations, e.g., to apply a function to all output events
- Routes: assign events to pipelines
- Destinations: receive data
Tenzir
- Everything in Tenzir is a pipeline that consists of one or more operators.
- Tenzir does not have separate abstractions for Sources and Destinations. Rather, operators can be a source (no input, only output), a transformation (input and output), or a sink (only input, no output).
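To make the three operator kinds concrete, here is a minimal Python sketch of the idea. It is not Tenzir's implementation: the `Operator` class and the helper functions are hypothetical, and the input/output types we assign to `load`, `read`, `where`, `write`, and `save` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model: each operator declares what it consumes and produces.
# None stands for "nothing": a source has no input, a sink has no output.
@dataclass(frozen=True)
class Operator:
    name: str
    input: Optional[str]   # "bytes", "events", or None
    output: Optional[str]  # "bytes", "events", or None

def kind(op: Operator) -> str:
    """Classify an operator by its input and output types."""
    if op.input is None and op.output is not None:
        return "source"
    if op.input is not None and op.output is None:
        return "sink"
    if op.input is not None and op.output is not None:
        return "transformation"
    raise ValueError(f"{op.name}: an operator needs an input or an output")

def is_closed_pipeline(ops: list[Operator]) -> bool:
    """Check that a pipeline starts with a source, ends with a sink, and that
    each operator's output type matches the next operator's input type."""
    if not ops:
        return False
    if kind(ops[0]) != "source" or kind(ops[-1]) != "sink":
        return False
    return all(a.output == b.input for a, b in zip(ops, ops[1:]))

# Assumed type signatures, for illustration only.
load = Operator("load", None, "bytes")
read = Operator("read", "bytes", "events")
where = Operator("where", "events", "events")
write = Operator("write", "events", "bytes")
save = Operator("save", "bytes", None)

pipeline_ok = is_closed_pipeline([load, read, where, write, save])
pipeline_bad = is_closed_pipeline([load, where, save])  # bytes fed into events
```

In this model, Cribl's Sources and Destinations are simply the first and last operator of a pipeline rather than separate configuration objects.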
Functions vs. Operators
Cribl
- A pipeline in Cribl Stream consists of a series of functions.
- A "pipeline" in Cribl Search consists of a dataset followed by one or more operators.
Tenzir
- Tenzir does not differentiate between streaming and historical search pipelines.
- Tenzir operators can leverage other abstractions:
- Connectors: loads or saves bytes from a remote resource
- Formats: parse or print data
- Contexts: stateful objects for enrichment/contextualization
- Tenzir connectors and formats can be used from various operators, such as `load`, `from`, `save`, `to`, and `parse`.
Routing
Cribl
- Cribl Stream's Routes are sequential filters that determine the pipelines events should be delivered to.
Tenzir
- Tenzir uses a publish/subscribe model to support various event forwarding patterns.
- You can re-implement Cribl Stream Routes using a combination of the `publish`, `subscribe`, and `where` operators.
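The following Python sketch illustrates the pattern of building routes from publish, subscribe, and a filter. It is only an in-memory analogy: the `Bus` class, the `where` helper, the topic name `ingest`, and the `proto` field are all hypothetical and do not reflect Tenzir's API.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]

class Bus:
    """A minimal in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in self._subscribers[topic]:
            handler(event)

def where(predicate: Callable[[Event], bool], downstream: Handler) -> Handler:
    """Forward only the events that satisfy the predicate."""
    def handler(event: Event) -> None:
        if predicate(event):
            downstream(event)
    return handler

bus = Bus()
dns_events: list[Event] = []
http_events: list[Event] = []

# Two "routes": both subscribe to the same topic and filter for their events.
bus.subscribe("ingest", where(lambda e: e.get("proto") == "dns", dns_events.append))
bus.subscribe("ingest", where(lambda e: e.get("proto") == "http", http_events.append))

for event in [
    {"proto": "dns", "query": "example.org"},
    {"proto": "http", "path": "/"},
    {"proto": "ssh"},
]:
    bus.publish("ingest", event)
```

Note that every subscriber in this sketch sees every published event. To mimic sequential routes where an event ends up in exactly one pipeline, the filter predicates need to be mutually exclusive.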
Installation
Provisioning
Cribl
- Cribl Stream runs on multiple Linux distributions.
- A Docker deployment is also an option.
- Cribl Stream offers a sizing calculator to estimate CPU and RAM requirements.
- A typical deployment consists of one or more worker processes per machine.
- To scale horizontally, worker groups can spawn additional workers with the same configuration.
Tenzir
- Tenzir nodes run natively on any Linux distribution as a static binary.
- A Docker deployment is also an option. The platform generates a Docker Compose file for your node.
- Tenzir offers a node sizing calculator to estimate CPU cores, RAM, and storage requirements.
- A typical deployment consists of exactly one Tenzir node process per machine.
- To scale horizontally, users can spawn multiple nodes, each of which runs a subset of pipelines.
- To scale vertically, a node uses a thread pool to adapt to the number of available CPU cores.
Executables
Cribl
- The `cribl` binary starts/stops a Cribl Stream instance.
- By default, the UI listens on port 9000.
- By default, an HTTP In source listens on port 10080.
Tenzir
- The `tenzir` executable runs a single pipeline.
- The `tenzir-node` executable spawns a node.
- If a platform configuration is present, the node attempts to connect to the platform so that you can manage it.
- By default, a node listens on TCP port 5158 for incoming Tenzir connections.
- There is no default HTTP ingest source; you need to deploy a pipeline for that.
Data Model
Cribl
- An event is a collection of key-value pairs.
- Events are JSON objects.
- Fields starting with a double-underscore are known as internal fields that sources can add to events, e.g., Syslog adds an `__srcIpPort` field. Internal fields are used within Cribl Stream and are not passed to destinations.
- Cribl allows users to write JavaScript to process events.
Tenzir
- An event is a semi-structured record, similar to a JSON object but with additional data types.
- Tenzir's type system is a superset of JSON, providing additional first-class types, such as `ip`, `subnet`, `time`, or `duration`.
- Events have a schema that includes the field names and types.
- Internally, Tenzir represents events as Apache Arrow record batches, which you can think of as data frames.
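To illustrate what first-class types buy you over plain JSON, the sketch below uses Python's standard library types as stand-ins for `ip`, `subnet`, `time`, and `duration`. The field names and the mapping are assumptions for illustration; this is not how Tenzir represents events internally.

```python
import json
from datetime import datetime, timedelta
from ipaddress import ip_address, ip_network

# In JSON, addresses, subnets, and timestamps are all plain strings.
raw = json.loads(
    '{"src": "10.0.0.7", "net": "10.0.0.0/8", '
    '"ts": "2024-05-01T12:00:00+00:00", "ttl_seconds": 90}'
)

# Assumed mapping onto richer types (stand-ins for ip, subnet, time, duration).
event = {
    "src": ip_address(raw["src"]),
    "net": ip_network(raw["net"]),
    "ts": datetime.fromisoformat(raw["ts"]),
    "ttl": timedelta(seconds=raw["ttl_seconds"]),
}

# Typed values support domain-specific operations that strings do not.
src_in_net = event["src"] in event["net"]  # subnet membership
expires_at = event["ts"] + event["ttl"]    # time arithmetic

# A schema pairs every field name with its type.
schema = {field: type(value).__name__ for field, value in event.items()}
```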
Dataflow
Cribl
- A source generates bytes or events.
- Collector sources fetch data in a triggered fashion.
- Push sources send data to Cribl.
- Pull sources continuously fetch data.
- System sources generate events about Cribl itself.
- Internal sources are similar to system sources but do not count towards license usage.
- A custom command is an optional customization point in the form of an executable that takes bytes from the source on stdin and forwards the command output on stdout downstream.
- For sources that generate bytes, an event breaker splits bytes into individual events.
- Fields enable enrichment on a key-value basis where the key matches a field in an event and the value is a JavaScript expression.
- A parser is a configuration of the parser function, which extracts fields from events. It supports JSON, CSV, key-value pairs, Grok, and regular expressions, among others.
- The `_raw` field catches all events that cannot be parsed.
- Cribl Stream sets the event time in the `_time` field and uses the current wall-clock time if there is no suitable field.
Tenzir
- A source is an operator that only produces data. Source operators that use a loader, such as `load` and `from`, produce bytes.
- A sink is an operator that only consumes data. Sink operators that use a saver, such as `save` and `to`, consume bytes.
- A transformation is an operator that consumes and produces data. Numerous events-to-events transformations allow for shaping the data.
- A parser converts bytes to events and is used in the `read` and `parse` operators. Parsers are equivalent to event breakers. For example, breaking at a newline is equivalent to applying the `lines` parser. Another event breaker is JSON Array, which lifts every single array element into a dedicated event. In Tenzir, this is a transformation of a list field, since an array (`list` in Tenzir) is already structured data. The `yield` operator implements this lifting, e.g., `yield xs[]` pulls the elements of array `xs` out as top-level events.
- A printer converts events to bytes and is used in the `write` operator.
- The `shell` operator is a bytes-to-bytes transformation that can be placed freely in a pipeline where the operator types match. Unlike Cribl's custom commands, there are no restrictions on where to place this operator in a pipeline.
- Similarly, the `python` operator is an events-to-events transformation that can be placed freely in a pipeline where the operator types match. The operator takes inline Python or a path to a file as argument, with the current event being represented by the variable `self`.
- The `parse` operator applies a parser to a single field of an event and is equivalent to the Cribl parser function.
- Parse errors generate a diagnostic that can be processed separately with the `diagnostics` source operator.
- There is no special `_time` field in Tenzir. TODO: discuss `extend _time=now()` and the `timestamp` alias.
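As a rough illustration of the bytes-to-events step, the Python sketch below breaks a byte stream at newlines, parses each line as JSON, and collects parse failures separately instead of emitting them as events. The function names and the shape of the diagnostic records are hypothetical; they only mirror the concepts described above.

```python
import json
from typing import Iterable, Iterator

def lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Break a stream of byte chunks into one string per line."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")
    if buffer:  # emit a trailing line that lacks a final newline
        yield buffer.decode("utf-8")

def read_json(text_lines: Iterable[str], diagnostics: list[dict]) -> Iterator[dict]:
    """Parse each line as JSON; record failures as diagnostics, not events."""
    for number, line in enumerate(text_lines, start=1):
        try:
            yield json.loads(line)
        except json.JSONDecodeError as error:
            # Hypothetical diagnostic shape, for illustration only.
            diagnostics.append(
                {"severity": "warning", "line": number, "message": error.msg}
            )

# Chunk boundaries do not need to align with line boundaries.
chunks = [b'{"a": 1}\n{"a"', b': 2}\nnot json\n']
diagnostics: list[dict] = []
events = list(read_json(lines(chunks), diagnostics))
```

Keeping diagnostics out of the event stream means downstream operators only ever see well-formed events, while failures remain available for separate processing.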
Use Cases
This section compares how Cribl and Tenzir handle common use cases that we encounter.
Unrolling Arrays
Cribl
- The `unroll` function unrolls/explodes an array of objects into individual events.
- The `unroll` function can only operate on the string value of an event that has a `_raw` field.
Tenzir
- The `unroll` operator performs the same operation as Cribl's `unroll` function.
- The `unroll` operator can operate on any array in an event.
- `yield` performs a similar operation: `unroll xs` and `yield xs[]` differ in that the `yield` operator strips all outer fields and makes the array elements the new top-level event.
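The difference between the two operations is easiest to see on a small example. The Python functions below are a sketch of the semantics as described above, not Tenzir code; the event and its field names are made up.

```python
def unroll(event: dict, field: str) -> list[dict]:
    """Emit one copy of the event per list element, keeping the outer fields."""
    return [{**event, field: element} for element in event[field]]

def yield_elements(event: dict, field: str) -> list[dict]:
    """Emit the list elements themselves, dropping the outer fields."""
    return list(event[field])

event = {"host": "web-1", "xs": [{"port": 80}, {"port": 443}]}

unrolled = unroll(event, "xs")
# [{"host": "web-1", "xs": {"port": 80}}, {"host": "web-1", "xs": {"port": 443}}]

yielded = yield_elements(event, "xs")
# [{"port": 80}, {"port": 443}]
```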
Deduplication
Deduplication means removing duplicate events from a stream. Check out our blog post on deduplication that discusses this topic in more depth.
Cribl
Cribl Stream has a Suppress function for deduplicating events.
- Controls:
- Key expression: a string that describes a unique key for deduplicating, e.g., `${ip}:${port}` refers to fields `ip` and `port`.
- Number to allow: number of events per time period.
- Suppression period: the interval to suppress events for after the maximum number of allowed events has been seen.
- Drop suppressed events: flag to control whether events get dropped or enriched with a `suppress=1` field.
Cribl Search has a `dedup` operator.
Tenzir
Tenzir has a `deduplicate` operator.
- Controls:
- Extractors: a list of field names that uniquely identify the event ("key expression").
- Limit: the number of events to emit per unique key.
- Timeout: the time that needs to pass until a suppressed event is no longer considered a duplicate ("suppression period").
- Distance: The number of events in sequence since the last occurrence of a unique event.
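To show how these controls interact, here is a minimal Python sketch of one plausible reading of them. It is not the `deduplicate` operator's actual implementation: the function signature, the choice to reset the timer on every match, and the `ts` field holding a timestamp in seconds are all assumptions.

```python
from typing import Iterable, Iterator, Optional

def deduplicate(
    events: Iterable[dict],
    extractors: list[str],
    limit: int = 1,
    timeout: Optional[float] = None,
    distance: Optional[int] = None,
) -> Iterator[dict]:
    """Emit at most `limit` events per key until the key is forgotten.

    A key is forgotten when more than `timeout` seconds (read from the
    hypothetical `ts` field) or more than `distance` events have passed
    since the key was last seen.
    """
    state: dict[tuple, dict] = {}  # key -> count, last_ts, last_index
    for index, event in enumerate(events):
        key = tuple(event.get(field) for field in extractors)
        entry = state.get(key)
        if entry is not None:
            timed_out = timeout is not None and event["ts"] - entry["last_ts"] > timeout
            too_far = distance is not None and index - entry["last_index"] > distance
            if timed_out or too_far:
                entry = None  # forget the key and treat the event as new
        if entry is None:
            entry = {"count": 0}
        entry["count"] += 1
        entry["last_ts"] = event["ts"]
        entry["last_index"] = index
        state[key] = entry
        if entry["count"] <= limit:
            yield event

events = [
    {"ts": 0, "ip": "10.0.0.1", "port": 53},
    {"ts": 1, "ip": "10.0.0.1", "port": 53},   # duplicate of the first event
    {"ts": 2, "ip": "10.0.0.2", "port": 53},   # new key
    {"ts": 20, "ip": "10.0.0.1", "port": 53},  # same key, but much later
]

without_timeout = list(deduplicate(events, ["ip", "port"]))
with_timeout = list(deduplicate(events, ["ip", "port"], timeout=10))
```

With a timeout of 10 seconds, the last event passes through again because its key has not been seen for 19 seconds; without a timeout, it is suppressed as a duplicate.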
Enrichment
Cribl
- Lookups are tables usable for enrichment with the lookup function
- Lookup files can be CSV or GeoIP databases in MMDB format.
- To pick up changes to lookup state, you must provide a reload interval at which Cribl checks the underlying file for changes.
- For frequently changing data, Cribl recommends the Redis function.
Tenzir
- Contexts are stateful objects usable for enrichment with the `enrich` operator.
- There exist several context types, such as lookup tables, Bloom filters, GeoIP databases, or user-written C++ plugins.
- Contexts are neither static nor limited to CSV or MMDB files; you can add data dynamically from any other pipeline, using the `context update` operator. In other words, you can use all existing connectors and formats to feed data into a context.
- When Tenzir lookup tables have CIDR subnets as key, you can perform an enrichment with single IP addresses (using a longest-prefix match). This comes in handy for enriching with a network inventory.
- Tenzir lookup tables support expiration of entries with per-key timeouts. This makes it possible to automatically expire no-longer-relevant entries, e.g., stale observables. There are two types of timeouts: a create timeout that counts down after an entry is inserted into the table and an update timeout that resets when an entry gets accessed.
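The Python sketch below illustrates the two ideas from the last bullets: longest-prefix matching on subnet keys and per-entry create and update timeouts. The `SubnetTable` class and its method names are hypothetical, and time is passed in explicitly as seconds to keep the example deterministic; Tenzir's lookup tables may behave differently in the details.

```python
from ipaddress import ip_address, ip_network
from typing import Any, Optional

class SubnetTable:
    """A toy lookup table keyed by CIDR subnets with per-entry timeouts."""

    def __init__(self) -> None:
        self._entries: dict = {}  # subnet -> entry record

    def update(self, subnet: str, value: Any, now: float,
               create_timeout: Optional[float] = None,
               update_timeout: Optional[float] = None) -> None:
        """Insert or overwrite an entry and start its timeouts."""
        self._entries[ip_network(subnet)] = {
            "value": value,
            "create_deadline": None if create_timeout is None else now + create_timeout,
            "update_timeout": update_timeout,
            "update_deadline": None if update_timeout is None else now + update_timeout,
        }

    @staticmethod
    def _expired(entry: dict, now: float) -> bool:
        deadlines = (entry["create_deadline"], entry["update_deadline"])
        return any(d is not None and now >= d for d in deadlines)

    def lookup(self, address: str, now: float) -> Any:
        """Return the value of the most specific live subnet containing the address."""
        ip = ip_address(address)
        # Drop expired entries before matching.
        self._entries = {
            net: entry for net, entry in self._entries.items()
            if not self._expired(entry, now)
        }
        matches = [net for net in self._entries if ip in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
        entry = self._entries[best]
        if entry["update_timeout"] is not None:
            # Accessing an entry resets its update timeout.
            entry["update_deadline"] = now + entry["update_timeout"]
        return entry["value"]

inventory = SubnetTable()
inventory.update("10.0.0.0/8", "corp", now=0)
inventory.update("10.1.0.0/16", "lab", now=0, create_timeout=60)

early = inventory.lookup("10.1.2.3", now=10)  # the /16 is more specific
late = inventory.lookup("10.1.2.3", now=61)   # the /16 entry has expired
```

Once the more specific entry expires, the lookup falls back to the enclosing `/8`, which is the behavior you typically want for a network inventory with temporary overrides.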
Packs vs. Packages
Cribl
- Packs bundle configurations and workflows for easy deployment.
- Packs can include routes, pipelines, functions, sample data, and knowledge objects (e.g., lookups, parsers, schemas).
Tenzir
- The library is a set of packages.
- A library corresponds to a GitHub repository.
- A package can include pipelines and contexts.
- The Community Edition has read-only access to the community library.
- The Professional Edition and Enterprise Edition support managing custom libraries.
The Tenzir library is still under development and will ship with one of the next releases. We include the comparison here anyway to introduce the terminology.