
Tenzir vs. Cribl

We get a lot of questions about Cribl from our users:

How do Tenzir pipelines differ?
What is the equivalent of a Cribl source and destination?
Does Tenzir have routes?
How does Tenzir break events?
Does Tenzir have packs?

To answer all these questions and quench the thirst of your inquisitive minds, we put together this side-by-side comparison of Cribl and Tenzir.

Product

Cribl

Cribl has several products:
- Cribl Stream: runs pipelines that process data in motion using a JavaScript-based pipeline engine.
- Cribl Edge: agent to collect data for forwarding to other Cribl products.
- Cribl Search: cloud-based federated search over remote data sources. Microsoft's Kusto Query Language (KQL) is the pipeline language for running queries over data at rest.
- Cribl Lake: a data lake running on top of public cloud providers.
Cribl's product suite is closed source.

Tenzir

Tenzir has a single, unified product. The Tenzir Query Language (TQL) is a unified language to process historical and streaming data. Users deploy nodes that they manage through the platform at app.tenzir.com.
Tenzir is an open-core product, with an open-source project and a commercial platform for enterprise needs.

Architecture

Deployment

Cribl

Concepts:
- Leader Node: a Cribl Stream instance in leader mode that manages configurations and monitors Worker Nodes.
- Worker Node: a Cribl Stream instance in worker mode, managed by a Leader Node.
- Worker Group: a collection of Worker Nodes with the same configuration.
- Mapping Ruleset: maps Worker Nodes to Worker Groups.
The Enterprise Edition supports hosting the Leader and Worker instances on premises.

Tenzir

Concepts:
- Node: manages pipelines and optional storage.
- Platform: centrally manages nodes.
You deploy nodes in your infrastructure.
Users manage nodes and pipelines through the platform.
Nodes connect to the platform on startup.
Nodes can run in the cloud and on premises.
Tenzir hosts an instance of the platform at app.tenzir.com for the Community Edition, Professional Edition, and Enterprise Edition.
The Sovereign Edition allows for an on-premises, air-gapped deployment of the platform.

Pipelines

Cribl

Cribl Stream has the following pipeline concepts:

Sources: configurations to collect data from remote resources
Pipelines: a series of functions that process data, attached to routes
- Pre-processing Pipelines: attached to sources, e.g., to apply functions to all input events
- Post-processing Pipelines: attached to destinations, e.g., to apply functions to all output events
Routes: assign events to pipelines
Destinations: receive data

Tenzir

Everything in Tenzir is a pipeline that consists of one or more operators.
Pipeline operators can be an input, a transformation, or an output.
Tenzir will soon feature Sources and Destinations as concepts on top of pipeline operators.
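To make the Cribl Stream stage order concrete, here is a minimal Python sketch that models each stage as a function over a list of event dicts and composes them in the order: source pre-processing, routed pipeline, destination post-processing. The stage bodies (adding a `source` field, filtering on `severity`, dropping `debug`) are hypothetical and only illustrate the concepts; this is not how either engine is implemented. In Tenzir, the same logic would be a single operator chain instead of functions attached to separate configuration objects.

```python
from typing import Callable

Stage = Callable[[list], list]

def chain(*stages: Stage) -> Stage:
    """Compose stages left to right, feeding each stage's output to the next."""
    def run(events: list) -> list:
        for stage in stages:
            events = stage(events)
        return events
    return run

# Hypothetical stage bodies, for illustration only.
def pre_processing(events):
    # Attached to a source: applies to every input event.
    return [{**e, "source": "syslog"} for e in events]

def routed_pipeline(events):
    # Selected by a route: keeps only higher-severity events.
    return [e for e in events if e.get("severity", 0) >= 3]

def post_processing(events):
    # Attached to a destination: applies to every output event.
    return [{k: v for k, v in e.items() if k != "debug"} for e in events]

cribl_like = chain(pre_processing, routed_pipeline, post_processing)

print(cribl_like([{"severity": 5, "debug": True}, {"severity": 1}]))
# [{'severity': 5, 'source': 'syslog'}]
```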

Functions vs. Operators

Cribl

A pipeline in Cribl Stream consists of a series of functions.
A "pipeline" in Cribl Search consists of a dataset followed by one or more operators.

Tenzir

Tenzir does not differentiate between streaming and historical search pipelines. To run a historical query, simply use the export input operator.
Tenzir operators are typed, supporting both unstructured data (bytes) and structured data (events), as well as conversions between the two types.

Routing

Cribl

Cribl Stream's Routes are sequential filters that determine the pipelines events should be delivered to.

Tenzir

Tenzir uses a publish/subscribe model to support various event forwarding patterns.
You can re-implement Cribl Stream Routes using a combination of the publish, subscribe and where operators.
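The difference between sequential routes and publish/subscribe can be sketched in Python. Under the simplifying assumption that every route is final (a matching route consumes the event), Cribl-style routing is a first-match search over ordered filters, while a pub/sub bus delivers each event to every subscriber whose filter matches. Making the ordered filters mutually exclusive is one way to reproduce first-match routing with independent subscribers, analogous to placing a `where` after each `subscribe`. The `Bus` class and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Optional

Predicate = Callable[[dict], bool]

def route_first_match(event: dict, routes: list) -> Optional[str]:
    """Cribl-style routing: filters are evaluated in order, first match wins."""
    for name, predicate in routes:
        if predicate(event):
            return name
    return None

def exclusive(predicates: list) -> list:
    """Make ordered filters mutually exclusive: filter i matches only if
    no earlier filter matches, which reproduces first-match semantics."""
    result = []
    for i, predicate in enumerate(predicates):
        earlier = predicates[:i]
        result.append(
            lambda e, p=predicate, earlier=earlier: p(e) and not any(q(e) for q in earlier)
        )
    return result

class Bus:
    """Pub/sub: every subscriber of a topic sees every event published to it."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, predicate: Predicate, sink: list) -> None:
        self.subscribers[topic].append((predicate, sink))

    def publish(self, topic: str, event: dict) -> None:
        for predicate, sink in self.subscribers[topic]:
            if predicate(event):  # per-subscriber filter, like a `where`
                sink.append(event)

# Usage: two ordered routes, the second one is a catch-all.
routes = [("firewall", lambda e: e.get("src") == "fw"), ("default", lambda e: True)]
filters = exclusive([p for _, p in routes])
firewall_sink, default_sink = [], []
bus = Bus()
bus.subscribe("events", filters[0], firewall_sink)
bus.subscribe("events", filters[1], default_sink)
for event in [{"src": "fw"}, {"src": "dns"}]:
    bus.publish("events", event)
```

Without `exclusive`, the catch-all subscriber would also receive the firewall event, which is the fan-out behavior pub/sub gives you by default.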

Installation

Provisioning

Cribl

Cribl Stream runs on multiple Linux distributions.
A Docker deployment is also an option.
Cribl Stream offers a sizing calculator to estimate CPU and RAM requirements.
A typical deployment consists of one or more worker processes per machine.
To scale horizontally, worker groups can spawn additional workers with the same configuration.

Tenzir

Tenzir nodes run natively on any Linux distribution as a static binary.
A Docker deployment is also an option. The platform generates a Docker Compose file for your node.
Tenzir offers a node sizing calculator to estimate CPU cores, RAM, and storage requirements.
A typical deployment consists of exactly one Tenzir node process per machine.
To scale horizontally, users can spawn multiple nodes, each of which runs a subset of pipelines.
To scale vertically, a node uses a thread pool to adapt to the number of available CPU cores.

Executables

Cribl

The cribl binary starts/stops a Cribl Stream instance.
By default, the UI listens on port 9000.
By default, an HTTP In source listens on port 10080.

Tenzir

The tenzir executable runs a single pipeline.
The tenzir-node executable starts a node.
If a platform configuration is present, the node attempts to connect to the platform so that you can manage it there.
By default, a node listens on TCP port 5158 for incoming Tenzir connections.
There is no default HTTP ingest source, you need to deploy a pipeline for that.

Data Model

Cribl

An event is a collection of key-value pairs.
Events are JSON objects.
Fields starting with a double-underscore are known as internal fields that sources can add to events, e.g., Syslog adds an __srcIpPort field. Internal fields are used within Cribl Stream and are not passed to destinations.
Cribl allows users to write JavaScript to process events.
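A minimal Python sketch of the internal-field convention: before an event leaves for a destination, every field whose name starts with a double underscore is dropped. Considering only top-level fields is our simplifying assumption, the function name is hypothetical, and the sample values are made up.

```python
def strip_internal_fields(event: dict) -> dict:
    """Return a copy of event without top-level fields starting with "__"."""
    return {name: value for name, value in event.items() if not name.startswith("__")}

outgoing = strip_internal_fields(
    {"__srcIpPort": "10.0.0.1:514", "_raw": "hello", "host": "fw01"}
)
print(outgoing)  # {'_raw': 'hello', 'host': 'fw01'}
```

Note that `_raw` has a single leading underscore and therefore survives.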

Tenzir

Events:

An event is a semi-structured record, similar to a JSON object but with additional data types.
Tenzir's type system is a superset of JSON, providing additional first-class types, such as ip, subnet, time, and duration.
Events have a schema that includes the field names and types.
Internally, Tenzir represents events as Apache Arrow record batches, which you can think of as data frames.
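To illustrate how a schema pairs field names with types, here is a Python sketch that derives a Tenzir-style schema from a record holding native Python values. Only `ip`, `subnet`, `time`, and `duration` come from the text above; the remaining type names (`int64`, `double`, `string`, and so on) and the rendering of lists and nested records are assumptions for illustration. In plain JSON, the four richer types would collapse into strings or numbers.

```python
import datetime as dt
import ipaddress

def type_of(value):
    """Map a Python value to an (assumed) Tenzir type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return "ip"
    if isinstance(value, (ipaddress.IPv4Network, ipaddress.IPv6Network)):
        return "subnet"
    if isinstance(value, dt.datetime):
        return "time"
    if isinstance(value, dt.timedelta):
        return "duration"
    if isinstance(value, list):
        # Simplification: infer the element type from the first element.
        return {"list": type_of(value[0]) if value else "null"}
    if isinstance(value, dict):
        return {name: type_of(v) for name, v in value.items()}  # nested record
    raise TypeError(f"unsupported value: {value!r}")

event = {
    "src_ip": ipaddress.ip_address("10.0.0.1"),
    "net": ipaddress.ip_network("10.0.0.0/8"),
    "ts": dt.datetime(2024, 1, 1, tzinfo=dt.timezone.utc),
    "elapsed": dt.timedelta(milliseconds=250),
    "bytes": 1500,
}
schema = type_of(event)
```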

Bytes:

In addition to events, Tenzir pipelines can also transport raw bytes.
Each operator decides whether it supports bytes, events, or both.
All Tenzir connectors produce or consume byte streams; formats parse or print byte streams.

Dataflow

Cribl

A source generates bytes or events.
- Collector sources fetch data in a triggered fashion.
- Push sources send data to Cribl.
- Pull sources continuously fetch data.
- System sources generate events about Cribl itself.
- Internal sources are similar to system sources but do not count towards license usage.
A custom command is an optional customization point in the form of an executable that reads the source's bytes on stdin and forwards its stdout output downstream.
For sources that generate bytes, an event breaker splits bytes into individual events.
Fields enable enrichment on a key-value basis, where the key names a field in the event and the value is a JavaScript expression.
A parser is a configuration of the parser function, which extracts fields from events. It supports JSON, CSV, key-value pairs, Grok, and regular expressions, among others.
The _raw field catches all events that cannot be parsed.
Cribl Stream sets the event time in the _time field and uses the current wallclock time if there is no suitable field.
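The Cribl steps above (event breaking, parsing with a raw fallback, and setting `_time`) can be approximated in a short Python sketch. It assumes a newline event breaker, JSON as the only parser, and a hypothetical numeric `timestamp` field as the time source; Cribl's actual breakers and timestamp extraction are configurable and more elaborate.

```python
import json
import time
from typing import Callable, Iterable, Iterator

def break_events(chunks: Iterable[bytes]) -> Iterator[str]:
    """Newline event breaker: split a byte stream into lines, carrying a
    partial line over to the next chunk."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line:
                yield line.decode()
    if buffer:  # flush a trailing line that has no newline
        yield buffer.decode()

def to_event(line: str, time_field: str = "timestamp",
             now: Callable[[], float] = time.time) -> dict:
    """Parse a line as a JSON object, fall back to keeping it in _raw,
    and set _time from time_field or from the wallclock."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        event = None
    if not isinstance(event, dict):
        event = {"_raw": line}
    ts = event.get(time_field)
    event["_time"] = float(ts) if isinstance(ts, (int, float)) else now()
    return event

# The second event spans two chunks and is not valid JSON.
events = [to_event(line) for line in break_events([b'{"timestamp": 7}\nfoo', b"bar\n"])]
```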

Tenzir

An input operator produces data.
An output operator consumes data.
A transformation operator consumes and produces data. Events-to-events transformations make it easy to shape data.
A parser is a bytes-to-events operator. Parsers are equivalent to event breakers. For example, breaking at a newline is equivalent to applying the read_lines parser.
A printer is an events-to-bytes operator.
The shell operator is a bytes-to-bytes transformation that can be placed freely in a pipeline where the operator types match. Unlike Cribl's custom commands, there are no restrictions on where to place this operator in a pipeline.
Similarly, the python operator is an events-to-events transformation that can be placed freely in a pipeline where the upstream and downstream operator types match. The operator takes inline Python or a path to a file as an argument, with the current event represented by the variable self.
Parse errors generate a diagnostic that can be processed separately with the diagnostics input operator.
There is no special _time field in Tenzir.
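Operator typing can be modeled as a check that each operator's output type equals the next operator's input type. This Python sketch assumes three element types: `void` (no data, used at the ends of a pipeline), `bytes`, and `events`. The operators `shell`, `read_lines`, and `python` carry the types stated above, while `load_file`, `write_json`, and `save_stdout` are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    input: str   # "void", "bytes", or "events"
    output: str  # "void", "bytes", or "events"

def check(pipeline: list) -> None:
    """Raise TypeError if two adjacent operators disagree on the element type."""
    for upstream, downstream in zip(pipeline, pipeline[1:]):
        if upstream.output != downstream.input:
            raise TypeError(
                f"{upstream.name} produces {upstream.output}, "
                f"but {downstream.name} expects {downstream.input}"
            )

# Types for shell, read_lines, and python follow the text; the rest are placeholders.
load_file = Operator("load_file", "void", "bytes")
shell = Operator("shell", "bytes", "bytes")
read_lines = Operator("read_lines", "bytes", "events")
python = Operator("python", "events", "events")
write_json = Operator("write_json", "events", "bytes")
save_stdout = Operator("save_stdout", "bytes", "void")

# shell appears twice: anywhere bytes flow, it fits.
check([load_file, shell, read_lines, python, write_json, shell, save_stdout])
```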

Use Cases

This section compares how Cribl and Tenzir handle common use cases that we encounter.

Unrolling Arrays

Cribl

The unroll function unrolls/explodes an array of objects into individual events.
The unroll function can only operate on the string value of an event that has a _raw field.

Tenzir

The unroll operator performs the same operation as Cribl's unroll function.
The unroll operator can operate on any array in an event.
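A minimal Python sketch of unrolling an arbitrary array field, addressed by a dotted path such as `dns.answers`. Events whose path is missing or not a list pass through unchanged, and an empty list yields no events; both behaviors are assumptions for this sketch rather than documented semantics of either product.

```python
import copy
from typing import Iterator

def unroll(event: dict, path: str) -> Iterator[dict]:
    """Yield one copy of `event` per element of the list at `path`,
    with the list replaced by that element."""
    keys = path.split(".")
    parent = event
    for key in keys[:-1]:
        parent = parent.get(key) if isinstance(parent, dict) else None
    values = parent.get(keys[-1]) if isinstance(parent, dict) else None
    if not isinstance(values, list):
        yield event  # assumption: non-list fields pass through unchanged
        return
    for value in values:
        out = copy.deepcopy(event)
        target = out
        for key in keys[:-1]:
            target = target[key]
        target[keys[-1]] = value
        yield out

rows = list(unroll({"id": 1, "dns": {"answers": ["a", "b"]}}, "dns.answers"))
```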

Deduplication

Deduplication means removing duplicate events from a stream. Check out our blog post on deduplication that discusses this topic in more depth.

Cribl

Cribl Stream has a Suppress function for deduplicating events.

Controls:
- Key expression: a string that describes a unique key for deduplicating, e.g., ${ip}:${port} refers to fields ip and port.
- Number to allow: number of events per time period.
- Suppression period: the interval to suppress events for after the maximum number of allowed events have been seen.
- Drop suppressed events: flag to control whether events get dropped or enriched with a suppress=1 field.

Cribl Search has a dedup operator.
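The Suppress controls above can be approximated in Python. This sketch uses `string.Template` as a stand-in for Cribl's JavaScript template-literal key expressions and assumes a fixed window per key that starts with the key's first event; the class and its parameters are hypothetical and do not reproduce Cribl's exact timing behavior.

```python
from string import Template
from typing import Optional

class Suppressor:
    """Approximate Cribl-style suppression with a fixed window per key."""

    def __init__(self, key_expr: str, allow: int = 1, period: float = 30.0,
                 drop: bool = True) -> None:
        self.key = Template(key_expr)  # stand-in for a JS template literal
        self.allow = allow             # "Number to allow"
        self.period = period           # "Suppression period", in seconds
        self.drop = drop               # "Drop suppressed events"
        self.windows: dict = {}        # key -> (window start, events seen)

    def process(self, event: dict, now: float) -> Optional[dict]:
        key = self.key.substitute(event)
        start, seen = self.windows.get(key, (now, 0))
        if now - start >= self.period:
            start, seen = now, 0  # window elapsed: start a new one
        seen += 1
        self.windows[key] = (start, seen)
        if seen <= self.allow:
            return event
        return None if self.drop else {**event, "suppress": 1}

dns = {"ip": "10.0.0.1", "port": 53}
suppressor = Suppressor("${ip}:${port}", allow=1, period=10.0)
```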

Tenzir

Tenzir has a deduplicate operator.

Controls:
- Extractors: a list of field names that uniquely identify the event ("key expression").
- Limit: the number of events to emit per unique key.
- Timeout: the time that needs to pass until a suppressed event is no longer considered a duplicate ("suppression period").
- Distance: the number of events in sequence since the last occurrence of a unique event.
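For comparison, here is a Python sketch of deduplication driven by the controls listed above: a key built from several fields, a per-key limit, and an optional timeout and distance that reset a key's count. We assume that timeout and distance are measured from the key's most recent occurrence; the deduplicate operator reference is the authority on exact semantics. The class name is hypothetical.

```python
from typing import Optional

class Deduplicator:
    """Per-key limit with optional time-based and count-based resets."""

    def __init__(self, fields: list, limit: int = 1,
                 timeout: Optional[float] = None,
                 distance: Optional[int] = None) -> None:
        self.fields = fields      # "extractors"
        self.limit = limit
        self.timeout = timeout    # seconds since the key was last seen
        self.distance = distance  # events since the key was last seen
        self.index = 0            # position of the current event in the stream
        self.seen: dict = {}      # key -> (count, last time, last index)

    def process(self, event: dict, now: float) -> Optional[dict]:
        key = tuple(event.get(f) for f in self.fields)
        self.index += 1
        count, last_time, last_index = self.seen.get(key, (0, now, self.index))
        expired = (
            (self.timeout is not None and now - last_time > self.timeout)
            or (self.distance is not None and self.index - last_index > self.distance)
        )
        if expired:
            count = 0
        count += 1
        self.seen[key] = (count, now, self.index)
        return event if count <= self.limit else None

flow = {"src": "a", "dst": "b"}
by_time = Deduplicator(["src", "dst"], limit=1, timeout=10.0)
```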

Enrichment

Cribl

Lookups are tables usable for enrichment with the lookup function.
Lookup files can be CSV or GeoIP databases in MMDB format.
To pick up changes, lookups must be refreshed periodically by setting a reload interval, which checks the underlying file for changes.
For frequently changing data, Cribl recommends the Redis function.
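A minimal sketch of the reload-interval idea in Python: a CSV lookup keyed on one column that, at most once per interval, checks the file's modification time and reloads on change. The class, its parameters, and the merge-all-columns enrichment are hypothetical simplifications, not Cribl's implementation.

```python
import csv
import os
import time
from typing import Callable

class ReloadingLookup:
    """CSV lookup table that re-reads its file when it changes."""

    def __init__(self, path: str, key_field: str, reload_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.path = path
        self.key_field = key_field
        self.reload_interval = reload_interval
        self.clock = clock
        self.mtime = None
        self.last_check = None
        self.table: dict = {}
        self._maybe_reload(force=True)

    def _maybe_reload(self, force: bool = False) -> None:
        now = self.clock()
        if not force and now - self.last_check < self.reload_interval:
            return  # checked recently enough
        self.last_check = now
        mtime = os.stat(self.path).st_mtime_ns
        if force or mtime != self.mtime:
            with open(self.path, newline="") as f:
                self.table = {row[self.key_field]: row for row in csv.DictReader(f)}
            self.mtime = mtime

    def enrich(self, event: dict, field: str) -> dict:
        """Merge the matching row's columns into the event, if any."""
        self._maybe_reload()
        row = self.table.get(event.get(field))
        return {**event, **row} if row else event
```

Between checks, lookups are served from memory, so a changed file only becomes visible after the next interval elapses.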

Tenzir

Contexts are stateful objects usable for enrichment with the context::enrich operator.
There exist several context types, such as lookup tables, Bloom filters, GeoIP databases, or user-written C++ plugins.
Contexts are neither static nor limited to CSV or MMDB files; you can add data dynamically from any other pipeline using the context::* management operators. That is, you can use all existing operators to get data in and then use it to update a context.
When Tenzir lookup tables have CIDR subnets as key, you can perform an enrichment with single IP addresses (using a longest-prefix match). This comes in handy for enriching with a network inventory.
Tenzir lookup tables support expiration of entries with per-key timeouts. This makes it possible to automatically expire no-longer-relevant entries, e.g., stale observables.
Tenzir lookup tables support aggregation functions as values so that you can easily build passive DNS or an asset inventory by extracting suitable information from events.
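The longest-prefix match and per-key expiration described above can be illustrated with Python's standard `ipaddress` module. The sketch scans all entries linearly for clarity, whereas a production implementation would typically use a prefix trie; the class, its methods, and the `ttl` parameter are hypothetical, and aggregation-valued entries are not modeled.

```python
import ipaddress
from typing import Any, Optional

class SubnetTable:
    """Lookup table keyed by CIDR subnets with optional per-key expiration."""

    def __init__(self) -> None:
        # network -> (value, absolute expiry time or None)
        self.entries: dict = {}

    def update(self, subnet: str, value: Any, now: float = 0.0,
               ttl: Optional[float] = None) -> None:
        net = ipaddress.ip_network(subnet)
        self.entries[net] = (value, None if ttl is None else now + ttl)

    def lookup(self, ip: str, now: float = 0.0) -> Any:
        """Return the value of the most specific live subnet containing ip."""
        addr = ipaddress.ip_address(ip)
        best = None
        for net, (value, expires) in self.entries.items():
            if expires is not None and now >= expires:
                continue  # expired entries are ignored
            if addr.version == net.version and addr in net:
                if best is None or net.prefixlen > best[0].prefixlen:
                    best = (net, value)
        return None if best is None else best[1]

inventory = SubnetTable()
inventory.update("10.0.0.0/8", {"site": "corp"})
inventory.update("10.1.0.0/16", {"site": "lab"}, now=0.0, ttl=60.0)
```

Once the `/16` entry expires, addresses in it fall back to the broader `/8` entry.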

Packs vs. Packages

Cribl

Packs bundle configurations and workflows for easy deployment.
Packs can include routes, pipelines, functions, sample data, and knowledge objects (e.g., lookups, parsers, schemas).
Cribl hosts various packs at the Packs Dispensary.

Tenzir

A library is a set of packages.
Packages can include pipelines and contexts.
Tenzir maintains an open source Community Library on GitHub.
The Professional Edition and Enterprise Edition support managing custom libraries.
