Tenzir vs. Cribl

We get a lot of questions about Cribl from our users:

  • How do Tenzir pipelines differ?
  • What is the equivalent of a Cribl source and destination?
  • Does Tenzir have routes?
  • How does Tenzir break events?
  • Does Tenzir have packs?

To answer all these questions and quench the thirst of your inquisitive minds, we put together this side-by-side comparison of Cribl and Tenzir.

Product

Cribl

  • Cribl has several products:
    • Cribl Stream: runs pipelines that process data in motion using a JavaScript-based pipeline engine.
    • Cribl Edge: agent to collect data for forwarding to other Cribl products.
    • Cribl Search: cloud-based federated search over remote data sources. Microsoft's Kusto Query Language (KQL) is the pipeline language for running queries over data at rest.
    • Cribl Lake: a data lake running on top of public cloud providers.
  • Cribl's product suite is closed source.

Tenzir

  • Tenzir has a single, unified product. The Tenzir Query Language (TQL) is a unified language to process historical and streaming data. Users deploy nodes that can be managed through the platform at app.tenzir.com.
  • Tenzir is an open-core product, with an open-source project and a commercial platform for enterprise needs.

Architecture

Deployment

Cribl

  • Concepts:
    • Leader Node: a Cribl Stream instance in leader mode to manage configurations and watch Worker Nodes.
    • Worker Node: a Cribl Stream instance in worker mode, managed by a leader node.
    • Worker Group: a collection of worker nodes with the same configuration.
    • Mapping Ruleset: maps nodes to worker groups.
  • The Enterprise Edition supports on-prem hosting of leader and worker instances.

Tenzir

  • Concepts:
    • Node: manages pipelines and optional storage.
    • Platform: centrally manages nodes.
  • You deploy nodes in your infrastructure.
  • Users manage nodes and pipelines through the platform.
  • Nodes connect to the platform on startup.
  • Nodes can run in the cloud and on premises.
  • Tenzir hosts an instance of the platform at app.tenzir.com for the Community Edition, Professional Edition, and Enterprise Edition.
  • The Sovereign Edition allows for an on-premise, air-gapped deployment of the platform.

Pipelines

Cribl

Cribl Stream has the following pipeline concepts:

Tenzir

  • Everything in Tenzir is a pipeline that consists of one or more operators.
  • A pipeline operator can be an input, a transformation, or an output.
  • Tenzir will soon feature Sources and Destinations as concepts on top of pipeline operators.

Functions vs. Operators

Cribl

  • A pipeline in Cribl Stream consists of a series of functions.
  • A "pipeline" in Cribl Search consists of a dataset followed by one or more operators.

Tenzir

  • Tenzir does not differentiate between streaming and historical search pipelines. To run a historical query, simply use the export input operator.
  • Tenzir operators are typed, supporting both unstructured data (bytes) and structured data (events), as well as conversions between the two types.
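
To make the idea of typed operators more concrete, here is a minimal Python sketch (not TQL, and not how Tenzir implements type checking). It assumes each operator declares what it consumes and produces, and it rejects pipelines where neighbors do not line up. The `Operator` class and the operator names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operator:
    name: str                 # hypothetical operator name, for illustration only
    consumes: Optional[str]   # "bytes", "events", or None for an input
    produces: Optional[str]   # "bytes", "events", or None for an output


def check(pipeline: list[Operator]) -> bool:
    """Return True if every operator's output type matches its successor's input type."""
    for left, right in zip(pipeline, pipeline[1:]):
        if left.produces != right.consumes:
            raise TypeError(
                f"{left.name} produces {left.produces}, "
                f"but {right.name} consumes {right.consumes}"
            )
    return True


load = Operator("load", None, "bytes")           # input: produces bytes
parse = Operator("parse", "bytes", "events")     # bytes-to-events conversion
keep = Operator("filter", "events", "events")    # events-to-events transformation
emit = Operator("print", "events", "bytes")      # events-to-bytes conversion
save = Operator("save", "bytes", None)           # output: consumes bytes

ok = check([load, parse, keep, emit, save])
```

In this sketch, `check([load, keep])` raises a `TypeError` because a bytes producer is followed directly by an events consumer without a conversion in between.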

Routing

Cribl

  • Cribl Stream's Routes are sequential filters that determine the pipelines events should be delivered to.

Tenzir

  • Tenzir uses a publish/subscribe model to support various event forwarding patterns.
  • You can re-implement Cribl Stream Routes using a combination of the publish, subscribe and where operators.
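
As a conceptual illustration only, the following Python sketch models how a shared topic plus a per-subscriber filter can reproduce route-like delivery. It is neither TQL nor Cribl configuration; the `Bus` class, the `ingest` topic, and the event fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Event = dict


class Bus:
    """Hypothetical in-memory stand-in for a set of publish/subscribe topics."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, where: Callable[[Event], bool], sink: list) -> None:
        # Each subscriber pairs a filter predicate with the place its events go.
        self._subscribers[topic].append((where, sink))

    def publish(self, topic: str, event: Event) -> None:
        # Every subscriber sees every event and keeps only what its filter accepts.
        for where, sink in self._subscribers[topic]:
            if where(event):
                sink.append(event)


bus = Bus()
dns_events: list = []
auth_events: list = []
bus.subscribe("ingest", lambda e: e.get("proto") == "dns", dns_events)
bus.subscribe("ingest", lambda e: e.get("service") == "sshd", auth_events)

for event in [
    {"proto": "dns", "query": "example.org"},
    {"service": "sshd", "user": "alice"},
    {"proto": "http", "path": "/"},
]:
    bus.publish("ingest", event)
```

Each subscriber plays the role of one route: the filter decides which events reach the downstream pipeline, and events that match no filter are simply not delivered anywhere.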

Installation

Provisioning

Cribl

  • Cribl Stream runs on multiple Linux distributions.
  • A Docker deployment is also an option.
  • Cribl Stream offers a sizing calculator to estimate CPU and RAM requirements.
  • A typical deployment consists of one or more worker processes per machine.
  • To scale horizontally, worker groups can spawn additional workers with the same configuration.

Tenzir

  • Tenzir nodes run natively on any Linux distribution as a static binary.
  • A Docker deployment is also an option. The platform generates a Docker Compose file for your node.
  • Tenzir offers a node sizing calculator to estimate CPU cores, RAM, and storage requirements.
  • A typical deployment consists of exactly one Tenzir node process per machine.
  • To scale horizontally, users can spawn multiple nodes, each of which runs a subset of pipelines.
  • To scale vertically, a node uses a thread pool to adapt to the number of available CPU cores.

Executables

Cribl

  • The cribl binary starts/stops a Cribl Stream instance.
  • By default, the UI listens on port 9000.
  • By default, an HTTP In source listens on port 10080.

Tenzir

  • The tenzir executable runs a single pipeline.
  • The tenzir-node executable spins up a node.
  • If a platform configuration is present, the node attempts to connect to the platform so that you can manage it there.
  • By default, a node listens on TCP port 5158 for incoming Tenzir connections.
  • There is no default HTTP ingest source; you need to deploy a pipeline for that.

Data Model

Cribl

  • An event is a collection of key-value pairs.
  • Events are JSON objects.
  • Fields starting with a double-underscore are known as internal fields that sources can add to events, e.g., Syslog adds an __srcIpPort field. Internal fields are used within Cribl Stream and are not passed to destinations.
  • Cribl allows users to write JavaScript to process events.
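
The handling of internal fields can be pictured as a filter applied just before delivery. The snippet below is a plain Python sketch of that idea, not Cribl's implementation; the `strip_internal` helper and the sample event values are made up for illustration.

```python
def strip_internal(event: dict) -> dict:
    """Return a copy of the event without fields that start with a double underscore."""
    return {key: value for key, value in event.items() if not key.startswith("__")}


event = {
    "_raw": "<13>May  1 12:00:00 web01 sshd: session opened",
    "host": "web01",
    "__srcIpPort": "10.0.0.5:514",  # internal field, as added by a Syslog source
}
outgoing = strip_internal(event)
```

Note that single-underscore fields such as `_raw` survive in this sketch; only the double-underscore prefix marks a field as internal.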

Tenzir

Events:

  • An event is a semi-structured record, similar to a JSON object but with additional data types.
  • Tenzir's type system is a superset of JSON, providing additional first-class types, such as ip, subnet, time, and duration.
  • Events have a schema that includes the field names and types.
  • Internally, Tenzir represents events as Apache Arrow record batches, which you can think of as data frames.
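
To illustrate why first-class types beyond JSON matter, the following Python sketch parses a JSON object and then lifts three of its values into richer types. It uses Python's standard library as a stand-in for Tenzir's ip, subnet, time, and duration types; the field names are hypothetical and the conversion step is written by hand here purely for illustration.

```python
import ipaddress
import json
from datetime import datetime, timedelta

raw = '{"src": "10.1.2.3", "ts": "2024-05-01T12:00:00+00:00", "rtt": 0.25}'
plain = json.loads(raw)  # JSON itself only yields strings and numbers here

typed = {
    "src": ipaddress.ip_address(plain["src"]),  # stand-in for an ip value
    "ts": datetime.fromisoformat(plain["ts"]),  # stand-in for a time value
    "rtt": timedelta(seconds=plain["rtt"]),     # stand-in for a duration value
}

# Typed values support domain operations that plain strings do not.
in_private_net = typed["src"] in ipaddress.ip_network("10.0.0.0/8")  # subnet membership
deadline = typed["ts"] + typed["rtt"]                                # time + duration
```

With plain JSON strings, the subnet membership test and the time arithmetic would each require ad hoc string parsing at every use site.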

Bytes:

  • In addition to events, Tenzir pipelines can also transport raw bytes.
  • The operator decides whether it supports bytes, events, or both.
  • All Tenzir connectors produce or consume byte streams; formats parse or print byte streams.

See also the section on dataflow below.

Dataflow

Cribl

  • A source generates bytes or events.
  • A custom command is an optional customization point in the form of an executable that takes bytes from the source on stdin and forwards the command output on stdout downstream.
  • For sources that generate bytes, an event breaker splits bytes into individual events.
  • Fields enable enrichment on a key-value basis where the key matches a field in an event and the value is a JavaScript expression.
  • A parser is a configuration of the parser function, which extracts fields from events. It supports JSON, CSV, key-value pairs, Grok, and regular expressions, among others.
  • The _raw field catches all events that cannot be parsed.
  • Cribl Stream sets the event time in the _time field and uses the current wall-clock time if there is no suitable field.

Tenzir

  • An input operator produces data.
  • An output operator consumes data.
  • A transformation operator consumes and produces data. Events-to-events transformations make it easy to shape data.
  • A parser is a bytes-to-events operator. Parsers are equivalent to event breakers. For example, breaking at a newline is equivalent to applying the read_lines parser.
  • A printer is an events-to-bytes operator.
  • The shell operator is a bytes-to-bytes transformation that can be placed freely in a pipeline where the operator types match. Unlike Cribl's custom commands, there are no restrictions on where to place this operator in a pipeline.
  • Similarly, the python operator is an events-to-events transformation that can be placed freely in a pipeline where the upstream and downstream operator types match. The operator takes inline Python or a path to a file as argument, with the current event being represented by the variable self.
  • Parse errors generate a diagnostic that can be processed separately with the diagnostics input operator.
  • There is no special _time field in Tenzir.
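
The relationship between event breaking and parsing can be sketched in a few lines of Python. This is a simplified model, not the implementation of read_lines: `break_lines` turns a chunked byte stream into one event per line (bytes-to-events), and `print_lines` does the reverse (events-to-bytes). Both function names and the `line` field name are assumptions made for this example.

```python
from typing import Iterable, Iterator


def break_lines(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Bytes-to-events: emit one event per newline-terminated line."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield {"line": line.decode("utf-8")}
    if buffer:
        # Flush a trailing line that has no final newline.
        yield {"line": buffer.decode("utf-8")}


def print_lines(events: Iterable[dict]) -> Iterator[bytes]:
    """Events-to-bytes: render each event as one newline-terminated line."""
    for event in events:
        yield event["line"].encode("utf-8") + b"\n"


# Chunk boundaries do not have to coincide with line boundaries.
chunks = [b"alpha\nbr", b"avo\ncharlie"]
events = list(break_lines(chunks))
rendered = b"".join(print_lines(events))
```

The buffering is the essential part: a breaker has to hold back incomplete input until the next chunk arrives, which is why it sits between the byte source and the first event-level operator.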

Use Cases

This section compares how Cribl and Tenzir handle common use cases that we encounter.

Unrolling Arrays

Cribl

  • The unroll function unrolls/explodes an array of objects into individual events.
  • The unroll function can only operate on the string value of an event that has a _raw field.

Tenzir

  • The unroll operator performs the same operation as Cribl's unroll function.
  • The unroll operator can operate on any array in an event.
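
What unrolling does to an event can be shown with a short Python sketch. It is illustrative only and makes its own choices for edge cases (an empty array yields no events here), which may differ from either product. The `unroll` helper and the sample fields are hypothetical.

```python
import copy


def unroll(event: dict, field: str) -> list:
    """Emit one copy of the event per array element, with the array replaced by that element."""
    result = []
    for item in event[field]:
        clone = copy.deepcopy(event)  # keep the surrounding fields intact
        clone[field] = item
        result.append(clone)
    return result


event = {"host": "web01", "answers": [{"ip": "192.0.2.1"}, {"ip": "192.0.2.2"}]}
unrolled = unroll(event, "answers")
```

Here `unrolled` contains two events that share the `host` field, each carrying a single record under `answers` instead of the original array.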

Deduplication

Deduplication means removing duplicate events from a stream. Check out our blog post on deduplication that discusses this topic in more depth.

Cribl

Cribl Stream has a Suppress function for deduplicating events.

  • Controls:
    • Key expression: a string that describes a unique key for deduplicating, e.g., ${ip}:${port} refers to fields ip and port.
    • Number to allow: number of events per time period.
    • Suppression period: the interval to suppress events for after the maximum number of allowed events has been seen.
    • Drop suppressed events: flag to control whether events get dropped or enriched with a suppress=1 field.
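
The interplay of these four controls can be sketched in Python. This is a simplified model under stated assumptions, not Cribl's implementation: it uses a fixed window that starts with the first event seen for a key, and the actual windowing in Cribl Stream may differ. The `Suppressor` class and the explicit `now` parameter are hypothetical.

```python
from typing import Callable, Optional


class Suppressor:
    def __init__(self, key: Callable[[dict], str], allow: int, period: float, drop: bool = True):
        self.key = key        # key expression
        self.allow = allow    # number to allow per period
        self.period = period  # suppression period, in seconds
        self.drop = drop      # drop suppressed events, or tag them instead
        self._windows: dict = {}  # key -> (window start, events seen in window)

    def process(self, event: dict, now: float) -> Optional[dict]:
        k = self.key(event)
        start, count = self._windows.get(k, (now, 0))
        if now - start >= self.period:
            start, count = now, 0  # the window elapsed, so start a new one
        count += 1
        self._windows[k] = (start, count)
        if count <= self.allow:
            return event
        if self.drop:
            return None
        return {**event, "suppress": 1}


suppressor = Suppressor(lambda e: f"{e['ip']}:{e['port']}", allow=2, period=30, drop=False)
event = {"ip": "10.0.0.1", "port": 443}
results = [suppressor.process(event, now) for now in (0, 1, 2, 31)]
flags = [result.get("suppress") for result in results]
```

With two events allowed per 30 seconds, the third event is tagged with `suppress=1`, and the event at second 31 opens a new window and passes through untagged.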

Cribl Search has a dedup operator.

Tenzir

Tenzir has a deduplicate operator.

  • Controls:
    • Extractors: a list of field names that uniquely identify the event ("key expression").
    • Limit: the number of events to emit per unique key.
    • Timeout: the time that needs to pass until a suppressed event is no longer considered a duplicate ("suppression period").
    • Distance: the number of events in sequence since the last occurrence of a unique event.
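
The following Python sketch shows one way the limit, timeout, and distance controls could interact. It is a model for illustration, not the deduplicate operator's implementation. As assumptions of this sketch, the timeout is measured from the first event emitted for a key, and the distance is measured from the most recent occurrence of that key. The `Deduplicator` class and its parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class _Entry:
    count: int         # events seen for this key since it was last reset
    first_time: float  # time of the first event after the last reset
    last_index: int    # stream position of the most recent occurrence


class Deduplicator:
    def __init__(self, keys, limit=1, timeout=None, distance=None):
        self.keys = keys
        self.limit = limit
        self.timeout = timeout
        self.distance = distance
        self._state: dict = {}
        self._index = 0  # position of the current event in the stream

    def process(self, event: dict, now: float) -> Optional[dict]:
        index = self._index
        self._index += 1
        key = tuple(event.get(name) for name in self.keys)
        entry = self._state.get(key)
        expired = (
            entry is None
            or (self.timeout is not None and now - entry.first_time > self.timeout)
            or (self.distance is not None and index - entry.last_index > self.distance)
        )
        if expired:
            entry = _Entry(count=0, first_time=now, last_index=index)
            self._state[key] = entry
        entry.count += 1
        entry.last_index = index
        return event if entry.count <= self.limit else None


by_time = Deduplicator(["ip"], limit=1, timeout=10)
stream = [("a", 0), ("a", 5), ("b", 6), ("a", 11)]
emitted_by_time = [by_time.process({"ip": ip}, now) is not None for ip, now in stream]

by_distance = Deduplicator(["ip"], distance=1)
emitted_by_distance = [by_distance.process({"ip": ip}, 0) is not None for ip in "aaba"]
```

In the first run, the repeat of `a` at second 5 is suppressed and the one at second 11 passes because the timeout elapsed. In the second run, only the directly adjacent repeat of `a` is suppressed, since the later one is more than one event away from the previous occurrence.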

Enrichment

Cribl

  • Lookups are tables usable for enrichment with the lookup function.
  • Lookup files can be CSV or GeoIP databases in MMDB format.
  • Lookup state that changes must be refreshed periodically by configuring a reload interval, which checks the underlying file for changes.
  • For frequently changing data, Cribl recommends the Redis function.

Tenzir

  • Contexts are stateful objects usable for enrichment with the context::enrich operator.
  • There exist several context types, such as lookup tables, Bloom filters, GeoIP databases, or user-written C++ plugins.
  • Contexts are neither static nor limited to CSV or MMDB files; you can add data dynamically from any other pipeline, using the context::* management operators. That is, you can use all existing operators to get data in and then use it to update a context.
  • When Tenzir lookup tables have CIDR subnets as keys, you can perform an enrichment with single IP addresses (using a longest-prefix match). This comes in handy for enriching with a network inventory.
  • Tenzir lookup tables support expiration of entries with per-key timeouts. This makes it possible to automatically expire no-longer-relevant entries, e.g., stale observables.
  • Tenzir lookup tables support aggregation functions as values so that you can easily build passive DNS or an asset inventory by extracting suitable information from events.
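
Longest-prefix matching against subnet keys can be sketched with Python's standard `ipaddress` module. This is a conceptual model, not Tenzir's lookup table: it scans all entries linearly, whereas a production implementation would likely use a prefix tree. The `SubnetTable` class, the `enrich` helper, and the inventory values are hypothetical.

```python
import ipaddress
from typing import Optional


class SubnetTable:
    """Hypothetical lookup table keyed by CIDR subnets."""

    def __init__(self) -> None:
        self._entries: dict = {}

    def add(self, cidr: str, value: dict) -> None:
        self._entries[ipaddress.ip_network(cidr)] = value

    def lookup(self, address: str) -> Optional[dict]:
        ip = ipaddress.ip_address(address)
        matches = [
            net for net in self._entries
            if net.version == ip.version and ip in net
        ]
        if not matches:
            return None
        # The most specific subnet (longest prefix) wins.
        best = max(matches, key=lambda net: net.prefixlen)
        return self._entries[best]


def enrich(event: dict, field: str, table: SubnetTable) -> dict:
    """Attach the matching inventory entry (or None) under a new key."""
    return {**event, "inventory": table.lookup(event[field])}


inventory = SubnetTable()
inventory.add("10.0.0.0/8", {"zone": "corp"})
inventory.add("10.1.0.0/16", {"zone": "lab"})

lab_event = enrich({"src": "10.1.2.3"}, "src", inventory)
corp_event = enrich({"src": "10.2.0.1"}, "src", inventory)
unknown_event = enrich({"src": "192.168.0.1"}, "src", inventory)
```

The address 10.1.2.3 falls into both subnets, and the /16 entry wins because it is more specific; 10.2.0.1 only matches the /8 entry.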

Packs vs. Packages

Cribl

  • Packs bundle configurations and workflows for easy deployment.
  • Packs can include routes, pipelines, functions, sample data, and knowledge objects (e.g., lookups, parsers, schemas).
  • Cribl hosts various packs at the Packs Dispensary.

Tenzir