Version: v4.24

Tenzir vs. Cribl

We get a lot of questions about Cribl from our users: How do Tenzir pipelines differ? What is the equivalent of a Cribl source and a sink? Does Tenzir have routes? How does Tenzir break events? Does Tenzir have packs? To answer all these questions and quench the thirst of your inquisitive minds, we put together this side-by-side comparison of Cribl and Tenzir.

Product

Cribl

  • Cribl has several products:
    • Cribl Stream: runs pipelines that process data in motion using a JavaScript-based pipeline engine.
    • Cribl Edge: agent to collect data for forwarding to other Cribl products.
    • Cribl Search: cloud-based federated search over remote data sources. Microsoft's Kusto Query Language (KQL) is the pipeline language for running queries over data at rest.
    • Cribl Lake: a data lake running on top of public cloud providers.
  • Cribl's product suite is closed source.

Tenzir

Architecture

Deployment

Cribl

  • Concepts:
    • Leader Node: a Cribl Stream instance in leader mode to manage configurations and watch Worker Nodes.
    • Worker Node: a Cribl Stream instance in worker mode, managed by a leader node.
    • Worker Group: a collection of worker nodes with the same configuration.
    • Mapping Ruleset: maps nodes to worker groups.
  • The Enterprise Edition supports on-prem hosting of leader and worker instances.

Tenzir

  • Concepts:
    • Node: manages pipelines and optional storage.
    • Platform: centrally manages nodes.
  • You deploy nodes in your infrastructure.
  • Users manage nodes and pipelines through the platform.
  • Nodes connect to the platform on startup.
  • Nodes can run in the cloud and on premises.
  • Tenzir hosts an instance of the platform at app.tenzir.com for the Community Edition, Professional Edition, and Enterprise Edition.
  • The Sovereign Edition allows for an on-premises, air-gapped deployment of the platform.

Pipelines

Cribl

Cribl Stream has the following pipeline concepts:

Tenzir

  • Everything in Tenzir is a pipeline that consists of one or more operators.
  • Tenzir does not have separate abstractions for Sources and Destinations. Rather, operators can be a source (no input, only output), a transformation (input and output), or a sink (only input, no output).
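
The three operator kinds can be illustrated with a minimal Python sketch. This is a conceptual model only, not Tenzir's implementation or syntax; the function names and the sample events are made up for illustration.

```python
from typing import Iterable, Iterator

Event = dict


def source() -> Iterator[Event]:
    """Source: no input, only output."""
    yield {"src_ip": "10.0.0.1", "bytes": 120}
    yield {"src_ip": "10.0.0.2", "bytes": 4096}


def keep_large(events: Iterable[Event]) -> Iterator[Event]:
    """Transformation: input and output."""
    for event in events:
        if event["bytes"] > 1000:
            yield event


def collect(events: Iterable[Event], out: list) -> None:
    """Sink: only input, no output."""
    out.extend(events)


# A closed pipeline starts with a source and ends with a sink.
received: list = []
collect(keep_large(source()), received)
```

In this model a pipeline is just function composition, which is why no separate Source or Destination abstraction is needed.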

Functions vs. Operators

Cribl

  • A pipeline in Cribl Stream consists of a series of functions.
  • A "pipeline" in Cribl Search consists of a dataset followed by one or more operators.

Tenzir

Routing

Cribl

  • Cribl Stream's Routes are sequential filters that determine the pipelines events should be delivered to.

Tenzir

  • Tenzir uses a publish/subscribe model to support various event forwarding patterns.
  • You can re-implement Cribl Stream Routes using a combination of the publish, subscribe and where operators.
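
As a rough illustration of the idea, the following Python sketch models topic-based publish/subscribe with a per-subscriber filter. The Broker class, the topic name ingest, and the event fields are hypothetical; the sketch shows the forwarding pattern, not Tenzir's operators or their syntax.

```python
from collections import defaultdict
from typing import Callable

Event = dict


class Broker:
    """Toy in-memory stand-in for topic-based publish/subscribe."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, predicate: Callable[[Event], bool],
                  deliver: Callable[[Event], None]) -> None:
        # The predicate plays the role of a filter placed after a subscription.
        self._subscribers[topic].append((predicate, deliver))

    def publish(self, topic: str, event: Event) -> None:
        # Every subscriber of the topic sees the event and filters on its own.
        for predicate, deliver in self._subscribers[topic]:
            if predicate(event):
                deliver(event)


broker = Broker()
dns_events: list = []
auth_events: list = []
broker.subscribe("ingest", lambda e: e.get("proto") == "dns", dns_events.append)
broker.subscribe("ingest", lambda e: e.get("service") == "sshd", auth_events.append)

for event in [
    {"proto": "dns", "query": "example.org"},
    {"service": "sshd", "user": "root"},
    {"proto": "http"},
]:
    broker.publish("ingest", event)
```

Each subscriber decides independently what it keeps, so adding a new downstream consumer does not require touching the publisher or the other subscribers.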

Installation

Provisioning

Cribl

  • Cribl Stream runs on multiple Linux distributions.
  • A Docker deployment is also an option.
  • Cribl Stream offers a sizing calculator to estimate CPU and RAM requirements.
  • A typical deployment consists of one or more worker processes per machine.
  • To scale horizontally, worker groups can spawn additional workers with the same configuration.

Tenzir

  • Tenzir nodes run natively on any Linux distribution as a static binary.
  • A Docker deployment is also an option. The platform generates a Docker Compose file for your node.
  • Tenzir offers a node sizing calculator to estimate CPU cores, RAM, and storage requirements.
  • A typical deployment consists of exactly one Tenzir node process per machine.
  • To scale horizontally, users can spawn multiple nodes, each of which runs a subset of pipelines.
  • To scale vertically, a node uses a thread pool to adapt to the number of available CPU cores.

Executables

Cribl

  • The cribl binary starts/stops a Cribl Stream instance.
  • By default, the UI listens on port 9000.
  • By default, an HTTP In source listens on port 10080.

Tenzir

  • The tenzir executable runs a single pipeline.
  • The tenzir-node executable spawns a node.
  • If a platform configuration is present, the node attempts to connect to the platform so that you can manage it there.
  • By default, a node listens on TCP port 5158 for incoming Tenzir connections.
  • There is no default HTTP ingest source; you need to deploy a pipeline for that.

Data Model

Cribl

  • An event is a collection of key-value pairs.
  • Events are JSON objects.
  • Fields starting with a double-underscore are known as internal fields that sources can add to events, e.g., Syslog adds an __srcIpPort field. Internal fields are used within Cribl Stream and are not passed to destinations.
  • Cribl allows users to write JavaScript to process events.
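
The internal-field convention described above can be pictured with a short Python sketch. It only mirrors the stated rule (fields starting with a double underscore are not passed on); the helper name and the sample event are made up, and this is not how Cribl implements it.

```python
def strip_internal_fields(event: dict) -> dict:
    """Drop fields whose name starts with a double underscore."""
    return {key: value for key, value in event.items() if not key.startswith("__")}


event = {
    "_raw": "<34>Oct 11 22:14:15 host su: failed",
    "__srcIpPort": "192.0.2.7:514",  # internal field added by the source
    "host": "host",
}
outgoing = strip_internal_fields(event)
```

Note that single-underscore fields such as _raw are ordinary fields in this sketch and are kept.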

Tenzir

Events:

  • An event is a semi-structured record, similar to a JSON object but with additional data types.
  • Tenzir's type system is a superset of JSON, providing additional first-class types, such as ip, subnet, time, or duration.
  • Events have a schema that includes the field names and types.
  • Internally, Tenzir represents events as Apache Arrow record batches, which you can think of as data frames.
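
To see why first-class types matter, here is a small Python analogy that uses standard-library types as stand-ins for ip, subnet, time, and duration. The field names and values are invented, and the mapping to Python types is an assumption made for illustration, not a description of Tenzir internals.

```python
import ipaddress
import json
from datetime import datetime, timedelta, timezone

# An event with typed values instead of plain strings.
event = {
    "src": ipaddress.ip_address("10.1.2.3"),                 # ip
    "net": ipaddress.ip_network("10.1.0.0/16"),              # subnet
    "ts": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),  # time
    "rtt": timedelta(milliseconds=250),                      # duration
    "msg": "connection established",
}

# A schema pairs every field name with its type.
schema = {name: type(value).__name__ for name, value in event.items()}

# Typed values support native operations without re-parsing strings.
in_net = event["src"] in event["net"]
later = event["ts"] + event["rtt"]

# Plain JSON has no such types, so they degrade to strings on export.
as_json = json.dumps(event, default=str)
```

With JSON alone, the subnet membership test and the time arithmetic would first require parsing strings back into richer values.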

Bytes:

  • In addition to events, Tenzir pipelines can also transport raw bytes.
  • The operator decides whether it supports bytes, events, or both.
  • All Tenzir connectors produce or consume byte streams; formats parse or print byte streams.

See also the section on dataflow below.

Dataflow

Cribl

  • A source generates bytes or events.
  • A custom command is an optional customization point in the form of an executable that takes bytes from the source on stdin and forwards the command output on stdout downstream.
  • For sources that generate bytes, an event breaker splits bytes into individual events.
  • Fields enable enrichment on a key-value basis where the key matches a field in an event and the value is a JavaScript expression.
  • A parser is a configuration of the parser function, which extracts fields from events. It supports JSON, CSV, key-value pairs, Grok, regular expressions, among others.
  • The _raw field catches all events that cannot be parsed.
  • Cribl Stream sets the event time in the _time field and uses the current wall-clock time if there is no suitable field.

Tenzir

  • A source is an operator that only produces data. Source operators that use a loader, such as load and from, produce bytes.
  • A sink is an operator that only consumes data. Sink operators that use a saver, such as save and to, consume bytes.
  • A transformation is an operator that consumes and produces data. Numerous events-to-events transformations allow for shaping the data.
  • A parser converts bytes to events and is used in the read and parse operators. Parsers are equivalent to event breakers. For example, breaking at a newline is equivalent to applying the lines parser. Another event breaker is JSON Array, which lifts every single array element into a dedicated event. In Tenzir, this is a transformation of a list field, since an array (list in Tenzir) is already structured data. The yield operator implements this lifting, e.g., yield xs[] pulls the elements of array xs out as top-level events.
  • A printer converts events to bytes and is used in the write operator.
  • The shell operator is a bytes-to-bytes transformation that can be placed freely in a pipeline where the operator types match. Unlike Cribl's custom commands, there are no restrictions on where to place this operator in a pipeline.
  • Similarly, the python operator is an events-to-events transformation that can be placed freely in a pipeline where the operator types match. The operator takes inline Python or a path to a file as argument, with the current event being represented by the variable self.
  • The parse operator applies a parser to a single field of an event and is equivalent to the Cribl parser function.
  • Parse errors generate a diagnostic that can be processed separately with the diagnostics source operator.
  • There is no special _time field in Tenzir.
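
The bytes and events distinction can be sketched in Python as follows. The parser below breaks a chunked byte stream at newlines, and the printer turns events back into newline-delimited JSON bytes. The function names and the output field name line are assumptions of this sketch, not Tenzir's API.

```python
import json
from typing import Iterable, Iterator


def parse_lines(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Parser (bytes to events): break a byte stream at newlines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield {"line": line.decode()}
    if buffer:
        # Flush a trailing line that has no final newline.
        yield {"line": buffer.decode()}


def print_json(events: Iterable[dict]) -> Iterator[bytes]:
    """Printer (events to bytes): emit newline-delimited JSON."""
    for event in events:
        yield json.dumps(event).encode() + b"\n"


# Chunk boundaries do not have to coincide with event boundaries.
chunks = [b"alpha\nbe", b"ta\ngamma"]
events = list(parse_lines(chunks))
output = b"".join(print_json(events))
```

The parser has to buffer across chunk boundaries, which is the essence of event breaking: the loader hands over arbitrary byte chunks and the parser decides where one event ends and the next begins.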

Use Cases

This section compares how Cribl and Tenzir handle common use cases that we encounter.

Unrolling Arrays

Cribl

  • The unroll function unrolls/explodes an array of objects into individual events.
  • The unroll function can only operate on the string value of an event that has a _raw field.

Tenzir

  • The unroll operator performs the same operation as Cribl's unroll function.
  • The unroll operator can operate on any array in an event.
  • yield performs a similar operation: unroll xs and yield xs[] differ in that the yield operator strips all outer fields and makes the array elements the new top-level events.
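
The difference between the two is easiest to see on a concrete event. The Python functions below are a conceptual model of the described behavior, written for illustration; they are not the operators' implementation, and the sample event is made up.

```python
from typing import Iterator


def unroll(event: dict, field: str) -> Iterator[dict]:
    """Emit one copy of the event per list element, keeping the outer fields."""
    for element in event[field]:
        yield {**event, field: element}


def yield_elements(event: dict, field: str) -> Iterator[dict]:
    """Emit each list element as a new top-level event, dropping outer fields."""
    yield from event[field]


event = {"host": "web-1", "xs": [{"port": 80}, {"port": 443}]}
unrolled = list(unroll(event, "xs"))
yielded = list(yield_elements(event, "xs"))
```

Here unrolled keeps host on every output event and replaces xs with a single element, whereas yielded contains only the two inner records.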

Deduplication

Deduplication means removing duplicate events from a stream. Check out our blog post on deduplication that discusses this topic in more depth.

Cribl

Cribl Stream has a Suppress function for deduplicating events.

  • Controls:
    • Key expression: a string that describes a unique key for deduplicating, e.g., ${ip}:${port} refers to fields ip and port.
    • Number to allow: number of events per time period.
    • Suppression period: the interval to suppress events for after the maximum number of allowed events has been seen.
    • Drop suppressed events: flag to control whether events get dropped or enriched with a suppress=1 field.
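
One plausible reading of how these four controls interact is sketched below in Python. The class and its timing rules are assumptions derived only from the control descriptions above; Cribl's actual behavior may differ in the details.

```python
from typing import Callable, Optional


class Suppress:
    """Toy model of a suppress-style function; not Cribl's implementation."""

    def __init__(self, key: Callable[[dict], str], allow: int,
                 period: float, drop: bool = True) -> None:
        self.key = key
        self.allow = allow
        self.period = period
        self.drop = drop
        self._state: dict = {}  # key -> (events seen, suppress-until or None)

    def process(self, event: dict, now: float) -> Optional[dict]:
        k = self.key(event)
        seen, until = self._state.get(k, (0, None))
        if until is not None and now >= until:
            seen, until = 0, None  # suppression period is over
        if until is None:
            seen += 1
            if seen >= self.allow:
                until = now + self.period  # start suppressing this key
            self._state[k] = (seen, until)
            return event
        # Suppressed: either drop the event or tag it.
        return None if self.drop else {**event, "suppress": 1}


# Key expression equivalent to ${ip}:${port}.
key = lambda e: f"{e['ip']}:{e['port']}"
event = {"ip": "10.0.0.1", "port": 22}

dropper = Suppress(key, allow=2, period=30.0)
dropped = [dropper.process(event, now) for now in (0.0, 1.0, 2.0, 40.0)]

tagger = Suppress(key, allow=2, period=30.0, drop=False)
tagged = [tagger.process(event, now) for now in (0.0, 1.0, 2.0)]
```

With two events allowed, the third event arrives inside the suppression period and is dropped or tagged, while the event at time 40 passes again because the period has elapsed.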

Cribl Search has a dedup operator.

Tenzir

Tenzir has a deduplicate operator.

  • Controls:
    • Extractors: a list of field names that uniquely identify the event ("key expression").
    • Limit: the number of events to emit per unique key.
    • Timeout: The time that needs to pass until a suppressed event is no longer considered a duplicate. ("suppression period")
    • Distance: The number of events in sequence since the last occurrence of a unique event.
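
The interplay of these controls can be modeled with a short Python generator. This is a sketch under stated assumptions: events arrive as (timestamp, event) pairs, a key is forgotten once its last occurrence is further away than the distance or older than the timeout, and every occurrence refreshes both. The real operator may treat edge cases differently.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple


def deduplicate(events: Iterable[Tuple[float, dict]],
                key_fields: Sequence[str],
                limit: int = 1,
                timeout: Optional[float] = None,
                distance: Optional[int] = None) -> Iterator[dict]:
    """Emit at most `limit` events per key; forget keys that are far or old."""
    state: dict = {}  # key -> (count, index of last occurrence, time of last occurrence)
    for index, (now, event) in enumerate(events):
        key = tuple(event.get(field) for field in key_fields)
        entry = state.get(key)
        if entry is not None:
            _, last_index, last_seen = entry
            too_far = distance is not None and index - last_index > distance
            too_old = timeout is not None and now - last_seen > timeout
            if too_far or too_old:
                entry = None  # no longer considered a duplicate
        count = (entry[0] if entry else 0) + 1
        state[key] = (count, index, now)
        if count <= limit:
            yield event


a = {"ip": "10.0.0.1", "port": 53}
b = {"ip": "10.0.0.2", "port": 53}
stream = [(0.0, a), (5.0, a), (6.0, b), (20.0, a)]

plain = list(deduplicate(stream, ["ip", "port"]))
with_limit = list(deduplicate(stream, ["ip", "port"], limit=2))
with_timeout = list(deduplicate(stream, ["ip", "port"], timeout=10.0))
with_distance = list(deduplicate(stream, ["ip", "port"], distance=1))
```

Without extra controls only the first a and b survive. A timeout of 10 or a distance of 1 lets the last a through again, because its previous occurrence is 15 time units and two events back, respectively.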

Enrichment

Cribl

  • Lookups are tables usable for enrichment with the lookup function.
  • Lookup files can be CSV or GeoIP databases in MMDB format.
  • Lookup state that changes must be refreshed periodically by configuring a reload interval, which checks the underlying file for changes.
  • For frequently changing data, Cribl recommends the Redis function.

Tenzir

  • Contexts are stateful objects usable for enrichment with the enrich operator.
  • There exist several context types, such as lookup tables, Bloom filters, GeoIP databases, or user-written C++ plugins.
  • Contexts are neither static nor limited to CSV or MMDB files; you can add data dynamically from any other pipeline, using the context update operator. In other words, you can use all existing connectors and formats to feed data into a context.
  • When Tenzir lookup tables have CIDR subnets as key, you can perform an enrichment with single IP addresses (using a longest-prefix match). This comes in handy for enriching with a network inventory.
  • Tenzir lookup tables support expiration of entries with per-key timeouts. This makes it possible to automatically expire no-longer-relevant entries, e.g., stale observables. There are two types of timeouts: a create timeout that counts down after an entry is inserted into the table and an update timeout that resets when an entry gets accessed.
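
The subnet matching and the two timeout types can be sketched in Python with the standard ipaddress module. The LookupTable class, its method names, and the explicit now parameter are hypothetical and exist only to make the behavior testable; this is a conceptual model, not Tenzir's context implementation.

```python
import ipaddress
from typing import Any, Optional


class LookupTable:
    """Toy lookup table with subnet keys and per-key expiration."""

    def __init__(self) -> None:
        self._entries: dict = {}

    def update(self, subnet: str, value: Any, now: float,
               create_timeout: Optional[float] = None,
               update_timeout: Optional[float] = None) -> None:
        self._entries[ipaddress.ip_network(subnet)] = {
            "value": value,
            # Counts down from insertion and never resets.
            "create_deadline": None if create_timeout is None else now + create_timeout,
            # Resets whenever the entry is accessed.
            "update_timeout": update_timeout,
            "update_deadline": None if update_timeout is None else now + update_timeout,
        }

    @staticmethod
    def _expired(entry: dict, now: float) -> bool:
        deadlines = (entry["create_deadline"], entry["update_deadline"])
        return any(d is not None and now >= d for d in deadlines)

    def lookup(self, ip: str, now: float) -> Optional[Any]:
        address = ipaddress.ip_address(ip)
        self._entries = {n: e for n, e in self._entries.items()
                         if not self._expired(e, now)}
        matches = [n for n in self._entries if address in n]
        if not matches:
            return None
        best = max(matches, key=lambda n: n.prefixlen)  # longest prefix wins
        entry = self._entries[best]
        if entry["update_timeout"] is not None:
            entry["update_deadline"] = now + entry["update_timeout"]
        return entry["value"]


table = LookupTable()
table.update("10.0.0.0/8", "corp", now=0)
table.update("10.1.0.0/16", "lab", now=0, create_timeout=100)
table.update("192.168.0.0/24", "guest", now=0, update_timeout=10)

most_specific = table.lookup("10.1.2.3", now=1)
fallback = table.lookup("10.2.0.1", now=1)
guest_first = table.lookup("192.168.0.5", now=8)    # access resets the timer
guest_kept = table.lookup("192.168.0.5", now=15)    # still alive due to reset
guest_gone = table.lookup("192.168.0.5", now=40)    # not accessed in time
after_create_timeout = table.lookup("10.1.2.3", now=150)
```

The address 10.1.2.3 first matches the more specific /16 entry. Once that entry's create timeout has passed, the same lookup falls back to the enclosing /8, which is how a network inventory with overlapping ranges would behave in this model.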

Packs vs. Packages

Cribl

  • Packs bundle configurations and workflows for easy deployment.
  • Packs can include routes, pipelines, functions, sample data, and knowledge objects (e.g., lookups, parsers, schemas).
  • Cribl hosts various packs at the Packs Dispensary.

Tenzir

  • A library is a set of packages.
  • Packages can include pipelines and contexts.
  • Tenzir maintains an open source Community Library on GitHub.
  • The Professional Edition and Enterprise Edition support managing custom libraries.