Write a Package

This tutorial walks you through the creation of a package, which is a bundle of related pipelines and contexts. You can install packages with a few clicks from the Tenzir Library or deploy them as code.

Map the use case

The goal of a package is to enable a specific use case. In this tutorial, we want to make it easy to detect malicious certificates that are on the SSLBL from abuse.ch. At a glance, the idea is as follows:

  1. We get SHA1 hashes of SSL certificates from network monitor logs, such as Zeek or Suricata.
  2. We check each hash value against a lookup table.
  3. We generate a detection finding when we encounter a match.
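
Outside of Tenzir, the three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Tenzir implements it. The function name `detect` and the event field `sha1` are hypothetical, and the second sample hash is made up.

```python
def detect(events, blocklist):
    """Yield a finding for every event whose SHA1 hash is on the blocklist.

    `events` is an iterable of dicts with a (hypothetical) `sha1` field and
    `blocklist` maps SHA1 hashes to a listing reason.
    """
    for event in events:                       # step 1: hashes from logs
        reason = blocklist.get(event["sha1"])  # step 2: check the lookup table
        if reason is not None:                 # step 3: finding on a match
            yield {"sha1": event["sha1"], "reason": reason}


# Sample data: one hash from the SSLBL feed and one made-up benign hash.
blocklist = {"4da14224452c1fe61f46b1112c43ecfd9f322c82": "Rhadamanthys C&C"}
events = [
    {"sha1": "4da14224452c1fe61f46b1112c43ecfd9f322c82"},
    {"sha1": "0" * 40},
]
findings = list(detect(events, blocklist))
```

Only the first event produces a finding. In the package, the lookup table and the enrichment pipeline take over these roles.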

Given this idea, we have to map it to the building blocks we have in Tenzir: pipelines and contexts. We can model it as follows:

  1. A lookup table that includes a copy of the SSLBL data.
  2. A pipeline that synchronizes the SSLBL data with the lookup table.
  3. A pipeline to enrich network telemetry and generate detection findings.

Create the scaffold

We begin by creating a package.yaml file with the following metadata:

package.yaml
id: sslbl
name: SSLBL
author: Tenzir
author_icon: https://github.com/tenzir.png
package_icon: |
  https://raw.githubusercontent.com/tenzir/library/main/sslbl/package.svg
description: |
  The [SSLBL](https://sslbl.abuse.ch/) package makes available a lookup table
  with SHA1 hashes of blacklisted certificates that can be used when monitoring
  SSL/TLS certificate exchanges.

Add your pipelines, context, and examples

After providing the package metadata, we now do the heavy lifting of writing pipelines, contexts, and examples.

Add a context

First, we need a data structure to hold the SSLBL data so that we can use it inside the node. To this end, we define a lookup table in the contexts section:

package.yaml
contexts:
  sslbl:
    type: lookup-table
    description: |
      A table that is keyed by SHA1 hashes of SSL certificates on the SSL
      blocklist.

Let's figure out how to get data into the context manually. There's a CSV file at https://sslbl.abuse.ch/blacklist/sslblacklist.csv. Let's take a look at that in the browser:

################################################################
# abuse.ch SSLBL SSL Certificate Blacklist (SHA1 Fingerprints) #
# Last updated: 2024-09-02 06:17:52 UTC                        #
#                                                              #
# Terms Of Use: https://sslbl.abuse.ch/blacklist/              #
# For questions please contact sslbl [at] abuse.ch             #
################################################################
#
# Listingdate,SHA1,Listingreason
2024-09-02 06:17:52,4da14224452c1fe61f46b1112c43ecfd9f322c82,Rhadamanthys C&C
2024-09-02 06:16:33,b331526a0949f88ce218555edf1060c4a02de5a2,Rhadamanthys C&C
2024-09-02 05:19:10,b785d1a9e5784703b98a96698ad05dff5e07229a,DCRat C&C
2024-09-02 05:18:33,be4d4f077bdd90618367eb2ab88f9f3074ccb7aa,AsyncRAT C&C
2024-08-30 13:46:12,6bef207908bfad6b19580067ce770bc820d3d2ef,CobaltStrike C&C
2024-08-29 09:02:20,9a57379949f734b11f32bb8462db1f3fd8898722,DarkGate C&C
2024-08-29 09:01:39,47122f7861b7488f6711c999e40caaa5e560630b,Rhadamanthys C&C
2024-08-29 09:01:33,fb0e13607d045047c29ab44941f450950a349dd7,Rhadamanthys C&C

Okay, a simple CSV table with comments. We can write a pipeline for reading that:

// tql2
load_http "from https://sslbl.abuse.ch/blacklist/sslblacklist.csv"
read_csv comments=true, header="timestamp,SHA1,reason"
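
To make the parsing step concrete, here is a rough Python equivalent of what the pipeline above asks for: skip the `#` comment lines and apply the column names `timestamp`, `SHA1`, and `reason`. It is a sketch for illustration only and says nothing about how `read_csv` works internally. The helper name `read_sslbl_csv` is hypothetical.

```python
import csv
import io

# Column names, matching the header argument used in the pipeline.
HEADER = ["timestamp", "SHA1", "reason"]


def read_sslbl_csv(text):
    """Parse SSLBL CSV text, skipping '#' comment lines and blank lines."""
    lines = (
        line
        for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    return list(csv.DictReader(lines, fieldnames=HEADER))


# Two rows copied from the SSLBL excerpt, preceded by a comment line.
sample = """\
# Listingdate,SHA1,Listingreason
2024-09-02 06:17:52,4da14224452c1fe61f46b1112c43ecfd9f322c82,Rhadamanthys C&C
2024-09-02 05:19:10,b785d1a9e5784703b98a96698ad05dff5e07229a,DCRat C&C
"""
rows = read_sslbl_csv(sample)
```

Each row becomes a record with the three named fields, which is the shape the lookup table expects when keyed by `SHA1`.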

Now that we have onboarded the data into a pipeline, we just need to push it into the context by piping it to context update:

// tql2
load_http "from https://sslbl.abuse.ch/blacklist/sslblacklist.csv"
read_csv comments=true, header="timestamp,SHA1,reason"
legacy "context update sslbl --key=SHA1"

With context inspect sslbl we can list the table contents, keyed by SHA1 hash and ready for enrichment.

Keep the context synchronized

So far we did a one-shot download of the SSLBL data into our lookup table. But the fine folks at abuse.ch update the data regularly, and we want to keep our lookup table in sync with the latest version.

To this end, we do the data onboarding periodically with every:

// tql2
every 1h {
  load_http "from https://sslbl.abuse.ch/blacklist/sslblacklist.csv"
  read_csv comments=true, header="timestamp,SHA1,reason"
}
legacy "context update sslbl --key=SHA1"

Why not wrap the entire pipeline in every?

We could have wrapped the entire pipeline in every 1h {…} with the same effect. However, wrapping only the data acquisition is a more local change that turns a one-shot download into a continuous stream. The operators outside of every then run continuously. More importantly, they emit metrics continuously and can fail earlier, which makes the overall pipeline architecture more robust.
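
As a loose analogy, the split between a periodic source and a continuously running downstream stage can be sketched with Python generators. This is not how Tenzir schedules operators. The names `every`, `update_context`, and `fake_fetch` are hypothetical, and the `rounds` and `sleep` parameters exist only so the sketch terminates without waiting.

```python
import time


def every(interval_seconds, fetch, rounds=None, sleep=time.sleep):
    """Call `fetch` repeatedly and yield its rows as one continuous stream.

    `rounds` limits the number of fetches (None means run forever) and
    `sleep` is injectable so the sketch can run without actually waiting.
    """
    done = 0
    while rounds is None or done < rounds:
        yield from fetch()
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)


def update_context(rows, table):
    """Downstream stage: starts once and consumes rows as long as they arrive."""
    for row in rows:
        table[row["SHA1"]] = row
    return table


def fake_fetch():
    # Stand-in for the periodic download and CSV parsing.
    return [
        {
            "SHA1": "4da14224452c1fe61f46b1112c43ecfd9f322c82",
            "reason": "Rhadamanthys C&C",
        }
    ]


# Two simulated refresh rounds, one hour apart, without real sleeping.
table = update_context(
    every(3600, fake_fetch, rounds=2, sleep=lambda seconds: None), {}
)
```

Only the source restarts every round. The downstream stage is created once and keeps its state, here the table, across refreshes.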

Now let's copy that into the package definition:

package.yaml
pipelines:
  update-context:
    name: Update SSLBL Context
    description: |
      A pipeline that periodically refreshes the SSLBL lookup table.
    definition: |
      // tql2
      every 1h {
        load_http "from https://sslbl.abuse.ch/blacklist/sslblacklist.csv"
        read_csv comments=true, header="timestamp,SHA1,reason"
      }
      legacy "context update sslbl --key=SHA1"
    restart-on-error: 1 hour

From a developer's perspective, we now have a complete package consisting of a context and a pipeline that updates it. But from a user's perspective, what do we do now? This is where the examples section comes into play.

Entice with examples

After you put thought into implementing the intricate dataflows and thoughtfully configured pipelines and contexts, it's time to switch from the developer to the user persona. As a package developer, you want to make things reusable and easy! In the examples section you showcase how users can benefit from the work that the package does behind the scenes.

For our concrete scenario, we now want to use the SSLBL context that we set up and keep up to date. Our work lets users simply bring a SHA1 hash digest, and the context quickly tells them whether it is good or bad.

package.yaml
examples:
  - name: Enrich Suricata TLS logs with SSLBL domains
    description: |
      Enriches the certificate SHA1 fingerprint from Suricata TLS logs with the
      SSLBL data.
    definition: |
      // tql2
      subscribe "suricata"
      where @name == "suricata.tls"
      sha1 = tls.fingerprint.replace(":", "")
      legacy "enrich sha1 sslbl"
 
  - name: Display top-10 listing reasons
    description: |
      Shows a bar chart of the top-10 reasons why a certificate is in the
      dataset.
    definition: |
      context inspect sslbl
      | yield value
      | top reason
      | head
      | chart bar
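
The two examples above boil down to two small operations: stripping the colons from the fingerprint before the lookup, and counting listing reasons. The following Python sketch mirrors both for illustration. The helper names are hypothetical, and the sample entries are a made-up subset rather than real context contents.

```python
from collections import Counter


def normalize_fingerprint(fingerprint):
    """Strip the ':' separators, mirroring tls.fingerprint.replace(":", "")."""
    return fingerprint.replace(":", "")


def top_reasons(entries, n=10):
    """Return the `n` most frequent listing reasons as (reason, count) pairs."""
    return Counter(entry["reason"] for entry in entries).most_common(n)


# A colon-separated fingerprint that corresponds to the first SSLBL sample row.
sha1 = normalize_fingerprint(
    "4d:a1:42:24:45:2c:1f:e6:1f:46:b1:11:2c:43:ec:fd:9f:32:2c:82"
)

# Made-up table entries to demonstrate the ranking.
entries = [
    {"reason": "Rhadamanthys C&C"},
    {"reason": "DCRat C&C"},
    {"reason": "Rhadamanthys C&C"},
]
ranking = top_reasons(entries)
```

After normalization, the fingerprint has the same form as the keys in the lookup table, so it can be matched directly.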

Make the package configurable

After we have illustrated how the package works with examples, let's step back for a moment and assess how customizable the package should be:

  • Is there anything that might differ from user to user?
  • Do they have to bring their API key for the package to work?
  • Are timeouts highly subjective and specific to the local environment?

For all places where it's difficult to offer a one-size-fits-all assumption, we want the user to make a decision on how to proceed. These customization points are called inputs, and the correspondingly named section in the package definition specifies them.

In our case, we hard-coded the refresh interval that updates the SSLBL lookup table to exactly 1 hour. Maybe other users want to update just once a day? Or more often? This is a typical example where policy is user-specific and where we can turn an assumption into a decision. Let's define the input:

package.yaml
inputs:
  refresh-interval:
    name: Time between context updates
    description: |
      The duration between updates that fetch the SSLBL database via the API.
    default: 1 hour

By specifying a default value, we also make it easy for users to skip the decision making. In other words, defaults make a configuration knob optional.

After we've defined our configuration knobs, we now go over the pipeline definitions and replace the hard-coded constants with a placeholder:

package.yaml
pipelines:
  update-context:
    name: Update SSLBL Context
    description: |
      A pipeline that periodically refreshes the SSLBL lookup table.
    definition: |
      // tql2
      every {{ inputs.refresh-interval }} {
        load_http "from https://sslbl.abuse.ch/blacklist/sslblacklist.csv"
        read_csv comments=true, header="timestamp,SHA1,reason"
      }
      legacy "context update sslbl --key=SHA1"
    restart-on-error: 1 hour

Note how we simply replaced 1h with {{ inputs.refresh-interval }}.
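
To illustrate what the placeholder and its default mean, here is a minimal Python sketch of such a substitution. How Tenzir actually resolves inputs is not described here, so treat the `render` function, the regular expression, and the error behavior as assumptions made for this example.

```python
import re

# Matches placeholders such as {{ inputs.refresh-interval }}.
PLACEHOLDER = re.compile(r"\{\{\s*inputs\.([\w-]+)\s*\}\}")


def render(definition, inputs, user_values=None):
    """Replace {{ inputs.<name> }} with the user's value or the default."""
    user_values = user_values or {}

    def substitute(match):
        name = match.group(1)
        if name in user_values:
            return str(user_values[name])
        if "default" in inputs.get(name, {}):
            return str(inputs[name]["default"])
        raise KeyError(f"no value provided for input '{name}'")

    return PLACEHOLDER.sub(substitute, definition)


# The input from the package definition, reduced to its default.
inputs = {"refresh-interval": {"default": "1 hour"}}
definition = "every {{ inputs.refresh-interval }} {"

with_default = render(definition, inputs)
with_override = render(definition, inputs, {"refresh-interval": "1d"})
```

A user-provided value takes precedence, the default fills the gap otherwise, and an input without either is an error in this sketch.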

Test your package

When you think you're done, it's time to validate that things work as you expect. This means actually installing, configuring, and using the package.

Testing Framework

We currently don't offer a native testing framework for packages where you can provide tests and baselines alongside the package definition. But we love the idea, and if you do as well, please swing by our Community Discord and discuss it with us.

Installing a package given a package.yaml file is easiest with the package_add operator, since a package is just data:

package_add "/path/to/package.yaml"

This fails with the following error:

error: failed to add package
= note: with error: !! unspecified: named argument `header` does not exist
= note: failed to add package

Doh, we didn't substitute the template {{ inputs.refresh-interval }}. We can do this by passing one additional argument:

package_add "/path/to/package.yaml", inputs={
  "refresh-interval": 1h
}

The package should show up in the list of packages after installation:

// tql2
packages
where id == "sslbl"

The pipelines that came with the package have the package ID as a prefix:

// tql2
pipelines
where id.starts_with("sslbl")

And the context is also there:

// tql2
contexts
where id.starts_with("sslbl")

Since we checked that everything works as expected, we now remove our package with:

// tql2
package_remove "sslbl"

Share and contribute

🙌 Fantastic, you've just wrapped a use case and made it accessible to a broader audience! Now spread the word and share it with the broader community of Tenzir users for an even bigger impact. Here's how:

  1. Join our Discord server and showcase your package in the show-and-tell channel. We encourage you to seek feedback to make your package even better.
  2. File a pull request in the official Community Library GitHub repository. All packages in there will automatically show up in the Tenzir Library.
  3. Share it on social media and let us know. We'll amplify it! 🫶