Write a Package
This tutorial walks you through the creation of a package, which is a bundle of related pipelines and contexts. You can install packages with a few clicks from the Tenzir Library or deploy them as code.
Map the use case
The goal of a package is to enable a specific use case. In this tutorial, we want to make it easy to detect malicious certificates that are on the SSLBL from abuse.ch. At a glance, the idea is as follows:
- We get SHA1 hashes of SSL certificates from network monitor logs, such as Zeek or Suricata.
- We check each hash value against a lookup table.
- We generate a detection finding when we encounter a match.
Given this idea, we have to map it to the building blocks we have in Tenzir: pipelines and contexts. We can model it as follows:
- A lookup table that includes a copy of the SSLBL data.
- A pipeline that synchronizes the SSLBL data with the lookup table.
- A pipeline to enrich network telemetry and generate detection findings.
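To make the data flow concrete before touching any Tenzir configuration, here is a minimal Python sketch of the detection idea. It is not TQL and not part of the package. The `sslbl` dictionary, the `detect` function, and the event fields are hypothetical names, and the hash values are made up.

```python
from typing import Iterable, Iterator

# Hypothetical stand-in for the lookup table: SHA1 hash -> listing reason.
# The entry below is made up for illustration and is not real SSLBL data.
sslbl = {"0" * 40: "example listing reason"}


def detect(events: Iterable[dict], table: dict) -> Iterator[dict]:
    """Yield a finding for every event whose certificate hash is in the table."""
    for event in events:
        reason = table.get(event.get("sha1"))
        if reason is not None:
            yield {"finding": "blocklisted certificate", "reason": reason, "event": event}


# Two made-up network events, one of which carries a listed hash.
events = [
    {"src": "10.0.0.1", "sha1": "0" * 40},  # matches the table
    {"src": "10.0.0.2", "sha1": "f" * 40},  # no match
]
findings = list(detect(events, sslbl))
```

In the package, the dictionary corresponds to the lookup table context and `detect` corresponds to the enrichment pipeline. The synchronization pipeline is whatever keeps the dictionary up to date.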
Create the scaffold
We begin by creating a package.yaml file with the following metadata:
Add your pipelines, context, and examples
After providing the package metadata, we now do the heavy lifting of writing pipelines, contexts, and examples.
Add a context
First, we need a data structure to hold the SSLBL data so that we can use it inside the node. To this end, we define a lookup table in the contexts section:
Let's figure out how to get data into the context manually. There's a CSV file at https://sslbl.abuse.ch/blacklist/sslblacklist.csv. Let's take a look at that in the browser:
Okay, a simple CSV table with comments. We can write a pipeline for reading that:
Now that we have onboarded the data into a pipeline, we just need to push it into the context by piping it to context update:
With context inspect sslbl we can list the table contents, keyed by SHA1 hash and ready for enrichment.
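The parse-and-key steps above can also be sketched in plain Python, which may help when reasoning about what ends up in the table. This is an illustration only and not the TQL pipeline from the package. The sample rows are made up, and the column names in `COLUMNS` are an assumption about the file layout, not taken from the real feed.

```python
import csv
import io

# Made-up sample in the general shape described above: comment lines starting
# with '#', followed by comma-separated rows. This is not real SSLBL data.
SAMPLE = """\
# example comment line
# timestamp,SHA1,reason
2024-01-01 00:00:00,0000000000000000000000000000000000000000,example reason A
2024-01-02 00:00:00,1111111111111111111111111111111111111111,example reason B
"""

# Assumed column names; the header of the real file may differ.
COLUMNS = ["timestamp", "SHA1", "reason"]


def load_table(text: str) -> dict:
    """Parse commented CSV text and return a table keyed by SHA1 hash."""
    # Drop blank lines and comment lines before handing rows to the CSV parser.
    rows = (
        line
        for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    reader = csv.DictReader(rows, fieldnames=COLUMNS)
    return {row["SHA1"]: row for row in reader}


table = load_table(SAMPLE)
```

Keying by the hash is the design choice that matters here: a lookup during enrichment becomes a single dictionary access instead of a scan over all rows.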
Keep the context synchronized
So far we did a one-shot download of the SSLBL data into our lookup table. But the fine folks at abuse.ch update the data regularly, and we want to keep our lookup table in sync with the latest version.
To this end, we do the data onboarding periodically with every:
We could've also wrapped the entire pipeline in every 1h {…} with the same effect. However, if we split the data acquisition from the remaining work, we get a more local change that turns a one-shot data download into a continuous stream. As a result, the pipeline operators outside of every run continuously. More importantly, they emit metrics continuously and can fail earlier, which leads to an overall more robust pipeline architecture.
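The difference between wrapping only the acquisition and wrapping everything can be sketched with Python generators. This is a conceptual analogy under stated assumptions, not how Tenzir implements its operator. The Python function named `every`, the helper `update_table`, and `fake_fetch` are all hypothetical, and the `rounds` and `sleep` parameters exist only so the sketch terminates and can run without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def every(
    interval_s: float,
    fetch: Callable[[], Iterable[dict]],
    rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Re-run `fetch` periodically and emit its rows as one continuous stream."""
    done = 0
    while rounds is None or done < rounds:
        yield from fetch()
        done += 1
        # Only wait if another round follows.
        if rounds is None or done < rounds:
            sleep(interval_s)


def update_table(rows: Iterable[dict], table: dict) -> None:
    """Downstream stage: started once, it consumes rows as they arrive."""
    for row in rows:
        table[row["SHA1"]] = row


def fake_fetch() -> list:
    """Hypothetical fetch that returns made-up rows instead of downloading."""
    return [{"SHA1": "0" * 40, "reason": "example"}]


table: dict = {}
naps: list = []  # records requested sleep durations instead of sleeping
update_table(every(3600, fake_fetch, rounds=2, sleep=naps.append), table)
```

Only the source repeats; `update_table` is called a single time and stays alive across refreshes, which mirrors the point about downstream operators running continuously.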
Now let's copy that into the package definition:
From a developer's perspective, we now have a complete package consisting of a context and a pipeline that updates it. But from a user's perspective, what do we do now? This is where the examples section comes into play.
Entice with examples
After you have put thought into implementing intricate dataflows and carefully configuring pipelines and contexts, it's time to switch from the developer to the user persona. As a package developer, you want to make things reusable and easy! In the examples section, you showcase how users can profit from the work that the package does behind the scenes.
For our concrete scenario, we now want to use the SSLBL context that we set up and keep up to date. Our "work" was to let users simply bring a SHA1 hash digest and have the context quickly tell them whether it is good or bad.
Make the package configurable
After we have illustrated how the package works with examples, let's step back for a moment and assess how customizable the package should be:
- Is there anything that might differ from user to user?
- Do they have to bring their API key for the package to work?
- Are timeouts highly subjective and specific to the local environment?
For all places where it's difficult to offer a one-size-fits-all assumption, we want the user to make a decision on how to proceed. These customization points are called inputs, and the correspondingly named section in the package definition specifies them.
In our case, we hard-coded the refresh interval that updates the SSLBL lookup table to exactly 1 hour. Maybe other users want to update just once a day? Or more often? This is a typical example where policy is user-specific and where we can turn an assumption into a decision. Let's define the input:
By specifying a default value, we also make it easy for users to skip the decision making. In other words, defaults make a configuration knob optional.
After we've defined our configuration knobs, we now go over the pipeline definitions and replace the hard-coded constants with a placeholder:
Note how we simply replaced 1h with {{ inputs.refresh-interval }}.
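The interplay of placeholders, user-provided values, and defaults can be illustrated with a small Python sketch. It is an assumption-laden model of the idea, not Tenzir's actual templating code. `INPUT_SPECS`, `PLACEHOLDER`, and `render` are hypothetical names, and the template string is a shortened stand-in for a real pipeline definition.

```python
import re

# Hypothetical input specification: one input with a default value.
INPUT_SPECS = {"refresh-interval": {"default": "1h"}}

# Matches placeholders of the form {{ inputs.NAME }}.
PLACEHOLDER = re.compile(r"\{\{\s*inputs\.([A-Za-z0-9_-]+)\s*\}\}")


def render(template: str, values: dict, specs: dict = INPUT_SPECS) -> str:
    """Replace {{ inputs.NAME }} with the user's value or the declared default."""

    def substitute(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        if name in specs and "default" in specs[name]:
            return str(specs[name]["default"])
        # Neither a user value nor a default: the user must decide.
        raise KeyError(f"no value or default for input '{name}'")

    return PLACEHOLDER.sub(substitute, template)


# Shortened stand-in for a pipeline definition containing the placeholder.
template = "every {{ inputs.refresh-interval }} { ... }"
with_default = render(template, {})
with_override = render(template, {"refresh-interval": "1d"})
```

An input without a default forces a decision, because rendering fails until the user supplies a value. An input with a default turns that decision into an optional override.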
Test your package
When you think you're done, it's time to validate that things work as you expect. In practice, this means installing, configuring, and using the package.
We currently don't offer a native testing framework for packages where you can provide tests and baselines alongside the package definition. But we love the idea, and if you do as well, please swing by our Community Discord and discuss it with us.
Installing a package given a package.yaml file is easiest with the package_add operator, since a package is just data:
This fails with the following error:
error: failed to add package
= note: with error: !! unspecified: named argument `header` does not exist
= note: failed to add package
Doh, we didn't substitute the template {{ inputs.refresh-interval }}. We can do this by passing one additional argument:
The package should show up in the list of packages after installation:
// tql2
packages
where id == "sslbl"
The pipelines that came with the package have the package ID as a prefix:
// tql2
pipelines
where id.starts_with("sslbl")
And the context is also there:
// tql2
contexts
where id.starts_with("sslbl")
Since we checked that everything works as expected, we now remove our package with:
Share and contribute
🙌 Fantastic, you've just wrapped a use case and made it accessible to a broader audience! Now spread the word and share it with the broader community of Tenzir users for an even bigger impact. Here's how:
- Join our Discord server and showcase your package in the show-and-tell channel. We encourage you to seek feedback to make your package even better.
- File a pull request in the official Community Library GitHub repository. All packages in there will automatically show up in the Tenzir Library.
- Share it on social media and let us know. We'll amplify it! 🫶