
· 5 min read
Matthias Vallentin

We are thrilled to announce that after a year of rigorous development and testing, the Tenzir Platform is now generally available! Our journey began at Black Hat 2023, where we first introduced the Tenzir Platform in early access mode. Over the past year, we've worked diligently, incorporating user feedback and refining our technology to ensure stability and performance. Today, at Black Hat 2024, we are proud to officially launch the Tenzir Platform, confident in its capabilities and excited to bring it to a wider audience.

· 5 min read
Matthias Vallentin

In the bustling world of data operations, handling large volumes of information is an everyday affair. Each day, countless bytes of data move around in systems, challenging organizations to maintain data accuracy, efficiency, and cost-effectiveness. Amid this vast data landscape, one concept has emerged as a critical ally—deduplication.

· 2 min read
Dominik Lohmann

Did you ever want to get a sneak peek behind the scenes at Tenzir? Now you can!

· 3 min read
Matthias Vallentin

We re-wired Tenzir's fluent-bit operator and introduced a significant performance boost as a side effect: A 3–5x gain for throughput in events per second (EPS) and 4–8x improvement of latency in terms of processing time.

· 9 min read
Matthias Vallentin

How would you create a contextualization engine? What are the essential building blocks? We asked ourselves these questions after studying what's out there and built from scratch a high-performance contextualization framework in Tenzir. This blog post introduces this brand-new framework, provides usage examples, and describes how you can build your own context plugin.

· 6 min read
Matthias Vallentin

Enrichment is a major part of the security data lifecycle and can take many forms: adding GeoIP locations for all IP addresses in a log, attaching asset inventory data via user or hostname lookups, or extending alerts with a magic score to bump them up the triaging queue. The goal is always to make the data more actionable by providing a better basis for decision making.

This is the first part of a series of blog posts on contextualization. We kick things off by looking at how existing systems do enrichment. In the next blog post, we introduce how we address this use case with a pipeline-first mindset in the Tenzir stack.

· 6 min read
Matthias Vallentin

The new yara operator matches YARA rules on bytes, producing structured match output that conveniently integrates with alerting tools or triggers the next processing steps in your detection workflows.

· 4 min read
Christoph Lobmeyer
Matthias Vallentin

The new velociraptor operator allows you to run Velociraptor Query Language (VQL) expressions against a Velociraptor server and process the results in a Tenzir pipeline. You can also subscribe to matching artifacts in hunt flows over a large fleet of assets, making endpoint telemetry collection and processing a breeze.
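As a rough sketch of the idea, a VQL expression like the one below could be handed to the operator to pull a process listing from connected endpoints. The `--query` flag and the column names are assumptions for illustration; the `pslist()` plugin is part of standard VQL.

```
velociraptor --query "select Name, Pid from pslist()"
```

The results then flow through the rest of the Tenzir pipeline like any other structured events.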

· 15 min read
Matthias Vallentin

One thing we are observing is that organizations are actively seeking out solutions to better manage their security data operations. Until recently, they aggressively repurposed common data and observability tools. I believe that this was a stop-gap measure because there was no alternative. But now there is a growing ecosystem of security data operations tools to support the modern security data stack. Ross Haleliuk's epic article lays this out at length.

In this article, I explain the underlying design principles for developing our own data pipeline engine, coming from the perspective of security teams that are building out their detection and response architecture. These principles emerged during design and implementation. Many times, we asked ourselves "what's the right way of solving this problem?" We often went back to the drawing board and started challenging existing approaches, such as what a data source is, or what a connector should do. To our surprise, we found a coherent way to answer these questions without having to make compromises. When things feel Just Right, it is a good sign to have found the right solution for a particular problem. What we describe here are the lessons learned from studying other systems, distilled into principles for others to follow.

· 5 min read
Oliver Rochford

In today's digital age, businesses are under immense pressure to bolster their cybersecurity. Understanding the financial implications of security tools is vital to ensure optimal ROI through risk reduction and breach resilience. This is particularly true for consumption-based security solutions like Security Information and Event Management (SIEM).

· 8 min read
Matthias Vallentin

Elastic just released their new pipeline query language called ES|QL. This is a conscious attempt to consolidate the language zoo in the Elastic ecosystem (Query DSL, EQL, KQL, SQL, Painless, Canvas/Timelion). Elastic said that they worked on this effort for over a year. The documentation is still sparse, but we tried to read between the lines to understand what this new pipeline language has to offer.

· 2 min read
Oliver Rochford

Staying ahead in the realm of cybersecurity means relentlessly navigating an endless sea of emerging threats and ever-increasing data volumes. The battle to stay one step ahead can often feel overwhelming, especially when your organization's data costs are skyrocketing.

· 5 min read
Oliver Rochford

We're overjoyed to announce our highly anticipated security data pipeline platform at the renowned Black Hat conference in Las Vegas. The launch marks a milestone in our journey to simplify data engineering for cybersecurity operations and to offer a cost-efficient way to tackle the increasingly complex data engineering challenges that security teams confront daily.

· 9 min read
Matthias Vallentin

Our Tenzir Query Language (TQL) is a pipeline language that works by chaining operators into data flows. When we designed TQL, we specifically studied Splunk's Search Processing Language (SPL), as it generally leaves a positive impression on security analysts who are not data engineers. Our goal was to take all the good parts of SPL but provide a more powerful language without compromising simplicity. In this blog post, we explain how the two languages differ using concrete threat hunting examples.

· 5 min read
Matthias Vallentin

Did you know that Zeek supports log rotation triggers, so that you can do anything you want with a newly rotated batch of logs?
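As a minimal sketch of what the post covers, Zeek's logging framework exposes the rotation interval and a postprocessor hook; the script path below is a hypothetical placeholder.

```zeek
# Rotate logs hourly and hand every freshly rotated file to a
# custom script via Zeek's rotation postprocessor hook.
redef Log::default_rotation_interval = 1 hr;
redef Log::default_rotation_postprocessor_cmd = "/usr/local/bin/process-rotated-log";
```

The postprocessor command receives the rotated file, so you can ship, transform, or archive each batch however you like.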

· 5 min read
Matthias Vallentin

As an incident responder, threat hunter, or detection engineer, getting quickly to your analytics is key for productivity. For network-based visibility and detection, Zeek and Suricata are the bedrock for many security teams. But operationalizing these tools can take a good chunk of time.

So we asked ourselves: How can we make it super easy to work with Zeek and Suricata logs?

· 3 min read
Matthias Vallentin

Zeek turns packets into structured logs. By default, Zeek generates one file per log type and per rotation timeframe. If you don't want to wrangle files and would rather process the output directly, this short blog post is for you.

· 8 min read
Matthias Vallentin

Zeek offers many ways to produce and consume logs. In this blog, we explain the various Zeek logging formats and show how you can get the most out of Zeek with Tenzir. We conclude with recommendations for when to use what Zeek format based on your use case.

· 2 min read
Dominik Lohmann

VAST is now Tenzir. This blog post describes what changed when we renamed the project.

· 6 min read
Matthias Vallentin
Thomas Peiselt

Apache Parquet is the common denominator for structured data at rest. The data science ecosystem has long appreciated this. But infosec? Why should you care about Parquet when building a threat detection and investigation platform? In this blog post series we share our opinionated view on this question. In the next three blog posts, we

  1. describe how VAST uses Parquet and its little brother Feather
  2. benchmark the two formats against each other for typical workloads
  3. share our experience with all the engineering gotchas we encountered along the way

· 5 min read
Matthias Vallentin

The VAST project is roughly a decade old. But what happened over the last 10 years? This blog post looks back over time through the lens of the git merge commits.

Why merge commits? Because they represent a unit of completed contribution. Feature work takes place in dedicated branches, with the merge to the main branch sealing the deal. Some feature branches have just one commit, whereas others have dozens. The distribution is not uniform. As of 6f9c84198 on Sep 2, 2022, there are a total of 13,066 commits, with 2,334 being merges (17.9%). We'll take a deeper look at the merge commits.

· 5 min read
Matthias Vallentin

VAST's Sigma frontend now supports more modifiers. In the Sigma language, modifiers transform predicates in various ways, e.g., to apply a function over a value or to change the operator of a predicate. Modifiers are the customization point to enhance the expressiveness of query operations.

The new pySigma effort, which will eventually replace the now-legacy sigma project, comes with new modifiers as well. Most notably, lt, lte, gt, and gte provide comparisons over value domains with a total ordering, e.g., numbers: x >= 42. In addition, the cidr modifier interprets a value as a subnet, e.g., 10.0.0.0/8. Richer typing!
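To illustrate, a Sigma detection fragment attaches modifiers to field names with a pipe; the field names below are hypothetical, but the modifier syntax follows pySigma.

```yaml
detection:
  selection:
    dst_port|gte: 1024      # numeric comparison: dst_port >= 1024
    src_ip|cidr: 10.0.0.0/8 # subnet membership instead of string equality
  condition: selection
```

The comparison modifiers require a totally ordered value domain, while cidr turns a plain string match into a typed subnet test.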

· 6 min read
Matthias Vallentin

VAST bets on Apache Arrow as the open interface to structured data. By "bet," we mean that VAST does not work without Arrow. And we are not alone. Influx's IOx, DataDog's Husky, Anyscale's Ray, TensorBase, and others committed themselves to making Arrow a cornerstone of their system architecture. For us, Arrow was not always a required dependency. We shifted to a tighter integration over the years as the Arrow ecosystem matured. In this blog post we explain our journey of becoming an Arrow-native engine.