
· 5 min read
Matthias Vallentin

We are thrilled to announce that after a year of rigorous development and testing, the Tenzir Platform is now generally available! Our journey began at Black Hat 2023, where we first introduced the Tenzir Platform in early access mode. Over the past year, we've worked diligently, incorporating user feedback and refining our technology to ensure stability and performance. Today, at Black Hat 2024, we are proud to officially launch the Tenzir Platform, confident in its capabilities and excited to bring it to a wider audience.

· 5 min read
Matthias Vallentin

In the bustling world of data operations, handling large volumes of information is an everyday affair. Each day, countless bytes of data move around in systems, challenging organizations to maintain data accuracy, efficiency, and cost-effectiveness. Amid this vast data landscape, one concept has emerged as a critical ally—deduplication.

· 2 min read
Dominik Lohmann

Did you ever want to get a sneak peek behind the scenes at Tenzir? Now you can!

· 3 min read
Matthias Vallentin

We re-wired Tenzir's fluent-bit operator and introduced a significant performance boost as a side effect: A 3–5x gain for throughput in events per second (EPS) and 4–8x improvement of latency in terms of processing time.

· 9 min read
Matthias Vallentin

How would you create a contextualization engine? What are the essential building blocks? We asked ourselves these questions after studying what's out there and built from scratch a high-performance contextualization framework in Tenzir. This blog post introduces this brand-new framework, provides usage examples, and describes how you can build your own context plugin.

· 6 min read
Matthias Vallentin

Enrichment is a major part of the security data lifecycle and can take many forms: adding GeoIP locations for all IP addresses in a log, attaching asset inventory data via user or hostname lookups, or extending alerts with a magic score to bump them up the triaging queue. The goal is always to make the data more actionable by providing a better basis for decision making.

This is the first part of a series of blog posts on contextualization. We kick things off by looking at how existing systems do enrichment. In the next blog post, we introduce how we address this use case with a pipeline-first mindset in the Tenzir stack.

· 6 min read
Matthias Vallentin

The new yara operator matches YARA rules on bytes, producing structured match output that conveniently integrates with alerting tools or triggers the next processing steps in your detection workflows.

· 4 min read
Christoph Lobmeyer
Matthias Vallentin

The new velociraptor operator allows you to run Velociraptor Query Language (VQL) expressions against a Velociraptor server and process the results in a Tenzir pipeline. You can also subscribe to matching artifacts in hunt flows over a large fleet of assets, making endpoint telemetry collection and processing a breeze.
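As a rough sketch of the idea, a VQL expression like the one below could be handed to the operator to pull a process listing from connected endpoints. The `--query` flag and the column names are assumptions for illustration; the `pslist()` plugin is part of standard VQL.

```
velociraptor --query "select Name, Pid from pslist()"
```

The results then flow through the rest of the Tenzir pipeline like any other structured events.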

· 15 min read
Matthias Vallentin

One thing we are observing is that organizations are actively seeking out solutions to better manage their security data operations. Until recently, they aggressively repurposed common data and observability tools. I believe that this was a stop-gap measure because there was no alternative. But now there is a growing ecosystem of security data operations tools to support the modern security data stack. Ross Haleliuk's epic article lays this out at length.

In this article, I explain the underlying design principles for developing our own data pipeline engine, coming from the perspective of security teams that are building out their detection and response architecture. These principles emerged during design and implementation. Many times, we asked ourselves "what's the right way of solving this problem?" We often went back to the drawing board and started challenging existing approaches, such as what a data source is, or what a connector should do. To our surprise, we found a coherent way to answer these questions without having to make compromises. When things feel Just Right, it is a good sign to have found the right solution for a particular problem. What we describe here are the lessons learned from studying other systems, distilled into principles for others to follow.

· 5 min read
Oliver Rochford

In today's digital age, businesses are under immense pressure to bolster their cybersecurity. Understanding the financial implications of security tools is vital to ensure optimal ROI through risk reduction and breach resilience. This is particularly true for consumption-based security solutions like Security Information and Event Management (SIEM).

· 8 min read
Matthias Vallentin

Elastic just released their new pipeline query language called ES|QL. This is a conscious attempt to consolidate the language zoo in the Elastic ecosystem (Query DSL, EQL, KQL, SQL, Painless, Canvas/Timelion). Elastic said that they worked on this effort for over a year. The documentation is still sparse, but we tried to read between the lines to understand what this new pipeline language has to offer.

· 2 min read
Oliver Rochford

Staying ahead in the realm of cybersecurity means relentlessly navigating an endless sea of emerging threats and ever-increasing data volumes. The battle to stay one step ahead can often feel overwhelming, especially when your organization's data costs are skyrocketing.

· 5 min read
Oliver Rochford

We're overjoyed to announce our highly anticipated security data pipeline platform at the renowned Black Hat conference in Las Vegas. The launch marks a milestone in our journey to simplify data engineering for cybersecurity operations and to offer a cost-efficient way to tackle the increasingly complex data engineering challenges that security teams confront daily.

· 9 min read
Matthias Vallentin

Our Tenzir Query Language (TQL) is a pipeline language that works by chaining operators into data flows. When we designed TQL, we specifically studied Splunk's Search Processing Language (SPL), as it generally leaves a positive impression on security analysts who are not data engineers. Our goal was to take all the good parts of SPL but provide a more powerful language without compromising simplicity. In this blog post, we explain how the two languages differ using concrete threat hunting examples.

· 5 min read
Matthias Vallentin

Did you know that Zeek supports log rotation triggers, so that you can do anything you want with a newly rotated batch of logs?
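As a minimal sketch of what the post covers, Zeek's logging framework exposes the rotation interval and a postprocessor hook; the script path below is a hypothetical placeholder.

```zeek
# Rotate logs hourly and hand every freshly rotated file to a
# custom script via Zeek's rotation postprocessor hook.
redef Log::default_rotation_interval = 1 hr;
redef Log::default_rotation_postprocessor_cmd = "/usr/local/bin/process-rotated-log";
```

The postprocessor command receives the rotated file, so you can ship, transform, or archive each batch however you like.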

· 5 min read
Matthias Vallentin

As an incident responder, threat hunter, or detection engineer, getting quickly to your analytics is key for productivity. For network-based visibility and detection, Zeek and Suricata are the bedrock for many security teams. But operationalizing these tools can take a good chunk of time.

So we asked ourselves: How can we make it super easy to work with Zeek and Suricata logs?

· 3 min read
Matthias Vallentin

Zeek turns packets into structured logs. By default, Zeek generates one file per log type and per rotation timeframe. If you don't want to wrangle files and would rather process the output directly, this short blog post is for you.

· 8 min read
Matthias Vallentin

Zeek offers many ways to produce and consume logs. In this blog, we explain the various Zeek logging formats and show how you can get the most out of Zeek with Tenzir. We conclude with recommendations for when to use what Zeek format based on your use case.

· 2 min read
Dominik Lohmann

VAST is now Tenzir. This blog post describes what changed when we renamed the project.

· 6 min read
Matthias Vallentin
Thomas Peiselt

Apache Parquet is the common denominator for structured data at rest. The data science ecosystem has long appreciated this. But infosec? Why should you care about Parquet when building a threat detection and investigation platform? In this blog post series we share our opinionated view on this question. In the next three blog posts, we

  1. describe how VAST uses Parquet and its little brother Feather
  2. benchmark the two formats against each other for typical workloads
  3. share our experience with all the engineering gotchas we encountered along the way

· 5 min read
Matthias Vallentin

The VAST project is roughly a decade old. But what happened over the last 10 years? This blog post looks back over time through the lens of the git merge commits.

Why merge commits? Because they represent a unit of completed contribution. Feature work takes place in dedicated branches, with the merge to the main branch sealing the deal. Some feature branches have just one commit, whereas others have dozens. The distribution is not uniform. As of 6f9c84198 on Sep 2, 2022, there are a total of 13,066 commits, with 2,334 being merges (17.9%). We'll take a deeper look at the merge commits.

· 5 min read
Matthias Vallentin

VAST's Sigma frontend now supports more modifiers. In the Sigma language, modifiers transform predicates in various ways, e.g., to apply a function over a value or to change the operator of a predicate. Modifiers are the customization point to enhance the expressiveness of query operations.

The new pySigma effort, which will eventually replace the now-legacy sigma project, comes with new modifiers as well. Most notably, lt, lte, gt, and gte provide comparisons over value domains with a total ordering, e.g., numbers: x >= 42. In addition, the cidr modifier interprets a value as a subnet, e.g., 10.0.0.0/8. Richer typing!
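To illustrate, a Sigma detection fragment attaches modifiers to field names with a pipe; the field names below are hypothetical, but the modifier syntax follows pySigma.

```yaml
detection:
  selection:
    dst_port|gte: 1024      # numeric comparison: dst_port >= 1024
    src_ip|cidr: 10.0.0.0/8 # subnet membership instead of string equality
  condition: selection
```

The comparison modifiers require a totally ordered value domain, while cidr turns a plain string match into a typed subnet test.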

· 6 min read
Matthias Vallentin

VAST bets on Apache Arrow as the open interface to structured data. By "bet," we mean that VAST does not work without Arrow. And we are not alone. Influx's IOx, DataDog's Husky, Anyscale's Ray, TensorBase, and others committed themselves to making Arrow a cornerstone of their system architecture. For us, Arrow was not always a required dependency. We shifted to a tighter integration over the years as the Arrow ecosystem matured. In this blog post we explain our journey of becoming an Arrow-native engine.