Execute Detections and Analytics
Security teams integrate a variety of detection content and custom
analytics to turn visibility into actionable alerts. The spectrum of
analytics ranges across atomic indicators of compromise (IoCs),
YARA and Sigma rules, Python scripts, and machine learning models.
All of these forms require
different execution methods when deployed on raw event data. Scaling
up in both diversity of analytics and volume of event data is a daunting
engineering task.
Tenzir's Security Data Pipelines provide an execution vehicle for security
content across the entire spectrum. This helps security teams to operationalize
their detections with ease without devoting precious cycles to in-house data
engineering.
Key Takeaways
Tenzir has built-in operators for Sigma and YARA rule matching
Tenzir supports matching of indicators of compromise via lookup tables and
Bloom filters
Tenzir has a generic contextualization framework for enriching data with any
form of security content
Tenzir's pipelines support running inline Python code at high speeds
Using Tenzir's pipelines, security teams have a turnkey solution to bring
their detections and analytics to the security data—no need for in-house data
engineering resources.
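To make the Bloom filter takeaway concrete, here is a minimal Python sketch of approximate indicator matching. It is not Tenzir's implementation: the `BloomFilter` class, its sizing parameters, and the sample indicators are assumptions chosen for illustration. A Bloom filter never misses an added indicator, but it may occasionally report an indicator that was never added, so hits usually need confirmation against an exact lookup table.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive one position per hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical indicators, using values reserved for documentation.
iocs = BloomFilter()
for indicator in ["198.51.100.7", "evil.example.com"]:
    iocs.add(indicator)

added_found = "198.51.100.7" in iocs and "evil.example.com" in iocs
```

The appeal for large indicator sets is the fixed memory footprint: the filter stays the same size no matter how many indicators are added, at the cost of a rising false positive rate.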
Security teams typically deploy several dozen tools, many of which ship with
various detections. These rarely work out of the box and instead require
substantial tuning to achieve the right true positive and false positive
rates. Mature teams realize that they ultimately need to own the detection
content to produce actionable alerts and sensibly dispatch investigations.
Bringing the security content to the data so that it can act is a non-trivial
task. It requires a substantial amount of data engineering and in-depth knowledge
of the execution engine—be it a SIEM, log management solution, a data lake, a
custom database, or a mere streaming engine. But why is that so?
When it comes to translating detections to the execution engine, deeper data
engineering skills are required to maintain a scalable system. Security
engineers are domain experts and know how to describe threats and look for
malware patterns, but often lack the skills to tune their rules so that they run
efficiently.
Moreover, security content needs to be deployed not only forward-looking in
a live streaming fashion, but also be translated into searches to apply it
retroactively. The reason is that there is an inherent lag in the availability
of detection content: threat researchers publish reports with observables and
detections only after an attack is already ongoing. By the time the security
team receives the new insight, the initial attack activity has likely already
happened, and may not occur again. Therefore, security teams also need a
backward-looking retro-matching capability for new content.
Live and retro matching often have different mechanisms within the same system,
making it complicated to configure for the analysts.
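The difference between live and retro matching can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any particular engine: the `match` function, the `dest_ip` field, and the sample events are hypothetical. The point is that one matching routine can serve both directions when the historical store and the live feed are both exposed as iterables of events.

```python
from typing import Iterable, Iterator

Event = dict[str, str]


def match(events: Iterable[Event], indicators: set[str], field: str) -> Iterator[Event]:
    """Yield every event whose `field` value is a known indicator."""
    for event in events:
        if event.get(field) in indicators:
            yield event


# Hypothetical data: events stored before the indicator was published.
historical = [
    {"ts": "2024-01-01T10:00:00", "dest_ip": "198.51.100.7"},
    {"ts": "2024-01-01T10:05:00", "dest_ip": "192.0.2.10"},
]
# Hypothetical live feed; any iterator works, such as a queue reader.
live = iter([{"ts": "2024-01-02T09:00:00", "dest_ip": "203.0.113.5"}])

indicators = {"198.51.100.7"}  # newly published observable

retro_hits = list(match(historical, indicators, "dest_ip"))  # backward-looking
live_hits = list(match(live, indicators, "dest_ip"))  # forward-looking
```

In this sample, the only hit comes from the historical data: the activity happened before the indicator was known and does not recur in the live feed, which is exactly the case retro matching exists for.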
Detections come in many shapes, and they all need to be operationalized
differently. This can eat up a substantial amount of time from security
analysts, who should rather spend their time with the investigation of the
results.
For example, observables or indicators of compromise (IoCs) are atomic pieces
of knowledge, typically disseminated through reports created by threat analysts.
Security engineers consume these reports, either manually or in structured form
via an intelligence platform. While such a platform may help to organize the
knowledge and the relevant threats, the observables still need to
be deployed for detection.
The sheer volume of telemetry from endpoint, network, and cloud data sources
makes executing detections a non-trivial challenge. If you collect petabytes of
data, performing a full scan over the data is not just inefficient, but
cost-prohibitive when doing so for every detection. Understanding the workloads
that detections generate is a prerequisite to effective execution on large
datasets.
For example, an engine must handle the continuous churn of observables
that describe the ever-changing threat landscape, and use that dataset for
continuous live and retro detection. This "background radiation" alone puts
growing pressure on the historical and live event data feeds, which require a
carefully designed data architecture. On top of that come threat hunting and
incident response workloads. Orchestrating these workloads without deep data
engineering knowledge is nearly infeasible.
Building detections is often a bottom-up process that starts with a specific
data source and references a subset of fields. Generalizing it to a larger set
of inputs necessitates an upfront normalization step. At this point many
security teams build on top of proprietary event taxonomies (or schemas)
inherent in the log management or SIEM solution. The trade-off is a deep
coupling of a library of detections and rules to a specific vendor ecosystem.
Similarly, when orchestrating a diverse set of detections that come from
different producers and marketplaces, they rarely adhere to a standardized alert
output shape. It's up to the security engineers to wrangle the alert data or
buy themselves out of the problem by deploying a SOAR tool.
SecDataOps puts data utility front and center by making it easy to manage
security event dataflows. Tenzir's powerful reshaping capabilities make it
effortless to normalize data into the right form so that detections can run on
them. In fact, pipelines themselves can run detection workloads and produce
streamlined findings for easy-button consumption.
The Sec in SecDataOps refers to security data, and detection is a fundamental
operation on it. Tenzir features dedicated operators for executing detections,
such as yara or sigma. If Tenzir does not support a
specific rule engine or detection format, you can implement your own
operator in C++ that may depend on custom third-party libraries.
There's also a generic python
operator for integrating arbitrary third-party analytics on structured data.
Given the dominance of Python in the data science community, a turnkey
transition to security data enables a much more efficient detection engineering
process.
Normalizing the shape of input and output of a detection is a prerequisite to
building scalable detection and response pipelines. Standardizing the input is
central to achieving wide applicability and reuse of a detection. Standardizing
the detection output is critical to scale up the number and diversity of
detections while keeping the subsequent alerting processes lean.
Tenzir's wide array of reshaping operators makes it easy to perform ad-hoc
adjustments of the data shape of a particular detection input or output. More
generally, Tenzir supports building entire libraries to normalize data into
broadly used event taxonomies, such as the Open Cybersecurity Schema Framework
(OCSF). This ultimately decouples not only data collection
from detection, but also detection from response.
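A short Python sketch illustrates how input normalization decouples a detection from its data sources. Everything here is an assumption for illustration: the vendor names, the field mappings, and the shared names `src_ip` and `dst_ip` are hypothetical and not an actual OCSF mapping.

```python
# Hypothetical per-source mappings from vendor field names to shared names.
FIELD_MAPS = {
    "vendor_a": {"srcip": "src_ip", "dstip": "dst_ip"},
    "vendor_b": {"source_address": "src_ip", "destination_address": "dst_ip"},
}


def normalize(event: dict, source: str) -> dict:
    """Rename known fields to the shared schema; keep unknown fields as-is."""
    mapping = FIELD_MAPS[source]
    return {mapping.get(key, key): value for key, value in event.items()}


def detect(event: dict) -> bool:
    """A detection written once against the shared field names."""
    return event.get("dst_ip") == "198.51.100.7"


a = normalize({"srcip": "10.0.0.1", "dstip": "198.51.100.7"}, "vendor_a")
b = normalize(
    {"source_address": "10.0.0.2", "destination_address": "192.0.2.1"},
    "vendor_b",
)
results = [detect(a), detect(b)]
```

Because `detect` only references the shared names, adding a third data source means adding one mapping entry rather than rewriting the detection.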
The line between detection and contextualization can be blurry. For example,
a contextualization may be the basis for a detection when enriching events with
a "magic risk score" between 0 and 1, and then raising an alert when the score
is greater than 0.8.
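The risk score example can be sketched as an enrichment step followed by a threshold check. This is a minimal illustration, not Tenzir's contextualization API: the `risk_scores` table, the `domain` and `risk_score` field names, and the default score of 0.0 for unknown domains are assumptions.

```python
RISK_THRESHOLD = 0.8  # threshold from the example in the text

# Hypothetical context table: risk score per domain, between 0 and 1.
risk_scores = {"evil.example.com": 0.93, "cdn.example.net": 0.12}


def enrich(event: dict) -> dict:
    """Return a copy of the event with a `risk_score` field attached."""
    # Unknown domains get 0.0 here; that default is an assumption.
    score = risk_scores.get(event.get("domain"), 0.0)
    return {**event, "risk_score": score}


def is_alert(event: dict) -> bool:
    """The detection itself: a plain threshold on the enriched field."""
    return event["risk_score"] > RISK_THRESHOLD


events = [
    {"domain": "evil.example.com"},
    {"domain": "cdn.example.net"},
    {"domain": "unknown.example.org"},
]
alerts = [event for event in map(enrich, events) if is_alert(event)]
```

The enrichment adds context to every event, while the detection reduces to a one-line predicate on the added field, which is why the two are hard to separate cleanly.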
Tenzir has a built-in contextualization framework for high-speed in-band
enrichment, offering a one-stop-shop solution for integrating third-party context
into the detection process.
Tenzir's SecDataOps approach and Security Data Pipelines lay the foundation for
sustainable and scalable detection operations. Built-in detection operators
jump-start the operationalization of security content, normalization using OCSF
decouples data collection, detection, and response, and an extensible high-speed
contextualization framework makes integrating third-party security content a
breeze. These capabilities dramatically improve the productivity of security
engineers and analysts by shifting their attention from wrangling data to
hunting threats.