We re-wired Tenzir's
fluent-bit operator and
introduced a significant performance boost as a side effect: A 3–5x gain for
throughput in events per second (EPS) and 4–8x improvement of latency in terms
of processing time.
Why were these gains available? Because we eliminated one round-trip of internal JSON printing and parsing.
Our primary goal was actually working around an issue
with the Fluent Bit
lib output plugin, which we use whenever we have a
fluent-bit source operator. For example, if you configure an
Fluent Bit source, Tenzir's
fluent-bit operator autocomplete the
plugin. This plugin has two modes of accepting input: JSON or
Up to now, we relied on the JSON transfer mode because it was faster to get
started. However, during testing with the
elasticsearch input that receives
large Windows event logs via Winlogbeat, we noticed that Fluent Bit's
output produces messages of the form
[timestamp, object] where
cropped. This basically generated invalid JSON.
The fix involved switching the exchange format of the
lib output plugin from
JSON to MsgPack. If you're curious, take a look at
#3770 for the full scoop. The
improvement is already in the current development version and will be available
with the next release.
We were curious how much this removal of the extra layer of printing and parsing actually buys us. To this end, we use the following pipeline:
tenzir --dump-metrics 'fluent-bit stdin | head 10M | discard' < eve.json
--dump-metrics adds detailed per-operator metrics that help us
understand where operators spend their time. The
operator take the first 10 million events, and
simply drops its input. The
eve.json input into the
tenzir binary is from
our Suricata dataset that we use in the user guides. We measured
ran our measurements on a 2021 Apple MacBook Pro M1 Max, as well as on a Manjaro
Linux laptop with a 14-core Intel i7 CPU.
Our intuition was that we won't see major improvements, because generating JSON isn't that expensive and we use simdjson to parse JSON. But the results surprised us:
On macOS, events per second tripled from 50k to 150k, and the pipeline runtime went from 42 to 10 seconds. On Linux, the improvements were even higher. We don't have a good explanation for the rather stark difference between the operating systems. Our hunch is that the allocator performance is the high-order bit explaining the difference.
We switched from JSON to MsgPack for our
source operator. This removed one round-trip of printing JSON (in Fluent Bit)
and parsing JSON (in Tenzir). We were surprised to see that this change resulted
in such substantial performance improvements. As a result, you can now run many
more Fluent Bit ingestion pipelines in parallel at a single node with the same
resources, or vertically scale your Fluent Bit pipeline to new limits.
Thanks to Christoph Lobmeyer and Yannik Meinhardt for reporting this issue! 🙏