Apache Parquet is the common denominator for structured data at rest. The data science ecosystem has long appreciated this. But infosec? Why should you care about Parquet when building a threat detection and investigation platform? In this blog post series, we share our opinionated view on this question. Over the next three blog posts, we
- describe how VAST uses Parquet and its little brother Feather
- benchmark the two formats against each other for typical workloads
- share our experience with all the engineering gotchas we encountered along the way
The VAST project is roughly a decade old. But what happened over the last 10 years? This blog post looks back over time through the lens of the git merge commits.
Why merge commits? Because they represent a unit of completed contribution. Feature work takes place in dedicated branches, and the merge into the main branch seals the deal. Some feature branches have just one commit, whereas others have dozens, so the distribution is far from uniform. As of commit 6f9c84198 on Sep 2, 2022, there are a total of 13,066 commits, 2,334 of which are merges (17.9%).
We’ll take a deeper look at the merge commits.
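For readers who want to reproduce this kind of count on their own repository, here is a minimal sketch. It assumes the input is the output of `git log --pretty=format:'%H %P'`, where each line holds a commit hash followed by its parent hashes. A merge commit is any commit with two or more parents. The helper names `merge_stats` and `merge_share` are hypothetical and not part of any VAST tooling.

```python
def merge_stats(log_lines):
    """Count total and merge commits from `git log --pretty=format:'%H %P'` lines.

    Each line holds a commit hash followed by its parent hashes; a merge
    commit is one with two or more parents.
    """
    total = 0
    merges = 0
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        total += 1
        parents = fields[1:]
        if len(parents) >= 2:
            merges += 1
    return total, merges


def merge_share(total, merges):
    """Return the share of merge commits as a percentage (0.0 for no commits)."""
    return 100.0 * merges / total if total else 0.0


# Sample input with abbreviated, made-up hashes.
sample = [
    "c3 a1 b2",  # merge: two parents
    "b2 a1",     # regular commit: one parent
    "a1",        # root commit: no parents
]
total, merges = merge_stats(sample)
print(total, merges)  # 3 1

# The figures quoted above: 2,334 merges out of 13,066 commits.
print(round(merge_share(13066, 2334), 1))  # 17.9
```

In a real checkout, you would feed the lines from `subprocess.run(["git", "log", "--pretty=format:%H %P"], capture_output=True, text=True).stdout.splitlines()` into `merge_stats`.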
VAST v2.3 is now available, which introduces an automatic data defragmentation capability.
VAST's Sigma frontend now supports more modifiers. In the Sigma language, modifiers transform predicates in various ways, e.g., to apply a function over a value or to change the operator of a predicate. Modifiers are the customization point for enhancing the expressiveness of query operations.
The new pySigma effort, which will eventually replace the now-considered-legacy sigma project, comes with new modifiers as well. Comparison modifiers such as gte provide comparisons over value domains with a total ordering, e.g., numbers: x >= 42. In addition, a new modifier interprets a value as a subnet, e.g., 10.0.0.0/8. Richer typing!
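To make the idea of a modifier concrete, here is a toy sketch of how a modifier can change the operator of a predicate or reinterpret the type of its value. It is not VAST's or pySigma's implementation. The function `make_predicate` is hypothetical, "gte" mirrors the comparison modifier named above, and "subnet" is a hypothetical stand-in name for the modifier that interprets a value as a subnet.

```python
import ipaddress


def make_predicate(modifier, value):
    """Turn a Sigma-style field value plus an optional modifier into a match function.

    Toy illustration only: "gte" mirrors the comparison modifier from the
    text, and "subnet" is a hypothetical stand-in name for the modifier that
    interprets the value as a subnet.
    """
    if modifier is None:
        # No modifier: plain equality.
        return lambda x: x == value
    if modifier == "gte":
        # Change the operator of the predicate: x >= value.
        return lambda x: x >= value
    if modifier == "subnet":
        # Reinterpret the value's type: the string becomes an IP network,
        # and the field matches if its address falls inside that network.
        network = ipaddress.ip_network(value)
        return lambda x: ipaddress.ip_address(x) in network
    raise ValueError(f"unsupported modifier: {modifier}")


at_least_42 = make_predicate("gte", 42)
print(at_least_42(42), at_least_42(7))  # True False

in_ten_slash_eight = make_predicate("subnet", "10.0.0.0/8")
print(in_ten_slash_eight("10.1.2.3"), in_ten_slash_eight("192.168.0.1"))  # True False
```

The point of the sketch is the "richer typing" aspect: without the subnet modifier, 10.0.0.0/8 would be compared as an opaque string, whereas with it the value gains IP network semantics.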
VAST v2.1 is out! This release comes with a particular focus on performance and reducing the size of VAST databases. It brings a new utility for optimizing databases in production, allowing existing deployments to take full advantage of the improvements after upgrading.
VAST bets on Apache Arrow as the open interface to structured data. By "bet," we mean that VAST does not work without Arrow. And we are not alone. Influx's IOx, Datadog's Husky, Anyscale's Ray, TensorBase, and others have committed to making Arrow a cornerstone of their system architecture. For us, Arrow was not always a required dependency. We shifted to a tighter integration over the years as the Arrow ecosystem matured. In this blog post, we explain our journey of becoming an Arrow-native engine.
Dear community, we are excited to announce VAST v2.0, bringing faster execution of bulk-submitted queries, improved tunability of index structures, and new configurability through environment variables.
Dear community, we are happy to announce the release of VAST v1.1.2, the latest release in the VAST v1.1 series. This release contains a fix for a race condition that could lead to VAST eventually becoming unresponsive to queries in large deployments.
Dear community, we are excited to announce VAST v1.1, which ships with two new features: query language plugins for swapping out the query expression frontend, and compaction as a mechanism for expressing fine-grained data retention policies and gradually aging out data instead of simply deleting it.