4 posts tagged with "arrow"

· 6 min read
Matthias Vallentin
Thomas Peiselt

Apache Parquet is the common denominator for structured data at rest. The data science ecosystem has long appreciated this. But infosec? Why should you care about Parquet when building a threat detection and investigation platform? In this blog post series, we share our opinionated view on this question. In the next three blog posts, we

  1. describe how VAST uses Parquet and its little brother Feather
  2. benchmark the two formats against each other for typical workloads
  3. share our experience with all the engineering gotchas we encountered along the way

· 6 min read
Matthias Vallentin

VAST bets on Apache Arrow as the open interface to structured data. By "bet," we mean that VAST does not work without Arrow. And we are not alone. Influx's IOx, Datadog's Husky, Anyscale's Ray, TensorBase, and others have committed to making Arrow a cornerstone of their system architecture. For us, Arrow was not always a required dependency. We shifted to a tighter integration over the years as the Arrow ecosystem matured. In this blog post, we explain our journey to becoming an Arrow-native engine.