Apache Arrow and Apache Parquet have become the de-facto columnar formats for in-memory and on-disk representations when it comes to structured data. Both are strong together, as they provide data interoperability and foster a diverse ecosystem of data tools. But how well do they actually work together from an engineering perspective?
VAST v2.4 completes the switch to open storage formats, and includes an early peek at three upcoming features for VAST: A web plugin with a REST API and an integrated frontend user interface, Docker Compose configuration files for getting started with VAST faster and showing how to integrate VAST into your SOC, and new Python bindings that will make writing integrations easier and allow for using VAST with your data science libraries, like Pandas.
Apache Parquet is the common denominator for structured data at rest. The data science ecosystem has long appreciated this. But infosec? Why should you care about Parquet when building a threat detection and investigation platform? In this blog post series we share our opinionated view on this question. In the next three blog posts, we
- describe how VAST uses Parquet and its little brother Feather
- benchmark the two formats against each other for typical workloads
- share our experience with all the engineering gotchas we encountered along the way