4 posts tagged with "parquet"

Parquet & Feather: Data Engineering Woes

January 10, 2023 · 8 min read

Data Engineer

Apache Arrow and Apache Parquet have become the de facto columnar formats for in-memory and on-disk representation of structured data, respectively. Together, they provide data interoperability and foster a diverse ecosystem of data tools. But how well do they actually work together from an engineering perspective?

VAST v2.4

December 9, 2022 · 5 min read

Dominik Lohmann

VP Engineering

VAST v2.4 completes the switch to open storage formats and includes an early peek at three upcoming features for VAST: a web plugin with a REST API and an integrated frontend user interface; Docker Compose configuration files for getting started with VAST faster and showing how to integrate VAST into your SOC; and new Python bindings that make writing integrations easier and allow using VAST with your data science libraries, like Pandas.

Parquet & Feather: Writing Security Telemetry

October 24, 2022 · 27 min read

Thomas Peiselt

Data Engineer

Matthias Vallentin

Founder & CEO

How does Apache Parquet compare to Feather for storing structured security data? In this blog post, we answer this question.

Parquet & Feather: Enabling Open Investigations

October 7, 2022 · 6 min read

Matthias Vallentin

Founder & CEO

Thomas Peiselt

Data Engineer

Apache Parquet is the common denominator for structured data at rest. The data science ecosystem has long appreciated this. But infosec? Why should you care about Parquet when building a threat detection and investigation platform? In this blog post series, we share our opinionated view on this question. In the next three blog posts, we

describe how VAST uses Parquet and its little brother Feather
benchmark the two formats against each other for typical workloads
share our experience with all the engineering gotchas we encountered along the way