Version: Next

parquet

Reads events from and writes events to Parquet files.

Synopsis

Parser:

parquet

Printer:

parquet [--compression-type=<type>] [--compression-level=<level>]

Description

The parquet format provides both a parser and a printer for Parquet files.

Apache Parquet is a columnar storage format that a variety of data tools support.

MMAP Parsing

When using the parser with the file connector, we recommend passing the --mmap option to file. This gives the parser full control over the reads, which improves performance and reduces memory usage.

Limitation

Tenzir currently assumes that all Parquet files use metadata recognized by Tenzir. We plan to lift this restriction in the future.

--compression-type (Printer)

Specifies an optional compression type. Supported options are zstd for Zstandard compression, brotli for Brotli compression, gzip for gzip compression, and snappy for Snappy compression.

Why would I use this over the compress operator?

The Parquet format compresses more efficiently than the compress operator. It compresses the data column by column and leaves frequently accessed metadata uncompressed, so readers can inspect that metadata without decompressing the whole file.

--compression-level (Printer)

An optional compression level for the corresponding compression type. This option is ignored if no compression type is specified.

Defaults to the compression type's default compression level.

Examples

Read a Parquet file via the from operator:

from file --mmap /tmp/data.prq read parquet

Write a Zstd-compressed Parquet file via the to operator:

to /tmp/suricata.parquet write parquet --compression-type zstd
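
The two options can also be combined in a single pipeline. The following is a minimal sketch, not a tested recipe: it assumes that --compression-level accepts a space-separated value in the same way as --compression-type above, that 19 is a valid level for zstd in your build, and that both file paths are hypothetical placeholders.

Re-encode an existing Parquet file with Zstd compression at an explicit level:

```
from file --mmap /tmp/data.prq read parquet
| to /tmp/data-zstd.parquet write parquet --compression-type zstd --compression-level 19
```

Omitting --compression-level falls back to the compression type's default level, as described above.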