parquet
Reads events from a Parquet file. Writes events to a Parquet file.
Synopsis
Parser:
parquet
Printer:
parquet [--compression-type=<type>] [--compression-level=<level>]
Description
The parquet format provides both a parser and a printer for Parquet files.
Apache Parquet is a columnar storage format that a variety of data tools support.
When using the parser with the file connector, we recommend passing the --mmap option to file. This gives the parser full control over the reads, which improves performance and memory usage.
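Random access matters because a Parquet reader starts at the end of the file. Every Parquet file begins and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the footer metadata, which in turn locates the column chunks. The following Python sketch uses the standard `mmap` module to jump straight to that footer. It is only an illustration, not how Tenzir implements its reader. The helper name `read_footer_bytes` is hypothetical, and it returns the raw (Thrift-encoded) footer without decoding it.

```python
import mmap
import struct

MAGIC = b"PAR1"  # 4-byte magic at the start and at the end of every Parquet file


def read_footer_bytes(path: str) -> bytes:
    """Return the raw, still-encoded footer metadata of a Parquet file.

    Hypothetical helper: memory-mapping lets us seek to the end of the file
    directly instead of streaming it from the beginning.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)
        # Smallest possible layout: magic + 4-byte length + magic.
        if size < 12 or mm[:4] != MAGIC or mm[size - 4:] != MAGIC:
            raise ValueError("not a Parquet file")
        # The footer length is stored just before the trailing magic.
        (footer_len,) = struct.unpack("<I", mm[size - 8:size - 4])
        start = size - 8 - footer_len
        if start < 4:
            raise ValueError("corrupt footer length")
        # Slicing an mmap copies only the requested byte range into memory.
        return mm[start:size - 8]
```

A reader that receives only a forward stream of bytes generally has to buffer its input until it reaches this footer. With a mapped file, it can read the footer first and then touch only the byte ranges it needs.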
Tenzir currently assumes that all Parquet files use metadata recognized by Tenzir. We plan to lift this restriction in the future.
--compression-type (Printer)
Specifies an optional compression type. Supported options are zstd for Zstandard compression, brotli for Brotli compression, gzip for gzip compression, and snappy for Snappy compression.
Why use this over the compress operator?
The Parquet format offers more efficient compression than the compress operator because it compresses the data column by column and leaves frequently accessed metadata uncompressed.
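The following Python sketch illustrates this idea with a deliberately simplified, hypothetical file layout. It is not the real Parquet encoding. Each column is compressed separately with `zlib` from the standard library, while a small index recording each column's offset and length stays uncompressed. A reader can therefore consult the index directly and decompress only the column it needs. The names `write_columnar` and `read_column` are made up for illustration.

```python
import json
import struct
import zlib


def write_columnar(columns: dict[str, list[str]]) -> bytes:
    """Serialize columns into a toy layout: compressed chunks + plain JSON index.

    Assumes values contain no newline characters (used as a separator here).
    """
    body = bytearray()
    index = {}
    for name, values in columns.items():
        # Values of a single column tend to resemble each other, so storing
        # them next to each other usually helps the compressor.
        chunk = zlib.compress("\n".join(values).encode())
        index[name] = {"offset": len(body), "length": len(chunk)}
        body += chunk
    # The index (a stand-in for Parquet's footer metadata) stays uncompressed.
    footer = json.dumps(index).encode()
    return bytes(body) + footer + struct.pack("<I", len(footer))


def read_column(blob: bytes, name: str) -> list[str]:
    """Decompress a single column and leave all other chunks untouched."""
    (footer_len,) = struct.unpack("<I", blob[-4:])
    # Reading the index requires no decompression at all.
    index = json.loads(blob[-4 - footer_len:-4])
    entry = index[name]
    chunk = blob[entry["offset"]:entry["offset"] + entry["length"]]
    return zlib.decompress(chunk).decode().split("\n")
```

For example, `read_column(write_columnar({"proto": ["tcp", "udp"]}), "proto")` yields `["tcp", "udp"]`. Real Parquet files go further. The footer is Thrift-encoded and also carries the schema and column statistics, and compression applies to the pages inside each column chunk.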
--compression-level (Printer)
An optional compression level for the corresponding compression type. This option is ignored if no compression type is specified.
Defaults to the compression type's default compression level.
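The following small, hypothetical Python helper models how the two printer options interact. It is not Tenzir's implementation. Only gzip ships with the Python standard library, so the sketch supports `gzip` alone. It shows that the level is ignored when no compression type is given, and that omitting the level falls back to the codec's own default.

```python
import gzip
from typing import Optional


def compress_chunk(data: bytes, compression_type: Optional[str] = None,
                   compression_level: Optional[int] = None) -> bytes:
    """Hypothetical model of --compression-type and --compression-level."""
    if compression_type is None:
        # No codec selected: any level is ignored and the data stays as-is.
        return data
    if compression_type != "gzip":
        raise ValueError(f"unsupported in this sketch: {compression_type}")
    if compression_level is None:
        # No explicit level: use the codec's default (9 for gzip.compress).
        return gzip.compress(data)
    return gzip.compress(data, compresslevel=compression_level)
```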
Examples
Read a Parquet file via the from operator:
from file --mmap /tmp/data.prq read parquet
Write a Zstd-compressed Parquet file via the to operator:
to /tmp/suricata.parquet write parquet --compression-type zstd