Version: Next

write_parquet

Transforms an event stream into a Parquet byte stream.

write_parquet [compression_level=int, compression_type=str]

Description

Apache Parquet is a columnar storage format that a variety of data tools support.

compression_level = int (optional)

An optional compression level for the corresponding compression type. This option is ignored if no compression type is specified.

Defaults to the compression type's default compression level.

compression_type = str (optional)

Specifies an optional compression type. Supported options are zstd for Zstandard, brotli for Brotli, gzip for gzip, and snappy for Snappy compression.

Why would I use this over the compress operator?

The Parquet format compresses more efficiently than the compress operator because it compresses the data column by column, so values of the same type are compressed together. It also leaves frequently accessed metadata uncompressed, so readers can inspect that metadata without decompressing the data first.

Examples

Write a Parquet file:

load_file "/tmp/data.json"
read_json
write_parquet
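
Write a Zstandard-compressed Parquet file. This is a minimal sketch that combines the two options from the synopsis above. The compression level 3 and the output path are illustrative values, and the trailing save_file step is an assumption about how the resulting byte stream gets persisted:

```tql
// Read JSON events from a file.
load_file "/tmp/data.json"
read_json
// Emit a Parquet byte stream with Zstandard compression.
// The level 3 is an illustrative value, not a recommendation.
write_parquet compression_type="zstd", compression_level=3
// Assumption: persist the byte stream with save_file; the path is hypothetical.
save_file "/tmp/data.parquet"
```

Omitting compression_level falls back to the compression type's default level, as described above.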