This guide shows you how to parse binary data formats into structured events. You’ll learn to work with columnar formats like Parquet and Feather, packet captures in PCAP format, Tenzir’s native Bitz format, and compressed data.
The examples use from_file with a parsing subpipeline to illustrate each technique.
Parquet
Apache Parquet is a columnar format widely used in data lakes and analytics pipelines. Given this Parquet file containing user data:
from_file "users.parquet" { read_parquet}{id: 1, name: "alice", email: "alice@example.com", role: "admin"}{id: 2, name: "bob", email: "bob@example.com", role: "user"}{id: 3, name: "carol", email: "carol@example.com", role: "user"}Parquet files often come from cloud storage:
from_file "s3://datalake/events/*.parquet"The from_file operator automatically detects
Parquet format from the file extension.
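Because read_parquet emits ordinary events, any downstream operator chains directly onto the parsing step. As a minimal sketch based on the users.parquet file above, the where operator keeps only the admin accounts:

```tql
from_file "users.parquet" {
  read_parquet
}
// Keep only events whose role field equals "admin".
where role == "admin"
```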
Feather
Apache Feather is Parquet's little brother—a lightweight columnar format optimized for fast I/O:
from_file "data.feather" { read_feather}Use read_feather to parse Feather files.
PCAP

PCAP is the standard format for packet captures. Use read_pcap to parse captured packets:
from_file "capture.pcap" { read_pcap}{linktype: 1, timestamp: 2024-01-15T10:30:45.123456Z, captured_packet_length: 74, original_packet_length: 74, data: "ABY88f1tZJ7zvttmCABFAAA8..."}Use from_nic to parse directly from a live interface. TQL furhter comes with light-weight packet processing functions. For
example, you can extract protocol headers from raw packet data using the
decapsulate function:
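A live capture feeds the same downstream logic as a file-based one. The snippet below is a minimal sketch; the interface name eth0 is a placeholder for whatever interface you want to capture on:

```tql
// Capture packets from a live interface instead of a file.
// "eth0" is a placeholder interface name.
from_nic "eth0"
// Stop after the first ten packets for a quick look.
head 10
```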
from_file "capture.pcap" { read_pcap}packet = decapsulate(this){packet: {ether: {src: "64-9E-F3-BE-DB-66", dst: "00-16-3C-F1-FD-6D", type: 2048}, ip: {src: "192.168.1.100", dst: "10.0.0.1", type: 6}, tcp: {src_port: 54321, dst_port: 443}, community_id: "1:YXWfTYEyYLKVv5Ge4WqijUnKTrM="}}Bitz is Tenzir’s native columnar format, optimized for schema-rich security
Bitz

Bitz is Tenzir's native columnar format, optimized for schema-rich security data. Use read_bitz to parse it:
from_file "archive.bitz" { read_bitz}Compressed data
Compressed data

Binary formats often come compressed. The from_file operator automatically detects compression based on file extensions like .gz, .zst, .bz2, .lz4, and .br:
from_file "data.parquet.gz" // Auto-detects gzipfrom_file "logs.json.zst" // Auto-detects zstdWhen automatic detection doesn’t apply (e.g., custom extensions or chained formats), use explicit decompression operators in the parsing subpipeline. These are bytes-to-bytes operators, so they must appear before the parser:
| Format | Operator |
|---|---|
| Gzip | decompress_gzip |
| Zstandard | decompress_zstd |
| Bzip2 | decompress_bz2 |
| LZ4 | decompress_lz4 |
| Brotli | decompress_brotli |
Example with explicit decompression:
from_file "capture.pcap.zst" { decompress_zstd read_pcap}