Formats

Version: Next
General
A format is the bridge between raw bytes and structured data. A format provides a parser and/or printer:
Parser: translates raw bytes into structured event data
Printer: translates structured events into raw bytes
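Parsers and printers interact with their corresponding dual from a connector: a parser consumes the bytes that a loader produces, and a printer produces the bytes that a saver consumes.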

Formats appear as an argument to the parse and print operators:
parse <field> <format>
print <field> <format>
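To make the parser/printer duality concrete, here is a minimal Python sketch for newline-delimited JSON. The function names parse_ndjson and print_ndjson are hypothetical and only illustrate the concept; they are not Tenzir's implementation.

```python
import json
from typing import Iterable, Iterator


def parse_ndjson(raw: bytes) -> Iterator[dict]:
    """Parser: translate raw bytes into structured events, one per line."""
    for line in raw.splitlines():
        if line.strip():  # skip empty lines
            yield json.loads(line)


def print_ndjson(events: Iterable[dict]) -> bytes:
    """Printer: translate structured events back into raw bytes."""
    return b"".join(json.dumps(e).encode("utf-8") + b"\n" for e in events)


raw = b'{"src": "1.1.1.1", "port": 10}\n{"src": "1.1.1.2", "port": 5}\n'
events = list(parse_ndjson(raw))
# Printing the parsed events reproduces the original bytes for this input.
round_tripped = print_ndjson(events)
```

The round trip is byte-identical here only because the input already uses the spacing that json.dumps emits; in general a printer produces a canonical rendering rather than the original bytes.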
Parser Schema Inference
Parsers attempt to infer an event schema from the input and, where applicable, from the data format itself.
The following builtin parsers provide options for more specific control over schema inference:
CEF
CSV
GELF
JSON
KV
LEEF
Suricata
Syslog
XSV
YAML
Zeek JSON
The Suricata, Zeek JSON and XSV parsers do not provide all of the options.
--merge (Parsers)
Merges all incoming events into a single schema* that converges over time. This option is usually the fastest for reading highly heterogeneous data, but can lead to huge schemas filled with nulls and imprecise results. Use with caution.
*: In selector mode, only events with the same selector are merged.
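The effect of merging can be sketched in Python as follows. This is a simplified batch illustration under stated assumptions: the helper merge_events is hypothetical, it sees all events at once instead of converging over time, and it ignores types and selectors.

```python
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Pad every event to the union of all field names, in first-seen order."""
    merged_fields: dict[str, None] = {}
    for event in events:
        for key in event:
            merged_fields.setdefault(key, None)
    # Fields an event does not carry become None (null).
    return [{key: event.get(key) for key in merged_fields} for event in events]


merged = merge_events([{"a": 1}, {"b": "x"}, {"a": 2, "c": True}])
# merged[0] == {"a": 1, "b": None, "c": None}
```

The sketch shows why heterogeneous input leads to wide schemas filled with nulls: every event carries every field that any event has ever had.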
--schema <schema> (Parsers)
Explicitly set the output schema.
If a schema with a matching name is installed, the result will always have all fields from that schema.
Fields that are specified in the schema but did not appear in the input will be null.
Fields that appear in the input but not in the schema will also be kept. Use --schema-only to reject fields that are not in the schema.
If the given schema does not exist, this option instead assigns the output schema name only.
This option cannot be combined with --selector.
--selector <field>[:<prefix>] (Parsers)
Similar to --schema, but uses the value of the field specified in <field> as the schema name.
If the optional <prefix> is specified, it is prepended to the schema name, separated by a dot. For example, the selector event_type:suricata with an event whose event_type field has the value flow looks for a schema named suricata.flow.
This option cannot be combined with --schema.
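The name lookup described above can be sketched in Python. The function schema_name_from_selector is hypothetical, and returning None for a missing selector field is an assumption, since the text does not specify that case.

```python
from typing import Optional


def schema_name_from_selector(event: dict, selector: str) -> Optional[str]:
    """Derive a schema name from a selector of the form <field>[:<prefix>]."""
    field, _, prefix = selector.partition(":")
    value = event.get(field)
    if value is None:
        return None  # assumption: no selector field means no schema name
    return f"{prefix}.{value}" if prefix else str(value)


name = schema_name_from_selector({"event_type": "flow"}, "event_type:suricata")
# name == "suricata.flow"
```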
--schema-only (Parsers)
When working with an existing schema, this option will ensure that the output schema has only the fields from that schema. If the schema name is obtained via a selector and it does not exist, this has no effect.
This option requires either --schema or --selector to be set.
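The interaction between --schema and --schema-only can be illustrated with a small Python sketch. The helper apply_schema is hypothetical and models a schema as a flat list of field names only, ignoring types and nesting.

```python
from typing import Any


def apply_schema(
    event: dict[str, Any], schema_fields: list[str], schema_only: bool = False
) -> dict[str, Any]:
    """Shape an event according to a schema, given as a list of field names."""
    # All schema fields are always present; missing ones become None (null).
    result = {name: event.get(name) for name in schema_fields}
    if not schema_only:
        # Without schema_only, extra input fields are kept as well.
        for name, value in event.items():
            result.setdefault(name, value)
    return result


event = {"a": 1, "x": 2}
kept = apply_schema(event, ["a", "b"])
# kept == {"a": 1, "b": None, "x": 2}
strict = apply_schema(event, ["a", "b"], schema_only=True)
# strict == {"a": 1, "b": None}
```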
--unnest-separator <separator> (Parsers)
A delimiter that, if present in keys, causes values to be treated as values of nested records.
A popular example of this is the Zeek JSON format. It includes the fields id.orig_h, id.orig_p, id.resp_h, and id.resp_p at the top level. The data is best modeled as an id record with four nested fields orig_h, orig_p, resp_h, and resp_p.
Without an unnest separator, the data looks like this:
{ "id.orig_h" : "1.1.1.1", "id.orig_p" : 10, "id.resp_h" : "1.1.1.2", "id.resp_p" : 5 }
With the unnest separator set to ., Tenzir reads the events like this:
{ "id" : { "orig_h" : "1.1.1.1", "orig_p" : 10, "resp_h" : "1.1.1.2", "resp_p" : 5 } }
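The transformation can be sketched in Python as follows. The function unnest is a hypothetical illustration, not Tenzir's implementation; it assumes a non-empty separator and does not handle a key that is both a scalar and a record prefix.

```python
from typing import Any


def unnest(event: dict[str, Any], separator: str) -> dict[str, Any]:
    """Turn keys containing the separator into nested records."""
    result: dict[str, Any] = {}
    for key, value in event.items():
        *parents, leaf = key.split(separator)
        node = result
        for part in parents:
            # Create intermediate records on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


flat = {"id.orig_h": "1.1.1.1", "id.orig_p": 10, "id.resp_h": "1.1.1.2", "id.resp_p": 5}
nested = unnest(flat, ".")
# nested == {"id": {"orig_h": "1.1.1.1", "orig_p": 10, "resp_h": "1.1.1.2", "resp_p": 5}}
```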
--raw (Parsers)
Use only the raw types that are native to the parsed format. Fields that have a type specified in the chosen schema will still be parsed according to the schema.
For example, the JSON format has no notion of an IP address, so this option causes all IP addresses to be parsed as strings, unless the schema specifies that the field is an IP address. JSON does have numeric types, however, so numbers are still parsed as numbers.
Use with caution.
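The following Python sketch illustrates the idea behind raw parsing for JSON. The function parse_json_raw and its ip_fields parameter are hypothetical: ip_fields stands in for the fields that a chosen schema declares as IP addresses.

```python
import ipaddress
import json
from typing import Any, Iterable


def parse_json_raw(line: str, ip_fields: Iterable[str] = ()) -> dict[str, Any]:
    """Parse JSON using native JSON types only, except for schema-typed fields."""
    event = json.loads(line)  # strings stay strings, numbers stay numbers
    for name in ip_fields:
        # Only fields that the schema types as IP are converted.
        if isinstance(event.get(name), str):
            event[name] = ipaddress.ip_address(event[name])
    return event


event = parse_json_raw(
    '{"src": "1.1.1.1", "dst": "1.1.1.2", "port": 10}', ip_fields={"dst"}
)
# event["src"] stays the string "1.1.1.1"; event["dst"] becomes an IPv4Address.
```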
MIME Types
When a printer constructs raw bytes, it sets a MIME content type so that savers can make assumptions about the otherwise opaque content. For example, the HTTP connector uses this value to populate the Content-Type header when copying the raw bytes into the HTTP request body.
The printers set the following MIME types:
Format	MIME Type
CSV	`text/csv`
JSON	`application/json`
NDJSON	`application/x-ndjson`
Parquet	`application/x-parquet`
PCAP	`application/vnd.tcpdump.pcap`
SSV	`text/plain`
TSV	`text/tab-separated-values`
YAML	`application/x-yaml`
Zeek TSV	`application/x-zeek`
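As a rough sketch of how a saver might use this information, the Python snippet below maps the table above to an HTTP header. The names MIME_TYPES and content_type_header are hypothetical, and the keys are simply the labels from the table, not necessarily the format names used in a pipeline.

```python
# MIME types as listed in the table above, keyed by the table's format label.
MIME_TYPES = {
    "CSV": "text/csv",
    "JSON": "application/json",
    "NDJSON": "application/x-ndjson",
    "Parquet": "application/x-parquet",
    "PCAP": "application/vnd.tcpdump.pcap",
    "SSV": "text/plain",
    "TSV": "text/tab-separated-values",
    "YAML": "application/x-yaml",
    "Zeek TSV": "application/x-zeek",
}


def content_type_header(format_label: str) -> dict[str, str]:
    """Build the Content-Type header for a format, or no header if unknown."""
    mime = MIME_TYPES.get(format_label)
    return {"Content-Type": mime} if mime else {}


headers = content_type_header("NDJSON")
# headers == {"Content-Type": "application/x-ndjson"}
```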
Available Formats
📄️ bitz
Reads and writes BITZ, Tenzir's internal wire format.
📄️ cef
Parses events in the Common Event Format (CEF).
📄️ csv
The csv format is a configuration of the xsv format:
📄️ feather
Reads and writes the Feather file format, a thin wrapper around
📄️ gelf
Reads Graylog Extended Log Format (GELF) events.
📄️ grok
Parses a string using a grok-pattern, backed by regular expressions.
📄️ json
Reads and writes JSON.
📄️ kv
Reads key-value pairs by splitting strings based on regular expressions.
📄️ leef
Parses events in the Log Event Extended Format (LEEF).
📄️ lines
Parses and prints events as lines.
📄️ parquet
Reads events from a Parquet file. Writes events to a Parquet file.
📄️ pcap
Reads and writes raw network packets in PCAP file format.
📄️ ssv
The ssv format is a configuration of the xsv format:
📄️ suricata
Reads Suricata's EVE JSON output. The parser is an alias
📄️ syslog
Reads syslog messages.
📄️ time
Parses a datetime/timestamp using a strptime-like format string.
📄️ tsv
The tsv format is a configuration of the xsv format:
📄️ xsv
Reads and writes lines with separated values.
📄️ yaml
Reads and writes YAML.
📄️ zeek-json
The zeek-json format is an alias for json with the arguments:
📄️ zeek-tsv
Reads and writes Zeek tab-separated values.