read_syslog

Parses an incoming Syslog stream into events.

read_syslog [octet_counting=bool, raw_message=field, merge=bool, raw=bool, schema=string, selector=string, schema_only=bool, unflatten_separator=string]

Description

Syslog is a standard format for message logging.

Tenzir supports reading syslog messages in both the standardized “Syslog Protocol” format (RFC 5424), and the older “BSD syslog Protocol” format (RFC 3164).

It also accepts common Check Point structured-data variants, including key:"value" parameters, semicolon separators, and messages with no SD-ID.

Depending on the syslog format, the result can be different. Here’s an example of a syslog message in RFC 5424 format:

<165>8 2023-10-11T22:14:15.003Z mymachineexamplecom evntslog 1370 ID47 [exampleSDID@32473 eventSource="Application" eventID="1011"] Event log entry

With this input, the parser will produce the following output, with the schema name syslog.rfc5424:

{
  input: "<165>8 2023-10-11T22:14:15.003Z mymachineexamplecom evntslog 1370 ID47 [exampleSDID@32473 eventSource=\"Application\" eventID=\"1011\"] Event log entry",
  output: {
    facility: 20,
    severity: 5,
    version: 8,
    timestamp: 2023-10-11T22:14:15.003Z,
    hostname: "mymachineexamplecom",
    app_name: "evntslog",
    process_id: "1370",
    message_id: "ID47",
    structured_data: {
      "exampleSDID@32473": {
        eventSource: "Application",
        eventID: 1011,
      },
    },
    message: "Event log entry",
  },
}

Here’s an example of a syslog message in RFC 3164 format:

<34>Nov 16 14:55:56 mymachine PROGRAM: Freeform message

With this input, the parser will produce the following output, with the schema name syslog.rfc3164:

{
  "facility": 4,
  "severity": 2,
  "timestamp": "Nov 16 14:55:56",
  "hostname": "mymachine",
  "app_name": "PROGRAM",
  "process_id": null,
  "content": "Freeform message"
}

Some RFC 3164 emitters prepend structured-data blocks to the message content using syntax similar to RFC 5424. When the parser detects valid structured data at the start of the content field, it extracts the key-value pairs into a structured_data field and stores the remaining text in content. These events use the schema name syslog.rfc3164.structured:

<166>2026-02-11T18:01:45.587Z myhost Hostd[2099494]: [Originator@6876 sub=Vimsvc.TaskManager opID=23d59ade] Task Completed

{
  facility: 20,
  severity: 6,
  timestamp: "2026-02-11T18:01:45.587Z",
  hostname: "myhost",
  app_name: "Hostd",
  process_id: "2099494",
  structured_data: {
    "Originator@6876": {
      sub: "Vimsvc.TaskManager",
      opID: "23d59ade",
    },
  },
  content: "Task Completed",
}

If the leading brackets don’t contain valid structured data, the parser leaves the content intact and uses the regular syslog.rfc3164 schema.

`octet_counting = bool (optional)`

Controls handling of length-prefixed syslog messages as defined in RFC 6587. Some syslog implementations prepend the message length in bytes, followed by a space:

<length> <syslog-message>

For example, where 65 is the byte count of the syslog message that follows:

65 <165>1 2023-10-11T22:14:15.003Z host app 1234 ID01 - Test message

The parameter supports three modes:

Not specified (default): Auto-detect. Strips a length prefix if present and valid, otherwise parses the input as-is.
octet_counting=true: Require a length prefix. Emits a warning and returns null if the input lacks a valid prefix.
octet_counting=false: Never strip a length prefix. Parse the input as-is, treating any leading digits as part of the message.

`raw_message = field (optional)`

Stores the original, unparsed syslog message in the specified field.

For multi-line messages, the raw message includes all lines joined with newlines.

`merge = bool (optional)`

Merges all incoming events into a single schema* that converges over time. This option is usually the fastest for reading highly heterogeneous data, but can lead to huge schemas filled with nulls and imprecise results. Use with caution.

*: In selector mode, only events with the same selector are merged.

In merging mode, a repeated key will always overwrite the previous value.

`raw = bool (optional)`

Use only the raw types that are native to the parsed format. Fields that have a type specified in the chosen schema will still be parsed according to the schema.

`schema = string (optional)`

Provide the name of a schema to be used by the parser.

If a schema with a matching name is installed, the result will always have all fields from that schema.

Fields that are specified in the schema, but did not appear in the input will be null.
Fields that appear in the input, but not in the schema will also be kept. Use schema_only=true to reject fields that are not in the schema.

If the given schema does not exist, this option instead assigns the output schema name only.

The schema option is incompatible with the selector option.

`selector = string (optional)`

Designates a field value as schema name with an optional dot-separated prefix.

The string is parsed as <fieldname>[:<prefix>]. The prefix is optional and will be prepended to the field value to generate the schema name.

For example, the Suricata EVE JSON format includes a field event_type that contains the event type. Setting the selector to event_type:suricata causes an event with the value flow for the field event_type to map onto the schema suricata.flow.

The selector option is incompatible with the schema option.

`schema_only = bool (optional)`

When working with an existing schema, this option will ensure that the output schema has only the fields from that schema.

If the schema name is obtained via a selector and it does not exist, this has no effect.

This option requires either schema or selector to be set.

`unflatten_separator = string (optional)`

A delimiter that, if present in keys, causes values to be treated as values of nested records.

A popular example of this is the Zeek JSON format. It includes the fields id.orig_h, id.orig_p, id.resp_h, and id.resp_p at the top-level. The data is best modeled as an id record with four nested fields orig_h, orig_p, resp_h, and resp_p.

Without an unflatten separator, the data looks like this:

{
  "id.orig_h": "1.1.1.1",
  "id.orig_p": 10,
  "id.resp_h": "1.1.1.2",
  "id.resp_p": 5
}

With the unflatten separator set to ., Tenzir reads the events like this:

{
  "id": {
    "orig_h": "1.1.1.1",
    "orig_p": 10,
    "resp_h": "1.1.1.2",
    "resp_p": 5
  }
}

Duplicate Keys

If the parser encounters a duplicate key in an event, it will transparently upgrade the field to be a list of values instead.

For a simple example, consider this JSON file:

{"key": 7}
{"key": 0.0, "key": 1}
{"key": 42}

{key: 7}
{key: [0.0, 1.0]}
{key: 42}

If the values are of different type, conversions to a common type will be attempted, such as to a common number type. Ultimately values will be stringified if they do not share a common type:

{"key": 0.0, "key": "1.1.1.1", "key": "example.com"}

{key: ["0", "1.1.1.1", "example.com"]}

Examples

Read in the `auth.log`

from_file "/var/log/auth.log" {
  read_syslog
}

{
  facility: null,
  severity: null,
  timestamp: 2024-10-14T07:15:01.348027,
  hostname: "tenzirs-magic-machine",
  app_name: "CRON",
  process_id: "895756",
  content: "pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)",
}
{
  facility: null,
  severity: null,
  timestamp: 2024-10-14T07:15:01.349838,
  hostname: "tenzirs-magic-machine",
  app_name: "CRON",
  process_id: "895756",
  content: "pam_unix(cron:session): session closed for user root"
}

Parse octet-counted syslog over TCP

When receiving syslog over TCP from systems that use RFC 6587 octet counting, the parser auto-detects and strips the length prefix:

accept_tcp "0.0.0.0:514" {
  read_syslog
}

Use octet_counting=true to require the prefix or octet_counting=false to disable auto-detection entirely.

Parse Check Point structured data variants

Some Check Point exports use a structured-data dialect that differs from RFC 5424. read_syslog supports common variants such as key:"value" parameters, semicolon separators, and records that omit the SD-ID.

When an SD-ID is missing, Tenzir stores the parameters under structured_data.checkpoint_2620:

<134>1 2026-02-25T08:14:51Z fwmgt CheckPoint 10024 - [action:"Accept"; conn_direction:"Incoming"; flags:"8667398"]

from_file "checkpoint.txt" {
  read_syslog
}

{
  facility: 16,
  severity: 6,
  version: 1,
  timestamp: 2026-02-25T08:14:51Z,
  hostname: "fwmgt",
  app_name: "CheckPoint",
  process_id: "10024",
  message_id: null,
  structured_data: {
    checkpoint_2620: {
      action: "Accept",
      conn_direction: "Incoming",
      flags: 8667398,
    },
  },
  message: null,
}

Parse RFC 3164 messages with structured data

Some legacy syslog emitters, such as VMware ESXi, prepend structured-data blocks to the message content. The parser extracts them automatically:

from {
  line: r#"<166>2026-02-11T18:01:45.587Z myhost Hostd[2099494]: [Originator@6876 sub=Vimsvc.TaskManager opID=23d59ade] Task Completed"#,
}
write_lines
read_syslog

{
  facility: 20,
  severity: 6,
  timestamp: "2026-02-11T18:01:45.587Z",
  hostname: "myhost",
  app_name: "Hostd",
  process_id: "2099494",
  structured_data: {
    "Originator@6876": {
      sub: "Vimsvc.TaskManager",
      opID: "23d59ade",
    },
  },
  content: "Task Completed",
}

Preserve the original message

Use raw_message to keep the unparsed syslog line alongside the parsed fields:

from_file "/var/log/auth.log" {
  read_syslog raw_message=raw
}

{
  facility: null,
  severity: null,
  timestamp: 2024-10-14T07:15:01.348027,
  hostname: "tenzirs-magic-machine",
  app_name: "CRON",
  process_id: "895756",
  content: "pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)",
  raw: "Oct 14 07:15:01 tenzirs-magic-machine CRON[895756]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)",
}

read_syslog

Description

octet_counting = bool (optional)

raw_message = field (optional)

merge = bool (optional)

raw = bool (optional)

schema = string (optional)

selector = string (optional)

schema_only = bool (optional)

unflatten_separator = string (optional)