This guide shows you how to parse text streams into structured events. You’ll learn to split byte streams on newlines or custom delimiters, and parse line-based formats like JSON lines, CSV, TSV, key-value pairs, Syslog, and CEF.
The examples use from_file with a parsing subpipeline to illustrate each technique.
Split on newlines
Use read_lines to split a byte stream on newline characters. Given this input file:
2024-01-15 10:30:45 INFO Application started
2024-01-15 10:30:46 DEBUG Processing request

This pipeline produces one event per line:

from_file "app.log" {
  read_lines
}

{line: "2024-01-15 10:30:45 INFO Application started"}
{line: "2024-01-15 10:30:46 DEBUG Processing request"}

The same pattern works for network streams:

from "tcp://0.0.0.0:9000" {
  read_lines
}

Split on custom delimiters
Use read_delimited when records use separators other than newlines. Given this input file:
first record|||second record|||third record

This pipeline splits on every occurrence of |||:

from_file "records.dat" {
  read_delimited "|||"
}

{data: "first record"}
{data: "second record"}
{data: "third record"}

Blank line separators
Some formats use blank lines to separate records, such as paragraphs or multi-line entries. Given this input file:
First paragraph with
multiple lines.

Second paragraph here.

Third paragraph.

This pipeline splits on blank lines:

from_file "paragraphs.txt" {
  read_delimited "\n\n"
}

{data: "First paragraph with\nmultiple lines."}
{data: "Second paragraph here."}
{data: "Third paragraph."}

Null byte terminators
Some protocols use null bytes as record terminators:
from "tcp://0.0.0.0:9000" { read_delimited "\x00", binary=true}Add binary=true for non-UTF-8 data to produce blob output instead of
string.
XML document streams
XML streams often contain multiple documents without a top-level wrapper. Use include_separator to keep the closing tag as part of each event:
from_file "windows_events.xml" { read_delimited "</Event>\n", include_separator=true}this = data.parse_winlog()See Windows Event Logs for a complete example.
Line-based structured formats
Several read_* operators parse line-based formats directly into structured events.
JSON lines
Given this input file with dotted keys:
{"ts": "2024-01-15T10:30:45Z", "id.orig_h": "192.168.1.100", "id.orig_p": 52311, "id.resp_h": "93.184.216.34", "id.resp_p": 443}{"ts": "2024-01-15T10:30:46Z", "id.orig_h": "192.168.1.101", "id.orig_p": 52312, "id.resp_h": "93.184.216.34", "id.resp_p": 80}Use read_ndjson with unflatten_separator
to convert dotted keys into nested records:
from_file "conn.jsonl" { read_ndjson unflatten_separator="."}{ts: 2024-01-15T10:30:45Z, id: {orig_h: 192.168.1.100, orig_p: 52311, resp_h: 93.184.216.34, resp_p: 443}}{ts: 2024-01-15T10:30:46Z, id: {orig_h: 192.168.1.101, orig_p: 52312, resp_h: 93.184.216.34, resp_p: 80}}For regular JSON arrays or objects, use read_json
instead.
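As a sketch (the file name data.json and the arrays_of_objects option are illustrative assumptions, not part of the example above), reading a file that holds one top-level JSON array of objects might look like this:

// Assumption: data.json contains a single top-level array of objects;
// arrays_of_objects turns each array element into its own event.
from_file "data.json" {
  read_json arrays_of_objects=true
}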
CSV / TSV / SSV / XSV
Given this input file:
id,name,email,role
1,alice,alice@example.com,admin
2,bob,bob@example.com,user
3,carol,carol@example.com,user

Use read_csv to parse the file with automatic header detection:

from_file "users.csv" {
  read_csv
}

{id: 1, name: "alice", email: "alice@example.com", role: "admin"}
{id: 2, name: "bob", email: "bob@example.com", role: "user"}
{id: 3, name: "carol", email: "carol@example.com", role: "user"}

For tab-separated or space-separated data, use read_tsv or read_ssv. For custom delimiters, use read_xsv.
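For example, a tab-separated variant of the same data parses the same way (the file name users.tsv is an assumption for illustration):

// Assumption: users.tsv holds the same columns, separated by tabs.
from_file "users.tsv" {
  read_tsv
}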
Key-value pairs (KV)
Given this input file:
name=alice age=30
name=bob age=25
name=carol age=35

Use read_kv to parse each line as key-value pairs:

from_file "records.txt" {
  read_kv
}

{name: "alice", age: 30}
{name: "bob", age: 25}
{name: "carol", age: 35}

Given this Common Event Format (CEF) input:
CEF:0|Security|IDS|1.0|100|Intrusion detected|7|src=192.168.1.100 dst=10.0.0.1 spt=54321 dpt=443
CEF:0|Security|IDS|1.0|101|Malware found|9|src=192.168.1.101 dst=10.0.0.2 spt=12345 dpt=80

Use read_cef to parse security events:

from_file "events.cef" {
  read_cef
}

{cef_version: 0, device_vendor: "Security", device_product: "IDS", device_version: "1.0", signature_id: "100", name: "Intrusion detected", severity: "7", extension: {src: 192.168.1.100, dst: 10.0.0.1, spt: 54321, dpt: 443}}
{cef_version: 0, device_vendor: "Security", device_product: "IDS", device_version: "1.0", signature_id: "101", name: "Malware found", severity: "9", extension: {src: 192.168.1.101, dst: 10.0.0.2, spt: 12345, dpt: 80}}

For IBM QRadar logs, use read_leef.
Syslog messages
Given this input file:

<14>Jan 15 10:30:45 myhost app[1234]: User logged in
<11>Jan 15 10:30:46 myhost app[1234]: Error occurred

Use read_syslog to parse each line:
from_file "syslog.txt" { read_syslog}{facility: 1, severity: 6, timestamp: "Jan 15 10:30:45", hostname: "myhost", app_name: "app", process_id: "1234", content: "User logged in"}{facility: 1, severity: 3, timestamp: "Jan 15 10:30:46", hostname: "myhost", app_name: "app", process_id: "1234", content: "Error occurred"}