read_cef
Parses an incoming Common Event Format (CEF) stream into events.
Description
The Common Event Format (CEF) is a text-based event format that originally stems from ArcSight. It is line-based and human readable. The first 7 fields of a CEF event are always the same, and the 8th extension field is an optional list of key-value pairs:
CEF:Version|Device Vendor|Device Product|Device Version|Device Event Class ID|Name|Severity|[Extension]
Here is a real-world example:
CEF:0|Cynet|Cynet 360|4.5.4.22139|0|Memory Pattern - Cobalt Strike Beacon ReflectiveLoader|8| externalId=6 clientId=2251997 scanGroupId=3 scanGroupName=Manually Installed Agents sev=High duser=tikasrv01\\administrator cat=END-POINT Alert dhost=TikaSrv01 src=172.31.5.93 filePath=c:\\windows\\temp\\javac.exe fname=javac.exe rt=3/30/2022 10:55:34 AM fileHash=2BD1650A7AC9A92FD227B2AB8782696F744DD177D94E8983A19491BF6C1389FD rtUtc=Mar 30 2022 10:55:34.688 dtUtc=Mar 30 2022 10:55:32.458 hostLS=2022-03-30 10:55:34 GMT+00:00 osVer=Windows Server 2016 Datacenter x64 1607 epsVer=4.5.5.6845 confVer=637842168250000000 prUser=tikasrv01\\administrator pParams="C:\\Windows\\Temp\\javac.exe" sign=Not signed pct=2022-03-30 10:55:27.140, 2022-03-30 10:52:40.222, 2022-03-30 10:52:39.609 pFileHash=1F955612E7DB9BB037751A89DAE78DFAF03D7C1BCC62DF2EF019F6CFE6D1BBA7 pprUser=tikasrv01\\administrator ppParams=C:\\Windows\\Explorer.EXE pssdeep=49152:2nxldYuopV6ZhcUYehydN7A0Fnvf2+ecNyO8w0w8A7/eFwIAD8j3:Gxj/7hUgsww8a0OD8j3 pSign=Signed and has certificate info gpFileHash=CFC6A18FC8FE7447ECD491345A32F0F10208F114B70A0E9D1CD72F6070D5B36F gpprUser=tikasrv01\\administrator gpParams=C:\\Windows\\system32\\userinit.exe gpssdeep=384:YtOYTIcNkWE9GHAoGLcVB5QGaRW5SmgydKz3fvnJYunOTBbsMoMH3nxENoWlymW:YLTVNkzGgoG+5BSmUfvJMdsq3xYu gpSign=Signed actRem=Kill, Rename
The CEF specification pre-defines several extension field key names and data types for the corresponding values. Tenzir's parser does not enforce the strict definitions and instead tries to infer the type from the provided values.
Tenzir translates the extension
field to a nested record, where the key-value
pairs of the extensions map to record fields. Here is an example of the above
event:
merge = bool (optional)
Merges all incoming events into a single schema* that converges over time. This option is may be faster for reading highly heterogeneous data, but can lead to huge schemas filled with nulls and imprecise results. Use with caution.
*: In selector mode, only events with the same selector are merged.
raw = bool (optional)
Use only the raw types that are native to the parsed format. Fields that have a type
specified in the chosen schema
will still be parsed according to the schema.
In the case of CEF, this means that no parsing of data takes place at all
and every value remains a string, unless the field is in the schema
.
schema = str (optional)
Provide the name of a schema to be used by the parser.
If a schema with a matching name is installed, the result will always have all fields from that schema.
- Fields that are specified in the schema, but did not appear in the input will be null.
- Fields that appear in the input, but not in the schema will also be kept.
schema_only=true
can be used to reject fields that are not in the schema.
If the given schema does not exist, this option instead assigns the output schema name only.
The schema
option is incompatible with the selector
option.
selector = str (optional)
Designates a field value as schema name with an optional dot-separated prefix.
The string is parsed as <fieldname>[:<prefix>]
. The prefix
is optional and
will be prepended to the field value to generate the schema name.
For example, the Suricata EVE JSON format includes a field
event_type
that contains the event type. Setting the selector to
event_type:suricata
causes an event with the value flow
for the field
event_type
to map onto the schema suricata.flow
.
The selector
option is incompatible with the schema
option.
schema_only = bool (optional)
When working with an existing schema (obtained via the schema
or selector
option), this option will ensure that the output
schema has only the fields from that schema. If the schema name is obtained
via a selector
and it does not exist, this has no effect.
This option requires either schema
or selector
to be set.
unflatten = str (optional)
A delimiter that, if present in keys, causes values to be treated as values of nested records.
A popular example of this is the Zeek JSON format.
It includes the fields id.orig_h
, id.orig_p
, id.resp_h
, and id.resp_p
at the top-level. The data is best modeled as an id
record with four nested
fields orig_h
, orig_p
, resp_h
, and resp_p
.
Without an unflatten separator, the data looks like this:
With the unflatten separator set to .
, Tenzir reads the events like this: