import

Synopsis

parameters:
[-h | -? | --help] prints the help text
[--batch-encoding=] <string> encoding type of table slices (arrow or msgpack)
[--batch-size=] <uint64> upper bound for the size of a table slice
[--batch-timeout=] <string> timeout after which batched table slices are forwarded
[-b | --blocking] block until the IMPORTER forwarded all data
[-l | --listen=] <string> the endpoint to listen on ([host]:port/type)
[-n | --max-events=] <uint64> the maximum number of events to import
[-r | --read=] <string> path to input where to read events from
[--read-timeout=] <string> timeout for waiting for incoming data
[-S | --schema=] <string> alternate schema as string
[-s | --schema-file=] <string> path to alternate schema
[-t | --type=] <string> filter event type based on prefix matching
[-d | --uds] treat -r as listening UNIX domain socket
subcommands:
zeek imports Zeek TSV logs from STDIN or file
zeek-json imports Zeek JSON logs from STDIN or file
csv imports CSV logs from STDIN or file
json imports JSON with schema
suricata imports suricata eve json
syslog imports syslog messages
test imports random data for testing or benchmarking
pcap imports PCAP logs from STDIN or file

Documentation

The import command ingests data. An optional filter expression allows restricting the input to matching events. The format of the imported data must be specified explicitly:

vast import [options] <format> [options] [expr]

The import command is the dual to the export command.

This is best explained with an example:

vast import suricata < path/to/eve.json

The above command signals the running node to ingest (i.e., to archive and index for later export) all Suricata events from the Eve JSON file passed via standard input.

Filter Expressions

An optional filter expression restricts the import to the relevant subset of the input. For example, a user might want to import Suricata Eve JSON but skip over all events of type suricata.stats.

vast import suricata '#type != "suricata.stats"' < path/to/eve.json

For more information on the optional filter expression, see the query language documentation.

Format-Specific Options

Some import formats have format-specific options. For example, the pcap import format has an interface option that can be used to ingest PCAPs from a network interface directly. To retrieve a list of format-specific options, run vast import <format> help, and similarly to retrieve format-specific documentation, run vast import <format> documentation.
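
For example, to list the PCAP reader's options and read its format-specific documentation:

# List the options of the pcap import format.
vast import pcap help
# Show the format-specific documentation.
vast import pcap documentation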

Type Filtering

The --type option filters known event types based on a prefix. E.g., vast import json --type=zeek matches all event types that begin with zeek, and restricts the event types known to the import command accordingly.
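
A concrete invocation (path/to/logs.json is a placeholder input file):

vast import json --type=zeek --read=path/to/logs.json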

VAST permanently tracks imported event types. They do not need to be specified again for consecutive imports.

Batching

The import command parses events into table slices (batches). The following options control the batching:

vast.import.batch-encoding

Selects the encoding of table slices. Available options are msgpack (row-based) and arrow (column-based).

vast.import.batch-size

Sets an upper bound for the number of events per table slice.

Most components in VAST operate on table slices, which makes the table slice size a fundamental tuning knob on the spectrum of throughput and latency. Small table slices mean shorter per-slice processing times and thus more frequent yields to the scheduler, which balances the workload but costs throughput through the added context switches. Large table slices let actors spend more time processing a single block of memory before yielding, so other actors scheduled on the same thread may have to wait a little longer.

The vast.import.batch-size option merely controls the number of events per table slice, not necessarily the number of events until a component forwards a batch to the next stage in a stream. The CAF streaming framework uses a credit-based flow-control mechanism to determine the buffering of table slices. Setting vast.import.batch-size to 0 removes the upper bound and leaves it to other parameters to determine the actual table slice size.

vast.import.batch-timeout

Sets a timeout for forwarding buffered table slices to the importer.

The vast.import.batch-timeout option controls the maximum buffering period until table slices are forwarded to the node. The default batch timeout is one second.

vast.import.read-timeout

Sets a timeout for reading from input sources.

The vast.import.read-timeout option determines how long a call to read data from the input will block. The process yields and tries again at a later time if no data is received for the set value. The default read timeout is 20 milliseconds.
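
Putting these together, here is a sketch of an invocation that tunes all batching knobs at once (the values are illustrative, assuming durations use the usual <number><unit> notation):

# Column-based encoding, at most 65536 events per slice, and a
# 2-second cap on buffering before slices move to the node.
vast import --batch-encoding=arrow --batch-size=65536 --batch-timeout=2s suricata < path/to/eve.json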

import pcap

Synopsis

imports PCAP logs from STDIN or file
parameters:
[-h | -? | --help] prints the help text
[-i | --interface=] <string> network interface to read packets from
[-c | --cutoff=] <uint64> skip flow packets after this many bytes
[-m | --max-flows=] <uint64> number of concurrent flows to track
[-a | --max-flow-age=] <uint64> max flow lifetime before eviction
[-e | --flow-expiry=] <uint64> flow table expiration interval
[-p | --pseudo-realtime-factor=] <uint64> factor c delaying packets by 1/c
[--snaplen=] <uint64> snapshot length in bytes
[--drop-rate-threshold=] <real64> drop rate that must be exceeded for warnings to occur
[--disable-community-id] disable computation of community id for every packet

Documentation

The import pcap command uses libpcap to read network packets from a trace or an interface.

VAST automatically calculates the Community ID for PCAPs for better pivoting support. The extra computation induces an overhead of approximately 15% of the ingestion rate. The option --disable-community-id disables the computation completely.

The PCAP import format has many additional options that offer a user interface that should be familiar to users of other tools interacting with PCAPs. To see a list of all available options, run vast import pcap help.

Here's an example that reads from the network interface en0 and cuts off flows after 65535 bytes:

sudo vast import pcap --interface=en0 --cutoff=65535
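
Reading from a trace file works via the regular --read option; pass --disable-community-id to skip the Community ID computation when pivoting support is not needed:

# Read packets from a trace file instead of a live interface.
vast import pcap --read=path/to/trace.pcap
# Same, but skip the Community ID computation.
vast import pcap --disable-community-id --read=path/to/trace.pcap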

import test

Synopsis

imports random data for testing or benchmarking
parameters:
[-h | -? | --help] prints the help text
[--seed=] <uint64> the PRNG seed

Documentation

The import test command exists primarily for testing and benchmarking purposes. It generates and ingests random data for a given schema.
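
For example, fixing the PRNG seed makes benchmark runs reproducible, and the global --max-events option bounds the amount of generated data (the values are arbitrary):

# Deterministically generate and ingest 1000 random events.
vast import test --seed=42 --max-events=1000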

import syslog

Synopsis

imports syslog messages
parameters:
[-h | -? | --help] prints the help text

Documentation

Ingest Syslog messages into VAST. The following formats are supported:

  • RFC 5424
  • A fallback format that consists only of the Syslog message.

# Import from file.
vast import syslog --read=path/to/sys.log
# Continuously import from a stream.
syslog | vast import syslog

import suricata

Synopsis

imports suricata eve json
parameters:
[-h | -? | --help] prints the help text

Documentation

The import suricata command consumes Eve JSON logs from Suricata. Eve JSON is Suricata's unified format to log all types of activity as a single stream of line-delimited JSON.

For each log entry, VAST parses the field event_type to determine the specific record type and then parses the data according to the known schema.
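
For example, a single Eve JSON line carries its type in the event_type field (the record below is illustrative and abbreviated):

{"timestamp":"2021-04-01T12:00:00.000000+0000","event_type":"alert","src_ip":"10.0.0.1","dest_ip":"10.0.0.2"}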

To add support for additional fields and event types, adapt the suricata.schema file that ships with every installation of VAST.

vast import suricata < path/to/eve.log

import json

Synopsis

imports JSON with schema
parameters:
[-h | -? | --help] prints the help text

Documentation

The json import format consumes line-delimited JSON objects according to a specified schema. That is, one line corresponds to one event. The object field names correspond to record field names.
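
For example, the following input line becomes one event whose record fields ts, host, and msg carry the respective values (the record is illustrative):

{"ts":"2021-04-01T12:00:00","host":"10.0.0.1","msg":"login failed"}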

JSON can express only a subset of VAST's data model. For example, VAST has first-class support for IP addresses, but JSON can only represent them as strings. To get the most out of your data, it is therefore important to define a schema that provides a differentiated view of the data.

The infer command also supports schema inference for JSON data. For example, head data.json | vast infer prints a raw schema that can be supplied via --schema-file / -s as a file or via --schema / -S as a string. After infer dumps the schema, adjust the generic type names; this is also the time to use more precise types, such as timestamp instead of time, or to annotate fields with additional attributes, such as #skip.
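
A typical workflow might look like this (path names are placeholders; the schema adjustments happen between the two steps):

# Derive a raw schema from a sample of the input.
head data.json | vast infer > my.schema
# After refining type names and attributes, import with the schema file.
vast import json --schema-file=my.schema < data.json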

If no type prefix is specified with --type / -t, or multiple types match based on the prefix, VAST uses an exact match based on the field names to automatically deduce the event type for every line in the input.

import csv

Synopsis

imports CSV logs from STDIN or file
parameters:
[-h | -? | --help] prints the help text

Documentation

The import csv command imports comma-separated values in tabular form. The first line in a CSV file must contain a header that describes the field names. The remaining lines contain concrete values. Except for the header, one line corresponds to one event.
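
For instance, a minimal CSV input could look as follows; the header fields must match a known layout (the field names here are purely illustrative):

ts,host,msg
2021-04-01T12:00:00,10.0.0.1,login failed
2021-04-01T12:00:05,10.0.0.2,login succeeded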

Because CSV has no notion of typing, it is necessary to select a layout via --type whose field names correspond to the CSV header field names. Such a layout must either be defined in a schema file known to VAST, or be defined in a schema passed using --schema or --schema-file.

E.g., to import Threat Intelligence data into VAST, the known type intel.indicator can be used:

vast import --type=intel.indicator --read=path/to/indicators.csv csv

import zeek-json

Synopsis

imports Zeek JSON logs from STDIN or file
parameters:
[-h | -? | --help] prints the help text

Documentation

The import zeek command consumes Zeek logs in tab-separated value (TSV) style, and the import zeek-json command consumes Zeek logs as line-delimited JSON objects as produced by the json-streaming-logs package. Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional _path field to disambiguate the log type. For stock Zeek JSON logs, use the existing import json with the -t flag to specify the log type.
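
For instance, a stock Zeek JSON conn.log could be ingested via the generic JSON reader (assuming the bundled Zeek schema names the type zeek.conn):

vast import json --type=zeek.conn --read=path/to/conn.log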

Here's an example of a typical Zeek conn.log:

#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path conn
#open 2014-05-23-18-02-04
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents
#types time string addr port addr port enum string interval count count string bool count string count count count count table[string]
1258531221.486539 Pii6cUUq1v4 192.168.1.102 68 192.168.1.1 67 udp - 0.163820 301 300 SF - 0 Dd 1 329 1 328 (empty)
1258531680.237254 nkCxlvNN8pi 192.168.1.103 137 192.168.1.255 137 udp dns 3.780125 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531693.816224 9VdICMMnxQ7 192.168.1.102 137 192.168.1.255 137 udp dns 3.748647 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531635.800933 bEgBnkI31Vf 192.168.1.103 138 192.168.1.255 138 udp - 46.725380 560 0 S0 - 0 D 3 644 0 0 (empty)
1258531693.825212 Ol4qkvXOksc 192.168.1.102 138 192.168.1.255 138 udp - 2.248589 348 0 S0 - 0 D 2 404 0 0 (empty)
1258531803.872834 kmnBNBtl96d 192.168.1.104 137 192.168.1.255 137 udp dns 3.748893 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531747.077012 CFIX6YVTFp2 192.168.1.104 138 192.168.1.255 138 udp - 59.052898 549 0 S0 - 0 D 3 633 0 0 (empty)
1258531924.321413 KlF6tbPUSQ1 192.168.1.103 68 192.168.1.1 67 udp - 0.044779 303 300 SF - 0 Dd 1 331 1 328 (empty)
1258531939.613071 tP3DM6npTdj 192.168.1.102 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 229 0 0 (empty)
1258532046.693816 Jb4jIDToo77 192.168.1.104 68 192.168.1.1 67 udp - 0.002103 311 300 SF - 0 Dd 1 339 1 328 (empty)
1258532143.457078 xvWLhxgUmj5 192.168.1.102 1170 192.168.1.1 53 udp dns 0.068511 36 215 SF - 0 Dd 1 64 1 243 (empty)
1258532203.657268 feNcvrZfDbf 192.168.1.104 1174 192.168.1.1 53 udp dns 0.170962 36 215 SF - 0 Dd 1 64 1 243 (empty)
1258532331.365294 aLsTcZJHAwa 192.168.1.1 5353 224.0.0.251 5353 udp dns 0.100381 273 0 S0 - 0 D 2 329 0 0 (empty)

When Zeek rotates logs, it regularly produces compressed batches of *.gz files. Ingesting a compressed batch involves unpacking and concatenating the input before sending it to VAST:

gunzip -c *.gz | vast import zeek

import zeek

Synopsis

imports Zeek TSV logs from STDIN or file
parameters:
[-h | -? | --help] prints the help text

Documentation

The import zeek command consumes Zeek logs in tab-separated value (TSV) style, and the import zeek-json command consumes Zeek logs as line-delimited JSON objects as produced by the json-streaming-logs package. Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional _path field to disambiguate the log type. For stock Zeek JSON logs, use the existing import json with the -t flag to specify the log type.

Here's an example of a typical Zeek conn.log:

#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path conn
#open 2014-05-23-18-02-04
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents
#types time string addr port addr port enum string interval count count string bool count string count count count count table[string]
1258531221.486539 Pii6cUUq1v4 192.168.1.102 68 192.168.1.1 67 udp - 0.163820 301 300 SF - 0 Dd 1 329 1 328 (empty)
1258531680.237254 nkCxlvNN8pi 192.168.1.103 137 192.168.1.255 137 udp dns 3.780125 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531693.816224 9VdICMMnxQ7 192.168.1.102 137 192.168.1.255 137 udp dns 3.748647 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531635.800933 bEgBnkI31Vf 192.168.1.103 138 192.168.1.255 138 udp - 46.725380 560 0 S0 - 0 D 3 644 0 0 (empty)
1258531693.825212 Ol4qkvXOksc 192.168.1.102 138 192.168.1.255 138 udp - 2.248589 348 0 S0 - 0 D 2 404 0 0 (empty)
1258531803.872834 kmnBNBtl96d 192.168.1.104 137 192.168.1.255 137 udp dns 3.748893 350 0 S0 - 0 D 7 546 0 0 (empty)
1258531747.077012 CFIX6YVTFp2 192.168.1.104 138 192.168.1.255 138 udp - 59.052898 549 0 S0 - 0 D 3 633 0 0 (empty)
1258531924.321413 KlF6tbPUSQ1 192.168.1.103 68 192.168.1.1 67 udp - 0.044779 303 300 SF - 0 Dd 1 331 1 328 (empty)
1258531939.613071 tP3DM6npTdj 192.168.1.102 138 192.168.1.255 138 udp - - - - S0 - 0 D 1 229 0 0 (empty)
1258532046.693816 Jb4jIDToo77 192.168.1.104 68 192.168.1.1 67 udp - 0.002103 311 300 SF - 0 Dd 1 339 1 328 (empty)
1258532143.457078 xvWLhxgUmj5 192.168.1.102 1170 192.168.1.1 53 udp dns 0.068511 36 215 SF - 0 Dd 1 64 1 243 (empty)
1258532203.657268 feNcvrZfDbf 192.168.1.104 1174 192.168.1.1 53 udp dns 0.170962 36 215 SF - 0 Dd 1 64 1 243 (empty)
1258532331.365294 aLsTcZJHAwa 192.168.1.1 5353 224.0.0.251 5353 udp dns 0.100381 273 0 S0 - 0 D 2 329 0 0 (empty)

When Zeek rotates logs, it regularly produces compressed batches of *.gz files. Ingesting a compressed batch involves unpacking and concatenating the input before sending it to VAST:

gunzip -c *.gz | vast import zeek