import
The `import` command ingests data. An optional filter expression allows for restricting the input to matching events. The format of the imported data must be explicitly specified.
The `import` command is the dual to the `export` command.
This is easiest explained with an example:
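```bash
vast import suricata < eve.json
```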
The above command signals the running node to ingest (i.e., to archive and index for later export) all Suricata events from the Eve JSON file passed via standard input.
Filter Expressions
An optional filter expression allows for importing only the relevant subset of information. For example, a user might want to import Suricata Eve JSON, but skip over all events of type `suricata.stats`.
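A sketch of such an invocation (assuming the `#type` meta extractor described in the query language documentation):

```bash
vast import suricata '#type != "suricata.stats"' < eve.json
```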
For more information on the optional filter expression, see the query language documentation.
Format-Specific Options
Some import formats have format-specific options. For example, the `pcap` import format has an `interface` option that can be used to ingest PCAPs from a network interface directly. To retrieve a list of format-specific options, run `vast import <format> help`, and similarly to retrieve format-specific documentation, run `vast import <format> documentation`.
Type Filtering
The `--type` option filters known event types based on a prefix. E.g., `vast import json --type=zeek` matches all event types that begin with `zeek`, and restricts the event types known to the import command accordingly.
VAST permanently tracks imported event types. They do not need to be specified again for consecutive imports.
Batching
The `import` command parses events into table slices (batches). The following options control the batching:
`vast.import.batch-encoding`
Selects the encoding of table slices. Available options are `msgpack` (row-based) and `arrow` (column-based).
`vast.import.batch-size`
Sets an upper bound for the number of events per table slice.
Most components in VAST operate on table slices, which makes the table slice size a fundamental tuning knob on the spectrum of throughput and latency. Small table slices allow for shorter processing times, resulting in more scheduler context switches and a more balanced workload. However, the increased pressure on the scheduler comes at the cost of throughput. A large table slice size allows actors to spend more time processing a block of memory, but makes them yield less frequently to the scheduler. As a result, other actors scheduled on the same thread may have to wait a little longer.
The `vast.import.batch-size` option merely controls the number of events per table slice, but not necessarily the number of events until a component forwards a batch to the next stage in a stream. The CAF streaming framework uses a credit-based flow-control mechanism to determine buffering of table slices. Setting `vast.import.batch-size` to 0 causes the table slice size to be unbounded and leaves it to other parameters to determine the actual table slice size.
`vast.import.batch-timeout`
Sets a timeout for forwarding buffered table slices to the importer.
The `vast.import.batch-timeout` option controls the maximum buffering period until table slices are forwarded to the node. The default batch timeout is one second.
`vast.import.read-timeout`
Sets a timeout for reading from input sources.
The `vast.import.read-timeout` option determines how long a call to read data from the input will block. The process yields and tries again at a later time if no data is received for the set value. The default read timeout is 20 milliseconds.
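As a sketch, these options can also be set persistently in VAST's configuration file (assuming the usual `vast.yaml` key layout; the batch size shown is illustrative, the timeouts are the documented defaults):

```yaml
vast:
  import:
    batch-encoding: arrow   # column-based; alternative: msgpack (row-based)
    batch-size: 65536       # upper bound on events per table slice; 0 = unbounded
    batch-timeout: 1s       # forward buffered slices after at most one second
    read-timeout: 20ms      # maximum blocking time per read from the input
```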
import pcap
The `import pcap` command uses libpcap to read network packets from a trace or an interface.
VAST automatically calculates the Community ID for PCAPs for better pivoting support. The extra computation induces an overhead of approximately 15% of the ingestion rate. The option `--disable-community-id` disables the computation completely.
The PCAP import format has many additional options that offer a user interface that should be familiar to users of other tools interacting with PCAPs. To see a list of all available options, run `vast import pcap help`.
Here's an example that reads from the network interface `en0` and cuts off packets after 65535 bytes:
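A sketch of such an invocation (`--interface` is documented above; `--cutoff` is an assumed option name, verify with `vast import pcap help`):

```bash
vast import pcap --interface=en0 --cutoff=65535
```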
import test
The `import test` command exists primarily for testing and benchmarking purposes. It generates and ingests random data for a given schema.
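For instance, a sketch that generates a bounded amount of random data (assuming `-n` is the import command's maximum-events option):

```bash
vast import -n 100 test
```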
import syslog
Ingest Syslog messages into VAST. The following formats are supported:
- RFC 5424
- A fallback format that consists only of the Syslog message.
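A sketch: given a file `syslog.log` (name illustrative) containing RFC 5424 messages such as

```
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3"] An application event log entry
```

they can be ingested with:

```bash
vast import syslog < syslog.log
```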
import suricata
The `import suricata` command consumes Eve JSON logs from Suricata. Eve JSON is Suricata's unified format to log all types of activity as a single stream of line-delimited JSON.
For each log entry, VAST parses the field `event_type` to determine the specific record type and then parses the data according to the known schema. To add support for additional fields and event types, adapt the `suricata.schema` file that ships with every installation of VAST.
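For illustration, an Eve JSON line whose `event_type` field selects the flow record type (all values are made up):

```json
{"timestamp":"2021-03-01T15:15:24.000000+0000","event_type":"flow","src_ip":"10.0.0.1","src_port":54321,"dest_ip":"10.0.0.2","dest_port":443,"proto":"TCP"}
```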
import json
The `json` import format consumes line-delimited JSON objects according to a specified schema. That is, one line corresponds to one event. The object field names correspond to record field names.
JSON can express only a subset of VAST's data model. For example, VAST has first-class support for IP addresses, but JSON can only represent them as strings. To get the most out of your data, it is therefore important to define a schema to get a differentiated view of the data.
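A minimal sketch of the idea (the `ssh_login` type, its fields, and the input line are hypothetical):

```
type ssh_login = record{
  ts: time,
  src: addr,
  user: string
}
```

A matching input line would be `{"ts": "2021-03-01T15:15:24", "src": "10.0.0.1", "user": "alice"}`, where `src` becomes a native address value rather than a string.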
The `infer` command also supports schema inference for JSON data. For example, `head data.json | vast infer` will print a raw schema that can be supplied to `--schema-file` / `-s` as a file or to `--schema` / `-S` as a string. However, after `infer` dumps the schema, the generic type name should still be adjusted, and this would be the time to make use of more precise types, such as `timestamp` instead of `time`, or to annotate them with additional attributes, such as `#skip`.
If no type prefix is specified with `--type` / `-t`, or multiple types match based on the prefix, VAST uses an exact match based on the field names to automatically deduce the event type for every line in the input.
import csv
The `import csv` command imports comma-separated values in tabular form.
The first line in a CSV file must contain a header that describes the field
names. The remaining lines contain concrete values. Except for the header, one
line corresponds to one event.
Because CSV has no notion of typing, it is necessary to select a layout via `--type` whose field names correspond to the CSV header field names. Such a layout must either be defined in a schema file known to VAST, or be defined in a schema passed using `--schema` or `--schema-file`.
E.g., to import Threat Intelligence data into VAST, the known type `intel.indicator` can be used:
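A sketch of the invocation (the file name is illustrative; the CSV header must match the field names of `intel.indicator`):

```bash
vast import csv --type=intel.indicator < indicators.csv
```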
import zeek / import zeek-json
The `import zeek` command consumes Zeek logs in tab-separated value (TSV) style, and the `import zeek-json` command consumes Zeek logs as line-delimited JSON objects as produced by the json-streaming-logs package.
Unlike stock Zeek JSON logs, where one file contains exactly one log type, the streaming format contains different log event types in a single stream and uses an additional `_path` field to disambiguate the log type. For stock Zeek JSON logs, use the existing `import json` with the `-t` flag to specify the log type.
Here's an example of a typical Zeek `conn.log`:
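An illustrative, truncated sample (the values are made up; real `conn.log` files carry more columns):

```
#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	conn
#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	proto	service	duration
#types	time	string	addr	port	addr	port	enum	string	interval
1614611724.000000	CmXjRk2Mtn8gLHigAl	10.0.0.1	54321	10.0.0.2	443	tcp	ssl	1.234567
```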
When Zeek rotates logs, it regularly produces compressed batches of `*.tar.gz` files. Ingesting a compressed batch involves unpacking and concatenating the input before sending it to VAST:
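A sketch of such a pipeline (archive names are illustrative):

```bash
# extract every rotated archive to stdout and feed the concatenation to VAST
for archive in *.tar.gz; do
  tar -xzOf "$archive"
done | vast import zeek
```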