Metrics

VAST keeps detailed track of system metrics that reflect runtime state, such as ingestion performance, query latencies, and resource usage. The metrics data behaves like ordinary user data: querying it works the same way.

Internally, every component sends its metrics to a central accountant actor, which in turn relays the metrics feed back to the regular ingestion path.

Usage

The accountant is disabled by default. Once enabled, it waits for metrics reports from other components. It represents its data as the vast.metrics event layout, so the metrics can be queried like any other data in VAST.

For example, the following query searches for all metrics events with the key pcap-reader.recv, which tracks the delta of received packets on all interfaces VAST listens on. With JSON as the export format, we can use jq to sum these values and get the total number of packets seen on all interfaces.

vast export json '#type == "vast.metrics" && key == "pcap-reader.recv"' |
jq -s 'map(.value | tonumber) | add'

Note: Collecting metrics is optional and incurs minimal overhead. We recommend enabling the accountant unless disk space is scarce or every last bit of performance needs to be made available to other components of VAST.
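
The jq pipeline above can also be expressed in a few lines of Python, which may be handy when the sum feeds into further processing. This is a minimal sketch, assuming the export produces one JSON object per line with a value field as in the vast.metrics schema; the function name sum_metric_values is hypothetical.

```python
import json
from typing import Iterable


def sum_metric_values(lines: Iterable[str]) -> float:
    """Sum the `value` field over NDJSON-encoded vast.metrics events."""
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines between events.
        event = json.loads(line)
        # The schema stores `value` as a string, so convert before adding.
        # float() also accepts a JSON number, should the export emit one.
        total += float(event["value"])
    return total
```

To use it on a live export, pipe the output of vast export json into a script that calls sum_metric_values(sys.stdin) and prints the result.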

Configuration

You can enable the accountant, and thereby metrics collection, either when starting a server (vast --enable-metrics start) or in vast.yaml:

vast:
  enable-metrics: true

By default, VAST reports metrics using the self sink, i.e., they are ingested as vast.metrics events back into VAST. This sink has the following configuration options:

vast:
  metrics:
    self-sink:
      enable: true
      slice-size: 100
      slice-type: arrow

Alternative sinks are the file sink and the UDS (Unix domain socket) sink. Sinks can be enabled individually, and multiple sinks can be used at the same time:

vast:
  metrics:
    # Configures if and where metrics should be written to a file.
    file-sink:
      enable: false
      real-time: false
      path: "/tmp/vast-metrics.log"
    # Configures if and where metrics should be written to a socket.
    uds-sink:
      enable: false
      real-time: false
      path: "/tmp/vast-metrics.sock"
      type: "datagram"

For the file and UDS sinks, metrics are buffered by default. To enable real-time metrics reporting, enable the options vast.metrics.file-sink.real-time or vast.metrics.uds-sink.real-time respectively in your configuration file.
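
To consume the UDS sink from another process, something has to listen on the configured socket path. The following is a rough sketch of such a listener for the datagram type. It assumes that VAST acts as the sending side and that the listener binds the path before VAST starts; neither is stated on this page. The payload format is also not described here, so the sketch only returns raw bytes. The names open_metrics_listener and receive_report are hypothetical.

```python
import os
import socket


def open_metrics_listener(path: str) -> socket.socket:
    """Bind a Unix datagram socket at `path` and return it."""
    if os.path.exists(path):
        os.unlink(path)  # Remove a stale socket file from a previous run.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def receive_report(sock: socket.socket, max_bytes: int = 65536) -> bytes:
    """Block until one datagram arrives and return its raw payload."""
    # Assumption: each datagram carries one metrics report.
    return sock.recv(max_bytes)
```

A caller would pass the same path as vast.metrics.uds-sink.path, for example open_metrics_listener("/tmp/vast-metrics.sock"), and then call receive_report in a loop.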

Data Representation

The accountant generates events of type vast.metrics, which has the following schema:

type vast.metrics = record {
  ts: time #timestamp,
  nodeid: string,
  aid: count,
  actor_name: string,
  key: string,
  value: string,
}
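
When post-processing exported metrics, it can help to mirror this schema in a small typed record. The sketch below assumes a JSON export with one object per event whose field names match the schema; the class MetricsEvent and the function parse_metrics_event are hypothetical names, and the timestamp is kept as the exported string rather than parsed.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsEvent:
    ts: str          # Timestamp, kept as the exported string.
    nodeid: str
    aid: int         # `count` in the schema, an unsigned integer.
    actor_name: str
    key: str
    value: str       # A string in the schema; see numeric_value().

    def numeric_value(self) -> float:
        """Convert the string-typed value for arithmetic."""
        return float(self.value)


def parse_metrics_event(line: str) -> MetricsEvent:
    """Build a MetricsEvent from one JSON-encoded event."""
    obj = json.loads(line)
    return MetricsEvent(
        ts=str(obj["ts"]),
        nodeid=str(obj["nodeid"]),
        aid=int(obj["aid"]),
        actor_name=str(obj["actor_name"]),
        key=str(obj["key"]),
        value=str(obj["value"]),
    )
```

Because value is a string in the schema, the conversion to a number happens explicitly in numeric_value rather than at parse time.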

Available Keys

| Key | Description | Unit |
| --- | --- | --- |
| archive.rate | The rate of events processed by the archive component. | #events/second |
| arrow-writer.rate | The rate of events processed by the Arrow sink. | #events/second |
| ascii-writer.rate | The rate of events processed by the ASCII sink. | #events/second |
| csv-reader.rate | The rate of events processed by the CSV source. | #events/second |
| csv-writer.rate | The rate of events processed by the CSV sink. | #events/second |
| exporter.hits.arrived | The runtime when the current index hits arrived. | nanoseconds |
| exporter.hits.count | The number of events in the partial index result. | #events |
| exporter.hits.first | The runtime when the first index hits arrived. | nanoseconds |
| exporter.hits.runtime | The total runtime until all partial index hits arrived. | nanoseconds |
| exporter.hits | The total number of index hits. | #events |
| exporter.processed | The number of processed events for the current query. | #events |
| exporter.results | The number of results for the current query. | #events |
| exporter.runtime | The runtime for the current query. | nanoseconds |
| exporter.selectivity | The rate of results out of processed events. | #events-results/#events-processed |
| exporter.shipped | The number of shipped events for the current query. | #events |
| importer.rate | The rate of events processed by the importer component. | #events/second |
| json-reader.invalid-line | The number of invalid NDJSON lines. | #events |
| json-reader.rate | The rate of events processed by the JSON source. | #events/second |
| json-reader.unknown-layout | The number of NDJSON lines with an unknown layout. | #events |
| json-writer.rate | The rate of events processed by the JSON sink. | #events/second |
| node_throughput.rate | The rate of events processed by the node component. | #events/second |
| null-writer.rate | The rate of events processed by the null sink. | #events/second |
| pcap-reader.discard-rate | The rate of packets discarded. | #events-dropped/#events-received |
| pcap-reader.discard | The number of packets discarded by the reader. | #events |
| pcap-reader.drop-rate | The rate of packets dropped. | #events-dropped/#events-received |
| pcap-reader.drop | The number of packets dropped by the reader. | #events |
| pcap-reader.ifdrop | The number of packets dropped by the network interface. | #events |
| pcap-reader.rate | The rate of events processed by the PCAP source. | #events/second |
| pcap-reader.recv | The number of packets received. | #events |
| pcap-writer.rate | The rate of events processed by the PCAP sink. | #events/second |
| source.start | Timepoint when the source started. | nanoseconds since epoch |
| source.stop | Timepoint when the source stopped. | nanoseconds since epoch |
| syslog-reader.rate | The rate of events processed by the syslog source. | #events/second |
| test-reader.rate | The rate of events processed by the test source. | #events/second |
| zeek-reader.rate | The rate of events processed by the Zeek source. | #events/second |

Note that for all keys that show throughput rates in #events/second, i.e., <component>.rate, the keys <component>.events and <component>.duration are dividend and divisor respectively. They are not listed explicitly in the above table.
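
The relationship between the three keys can be written out as a small calculation. This sketch assumes that <component>.duration is reported in nanoseconds, like the other durations in the table; this page does not state the unit, so adjust the constant if it differs. The function name events_per_second is hypothetical.

```python
# Assumption: durations are reported in nanoseconds.
NANOSECONDS_PER_SECOND = 1_000_000_000


def events_per_second(events: float, duration_ns: float) -> float:
    """Recompute <component>.rate from <component>.events and <component>.duration."""
    if duration_ns <= 0:
        raise ValueError("duration must be positive")
    # Dividend: number of events. Divisor: duration converted to seconds.
    return events / (duration_ns / NANOSECONDS_PER_SECOND)
```

For instance, 500 events over 250,000,000 nanoseconds (a quarter of a second) gives 2000 events per second under this assumption.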

Generally, counts are reset after a telemetry report is sent out by a component. E.g., the total number of invalid lines the JSON reader encountered is reflected by the sum of all json-reader.invalid-line events.
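
Because each report carries only the delta since the previous one, a running total over time has to be accumulated by the consumer. The sketch below shows one way to do that for a single key. It assumes the input is a list of (timestamp, delta) pairs already extracted from the events, and that the timestamps are strings in one consistent ISO 8601 format so that they sort lexicographically. The function name running_totals is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def running_totals(reports: Sequence[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Turn per-report deltas into cumulative totals, ordered by timestamp."""
    # Sort by timestamp so the accumulation follows report order.
    ordered = sorted(reports, key=lambda report: report[0])
    totals = accumulate(delta for _, delta in ordered)
    return [(ts, total) for (ts, _), total in zip(ordered, totals)]
```

The last entry of the result holds the overall total, which matches the sum of all deltas for that key.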

TODO

The above list is incomplete. We will gradually add keys and their descriptions over time. Stay tuned.