
metrics

Retrieves metrics events from a Tenzir node.

metrics [name:string, live=bool, retro=bool]

The metrics operator retrieves metrics events from a Tenzir node. Most metrics are collected every second. Some schemas use a different interval; for example, pipeline metrics are emitted every 10 seconds.

name: string (optional)

Show only metrics with the specified name. For example, metrics "cpu" only shows CPU metrics.

live = bool (optional)

Work on all metrics events as they are generated in real time, instead of on the metrics events persisted at the Tenzir node.

retro = bool (optional)

Work on persisted metrics events, even when live is given. If both are set, the persisted events come first, followed by live ones.

Tenzir collects metrics with the following schemas.

tenzir.metrics.api

Contains information about all accessed API endpoints, emitted once per second.

{
timestamp: time, // The time at which the API request was received.
request_id: string, // The unique request ID assigned by the Tenzir Platform.
method: string, // The HTTP method used to access the API.
path: string, // The path of the accessed API endpoint.
response_time: duration, // The time the API endpoint took to respond.
status_code: uint64, // The HTTP status code of the API response.
params: record, // The parameters passed to the API endpoint.
}

The schema of the record params depends on the API endpoint used. Refer to the API documentation to see the available parameters per endpoint.

tenzir.metrics.caf

Contains metrics about the CAF (C++ Actor Framework) runtime system.

{
system: { // Metrics about the CAF actor system.
running_actors: int64, // Number of currently running actors.
running_actors_by_name: list[{ // Number of running actors, grouped by actor name.
name: string, // Actor name.
count: int64, // Number of actors with this name currently running.
}],
all_messages: { // Information about the total message metrics.
processed: int64, // Number of processed messages.
rejected: int64, // Number of rejected messages.
},
messages_by_actor: list[{ // List of metrics, grouped by actor.
name: string, // Name of the receiving actor. This may be null for messages without an associated actor.
processed: int64, // Number of processed messages.
rejected: int64, // Number of rejected messages.
}],
},
middleman: { // Metrics about CAF's network layer.
inbound_messages_size: int64, // Size of received messages in bytes since last metric.
outbound_messages_size: int64, // Size of sent messages in bytes since last metric.
serialization_time: duration, // Time spent serializing messages since last metric.
deserialization_time: duration, // Time spent deserializing messages since last metric.
},
actors: list[{ // Per-actor metrics for all running actors.
name: string, // Name of the actor.
processing_time: duration, // Time spent processing messages since last metric.
mailbox_time: duration, // Time messages spent in mailbox since last metric.
mailbox_size: int64, // Current number of messages in actor's mailbox.
}],
}

tenzir.metrics.buffer

Contains information about the buffer operator’s internal buffer.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the `buffer` operator in the pipeline.
used: uint64, // The number of events stored in the buffer.
free: uint64, // The remaining capacity of the buffer.
dropped: uint64, // The number of events dropped by the buffer.
}
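The buffer fields can be combined to show how close a `buffer` operator is to dropping events. `free` is documented as the remaining capacity, so this sketch assumes the total capacity equals `used + free`. It is a minimal Python example under two further assumptions: the metrics events have already been exported and decoded into dictionaries, and `buffer_fill_ratio` is a hypothetical helper, not part of Tenzir.

```python
def buffer_fill_ratio(record: dict) -> float:
    """Return the fraction of the buffer's capacity that is in use.

    Assumes capacity == used + free, because `free` is the remaining
    capacity of the buffer.
    """
    capacity = record["used"] + record["free"]
    if capacity == 0:
        return 0.0  # avoid dividing by zero for a zero-sized buffer
    return record["used"] / capacity


# Hypothetical buffer metrics event with made-up values.
event = {"operator_id": 2, "used": 750, "free": 250, "dropped": 0}
print(buffer_fill_ratio(event))  # 0.75
```

A ratio that stays close to 1.0 together with a growing `dropped` count suggests that the buffer is too small for the incoming rate.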

tenzir.metrics.cpu

Contains a measurement of CPU load, expressed as load averages.

{
timestamp: time, // The time at which this metric was recorded.
loadavg_1m: double, // The load average over the last minute.
loadavg_5m: double, // The load average over the last 5 minutes.
loadavg_15m: double, // The load average over the last 15 minutes.
}
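Load averages are not percentages. They approximate the average number of processes that are running or waiting for a CPU; on Linux, processes in uninterruptible sleep are counted as well. A common rule of thumb divides them by the number of CPU cores, so that values around 1.0 mean all cores are busy. The sketch below assumes a cpu metrics event decoded into a dictionary and uses a hypothetical helper `load_per_core`. The core count should be that of the node's host, so pass it explicitly if the script runs on a different machine.

```python
import os
from typing import Optional


def load_per_core(record: dict, cores: Optional[int] = None) -> float:
    """Divide the 1-minute load average by the number of CPU cores.

    If `cores` is omitted, the core count of the machine running this
    script is used, which only matches the node if both are the same host.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return record["loadavg_1m"] / cores


# Hypothetical cpu metrics event with made-up values.
event = {"loadavg_1m": 2.0, "loadavg_5m": 1.5, "loadavg_15m": 1.0}
print(load_per_core(event, cores=4))  # 0.5
```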

tenzir.metrics.disk

Contains a measurement of disk space usage.

{
timestamp: time, // The time at which this metric was recorded.
path: string, // The byte measurements below refer to the filesystem on which this path is located.
total_bytes: uint64, // The total size of the volume, in bytes.
used_bytes: uint64, // The number of bytes occupied on the volume.
free_bytes: uint64, // The number of bytes still free on the volume.
}
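To turn the disk fields into a usage percentage, divide `used_bytes` by `total_bytes`. This sketch divides by `total_bytes` directly instead of assuming that `used_bytes + free_bytes` adds up exactly to the total. It is a minimal Python example that assumes a disk metrics event decoded into a dictionary; `disk_usage_percent` is a hypothetical helper.

```python
def disk_usage_percent(record: dict) -> float:
    """Return the occupied share of the volume in percent."""
    total = record["total_bytes"]
    if total == 0:
        return 0.0  # guard against an empty or unreadable volume
    # Divide by total_bytes instead of assuming used + free == total.
    return 100.0 * record["used_bytes"] / total


# Hypothetical disk metrics event with made-up values.
event = {"total_bytes": 1_000_000, "used_bytes": 250_000, "free_bytes": 750_000}
print(disk_usage_percent(event))  # 25.0
```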

tenzir.metrics.enrich

Contains a measurement of the enrich operator, emitted once every second.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the `enrich` operator in the pipeline.
context: string, // The name of the context the associated operator is using.
events: uint64, // The number of input events that entered the `enrich` operator since the last metric.
hits: uint64, // The number of successfully enriched events since the last metric.
}
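`events` and `hits` count activity since the previous metric, so a hit rate over a time window comes from summing both fields first and then dividing. Averaging the per-record ratios instead would give quiet intervals as much weight as busy ones. The sketch below is in Python and assumes the enrich metrics events were decoded into dictionaries; `enrich_hit_rate` is a hypothetical helper.

```python
from typing import Iterable, Optional


def enrich_hit_rate(records: Iterable[dict]) -> Optional[float]:
    """Fraction of input events that were enriched across all records.

    `events` and `hits` are counts since the previous metric, so they are
    summed first; averaging per-record ratios would give quiet intervals
    as much weight as busy ones.
    """
    events = 0
    hits = 0
    for record in records:
        events += record["events"]
        hits += record["hits"]
    if events == 0:
        return None  # no input at all, so the rate is undefined
    return hits / events


# Hypothetical enrich metrics events with made-up values.
window = [
    {"events": 200, "hits": 50},
    {"events": 0, "hits": 0},
    {"events": 100, "hits": 100},
]
print(enrich_hit_rate(window))  # 0.5
```

The same pattern applies to the `live` and `retro` counters of the lookup metrics.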

tenzir.metrics.export

Contains a measurement of the export operator, emitted once every second per schema. Note that internal events such as metrics or diagnostics do not emit metrics themselves.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the `export` operator in the pipeline.
schema: string, // The schema name of the batch.
schema_id: string, // The schema ID of the batch.
events: uint64, // The number of events that were exported.
queued_events: uint64, // The total number of events that are enqueued in the `export` operator.
}

tenzir.metrics.import

Contains a measurement of the import operator, emitted once every second per schema. Note that internal events such as metrics or diagnostics do not emit metrics themselves.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the `import` operator in the pipeline.
schema: string, // The schema name of the batch.
schema_id: string, // The schema ID of the batch.
events: uint64, // The number of events that were imported.
}

tenzir.metrics.ingest

Contains a measurement of all data ingested into the database, emitted once per second and schema.

{
timestamp: time, // The time at which this metric was recorded.
schema: string, // The schema name of the batch.
schema_id: string, // The schema ID of the batch.
events: uint64, // The number of events that were ingested.
}

tenzir.metrics.lookup

Contains a measurement of the lookup operator, emitted once every second.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the `lookup` operator in the pipeline.
context: string, // The name of the context the associated operator is using.
live: { // Information about the live lookup.
events: uint64, // The number of input events used for the live lookup since the last metric.
hits: uint64, // The number of live lookup matches since the last metric.
},
retro: { // Information about the retroactive lookup.
events: uint64, // The number of input events used for the retroactive lookup since the last metric.
hits: uint64, // The number of retroactive lookup matches since the last metric.
queued_events: uint64, // The total number of events queued for the retroactive lookup.
},
context_updates: uint64, // The number of times the underlying context was updated while the associated lookup was active.
}

tenzir.metrics.memory

Contains statistics about allocated memory.

{
timestamp: time, // The time at which this metric was recorded.
system: { // Information about the system's memory state.
total_bytes: int, // Total memory in the system, in bytes.
used_bytes: int, // Memory used on the system, in bytes.
free_bytes: int, // Free memory on the system, in bytes.
},
process: { // Information about the memory usage of the process.
peak_bytes: int, // Peak memory usage during the runtime of the process.
current_bytes: int, // Current memory usage of the entire process.
swap_bytes: int, // Swap space used by the process.
},
arrow: { // Information about memory allocated by Arrow buffers.
bytes: {
current: int, // Currently allocated bytes
peak: int, // Peak allocated bytes during this run
cumulative: int, // Cumulative allocated bytes during this run
},
allocations: {
current: int, // Number of current allocations
peak: int, // Peak number of allocations
cumulative: int, // Cumulative allocations during this run
},
},
cpp: { // Information about memory allocated by `operator new`.
bytes: {
current: int, // Currently allocated bytes
peak: int, // Peak allocated bytes during this run
cumulative: int, // Cumulative allocated bytes during this run
},
allocations: {
current: int, // Number of current allocations
peak: int, // Peak number of allocations
cumulative: int, // Cumulative allocations during this run
},
},
c: { // Information about memory allocated by `malloc` and other C/POSIX functions.
bytes: {
current: int, // Currently allocated bytes
peak: int, // Peak allocated bytes during this run
cumulative: int, // Cumulative allocated bytes during this run
},
allocations: {
current: int, // Number of current allocations
peak: int, // Peak number of allocations
cumulative: int, // Cumulative allocations during this run
},
},
}

tenzir.metrics.operator

Contains input and output measurements over a collection period for a single operator instantiation.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time when this event was emitted (immediately after the collection period).
operator_id: uint64, // The ID of the operator inside the pipeline referenced above.
source: bool, // True if this is the first operator in the pipeline.
transformation: bool, // True if this is neither the first nor the last operator.
sink: bool, // True if this is the last operator in the pipeline.
internal: bool, // True if the data flow is considered internal to Tenzir.
duration: duration, // The timespan over which this data was collected.
starting_duration: duration, // The time spent starting the operator.
processing_duration: duration, // The time spent processing the data.
scheduled_duration: duration, // The time that the operator was scheduled.
running_duration: duration, // The time that the operator was running.
paused_duration: duration, // The time that the operator was paused.
input: { // Measurement of the incoming data stream.
unit: string, // The type of the elements, which is `void`, `bytes` or `events`.
elements: uint64, // Number of elements that were seen during the collection period.
approx_bytes: uint64, // An approximation for the number of bytes transmitted.
batches: uint64, // The number of batches included in this metric.
},
output: { // Measurement of the outgoing data stream.
unit: string, // The type of the elements, which is `void`, `bytes` or `events`.
elements: uint64, // Number of elements that were seen during the collection period.
approx_bytes: uint64, // An approximation for the number of bytes transmitted.
batches: uint64, // The number of batches included in this metric.
},
}
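Each operator metric covers the timespan in `duration`. Dividing `output.elements` by that timespan therefore gives a per-second rate, and `output.unit` tells whether the elements are bytes or events. This Python sketch assumes that `duration` has already been decoded into a `datetime.timedelta`; how durations actually appear depends on the export format. `output_rate` is a hypothetical helper.

```python
from datetime import timedelta
from typing import Optional


def output_rate(record: dict) -> Optional[float]:
    """Output elements per second over the record's collection period.

    Assumes `duration` was decoded into a `datetime.timedelta`; how
    durations appear depends on the export format.
    """
    seconds = record["duration"].total_seconds()
    if seconds <= 0:
        return None  # nothing was measured, so no rate can be derived
    return record["output"]["elements"] / seconds


# Hypothetical operator metrics event with made-up values.
event = {
    "operator_id": 0,
    "source": True,
    "duration": timedelta(seconds=10),
    "output": {"unit": "events", "elements": 5000, "approx_bytes": 1_200_000},
}
rate = output_rate(event)
print(f"{rate} {event['output']['unit']}/s")  # 500.0 events/s
```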

tenzir.metrics.pipeline

Contains measurements of data flowing through pipelines, emitted once every 10 seconds.

{
timestamp: time, // The time at which this metric was recorded.
pipeline_id: string, // The ID of the pipeline these metrics represent.
ingress: { // Measurement of data entering the pipeline.
duration: duration, // The timespan over which this data was collected.
events: uint64, // Number of events that passed through during this period.
bytes: uint64, // Approximate number of bytes that passed through.
batches: uint64, // Number of batches that passed through.
internal: bool, // True if the data flow is considered internal to Tenzir.
},
egress: { // Measurement of data exiting the pipeline.
duration: duration, // The timespan over which this data was collected.
events: uint64, // Number of events that passed through during this period.
bytes: uint64, // Approximate number of bytes that passed through.
batches: uint64, // Number of batches that passed through.
internal: bool, // True if the data flow is considered internal to Tenzir.
},
}
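Summing `ingress.bytes` per `pipeline_id` while skipping internal flows shows which pipelines bring the most external data into the node. This Python sketch assumes the pipeline metrics events were decoded into dictionaries; `external_ingress` is a hypothetical helper. Unlike the `summarize` example further down, it leaves out pipelines whose ingress is entirely internal instead of reporting an empty sum for them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def external_ingress(records: Iterable[dict]) -> List[Tuple[str, int]]:
    """Sum non-internal ingress bytes per pipeline, largest first."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        ingress = record["ingress"]
        if not ingress["internal"]:
            totals[record["pipeline_id"]] += ingress["bytes"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pipeline metrics events with made-up values.
records = [
    {"pipeline_id": "a", "ingress": {"bytes": 100, "internal": False}},
    {"pipeline_id": "a", "ingress": {"bytes": 50, "internal": True}},
    {"pipeline_id": "b", "ingress": {"bytes": 300, "internal": False}},
]
print(external_ingress(records))  # [('b', 300), ('a', 100)]
```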

tenzir.metrics.platform

Signals whether the connection to the Tenzir Platform is working from the node’s perspective. Emitted once per second.

{
timestamp: time, // The time at which this metric was recorded.
connected: bool, // The connection status.
}

tenzir.metrics.process

Contains measurements of the tenzir-node process, such as its memory usage and open file descriptors.

{
timestamp: time, // The time at which this metric was recorded.
current_memory_usage: uint64, // The memory currently used by this process, in bytes.
peak_memory_usage: uint64, // The peak memory usage of this process, in bytes.
swap_space_usage: uint64, // The amount of swap space used, in bytes. Only available on Linux systems.
open_fds: uint64, // The number of file descriptors opened by the node. Only available on Linux systems.
}

tenzir.metrics.publish

Contains a measurement of the publish operator, emitted once every second per schema.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the `publish` operator in the pipeline.
topic: string, // The topic name.
schema: string, // The schema name of the batch.
schema_id: string, // The schema ID of the batch.
events: uint64, // The number of events that were published to the `topic`.
}

tenzir.metrics.rebuild

Contains a measurement of the partition rebuild process, emitted once every second.

{
timestamp: time, // The time at which this metric was recorded.
partitions: uint64, // The number of partitions currently being rebuilt.
queued_partitions: uint64, // The number of partitions currently queued for rebuilding.
}

tenzir.metrics.subscribe

Contains a measurement of the subscribe operator, emitted once every second per schema.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the `subscribe` operator in the pipeline.
topic: string, // The topic name.
schema: string, // The schema name of the batch.
schema_id: string, // The schema ID of the batch.
events: uint64, // The number of events that were retrieved from the `topic`.
}

tenzir.metrics.tcp

Contains measurements about the number of read and write calls and the bytes received and sent per TCP connection.

{
pipeline_id: string, // The ID of the pipeline where the associated operator is from.
run: uint64, // The number of the run, starting at 1 for the first run.
hidden: bool, // Indicates whether the corresponding pipeline is hidden from the list of managed pipelines.
timestamp: time, // The time at which this metric was recorded.
operator_id: uint64, // The ID of the associated operator in the pipeline.
native: string, // The native handle of the connection (unix: file descriptor).
reads: uint64, // The number of attempted reads since the last metric.
writes: uint64, // The number of attempted writes since the last metric.
bytes_read: uint64, // The number of bytes received since the last metric.
bytes_written: uint64, // The number of bytes written since the last metric.
}

Examples

Show the pipelines with the most ingress, excluding internal data flows:

metrics "pipeline"
summarize pipeline_id, ingress=sum(ingress.bytes if not ingress.internal)
sort -ingress

{pipeline_id: "demo-node/m57-suricata", ingress: 59327586}
{pipeline_id: "demo-node/m57-zeek", ingress: 43291764}

Show the 1-minute CPU load average over the last hour:

metrics "cpu"
where timestamp > now() - 1h
select timestamp, load=loadavg_1m

{timestamp: 2023-12-21T12:00:32.631102, load: 0.40478515625}
{timestamp: 2023-12-21T11:59:32.626043, load: 0.357421875}
{timestamp: 2023-12-21T11:58:32.620327, load: 0.42578125}
{timestamp: 2023-12-21T11:57:32.614810, load: 0.50390625}
{timestamp: 2023-12-21T11:56:32.609896, load: 0.32080078125}
{timestamp: 2023-12-21T11:55:32.605871, load: 0.5458984375}

Get the current memory usage of the node process, in bytes:

metrics "process"
sort -timestamp
head 1
select current_memory_usage

{current_memory_usage: 1083031552}

Show the ingress for every day over the last week, excluding pipelines that run in the Explorer:

metrics "operator"
where timestamp > now() - 1week
where source and not hidden
timestamp = floor(timestamp, 1day)
summarize timestamp, bytes=sum(output.approx_bytes)

{timestamp: 2023-11-08T00:00:00.000000, bytes: 79927223}
{timestamp: 2023-11-09T00:00:00.000000, bytes: 51788928}
{timestamp: 2023-11-10T00:00:00.000000, bytes: 80740352}
{timestamp: 2023-11-11T00:00:00.000000, bytes: 75497472}
{timestamp: 2023-11-12T00:00:00.000000, bytes: 55497472}
{timestamp: 2023-11-13T00:00:00.000000, bytes: 76546048}
{timestamp: 2023-11-14T00:00:00.000000, bytes: 68643200}

Show the operators that produced the most events


Show the three operator instantiations that produced the most events in total and their pipeline IDs:

metrics "operator"
where output.unit == "events"
summarize pipeline_id, operator_id, events=max(output.elements)
sort -events
head 3

{pipeline_id: "70a25089-b16c-448d-9492-af5566789b99", operator_id: 0, events: 391008694 }
{pipeline_id: "7842733c-06d6-4713-9b80-e20944927207", operator_id: 0, events: 246914949 }
{pipeline_id: "6df003be-0841-45ad-8be0-56ff4b7c19ef", operator_id: 1, events: 83013294 }

Show the disk usage over time:

metrics "disk"
sort timestamp
select timestamp, used_bytes

{timestamp: 2023-12-21T12:52:32.900086, used_bytes: 461834444800}
{timestamp: 2023-12-21T12:53:32.905548, used_bytes: 461834584064}
{timestamp: 2023-12-21T12:54:32.910918, used_bytes: 461840302080}
{timestamp: 2023-12-21T12:55:32.916200, used_bytes: 461842751488}

Show the system memory usage over time:

metrics "memory"
sort timestamp
select timestamp, used_bytes=system.used_bytes

{timestamp: 2023-12-21T13:08:32.982083, used_bytes: 48572645376}
{timestamp: 2023-12-21T13:09:32.986962, used_bytes: 48380682240}
{timestamp: 2023-12-21T13:10:32.992494, used_bytes: 48438878208}
{timestamp: 2023-12-21T13:11:32.997889, used_bytes: 48491839488}
{timestamp: 2023-12-21T13:12:33.003323, used_bytes: 48529952768}

Show the activity of TCP connections over time:

metrics "tcp"
sort timestamp
select timestamp, port, handle, reads, writes, bytes_read, bytes_written

{
timestamp: 2024-09-04T15:43:38.011350,
port: 10000,
handle: "12",
reads: 884,
writes: 0,
bytes_read: 10608,
bytes_written: 0
}
{
timestamp: 2024-09-04T15:43:39.013575,
port: 10000,
handle: "12",
reads: 428,
writes: 0,
bytes_read: 5136,
bytes_written: 0
}
{
timestamp: 2024-09-04T15:43:40.015376,
port: 10000,
handle: "12",
reads: 429,
writes: 0,
bytes_read: 5148,
bytes_written: 0
}
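`reads` and `bytes_read` both count activity since the previous metric. Dividing their sums therefore gives the average payload per attempted read. The following Python sketch applies this to the three events from the TCP example output above, assuming they were decoded into dictionaries; `bytes_per_read` is a hypothetical helper. For this sample, every event averages 12 bytes per read.

```python
from typing import Iterable, Optional


def bytes_per_read(records: Iterable[dict]) -> Optional[float]:
    """Average number of bytes received per attempted read."""
    reads = 0
    received = 0
    for record in records:
        reads += record["reads"]
        received += record["bytes_read"]
    if reads == 0:
        return None  # no reads attempted in these intervals
    return received / reads


# The three tcp metrics events from the example output above.
samples = [
    {"reads": 884, "bytes_read": 10608},
    {"reads": 428, "bytes_read": 5136},
    {"reads": 429, "bytes_read": 5148},
]
for sample in samples:
    print(bytes_per_read([sample]))  # 12.0 for each event
print(bytes_per_read(samples))  # 12.0
```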
