Version: v4.24

metrics

Retrieves metrics events from a Tenzir node.

Synopsis

metrics [--live] [--retro] [<name>]

Description

The metrics operator retrieves metrics events from a Tenzir node. Metrics events are collected every second.

--live

Work on all metrics events as they are generated in real-time instead of on metrics events persisted at a Tenzir node.

--retro

Work on persisted metrics events (first), even when --live is given.

See export operator for more details.

<name>

Show only metrics with the specified name. For example, metrics cpu only shows CPU metrics.
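As a rough sketch of how the options combine, the following pipeline follows the synopsis above and filters CPU metrics as they are generated. The threshold of 4.0 is an arbitrary value chosen for illustration, not a recommendation.

```tql
/* Illustrative only: the threshold 4.0 is an assumption. */
metrics --live cpu
| where loadavg_1m > 4.0
```

Dropping --live would instead apply the same filter to the CPU metrics persisted at the node.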

Schemas

Tenzir collects metrics with the following schemas.

tenzir.metrics.api

Contains information about all accessed API endpoints, emitted once per second.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which the API request was received. |
| request_id | string | The unique request ID assigned by the Tenzir Platform. |
| method | string | The HTTP method used to access the API. |
| path | string | The path of the accessed API endpoint. |
| response_time | duration | The time the API endpoint took to respond. |
| status_code | uint64 | The HTTP status code of the API response. |
| params | record | The parameters passed to the API endpoint. |

The schema of the record params depends on the API endpoint used. Refer to the API documentation to see the available parameters per endpoint.
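For instance, the fields above could be used to list API requests that ended in an error. This is a minimal sketch that assumes the name api selects tenzir.metrics.api events, in the same way that cpu selects CPU metrics, and that status codes of 400 and above are the ones of interest.

```tql
/* Assumption: "api" is accepted as a metrics name. */
metrics api
| where status_code >= 400
| select timestamp, method, path, status_code, response_time
```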

tenzir.metrics.buffer

Contains information about the buffer operator's internal buffer.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the buffer operator in the pipeline. |
| used | uint64 | The number of events stored in the buffer. |
| free | uint64 | The remaining capacity of the buffer. |
| dropped | uint64 | The number of events dropped by the buffer. |
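One possible use of these fields is to find buffers that are losing events. The sketch below assumes the name buffer selects tenzir.metrics.buffer events; it makes no assumption about whether dropped is cumulative, and only filters on it.

```tql
/* Assumption: "buffer" is accepted as a metrics name. */
metrics buffer
| where dropped > 0
| select timestamp, pipeline_id, operator_id, used, free, dropped
```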

tenzir.metrics.cpu

Contains a measurement of CPU utilization.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| loadavg_1m | double | The load average over the last minute. |
| loadavg_5m | double | The load average over the last 5 minutes. |
| loadavg_15m | double | The load average over the last 15 minutes. |

tenzir.metrics.disk

Contains a measurement of disk space usage.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| path | string | The byte measurements below refer to the filesystem on which this path is located. |
| total_bytes | uint64 | The total size of the volume, in bytes. |
| used_bytes | uint64 | The number of bytes occupied on the volume. |
| free_bytes | uint64 | The number of bytes still free on the volume. |

tenzir.metrics.enrich

Contains a measurement of the enrich operator, emitted once every second.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the enrich operator in the pipeline. |
| context | string | The name of the context the associated operator is using. |
| events | uint64 | The amount of input events that entered the enrich operator since the last metric. |
| hits | uint64 | The amount of successfully enriched events since the last metric. |
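Because events and hits are both counted since the last metric, summing them gives totals per context. A minimal sketch, assuming the name enrich selects tenzir.metrics.enrich events and that a one-day window is what you want to inspect:

```tql
/* Assumption: "enrich" is accepted as a metrics name. */
metrics enrich
| where timestamp > 1 day ago
| summarize events=sum(events), hits=sum(hits) by context
```

Comparing hits against events per context gives a rough picture of how often enrichment succeeds.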

tenzir.metrics.export

Contains a measurement of the export operator, emitted once every second per schema. Note that internal events like metrics or diagnostics do not emit metrics themselves.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the export operator in the pipeline. |
| schema | string | The schema name of the batch. |
| schema_id | string | The schema ID of the batch. |
| events | uint64 | The amount of events that were exported. |
| queued_events | uint64 | The total amount of events that are enqueued in the export. |

tenzir.metrics.import

Contains a measurement of the import operator, emitted once every second per schema. Note that internal events like metrics or diagnostics do not emit metrics themselves.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the import operator in the pipeline. |
| schema | string | The schema name of the batch. |
| schema_id | string | The schema ID of the batch. |
| events | uint64 | The amount of events that were imported. |
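Since import metrics are emitted per schema, they can be aggregated to see which schemas account for most imported events. The following is a sketch only; it assumes the name import selects tenzir.metrics.import events, and the one-day window and the limit of five are arbitrary.

```tql
/* Assumption: "import" is accepted as a metrics name. */
metrics import
| where timestamp > 1 day ago
| summarize events=sum(events) by schema
| sort events desc
| head 5
```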

tenzir.metrics.ingest

Contains a measurement of all data ingested into the database, emitted once per second and schema.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| schema | string | The schema name of the batch. |
| schema_id | string | The schema ID of the batch. |
| events | uint64 | The amount of events that were ingested. |

tenzir.metrics.lookup

Contains a measurement of the lookup operator, emitted once every second.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the lookup operator in the pipeline. |
| context | string | The name of the context the associated operator is using. |
| live | record | Information about the live lookup. |
| retro | record | Information about the retroactive lookup. |
| context_updates | uint64 | The amount of times the underlying context has been updated while the associated lookup is active. |

The record live has the following schema:

| Field | Type | Description |
| --- | --- | --- |
| events | uint64 | The amount of input events used for the live lookup since the last metric. |
| hits | uint64 | The amount of live lookup matches since the last metric. |

The record retro has the following schema:

| Field | Type | Description |
| --- | --- | --- |
| events | uint64 | The amount of input events used for the lookup since the last metric. |
| hits | uint64 | The amount of lookup matches since the last metric. |
| queued_events | uint64 | The total amount of events that were in the queue for the lookup. |
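The nested live and retro records can be addressed with dotted field names, in the same way as output.approx_bytes in the operator metrics. A minimal sketch, assuming the name lookup selects tenzir.metrics.lookup events:

```tql
/* Assumption: "lookup" is accepted as a metrics name. */
metrics lookup
| summarize live_hits=sum(live.hits), retro_hits=sum(retro.hits) by context
```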

tenzir.metrics.memory

Contains a measurement of the available memory on the host.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| total_bytes | uint64 | The total available memory, in bytes. |
| used_bytes | uint64 | The amount of memory used, in bytes. |
| free_bytes | uint64 | The amount of free memory, in bytes. |

tenzir.metrics.operator

Contains input and output measurements over some amount of time for a single operator instantiation.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time when this event was emitted (immediately after the collection period). |
| operator_id | uint64 | The ID of the operator inside the pipeline referenced above. |
| source | bool | True if this is the first operator in the pipeline. |
| transformation | bool | True if this is neither the first nor the last operator. |
| sink | bool | True if this is the last operator in the pipeline. |
| internal | bool | True if the data flow is considered internal to Tenzir. |
| duration | duration | The timespan over which this data was collected. |
| starting_duration | duration | The time spent to start the operator. |
| processing_duration | duration | The time spent processing the data. |
| scheduled_duration | duration | The time that the operator was scheduled. |
| running_duration | duration | The time that the operator was running. |
| paused_duration | duration | The time that the operator was paused. |
| input | record | Measurement of the incoming data stream. |
| output | record | Measurement of the outgoing data stream. |

The records input and output have the following schema:

| Field | Type | Description |
| --- | --- | --- |
| unit | string | The type of the elements, which is void, bytes or events. |
| elements | uint64 | Number of elements that were seen during the collection period. |
| approx_bytes | uint64 | An approximation for the number of bytes transmitted. |

tenzir.metrics.platform

Signals whether the connection to the Tenzir Platform is working from the node's perspective. Emitted once per second.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| connected | bool | The connection status. |
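To check for recent connectivity problems, one could filter for the seconds in which the node reported no connection. This sketch assumes the name platform selects tenzir.metrics.platform events; the one-hour window is arbitrary.

```tql
/* Assumption: "platform" is accepted as a metrics name. */
metrics platform
| where timestamp > 1 hour ago
| where connected == false
| sort timestamp
```

An empty result means that no disconnected state was recorded in that window.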

tenzir.metrics.process

Contains a measurement of the amount of memory used by the tenzir-node process.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| current_memory_usage | uint64 | The memory currently used by this process. |
| peak_memory_usage | uint64 | The peak amount of memory, in bytes. |
| swap_space_usage | uint64 | The amount of swap space, in bytes. Only available on Linux systems. |
| open_fds | uint64 | The amount of open file descriptors by the node. Only available on Linux systems. |

tenzir.metrics.publish

Contains a measurement of the publish operator, emitted once every second per schema.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the publish operator in the pipeline. |
| topic | string | The topic name. |
| schema | string | The schema name of the batch. |
| schema_id | string | The schema ID of the batch. |
| events | uint64 | The amount of events that were published to the topic. |
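As an illustration, publish metrics can be grouped by topic to see how much traffic each topic carries. The sketch assumes the name publish selects tenzir.metrics.publish events and that events counts the events published since the previous metric, so that summing yields a total.

```tql
/* Assumptions: "publish" is accepted as a metrics name;
   events is a per-interval count. */
metrics publish
| where timestamp > 1 hour ago
| summarize events=sum(events) by topic
| sort events desc
```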

tenzir.metrics.rebuild

Contains a measurement of the partition rebuild process, emitted once every second.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| partitions | uint64 | The number of partitions currently being rebuilt. |
| queued_partitions | uint64 | The number of partitions currently queued for rebuilding. |

tenzir.metrics.subscribe

Contains a measurement of the subscribe operator, emitted once every second per schema.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the subscribe operator in the pipeline. |
| topic | string | The topic name. |
| schema | string | The schema name of the batch. |
| schema_id | string | The schema ID of the batch. |
| events | uint64 | The amount of events that were retrieved from the topic. |

tenzir.metrics.tcp

Contains measurements about the number of read calls and the received bytes per TCP connection.

| Field | Type | Description |
| --- | --- | --- |
| pipeline_id | string | The ID of the pipeline where the associated operator is from. |
| run | uint64 | The number of the run, starting at 1 for the first run. |
| hidden | bool | True if the pipeline is running for the explorer. |
| timestamp | time | The time at which this metric was recorded. |
| operator_id | uint64 | The ID of the associated operator in the pipeline. |
| native | string | The native handle of the connection (unix: file descriptor). |
| reads | uint64 | The number of attempted reads since the last metric. |
| writes | uint64 | The number of attempted writes since the last metric. |
| bytes_read | uint64 | The number of bytes received since the last metric. |
| bytes_written | uint64 | The number of bytes written since the last metric. |

tenzir.metrics.actors

Contains measurements about specific actors.

| Field | Type | Description |
| --- | --- | --- |
| timestamp | time | The time at which this metric was recorded. |
| actor_id | string | The internal actor id. |
| actor_name | string | The name of the actor. |
| num_messages | uint64 | The number of messages in the actor's inbox when the metric was taken. |

Examples

Show the CPU usage over the last hour:

metrics
| where #schema == "tenzir.metrics.cpu"
| where timestamp > 1 hour ago
| put timestamp, percent=loadavg_1m
Output
{
  "timestamp": "2023-12-21T12:00:32.631102",
  "percent": 0.40478515625
}
{
  "timestamp": "2023-12-21T11:59:32.626043",
  "percent": 0.357421875
}
{
  "timestamp": "2023-12-21T11:58:32.620327",
  "percent": 0.42578125
}
{
  "timestamp": "2023-12-21T11:57:32.614810",
  "percent": 0.50390625
}
{
  "timestamp": "2023-12-21T11:56:32.609896",
  "percent": 0.32080078125
}
{
  "timestamp": "2023-12-21T11:55:32.605871",
  "percent": 0.5458984375
}

Get the current memory usage of the tenzir-node process:

metrics
| where #schema == "tenzir.metrics.process"
| sort timestamp
| tail 1
| put current_memory_usage
Output
{
  "current_memory_usage": 1083031552
}

Show the total pipeline ingress in bytes for every day over the last week, excluding pipelines run in the Explorer:

metrics
| where #schema == "tenzir.metrics.operator"
| where timestamp > 1 week ago
| where hidden == false and source == true
| summarize bytes=sum(output.approx_bytes) by timestamp resolution 1 day
Output
{
  "timestamp": "2023-11-08T00:00:00.000000",
  "bytes": 79927223
}
{
  "timestamp": "2023-11-09T00:00:00.000000",
  "bytes": 51788928
}
{
  "timestamp": "2023-11-10T00:00:00.000000",
  "bytes": 80740352
}
{
  "timestamp": "2023-11-11T00:00:00.000000",
  "bytes": 75497472
}
{
  "timestamp": "2023-11-12T00:00:00.000000",
  "bytes": 55497472
}
{
  "timestamp": "2023-11-13T00:00:00.000000",
  "bytes": 76546048
}
{
  "timestamp": "2023-11-14T00:00:00.000000",
  "bytes": 68643200
}

Show the three operator instantiations that produced the most events in total and their pipeline IDs:

metrics
| where #schema == "tenzir.metrics.operator"
| where output.unit == "events"
| summarize events=sum(output.elements) by pipeline_id, operator_id
| sort events desc
| head 3
Output
{
  "pipeline_id": "70a25089-b16c-448d-9492-af5566789b99",
  "operator_id": 0,
  "events": 391008694
}
{
  "pipeline_id": "7842733c-06d6-4713-9b80-e20944927207",
  "operator_id": 0,
  "events": 246914949
}
{
  "pipeline_id": "6df003be-0841-45ad-8be0-56ff4b7c19ef",
  "operator_id": 1,
  "events": 83013294
}

Get the disk usage over time:

metrics
| where #schema == "tenzir.metrics.disk"
| sort timestamp
| put timestamp, used_bytes
Output
{
  "timestamp": "2023-12-21T12:52:32.900086",
  "used_bytes": 461834444800
}
{
  "timestamp": "2023-12-21T12:53:32.905548",
  "used_bytes": 461834584064
}
{
  "timestamp": "2023-12-21T12:54:32.910918",
  "used_bytes": 461840302080
}
{
  "timestamp": "2023-12-21T12:55:32.916200",
  "used_bytes": 461842751488
}

Get the memory usage over time:

metrics
| where #schema == "tenzir.metrics.memory"
| sort timestamp
| put timestamp, used_bytes
Output
{
  "timestamp": "2023-12-21T13:08:32.982083",
  "used_bytes": 48572645376
}
{
  "timestamp": "2023-12-21T13:09:32.986962",
  "used_bytes": 48380682240
}
{
  "timestamp": "2023-12-21T13:10:32.992494",
  "used_bytes": 48438878208
}
{
  "timestamp": "2023-12-21T13:11:32.997889",
  "used_bytes": 48491839488
}
{
  "timestamp": "2023-12-21T13:12:33.003323",
  "used_bytes": 48529952768
}

Get inbound TCP traffic over time:

metrics tcp
| sort timestamp
| put timestamp, port, handle, reads, writes, bytes_read, bytes_written
Output
{
  "timestamp": "2024-09-04T15:43:38.011350",
  "port": 10000,
  "handle": "12",
  "reads": 884,
  "writes": 0,
  "bytes_read": 10608,
  "bytes_written": 0
}
{
  "timestamp": "2024-09-04T15:43:39.013575",
  "port": 10000,
  "handle": "12",
  "reads": 428,
  "writes": 0,
  "bytes_read": 5136,
  "bytes_written": 0
}
{
  "timestamp": "2024-09-04T15:43:40.015376",
  "port": 10000,
  "handle": "12",
  "reads": 429,
  "writes": 0,
  "bytes_read": 5148,
  "bytes_written": 0
}

Get actor internals:

metrics actor
| where timestamp > 1 hour ago
| sort timestamp
Output
{
  "timestamp": "2024-10-15T12:44:33.234964",
  "id": "13",
  "name": "importer",
  "inbox_size": 1
}
{
  "timestamp": "2024-10-15T12:44:33.234980",
  "id": "8",
  "name": "node",
  "inbox_size": 0
}
{
  "timestamp": "2024-10-15T12:44:33.783078",
  "id": "12",
  "name": "index",
  "inbox_size": 0
}