summarize
Groups events and applies aggregate functions to each group.
Description
The summarize operator groups events according to certain fields and applies aggregation functions to each group. The operator consumes the entire input before producing any output.
The order of the output fields follows the sequence of the provided arguments. Unspecified fields are dropped.
Because no output is produced until the input ends, take care when using this operator with large inputs.
group
To group by a certain field, use the syntax <field> or <field>=<field>. For each unique combination of the group fields, a single output event will be returned.
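The grouping rule above can be modeled outside the pipeline language. The following Python sketch is illustrative only and does not reflect how the operator is implemented; the event dictionaries and the helper name group_events are assumptions made for this example. It shows that one group exists per unique combination of the group field values.

```python
from collections import defaultdict


def group_events(events, group_fields):
    """Bucket events by the tuple of their group field values (hypothetical helper)."""
    groups = defaultdict(list)
    for event in events:
        # The key is the combination of all group field values.
        key = tuple(event.get(field) for field in group_fields)
        groups[key].append(event)
    return dict(groups)


# Made-up sample events for illustration.
events = [
    {"src_ip": "10.0.0.1", "proto": "tcp", "bytes": 10},
    {"src_ip": "10.0.0.1", "proto": "udp", "bytes": 20},
    {"src_ip": "10.0.0.1", "proto": "tcp", "bytes": 30},
    {"src_ip": "10.0.0.2", "proto": "tcp", "bytes": 40},
]

groups = group_events(events, ["src_ip", "proto"])
# There are three unique (src_ip, proto) combinations, so three groups.
print(len(groups))
```

Each key in groups corresponds to one output event; the aggregation functions then reduce the events in each bucket to a single result.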
aggregation
The aggregation functions applied to each group are specified with f(…) or <field>=f(…), where f is the name of an aggregation function (see below) and <field> is an optional name for the result. The aggregation function will produce a single result for each group.
If no name is specified, it will be automatically generated from the aggregation function call. If processing continues after summarize, it is strongly recommended to specify a custom name.
The following aggregation functions are available and, unless specified differently, take exactly one argument:
sum: Computes the sum of all grouped values.
min: Computes the minimum of all grouped values.
max: Computes the maximum of all grouped values.
any: Computes the disjunction (OR) of all grouped values. Requires the values to be booleans.
all: Computes the conjunction (AND) of all grouped values. Requires the values to be booleans.
first: Takes the first of all grouped values that is not null.
last: Takes the last of all grouped values that is not null.
mean: Computes the mean of all grouped values.
median: Computes the approximate median of all grouped values with a t-digest algorithm.
mode: Takes the most common of all grouped values that is not null.
value_counts: Returns a list of all grouped values alongside their frequency.
quantile: Computes the quantile specified by the named argument q, for example: quantile(x, q=0.2).
stddev: Computes the standard deviation of all grouped values.
variance: Computes the variance of all grouped values.
distinct: Creates a sorted list without duplicates of all grouped values that are not null.
collect: Creates a list of all grouped values that are not null, preserving duplicates.
count: When used as count(), simply counts the events in the group. When used as count(x), counts all grouped values that are not null.
count_distinct: Counts all distinct grouped values that are not null.
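Several of the functions above differ only in how they treat null values and duplicates. The following Python sketch models those descriptions for a single group, with None standing in for null. The function names mirror the list above, but the code is an assumption-laden illustration and not the operator's implementation; in particular, returning None when a group holds no non-null value is an assumption.

```python
def count_events(values):
    """Models count(): every event in the group counts, null or not."""
    return len(values)


def count_values(values):
    """Models count(x): only values that are not null count."""
    return sum(1 for v in values if v is not None)


def collect(values):
    """Models collect: non-null values, duplicates preserved."""
    return [v for v in values if v is not None]


def distinct(values):
    """Models distinct: sorted non-null values without duplicates."""
    return sorted(set(collect(values)))


def count_distinct(values):
    """Models count_distinct: number of distinct non-null values."""
    return len(set(collect(values)))


def first(values):
    """Models first: first non-null value (None if there is none, an assumption)."""
    return next((v for v in values if v is not None), None)


def last(values):
    """Models last: last non-null value (None if there is none, an assumption)."""
    return next((v for v in reversed(values) if v is not None), None)


# Made-up values of one field within a single group.
values = [None, 443, 80, 443, 22, None]

print(count_events(values))    # 6
print(count_values(values))    # 4
print(collect(values))         # [443, 80, 443, 22]
print(distinct(values))        # [22, 80, 443]
print(count_distinct(values))  # 3
print(first(values), last(values))  # 443 22
```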
Examples
Compute the sum of x over all events:
Group over y and compute the sum of x for each group:
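As a rough model of what grouping over y and summing x produces, the following Python sketch computes the same result on made-up events. It is not operator syntax and not the operator's implementation; the helper summarize_sum and the result name total are hypothetical, chosen here in the spirit of a custom <field>=f(…) name. The sketch assumes every event carries a numeric x.

```python
def summarize_sum(events, group_field, value_field, result_name):
    """Sum value_field per distinct group_field value (hypothetical helper)."""
    totals = {}
    for event in events:
        key = event[group_field]
        totals[key] = totals.get(key, 0) + event[value_field]
    # One output event per group. Fields appear in argument order, and
    # fields that were not requested (here: z) are dropped.
    return [{group_field: key, result_name: total} for key, total in totals.items()]


# Made-up sample events for illustration.
events = [
    {"x": 1, "y": "a", "z": True},
    {"x": 2, "y": "b", "z": False},
    {"x": 3, "y": "a", "z": True},
]

result = summarize_sum(events, "y", "x", "total")
print(result)  # [{'y': 'a', 'total': 4}, {'y': 'b', 'total': 2}]
```

The order of the groups in this sketch follows first appearance in the input; the text above makes no statement about group order in the operator's output.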
Group the input by src_ip and aggregate all unique dest_port values into a list:
Same as above, but produce a count of the unique number of values instead of a list:
Compute minimum and maximum of the timestamp field per src_ip group:
Compute minimum and maximum of the timestamp field over all events:
Create a boolean flag originator that is true if any value in the src_ip group is true:
Create 1-hour groups and produce a summary of network traffic between host pairs:
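The idea of hourly groups per host pair can be sketched in Python as follows. This is a model of the semantics under stated assumptions, not operator syntax: the field names timestamp, dest_ip, and bytes, the helper names, and the choice to floor timestamps to the start of the hour are all assumptions for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts):
    """Floor a timestamp to the start of its hour (assumed bucketing rule)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def summarize_traffic(events):
    """Aggregate bytes and event count per (hour, src_ip, dest_ip) group."""
    totals = defaultdict(lambda: {"bytes": 0, "count": 0})
    for event in events:
        key = (hour_bucket(event["timestamp"]), event["src_ip"], event["dest_ip"])
        totals[key]["bytes"] += event["bytes"]
        totals[key]["count"] += 1
    return [
        {"timestamp": hour, "src_ip": src, "dest_ip": dst, **agg}
        for (hour, src, dst), agg in totals.items()
    ]


def at(hour, minute):
    """Build a UTC timestamp on a fixed, made-up day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


# Made-up sample events for illustration.
events = [
    {"timestamp": at(12, 5), "src_ip": "10.0.0.1", "dest_ip": "10.0.0.2", "bytes": 100},
    {"timestamp": at(12, 55), "src_ip": "10.0.0.1", "dest_ip": "10.0.0.2", "bytes": 200},
    {"timestamp": at(13, 10), "src_ip": "10.0.0.1", "dest_ip": "10.0.0.2", "bytes": 50},
    {"timestamp": at(12, 30), "src_ip": "10.0.0.1", "dest_ip": "10.0.0.3", "bytes": 10},
]

summary = summarize_traffic(events)
for row in summary:
    print(row)
```

The first two events fall into the same hour and host pair, so they collapse into a single output event; the other two form their own groups.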