Skip to content

This guide shows you how to aggregate event streams with summarize and window. You’ll learn to count, group, compute statistics, and build bounded event-time detections over streaming data.

The summarize operator groups events and applies aggregation functions. Its syntax is:

summarize <aggregation>, <aggregation>, ..., <group>, <group>, ...

Where:

  • Aggregations are expressions like fnsum, fncount, fnmean, etc.
  • Groups are field names to group by

Start with fundamental aggregation functions on event streams.

Count total events and unique values with fncount and fncount_distinct:

from {product: "apple", price: 100, category: "fruit"},
{product: "banana", price: 250, category: "fruit"},
{product: "carrot", price: 175, category: "vegetable"},
{product: "apple", price: 120, category: "fruit"},
{product: "banana", price: 225, category: "fruit"}
summarize total_count = count(), unique_products = count_distinct(product)
{
total_count: 5,
unique_products: 3
}

Calculate totals and averages:

from {product: "apple", price: 100, quantity: 2},
{product: "banana", price: 250, quantity: 1},
{product: "carrot", price: 175, quantity: 3},
{product: "apple", price: 120, quantity: 2},
{product: "banana", price: 225, quantity: 1}
summarize total_revenue = sum(price * quantity), avg_price = mean(price), total_quantity = sum(quantity)
{
total_revenue: 1440,
avg_price: 174.0,
total_quantity: 9
}

Find extreme values with fnmin and fnmax:

from {sensor: "A", temperature: 72, timestamp: 2024-01-15T10:00:00},
{sensor: "B", temperature: 68, timestamp: 2024-01-15T10:05:00},
{sensor: "A", temperature: 75, timestamp: 2024-01-15T10:10:00},
{sensor: "B", temperature: 82, timestamp: 2024-01-15T10:15:00},
{sensor: "A", temperature: 71, timestamp: 2024-01-15T10:20:00}
summarize min_temp = min(temperature), max_temp = max(temperature), earliest = min(timestamp), latest = max(timestamp)
{
min_temp: 68,
max_temp: 82,
earliest: 2024-01-15T10:00:00.000000,
latest: 2024-01-15T10:20:00.000000
}

Group events by one or more fields to calculate aggregations per group.

Calculate statistics per category:

from {product: "apple", price: 100, category: "fruit"},
{product: "banana", price: 250, category: "fruit"},
{product: "carrot", price: 175, category: "vegetable"},
{product: "lettuce", price: 125, category: "vegetable"},
{product: "orange", price: 225, category: "fruit"}
summarize avg_price = mean(price), item_count = count(), category
{
avg_price: 191.66666666666666,
item_count: 3,
category: "fruit",
}
{
avg_price: 150.0,
item_count: 2,
category: "vegetable",
}

Group by multiple dimensions:

from {user: "alice", action: "login", duration: 45, date: "2024-01-15"},
{user: "bob", action: "login", duration: 38, date: "2024-01-15"},
{user: "alice", action: "view", duration: 12, date: "2024-01-15"},
{user: "alice", action: "login", duration: 52, date: "2024-01-16"},
{user: "bob", action: "edit", duration: 89, date: "2024-01-16"}
summarize avg_duration = mean(duration), action_count = count(), user, action
{
avg_duration: 38.0,
action_count: 1,
user: "bob",
action: "login",
}
{
avg_duration: 89.0,
action_count: 1,
user: "bob",
action: "edit",
}
{
avg_duration: 48.5,
action_count: 2,
user: "alice",
action: "login",
}
{
avg_duration: 12.0,
action_count: 1,
user: "alice",
action: "view",
}

Use every when the aggregation is tied to a wall-clock schedule, such as polling an inventory API every 10 minutes and publishing a current count of endpoint states. This pattern works well for dashboards, reports, cache refreshes, and external API polling where each run describes the current snapshot.

Don’t use every for streaming detections where delayed or out-of-order events can change the answer. Use window for event-time detections.

every 10min {
from {host: "laptop-1", status: "healthy"},
{host: "server-1", status: "healthy"},
{host: "laptop-2", status: "stale"}
summarize status, endpoint_count=count()
snapshot_time = now()
}
head 2
sort status

Each run fetches a fresh snapshot, groups endpoints by current status, and adds the run time. The from block stands in for an inventory API response; in a long-running pipeline, replace it with from_http, remove head, and publish the compact summary to a channel or destination.

The wall-clock schedule is not an event-time window. If the pipeline starts at 09:00, every runs the subpipeline for [09:00, 09:10), then restarts it for [09:10, 09:20), and so on. The boundaries depend on when the pipeline starts, not on timestamps in the fetched data.

Use window when you need bounded event-time aggregations on a stream. The operator creates one subpipeline per fixed time range. Inside that subpipeline, use summarize to compute counts, distinct values, and statistics for the events in that window.

Unlike summarize with options={frequency: ...}, window assigns events by event time. This lets you tolerate out-of-order data with tolerance and group entities inside each event-time window. Put group outside window only when each key needs its own event-time clock.

The following streaming detection uses OCSF Authentication events. It groups failed logons by user and source IP address, counts failures in 10-minute event-time windows, and emits an alert when a pair exceeds the threshold. The sample records show only the OCSF fields used by the detection:

from {time: 2024-01-01T10:00:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-1"}},
{time: 2024-01-01T10:01:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-2"}},
{time: 2024-01-01T10:02:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-1"}},
{time: 2024-01-01T10:03:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-2"}},
{time: 2024-01-01T10:04:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-3"}},
{time: 2024-01-01T10:05:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-3"}},
{time: 2024-01-01T10:06:00, activity_id: 1, status_id: 2, user: {name: "bob"}, src_endpoint: {ip: 10.0.0.8}, dst_endpoint: {hostname: "vpn-1"}}
where activity_id == 1 and status_id == 2
window size=10min, on=time, tolerance=30s, idle_timeout=5min {
group {user: user.name, src_ip: src_endpoint.ip} {
summarize failures=count(), target_hosts=count_distinct(dst_endpoint.hostname)
user = $group.user
src_ip = $group.src_ip
start = $window.start
end = $window.end
}
}
where failures >= 5
select user, src_ip, failures, target_hosts, start, end
{
user: "alice",
src_ip: 10.0.0.5,
failures: 6,
target_hosts: 3,
start: 2024-01-01T10:00:00Z,
end: 2024-01-01T10:10:00Z,
}

In OCSF Authentication events, activity_id: 1 means logon and status_id: 2 means failure. The outer group gives every user and source IP pair its own event-time clock. The idle_timeout closes sparse keys after wall-clock inactivity so live detections don’t wait indefinitely for the next event from the same key.

This example uses tumbling windows because window omits every. With size=10min, the intervals are [10:00, 10:10), [10:10, 10:20), and so on for each grouped user and source IP pair. An event exactly at 10:10 belongs to the second window because window ends are exclusive.

You can combine window with statistical aggregations to detect SMB traffic spikes. The next example is a lightweight TQL adaptation of Splunk’s SMB Traffic Spike analytic. It starts with pre-aggregated OCSF SMB Activity buckets to keep the detection readable and interleaves the sources by timestamp to match event-time order. The sample records show only the OCSF fields used by the detection.

from {time: 2024-01-01T00:00:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}},
{time: 2024-01-01T00:00:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 120}},
{time: 2024-01-01T00:10:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 11}},
{time: 2024-01-01T00:10:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 118}},
{time: 2024-01-01T00:20:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 9}},
{time: 2024-01-01T00:20:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 121}},
{time: 2024-01-01T00:30:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}},
{time: 2024-01-01T00:30:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 119}},
{time: 2024-01-01T00:40:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}},
{time: 2024-01-01T00:40:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 122}},
{time: 2024-01-01T00:50:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}},
{time: 2024-01-01T00:50:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 120}},
{time: 2024-01-01T01:00:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 80}},
{time: 2024-01-01T01:00:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 123}}
window size=70min, every=10min, on=time {
summarize src=src_endpoint.hostname,
samples=count(),
avg_bytes=mean(traffic.bytes),
stdev_bytes=stddev(traffic.bytes),
current_bytes=max(traffic.bytes)
upper_bound = avg_bytes + stdev_bytes * 2
where samples >= 5 and current_bytes > upper_bound
start = $window.start
end = $window.end
}
select src, current_bytes, upper_bound, start, end
sort start
{
src: "workstation-7",
current_bytes: 80,
upper_bound: 69.00145770426485,
start: 2024-01-01T00:00:00Z,
end: 2024-01-01T01:10:00Z,
}
{
src: "workstation-7",
current_bytes: 80,
upper_bound: 73.85436210874914,
start: 2024-01-01T00:10:00Z,
end: 2024-01-01T01:20:00Z,
}

The workstation-7 bucket with 80 bytes exceeds the mean plus two standard deviations in two 70-minute windows, while the steady file-server traffic stays below its bound. The simplified detector uses the maximum bucket in each window as the candidate spike.

This example uses a hopping window: size=70min defines the lookback and every=10min moves the window forward every 10 minutes. Around midnight, Tenzir evaluates intervals such as [00:00, 01:10), [00:10, 01:20), and [00:20, 01:30). A 10-minute traffic bucket can participate in multiple open windows, which gives rolling context but can also produce repeated alerts for the same spike unless you add downstream suppression.

Compare TQL with KQL, SPL, and Cribl Stream

Section titled “Compare TQL with KQL, SPL, and Cribl Stream”

The following sections compare aggregation and windowing capabilities across TQL, Microsoft KQL, Splunk SPL, and Cribl Stream. They cover Tenzir’s summarize, every, group, and window operators. Cribl Stream entries refer to pipeline Functions, especially Aggregations, not Cribl Search.

Complete-input aggregation waits for the full input and then emits final aggregate results. Use it for batch summaries where intermediate updates and event-time boundaries do not matter.

SystemSupportNotes
TQLsummarize consumes the input by default.
KQLUse summarize.
SPLUse stats.
Cribl Stream⚠️Aggregations are time-windowed.

Grouped aggregation computes separate aggregate results for each key, such as one count per user or host. The grouping key partitions aggregate state, but it does not add time boundaries unless you combine it with a windowing mechanism.

SystemSupportNotes
TQLUse summarize with grouping fields.
KQLUse summarize ... by key.
SPLUse stats ... BY key.
Cribl Stream⚠️Use time-windowed Group by fields.

Periodic processing-time aggregation emits results on a wall-clock or processing-time cadence. Use it for live dashboards and operational metrics when arrival time matters more than event timestamps.

SystemSupportNotes
TQLUse summarize options={frequency: ...} or every.
KQL⚠️Use scheduled or continuous query patterns.
SPL⚠️Use real-time or scheduled searches.
Cribl Stream⚠️Use Scheduled Collectors; Aggregations use event-time buckets.

Periodic running stats emit accumulated values repeatedly instead of waiting for the stream to end. They show the total so far, not a bounded rolling lookback.

SystemSupportNotes
TQLUse summarize options={frequency: ..., mode: "cumulative"}.
KQL⚠️Use scheduled or continuous query patterns.
SPL⚠️Use streamstats or scheduled searches.
Cribl Stream⚠️Enable Cumulative aggregations in Aggregations.

Wall-clock subpipeline batches rerun a block of work on a fixed schedule. This is useful when the work fetches or refreshes data, rather than grouping existing events by event time.

SystemSupportNotes
TQLUse every to rerun an arbitrary block.
KQLNo direct query-language block equivalent.
SPLNo direct query-language block equivalent.
Cribl Stream⚠️Scheduled Collectors schedule collection work.

Tumbling event-time windows split events into adjacent, non-overlapping time ranges based on event timestamps. Each event belongs to one window, and each window produces its own result.

SystemSupportNotes
TQLUse window size=1h, on=ts.
KQLUse summarize ... by bin(ts, 1h).
SPLUse timechart span=1h or bin _time.
Cribl StreamUse Aggregations Time window.

Hopping event-time windows use a window size that is larger than the step between window starts. This gives rolling context with repeated overlapping windows, so one event can contribute to several results.

SystemSupportNotes
TQLUse window size=1h, every=5m, on=ts.
KQL⚠️Use manual expansion or plugins for specific metrics.
SPL⚠️Use streamstats workarounds or manual bucketing.
Cribl StreamNo hopping window option.

Per-input-event rolling stats update a trailing calculation for each arriving event. This is different from periodic emission because the output cadence follows events, not a timer.

SystemSupportNotes
TQLNot native; summarize frequency is periodic.
KQL⚠️Use manual expansion or row-window functions.
SPLUse streamstats time_window=....
Cribl StreamAggregations emit window events.

Per-key event-time windows keep separate windowed results for each entity. Events from one noisy key cannot advance or close windows for another key.

SystemSupportNotes
TQLCompose group with window.
KQLUse summarize ... by key, bin(ts, ...).
SPLUse BY key or split-by.
Cribl StreamUse Group by fields with Time window.

Streaming late-event tolerance controls how long a window remains open for out-of-order events. It lets a pipeline accept delayed data without holding every window until the input ends.

SystemSupportNotes
TQLUse window tolerance=....
KQLMostly batch or query-time semantics.
SPL⚠️Depends on search order.
Cribl StreamUse Aggregations Lag tolerance.

Independent per-key clocks let each group advance event time based only on its own events. This prevents a busy key from making sparse keys late, but sparse windows may also close later.

SystemSupportNotes
TQLPut window inside an outer group.
KQLNo streaming clock per key in ad hoc KQL.
SPL⚠️Requires search-order dependent workarounds.
Cribl Stream⚠️Grouped buckets exist, but no documented per-key clock.

An arbitrary per-key subpipeline can do more than compute a fixed aggregate list. It can reshape, filter, enrich, or sink each group’s stream independently.

SystemSupportNotes
TQLUse group key { ... }.
KQLNo direct equivalent.
SPLNo direct equivalent.
Cribl StreamNo grouped Function block.

An arbitrary per-window subpipeline runs custom logic inside each event-time range. It covers detections that need more than a fixed aggregate list, such as post-aggregation filtering or alert formatting.

SystemSupportNotes
TQLUse window { ... }.
KQLNo direct equivalent.
SPLNo direct equivalent.
Cribl StreamAggregations only emit configured outputs.

A sink per key or window sends results from inside the grouped or windowed scope. This supports local alerting or routing decisions without merging all results back into a shared stream first.

SystemSupportNotes
TQLgroup and window blocks can end in sinks.
KQLNo direct query-language equivalent.
SPLNo direct query-language equivalent.
Cribl Stream⚠️Route emitted aggregate events downstream.

Empty window padding emits rows for time ranges that have no events. This is useful for charts and time-series joins that need an explicit zero or null for missing intervals.

SystemSupportNotes
TQLEmpty windows are not emitted.
KQL⚠️Use manual range and union padding.
SPLUse timechart cont=true.
Cribl StreamNo documented empty buckets.

Session windows group events by gaps in activity rather than fixed boundaries. They are useful for user sessions, transactions, and bursts where the end time depends on silence.

SystemSupportNotes
TQLRequires precomputed session IDs or custom stateful logic.
KQLUse row_window_session().
SPLUse transaction maxpause and maxspan.
Cribl StreamNot native.

Custom alignment or calendar windows anchor boundaries to a chosen origin or calendar unit. This matters for reports that must align to local business days, weeks, months, or other non-epoch boundaries.

SystemSupportNotes
TQL⚠️Fixed window buckets are epoch-anchored; no custom origin or calendar alignment.
KQLUse bin_at().
SPLUse aligntime and calendar spans.
Cribl Stream⚠️Fixed Time window buckets; no documented custom origin or calendar alignment.

Legend: ✅ native or covered, ⚠️ partial or workaround, ❌ missing.

Cribl Stream’s Aggregations Function runs independently per Worker Process, so aggregate state is not shared globally across workers. Scheduled Collector jobs can run on a recurrence, but they schedule collection work rather than provide a general pipeline Function block.

Use statistical aggregation functions for deeper analysis.

Calculate distribution statistics with fnquantile:

from {endpoint: "/api/users", latency: 120},
{endpoint: "/api/users", latency: 135},
{endpoint: "/api/users", latency: 110},
{endpoint: "/api/orders", latency: 245},
{endpoint: "/api/orders", latency: 225},
{endpoint: "/api/orders", latency: 280}
summarize p50 = quantile(latency, q=0.5),
p90 = quantile(latency, q=0.9),
p95 = quantile(latency, q=0.95),
endpoint
{
p50: 245.0,
p90: 280.0,
p95: 280.0,
endpoint: "/api/orders",
}
{
p50: 120.0,
p90: 135.0,
p95: 135.0,
endpoint: "/api/users",
}

Measure data spread with fnstddev and fnvariance:

from {server: "web1", cpu: 45},
{server: "web1", cpu: 52},
{server: "web1", cpu: 48},
{server: "web2", cpu: 85},
{server: "web2", cpu: 92},
{server: "web2", cpu: 88}
summarize avg_cpu = mean(cpu),
cpu_stddev = stddev(cpu),
cpu_variance = variance(cpu),
server
{
avg_cpu: 48.333333333333336,
cpu_stddev: 2.8674417556808622,
cpu_variance: 8.222222222222145,
server: "web1",
}
{
avg_cpu: 88.33333333333333,
cpu_stddev: 2.8674417556810217,
cpu_variance: 8.22222222222306,
server: "web2",
}

Find most common values and collect unique values with fnmode, fndistinct, and fncount_if:

from {user: "alice", browser: "chrome", action: "login"},
{user: "bob", browser: "firefox", action: "view"},
{user: "alice", browser: "chrome", action: "edit"},
{user: "charlie", browser: "chrome", action: "login"},
{user: "alice", browser: "safari", action: "login"}
summarize most_common_browser = mode(browser),
unique_browsers = distinct(browser),
login_count = count_if(action, x => x == "login")
{
most_common_browser: "chrome",
unique_browsers: [
"chrome",
"firefox",
"safari",
],
login_count: 3,
}

Analyze value distributions with fnvalue_counts and fnentropy:

from {category: "A", value: 10},
{category: "B", value: 20},
{category: "A", value: 15},
{category: "B", value: 25},
{category: "C", value: 30}
summarize frequencies = value_counts(category),
info_entropy = entropy(category)
{
frequencies: [
{
value: "A",
count: 2,
},
{
value: "B",
count: 2,
},
{
value: "C",
count: 1,
},
],
info_entropy: 1.0549201679861442,
}

Use fncollect and fndistinct to gather values:

from {user: "alice", action: "login", timestamp: 2024-01-15T10:00:00},
{user: "bob", action: "view", timestamp: 2024-01-15T10:01:00},
{user: "alice", action: "edit", timestamp: 2024-01-15T10:02:00},
{user: "charlie", action: "login", timestamp: 2024-01-15T10:03:00},
{user: "alice", action: "logout", timestamp: 2024-01-15T10:04:00}
summarize all_actions = collect(action),
unique_users = distinct(user),
event_count = count()
{
all_actions: [
"login",
"view",
"edit",
"login",
"logout",
],
unique_users: [
"alice",
"charlie",
"bob",
],
event_count: 5,
}

Get boundary values with fnfirst and fnlast:

from {sensor: "temp1", reading: 72, time: 2024-01-15T09:00:00},
{sensor: "temp1", reading: 75, time: 2024-01-15T10:00:00},
{sensor: "temp1", reading: 78, time: 2024-01-15T11:00:00},
{sensor: "temp2", reading: 68, time: 2024-01-15T09:00:00},
{sensor: "temp2", reading: 71, time: 2024-01-15T10:00:00}
summarize first_reading = first(reading),
last_reading = last(reading),
avg_reading = mean(reading),
sensor
{
first_reading: 72,
last_reading: 78,
avg_reading: 75.0,
sensor: "temp1",
}
{
first_reading: 68,
last_reading: 71,
avg_reading: 69.5,
sensor: "temp2",
}

Use fncollect with grouping to build hierarchical structures:

from {dept: "Engineering", team: "Backend", member: "Alice"},
{dept: "Engineering", team: "Backend", member: "Bob"},
{dept: "Engineering", team: "Frontend", member: "Charlie"},
{dept: "Sales", team: "Direct", member: "David"}
summarize dept, team, members=collect(member)
{
dept: "Sales",
team: "Direct",
members: [
"David",
],
}
{
dept: "Engineering",
team: "Frontend",
members: [
"Charlie",
],
}
{
dept: "Engineering",
team: "Backend",
members: [
"Alice",
"Bob",
],
}

Use fnall and fnany for boolean checks:

from {test: "unit", passed: true, duration: 45},
{test: "integration", passed: true, duration: 120},
{test: "e2e", passed: false, duration: 300},
{test: "performance", passed: true, duration: 180}
summarize all_passed = all(passed),
any_failed = any(not passed),
total_duration = sum(duration)
{
all_passed: false,
any_failed: true,
total_duration: 645
}

Collect values conditionally during aggregation:

from {src_ip: 10.0.0.5, dst_port: 22, bytes: 1024},
{src_ip: 192.168.1.10, dst_port: 80, bytes: 2048},
{src_ip: 10.0.0.5, dst_port: 443, bytes: 4096},
{src_ip: 192.168.1.10, dst_port: 22, bytes: 512}
let $critical_ports = [22, 3389, 5985]
summarize src_ip,
total_bytes=sum(bytes),
// Collect all unique ports
all_ports=collect(dst_port),
// Collect with conditional transformation
port_types=collect("HIGH" if dst_port in $critical_ports else "LOW")
{src_ip: 192.168.1.10, total_bytes: 2560, all_ports: [80, 22], port_types: ["LOW", "HIGH"]}
{src_ip: 10.0.0.5, total_bytes: 5120, all_ports: [22, 443], port_types: ["HIGH", "LOW"]}

This pattern enables:

  • Building risk profiles during aggregation
  • Transforming values during collection based on conditions
  • Creating categorical metrics from raw data
from {
requests: [
{endpoint: "/api/users", method: "GET", duration: 45, status: 200},
{endpoint: "/api/users", method: "POST", duration: 120, status: 201},
{endpoint: "/api/orders", method: "GET", duration: 89, status: 200},
{endpoint: "/api/users", method: "GET", duration: 38, status: 200},
{endpoint: "/api/orders", method: "GET", duration: 156, status: 500}
]
}
unroll requests
summarize endpoint=requests.endpoint,
count=count(),
avg_duration=mean(requests.duration)
{
endpoint: "/api/orders",
count: 2,
avg_duration: 122.5,
}
{
endpoint: "/api/users",
count: 3,
avg_duration: 67.66666666666667,
}
from {
sales: [
{date: "2024-01-01", amount: 1200, region: "North"},
{date: "2024-01-01", amount: 800, region: "South"},
{date: "2024-01-02", amount: 1500, region: "North"},
{date: "2024-01-02", amount: 950, region: "South"},
{date: "2024-01-03", amount: 1100, region: "North"}
]
}
// Calculate totals by date
unroll sales
summarize date=sales.date, total=sum(sales.amount)
{date: "2024-01-01", total: 2000}
{date: "2024-01-02", total: 2450}
{date: "2024-01-03", total: 1100}
from {
metrics: [
{timestamp: "10:00", cpu: 45, memory: 62, disk: 78},
{timestamp: "10:01", cpu: 52, memory: 64, disk: 78},
{timestamp: "10:02", cpu: 89, memory: 71, disk: 79},
{timestamp: "10:03", cpu: 67, memory: 68, disk: 79},
{timestamp: "10:04", cpu: 48, memory: 65, disk: 80}
]
}
cpu_alert = metrics.map(m => m.cpu > 80).any()
avg_memory = metrics.map(m => m.memory).mean()
disk_trend = metrics.last().disk - metrics.first().disk
health_summary = {
cpu_max: metrics.map(m => m.cpu).max(),
memory_avg: avg_memory,
disk_growth: disk_trend,
critical: cpu_alert
}
{
metrics: [ ... ],
cpu_alert: true,
avg_memory: 66.0,
disk_trend: 2,
health_summary: {
cpu_max: 89,
memory_avg: 66.0,
disk_growth: 2,
critical: true,
},
}

Combine multiple aggregation functions for comprehensive analysis:

from {method: "GET", endpoint: "/api/users", status: 200, duration: 45},
{method: "POST", endpoint: "/api/users", status: 201, duration: 120},
{method: "GET", endpoint: "/api/orders", status: 200, duration: 89},
{method: "GET", endpoint: "/api/users", status: 200, duration: 38},
{method: "GET", endpoint: "/api/orders", status: 500, duration: 156},
{method: "DELETE", endpoint: "/api/users/123", status: 204, duration: 67}
summarize request_count = count(),
avg_duration = mean(duration),
error_count = count_if(status, s => s >= 400),
unique_endpoints = count_distinct(endpoint),
method
error_rate = error_count / request_count
{
request_count: 1,
avg_duration: 120.0,
error_count: 0,
unique_endpoints: 1,
method: "POST",
error_rate: 0.0,
}
{
request_count: 1,
avg_duration: 67.0,
error_count: 0,
unique_endpoints: 1,
method: "DELETE",
error_rate: 0.0,
}
{
request_count: 4,
avg_duration: 82.0,
error_count: 1,
unique_endpoints: 2,
method: "GET",
error_rate: 0.25,
}
  1. Choose appropriate functions: Use fnmean for averages, fnmedian for skewed data
  2. Handle empty collections: Check if lists are empty before aggregating
  3. Consider memory usage: Large collections can consume significant memory
  4. Combine aggregations: Calculate multiple statistics in one pass for efficiency

Last updated: