This guide shows you how to aggregate event streams with summarize and
window. You’ll learn to count, group, compute statistics, and build
bounded event-time detections over streaming data.
Understanding the summarize operator
Section titled “Understanding the summarize operator”The summarize operator groups events and applies aggregation functions. Its
syntax is:
summarize <aggregation>, <aggregation>, ..., <group>, <group>, ...Where:
Basic aggregations
Section titled “Basic aggregations”Start with fundamental aggregation functions on event streams.
Count events
Section titled “Count events”Count total events and unique values with
count and
count_distinct:
from {product: "apple", price: 100, category: "fruit"}, {product: "banana", price: 250, category: "fruit"}, {product: "carrot", price: 175, category: "vegetable"}, {product: "apple", price: 120, category: "fruit"}, {product: "banana", price: 225, category: "fruit"}summarize total_count = count(), unique_products = count_distinct(product){ total_count: 5, unique_products: 3}Sum and average
Section titled “Sum and average”Calculate totals and averages:
from {product: "apple", price: 100, quantity: 2}, {product: "banana", price: 250, quantity: 1}, {product: "carrot", price: 175, quantity: 3}, {product: "apple", price: 120, quantity: 2}, {product: "banana", price: 225, quantity: 1}summarize total_revenue = sum(price * quantity), avg_price = mean(price), total_quantity = sum(quantity){ total_revenue: 1440, avg_price: 174.0, total_quantity: 9}Min and max
Section titled “Min and max”Find extreme values with min and max:
from {sensor: "A", temperature: 72, timestamp: 2024-01-15T10:00:00}, {sensor: "B", temperature: 68, timestamp: 2024-01-15T10:05:00}, {sensor: "A", temperature: 75, timestamp: 2024-01-15T10:10:00}, {sensor: "B", temperature: 82, timestamp: 2024-01-15T10:15:00}, {sensor: "A", temperature: 71, timestamp: 2024-01-15T10:20:00}summarize min_temp = min(temperature), max_temp = max(temperature), earliest = min(timestamp), latest = max(timestamp){ min_temp: 68, max_temp: 82, earliest: 2024-01-15T10:00:00.000000, latest: 2024-01-15T10:20:00.000000}Grouping data
Section titled “Grouping data”Group events by one or more fields to calculate aggregations per group.
Group by single field
Section titled “Group by single field”Calculate statistics per category:
from {product: "apple", price: 100, category: "fruit"}, {product: "banana", price: 250, category: "fruit"}, {product: "carrot", price: 175, category: "vegetable"}, {product: "lettuce", price: 125, category: "vegetable"}, {product: "orange", price: 225, category: "fruit"}summarize avg_price = mean(price), item_count = count(), category{ avg_price: 191.66666666666666, item_count: 3, category: "fruit",}{ avg_price: 150.0, item_count: 2, category: "vegetable",}Group by multiple fields
Section titled “Group by multiple fields”Group by multiple dimensions:
from {user: "alice", action: "login", duration: 45, date: "2024-01-15"}, {user: "bob", action: "login", duration: 38, date: "2024-01-15"}, {user: "alice", action: "view", duration: 12, date: "2024-01-15"}, {user: "alice", action: "login", duration: 52, date: "2024-01-16"}, {user: "bob", action: "edit", duration: 89, date: "2024-01-16"}summarize avg_duration = mean(duration), action_count = count(), user, action{ avg_duration: 38.0, action_count: 1, user: "bob", action: "login",}{ avg_duration: 89.0, action_count: 1, user: "bob", action: "edit",}{ avg_duration: 48.5, action_count: 2, user: "alice", action: "login",}{ avg_duration: 12.0, action_count: 1, user: "alice", action: "view",}Aggregate periodic snapshots
Section titled “Aggregate periodic snapshots”Use every when the aggregation is tied to a wall-clock schedule, such
as polling an inventory API every 10 minutes and publishing a current count of
endpoint states. This pattern works well for dashboards, reports, cache
refreshes, and external API polling where each run describes the current
snapshot.
Don’t use every for streaming detections where delayed or out-of-order
events can change the answer. Use window for event-time detections.
every 10min { from {host: "laptop-1", status: "healthy"}, {host: "server-1", status: "healthy"}, {host: "laptop-2", status: "stale"} summarize status, endpoint_count=count() snapshot_time = now()}head 2sort statusEach run fetches a fresh snapshot, groups endpoints by current status, and adds
the run time. The from block stands in for an inventory API response; in a
long-running pipeline, replace it with from_http, remove
head, and publish the compact summary to a channel or destination.
The wall-clock schedule is not an event-time window. If the pipeline starts at
09:00, every runs the subpipeline for [09:00, 09:10), then restarts
it for [09:10, 09:20), and so on. The boundaries depend on when the pipeline
starts, not on timestamps in the fetched data.
Window event streams
Section titled “Window event streams”Use window when you need bounded event-time aggregations on a stream.
The operator creates one subpipeline per fixed time range. Inside that
subpipeline, use summarize to compute counts, distinct values, and
statistics for the events in that window.
Unlike summarize with options={frequency: ...}, window assigns
events by event time. This lets you tolerate out-of-order data with tolerance
and group entities inside each event-time window. Put group outside
window only when each key needs its own event-time clock.
Detect brute-force login attempts
Section titled “Detect brute-force login attempts”The following streaming detection uses OCSF Authentication events. It groups failed logons by user and source IP address, counts failures in 10-minute event-time windows, and emits an alert when a pair exceeds the threshold. The sample records show only the OCSF fields used by the detection:
from {time: 2024-01-01T10:00:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-1"}}, {time: 2024-01-01T10:01:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-2"}}, {time: 2024-01-01T10:02:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-1"}}, {time: 2024-01-01T10:03:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-2"}}, {time: 2024-01-01T10:04:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-3"}}, {time: 2024-01-01T10:05:00, activity_id: 1, status_id: 2, user: {name: "alice"}, src_endpoint: {ip: 10.0.0.5}, dst_endpoint: {hostname: "vpn-3"}}, {time: 2024-01-01T10:06:00, activity_id: 1, status_id: 2, user: {name: "bob"}, src_endpoint: {ip: 10.0.0.8}, dst_endpoint: {hostname: "vpn-1"}}where activity_id == 1 and status_id == 2window size=10min, on=time, tolerance=30s, idle_timeout=5min { group {user: user.name, src_ip: src_endpoint.ip} { summarize failures=count(), target_hosts=count_distinct(dst_endpoint.hostname) user = $group.user src_ip = $group.src_ip start = $window.start end = $window.end }}where failures >= 5select user, src_ip, failures, target_hosts, start, end{ user: "alice", src_ip: 10.0.0.5, failures: 6, target_hosts: 3, start: 2024-01-01T10:00:00Z, end: 2024-01-01T10:10:00Z,}In OCSF Authentication events, activity_id: 1 means logon and status_id: 2
means failure. The outer group gives every user and source IP pair its
own event-time clock. The idle_timeout closes sparse keys after wall-clock
inactivity so live detections don’t wait indefinitely for the next event from
the same key.
This example uses tumbling windows because window omits every. With
size=10min, the intervals are [10:00, 10:10), [10:10, 10:20), and so on
for each grouped user and source IP pair. An event exactly at 10:10 belongs to
the second window because window ends are exclusive.
Detect SMB traffic spikes
Section titled “Detect SMB traffic spikes”You can combine window with statistical aggregations to detect SMB
traffic spikes. The next example is a lightweight TQL adaptation of Splunk’s
SMB Traffic Spike
analytic. It starts with pre-aggregated OCSF SMB Activity buckets to keep the
detection readable and interleaves the sources by timestamp to match event-time
order. The sample records show only the OCSF fields used by the detection.
from {time: 2024-01-01T00:00:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}}, {time: 2024-01-01T00:00:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 120}}, {time: 2024-01-01T00:10:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 11}}, {time: 2024-01-01T00:10:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 118}}, {time: 2024-01-01T00:20:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 9}}, {time: 2024-01-01T00:20:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 121}}, {time: 2024-01-01T00:30:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}}, {time: 2024-01-01T00:30:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 119}}, {time: 2024-01-01T00:40:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}}, {time: 2024-01-01T00:40:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 122}}, {time: 2024-01-01T00:50:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 10}}, {time: 2024-01-01T00:50:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 120}}, {time: 2024-01-01T01:00:00, src_endpoint: {hostname: "workstation-7"}, traffic: {bytes: 80}}, {time: 2024-01-01T01:00:00, src_endpoint: {hostname: "file-server-1"}, traffic: {bytes: 123}}window size=70min, every=10min, on=time { summarize src=src_endpoint.hostname, samples=count(), avg_bytes=mean(traffic.bytes), stdev_bytes=stddev(traffic.bytes), current_bytes=max(traffic.bytes) upper_bound = avg_bytes + stdev_bytes * 2 where samples >= 5 and current_bytes > upper_bound start = $window.start end = $window.end}select src, current_bytes, upper_bound, start, endsort start{ src: "workstation-7", current_bytes: 80, upper_bound: 69.00145770426485, start: 2024-01-01T00:00:00Z, end: 2024-01-01T01:10:00Z,}{ src: "workstation-7", current_bytes: 80, upper_bound: 73.85436210874914, start: 2024-01-01T00:10:00Z, end: 2024-01-01T01:20:00Z,}The workstation-7 bucket with 80 bytes exceeds the mean plus two standard
deviations in two 70-minute windows, while the steady file-server traffic stays
below its bound. The simplified detector uses the maximum bucket in each window
as the candidate spike.
This example uses a hopping window: size=70min defines the lookback and
every=10min moves the window forward every 10 minutes. Around midnight, Tenzir
evaluates intervals such as [00:00, 01:10), [00:10, 01:20), and
[00:20, 01:30). A 10-minute traffic bucket can participate in multiple open
windows, which gives rolling context but can also produce repeated alerts for
the same spike unless you add downstream suppression.
Compare TQL with KQL, SPL, and Cribl Stream
Section titled “Compare TQL with KQL, SPL, and Cribl Stream”The following sections compare aggregation and windowing capabilities across
TQL, Microsoft KQL, Splunk SPL, and Cribl Stream. They cover Tenzir’s
summarize, every, group, and window
operators. Cribl Stream entries refer to pipeline Functions, especially
Aggregations, not Cribl
Search.
Complete-input aggregation
Section titled “Complete-input aggregation”Complete-input aggregation waits for the full input and then emits final aggregate results. Use it for batch summaries where intermediate updates and event-time boundaries do not matter.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | summarize consumes the input by default. |
| KQL | ✅ | Use summarize. |
| SPL | ✅ | Use stats. |
| Cribl Stream | ⚠️ | Aggregations are time-windowed. |
Grouped aggregation
Section titled “Grouped aggregation”Grouped aggregation computes separate aggregate results for each key, such as one count per user or host. The grouping key partitions aggregate state, but it does not add time boundaries unless you combine it with a windowing mechanism.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use summarize with grouping fields. |
| KQL | ✅ | Use summarize ... by key. |
| SPL | ✅ | Use stats ... BY key. |
| Cribl Stream | ⚠️ | Use time-windowed Group by fields. |
Periodic processing-time aggregation
Section titled “Periodic processing-time aggregation”Periodic processing-time aggregation emits results on a wall-clock or processing-time cadence. Use it for live dashboards and operational metrics when arrival time matters more than event timestamps.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use summarize options={frequency: ...} or every. |
| KQL | ⚠️ | Use scheduled or continuous query patterns. |
| SPL | ⚠️ | Use real-time or scheduled searches. |
| Cribl Stream | ⚠️ | Use Scheduled Collectors; Aggregations use event-time buckets. |
Periodic running stats
Section titled “Periodic running stats”Periodic running stats emit accumulated values repeatedly instead of waiting for the stream to end. They show the total so far, not a bounded rolling lookback.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use summarize options={frequency: ..., mode: "cumulative"}. |
| KQL | ⚠️ | Use scheduled or continuous query patterns. |
| SPL | ⚠️ | Use streamstats or scheduled searches. |
| Cribl Stream | ⚠️ | Enable Cumulative aggregations in Aggregations. |
Wall-clock subpipeline batches
Section titled “Wall-clock subpipeline batches”Wall-clock subpipeline batches rerun a block of work on a fixed schedule. This is useful when the work fetches or refreshes data, rather than grouping existing events by event time.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use every to rerun an arbitrary block. |
| KQL | ❌ | No direct query-language block equivalent. |
| SPL | ❌ | No direct query-language block equivalent. |
| Cribl Stream | ⚠️ | Scheduled Collectors schedule collection work. |
Tumbling event-time windows
Section titled “Tumbling event-time windows”Tumbling event-time windows split events into adjacent, non-overlapping time ranges based on event timestamps. Each event belongs to one window, and each window produces its own result.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use window size=1h, on=ts. |
| KQL | ✅ | Use summarize ... by bin(ts, 1h). |
| SPL | ✅ | Use timechart span=1h or bin _time. |
| Cribl Stream | ✅ | Use Aggregations Time window. |
Hopping event-time windows
Section titled “Hopping event-time windows”Hopping event-time windows use a window size that is larger than the step between window starts. This gives rolling context with repeated overlapping windows, so one event can contribute to several results.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use window size=1h, every=5m, on=ts. |
| KQL | ⚠️ | Use manual expansion or plugins for specific metrics. |
| SPL | ⚠️ | Use streamstats workarounds or manual bucketing. |
| Cribl Stream | ❌ | No hopping window option. |
Per-input-event rolling stats
Section titled “Per-input-event rolling stats”Per-input-event rolling stats update a trailing calculation for each arriving event. This is different from periodic emission because the output cadence follows events, not a timer.
| System | Support | Notes |
|---|---|---|
| TQL | ❌ | Not native; summarize frequency is periodic. |
| KQL | ⚠️ | Use manual expansion or row-window functions. |
| SPL | ✅ | Use streamstats time_window=.... |
| Cribl Stream | ❌ | Aggregations emit window events. |
Per-key event-time windows
Section titled “Per-key event-time windows”Per-key event-time windows keep separate windowed results for each entity. Events from one noisy key cannot advance or close windows for another key.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Compose group with window. |
| KQL | ✅ | Use summarize ... by key, bin(ts, ...). |
| SPL | ✅ | Use BY key or split-by. |
| Cribl Stream | ✅ | Use Group by fields with Time window. |
Streaming late-event tolerance
Section titled “Streaming late-event tolerance”Streaming late-event tolerance controls how long a window remains open for out-of-order events. It lets a pipeline accept delayed data without holding every window until the input ends.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use window tolerance=.... |
| KQL | ❌ | Mostly batch or query-time semantics. |
| SPL | ⚠️ | Depends on search order. |
| Cribl Stream | ✅ | Use Aggregations Lag tolerance. |
Independent per-key event-time clocks
Section titled “Independent per-key event-time clocks”Independent per-key clocks let each group advance event time based only on its own events. This prevents a busy key from making sparse keys late, but sparse windows may also close later.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Put window inside an outer group. |
| KQL | ❌ | No streaming clock per key in ad hoc KQL. |
| SPL | ⚠️ | Requires search-order dependent workarounds. |
| Cribl Stream | ⚠️ | Grouped buckets exist, but no documented per-key clock. |
Arbitrary subpipeline per key
Section titled “Arbitrary subpipeline per key”An arbitrary per-key subpipeline can do more than compute a fixed aggregate list. It can reshape, filter, enrich, or sink each group’s stream independently.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use group key { ... }. |
| KQL | ❌ | No direct equivalent. |
| SPL | ❌ | No direct equivalent. |
| Cribl Stream | ❌ | No grouped Function block. |
Arbitrary subpipeline per window
Section titled “Arbitrary subpipeline per window”An arbitrary per-window subpipeline runs custom logic inside each event-time range. It covers detections that need more than a fixed aggregate list, such as post-aggregation filtering or alert formatting.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | Use window { ... }. |
| KQL | ❌ | No direct equivalent. |
| SPL | ❌ | No direct equivalent. |
| Cribl Stream | ❌ | Aggregations only emit configured outputs. |
Sink per key or per window
Section titled “Sink per key or per window”A sink per key or window sends results from inside the grouped or windowed scope. This supports local alerting or routing decisions without merging all results back into a shared stream first.
| System | Support | Notes |
|---|---|---|
| TQL | ✅ | group and window blocks can end in sinks. |
| KQL | ❌ | No direct query-language equivalent. |
| SPL | ❌ | No direct query-language equivalent. |
| Cribl Stream | ⚠️ | Route emitted aggregate events downstream. |
Empty window padding
Section titled “Empty window padding”Empty window padding emits rows for time ranges that have no events. This is useful for charts and time-series joins that need an explicit zero or null for missing intervals.
| System | Support | Notes |
|---|---|---|
| TQL | ❌ | Empty windows are not emitted. |
| KQL | ⚠️ | Use manual range and union padding. |
| SPL | ✅ | Use timechart cont=true. |
| Cribl Stream | ❌ | No documented empty buckets. |
Session windows
Section titled “Session windows”Session windows group events by gaps in activity rather than fixed boundaries. They are useful for user sessions, transactions, and bursts where the end time depends on silence.
| System | Support | Notes |
|---|---|---|
| TQL | ❌ | Requires precomputed session IDs or custom stateful logic. |
| KQL | ✅ | Use row_window_session(). |
| SPL | ✅ | Use transaction maxpause and maxspan. |
| Cribl Stream | ❌ | Not native. |
Custom alignment or calendar windows
Section titled “Custom alignment or calendar windows”Custom alignment or calendar windows anchor boundaries to a chosen origin or calendar unit. This matters for reports that must align to local business days, weeks, months, or other non-epoch boundaries.
| System | Support | Notes |
|---|---|---|
| TQL | ⚠️ | Fixed window buckets are epoch-anchored; no custom origin or calendar alignment. |
| KQL | ✅ | Use bin_at(). |
| SPL | ✅ | Use aligntime and calendar spans. |
| Cribl Stream | ⚠️ | Fixed Time window buckets; no documented custom origin or calendar alignment. |
Legend: ✅ native or covered, ⚠️ partial or workaround, ❌ missing.
Cribl Stream’s Aggregations Function runs independently per Worker Process, so aggregate state is not shared globally across workers. Scheduled Collector jobs can run on a recurrence, but they schedule collection work rather than provide a general pipeline Function block.
Statistical functions
Section titled “Statistical functions”Use statistical aggregation functions for deeper analysis.
Percentiles and median
Section titled “Percentiles and median”Calculate distribution statistics with quantile:
from {endpoint: "/api/users", latency: 120}, {endpoint: "/api/users", latency: 135}, {endpoint: "/api/users", latency: 110}, {endpoint: "/api/orders", latency: 245}, {endpoint: "/api/orders", latency: 225}, {endpoint: "/api/orders", latency: 280}summarize p50 = quantile(latency, q=0.5), p90 = quantile(latency, q=0.9), p95 = quantile(latency, q=0.95), endpoint{ p50: 245.0, p90: 280.0, p95: 280.0, endpoint: "/api/orders",}{ p50: 120.0, p90: 135.0, p95: 135.0, endpoint: "/api/users",}Standard deviation and variance
Section titled “Standard deviation and variance”Measure data spread with stddev and
variance:
from {server: "web1", cpu: 45}, {server: "web1", cpu: 52}, {server: "web1", cpu: 48}, {server: "web2", cpu: 85}, {server: "web2", cpu: 92}, {server: "web2", cpu: 88}summarize avg_cpu = mean(cpu), cpu_stddev = stddev(cpu), cpu_variance = variance(cpu), server{ avg_cpu: 48.333333333333336, cpu_stddev: 2.8674417556808622, cpu_variance: 8.222222222222145, server: "web1",}{ avg_cpu: 88.33333333333333, cpu_stddev: 2.8674417556810217, cpu_variance: 8.22222222222306, server: "web2",}Mode and distinct values
Section titled “Mode and distinct values”Find most common values and collect unique values with
mode,
distinct, and
count_if:
from {user: "alice", browser: "chrome", action: "login"}, {user: "bob", browser: "firefox", action: "view"}, {user: "alice", browser: "chrome", action: "edit"}, {user: "charlie", browser: "chrome", action: "login"}, {user: "alice", browser: "safari", action: "login"}summarize most_common_browser = mode(browser), unique_browsers = distinct(browser), login_count = count_if(action, x => x == "login"){ most_common_browser: "chrome", unique_browsers: [ "chrome", "firefox", "safari", ], login_count: 3,}Value frequencies and entropy
Section titled “Value frequencies and entropy”Analyze value distributions with
value_counts
and entropy:
from {category: "A", value: 10}, {category: "B", value: 20}, {category: "A", value: 15}, {category: "B", value: 25}, {category: "C", value: 30}summarize frequencies = value_counts(category), info_entropy = entropy(category){ frequencies: [ { value: "A", count: 2, }, { value: "B", count: 2, }, { value: "C", count: 1, }, ], info_entropy: 1.0549201679861442,}Collecting values
Section titled “Collecting values”Use collect and
distinct to gather values:
from {user: "alice", action: "login", timestamp: 2024-01-15T10:00:00}, {user: "bob", action: "view", timestamp: 2024-01-15T10:01:00}, {user: "alice", action: "edit", timestamp: 2024-01-15T10:02:00}, {user: "charlie", action: "login", timestamp: 2024-01-15T10:03:00}, {user: "alice", action: "logout", timestamp: 2024-01-15T10:04:00}summarize all_actions = collect(action), unique_users = distinct(user), event_count = count(){ all_actions: [ "login", "view", "edit", "login", "logout", ], unique_users: [ "alice", "charlie", "bob", ], event_count: 5,}First and last values
Section titled “First and last values”Get boundary values with first and last:
from {sensor: "temp1", reading: 72, time: 2024-01-15T09:00:00}, {sensor: "temp1", reading: 75, time: 2024-01-15T10:00:00}, {sensor: "temp1", reading: 78, time: 2024-01-15T11:00:00}, {sensor: "temp2", reading: 68, time: 2024-01-15T09:00:00}, {sensor: "temp2", reading: 71, time: 2024-01-15T10:00:00}summarize first_reading = first(reading), last_reading = last(reading), avg_reading = mean(reading), sensor{ first_reading: 72, last_reading: 78, avg_reading: 75.0, sensor: "temp1",}{ first_reading: 68, last_reading: 71, avg_reading: 69.5, sensor: "temp2",}Group and collect values
Section titled “Group and collect values”Use collect with grouping to build hierarchical structures:
from {dept: "Engineering", team: "Backend", member: "Alice"}, {dept: "Engineering", team: "Backend", member: "Bob"}, {dept: "Engineering", team: "Frontend", member: "Charlie"}, {dept: "Sales", team: "Direct", member: "David"}summarize dept, team, members=collect(member){ dept: "Sales", team: "Direct", members: [ "David", ],}{ dept: "Engineering", team: "Frontend", members: [ "Charlie", ],}{ dept: "Engineering", team: "Backend", members: [ "Alice", "Bob", ],}Boolean aggregations
Section titled “Boolean aggregations”Use all and any for boolean checks:
from {test: "unit", passed: true, duration: 45}, {test: "integration", passed: true, duration: 120}, {test: "e2e", passed: false, duration: 300}, {test: "performance", passed: true, duration: 180}summarize all_passed = all(passed), any_failed = any(not passed), total_duration = sum(duration){ all_passed: false, any_failed: true, total_duration: 645}Conditional aggregation
Section titled “Conditional aggregation”Collect values conditionally during aggregation:
from {src_ip: 10.0.0.5, dst_port: 22, bytes: 1024}, {src_ip: 192.168.1.10, dst_port: 80, bytes: 2048}, {src_ip: 10.0.0.5, dst_port: 443, bytes: 4096}, {src_ip: 192.168.1.10, dst_port: 22, bytes: 512}
let $critical_ports = [22, 3389, 5985]
summarize src_ip, total_bytes=sum(bytes), // Collect all unique ports all_ports=collect(dst_port), // Collect with conditional transformation port_types=collect("HIGH" if dst_port in $critical_ports else "LOW"){src_ip: 192.168.1.10, total_bytes: 2560, all_ports: [80, 22], port_types: ["LOW", "HIGH"]}{src_ip: 10.0.0.5, total_bytes: 5120, all_ports: [22, 443], port_types: ["HIGH", "LOW"]}This pattern enables:
- Building risk profiles during aggregation
- Transforming values during collection based on conditions
- Creating categorical metrics from raw data
Practical examples
Section titled “Practical examples”Analyze API response times
Section titled “Analyze API response times”from { requests: [ {endpoint: "/api/users", method: "GET", duration: 45, status: 200}, {endpoint: "/api/users", method: "POST", duration: 120, status: 201}, {endpoint: "/api/orders", method: "GET", duration: 89, status: 200}, {endpoint: "/api/users", method: "GET", duration: 38, status: 200}, {endpoint: "/api/orders", method: "GET", duration: 156, status: 500} ]}unroll requestssummarize endpoint=requests.endpoint, count=count(), avg_duration=mean(requests.duration){ endpoint: "/api/orders", count: 2, avg_duration: 122.5,}{ endpoint: "/api/users", count: 3, avg_duration: 67.66666666666667,}Calculate sales metrics
Section titled “Calculate sales metrics”from { sales: [ {date: "2024-01-01", amount: 1200, region: "North"}, {date: "2024-01-01", amount: 800, region: "South"}, {date: "2024-01-02", amount: 1500, region: "North"}, {date: "2024-01-02", amount: 950, region: "South"}, {date: "2024-01-03", amount: 1100, region: "North"} ]}// Calculate totals by dateunroll salessummarize date=sales.date, total=sum(sales.amount){date: "2024-01-01", total: 2000}{date: "2024-01-02", total: 2450}{date: "2024-01-03", total: 1100}Monitor system health
Section titled “Monitor system health”from { metrics: [ {timestamp: "10:00", cpu: 45, memory: 62, disk: 78}, {timestamp: "10:01", cpu: 52, memory: 64, disk: 78}, {timestamp: "10:02", cpu: 89, memory: 71, disk: 79}, {timestamp: "10:03", cpu: 67, memory: 68, disk: 79}, {timestamp: "10:04", cpu: 48, memory: 65, disk: 80} ]}cpu_alert = metrics.map(m => m.cpu > 80).any()avg_memory = metrics.map(m => m.memory).mean()disk_trend = metrics.last().disk - metrics.first().diskhealth_summary = { cpu_max: metrics.map(m => m.cpu).max(), memory_avg: avg_memory, disk_growth: disk_trend, critical: cpu_alert}{ metrics: [ ... ], cpu_alert: true, avg_memory: 66.0, disk_trend: 2, health_summary: { cpu_max: 89, memory_avg: 66.0, disk_growth: 2, critical: true, },}Complex aggregations
Section titled “Complex aggregations”Combine multiple aggregation functions for comprehensive analysis:
from {method: "GET", endpoint: "/api/users", status: 200, duration: 45}, {method: "POST", endpoint: "/api/users", status: 201, duration: 120}, {method: "GET", endpoint: "/api/orders", status: 200, duration: 89}, {method: "GET", endpoint: "/api/users", status: 200, duration: 38}, {method: "GET", endpoint: "/api/orders", status: 500, duration: 156}, {method: "DELETE", endpoint: "/api/users/123", status: 204, duration: 67}summarize request_count = count(), avg_duration = mean(duration), error_count = count_if(status, s => s >= 400), unique_endpoints = count_distinct(endpoint), methoderror_rate = error_count / request_count{ request_count: 1, avg_duration: 120.0, error_count: 0, unique_endpoints: 1, method: "POST", error_rate: 0.0,}{ request_count: 1, avg_duration: 67.0, error_count: 0, unique_endpoints: 1, method: "DELETE", error_rate: 0.0,}{ request_count: 4, avg_duration: 82.0, error_count: 1, unique_endpoints: 2, method: "GET", error_rate: 0.25,}Best practices
Section titled “Best practices”- Choose appropriate functions: Use
meanfor averages,medianfor skewed data - Handle empty collections: Check if lists are empty before aggregating
- Consider memory usage: Large collections can consume significant memory
- Combine aggregations: Calculate multiple statistics in one pass for efficiency