Skip to content

Filter and select data

Filtering and selecting are fundamental operations when working with data streams. This guide shows you how to filter events based on conditions and select specific fields from your data.

Before we dive in, it’s important to understand a key distinction in TQL:

  • Operators like where, select, and drop work on entire event streams
  • Functions like starts_with() or mathematical comparisons work on individual values within events

You’ll see both in action throughout this guide.

Use the where operator to keep only events that match specific conditions.

Filter events based on a simple condition. This example keeps only successful HTTP requests (status code 200):

from {status: 200, path: "/api/users"},
{status: 404, path: "/api/missing"},
{status: 200, path: "/home"}
where status == 200
{status: 200, path: "/api/users"}
{status: 200, path: "/home"}

The event with status: 404 is filtered out because it doesn’t match our condition.

Use logical operators (and, or, not) to combine multiple conditions:

from {status: 200, path: "/api/users", size: 1024},
{status: 404, path: "/api/missing", size: 512},
{status: 200, path: "/home", size: 2048}
where status == 200 and size > 1000
{status: 200, path: "/api/users", size: 1024}
{status: 200, path: "/home", size: 2048}

You can also use or and not with functions like starts_with():

from {status: 200, path: "/api/users"},
{status: 404, path: "/api/missing"},
{status: 500, path: "/api/error"}
where status == 200 or not path.starts_with("/api/m")
{status: 200, path: "/api/users"}
{status: 500, path: "/api/error"}

Functions work on values to create more sophisticated filters. For example, ends_with() checks string suffixes:

from {user: "alice", email: "alice@example.com"},
{user: "bob", email: "bob@gmail.com"},
{user: "charlie", email: "charlie@example.com"}
where email.ends_with("example.com")
{user: "alice", email: "alice@example.com"}
{user: "charlie", email: "charlie@example.com"}

Match patterns using regular expressions with match_regex():

from {log: "Error: Connection timeout"},
{log: "Info: Request processed"},
{log: "Error: Invalid input"}
where log.match_regex("Error:")
{log: "Error: Connection timeout"}
{log: "Error: Invalid input"}

The select operator lets you pick which fields to keep in your output.

Select only the fields you need:

from {name: "alice", age: 30, city: "NYC"},
{name: "bob", age: 25, city: "SF"}
select name, age
{name: "alice", age: 30}
{name: "bob", age: 25}

You can rename fields while selecting them. This is useful when standardizing field names from different data sources:

from {first_name: "alice", years: 30},
{first_name: "bob", years: 25}
select name=first_name, age=years
{name: "alice", age: 30}
{name: "bob", age: 25}

Here first_name becomes name and years becomes age in the output.

Create new fields with expressions during selection:

from {price: 100, tax_rate: 0.08},
{price: 50, tax_rate: 0.08}
select price, tax=price * tax_rate, total=price * (1 + tax_rate)
{price: 100, tax: 8.0, total: 108.0}
{price: 50, tax: 4.0, total: 54.0}

The drop operator removes specified fields, keeping everything else.

Remove fields you don’t need:

from {user: "alice", password: "secret", email: "alice@example.com"},
{user: "bob", password: "hidden", email: "bob@example.com"}
drop password
{user: "alice", email: "alice@example.com"}
{user: "bob", email: "bob@example.com"}

Remove several fields at once:

from {id: 1, internal_id: "xyz", debug: true, name: "alice"},
{id: 2, internal_id: "abc", debug: false, name: "bob"}
drop internal_id, debug
{id: 1, name: "alice"}
{id: 2, name: "bob"}

Use the set operator to override existing fields and add new fields without removing existing ones.

Add a constant field to all events:

from {user: "alice"},
{user: "bob"}
set source = "api"
{user: "alice", source: "api"}
{user: "bob", source: "api"}

Add fields based on calculations:

from {bytes_sent: 1024, bytes_received: 2048},
{bytes_sent: 512, bytes_received: 1024}
set total_bytes = bytes_sent + bytes_received
{bytes_sent: 1024, bytes_received: 2048, total_bytes: 3072}
{bytes_sent: 512, bytes_received: 1024, total_bytes: 1536}

Use the else word to provide default values for null fields:

from {name: "alice", score: 85},
{name: "bob"},
{name: "charlie", score: 95}
score = score? else 0
{name: "alice", score: 85}
{name: "bob", score: 0}
{name: "charlie", score: 95}

Real-world pipelines often combine multiple operations:

from {method: "GET", path: "/api/users", status: 200, duration_ms: 45},
{method: "POST", path: "/api/users", status: 201, duration_ms: 120},
{method: "GET", path: "/api/users/123", status: 404, duration_ms: 15}
where status >= 200 and status < 300
select method, path, duration = duration_ms.milliseconds()
{method: "GET", path: "/api/users", duration: 45ms}
{method: "POST", path: "/api/users", duration: 120ms}
  1. Filter early: Apply where conditions as early as possible to reduce the amount of data flowing through your pipeline.

  2. Select only what you need: Use select to keep only necessary fields, especially when dealing with large events.

  3. Choose the right operator:

    • Use select when you want to restrict the output to specific fields.
    • Use drop when you want to remove a few fields from many.
    • Use set when you want to add/override fields without changing existing ones.
  4. Understand null handling: The where operator skips events where the condition evaluates to null or false. Use the question mark operator (?) to suppress warnings when accessing fields that may not exist.

Last updated: