
TQL programs compose statements into complete, executable data processing workflows. Valid TQL programs adhere to the following rules, illustrated by the sketch after the list:

  1. Adjacent operators must have matching types: the output type of an operator must be identical to the input type of its successor.
  2. A pipeline must be closed, i.e., begin with void input and end with void output.
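
For illustration, here is a closed pipeline annotated with how the operator types line up. This is a minimal sketch: load_file and discard stand in here for any byte-producing source and any event-consuming sink.

load_file "/tmp/logs.json"  // void -> bytes
read_json                   // bytes -> events
where port == 22            // events -> events
discard                     // events -> void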

You chain statements with either a newline (\n) or a pipe symbol (|). We deliberately offer this choice to cater to two primary styles:

  1. Vertical structuring with newlines for full-text editing
  2. Horizontal inline pipe composition for command-line usage

Prefer the vertical approach for readability in files and documentation. Throughout this documentation, we use only the vertical style for clarity and consistency.

Let’s juxtapose the two styles. Here’s a vertical TQL program:

let $ports = [22, 443]
from_file "/tmp/logs.json"
where port in $ports
select src_ip, dst_ip, bytes
summarize src_ip, total=sum(bytes)

And here is the same program in horizontal form:

let $ports = [22, 443] | from_file "/tmp/logs.json" | where port in $ports | select src_ip, dst_ip, bytes | summarize src_ip, total=sum(bytes)

In theory, you can combine pipes and newlines to write programs that resemble Kusto and similar languages. However, we discourage this practice because it can make the code harder to read and maintain, especially when adding nested pipelines that increase the level of indentation.
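
For reference, a mixed-style rendition of the program above might look like this; we discourage writing TQL this way:

let $ports = [22, 443]
from_file "/tmp/logs.json" | where port in $ports
select src_ip, dst_ip, bytes | summarize src_ip, total=sum(bytes)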

TQL’s diagnostic system gives you insight into what happens during data processing. There are two types of diagnostics:

  1. Errors: Stop pipeline execution immediately (critical failures)
  2. Warnings: Signal data quality issues but continue processing

When a pipeline emits an error, it stops execution. Unless you configured the pipeline to restart on error, it then requires human intervention to resolve the issue and resume execution.

Warnings do not bring the pipeline to a screeching halt. They are useful for identifying potential issues that may impact the quality of the processed data, such as missing or unexpected values.
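
For instance, the following sketch produces a warning rather than an error (assuming a cast function such as int, which yields null when parsing fails):

from {port: "n/a"}
port = int(port) // emits a warning; port becomes null for the unparseable value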

Operators can contain entire subpipelines that execute based on the operator’s semantics. You define subpipelines syntactically within a block of curly braces ({}).

There are three types of subpipelines based on what they expect and produce:

  1. Closed subpipelines (void-to-void): Complete programs that run independently, used by operators like every and subscribe.

  2. Parsing subpipelines (bytes-to-events): Transform raw bytes into structured events, used by input operators like from_file and from_http.

  3. Printing subpipelines (events-to-bytes): Transform structured events into raw bytes, used by output operators like to.

The every operator executes a closed subpipeline at regular intervals:

every 1h {
  from_http "api.example.com"
  select domain, risk
  context::update "domains", key=domain, value=risk
}

Input operators like from_file or from_http that read raw bytes use parsing subpipelines to convert bytes into events. This pattern separates where data comes from (the outer operator) from how it’s parsed (the subpipeline):

from_file "data.log" {
read_lines
}

The subpipeline contains read_* operators that perform the actual parsing. You can also chain bytes-to-bytes transformations like decompress_* before parsing:

from_file "logs.gz" {
decompress_gzip
read_json
}

When the input operator can infer the format automatically (e.g., from the file extension), you can omit the subpipeline:

from_file "data.json" // Automatically uses read_json

Operators that produce events directly, like from_kafka or from_udp, don’t take a parsing subpipeline because the data format is inherent to the source.
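
For instance, a minimal sketch (the topic name logs is an assumption):

from_kafka "logs" // yields structured events directly; no parsing subpipeline
where severity == "critical"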

Comments make implicit choices and assumptions explicit. They have no semantic effect and the compiler ignores them during parsing.

TQL features C-style comments, both single and multi-line.

Use a double slash (//) to comment until the end of the line.

Here’s an example where a comment spans a full line:

// the app only supports lower-case user names
let $user = "jane"

Here’s an example where a comment starts in the middle of a line:

let $users = [
  "jane", // NB: also admin!
  "john", // Been here since day 1.
]

Use a slash-star (/*) to start a multi-line comment and a star-slash (*/) to end it.

Here’s an example where a comment spans multiple lines:

/*
* User validation logic
* ---------------------
* Validate user input against a set of rules.
* If any rule fails, the user is rejected.
* If all rules pass, the user is accepted.
*/
let $user = "jane"

TQL pipelines execute on a streaming engine that processes data incrementally. Understanding the execution model helps you write efficient pipelines and predict performance characteristics.

Key execution principles:

  • Stream processing by default: Data flows through operators as it arrives
  • Lazy evaluation: Operations execute only when data flows through them
  • Back-pressure handling: Automatic flow control prevents memory exhaustion
  • Network transparency: Pipelines can span multiple nodes seamlessly

Understanding operator behavior helps you write efficient pipelines. Operators fall into two categories:

Streaming operators process events incrementally:

  • where: Filters one event at a time
  • select: Transforms fields immediately
  • drop: Removes fields as events flow

Blocking operators need all input before producing output:

  • sort: Must see all events to order them
  • summarize: Aggregates across the entire stream
  • reverse: Needs complete input to reverse order

Efficient: streaming operations first:

from "large_file.json"
where severity == "critical" // Streaming: reduces data early
select relevant_fields // Streaming: drops unnecessary data
sort timestamp // Blocking: but on reduced dataset

Less efficient: blocking operation on full data:

from "large_file.json"
sort timestamp // Blocking: processes everything
where severity == "critical" // Then filters

Understanding when expressions evaluate likewise helps you write efficient pipelines:

// Constants: evaluated once at pipeline start
let $threshold = 1Ki
let $start_time = 2024-01-15T09:00:00 // Would be now() - 1h in real usage
let $config = {
  ports: [80, 443, 8080],
  networks: [10.0.0.0/8, 192.168.0.0/16],
}

// Runtime: evaluated per event
from {bytes: 2Ki, timestamp: 2024-01-15T09:30:00},
     {bytes: 512, timestamp: 2024-01-15T09:45:00},
     {bytes: 3Ki, timestamp: 2024-01-15T10:00:00}
where bytes > $threshold // Constant comparison
where timestamp > $start_time // Constant comparison
current_time = 2024-01-15T10:30:00 // Would be now() in real usage
age = current_time - timestamp // Runtime calculation

TQL pipelines can span network boundaries seamlessly. For example, the import operator implicitly performs a network connection depending on where it runs. If the tenzir binary executes the pipeline, the executor establishes a transparent network connection to the node. If the pipeline runs within a node, the executor passes the data directly to the next operator in the same process.
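
For example, the following pipeline works unchanged in both settings. When the tenzir client binary executes it, import transparently ships the events to the node over the network:

from_file "/tmp/logs.json"
import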
