TQL programs compose statements into complete, executable data processing workflows. Valid TQL programs adhere to the following rules:
- Adjacent operators must have compatible types: the output type of one operator must match the input type of the next.
- A pipeline must be closed, i.e., begin with void input and end with void output.
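For example, here's a minimal sketch of a valid program (the paths and the bytes field are hypothetical): from_file turns void input into events, where maps events to events, and to consumes events and ends in void output, so adjacent operators match and the pipeline is closed:

from_file "/tmp/events.json" // void → events (format inferred)
where bytes > 0              // events → events
to "/tmp/filtered.json"      // events → void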
Statement chaining
You chain statements with either a newline (\n) or a pipe symbol (|). We purposefully offer this choice to cater to two primary styles:
- Vertical structuring with newlines for full-text editing
- Horizontal inline pipe composition for command-line usage
Prefer the vertical approach for readability in files and documentation; throughout this documentation, we use the vertical style exclusively for clarity and consistency.
Let’s juxtapose the two styles. Here’s a vertical TQL program:
let $ports = [22, 443]
from_file "/tmp/logs.json"
where port in $ports
select src_ip, dst_ip, bytes
summarize src_ip, total=sum(bytes)

And here's a horizontal one:

let $ports = [22, 443] | from_file "/tmp/logs.json" | where port in $ports | select src_ip, dst_ip, bytes | summarize src_ip, total=sum(bytes)

In theory, you can combine pipes and newlines to write programs that resemble Kusto and similar languages. However, we discourage this practice because it can make the code harder to read and maintain, especially when adding nested pipelines that increase the level of indentation.
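For illustration, here's the same program mixing both separators; nothing forbids it, but it reads worse than either pure style:

let $ports = [22, 443]
from_file "/tmp/logs.json"
where port in $ports | select src_ip, dst_ip, bytes
summarize src_ip, total=sum(bytes)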
Diagnostics
TQL's diagnostic system is designed to give you insights into what happens during data processing. There are two types of diagnostics:
- Errors: Stop pipeline execution immediately (critical failures)
- Warnings: Signal data quality issues but continue processing
When a pipeline emits an error, it stops execution. Unless you have configured the pipeline to restart on error, resuming requires human intervention to resolve the underlying issue.
Warnings do not cause a screeching halt of the pipeline. They are useful for identifying potential issues that may impact the quality of the processed data, such as missing or unexpected values.
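For example, a failed type conversion is the kind of data quality issue that typically surfaces as a warning: the pipeline keeps running, and the affected value becomes null. A minimal sketch (the exact diagnostic text varies by operator):

from {port: "not-a-number"}
port = int(port) // conversion fails: emits a warning, port becomes null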
Pipeline nesting
Operators can contain entire subpipelines that execute based on the operator's semantics. You define subpipelines syntactically within a block of curly braces ({}).
There are three types of subpipelines based on what they expect and produce:
- Closed subpipelines (void-to-void): Complete programs that run independently, used by operators like every and subscribe.
- Parsing subpipelines (bytes-to-events): Transform raw bytes into structured events, used by input operators like from_file and from_http.
- Printing subpipelines (events-to-bytes): Transform structured events into raw bytes, used by output operators like to.
Closed subpipelines
The every operator executes a closed subpipeline at regular intervals:

every 1h {
  from_http "api.example.com"
  select domain, risk
  context::update "domains", key=domain, value=risk
}

Parsing subpipelines
Input operators like from_file or from_http that read raw bytes use parsing subpipelines to convert bytes into events. This pattern separates where data comes from (the outer operator) from how it's parsed (the subpipeline):
from_file "data.log" { read_lines}The subpipeline contains read_* operators that perform the actual parsing. You
can also chain bytes-to-bytes transformations like decompress_* before
parsing:
from_file "logs.gz" { decompress_gzip read_json}When the input operator can infer the format automatically (e.g., from the file extension), you can omit the subpipeline:
from_file "data.json" // Automatically uses read_jsonOperators that produce events directly, like
from_kafka or
from_udp, don’t take a parsing subpipeline
because the data format is inherent to the source.
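Printing subpipelines mirror this pattern on the output side: write_* operators inside the block turn events back into bytes for the outer operator to deliver. A minimal sketch, assuming write_json and a hypothetical output path:

to "/tmp/output.json" {
  write_json
}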
Comments
Comments make implicit choices and assumptions explicit. They have no semantic effect and the compiler ignores them during parsing.
TQL features C-style comments, both single and multi-line.
Single-line comments
Use a double slash (//) to comment until the end of the line.
Here’s an example where a comment spans a full line:
// the app only supports lower-case user names
let $user = "jane"

Here's an example where a comment starts in the middle of a line:

let $users = [
  "jane", // NB: also admin!
  "john", // Been here since day 1.
]

Multi-line comments
Use a slash-star (/*) to start a multi-line comment and a star-slash (*/) to end it.
Here’s an example where a comment spans multiple lines:
/*
 * User validation logic
 * ---------------------
 * Validate user input against a set of rules.
 * If any rule fails, the user is rejected.
 * If all rules pass, the user is accepted.
 */
let $user = "jane"

Execution model
TQL pipelines execute on a streaming engine that processes data incrementally. Understanding the execution model helps you write efficient pipelines and predict performance characteristics.
Key execution principles:
- Stream processing by default: Data flows through operators as it arrives
- Lazy evaluation: Operations execute only when data flows through them
- Back-pressure handling: Automatic flow control prevents memory exhaustion
- Network transparency: Pipelines can span multiple nodes seamlessly
Streaming vs blocking
Understanding operator behavior helps you write efficient pipelines:
Streaming operators process events incrementally:
- where: Filters one event at a time
- select: Transforms fields immediately
- drop: Removes fields as events flow
Blocking operators need all input before producing output:
- sort: Must see all events to order them
- summarize: Aggregates across the entire stream
- reverse: Needs complete input to reverse order
Efficient: streaming operations first:

from "large_file.json"
where severity == "critical" // Streaming: reduces data early
select relevant_fields       // Streaming: drops unnecessary data
sort timestamp               // Blocking: but on reduced dataset

Less efficient: blocking operation on full data:

from "large_file.json"
sort timestamp               // Blocking: processes everything
where severity == "critical" // Then filters

Constant vs runtime evaluation
Understanding when expressions evaluate helps you write efficient pipelines:
let $threshold = 1Ki
let $start_time = 2024-01-15T09:00:00 // Would be now() - 1h in real usage
let $config = {
  ports: [80, 443, 8080],
  networks: [10.0.0.0/8, 192.168.0.0/16],
}

// Runtime: evaluated per event
from {bytes: 2Ki, timestamp: 2024-01-15T09:30:00},
  {bytes: 512, timestamp: 2024-01-15T09:45:00},
  {bytes: 3Ki, timestamp: 2024-01-15T10:00:00}
where bytes > $threshold           // Constant comparison
where timestamp > $start_time      // Constant comparison
current_time = 2024-01-15T10:30:00 // Would be now() in real usage
age = current_time - timestamp     // Runtime calculation

Network transparency
TQL pipelines can span network boundaries seamlessly. For example, the import operator implicitly establishes a network connection based on where it runs. If the tenzir binary executes the pipeline, the executor establishes a transparent network connection to the node. If the pipeline runs within a node, the executor passes the data directly to the next operator in the same process.
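As a minimal sketch (with a hypothetical path), consider importing local events into a node:

from_file "/tmp/events.json"
import

When the tenzir client binary runs this pipeline, the executor transparently ships the events over the network to the node before import stores them. When the node itself runs the pipeline, the events never leave the process.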