
String manipulation is essential for cleaning, formatting, and transforming text data. This guide covers TQL’s comprehensive string functions, from simple case changes to complex pattern matching and encoding operations.

Transform strings to different cases for consistency and formatting:

from {name: "john smith", title: "data ENGINEER", code: "xyz-123"}
lower_name = name.to_lower()
upper_code = code.to_upper()
title_case = title.to_title()
cap_name = name.capitalize()
{
name: "john smith",
title: "data ENGINEER",
code: "xyz-123",
lower_name: "john smith",
upper_code: "XYZ-123",
title_case: "Data Engineer",
cap_name: "John smith",
}

Functions explained:

  • to_lower() - converts every character to lowercase
  • to_upper() - converts every character to uppercase
  • to_title() - capitalizes the first letter of each word
  • capitalize() - uppercases only the first character

Clean up strings by removing unwanted whitespace:

from {
raw: " hello world ",
prefix: "\t\tdata",
suffix: "value \n"
}
trimmed = raw.trim()
no_prefix = prefix.trim_start()
no_suffix = suffix.trim_end()
{
raw: " hello world ",
prefix: "\t\tdata",
suffix: "value \n",
trimmed: "hello world",
no_prefix: "data",
no_suffix: "value"
}

Functions:

  • trim() - removes whitespace from both ends
  • trim_start() - removes leading whitespace
  • trim_end() - removes trailing whitespace

Break strings apart and combine them back together:

from {
path: "/home/user/documents/report.pdf",
tags: "security,network,alert"
}
parts = path.split("/")
tag_list = tags.split(",")
rejoined = parts.join("-")
{
path: "/home/user/documents/report.pdf",
tags: "security,network,alert",
parts: [
"",
"home",
"user",
"documents",
"report.pdf",
],
tag_list: [
"security",
"network",
"alert",
],
rejoined: "-home-user-documents-report.pdf",
}

Use split_regex() for complex splitting:

from {text: "error:42|warning:7|info:125"}
entries = text.split_regex("[:|]")
{
text: "error:42|warning:7|info:125",
entries: [
"error",
"42",
"warning",
"7",
"info",
"125",
],
}

Replace specific text or patterns within strings:

from {
log: "User 192.168.1.1 accessed /admin",
template: "Hello {name}, welcome to {place}"
}
masked = log.replace("192.168.1.1", "xxx.xxx.xxx.xxx")
filled = template.replace("{name}", "Alice").replace("{place}", "Tenzir")
{
log: "User 192.168.1.1 accessed /admin",
template: "Hello {name}, welcome to {place}",
masked: "User xxx.xxx.xxx.xxx accessed /admin",
filled: "Hello Alice, welcome to Tenzir",
}

Use replace_regex() for complex replacements:

from {
text: "Contact us at 555-1234 or 555-5678",
log: "Error at 2024-01-15 10:30:45: Connection failed"
}
redacted = text.replace_regex("\\d{3}-\\d{4}", "XXX-XXXX")
simple_log = log.replace_regex(
"\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}",
"TIMESTAMP"
)
{
text: "Contact us at 555-1234 or 555-5678",
log: "Error at 2024-01-15 10:30:45: Connection failed",
redacted: "Contact us at XXX-XXXX or XXX-XXXX",
simple_log: "Error at TIMESTAMP: Connection failed",
}

Check if strings match specific patterns:

from {
email: "alice@example.com",
url: "https://tenzir.com",
file: "report_2024.pdf"
}
is_email = email.match_regex("^[^@]+@[^@]+\\.[^@]+$")
is_https = url.starts_with("https://")
is_pdf = file.ends_with(".pdf")
{
email: "alice@example.com",
url: "https://tenzir.com",
file: "report_2024.pdf",
is_email: true,
is_https: true,
is_pdf: true,
}

Pattern matching functions:

  • match_regex() - tests whether a string matches a regular expression
  • starts_with() - checks for a given prefix
  • ends_with() - checks for a given suffix

Check what type of characters a string contains:

from {
id: "12345",
name: "Alice",
code: "abc123",
mixed: "Hello World!",
spaces: "hello world"
}
id_numeric = id.is_numeric()
name_alpha = name.is_alpha()
code_alnum = code.is_alnum()
mixed_alpha = mixed.is_alpha()
has_lower = spaces.is_lower()
has_upper = name.is_title()
{
id: "12345",
name: "Alice",
code: "abc123",
mixed: "Hello World!",
spaces: "hello world",
id_numeric: true,
name_alpha: true,
code_alnum: true,
mixed_alpha: false,
has_lower: true,
has_upper: true,
}

Validation functions:

  • is_numeric() - true if the string contains only digits
  • is_alpha() - true if the string contains only letters
  • is_alnum() - true if the string contains only letters and digits
  • is_lower() - true if the string contains no uppercase characters
  • is_title() - true if the string is in title case

Get information about string characteristics:

from {
text: "Hello 世界",
emoji: "👋 Hello!",
path: "/var/log/system.log"
}
char_count = text.length_chars()
byte_count = text.length_bytes()
reversed = emoji.reverse()
filename = path.file_name()
directory = path.parent_dir()
{
text: "Hello 世界",
emoji: "👋 Hello!",
path: "/var/log/system.log",
char_count: 8,
byte_count: 12,
reversed: "!olleH 👋",
filename: "system.log",
directory: "/var/log",
}

String property functions:

  • length_chars() - counts Unicode characters
  • length_bytes() - counts bytes in the encoded string
  • reverse() - reverses the character order
  • file_name() - extracts the final component of a path
  • parent_dir() - extracts the directory portion of a path

Use slice() to extract portions of strings:

from {
text: "Hello, World!",
id: "USER-12345-ACTIVE",
timestamp: "2024-01-15T10:30:45"
}
greeting = text.slice(begin=0, end=5)
user_num = id.slice(begin=5, end=10)
date_part = timestamp.slice(begin=0, end=10)
status = id.slice(begin=11)
{
text: "Hello, World!",
id: "USER-12345-ACTIVE",
timestamp: "2024-01-15T10:30:45",
greeting: "Hello",
user_num: "12345",
date_part: "2024-01-15",
status: "ACTIVE",
}

The slice() function parameters:

  • begin - Starting position (0-based, negative counts from end)
  • end - Ending position (exclusive, optional)
  • stride - Step between characters (optional, can be negative)
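For example, a negative begin selects from the end of the string, and a stride greater than one skips characters. This is a small sketch based on the parameter descriptions above; the shown outputs assume conventional slice semantics:

from {text: "Hello, World!"}
tail = text.slice(begin=-6)
every_other = text.slice(begin=0, end=5, stride=2)
{
text: "Hello, World!",
tail: "World!",
every_other: "Hlo",
}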

Transform strings between different encodings:

from {secret: "my-api-key-12345"}
encoded = secret.encode_base64()
decoded = encoded.decode_base64()
{
secret: "my-api-key-12345",
encoded: "bXktYXBpLWtleS0xMjM0NQ==",
decoded: b"my-api-key-12345",
}

Use encode_hex() and decode_hex() for hexadecimal encoding:

from {data: "Hello", hex_string: "48656c6c6f"}
hex = data.encode_hex()
decoded = hex.decode_hex()
decoded_blob = hex_string.decode_hex()
{
data: "Hello",
hex_string: "48656c6c6f",
hex: "48656C6C6F",
decoded: b"Hello",
decoded_blob: b"Hello",
}
Use encode_url() and decode_url() for URL encoding:

from {query: "search term with spaces & special=characters"}
encoded = query.encode_url()
decoded = encoded.decode_url()
{
query: "search term with spaces & special=characters",
encoded: "search%20term%20with%20spaces%20%26%20special%3Dcharacters",
decoded: b"search term with spaces & special=characters",
}

Encoding functions:

  • encode_base64() / decode_base64() - Base64 encoding
  • encode_hex() / decode_hex() - hexadecimal encoding
  • encode_url() / decode_url() - URL (percent) encoding

Note that the decode functions return blobs (b"...") rather than strings.

Add characters to reach a specific length:

from {
id: "42",
code: "ABC"
}
padded_id = id.pad_start(5, "0")
padded_code = code.pad_end(10, "-")
{
id: "42",
code: "ABC",
padded_id: "00042",
padded_code: "ABC-------"
}

Padding functions:

  • pad_start() - pads the beginning until the string reaches the given length
  • pad_end() - pads the end until the string reaches the given length

Access text from files during processing:

from {}
hostname = file_contents("/etc/hostname")
{
hostname: "my-server\n",
}

The file_contents() function reads the entire file as a string. The file path must be a constant expression. Use with caution on large files.
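Since files like /etc/hostname typically end with a newline, it is common to strip it immediately. A short sketch combining file_contents() with trim():

from {}
hostname = file_contents("/etc/hostname").trim()
{
hostname: "my-server",
}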

Combine functions to clean and normalize user input:

from {
user_input: " JOHN.SMITH@EXAMPLE.COM ",
phone: "(555) 123-4567"
}
email = user_input.trim().to_lower()
clean_phone = phone.replace_regex("[^0-9]", "")
{
user_input: " JOHN.SMITH@EXAMPLE.COM ",
phone: "(555) 123-4567",
email: "john.smith@example.com",
clean_phone: "5551234567"
}
Extract and validate substrings from structured text:

from {
log: "User ID: ABC-123-XYZ performed action",
url: "https://api.example.com/v2/users/42"
}
user_id = log.split("User ID: ")[1].split(" ")[0]
valid_id = user_id.match_regex("^[A-Z]{3}-\\d{3}-[A-Z]{3}$")
api_version = url.split("/")[3]
user_num = url.split("/").last()
{
log: "User ID: ABC-123-XYZ performed action",
url: "https://api.example.com/v2/users/42",
user_id: "ABC-123-XYZ",
valid_id: true,
api_version: "v2",
user_num: "42",
}
Build new strings by combining fields:

from {
first: "alice",
last: "smith",
dept: "engineering",
id: 42
}
full_name = first.capitalize() + " " + last.to_upper()
email = first + "." + last + "@company.com"
badge = dept.to_upper().slice(begin=0, end=3) + "-" + id.string()
{
first: "alice",
last: "smith",
dept: "engineering",
id: 42,
full_name: "Alice SMITH",
email: "alice.smith@company.com",
badge: "ENG-42",
}

Create checksums and identifiers using hash functions:

from {data: "Hello, World!"}
md5 = data.hash_md5()
sha1 = data.hash_sha1()
sha256 = data.hash_sha256()
xxh3 = data.hash_xxh3()
{
data: "Hello, World!",
md5: "65a8e27d8879283831b664bd8b7f0ad4",
sha1: "0a0a9f2a6772942557ab5355d76af442f8f65e01",
sha256: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f",
xxh3: "60415d5f616602aa",
}

Hash functions:

  • hash_md5() - MD5 digest
  • hash_sha1() - SHA-1 digest
  • hash_sha256() - SHA-256 digest
  • hash_xxh3() - fast, non-cryptographic XXH3 hash

Use hashes to generate identifiers from multiple fields:

from {
user_id: "alice123",
timestamp: "2024-01-15T10:30:00",
action: "login"
}
event_id = f"{user_id}-{timestamp}-{action}".hash_sha256().slice(begin=0, end=16)
short_hash = f"{user_id}{action}".hash_md5().slice(begin=0, end=8)
numeric_id = user_id.hash_xxh3()
{
user_id: "alice123",
timestamp: "2024-01-15T10:30:00",
action: "login",
event_id: "d5f456083b8fee43",
short_hash: "1616f7f2",
numeric_id: "ac6dfe13bd512d81",
}
Compare checksums to verify data integrity:

from {
file_content: "Important document content here...",
expected_checksum: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"
}
actual_checksum = file_content.hash_sha256()
valid = actual_checksum == expected_checksum
{
file_content: "Important document content here...",
expected_checksum: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f",
actual_checksum: "25f898ef7be64ead26e775e41778c6b5b5e5fe135d1b6658b6a27f9334c4f085",
valid: false,
}

Process network data with specialized security functions:

Use community_id() to create standardized flow hashes:

from {
src_ip: 192.168.1.100,
dst_ip: 10.0.0.1,
src_port: 54321,
dst_port: 443,
proto: "tcp"
}
flow_id = community_id(
src_ip=src_ip,
dst_ip=dst_ip,
src_port=src_port,
dst_port=dst_port,
proto=proto
)
{
src_ip: 192.168.1.100,
dst_ip: 10.0.0.1,
src_port: 54321,
dst_port: 443,
proto: "tcp",
flow_id: "1:ZSU9hCO1tdr7pj3SCLkQ0XS3uvI=",
}

Use encrypt_cryptopan() for consistent IP anonymization:

from {
client_ip: 192.168.1.100,
server_ip: 8.8.8.8,
internal_ip: 10.0.0.5
}
anon_client = client_ip.encrypt_cryptopan(seed="mysecretkey12345")
anon_server = server_ip.encrypt_cryptopan(seed="mysecretkey12345")
anon_internal = internal_ip.encrypt_cryptopan(seed="mysecretkey12345")
{
client_ip: 192.168.1.100,
server_ip: 8.8.8.8,
internal_ip: 10.0.0.5,
anon_client: 206.216.1.132,
anon_server: 110.0.51.203,
anon_internal: 109.255.195.194,
}
Best practices:

  1. Chain operations efficiently: Combine multiple string operations in one expression
  2. Validate before transforming: Check string content before applying operations
  3. Handle edge cases: Empty strings, null values, and special characters
  4. Use appropriate functions: Choose length_chars() vs length_bytes() based on needs
  5. Be mindful of encoding: Ensure correct encoding when dealing with international text
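These practices combine naturally: trim first, validate, then transform, all in one chained pipeline. A sketch using only functions shown earlier (the input value is illustrative):

from {raw_email: "  Alice@Example.COM "}
email = raw_email.trim().to_lower()
valid = email.match_regex("^[^@]+@[^@]+\\.[^@]+$")
{
raw_email: "  Alice@Example.COM ",
email: "alice@example.com",
valid: true,
}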
