
String manipulation is essential for cleaning, formatting, and transforming text data. This guide covers TQL’s comprehensive string functions, from simple case changes to complex pattern matching and encoding operations.

Transform strings to different cases for consistency and formatting:

from {name: "john smith", title: "data ENGINEER", code: "xyz-123"}
lower_name = name.to_lower()
upper_code = code.to_upper()
title_case = title.to_title()
cap_name = name.capitalize()
{
name: "john smith",
title: "data ENGINEER",
code: "xyz-123",
lower_name: "john smith",
upper_code: "XYZ-123",
title_case: "Data Engineer",
cap_name: "John smith",
}

Functions explained:

  • to_lower() - converts every character to lowercase
  • to_upper() - converts every character to uppercase
  • to_title() - capitalizes the first letter of each word
  • capitalize() - uppercases only the first character

Clean up strings by removing unwanted whitespace:

from {
raw: " hello world ",
prefix: "\t\tdata",
suffix: "value \n"
}
trimmed = raw.trim()
no_prefix = prefix.trim_start()
no_suffix = suffix.trim_end()
{
raw: " hello world ",
prefix: "\t\tdata",
suffix: "value \n",
trimmed: "hello world",
no_prefix: "data",
no_suffix: "value"
}

Functions:

  • trim() - removes whitespace from both ends
  • trim_start() - removes leading whitespace
  • trim_end() - removes trailing whitespace

Break strings apart and combine them back together:

from {
path: "/home/user/documents/report.pdf",
tags: "security,network,alert"
}
parts = path.split("/")
tag_list = tags.split(",")
rejoined = parts.join("-")
{
path: "/home/user/documents/report.pdf",
tags: "security,network,alert",
parts: [
"",
"home",
"user",
"documents",
"report.pdf",
],
tag_list: [
"security",
"network",
"alert",
],
rejoined: "-home-user-documents-report.pdf",
}

Use split_regex() for complex splitting:

from {text: "error:42|warning:7|info:125"}
entries = text.split_regex("[:|]")
{
text: "error:42|warning:7|info:125",
entries: [
"error",
"42",
"warning",
"7",
"info",
"125",
],
}

Replace specific text or patterns within strings:

from {
log: "User 192.168.1.1 accessed /admin",
template: "Hello {name}, welcome to {place}"
}
masked = log.replace("192.168.1.1", "xxx.xxx.xxx.xxx")
filled = template.replace("{name}", "Alice").replace("{place}", "Tenzir")
{
log: "User 192.168.1.1 accessed /admin",
template: "Hello {name}, welcome to {place}",
masked: "User xxx.xxx.xxx.xxx accessed /admin",
filled: "Hello Alice, welcome to Tenzir",
}

Use replace_regex() for complex replacements:

from {
text: "Contact us at 555-1234 or 555-5678",
log: "Error at 2024-01-15 10:30:45: Connection failed"
}
redacted = text.replace_regex("\\d{3}-\\d{4}", "XXX-XXXX")
simple_log = log.replace_regex(
"\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}",
"TIMESTAMP"
)
{
text: "Contact us at 555-1234 or 555-5678",
log: "Error at 2024-01-15 10:30:45: Connection failed",
redacted: "Contact us at XXX-XXXX or XXX-XXXX",
simple_log: "Error at TIMESTAMP: Connection failed",
}

Check if strings match specific patterns:

from {
email: "alice@example.com",
url: "https://tenzir.com",
file: "report_2024.pdf"
}
is_email = email.match_regex("^[^@]+@[^@]+\\.[^@]+$")
is_https = url.starts_with("https://")
is_pdf = file.ends_with(".pdf")
{
email: "alice@example.com",
url: "https://tenzir.com",
file: "report_2024.pdf",
is_email: true,
is_https: true,
is_pdf: true,
}

Pattern matching functions:

  • match_regex() - tests whether a string matches a regular expression
  • starts_with() - checks for a given prefix
  • ends_with() - checks for a given suffix

Check what type of characters a string contains:

from {
id: "12345",
name: "Alice",
code: "abc123",
mixed: "Hello World!",
spaces: "hello world"
}
id_numeric = id.is_numeric()
name_alpha = name.is_alpha()
code_alnum = code.is_alnum()
mixed_alpha = mixed.is_alpha()
has_lower = spaces.is_lower()
has_upper = name.is_title()
{
id: "12345",
name: "Alice",
code: "abc123",
mixed: "Hello World!",
spaces: "hello world",
id_numeric: true,
name_alpha: true,
code_alnum: true,
mixed_alpha: false,
has_lower: true,
has_upper: true,
}

Validation functions:

  • is_numeric() - true if the string contains only digits
  • is_alpha() - true if the string contains only letters
  • is_alnum() - true if the string contains only letters and digits
  • is_lower() - true if the string contains no uppercase characters
  • is_title() - true if the string is in title case

Get information about string characteristics:

from {
text: "Hello 世界",
emoji: "👋 Hello!",
path: "/var/log/system.log"
}
char_count = text.length_chars()
byte_count = text.length_bytes()
reversed = emoji.reverse()
filename = path.file_name()
directory = path.parent_dir()
{
text: "Hello 世界",
emoji: "👋 Hello!",
path: "/var/log/system.log",
char_count: 8,
byte_count: 12,
reversed: "!olleH 👋",
filename: "system.log",
directory: "/var/log",
}

String property functions:

  • length_chars() - counts Unicode characters
  • length_bytes() - counts bytes in the encoded string
  • reverse() - reverses the character order
  • file_name() - extracts the final component of a path
  • parent_dir() - extracts the directory portion of a path

Use slice() to extract portions of strings:

from {
text: "Hello, World!",
id: "USER-12345-ACTIVE",
timestamp: "2024-01-15T10:30:45"
}
greeting = text.slice(begin=0, end=5)
user_num = id.slice(begin=5, end=10)
date_part = timestamp.slice(begin=0, end=10)
status = id.slice(begin=11)
{
text: "Hello, World!",
id: "USER-12345-ACTIVE",
timestamp: "2024-01-15T10:30:45",
greeting: "Hello",
user_num: "12345",
date_part: "2024-01-15",
status: "ACTIVE",
}

The slice() function parameters:

  • begin - Starting position (0-based, negative counts from end)
  • end - Ending position (exclusive, optional)
  • stride - Step between characters (optional, can be negative)
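For example, a negative begin selects from the end of the string, and a stride greater than one skips characters. This is a small sketch based on the parameter descriptions above; the shown outputs assume conventional slice semantics:

from {text: "Hello, World!"}
tail = text.slice(begin=-6)
every_other = text.slice(begin=0, end=5, stride=2)
{
text: "Hello, World!",
tail: "World!",
every_other: "Hlo",
}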

Transform strings between different encodings:

from {secret: "my-api-key-12345"}
encoded = secret.encode_base64()
decoded = encoded.decode_base64()
{
secret: "my-api-key-12345",
encoded: "bXktYXBpLWtleS0xMjM0NQ==",
decoded: b"my-api-key-12345",
}

Use encode_hex() and decode_hex() for hexadecimal encoding:

from {data: "Hello", hex_string: "48656c6c6f"}
hex = data.encode_hex()
decoded = hex.decode_hex()
decoded_blob = hex_string.decode_hex()
{
data: "Hello",
hex_string: "48656c6c6f",
hex: "48656C6C6F",
decoded: b"Hello",
decoded_blob: b"Hello",
}
Use encode_url() and decode_url() for URL encoding:

from {query: "search term with spaces & special=characters"}
encoded = query.encode_url()
decoded = encoded.decode_url()
{
query: "search term with spaces & special=characters",
encoded: "search%20term%20with%20spaces%20%26%20special%3Dcharacters",
decoded: b"search term with spaces & special=characters",
}

Encoding functions:

  • encode_base64() / decode_base64() - Base64 encoding
  • encode_hex() / decode_hex() - hexadecimal encoding
  • encode_url() / decode_url() - URL (percent) encoding

Note that the decode functions return blobs (b"...") rather than strings.

Add characters to reach a specific length:

from {
id: "42",
code: "ABC"
}
padded_id = id.pad_start(5, "0")
padded_code = code.pad_end(10, "-")
{
id: "42",
code: "ABC",
padded_id: "00042",
padded_code: "ABC-------"
}

Padding functions:

  • pad_start() - pads the beginning until the string reaches the given length
  • pad_end() - pads the end until the string reaches the given length

Access text from files during processing:

from {}
hostname = file_contents("/etc/hostname")
{
hostname: "my-server\n",
}

The file_contents() function reads the entire file as a string. The file path must be a constant expression. Use with caution on large files.
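Since files like /etc/hostname typically end with a newline, it is common to strip it immediately. A short sketch combining file_contents() with trim():

from {}
hostname = file_contents("/etc/hostname").trim()
{
hostname: "my-server",
}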

Combine functions to clean and normalize user input:

from {
user_input: " JOHN.SMITH@EXAMPLE.COM ",
phone: "(555) 123-4567"
}
email = user_input.trim().to_lower()
clean_phone = phone.replace_regex("[^0-9]", "")
{
user_input: " JOHN.SMITH@EXAMPLE.COM ",
phone: "(555) 123-4567",
email: "john.smith@example.com",
clean_phone: "5551234567"
}
Extract and validate substrings from structured text:

from {
log: "User ID: ABC-123-XYZ performed action",
url: "https://api.example.com/v2/users/42"
}
user_id = log.split("User ID: ")[1].split(" ")[0]
valid_id = user_id.match_regex("^[A-Z]{3}-\\d{3}-[A-Z]{3}$")
api_version = url.split("/")[3]
user_num = url.split("/").last()
{
log: "User ID: ABC-123-XYZ performed action",
url: "https://api.example.com/v2/users/42",
user_id: "ABC-123-XYZ",
valid_id: true,
api_version: "v2",
user_num: "42",
}
Build new strings by combining fields:

from {
first: "alice",
last: "smith",
dept: "engineering",
id: 42
}
full_name = first.capitalize() + " " + last.to_upper()
email = first + "." + last + "@company.com"
badge = dept.to_upper().slice(begin=0, end=3) + "-" + id.string()
{
first: "alice",
last: "smith",
dept: "engineering",
id: 42,
full_name: "Alice SMITH",
email: "alice.smith@company.com",
badge: "ENG-42",
}

Create checksums and identifiers using hash functions:

from {data: "Hello, World!"}
md5 = data.hash_md5()
sha1 = data.hash_sha1()
sha256 = data.hash_sha256()
xxh3 = data.hash_xxh3()
{
data: "Hello, World!",
md5: "65a8e27d8879283831b664bd8b7f0ad4",
sha1: "0a0a9f2a6772942557ab5355d76af442f8f65e01",
sha256: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f",
xxh3: "60415d5f616602aa",
}

Hash functions:

  • hash_md5() - MD5 digest
  • hash_sha1() - SHA-1 digest
  • hash_sha256() - SHA-256 digest
  • hash_xxh3() - fast, non-cryptographic XXH3 hash

Use hashes to generate identifiers from multiple fields:

from {
user_id: "alice123",
timestamp: "2024-01-15T10:30:00",
action: "login"
}
event_id = f"{user_id}-{timestamp}-{action}".hash_sha256().slice(begin=0, end=16)
short_hash = f"{user_id}{action}".hash_md5().slice(begin=0, end=8)
numeric_id = user_id.hash_xxh3()
{
user_id: "alice123",
timestamp: "2024-01-15T10:30:00",
action: "login",
event_id: "d5f456083b8fee43",
short_hash: "1616f7f2",
numeric_id: "ac6dfe13bd512d81",
}
Compare checksums to verify data integrity:

from {
file_content: "Important document content here...",
expected_checksum: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"
}
actual_checksum = file_content.hash_sha256()
valid = actual_checksum == expected_checksum
{
file_content: "Important document content here...",
expected_checksum: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f",
actual_checksum: "25f898ef7be64ead26e775e41778c6b5b5e5fe135d1b6658b6a27f9334c4f085",
valid: false,
}

Process network data with specialized security functions:

Use community_id() to create standardized flow hashes:

from {
src_ip: 192.168.1.100,
dst_ip: 10.0.0.1,
src_port: 54321,
dst_port: 443,
proto: "tcp"
}
flow_id = community_id(
src_ip=src_ip,
dst_ip=dst_ip,
src_port=src_port,
dst_port=dst_port,
proto=proto
)
{
src_ip: 192.168.1.100,
dst_ip: 10.0.0.1,
src_port: 54321,
dst_port: 443,
proto: "tcp",
flow_id: "1:ZSU9hCO1tdr7pj3SCLkQ0XS3uvI=",
}

Use encrypt_cryptopan() for consistent IP anonymization:

from {
client_ip: 192.168.1.100,
server_ip: 8.8.8.8,
internal_ip: 10.0.0.5
}
anon_client = client_ip.encrypt_cryptopan(seed="mysecretkey12345")
anon_server = server_ip.encrypt_cryptopan(seed="mysecretkey12345")
anon_internal = internal_ip.encrypt_cryptopan(seed="mysecretkey12345")
{
client_ip: 192.168.1.100,
server_ip: 8.8.8.8,
internal_ip: 10.0.0.5,
anon_client: 206.216.1.132,
anon_server: 110.0.51.203,
anon_internal: 109.255.195.194,
}
Best practices:

  1. Chain operations efficiently: Combine multiple string operations in one expression
  2. Validate before transforming: Check string content before applying operations
  3. Handle edge cases: Empty strings, null values, and special characters
  4. Use appropriate functions: Choose length_chars() vs length_bytes() based on needs
  5. Be mindful of encoding: Ensure correct encoding when dealing with international text
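These practices combine naturally: trim first, validate, then transform, all in one chained pipeline. A sketch using only functions shown earlier (the input value is illustrative):

from {raw_email: "  Alice@Example.COM "}
email = raw_email.trim().to_lower()
valid = email.match_regex("^[^@]+@[^@]+\\.[^@]+$")
{
raw_email: "  Alice@Example.COM ",
email: "alice@example.com",
valid: true,
}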
