Manipulate strings

String manipulation is essential for cleaning, formatting, and transforming text data. This guide covers TQL’s comprehensive string functions, from simple case changes to complex pattern matching and encoding operations.

Transform strings to different cases for consistency and formatting:

from {name: "john smith", title: "data ENGINEER", code: "xyz-123"}
lower_name = name.to_lower()
upper_code = code.to_upper()
title_case = title.to_title()
cap_name = name.capitalize()
{
name: "john smith",
title: "data ENGINEER",
code: "xyz-123",
lower_name: "john smith",
upper_code: "XYZ-123",
title_case: "Data Engineer",
cap_name: "John smith",
}

Functions explained:

  • to_lower() - converts all characters to lowercase
  • to_upper() - converts all characters to uppercase
  • to_title() - capitalizes the first letter of each word
  • capitalize() - uppercases only the first character of the string

Clean up strings by removing unwanted whitespace:

from {
raw: " hello world ",
prefix: "\t\tdata",
suffix: "value \n"
}
trimmed = raw.trim()
no_prefix = prefix.trim_start()
no_suffix = suffix.trim_end()
{
raw: " hello world ",
prefix: "\t\tdata",
suffix: "value \n",
trimmed: "hello world",
no_prefix: "data",
no_suffix: "value"
}

Functions:

  • trim() - removes whitespace from both ends
  • trim_start() - removes leading whitespace
  • trim_end() - removes trailing whitespace

Break strings apart and combine them back together:

from {
path: "/home/user/documents/report.pdf",
tags: "security,network,alert"
}
parts = path.split("/")
tag_list = tags.split(",")
rejoined = parts.join("-")
{
path: "/home/user/documents/report.pdf",
tags: "security,network,alert",
parts: [
"",
"home",
"user",
"documents",
"report.pdf",
],
tag_list: [
"security",
"network",
"alert",
],
rejoined: "-home-user-documents-report.pdf",
}

Use split_regex() for complex splitting:

from {text: "error:42|warning:7|info:125"}
entries = text.split_regex("[:|]")
{
text: "error:42|warning:7|info:125",
entries: [
"error",
"42",
"warning",
"7",
"info",
"125",
],
}

Replace specific text or patterns within strings:

from {
log: "User 192.168.1.1 accessed /admin",
template: "Hello {name}, welcome to {place}"
}
masked = log.replace("192.168.1.1", "xxx.xxx.xxx.xxx")
filled = template.replace("{name}", "Alice").replace("{place}", "Tenzir")
{
log: "User 192.168.1.1 accessed /admin",
template: "Hello {name}, welcome to {place}",
masked: "User xxx.xxx.xxx.xxx accessed /admin",
filled: "Hello Alice, welcome to Tenzir",
}

Use replace_regex() for complex replacements:

from {
text: "Contact us at 555-1234 or 555-5678",
log: "Error at 2024-01-15 10:30:45: Connection failed"
}
redacted = text.replace_regex("\\d{3}-\\d{4}", "XXX-XXXX")
simple_log = log.replace_regex(
"\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}",
"TIMESTAMP"
)
{
text: "Contact us at 555-1234 or 555-5678",
log: "Error at 2024-01-15 10:30:45: Connection failed",
redacted: "Contact us at XXX-XXXX or XXX-XXXX",
simple_log: "Error at TIMESTAMP: Connection failed",
}

Check if strings match specific patterns:

from {
email: "alice@example.com",
url: "https://tenzir.com",
file: "report_2024.pdf"
}
is_email = email.match_regex("^[^@]+@[^@]+\\.[^@]+$")
is_https = url.starts_with("https://")
is_pdf = file.ends_with(".pdf")
{
email: "alice@example.com",
url: "https://tenzir.com",
file: "report_2024.pdf",
is_email: true,
is_https: true,
is_pdf: true,
}

Pattern matching functions:

  • match_regex() - tests whether a string matches a regular expression
  • starts_with() - checks for a given prefix
  • ends_with() - checks for a given suffix

Check what type of characters a string contains:

from {
id: "12345",
name: "Alice",
code: "abc123",
mixed: "Hello World!",
spaces: "hello world"
}
id_numeric = id.is_numeric()
name_alpha = name.is_alpha()
code_alnum = code.is_alnum()
mixed_alpha = mixed.is_alpha()
spaces_lower = spaces.is_lower()
name_title = name.is_title()
{
id: "12345",
name: "Alice",
code: "abc123",
mixed: "Hello World!",
spaces: "hello world",
id_numeric: true,
name_alpha: true,
code_alnum: true,
mixed_alpha: false,
spaces_lower: true,
name_title: true,
}

Validation functions:

  • is_numeric() - true if the string contains only digits
  • is_alpha() - true if the string contains only letters
  • is_alnum() - true if the string contains only letters and digits
  • is_lower() - true if the string contains no uppercase letters
  • is_title() - true if the string is in title case

Get information about string characteristics:

from {
text: "Hello 世界",
emoji: "👋 Hello!",
path: "/var/log/system.log"
}
char_count = text.length_chars()
byte_count = text.length_bytes()
reversed = emoji.reverse()
filename = path.file_name()
directory = path.parent_dir()
{
text: "Hello 世界",
emoji: "👋 Hello!",
path: "/var/log/system.log",
char_count: 8,
byte_count: 12,
reversed: "!olleH 👋",
filename: "system.log",
directory: "/var/log",
}

String property functions:

  • length_chars() - number of Unicode characters
  • length_bytes() - number of bytes in the UTF-8 encoding
  • reverse() - reverses the order of characters
  • file_name() - extracts the final component of a path
  • parent_dir() - extracts the enclosing directory of a path

Use slice() to extract portions of strings:

from {
text: "Hello, World!",
id: "USER-12345-ACTIVE",
timestamp: "2024-01-15T10:30:45"
}
greeting = text.slice(begin=0, end=5)
user_num = id.slice(begin=5, end=10)
date_part = timestamp.slice(begin=0, end=10)
status = id.slice(begin=11)
{
text: "Hello, World!",
id: "USER-12345-ACTIVE",
timestamp: "2024-01-15T10:30:45",
greeting: "Hello",
user_num: "12345",
date_part: "2024-01-15",
status: "ACTIVE",
}

The slice() function parameters:

  • begin - Starting position (0-based, negative counts from end)
  • end - Ending position (exclusive, optional)
  • stride - Step between characters (optional, can be negative)
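
The examples above only pass begin and end. As a hedged sketch of the remaining options, and assuming the Python-style semantics implied by the parameter descriptions above (so the exact outputs shown here are illustrative), a negative begin counts from the end of the string and a stride of 2 keeps every second character:

from {text: "Hello, World!"}
// Negative begin: take the last six characters.
last_word = text.slice(begin=-6)
// Stride of 2: keep every second character, starting at position 0.
every_other = text.slice(begin=0, stride=2)
{
text: "Hello, World!",
last_word: "World!",
every_other: "Hlo ol!",
}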

Transform strings between different encodings:

from {secret: "my-api-key-12345"}
encoded = secret.encode_base64()
decoded = encoded.decode_base64()
{
secret: "my-api-key-12345",
encoded: "bXktYXBpLWtleS0xMjM0NQ==",
decoded: b"my-api-key-12345",
}

Use encode_hex() and decode_hex():

from {data: "Hello", hex_string: "48656c6c6f"}
hex = data.encode_hex()
decoded = hex.decode_hex()
decoded_blob = hex_string.decode_hex()
{
data: "Hello",
hex_string: "48656c6c6f",
hex: "48656c6c6f",
decoded: b"Hello",
decoded_blob: b"Hello",
}

Use encode_url() and decode_url() for URL encoding:

from {query: "search term with spaces & special=characters"}
encoded = query.encode_url()
decoded = encoded.decode_url()
{
query: "search term with spaces & special=characters",
encoded: "search%20term%20with%20spaces%20%26%20special%3Dcharacters",
decoded: b"search term with spaces & special=characters",
}

Encoding functions:

  • encode_base64() / decode_base64() - Base64 encoding and decoding
  • encode_hex() / decode_hex() - hexadecimal encoding and decoding
  • encode_url() / decode_url() - URL (percent) encoding and decoding

Note that the decode functions return blobs, shown with the b"..." prefix in the output above.

Add characters to reach a specific length:

from {
id: "42",
code: "ABC"
}
padded_id = id.pad_start(5, "0")
padded_code = code.pad_end(10, "-")
{
id: "42",
code: "ABC",
padded_id: "00042",
padded_code: "ABC-------"
}

Padding functions:

  • pad_start() - pads the start of the string with the given character until it reaches the given length
  • pad_end() - pads the end of the string with the given character until it reaches the given length

Access text from files during processing:

from {}
hostname = file_contents("/etc/hostname")
{
hostname: "my-server\n",
}

The file_contents() function reads the entire file as a string. The file path must be a constant expression. Use with caution on large files.
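
Because the function returns the raw file contents, including any trailing newline, it chains naturally with the functions above. A minimal sketch, assuming the same /etc/hostname contents as in the example:

from {}
// Drop the trailing newline before using the value.
hostname = file_contents("/etc/hostname").trim()
{
hostname: "my-server",
}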

Combine functions to clean and normalize user input:

from {
user_input: " JOHN.SMITH@EXAMPLE.COM ",
phone: "(555) 123-4567"
}
email = user_input.trim().to_lower()
clean_phone = phone.replace_regex("[^0-9]", "")
{
user_input: " JOHN.SMITH@EXAMPLE.COM ",
phone: "(555) 123-4567",
email: "john.smith@example.com",
clean_phone: "5551234567"
}

Extract and validate fields embedded in text:

from {
log: "User ID: ABC-123-XYZ performed action",
url: "https://api.example.com/v2/users/42"
}
user_id = log.split("User ID: ")[1].split(" ")[0]
valid_id = user_id.match_regex("^[A-Z]{3}-\\d{3}-[A-Z]{3}$")
api_version = url.split("/")[3]
user_num = url.split("/").last()
{
log: "User ID: ABC-123-XYZ performed action",
url: "https://api.example.com/v2/users/42",
user_id: "ABC-123-XYZ",
valid_id: true,
api_version: "v2",
user_num: "42"
}

Build new strings by combining fields:

from {
first: "alice",
last: "smith",
dept: "engineering",
id: 42
}
full_name = first.capitalize() + " " + last.to_upper()
email = first + "." + last + "@company.com"
badge = dept.to_upper().slice(begin=0, end=3) + "-" + id.string()
{
first: "alice",
last: "smith",
dept: "engineering",
id: 42,
full_name: "Alice SMITH",
email: "alice.smith@company.com",
badge: "ENG-42",
}

Create checksums and identifiers using hash functions:

from {
data: "Hello, World!",
secret: "my-api-key-123"
}
md5 = data.hash_md5()
sha1 = data.hash_sha1()
sha224 = data.hash_sha224()
sha256 = data.hash_sha256()
sha384 = data.hash_sha384()
sha512 = data.hash_sha512()
xxh3 = data.hash_xxh3()
{
data: "Hello, World!",
secret: "my-api-key-123",
md5: "65a8e27d8879283831b664bd8b7f0ad4",
sha1: "0a0a9f2a6772942557ab5355d76af442f8f65e01",
sha224: "72a23dfa411ba6fde01dbfabf3b00a709c93ebf273dc29e2d8b261ff",
sha256: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f",
sha384: "5485cc9b3365b4305dfb4e8337e0a598a574f8242bf17289e0dd6c20a3cd44a089de16ab4ab308f63e44b1170eb5f515",
sha512: "374d794a95cdcfd8b35993185fef9ba368f160d8daf432d08ba9f1ed1e5abe6cc69291e0fa2fe0006a52570ef18c19def4e617c33ce52ef0a6e5fbe318cb0387",
xxh3: "c7269dc5f8602ca5",
}

Use hashes to generate identifiers from multiple fields:

from {
user_id: "alice123",
timestamp: "2024-01-15T10:30:00",
action: "login"
}
event_id = f"{user_id}-{timestamp}-{action}".hash_sha256().slice(begin=0, end=16)
short_hash = f"{user_id}{action}".hash_md5().slice(begin=0, end=8)
numeric_id = user_id.hash_xxh3()
{
user_id: "alice123",
timestamp: "2024-01-15T10:30:00",
action: "login",
event_id: "d5f456083b8fee43",
short_hash: "1616f7f2",
numeric_id: "ac6dfe13bd512d81",
}

Verify data integrity by comparing checksums:

from {
file_content: "Important document content here...",
expected_checksum: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"
}
actual_checksum = file_content.hash_sha256()
valid = actual_checksum == expected_checksum
{
file_content: "Important document content here...",
expected_checksum: "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f",
actual_checksum: "25f898ef7be64ead26e775e41778c6b5b5e5fe135d1b6658b6a27f9334c4f085",
valid: false,
}

Process network data with specialized security functions:

Use community_id() to create standardized flow hashes:

from {
src_ip: 192.168.1.100,
dst_ip: 10.0.0.1,
src_port: 54321,
dst_port: 443,
proto: "tcp"
}
flow_id = community_id(
src_ip=src_ip,
dst_ip=dst_ip,
src_port=src_port,
dst_port=dst_port,
proto=proto
)
{
src_ip: 192.168.1.100,
dst_ip: 10.0.0.1,
src_port: 54321,
dst_port: 443,
proto: "tcp",
flow_id: "1:ZSU9hCO1tdr7pj3SCLkQ0XS3uvI=",
}

Use encrypt_cryptopan() for consistent IP anonymization:

from {
client_ip: 192.168.1.100,
server_ip: 8.8.8.8,
internal_ip: 10.0.0.5
}
anon_client = client_ip.encrypt_cryptopan(seed="mysecretkey12345")
anon_server = server_ip.encrypt_cryptopan(seed="mysecretkey12345")
anon_internal = internal_ip.encrypt_cryptopan(seed="mysecretkey12345")
{
client_ip: 192.168.1.100,
server_ip: 8.8.8.8,
internal_ip: 10.0.0.5,
anon_client: 206.216.1.132,
anon_server: 110.0.51.203,
anon_internal: 109.255.195.194,
}
Follow these best practices when manipulating strings (a small sketch after the list combines several of them):

  1. Chain operations efficiently: Combine multiple string operations in one expression
  2. Validate before transforming: Check string content before applying operations
  3. Handle edge cases: Empty strings, null values, and special characters
  4. Use appropriate functions: Choose length_chars() vs length_bytes() based on needs
  5. Be mindful of encoding: Ensure correct encoding when dealing with international text
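
The following sketch applies practices 1 and 2: it chains trim() and to_upper() in one expression, validates the result with match_regex(), and keeps only matching events. The field name and the ID pattern are made up for illustration.

from {raw_id: "  abc-123  "}
// Practice 1: chain trimming and case normalization in one expression.
clean_id = raw_id.trim().to_upper()
// Practice 2: validate before relying on the value downstream.
valid = clean_id.match_regex("^[A-Z]{3}-\\d{3}$")
where valid
{
raw_id: "  abc-123  ",
clean_id: "ABC-123",
valid: true,
}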
