Create a Tenzir package that parses raw log data into structured events.
Prerequisites: Read these pages before starting:
- /tutorials/write-a-package: Package structure, operators, testing patterns
- /explanations/packages: Package concepts
- /reference/test-framework: Test framework reference
Execute the phases below in order. Do not skip phases.
Phase 1: Input Schema Analysis
Objective: Learn about the input data and understand its structure.
Steps:
- Ask the user to provide sample log data (file path or pasted content)
- Identify the data source format (CSV, JSON, YAML, syslog, etc.)
- Identify the vendor and product that likely generated this data
- Document the complete input schema: every field and its type
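A first guess at the format can come from the first non-empty line of the sample. The function below is a hypothetical heuristic (guess_format is not part of Tenzir): it checks a few common line shapes in order and prints a label. Treat the result as a hint to confirm by reading the data, because formats overlap. For example, a CSV field can contain "=", and syslog files written to disk often lack the <PRI> prefix this check looks for.

```sh
# Hypothetical helper: guess_format <file>
# Prints a rough format label based on the first non-empty line of <file>.
# Runs in a subshell so `set -eu` does not leak into the caller.
guess_format() (
  set -eu
  # awk prints the first line containing at least one non-blank field, then stops.
  first=$(awk 'NF { print; exit }' "$1")
  case $first in
    '{'*) echo json ;;              # one JSON object per line
    '<'[0-9]*'>'*) echo syslog ;;   # <PRI> prefix as in RFC 3164/5424
    *=*) echo key-value ;;          # e.g., date=2024-01-01 level=info
    *,*) echo csv ;;                # comma-separated fields
    *) echo unknown ;;
  esac
)
```

For example, guess_format ./sample.log prints a single label such as key-value, which then guides which parsing functions to reach for in Phase 3.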
Completion: State “Phase 1 complete” before proceeding.
Phase 2: Package Scaffolding
Objective: Create the package structure for iterative development.
Steps:
- Confirm the package ID with the user (typically the vendor name, e.g., fortinet, cisco, microsoft). In the instructions below, replace <pkg> with the chosen package ID.
- Create the package directory structure as described in the Write a Package tutorial. Include operators/parse.tql, tests/parse.tql, and tests/inputs/sample.txt.
- Create package.yaml with the package metadata.
- Create the initial parse operator in operators/parse.tql with just read_lines as a starting point.
- Create a test file tests/parse.tql that reads from the input:
  from_file f"{env("TENZIR_INPUTS")}/sample.txt" {<pkg>::parse}
- Save the sample log data to tests/inputs/sample.txt.
- Create the initial baseline:
  uvx tenzir-test --root <pkg> -u --summary
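The scaffolding steps above can be sketched as a small POSIX shell function. The name scaffold_package is hypothetical, and so are the package.yaml field names (id, name, description): they are an assumption for illustration, and the Write a Package tutorial remains the authority on the metadata schema. The operator and test file contents mirror the steps listed above.

```sh
# Hypothetical helper: scaffold_package <pkg> <sample-file>
# Creates the Phase 2 package skeleton. Runs in a subshell so `set -eu`
# and the local variables do not leak into the caller.
scaffold_package() (
  set -eu
  pkg=$1
  sample=$2

  mkdir -p "$pkg/operators" "$pkg/tests/inputs"

  # Package metadata. These field names are an assumption; check the
  # Write a Package tutorial for the authoritative schema.
  cat > "$pkg/package.yaml" <<EOF
id: $pkg
name: $pkg
description: Parses raw $pkg logs into structured events.
EOF

  # Initial parse operator: one event per input line.
  printf 'read_lines\n' > "$pkg/operators/parse.tql"

  # Test pipeline: read the sample input and run it through <pkg>::parse.
  cat > "$pkg/tests/parse.tql" <<EOF
from_file f"{env("TENZIR_INPUTS")}/sample.txt" {${pkg}::parse}
EOF

  # The sample log data provided by the user becomes the test input.
  cp "$sample" "$pkg/tests/inputs/sample.txt"
)
```

For example, scaffold_package fortinet ./sample.log, followed by uvx tenzir-test --root fortinet -u --summary, produces the package and its initial baseline, where ./sample.log stands for whatever file the user provided.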
Completion: State “Phase 2 complete” before proceeding.
Phase 3: Iterate and Test
Objective: Refine the parser until all fields are parsed and properly typed.
Prerequisites: Read these guides for transformation patterns:
- /guides/data-shaping/transform-basic-values: Type conversion, null handling, sentinel values
- /guides/data-shaping/extract-structured-data-from-text: Nested structures, delimited data
- /guides/data-shaping/manipulate-strings: String cleanup, splitting, extraction
Loop until all fields are properly parsed:
1. Make ONE modification to the parse operator. Work through these categories:
   - Type conversion: Parse timestamps, IPs, and subnets (see transform-basic-values).
   - Structure extraction: Parse nested JSON, CSV, and key-value pairs (see extract-structured-data-from-text).
   - Data cleaning: Normalize sentinel values to null, trim whitespace, and extract substrings (see transform-basic-values and manipulate-strings).
2. Observe the impact of your change by re-running:
   uvx tenzir-test --root <pkg> --summary
3. If the diff against the baseline looks correct, update the baseline:
   uvx tenzir-test --root <pkg> -u --summary
4. Go back to step 1 and continue with the next modification.
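As an illustration of the loop, here is what operators/parse.tql might look like after four passes, one change per pass, for a hypothetical input where each line is a JSON object with string fields ts, src_ip, and user (with "-" meaning no user). The field names, the package ID acme, and the sample shape are all assumptions. The TQL functions used (parse_json, time, ip) and the if statement are assumed from the transformation guides listed above; verify them against your Tenzir version. The file is written through a shell heredoc so it can be pasted directly into a scaffolded package.

```sh
# Placeholder package ID; replace with the one confirmed in Phase 2.
pkg=acme
mkdir -p "$pkg/operators"

# The quoted 'EOF' delimiter stops the shell from expanding anything in the TQL.
cat > "$pkg/operators/parse.tql" <<'EOF'
read_lines
// Pass 1 (structure extraction): each line holds one JSON object.
this = line.parse_json()
// Pass 2 (type conversion): string timestamp to a time value.
ts = time(ts)
// Pass 3 (type conversion): string address to an ip value.
src_ip = ip(src_ip)
// Pass 4 (data cleaning): "-" is a sentinel for "no user".
if user == "-" {
  user = null
}
EOF
```

Each pass corresponds to one trip through steps 1 to 3: rerun uvx tenzir-test --root acme --summary, inspect the diff, and accept it with -u before starting the next pass. Keeping one change per pass makes every baseline diff small enough to review.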
Completion: State “Phase 3 complete” before proceeding.
Phase 4: Summarize
Provide a final summary of the parser’s functionality:
- Input: Description of the input format and source
- TQL: The complete parsing logic
- Output: Description of the parsed schema with types
- Package structure: Tree view of the package
- Noteworthy findings: Any interesting discoveries or caveats
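For the package structure item, the tree utility produces a tree view if it is installed. Where it is not, the hypothetical helper below (print_tree is not a Tenzir command) prints an indented listing using only find and awk. It relies on find listing each directory before its contents and descending depth-first, so every entry appears directly under its parent, although siblings are not sorted.

```sh
# Hypothetical helper: print_tree <dir>
# Prints <dir> as an indented tree using only find and awk.
print_tree() (
  set -eu
  printf '%s/\n' "$(basename "$1")"
  cd "$1"
  # find emits "." first, then paths depth-first (./a, ./a/b, ...).
  # NR > 1 skips the root; each path level below the root adds two spaces,
  # and only the last path component is printed.
  find . -print | awk -F/ 'NR > 1 {
    indent = ""
    for (i = 2; i < NF; i++) indent = indent "  "
    print indent $NF
  }'
)
```

Running print_tree <pkg> on a freshly scaffolded package lists package.yaml, the operators directory with parse.tql, and the tests directory with parse.tql and inputs/sample.txt, plus any baseline files tenzir-test has written.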