This guide provides an overview of data normalization in TQL. Normalization transforms raw, inconsistent data into a clean, standardized format that’s ready for analysis, storage, and sharing.
What is normalization?
Normalization involves several key transformations:
- Clean up values - Replace nulls, normalize sentinels, fix types
- Map to schemas - Translate fields to a standard schema like OCSF
- Package mappings - Create reusable, tested mapping operators
Each step builds on the previous. Start with clean data, then map to your target schema, and finally package your mappings for production use.
Why normalize?
Raw data from different sources varies in:
- Field names: `src_ip` vs `source_address` vs `client.ip`
- Value formats: `"true"` vs `true` vs `1` vs `"yes"`
- Missing values: `null` vs `""` vs `"-"` vs `"N/A"`
- Timestamps: Unix epochs vs ISO strings vs custom formats
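To make the inconsistencies concrete, the boolean and timestamp variants above can be collapsed in a couple of TQL statements. This is only a sketch: the field names `enabled` and `ts`, the `in` expression, and the `parse_time` call are illustrative assumptions, not part of this guide.

```tql
// Sketch: collapse "true"/"yes"/"1" string variants into a real boolean.
// The field name `enabled` is hypothetical.
enabled = enabled in ["true", "yes", "1"]

// Sketch: parse a custom-format timestamp string into a real timestamp
// (assumes a parse_time-style function; `ts` is hypothetical).
ts = ts.parse_time("%Y-%m-%d %H:%M:%S")
```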
Normalization solves these inconsistencies, enabling:
- Unified queries across data sources
- Reliable enrichment and correlation
- Consistent analytics and dashboards
- Interoperability with external tools
The normalization pipeline
A typical normalization pipeline follows this structure:
```tql
// 1. Collect raw data
from_kafka "raw-events"

// 2. Parse into structured events
this = message.parse_json()

// 3. Clean up values
replace what="N/A", with=null
replace what="-", with=null

// 4. Map to target schema
my_source::ocsf::map

// 5. Output normalized events
publish "normalized-events"
```

Normalization guides
Work through these guides in order for a complete normalization workflow:
Start by fixing data quality issues:
- Replace null placeholders (`"None"`, `"N/A"`, `"-"`)
- Normalize sentinel values
- Fix types (strings to timestamps, IPs, numbers)
- Provide default values for missing fields
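The cleanup steps above can be sketched with the `replace` operator used in the pipeline example. The `user` field, its default value, and the `if` statement are illustrative assumptions:

```tql
// Turn common null placeholders into real nulls.
replace what="None", with=null
replace what="N/A", with=null
replace what="-", with=null

// Provide a default for a missing field (hypothetical field name).
if user == null {
  user = "unknown"
}
```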
Learn the comprehensive approach to OCSF mapping:
- Identify the correct event class
- Map fields by attribute group
- Handle unmapped fields
- Validate with `ocsf::cast`
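An OCSF mapping typically assigns attributes from source fields and finishes with the `ocsf::cast` validation step listed above. The sketch below is a hedged illustration: the source fields (`src_ip`, `dst_ip`), the choice of the Network Activity class, and the record expressions are assumptions, not taken from this guide.

```tql
// Sketch of an OCSF Network Activity mapping (class_uid 4001).
// Source field names src_ip and dst_ip are hypothetical.
class_uid = 4001
src_endpoint = {ip: src_ip}
dst_endpoint = {ip: dst_ip}
drop src_ip, dst_ip

// Validate the result against the OCSF schema.
ocsf::cast
```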
Brief guidance on alternative schemas:
- Elastic Common Schema (ECS)
- Google UDM
- Microsoft ASIM
When to normalize
Normalize data at the ingestion point in your pipeline:
```
Collection → Parsing → Normalization → Storage/Forwarding
                            ↑
                       You are here
```

Normalizing early ensures all downstream consumers work with consistent data. Avoid normalizing the same data multiple times by storing normalized events.
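One way to avoid repeated normalization is to normalize once, publish the result, and have every consumer read the already-normalized stream. The sketch below assumes a `subscribe` counterpart to the `publish "normalized-events"` operator used in the pipeline example:

```tql
// Consumer pipeline: reuse normalized events instead of re-normalizing.
subscribe "normalized-events"
// ... analytics, enrichment, or forwarding from here on ...
```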