Shape data

Tenzir comes with numerous transformation operators that change the shape of their input and produce a new output. Here is a visual overview of transformations that you can perform over a data frame:

We'll walk through examples for each depicted operator. Every example is self-contained: the pipeline starts with from and a list of inline records, and the resulting events are shown directly below the pipeline.

Filter events with where

Use where to filter events in the input with an expression:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
]
where x != 2 and y.starts_with("b")
{x: 3, y: "baz"}
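
Expressions can also combine predicates with or. The following is a minimal sketch on the same input, assuming the ends_with string function is available alongside starts_with:

```tql
from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
]
// Keep events that satisfy at least one of the two conditions.
where x == 1 or y.ends_with("z")
// Expected output:
// {x: 1, y: "foo"}
// {x: 3, y: "baz"}
```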

Slice events with head, tail, and slice

Use the head and tail operators to get the first or last N records of the input.

Get the first event:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
]
head 1
{x: 1, y: "foo"}

Get the last two events:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
]
tail 2
{x: 2, y: "bar"}
{x: 3, y: "baz"}

tail is blocking

The tail operator must wait for its entire input, whereas head N terminates immediately after the first N records have arrived. Use head for the majority of use cases and tail only when you have to.

The slice operator generalizes head and tail by allowing for more flexible slicing. The begin option is a zero-based offset, so begin=3 skips the first three events. For example, to return every other event starting from the fourth:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
  {x: 4, y: "qux"},
  {x: 5, y: "corge"},
  {x: 6, y: "grault"},
]
slice begin=3, stride=2
{x: 4, y: "qux"}
{x: 6, y: "grault"}
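
To extract a contiguous window, combine begin with end. This sketch assumes that begin is inclusive and end is exclusive, both counted as zero-based offsets, which matches the begin=3 example above:

```tql
from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
  {x: 4, y: "qux"},
  {x: 5, y: "corge"},
  {x: 6, y: "grault"},
]
// Keep the events at offsets 1 and 2, i.e., the second and third event.
slice begin=1, end=3
// Expected output:
// {x: 2, y: "bar"}
// {x: 3, y: "baz"}
```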

Pick fields with select and drop

Use the select operator to pick fields:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
]
select x
{x: 1}
{x: 2}
{x: 3}

The drop operator is the dual to select and removes the specified fields:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
]
drop x
{y: "foo"}
{y: "bar"}
{y: "baz"}

Sample schemas with taste

The taste operator provides a sample of the first N events of every unique schema. For example, to get one sample for each of the three schemas in the input:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
  {x: 1},
  {x: 2},
  {y: "foo"},
]
taste 1
{x: 1, y: "foo"}
{x: 1}
{y: "foo"}

Add and rename fields with set assignment

Use the set operator to add new fields to the output:

from [
  {x: 1},
  {x: 2},
]
set y = x + 1
{x: 1, y: 2}
{x: 2, y: 3}
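
A single set can carry several comma-separated assignments. As a small sketch, the field names y and z below are illustrative only:

```tql
from [
  {x: 1},
  {x: 2},
]
// Add two derived fields in one step.
set y = x + 1, z = x * 2
// Expected output:
// {x: 1, y: 2, z: 2}
// {x: 2, y: 3, z: 4}
```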

Rename fields by combining set with drop:

from [
  {x: 1},
  {x: 2},
]
set y=x
drop x
{y: 1}
{y: 2}

Similarly, you can rename and project at the same time with select:

from [
  {x: 1, y: "foo"},
  {x: 2, y: "bar"},
]
select y=x
{y: 1}
{y: 2}

Aggregate events with summarize

Use summarize to group and aggregate data.

from [
  {x: 0, y: 0, z: 1},
  {x: 1, y: 1, z: 2},
  {x: 1, y: 1, z: 3},
]
summarize y, x=sum(x)
{y: 0, x: 0}
{y: 1, x: 2}

A variety of aggregation functions make it possible to combine grouped data.
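
For instance, one summarize can compute several aggregations per group. The sketch below reuses the input from above and assumes the count aggregation function; the output names total and n are illustrative, and the order of the groups in the output may differ:

```tql
from [
  {x: 0, y: 0, z: 1},
  {x: 1, y: 1, z: 2},
  {x: 1, y: 1, z: 3},
]
// Group by y, then sum z and count the events in each group.
summarize y, total=sum(z), n=count()
// Expected output (group order may vary):
// {y: 0, total: 1, n: 1}
// {y: 1, total: 5, n: 2}
```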

Reorder events with sort

Use sort to order the output events by the values of a specific field:

from [
  {x: 2, y: "bar"},
  {x: 3, y: "baz"},
  {x: 1, y: "foo"},
]
sort -x
{x: 3, y: "baz"}
{x: 2, y: "bar"}
{x: 1, y: "foo"}

Prepending the field with - reverses the sort order from ascending, the default, to descending.
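
To break ties, list more than one sort key. The following sketch assumes that sort accepts a comma-separated list of keys, applied from left to right:

```tql
from [
  {x: 2, y: "bar"},
  {x: 1, y: "foo"},
  {x: 2, y: "baz"},
]
// Sort by x in descending order, then by y in ascending order.
sort -x, y
// Expected output:
// {x: 2, y: "bar"}
// {x: 2, y: "baz"}
// {x: 1, y: "foo"}
```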