ClickHouse is an open-source analytical database that enables its users to generate powerful analytics, using SQL queries, in real-time.
How Tenzir Connects to ClickHouse
Section titled “How Tenzir Connects to ClickHouse”Tenzir connects to ClickHouse over the network using the native ClickHouse TCP
protocol using the official
clickhouse-cpp library. Tenzir
communicates with ClickHouse via the host and port you specify in the
to_clickhouse
operator. This means:
- Network: Tenzir and ClickHouse can run on the same machine (using
localhost
) or on different machines in the same network. You just need to make sure that Tenzir can reach the ClickHouse server. - IPC: There is no direct inter-process communication (IPC) mechanism; all communication uses ClickHouse’s network protocol.
- Co-deployment: For best performance and security, deploy Tenzir and ClickHouse in the same trusted network or use secure tunnels if needed.
Setting Up ClickHouse
Section titled “Setting Up ClickHouse”To get started with ClickHouse, follow the official quick start guide:
Native Binary
Section titled “Native Binary”-
Download the binary:
Terminal window curl https://clickhouse.com/ | sh -
Start the server:
Terminal window ./clickhouse serverThis downloads the ClickHouse binary and starts the server. You can then connect to ClickHouse at
localhost:9000
(native protocol) orlocalhost:8123
(HTTP interface). -
(Optionally) Start CLI client:
Terminal window ./clickhouse clientWith this client, you can now run SQL queries on your ClickHouse server.
Docker
Section titled “Docker”-
Run Docker:
Terminal window docker run -d --name clickhouse-server --ulimit nofile=262144:262144 \-p 9000:9000 -p 8123:8123 clickhouse/clickhouse-server
You can now connect to ClickHouse at localhost:9000
(native protocol) or
localhost:8123
(HTTP interface).
Usage Examples
Section titled “Usage Examples”These examples assume that the ClickHouse server is running on the same host as
Tenzir and that it allows non-TLS connections (hence using tls=false
in the
pipelines).
You can find out more about how to configure TLS on the
to_clickhouse
documentation and the
Clickhouse SSL-TLS configuration
guide
1. Easy Mode: Automatic table creation
Section titled “1. Easy Mode: Automatic table creation”Tenzir can automatically create tables in ClickHouse based on the incoming data schema. For example, to ingest OCSF network activity data:
from "ocsf_network_activity.json"ocsf::applyto_clickhouse table="ocsf.network_activity", primary=timestamp, tls=false
When creating a table, the to_clickhouse
operator uses the first event to determine the schema. You must take care that
there are no untyped nulls in this event, as the operator cannot transmit
those.
In this example, we use the ocsf::apply
operator, which will automatically align events with the correct OCSF schema,
giving all fields the correct types and adding all fields that should be in
ocsf.network_activity
. This ensures that we create a complete table without
missing or incorrectly typed columns.
You can now query the data in ClickHouse, e.g.:
SELECT median(traffic.bytes_in), median(traffic.bytes_out)FROM ocsf.network_activityGROUP BY *
2. Advanced: Explicit Table Creation
Section titled “2. Advanced: Explicit Table Creation”For more control, you can create the table in ClickHouse first. Use this approach when you know the full schema of your table, but not all events contain all fields and as such the operator would not create the correct table.
-
Create the table in ClickHouse:
CREATE TABLE my_table (id Int64,name String,mice_caught Nullable(Int64)) ENGINE = MergeTree() ORDER BY id; -
Ingest data from Tenzir:
my_file.csv id,name,mice_caught0,Jerry,1,Tom,0from "my_file.csv"to_clickhouse table="my_table", mode="append", tls=falseWe use the explicit
mode="append"
to ensure that the table already exists.In this example Jerry, being a mouse, has no value for
mice_caught
. Since we created a table with the expected type, this is not an issue.