Size a node
To better understand what resources you need to run a node, we provide guidance on sizing and a calculator to derive concrete CPU, RAM, and storage requirements.
Considerations
Several factors have an impact on sizing. Since you can run many types of workloads in pipelines, it is difficult to make a one-size-fits-all recommendation. The following considerations affect your resource requirements:
Workloads
Depending on what your pipelines do, they produce a different resource profile.
Data Shaping
Shaping operations change the form of the data, e.g., filtering events, removing columns, or changing values. This workload predominantly incurs CPU load.
Aggregation
Performing in-stream or historical aggregations often requires extensive buffering of the aggregation groups, which adds to your RAM requirements. If you run intricate custom aggregation functions, you may also see an additional increase in CPU usage.
Enrichment
Enriching dataflows with contexts requires holding in-memory state proportional to the context size. Therefore, enrichment affects your RAM requirements. Bloom filters are a fixed-size, space-efficient structure for representing large sets, whereas lookup tables grow linearly with the number of entries.
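To make the difference between the two context types concrete, here is a minimal sketch that estimates their memory footprint. It uses the textbook formula for an optimally sized Bloom filter, m = -n ln(p) / (ln 2)^2 bits for n expected elements and false-positive rate p. Whether the node sizes its Bloom filters this way is an assumption, and `bytes_per_entry` is a hypothetical average that you would need to measure for your own lookup tables.

```python
import math


def bloom_filter_bytes(capacity: int, false_positive_rate: float) -> int:
    """Estimate the size of an optimally sized Bloom filter in bytes.

    Uses the textbook formula m = -n * ln(p) / (ln 2)^2 bits. The size is
    fixed up front by the chosen capacity and false-positive rate.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    bits = -capacity * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def lookup_table_bytes(entries: int, bytes_per_entry: int) -> int:
    """Estimate lookup table size, which grows linearly with the entries.

    bytes_per_entry is a hypothetical average covering key, value, and
    per-entry overhead.
    """
    return entries * bytes_per_entry
```

For example, `bloom_filter_bytes(1_000_000, 0.01)` comes out at roughly 1.2 MB, while `lookup_table_bytes(1_000_000, 64)` is 64 MB and doubles when the number of entries doubles.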
Data Diversity
The more data sources you have, the more pipelines you run. In the simplest scenario where you just import all data into a node, you deploy one pipeline per data source. The number of data sources is thus a lower bound for the number of pipelines.
Data Volume
The throughput of a pipeline has an impact on performance. Pipelines with low data volume do not strain the system much, but high-volume pipelines substantially affect CPU and RAM requirements. Therefore, understanding your ingress volume, either as events per second or bytes per day, is helpful for sizing your node proportionally.
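If you only know one of the two ingress figures, you can convert between them once you have an average event size. The following is a minimal sketch of that arithmetic. The function names and the `avg_event_bytes` parameter are illustrative assumptions, not part of any node API.

```python
SECONDS_PER_DAY = 86_400


def to_bytes_per_day(events_per_second: float, avg_event_bytes: float) -> float:
    """Convert an event rate into a daily ingress volume in bytes."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY


def to_events_per_second(bytes_per_day: float, avg_event_bytes: float) -> float:
    """Convert a daily ingress volume in bytes into an event rate."""
    if avg_event_bytes <= 0:
        raise ValueError("avg_event_bytes must be positive")
    return bytes_per_day / (avg_event_bytes * SECONDS_PER_DAY)
```

For example, 10,000 events per second at an assumed average of 500 bytes per event amounts to 432 GB per day.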
Retention
When you leverage the node's built-in storage engine by importing and exporting data, you need persistent storage. To assess your retention span, you need to understand your data volume and your storage capacity.
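As a rough illustration of how data volume and capacity determine the retention span, the sketch below divides the available storage by the daily on-disk growth. The `on_disk_ratio` parameter is a hypothetical factor for how many bytes end up on disk per ingested byte (for instance, due to compression and index overhead). Its real value depends on your data and configuration, so treat the default of 1.0 as a placeholder.

```python
def retention_days(
    capacity_bytes: float,
    ingest_bytes_per_day: float,
    on_disk_ratio: float = 1.0,
) -> float:
    """Estimate how many days of data fit into the given storage capacity.

    on_disk_ratio is an assumed bytes-on-disk per ingested-byte factor;
    measure it on your own data before relying on the result.
    """
    if ingest_bytes_per_day <= 0 or on_disk_ratio <= 0:
        raise ValueError("ingest volume and on-disk ratio must be positive")
    return capacity_bytes / (ingest_bytes_per_day * on_disk_ratio)
```

For example, with 2 TB of capacity, 432 GB of daily ingress, and an assumed on-disk ratio of 0.25, the estimate is about 18.5 days of retention.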
Tenzir's storage engine builds sparse indexes to accelerate historical queries. Depending on how aggressively you configure indexing, your RAM requirements may vary.