The index is the heart of VAST's architecture: a novel actor-based streaming design that combines high-speed data ingestion with low-latency query processing.
Before describing the data flow, it is important to understand the following concepts:
Partition: This is the horizontal scaling unit of the index. There is always exactly one active partition, which is mutable; all remaining partitions are passive and immutable. Passive partitions come into play during query processing.
Meta Index: The meta index is the first-level index that every query hits. A lookup produces a set of candidate partitions in which the query can execute in a meaningful way. For example, if the user asks about IP addresses, but a partition has no data of such type, then the partition is not relevant for the query and doesn't need to be looked at.
The meta index relies on space-efficient probabilistic data structures and is therefore lossy, i.e., lookups can produce false positives that result in evaluation of irrelevant partitions for a given query. This was a deliberate design choice in order to meet the low-latency export requirements, especially for point and membership queries. For example, knowing whether a certain IP or string exists in the entire data set can be answered very efficiently with the meta index alone.
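To make the false-positive behavior concrete, here is a minimal Python sketch of a meta index backed by one Bloom filter per partition. The class and method names are illustrative, not VAST's actual API; real synopses are type-specific and far more space-efficient.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter: space-efficient, lossy in one direction only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.bits = [False] * num_bits
        self.num_bits = num_bits
        self.num_hashes = num_hashes

    def _positions(self, value):
        # Derive k bit positions from salted hashes of the value.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = True

    def might_contain(self, value):
        # False means definitely absent; True means *possibly* present.
        return all(self.bits[pos] for pos in self._positions(value))


class MetaIndex:
    """First-level index: maps a queried value to candidate partition IDs."""

    def __init__(self):
        self.synopses = {}  # partition ID -> BloomFilter

    def add(self, partition_id, values):
        synopsis = self.synopses.setdefault(partition_id, BloomFilter())
        for value in values:
            synopsis.add(value)

    def lookup(self, value):
        # May return irrelevant partitions (false positives), but never
        # misses a partition that actually contains the value.
        return [pid for pid, synopsis in self.synopses.items()
                if synopsis.might_contain(value)]
```

A lookup returns a superset of the truly relevant partitions, which is exactly the trade-off described above: a small, always-resident structure that prunes most partitions while occasionally letting an irrelevant one through.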
Value Index: The value index is a type-specific implementation for a single value, such as an IP address, string, or boolean. It operates on a well-defined domain. For example, VAST uses a sliced bitmap index for IP addresses to answer top-k and subnet membership queries efficiently.
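The following sketch illustrates the idea behind a byte-sliced index for IPv4 addresses: each of the four octets gets its own mapping from octet value to a set of row IDs (a stand-in for compressed bitmaps). Octet-aligned subnet lookups then reduce to set intersections. This is a simplified illustration, not VAST's actual implementation, which handles arbitrary prefix lengths and IPv6.

```python
import ipaddress
from collections import defaultdict


class IPIndex:
    """Simplified byte-sliced index over IPv4 addresses."""

    def __init__(self):
        # slices[i][b] = row IDs whose i-th octet equals b.
        self.slices = [defaultdict(set) for _ in range(4)]
        self.rows = 0

    def append(self, ip):
        row = self.rows
        self.rows += 1
        for i, octet in enumerate(ipaddress.IPv4Address(ip).packed):
            self.slices[i][octet].add(row)
        return row

    def lookup_subnet(self, cidr):
        # Supports /8, /16, /24, and /32 prefixes for simplicity.
        net = ipaddress.IPv4Network(cidr)
        num_octets = net.prefixlen // 8
        result = set(range(self.rows))
        for i, octet in enumerate(net.network_address.packed[:num_octets]):
            result &= self.slices[i][octet]
        return result
```

Because each slice narrows the candidate set independently, a /24 membership query touches only three small bitmaps instead of scanning every stored address.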
Indexer: The actor that owns a value index. The partition manages a set of indexers, exactly one per unique record field in a table slice. For example, if a table slice has a layout with 7 columns, then there exist 7 indexers, each of which is responsible for its respective column.
To meet the goal of high-throughput data import, the index separates the read and write path. The figure below shows the actors involved during ingestion.
The data flow proceeds as follows:
A source parses input and generates batches of table slices.
The importer in the remote VAST node accepts the table slices, stamps every row with a unique 64-bit ID, and forwards the slice to the index.
The index forwards the current batch to the meta index.
The index moves the table slice to the active partition, which takes a view of each column and distributes each of those to a dedicated indexer.
The indexer adds the data to its contained value index.
If the active partition has reached its capacity, the index turns it into a passive partition and creates a new active partition afterwards.
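The write path above can be sketched as follows in Python. The names and the capacity parameter are illustrative; the point is the interplay of ID stamping, the single mutable active partition, and the rollover into immutable passive partitions.

```python
import itertools


class Partition:
    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = []       # (row ID, record) pairs
        self.mutable = True  # only the active partition is mutable

    def full(self):
        return len(self.rows) >= self.capacity


class Index:
    """Sketch of the write path: stamp rows with unique IDs, fill the
    single active partition, and roll it over once it reaches capacity."""

    def __init__(self, partition_capacity=1024):
        self.ids = itertools.count()  # monotonically increasing ID space
        self.active = Partition(partition_capacity)
        self.passive = []

    def ingest(self, table_slice):
        for record in table_slice:
            self.active.rows.append((next(self.ids), record))
            if self.active.full():
                self.active.mutable = False  # freeze: becomes passive
                self.passive.append(self.active)
                self.active = Partition(self.active.capacity)
```

Because passive partitions are immutable, queries can read them concurrently without coordinating with the write path, which is what allows the read and write paths to be separated in the first place.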
To meet the goal of low-latency data export, the index has a highly concurrent query execution engine. The figure below illustrates the data retrieval path.
There are multiple places in VAST that can spawn queries. Most commonly, the user issues a query. To simplify the discussion, we now assume that the node received a query expression from the user and that it has spawned a dedicated query actor for that expression. Thereafter, the query actor begins to interact with the index as follows:
The query sends the expression to the index, asking for hits.
The index asks the meta index to identify the set of relevant partitions for the given query.
The meta index returns a list of partition IDs.
The index looks in its in-memory cache for partitions. If it finds cached passive partitions, it spawns an evaluator for each partition and forwards the expression to all of them. If the query spans on-disk partitions, the index materializes them in order and then proceeds with spawning evaluators as above.
The evaluator chops the expression into predicates and sends those predicates to the corresponding indexers.
The indexers perform a lookup in the value index and ship results back to the evaluator.
The evaluator receives a stream of ID sets, and for every arriving set, it triggers an AST evaluation. After it has processed all hits, it sends the merged ID set to the query actor.
The query actor then relays the index hits, in the form of an ID set, to the archive.
The archive materializes the corresponding table slices and sends them back to the query actor.
The query actor optionally performs a candidate check by applying the expression to the table slice.
The query sends the table slice to the sink, where the data is processed, e.g., rendered on the console.
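The evaluator's role in steps 5–7 can be sketched as a recursive walk over the expression AST, where each predicate leaf turns into an indexer lookup and each connective merges the resulting row-ID sets. The tuple-based AST encoding and the dictionary-backed indexers are assumptions for illustration only.

```python
def evaluate(expression, indexers):
    """Sketch of the evaluator: decompose the expression AST into
    predicates, look each one up in the per-field indexer, and combine
    the resulting row-ID sets according to the AST's connectives.

    `expression` is a nested tuple: ("and" | "or", lhs, rhs) or
    ("pred", field, value); `indexers` maps field name -> {value: row IDs}.
    """
    op = expression[0]
    if op == "pred":
        _, field, value = expression
        # In VAST, this lookup is a message to the indexer actor that
        # owns the value index for `field`.
        return indexers[field].get(value, set())
    _, lhs, rhs = expression
    left = evaluate(lhs, indexers)
    right = evaluate(rhs, indexers)
    return left & right if op == "and" else left | right
```

For example, a conjunction of an IP predicate and a port predicate intersects the two ID sets, yielding only the rows that satisfy both.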
To avoid scanning the entire set of candidate partitions for every index lookup when a "taste" of hits would already suffice, the index schedules partition lookups lazily, driven by polling from the query actor.
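A generator captures the essence of this pull-based scheduling: partitions are only evaluated when the query actor polls for more hits, so a query that stops early never touches the remaining candidates. The function names and the `wanted` threshold are illustrative.

```python
def lazy_hits(candidate_partitions, expression, evaluate):
    """Yield per-partition hits on demand instead of scanning eagerly."""
    for partition in candidate_partitions:
        yield evaluate(partition, expression)


def taste(hit_stream, wanted):
    """The query actor polls until it has tasted enough hits."""
    hits = set()
    for partition_hits in hit_stream:
        hits |= partition_hits
        if len(hits) >= wanted:
            break  # remaining candidate partitions are never scanned
    return hits
```

This inverts the control flow of the eager design: the consumer's demand, not the producer's schedule, determines how much work the index performs.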