VAST v2.4.1 improves the performance of queries when VAST
is under high load, and significantly reduces the time to first result for
queries with a low selectivity.
Reading Feather Files Incrementally

VAST's Feather store naïvely used the Feather reader from the
Apache Arrow C++ library in its initial implementation. However, its API is
rather limited: it does not support reading record batches incrementally. We've
swapped this out with a more efficient implementation that does.
This is best explained visually:
[Figure: timeline comparing v2.4.0 and v2.4.1. In v2.4.0, a single "Load Store"
step precedes the "Evaluate Query" steps; in v2.4.1, "Load Batch" and
"Evaluate Query" steps alternate over time.]
Within the scope of a single Feather store file, a single query takes the same
amount of time overall, but there exist two distinct advantages of this
approach:
1. The first result arrives much faster at the client.
2. Stores do less work for cancelled queries.

One additional benefit that is not immediately obvious comes into play when
queries arrive at multiple stores in parallel: disk reads are now spread out
more evenly, making them less likely to overlap between stores. For
deployments with slower I/O paths, this can lead to a significant query
performance improvement.
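The two advantages listed above can be illustrated with a small toy model. This is only a sketch in Python, not VAST's C++ implementation: `ToyStore`, `query_eager`, `query_incremental`, and `first_result_cost` are hypothetical names, and each "batch load" is just a counter increment standing in for a disk read.

```python
from typing import Callable, Iterator, List, Tuple

Batch = List[int]


class ToyStore:
    """Hypothetical stand-in for one store file made of record batches."""

    def __init__(self, batches: List[Batch]) -> None:
        self._batches = batches
        self.batches_loaded = 0  # counts simulated disk reads

    def __len__(self) -> int:
        return len(self._batches)

    def load_batch(self, index: int) -> Batch:
        self.batches_loaded += 1
        return self._batches[index]


def query_eager(store: ToyStore, matches: Callable[[int], bool]) -> Iterator[int]:
    """Load every batch up front, then evaluate (the v2.4.0 picture)."""
    loaded = [store.load_batch(i) for i in range(len(store))]
    for batch in loaded:
        yield from (event for event in batch if matches(event))


def query_incremental(store: ToyStore, matches: Callable[[int], bool]) -> Iterator[int]:
    """Load one batch, evaluate it, then move on (the v2.4.1 picture)."""
    for i in range(len(store)):
        batch = store.load_batch(i)
        yield from (event for event in batch if matches(event))


def first_result_cost(query, batches: List[Batch]) -> Tuple[int, int]:
    """Take only the first result, then drop the query (a 'cancellation')."""
    store = ToyStore(batches)
    first = next(query(store, lambda event: event % 2 == 0))
    return first, store.batches_loaded


BATCHES = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(first_result_cost(query_eager, BATCHES))        # (2, 4)
print(first_result_cost(query_incremental, BATCHES))  # (2, 1)
```

In this model both variants return the same results and load the same number of batches when the query runs to completion. The difference only shows up for the first result and for queries that stop early.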
To verify and test this, we've created a VAST database with 300M Zeek events
(33GB on disk) from a Corelight sensor. All tests were performed on a cold start
of VAST, i.e., we stopped and started VAST after every repetition of each test.
We performed three tests:
1. Export a single event (20 times)
2. Export all events (20 times)
3. Rebuild the entire database (3 times)

The results are astonishingly good:
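| Test | Benchmark                       | v2.4.0  | v2.4.1 | Improvement |
|------|---------------------------------|---------|--------|-------------|
| (1)  | Avg. store load time            | 55.1ms  | 4.2ms  | 13.1x       |
| (1)  | Time to first result/Total time | 19.8ms  | 14.5ms | 1.4x        |
| (2)  | Avg. store load time            | 386.5ms | 7.3ms  | 52.9x       |
| (2)  | Time to first result            | 69.2ms  | 25.4ms | 2.7x        |
| (2)  | Total time                      | 39.38s  | 33.30s | 1.2x        |
| (3)  | Avg. store load time            | 480.3ms | 9.1ms  | 52.7x       |
| (3)  | Total time                      | 210.5s  | 198.0s | 1.1x        |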
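The improvement column reads as the ratio of the v2.4.0 measurement to the v2.4.1 measurement. A minimal sketch for recomputing it from the rounded figures above follows; the helper name `improvement` is hypothetical. Because the inputs are already rounded, a recomputed factor can differ from the published one in the last digit.

```python
def improvement(before: float, after: float) -> str:
    """Speedup factor, formatted with one decimal as in the results."""
    return f"{before / after:.1f}x"


# Test (1): avg. store load time, in milliseconds.
print(improvement(55.1, 4.2))     # 13.1x
# Test (2): avg. store load time, in milliseconds.
print(improvement(386.5, 7.3))    # 52.9x
# Test (2): total time, in seconds.
print(improvement(39.38, 33.30))  # 1.2x
```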
If you're using the Feather store backend (the default as of v2.4.0), you will
see an immediate improvement with VAST v2.4.1. There are no other changes
between the two releases.
VAST also offers an experimental Parquet store backend, for which we plan to
make a similar improvement in a coming release.