Threat Intel Matching

PRO

This feature is only available in the pro version of VAST. Please contact us if you are interested in trying it out.

Live intelligence matching is a mechanism that lets VAST check whether the value of a specified field in imported data is contained in a set of known values. Each such value is called a threat indicator or IoC (indicator of compromise), and each match is called a sighting.

The matching is implemented using high-performance probabilistic data structures to minimize the performance overhead of this feature and enable usage even on large, high-volume data sets. As a trade-off, matching is restricted to equality comparisons. Advanced queries, such as conjunctions, disjunctions, or combinations of data from several sources, require the regular export command.
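To illustrate why a probabilistic structure supports only equality lookups, here is a minimal sketch in Python. It assumes a classic Bloom filter, which is one common choice; this document does not state which data structure VAST actually uses. The names `BloomFilter` and `find_sightings`, and the `query` field in the sample records, are hypothetical and exist only for this illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'possibly present' or 'definitely absent'."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, value: str) -> bool:
        # Only exact values can be tested; ranges or substrings cannot.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def find_sightings(records, fields, indicators):
    """Yield (field, value) pairs whose value may be a known indicator."""
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is not None and value in indicators:
                yield field, value


indicators = BloomFilter()
indicators.add("evil.example.org")
records = [{"query": "evil.example.org"}, {"query": "benign.example.com"}]
print(list(find_sightings(records, ["query"], indicators)))
```

Because the filter stores only hashed bit positions, it can answer "is this exact value present?" cheaply, but it cannot evaluate conjunctions or comparisons, and it may occasionally report a value that was never added.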

Usage

The basic usage pattern is to pick a source of threat indicators and a set of record fields, and to then spawn a vast matcher to perform the live matching between the two sources.

The easiest way to do this is the vast matcher start subcommand, which spawns a matcher, attaches to its result stream, and outputs one line of JSON data for each sighting. The --match-fields and --match-attributes options select which fields of the input stream are matched against the indicators of this matcher. By default, all fields with the #ioc attribute are matched.

Exporting Sightings

The vast matcher start command attaches to the output stream of the started matcher, outputs one line of JSON data for every sighting, and removes the matcher when the command is interrupted.

When more flexibility is required, one can use the vast spawn matcher --name=<matcher-name> command, which takes the same arguments as vast matcher start, to spawn a matcher that lives independently of the command invocation. The sightings of that matcher can then be exported in any format by a query like vast export --continuous json "intel.sighting.matcher == <matcher-name>". In fact, that is how vast matcher start is implemented internally.
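Since sightings arrive as one JSON object per line, a downstream consumer can process them incrementally. The following Python sketch counts sightings per indicator. The exact layout of a sighting record is not described in this document, so the `ioc` key and the sample lines are assumptions made purely for illustration.

```python
import json
from collections import Counter


def count_sightings(lines, key="ioc"):
    """Count sightings per value of `key` in a stream of JSON lines.

    `key` is a hypothetical field name; adjust it to the actual
    sighting layout produced by the matcher.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        sighting = json.loads(line)
        counts[sighting.get(key, "<unknown>")] += 1
    return counts


# Hypothetical sample input, standing in for the matcher's output stream.
sample = [
    '{"ioc": "evil.example.org", "matcher": "matcher-42"}',
    '{"ioc": "evil.example.org", "matcher": "matcher-42"}',
    '{"ioc": "127.0.0.1", "matcher": "matcher-42"}',
]
print(count_sightings(sample))
```

In practice, the same function could be fed `sys.stdin` when the matcher or export command is piped into the script.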

Adding IoCs

To add new indicators to a matcher, use vast import to import new records of the appropriate type. If no type is explicitly specified, the default intel.indicator type is used:

type intel.indicator = record{
  ioc: string,
  type: string,
  reference: string,
}

The supported values for the "type" field are currently "domain", "url", "ip" and "ipv6".
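Before importing, it can help to check that each record carries the three string fields and one of the supported type values. The Python sketch below is an illustrative pre-import check, not part of VAST; the function name `validate_indicator` is hypothetical.

```python
SUPPORTED_TYPES = {"domain", "url", "ip", "ipv6"}
REQUIRED_FIELDS = ("ioc", "type", "reference")


def validate_indicator(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for name in REQUIRED_FIELDS:
        # Every field of intel.indicator is declared as a string.
        if not isinstance(record.get(name), str):
            problems.append(f"field '{name}' must be a string")
    if record.get("type") not in SUPPORTED_TYPES:
        problems.append(f"unsupported type: {record.get('type')!r}")
    return problems


good = {"ioc": "evil.example.org", "type": "domain", "reference": "Sample"}
bad = {"ioc": "127.0.0.1", "type": "ipv4", "reference": "Sample"}
print(validate_indicator(good))
print(validate_indicator(bad))
```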

For example, if you have a file indicators.csv containing indicators according to the above type description, you could import the indicators as follows:

$ head -3 indicators.csv
ioc,type,reference
evil.example.org,domain,Tenzir Documentation Sample IoC
127.0.0.1,ip,Tenzir Documentation Sample IoC
$ vast import csv --type=intel.indicator --read=indicators.csv

Coming Soon

It is currently not possible to configure a matcher to ignore new indicators that are imported into VAST after the matcher was started.

Removing IoCs

To remove single IoCs from a matcher, use the vast matcher ioc-remove subcommand, which requires a matcher name as well as the IoC string and type of the indicator to be removed.

vast matcher ioc-remove matcher-42 evil.org domain

Note that removing IoCs in this way slightly increases the matcher overhead. To remove many IoCs in bulk, it is recommended to start a new matcher with an IoC query that yields the desired remaining IoCs instead.

When the same indicator is added multiple times and subsequently removed, all of the previously added copies are removed from the matcher. It is not necessary to remove the indicator multiple times.

Note

This command only removes the specified indicators from a matcher, not from the database itself.
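The removal semantics described above can be pictured with a toy model in Python: the matcher behaves like a set keyed on the IoC string and type, while the database keeps every imported record. This is only a conceptual sketch; the class `ToyMatcher` and the `database` list are hypothetical and say nothing about how VAST implements removal internally.

```python
class ToyMatcher:
    """Conceptual model: indicators are identified by (ioc, type)."""

    def __init__(self):
        self.indicators = set()

    def add(self, ioc: str, ioc_type: str) -> None:
        # Adding the same indicator twice leaves a single entry.
        self.indicators.add((ioc, ioc_type))

    def remove(self, ioc: str, ioc_type: str) -> None:
        # One removal is enough, no matter how often it was added.
        self.indicators.discard((ioc, ioc_type))

    def matches(self, value: str, ioc_type: str) -> bool:
        return (value, ioc_type) in self.indicators


database = []  # stands in for the records stored in the database
matcher = ToyMatcher()
for _ in range(2):
    database.append({"ioc": "evil.org", "type": "domain"})
    matcher.add("evil.org", "domain")

matcher.remove("evil.org", "domain")
print(matcher.matches("evil.org", "domain"))  # the matcher no longer matches
print(len(database))  # the imported records are untouched
```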

Custom IoC types

You can use any type you like as the IoC type for a matcher, as long as it has fields named 'ioc' and 'type' with the same semantics as in the intel.indicator type.

Coming Soon

The field names are currently hard-coded. A more flexible way of specifying them is on our roadmap.

Example

Let's assume we want to match IP addresses listed on the Feodo Tracker, a list of active C2 servers maintained by the popular anti-malware site abuse.ch. We first download the raw data:

wget https://feodotracker.abuse.ch/downloads/ipblocklist.txt

To use the data in a matcher, we need to import it into VAST as an indicator type. The generic indicator type intel.indicator requires the "type" and "reference" fields to be set in addition to the raw IoC value from the block list:

type intel.indicator = record{
  ioc: string,
  type: string,
  reference: string
}

In addition, the downloaded block list contains comment lines starting with the '#' character and Windows-style "\r\n" line breaks, so we add some pre-processing before piping the enriched JSON data to the vast import command:

# Normalize line endings, drop comment lines, wrap each IP in an indicator record, then import.
cat ipblocklist.txt \
| dos2unix \
| grep -v "^#" \
| jq -cR '{ ioc: ., type: "ip", reference: "Feodo Tracker" }' \
| vast import json --type=intel.indicator
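Where dos2unix or jq are not available, the same pre-processing can be sketched in Python. This is an alternative to the shell pipeline, not something VAST provides. The function name `blocklist_to_indicators` is hypothetical, and unlike the pipeline it also skips empty lines.

```python
import json


def blocklist_to_indicators(text: str, reference: str = "Feodo Tracker"):
    """Yield one JSON line per block list entry, shaped like intel.indicator."""
    # splitlines() handles both "\n" and Windows-style "\r\n" line breaks.
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        yield json.dumps({"ioc": line, "type": "ip", "reference": reference})


# Made-up sample using documentation-reserved addresses.
sample = "# comment header\r\n192.0.2.1\r\n198.51.100.7\r\n"
for json_line in blocklist_to_indicators(sample):
    print(json_line)
```

The resulting lines could be written to standard output and piped into vast import json --type=intel.indicator, just like the output of jq.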

Next, let's assume we have a Zeek source continuously importing data, and we want to get alerted if any connection to one of the C2 servers is detected. So we start the following matcher:

vast matcher start \
--ioc-type=intel.indicator \
--match-fields=zeek.conn_id.orig_h,zeek.conn_id.resp_h

Note that intel.indicator is the default ioc type, so we would not have needed to specify it in the example above.

Also, this matcher will use all indicators of type intel.indicator to match against. If we wanted to restrict that, we could pass a custom IoC query to select the indicators to be loaded at startup:

vast matcher start \
--ioc-type=intel.indicator \
--ioc-query="intel.indicator.reference == \"Feodo Tracker\"" \
--match-fields=zeek.conn_id.resp_h

It is possible to specify multiple fields at once for a given matcher, e.g., --match-fields=zeek.conn_id.orig_h,zeek.conn_id.resp_h.

The matcher will print all confirmed sightings to its standard output. To test it, we can run Zeek in a different terminal window to monitor connections and continuously import the results into VAST:

zeek -i eth0 &
tail -F conn.log | vast import zeek

Now, every time you visit one of the compromised servers listed in the block list (be careful!), one line of output should be printed by the matcher command.

Performance Tuning

The main tuning parameter for a matcher is its state size: the more space it is allowed to allocate, the lower the false positive rate will be and the less the overall system will be affected. As more data points are imported, a low false positive rate becomes increasingly important, because the absolute number of false positives increases linearly with the number of match queries, on top of the base system load already being higher due to the various running importers.
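The relationship between state size, false positive rate, and lookup volume can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes a classic Bloom filter with an optimal number of hash functions and uses the standard approximation p = (1 - e^(-kn/m))^k. The matcher's actual data structure is not specified in this document, so the numbers are illustrative only, and the function names and sample figures are hypothetical.

```python
import math


def bloom_false_positive_rate(num_indicators: int, state_bytes: int) -> float:
    """Approximate false positive rate of a Bloom filter (assumed model)."""
    if num_indicators <= 0 or state_bytes <= 0:
        raise ValueError("num_indicators and state_bytes must be positive")
    bits = state_bytes * 8
    # Optimal number of hash functions for this bits-per-indicator ratio.
    k = max(1, round(bits / num_indicators * math.log(2)))
    return (1.0 - math.exp(-k * num_indicators / bits)) ** k


def expected_false_positives_per_second(
    num_indicators: int, state_bytes: int, lookups_per_second: float
) -> float:
    """False positives grow linearly with the number of lookups."""
    rate = bloom_false_positive_rate(num_indicators, state_bytes)
    return rate * lookups_per_second


# Hypothetical workload: 1M indicators, 100K lookups per second.
for size_mib in (1, 2, 4):
    state = size_mib * 1024 * 1024
    fps = expected_false_positives_per_second(1_000_000, state, 100_000)
    print(f"{size_mib} MiB state: about {fps:.3g} false positives per second")
```

Under this model, doubling the state size lowers the false positive rate sharply, while doubling the lookup volume doubles the absolute number of false positives.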

The following table shows some example values for appropriate state sizes:

Deployment size    Matcher data                                  Appropriate state size
Small              500 indicators, 10K data points/s input       8 MiB
Medium             200K indicators, 500K data points/s input     128 MiB
Large              30M indicators, 1M data points/s input        600 MiB

Note that "input" in the table above refers to the number of matches that are performed, not the total amount of data that is imported into VAST. Additionally, even though the number of indicators and the matches per second are combined in the table above, each of them independently increases the appropriate state size for a matcher.

The matchers spawned by VAST are configured to use a state size of 128 MiB each. If the number of false positives produced by a matcher exceeds the acceptable threshold, we recommend dividing the input space across multiple parallel matchers. If that is not possible, it is necessary to rebuild VAST from source after adjusting the amount of memory used by each matcher.

Coming Soon

A way to configure the state size for each individual matcher from the command line or configuration is in progress.