Version: Next

http

Sends HTTP/1.1 requests and forwards the response.

http url:string, [method=string, payload=string, headers=record,
    response_field=field, metadata_field=field, paginate=string,
    paginate_delay=duration, parallel=int, tls=bool, certfile=string,
    keyfile=string, password=string, connection_timeout=duration,
    max_retry_count=int, retry_delay=duration] { … }

Description

The http operator issues HTTP/1.1 requests and forwards received responses as events.

url: string

URL to connect to.

method = string (optional)

The HTTP method to use, one of:

  • get
  • head
  • post
  • put
  • del
  • connect
  • options
  • trace

Defaults to get, or post if payload is specified.

payload = string (optional)

Payload to send with the HTTP request.
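For instance, a request with a payload defaults to the post method; the endpoint URL and payload below are placeholders:

```tql
from {}
http "https://example.org/api/v1/items", payload="{\"name\": \"test\"}"
```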

headers = record (optional)

Record of headers to send with the request.
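Custom headers might be passed like this; the URL and bearer token are placeholders:

```tql
from {}
http "https://example.org/api/v1/search", headers={"Authorization": "Bearer MY_TOKEN", "Accept": "application/json"}
```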

response_field = field (optional)

Field to insert the response into.

Defaults to this.

metadata_field = field (optional)

Field to insert metadata into when using the parsing pipeline.

The metadata has the following schema:

| Field | Type | Description |
|:--|:--|:--|
| `code` | `uint64` | The HTTP status code of the response. |
| `headers` | `record` | The response headers. |
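As a sketch, one might capture the metadata and keep only successful responses; the URL is a placeholder, and the example assumes the body parses as JSON:

```tql
from {}
http "https://example.org/api/v1/items", response_field=body, metadata_field=meta { read_json }
where meta.code == 200
```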

paginate = string (optional)

An expression to evaluate against the result of the request (optionally parsed by the given pipeline). If the expression evaluation is successful and non-null, the resulting string is used as the URL for a new GET request with the same headers.

paginate_delay = duration (optional)

The duration to wait between consecutive pagination requests.

Defaults to 0s.

parallel = int (optional)

Maximum number of requests that may be in progress at any time.

Defaults to 1.

tls = bool (optional)

Enables TLS.

certfile = string (optional)

Path to the client certificate.

keyfile = string (optional)

Path to the key for the client certificate.

password = string (optional)

Password file for keyfile.
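A request with a client certificate could look like this; all paths are placeholders:

```tql
from {}
http "https://internal.example.org/api", tls=true,
  certfile="/path/to/client.crt", keyfile="/path/to/client.key",
  password="/path/to/key.password"
```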

connection_timeout = duration (optional)

Timeout for the connection.

Defaults to 5s.

max_retry_count = int (optional)

The maximum number of times to retry a failed request. Each request has its own retry count.

Defaults to 0.

retry_delay = duration (optional)

The duration to wait between each retry.

Defaults to 1s.
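For example, to retry transient failures up to three times with a two-second pause between attempts; the URL is a placeholder:

```tql
from {}
http "https://example.org/api/v1/items", max_retry_count=3, retry_delay=2s, connection_timeout=10s
```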

{ … }

A pipeline that receives the response body as bytes, allowing parsing per request. This is especially useful in scenarios where the response body can be parsed into multiple events.
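For instance, if an endpoint returns newline-delimited JSON, the pipeline can turn each line into its own event; the URL is a placeholder:

```tql
from {}
http "https://example.org/api/v1/events" { read_ndjson }
```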

Examples

Make a GET request

Here we make a request to urlscan.io to search for scans of tenzir.com and keep the first result.

from {}
http "https://urlscan.io/api/v1/search?q=tenzir.com" {read_json}
unroll results
head 1
{
  results: {
    submitter: { ... },
    task: { ... },
    stats: { ... },
    page: { ... },
    _id: "0196edb1-521e-761f-9d62-1ca4cfad5b30",
    _score: null,
    sort: [ "1747744570133", "\"0196edb1-521e-761f-9d62-1ca4cfad5b30\"" ],
    result: "https://urlscan.io/api/v1/result/0196edb1-521e-761f-9d62-1ca4cfad5b30/",
    screenshot: "https://urlscan.io/screenshots/0196edb1-521e-761f-9d62-1ca4cfad5b30.png",
  },
  total: 9,
  took: 296,
  has_more: false,
}

Keeping input context

Frequently, the purpose of making real-time requests in a pipeline is to enrich the incoming data with additional context. In these cases, we want to keep the original event around. This can be done simply by specifying the response_field and metadata_field options as appropriate.

E.g. in the above example, let's assume we had some initial context that we want to keep around:

from { ctx: {severity: "HIGH"}, domain: "tenzir.com", ip: 0.0.0.0 }
http "https://urlscan.io/api/v1/search?q=" + domain, response_field=scan {read_json}
scan.results = scan.results[0]
{
  ctx: {
    severity: "HIGH",
  },
  domain: "tenzir.com",
  ip: 0.0.0.0,
  scan: {
    results: {
      submitter: { ... },
      task: { ... },
      stats: { ... },
      page: { ... },
      _id: "0196edb1-521e-761f-9d62-1ca4cfad5b30",
      _score: null,
      sort: [ "1747744570133", "\"0196edb1-521e-761f-9d62-1ca4cfad5b30\"" ],
      result: "https://urlscan.io/api/v1/result/0196edb1-521e-761f-9d62-1ca4cfad5b30/",
      screenshot: "https://urlscan.io/screenshots/0196edb1-521e-761f-9d62-1ca4cfad5b30.png",
    },
    total: 9,
    took: 88,
    has_more: false,
  },
}

Paginate an API

We can utilize the sort and has_more fields to get more pages from the API.

let $URL = "https://urlscan.io/api/v1/search?q=example.com"
from {}
http $URL, paginate=$URL + "&search_after=" + results.last().sort.first() + "," + results.last().sort.last().slice(begin=1, end=-1) if has_more? {
  read_json
}
head 10

Here we construct the next URL for pagination by extracting values from the response.

The query parameter search_after expects the two values from the sort key in the response, joined with a ,. This forms a URL like https://urlscan.io/api/v1/search?q=example.com&search_after=1747796723608,0196f0cd-6fda-761a-81a6-ae1b18914e61.

The if has_more? ensures that pagination continues only as long as the response contains a has_more field that is true.

Additionally, we limit the number of fetched pages with a simple head 10.