Sends and receives HTTP/1.1 requests.

from_http url:string, [method=string, body=record|string|blob, encode=string,
headers=record, metadata_field=field, error_field=field,
paginate=record->string|string, paginate_delay=duration,
connection_timeout=duration, max_retry_count=int,
retry_delay=duration, tls=record]
from_http url:string, server=true, [metadata_field=field, responses=record,
max_request_size=int, max_connections=int, tls=record]

The from_http operator issues HTTP requests or spins up an HTTP/1.1 server on a given address and forwards received requests as events.

URL to listen on or to connect to.

Must have the form <host>:<port> when server=true.

The HTTP method to use in client mode. One of:

  • get
  • head
  • post
  • put
  • del
  • connect
  • options
  • trace

Defaults to get, or post if body is specified.

Body to send with the HTTP request.

If the value is a record, then the body is encoded according to the encode option and an appropriate Content-Type is set for the request.

Specifies how to encode record bodies. Supported values:

  • json
  • form

Defaults to json.

Record of headers to send with the request.
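
As a sketch of how the client options above combine, the following pipeline sends a form-encoded POST request. The URL, the header value, and the body fields are placeholders chosen for illustration and do not refer to a real API.

```tql
// Hypothetical endpoint and payload, for illustration only.
from_http "https://api.example.com/login",
  method="post",
  body={user: "alice", scope: "read"},
  encode="form",
  headers={Accept: "application/json"}
```

Because body is a record, the operator encodes it according to encode and sets an appropriate Content-Type. The method is spelled out here for clarity, although post is already the default when body is given.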

paginate = record -> string | string (optional)

Controls automatic pagination of HTTP responses.

Lambda mode: A lambda expression to evaluate against the result of the request (optionally parsed by the given pipeline). If the expression evaluation is successful and non-null, the resulting string is used as the URL for a new GET request with the same headers.

Link mode: The string "link" to automatically follow pagination links in the HTTP Link response header per RFC 8288. The operator parses Link headers and follows the rel=next relation to fetch the next page. Pagination stops when the response no longer contains a rel=next link.

The duration to wait between consecutive pagination requests.

Defaults to 0s.

Timeout for the connection.

Defaults to 5s.

The maximum number of times to retry a failed request. Every request has its own retry count.

Defaults to 0.

The duration to wait between each retry.

Defaults to 1s.

Field to insert metadata into when using the parsing pipeline.

The response metadata (when using the client mode) has the following schema:

| Field | Type | Description |
|---|---|---|
| code | uint64 | The HTTP status code of the response. |
| headers | record | The response headers. |

The request metadata (when using the server mode) has the following schema:

| Field | Type | Description |
|---|---|---|
| headers | record | The request headers. |
| query | record | The query parameters of the request. |
| path | string | The path requested. |
| fragment | string | The URI fragment of the request. |
| method | string | The HTTP method of the request. |
| version | string | The HTTP version of the request. |

Field to insert the response body for HTTP error responses (status codes not in the 2xx or 3xx range).

When set, any HTTP response with a status code outside the 200–399 range will have its body stored in this field as a blob. Otherwise, error responses, alongside the original event, are skipped and an error is emitted.
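
A minimal sketch of keeping error responses instead of dropping them. The endpoint is hypothetical, and the field names http and error_body are arbitrary choices, not names the operator requires.

```tql
// Hypothetical endpoint; http and error_body are freely chosen field names.
from_http "https://api.example.com/data",
  metadata_field=http,
  error_field=error_body
```

With this configuration, a response whose status code falls outside the 200–399 range has its body stored as a blob in error_body, and the response metadata (code and headers) is written to http.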

Whether to spin up an HTTP server or act as an HTTP client.

Defaults to false, i.e., the HTTP client.

Specify custom responses for endpoints on the server. For example,

responses = {
"/resource/create": { code: 200, content_type: "text/html", body: "Created!" },
"/resource/delete": { code: 401, content_type: "text/html", body: "Unauthorized!" }
}

creates two special routes on the server with different responses.

The server responds to requests for any unspecified endpoint with HTTP status 200 OK.

The maximum size of an incoming request to accept.

Defaults to 10MiB.

The maximum number of simultaneous incoming connections to accept.

Defaults to 10.
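
The server options above can be combined as in the following sketch. The listen address, the /health route, and the connection limit are illustrative values only.

```tql
// Illustrative values: adjust the address, route, and limit to your setup.
from_http "0.0.0.0:8080",
  server=true,
  metadata_field=metadata,
  responses={
    "/health": {code: 200, content_type: "text/plain", body: "ok"}
  },
  max_connections=50
```

Requests to /health receive the custom response, while all other endpoints fall back to the default 200 OK response.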

TLS configuration. Provide an empty record (tls={}) to enable TLS with defaults or set fields to customize it.

{
  skip_peer_verification: bool, // skip certificate verification.
  cacert: string, // CA bundle to verify peers.
  certfile: string, // client certificate to present.
  keyfile: string, // private key for the client certificate.
  min_version: string, // minimum TLS version (`"1.0"`, `"1.1"`, `"1.2"`, `"1.3"`).
  ciphers: string, // OpenSSL cipher list string.
  client_ca: string, // CA to validate client certificates.
  require_client_cert: bool, // require clients to present a certificate.
}

The client_ca and require_client_cert options are only applied for operators that accept incoming client connections, and otherwise ignored.

Any value not specified in the record is picked up from the configuration. If it is not configured there either, the operator does not use it.

See the Node TLS Setup guide for more details.
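
As a sketch, a client request that verifies the peer against a custom CA bundle and requires at least TLS 1.2 could look as follows. The endpoint and the file path are hypothetical.

```tql
// Hypothetical internal endpoint and CA bundle path.
from_http "https://internal.example.com/api/events",
  tls={
    cacert: "/etc/ssl/certs/internal-ca.pem",
    min_version: "1.2"
  }
```

Fields left out of the record, such as ciphers, follow the fallback behavior described above.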

A pipeline that receives the response body as bytes, allowing parsing per request. This is especially useful in scenarios where the response body can be parsed into multiple events.

If not provided, the operator will attempt to infer the parsing operator from the Content-Type header. Should this inference fail (e.g., unsupported or missing Content-Type), the operator raises an error.

Make a request to urlscan.io to search for scans for tenzir.com and get the first result.

from_http "https://urlscan.io/api/v1/search?q=tenzir.com"
unroll results
head 1
{
  results: {
    submitter: { ... },
    task: { ... },
    stats: { ... },
    page: { ... },
    _id: "0196edb1-521e-761f-9d62-1ca4cfad5b30",
    _score: null,
    sort: [ "1747744570133", "\"0196edb1-521e-761f-9d62-1ca4cfad5b30\"" ],
    result: "https://urlscan.io/api/v1/result/0196edb1-521e-761f-9d62-1ca4cfad5b30/",
    screenshot: "https://urlscan.io/screenshots/0196edb1-521e-761f-9d62-1ca4cfad5b30.png",
  },
  total: 9,
  took: 296,
  has_more: false,
}

Use the paginate parameter with a lambda to extract the next page URL from the response body:

from_http "https://api.example.com/data", paginate=(x => x.next_url?)

This sends a GET request to the initial URL and evaluates the x.next_url field in the response to determine the next URL for subsequent requests.

Use paginate="link" to automatically follow RFC 8288 Link headers with rel=next:

from_http "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10",
paginate="link"

Many APIs (such as GitHub, GitLab, and Jira) use the Link header for pagination. The operator extracts the rel=next URL from the header and continues fetching until no more pages are available.
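
A minimal sketch of combining link-based pagination with a pause between page requests. The one-second delay is an arbitrary value chosen for illustration.

```tql
// Same request as above, but wait 1s between consecutive page requests.
from_http "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10",
  paginate="link",
  paginate_delay=1s
```

Without paginate_delay, the operator uses the default of 0s and requests the next page immediately.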

Configure retries for failed requests:

from_http "https://api.example.com/data", max_retry_count=3, retry_delay=2s

This retries a failed request up to 3 times, waiting 2 seconds between retries.

Spin up a server with:

from_http "0.0.0.0:8080", server=true, metadata_field=metadata

Send a request to the HTTP endpoint via curl:

echo '{"key": "value"}' | gzip | curl localhost:8080 --data-binary @- -H 'Content-Encoding: gzip' -H 'Content-Type: application/json'

Observe the request in the Tenzir pipeline, parsed and decompressed:

{
  key: "value",
  metadata: {
    headers: {
      Host: "localhost:8080",
      "User-Agent": "curl/8.13.0",
      Accept: "*/*",
      "Content-Encoding": "gzip",
      "Content-Length": "37",
      "Content-Type": "application/json",
    },
    path: "/",
    method: "post",
    version: "HTTP/1.1",
  },
}
