This guide covers how to fetch data from HTTP APIs using the
from_http
and
http
operators. Whether you make simple
GET requests, handle authentication, or implement pagination, the operators
provide flexible HTTP client capabilities for API integration.
Basic API Requests
Section titled “Basic API Requests”Start with these fundamental patterns for making HTTP requests to APIs.
Simple GET Requests
Section titled “Simple GET Requests”To fetch data from an API endpoint, pass the URL as the first parameter to the
from_http
operator:
from_http "https://api.example.com/data"
The operator makes a GET request by default and forwards the response as an
event. The from_http
operator is an input operator, i.e., it starts a
pipeline. The companion operator http
is a transformation, allowing you to
specify the URL as a field by referencing an event field that contains the URL:
from {url: "https://api.example.com/data"}http url
This pattern is useful when processing multiple URLs or when URLs are generated
dynamically. Most of our subsequent examples use from_http
, as the operator
options are very similar.
Parsing the HTTP Rsponse Body
Section titled “Parsing the HTTP Rsponse Body”The from_http
and http
operators attempt to find the right parser for the
HTTP response body by inspecting the Content-Type
header. For example, if the
Content-Type
is application/json
, then Tenzir applies the
read_json
operator.
You can manually override the parser for the response body by specifying a
parsing pipeline, i.e., a pipeline that transforms bytes to events. For example,
if an API returns CSV data without a proper Conten-Type
, you can specify the
parsing pipeline as follows:
from_http "https://api.example.com/users" { read_csv}
This parses the response from CSV into structured events that you can process further.
POST Requests with Data
Section titled “POST Requests with Data”Send data to APIs by specifying the method
parameter as “post” and providing
the request body in the payload
parameter:
from_http "https://api.example.com/users", method="post", payload={"name": "John", "email": "john@example.com"}
Similarly, with the http
operator you can also parameterize the entire HTTP
request using event fields by referencing field values for each parameter:
from { url: "https://api.example.com/users", method: "post", data: { name: "John", email: "john@example.com" }}http url, method=method, payload=data
The operators automatically uses POST method when you specify a payload.
Request Configuration
Section titled “Request Configuration”Configure requests with headers, authentication, and other options for different API requirements.
Adding Headers
Section titled “Adding Headers”Include custom headers by providing the headers
parameter as a record
containing key-value pairs:
from_http "https://api.example.com/data", headers={ "Authorization": "Bearer your-token-here", "Content-Type": "application/json" }
Headers help you authenticate with APIs and specify request formats.
TLS and Security
Section titled “TLS and Security”Enable TLS by setting the tls
parameter to true
and configure client
certificates using the certfile
and keyfile
parameters:
from_http "https://secure-api.example.com/data", tls=true, certfile="/path/to/client.crt", keyfile="/path/to/client.key"
Use these options when APIs require client certificate authentication.
Timeout and Retry Configuration
Section titled “Timeout and Retry Configuration”Configure timeouts and retry behavior by setting the connection_timeout
,
max_retry_count
, and retry_delay
parameters:
from_http "https://api.example.com/data", timeout=10s, max_retries=3, retry_delay=2s
These settings help handle network issues and API rate limiting gracefully.
Data Enrichment
Section titled “Data Enrichment”Use HTTP requests to enrich existing data with information from external APIs.
Preserving Input Context
Section titled “Preserving Input Context”Keep original event data while adding API responses by specifying the
response_field
parameter to control where the response is stored:
from { domain: "example.com", severity: "HIGH", api_url: "https://threat-intel.example.com/lookup", response_field: "threat_data",}http f"{api_url}?domain={domain}", response_field=response_field
This approach preserves your original data and adds API responses in a specific field.
Adding Metadata
Section titled “Adding Metadata”Capture HTTP response metadata by specifying the metadata_field
parameter to
store status codes and headers separately from the response body:
from_http "https://api.example.com/status", response_field=data, metadata_field=http_meta
The metadata includes status codes and response headers for debugging and monitoring.
Pagination and Bulk Processing
Section titled “Pagination and Bulk Processing”Handle APIs that return large datasets across multiple pages.
Simple Pagination
Section titled “Simple Pagination”Implement automatic pagination by providing a lambda function to the paginate
parameter that extracts the next page URL from the response:
from_http "https://api.example.com/search?q=query", paginate=(response => "next_page_url" if response.has_more)
The operator continues making requests as long as the pagination lambda function returns a valid URL.
Complex Pagination Logic
Section titled “Complex Pagination Logic”Handle APIs with custom pagination schemes by building pagination URLs dynamically using expressions that reference response data:
let $base_url = "https://api.example.com/items"from_http f"{$base_url}?page=1", paginate=(x => f"{$base_url}?page={x.page + 1}" if x.page < x.total_pages,
This example builds pagination URLs dynamically based on response data.
Rate Limiting
Section titled “Rate Limiting”Control request frequency by configuring the paginate_delay
parameter to add
delays between requests and the parallel
parameter to limit concurrent
requests:
from { url: "https://api.example.com/data", paginate_delay: 500ms, parallel: 2}http url, paginate="next_url" if has_next, paginate_delay=paginate_delay, parallel=parallel
Use paginate_delay
and parallel
to manage request rates appropriately.
Practical Examples
Section titled “Practical Examples”These examples demonstrate typical use cases for API integration in real-world scenarios.
API Monitoring
Section titled “API Monitoring”Monitor API health and response times:
from_http "https://api.example.com/health", metadata_field=metadataselect date=metadata.headers["Date"].parse_time("%a, %d %b %Y %H:%M:%S %Z")latency = now() - date
The above example parses the Date
header from the HTTP response via
parse_time
into a timestamp and then
compares it to the current wallclock time using the
now
function.
Nit: %T
is a shortcut for %H:%M:%S
.
Error Handling
Section titled “Error Handling”Handle API errors and failures gracefully in your data pipelines.
Retry Configuration
Section titled “Retry Configuration”Configure automatic retries by setting the max_retry_count
parameter to
specify the number of retry attempts and retry_delay
to control the time
between retries:
from_http "https://unreliable-api.example.com/data", max_retries=5, retry_delay=2s
Status Code Handling
Section titled “Status Code Handling”Check HTTP status codes by capturing metadata and filtering based on the
code
field to handle different response types:
from_http "https://api.example.com/data", metadata_field=metadatawhere metadata.code >= 200 and metadata.code < 300
Best Practices
Section titled “Best Practices”Follow these practices for reliable and efficient API integration:
- Use appropriate timeouts. Set a reasonable
connection_timeout
for your use case. - Implement retry logic. Configure
max_retry_count
andretry_delay
for handling transient failures. - Respect rate limits. Use
parallel
andpaginate_delay
to control request rates. - Handle errors gracefully. Check status codes in metdata
(
metadata_field
) and implement fallback logic. - Secure credentials. Access API keys and tokens via secrets, not in code.
- Monitor API usage. Track response times and error rates for performance.