This release introduces format and compression inference from URLs for HTTP data sources, streamlining data loading workflows. It also includes bug fixes for secret resolution and HTTP server mode.
Download the release on GitHub.
Features
Section titled “Features”HTTP format and compression inference
Section titled “HTTP format and compression inference”The from_http
and http
operators now automatically infer the file format
(such as JSON, CSV, Parquet, etc.) and compression type (such as gzip, zstd,
etc.) directly from the URL’s file extension, just like the generic from
operator. This makes it easier to load data from HTTP sources without manually
specifying the format or decompression step.
If the format or compression cannot be determined from the URL, the operators
will fall back to using the HTTP Content-Type
and Content-Encoding
response
headers to determine how to parse and decompress the data.
Examples
Inference Succeeds
from_http "https://example.org/data/events.csv.zst"
The operator infers both the zstd
compression and the CSV
format from the
file extension, decompresses, and parses accordingly.
Inference Fails, Fallback to Headers
from_http "https://example.org/download"
If the URL does not contain a recognizable file extension, the operator will use
the HTTP Content-Type
and Content-Encoding
headers from the response to
determine the format and compression.
Manual Specification Required
from_http "https://example.org/archive" { decompress_gzip read_json}
If neither the URL nor the HTTP headers provide enough information, you can explicitly specify the decompression and parsing steps using a pipeline argument.
Bug Fixes
Section titled “Bug Fixes”Fix crash in from secret
Section titled “Fix crash in from secret”We fixed a crash in from secret("key")
. This is now gracefully rejected, as
generic from
cannot resolve secrets.
By @IyeOnline in #5321.
from_http server=true
assertion failures
Section titled “from_http server=true assertion failures”from_http server=true
failed with internal assertions and stopped the pipeline on
receiving requests when metadata_field
was specified.