Skip to content

Writes events to one or multiple objects in Amazon S3.

to_s3 url:string, [anonymous=bool, aws_iam=record, max_size=int,
timeout=duration, partition_by=list<field>] {}

The to_s3 operator writes events to Amazon S3, automatically opening new objects when a rotation condition triggers. It supports hive-style partitioning through a ** placeholder in the URL and per-partition unique object keys through a {uuid} placeholder.

By default, authentication is handled by AWS’s default credentials provider chain, which may read from multiple environment variables and credential files:

  • ~/.aws/credentials and ~/.aws/config
  • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN
  • EC2 instance metadata service
  • ECS container credentials

URL identifying the S3 location where data should be written.

Supported URI format: s3://[<access-key>:<secret-key>@]<bucket-name>/<full-path-to-object>(?<options>)

Options can be appended to the path as query parameters:

  • region: AWS region (e.g., us-east-1)
  • scheme: Connection scheme (http or https)
  • endpoint_override: Custom S3-compatible endpoint
  • allow_bucket_creation: Allow creating buckets if they don’t exist
  • allow_bucket_deletion: Allow deleting buckets

The path portion may contain two placeholders:

  • ** marks the location where hive partition segments are inserted. When present, partition_by must also be set, and vice versa.
  • {uuid} expands to a fresh UUIDv7 for every object. This is required when partitioning or when rotation can produce multiple objects, so that rotated or per-partition objects do not overwrite each other.

Use anonymous credentials instead of any configured authentication.

Defaults to false.

Configures explicit AWS credentials or IAM role assumption. If not specified, the operator uses the AWS SDK's default credential chain.

{
region: string, // AWS region for API requests.
access_key_id: string, // AWS access key ID.
secret_access_key: string, // AWS secret access key.
session_token: string, // session token for temporary credentials.
assume_role: string, // ARN of IAM role to assume.
session_name: string, // session name for role assumption.
external_id: string, // external ID for role assumption.
web_identity: record, // OIDC web identity token configuration.
}

See AWS Authentication for a description of every field, the default credential chain, web identity configuration, and local authentication with the AWS CLI.

Rotates to a new file after the current file exceeds this size in bytes. Because rotation only fires after the threshold is crossed, individual files may be slightly larger than max_size.

Defaults to 100M.

Rotates to a new file after the current file has been open for this duration. Rotation is measured from the time the file was first opened.

Defaults to 5min.

A list of fields used to partition events into separate files. For every distinct combination of partition-field values, a separate file (or group of rotated files) is written. The URL must contain a ** placeholder, which is replaced by the hive-style path field1=value1/field2=value2/….

Unlike to_hive, the partitioning fields are not stripped from the written events — they remain in each record.

Pipeline that transforms the incoming events into the byte stream that is written to each file. The pipeline must return bytes, so it must end with a writer such as write_ndjson, write_parquet, or write_csv, optionally followed by a compressor such as compress_gzip.

The subpipeline runs once per output file. When rotation or partitioning produces a new file, a new instance of the subpipeline is spawned for it.

to_s3 "s3://my-bucket/events/out_{uuid}.json" {
write_ndjson
}

Partition events by date into Parquet objects

Section titled “Partition events by date into Parquet objects”
to_s3 "s3://my-bucket/events/**/data_{uuid}.parquet",
partition_by=[year, month, day] {
write_parquet
}
// Produces objects under:
// s3://my-bucket/events/year=<year>/month=<month>/day=<day>/data_<uuid>.parquet
to_s3 "s3://my-bucket/data/out_{uuid}.json",
aws_iam={
access_key_id: "AKIAIOSFODNN7EXAMPLE",
secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
} {
write_ndjson
}
to_s3 "s3://my-bucket/data/out_{uuid}.json?endpoint_override=minio.example.com:9000&scheme=http" {
write_ndjson
}
to_s3 "s3://my-bucket/logs/events_{uuid}.json.gz", max_size=10M {
write_ndjson
compress_gzip
}

Last updated: