Oflight Inc.
Network & Infra | 2026-04-07

Amazon S3 Tables Complete Guide — Build Iceberg-Native Analytics Platform on S3 [2026]

Amazon S3 Tables (announced in December 2024) is Apache Iceberg-native table storage that delivers up to 3x higher query throughput and up to 10x higher transactions per second (TPS) than self-managed Iceberg tables. This guide covers table buckets, supported engines, the 2025 feature updates, pricing, cost caveats, and setup procedures.


What is Amazon S3 Tables?

Amazon S3 Tables is an Apache Iceberg-native table storage delivering up to 3x query throughput and up to 10x transactions per second (TPS) compared to self-managed Iceberg tables. Announced by AWS in December 2024, it introduces a new bucket type called "table buckets" that natively manage Iceberg metadata, snapshots, and automated maintenance.

The key innovation is that AWS fully manages the operational overhead that data engineers previously handled manually — small-file compaction, orphan file cleanup, and snapshot expiration. This lets teams focus on analytics rather than infrastructure maintenance.

Table Buckets vs General-Purpose Buckets

| Feature | Table Bucket | General-Purpose Bucket |
|---|---|---|
| Primary use | Structured analytics data | General object storage |
| Iceberg support | Native | Manual configuration |
| Auto maintenance | Compaction, snapshots | None |
| Query throughput | Up to 3x (vs self-managed) | Standard |
| TPS | Up to 10x (vs self-managed) | Standard |
| Metadata management | AWS Glue Catalog integrated | Manual |
| Pricing | Storage + maintenance fees | Storage + request fees |
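The practical consequence of this split is that the two bucket types live behind different APIs. A minimal boto3 sketch (the bucket names are placeholders, and the actual calls require AWS credentials):

```python
# Service names used by boto3 for each bucket type. The key difference:
# table buckets are managed by a separate "s3tables" service, not "s3".
BUCKET_TYPE_SERVICE = {
    "general-purpose": "s3",
    "table": "s3tables",
}

def create_bucket(bucket_type: str, name: str, region: str = "us-east-1"):
    # boto3 is imported lazily so this sketch can be read and tested
    # without AWS credentials configured.
    import boto3
    client = boto3.client(BUCKET_TYPE_SERVICE[bucket_type], region_name=region)
    if bucket_type == "table":
        return client.create_table_bucket(name=name)   # s3tables API
    return client.create_bucket(Bucket=name)           # classic S3 API
```

Everything else in this article (namespaces, tables, maintenance configuration) goes through the `s3tables` client.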

S3 Tables Architecture

A table bucket stores Iceberg table data, metadata, and snapshots; tables are registered in the AWS Glue Data Catalog and served to query engines such as Athena, Redshift, EMR, Spark, Trino, and Flink.

Automatic Maintenance Features

S3 Tables automates three critical maintenance operations:

- Compaction: merges many small Parquet files into larger, optimally sized files to maintain query performance. Previously this required scheduled Spark jobs.
- Snapshot management: automatically expires Iceberg snapshots older than the configured retention window, preventing unbounded metadata growth.
- Orphan file cleanup: periodically deletes data files no longer referenced by any table snapshot, preventing storage cost bloat over time.
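Maintenance behavior can be tuned per table through the `s3tables` API. The sketch below builds a request for `PutTableMaintenanceConfiguration`; the exact `settings` shape and the 512 MB target are assumptions to verify against the current boto3 documentation.

```python
def compaction_config(target_file_size_mb: int) -> dict:
    # Request parameters for s3tables put_table_maintenance_configuration
    # (the settings layout is an assumption -- check the boto3 docs).
    return {
        "tableBucketARN": "arn:aws:s3tables:us-east-1:123456789012:bucket/my-analytics-table-bucket",
        "namespace": "analytics",
        "name": "sales_events",
        "type": "icebergCompaction",
        "value": {
            "status": "enabled",
            "settings": {"icebergCompaction": {"targetFileSizeMB": target_file_size_mb}},
        },
    }

def apply_config(cfg: dict) -> None:
    # boto3 is imported lazily; this call needs valid AWS credentials.
    import boto3
    client = boto3.client("s3tables", region_name="us-east-1")
    client.put_table_maintenance_configuration(**cfg)

cfg = compaction_config(512)
```

A larger target file size means fewer, bigger output files per compaction run, which is one of the main levers for the cost issue discussed later in this article.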

Supported Query Engines

Because S3 Tables is built on the Apache Iceberg open standard, it works with all major analytics engines:

| Engine | Type | Primary Use Case |
|---|---|---|
| Amazon Athena | AWS managed | Serverless SQL analytics |
| Amazon Redshift | AWS managed | DWH integration |
| Amazon EMR | AWS managed | Large-scale batch processing |
| Apache Spark | Open source | ETL and ML pipelines |
| Apache Flink | Open source | Streaming analytics |
| Trino | Open source | Interactive SQL |
| DuckDB | Embedded OLAP | Local analytics and prototyping |
| PyIceberg | Python library | Python workflow integration |
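As one example beyond Athena, PyIceberg can connect through the Iceberg REST endpoint that S3 Tables exposes. The endpoint URL, property names, and SigV4 settings below follow the standard Iceberg REST catalog configuration but are assumptions to verify against the S3 Tables documentation.

```python
def s3tables_catalog_props(region: str, table_bucket_arn: str) -> dict:
    # Iceberg REST catalog properties for S3 Tables (assumed layout):
    # requests are signed with SigV4 under the "s3tables" service name.
    return {
        "type": "rest",
        "uri": f"https://s3tables.{region}.amazonaws.com/iceberg",
        "warehouse": table_bucket_arn,
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": region,
    }

def load_table(props: dict):
    # Requires `pip install pyiceberg` and AWS credentials; imported lazily
    # so the sketch runs without the dependency installed.
    from pyiceberg.catalog import load_catalog
    catalog = load_catalog("s3tables", **props)
    return catalog.load_table("analytics.sales_events")

props = s3tables_catalog_props(
    "us-east-1",
    "arn:aws:s3tables:us-east-1:123456789012:bucket/my-analytics-table-bucket",
)
```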

2025 re:Invent Feature Additions

AWS announced several major enhancements at re:Invent 2025:

- Intelligent-Tiering support: automatically moves data between storage tiers based on access patterns, enabling up to 80% cost reduction on infrequently accessed historical data.
- Cross-region and cross-account replication: replicate tables across regions for disaster recovery, or share data across AWS accounts at the table level.
- Apache Iceberg V3 support: improved row-level deletes and enhanced multi-version concurrency control (MVCC) for high-concurrency workloads.
- S3 Storage Lens metrics export: export table bucket access patterns and cost analytics directly to Storage Lens dashboards.

Pricing Overview

S3 Tables pricing has four main components (US East reference prices):

| Component | Approximate Price | Notes |
|---|---|---|
| Storage | ~$0.023 per GB/month | Can be reduced with Intelligent-Tiering |
| PUT requests | ~$0.005 per 1,000 | Data write operations |
| GET requests | ~$0.0004 per 1,000 | Data read operations |
| Maintenance | Based on data processed | Compaction and snapshot deletion |

Always verify current pricing on the official AWS pricing page, as rates vary by region.
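To make the components concrete, here is a back-of-the-envelope monthly estimate using the reference prices from the table above. This is illustrative only: actual rates vary by region, and the maintenance component (billed on data processed) is passed in directly rather than modeled.

```python
STORAGE_PER_GB = 0.023   # ~$/GB-month (reference price above)
PUT_PER_1000 = 0.005     # ~$ per 1,000 write requests
GET_PER_1000 = 0.0004    # ~$ per 1,000 read requests

def estimate_monthly_cost(storage_gb: float, put_requests: int,
                          get_requests: int, maintenance_usd: float = 0.0) -> float:
    # Maintenance is billed on data processed by compaction and snapshot
    # deletion; estimate it separately from the AWS pricing page.
    storage = storage_gb * STORAGE_PER_GB
    puts = put_requests / 1000 * PUT_PER_1000
    gets = get_requests / 1000 * GET_PER_1000
    return round(storage + puts + gets + maintenance_usd, 2)

# 500 GB stored, 2M writes, 10M reads, maintenance not yet estimated
cost = estimate_monthly_cost(500, 2_000_000, 10_000_000)  # 25.5
```

Note that for write-heavy tables the maintenance component can dominate, which is exactly the warning in the next section.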

Cost Warning — Managed Maintenance Costs

An analysis by Onehouse.ai (an Apache Hudi spinoff) found that S3 Tables' automatic compaction costs can run up to 20x higher than expected for write-heavy tables. Key mitigation strategies:

- Run cost modeling in AWS Cost Explorer before production migration
- Adjust compaction frequency and thresholds away from the defaults
- Combine with Intelligent-Tiering to offset storage costs
- Set budget alerts and review S3 Storage Lens reports monthly

Setup Guide — From Table Bucket to Athena Query

Step 1: Create a Table Bucket

```bash
aws s3tables create-table-bucket \
  --name my-analytics-table-bucket \
  --region us-east-1
```

Step 2: Define a Table with boto3

```python
import boto3

client = boto3.client('s3tables', region_name='us-east-1')

bucket_arn = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-analytics-table-bucket'

# The namespace must exist before a table can be created in it.
client.create_namespace(
    tableBucketARN=bucket_arn,
    namespace=['analytics']
)

response = client.create_table(
    tableBucketARN=bucket_arn,
    namespace='analytics',
    name='sales_events',
    format='ICEBERG'
)
print(response)
```

Step 3: Run SQL in Athena

```sql
CREATE TABLE sales_events (
  event_id    BIGINT,
  user_id     BIGINT,
  product_id  STRING,
  amount      DOUBLE,
  event_time  TIMESTAMP
)
LOCATION 's3tables://my-analytics-table-bucket/analytics/sales_events'
TBLPROPERTIES ('table_type'='ICEBERG');

INSERT INTO sales_events VALUES
  (1, 101, 'P-001', 9800.0, TIMESTAMP '2026-04-01 10:00:00');

SELECT product_id, SUM(amount) AS total
FROM sales_events
WHERE event_time >= TIMESTAMP '2026-04-01 00:00:00'
GROUP BY product_id
ORDER BY total DESC;
```

Data Lakehouse Use Cases

S3 Tables excels in three core scenarios:

- Real-time analytics: combine Apache Flink for stream ingestion with S3 Tables for durable Iceberg storage, then query with Athena for near-real-time dashboards.
- ML feature store: use PyIceberg to manage versioned feature datasets with time-travel capabilities, enabling reproducible model training.
- Compliance-ready data management: row-level deletes (GDPR right to erasure) combined with snapshot-based audit trails provide a compliant data management foundation.
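Time travel, which underpins the feature-store scenario, works by reading the table as of a past snapshot. The sketch below shows the selection logic on plain snapshot metadata; in PyIceberg the resulting ID would be passed to a snapshot-pinned read such as `table.scan(snapshot_id=...)` (verify the exact call against the PyIceberg docs).

```python
from datetime import datetime, timezone

def snapshot_as_of(snapshots, as_of):
    # Pick the latest snapshot committed at or before `as_of` --
    # the core lookup behind Iceberg time travel.
    eligible = [s for s in snapshots if s["committed_at"] <= as_of]
    return max(eligible, key=lambda s: s["committed_at"]) if eligible else None

snapshots = [
    {"snapshot_id": 101, "committed_at": datetime(2026, 3, 30, tzinfo=timezone.utc)},
    {"snapshot_id": 102, "committed_at": datetime(2026, 4, 1, tzinfo=timezone.utc)},
    {"snapshot_id": 103, "committed_at": datetime(2026, 4, 3, tzinfo=timezone.utc)},
]
# Reproduce training data exactly as it looked on April 2:
chosen = snapshot_as_of(snapshots, datetime(2026, 4, 2, tzinfo=timezone.utc))
```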

Migrating from Existing Data Lakes

To migrate from S3 + Glue Data Catalog to S3 Tables:

1. Export Glue definitions: export existing table schemas from the Glue Catalog as JSON
2. Create table buckets: provision table buckets for production and staging
3. Convert to Iceberg: run `ALTER TABLE SET TBLPROPERTIES ('table_type'='ICEBERG')`
4. Copy data: use AWS Glue ETL to copy Parquet data into the table bucket
5. Validate: verify row counts and aggregate values match before switching traffic
6. Decommission: archive or delete the old general-purpose buckets after validation
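The validation step can be automated by comparing row counts and key aggregates between the old and new tables. The sketch below shows only the comparison logic on pre-fetched results; how you obtain them (Athena, Spark, etc.) depends on your stack.

```python
def validate_migration(old: dict, new: dict, tolerance: float = 1e-6) -> list:
    # Compare row counts exactly and numeric aggregates within a tolerance;
    # returns a list of mismatch descriptions (empty list = safe to switch).
    problems = []
    if old["row_count"] != new["row_count"]:
        problems.append(f"row_count: {old['row_count']} != {new['row_count']}")
    for metric, a in old["aggregates"].items():
        b = new["aggregates"][metric]
        if abs(a - b) > tolerance:
            problems.append(f"{metric}: {a} != {b}")
    return problems

old = {"row_count": 1_000_000, "aggregates": {"sum_amount": 123456.78}}
new = {"row_count": 1_000_000, "aggregates": {"sum_amount": 123456.78}}
issues = validate_migration(old, new)   # empty -> counts and sums match
```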

Best Practices

- Partition design: use hierarchical time partitioning (year/month/day) for timestamp columns. Avoid over-partitioning, which worsens the small-file problem.
- Compression: default to Parquet + Zstd compression with target file sizes of 128 MB to 512 MB at write time.
- Maintenance frequency: high-write tables benefit from weekly compaction; read-only tables need only monthly runs. Tune frequency carefully to control costs.
- Intelligent-Tiering: configure policies to move data inactive for 90+ days to Deep Archive automatically, for up to 80% storage cost savings.
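The partitioning and file-size advice above can be sanity-checked with a quick scan of file sizes per partition; the threshold below mirrors the lower end of the 128-512 MB target range mentioned in the text.

```python
TARGET_MIN_MB = 128  # lower bound of the recommended 128-512 MB range

def small_file_ratio(file_sizes_mb) -> float:
    # Fraction of files below the target size; a high ratio suggests
    # over-partitioning or too many small incremental writes.
    if not file_sizes_mb:
        return 0.0
    small = sum(1 for s in file_sizes_mb if s < TARGET_MIN_MB)
    return small / len(file_sizes_mb)

ratio = small_file_ratio([4, 8, 16, 200, 350])   # 3 of 5 files are small
```

A ratio that stays high between compaction runs is a signal to revisit the partition scheme or write batch sizes before paying for more frequent compaction.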

Frequently Asked Questions

Q1. Can I use S3 Tables with engines other than Athena?
Yes. S3 Tables is built on Apache Iceberg, so it works with Spark, Trino, Flink, DuckDB, PyIceberg, Redshift, and EMR out of the box.

Q2. How do I migrate existing S3 data to S3 Tables?
Use AWS Glue ETL to copy Parquet files into the table bucket and redefine them as Iceberg tables. The process is non-destructive and allows parallel validation.

Q3. Do S3 lifecycle policies apply to table buckets?
No. Table buckets use Iceberg snapshot retention policies instead of standard S3 lifecycle rules.

Q4. How do I control compaction costs?
Raise the minimum file size threshold for compaction triggers, reduce the frequency, and ensure write workflows produce files close to 256 MB or larger.

Q5. What is the latency for cross-region replication?
Same as standard S3 cross-region replication: typically seconds to minutes. It is not suitable for real-time synchronization and is primarily for DR use.

Q6. What changed with Apache Iceberg V3 support?
V3 makes row-level deletes dramatically more efficient, simplifying GDPR compliance. It also improves MVCC for high-concurrency write workloads.

How Oflight Can Help

Building an S3 Tables data lakehouse, migrating from an existing data lake, or optimizing your Iceberg architecture? Oflight provides end-to-end data platform design, implementation, and cost optimization services leveraging the latest AWS capabilities. Visit our Network & Infrastructure Services page to learn more.

Feel free to contact us
