Amazon S3 Tables Complete Guide — Build Iceberg-Native Analytics Platform on S3 [2026]
Amazon S3 Tables (announced December 2024) is Apache Iceberg-native table storage that delivers up to 3x higher query throughput and up to 10x more transactions per second (TPS) than self-managed Iceberg tables. This guide covers table buckets, supported engines, 2025 feature updates, pricing and cost considerations, and setup procedures.
What is Amazon S3 Tables?
Amazon S3 Tables is Apache Iceberg-native table storage that delivers up to 3x higher query throughput and up to 10x more transactions per second (TPS) compared to self-managed Iceberg tables. Announced by AWS in December 2024, it introduces a new bucket type called "table buckets" that natively manages Iceberg metadata, snapshots, and automated maintenance.
The key innovation is that AWS fully manages the operational overhead that data engineers previously handled manually — small-file compaction, orphan file cleanup, and snapshot expiration. This lets teams focus on analytics rather than infrastructure maintenance.
Table Buckets vs General-Purpose Buckets
| Feature | Table Bucket | General-Purpose Bucket |
|---|---|---|
| Primary use | Structured analytics data | General object storage |
| Iceberg support | Native | Manual configuration |
| Auto maintenance | Compaction, snapshots | None |
| Query throughput | Up to 3x (vs self-managed) | Standard |
| TPS | Up to 10x (vs self-managed) | Standard |
| Metadata management | AWS Glue Catalog integrated | Manual |
| Pricing | Storage + maintenance fees | Storage + request fees |
S3 Tables Architecture
Automatic Maintenance Features
S3 Tables automates three critical maintenance operations:
- Compaction: merges many small Parquet files into larger, optimally sized files to maintain query performance. Previously this required scheduled Spark jobs.
- Snapshot management: automatically expires Iceberg snapshots older than the configured retention window, preventing unbounded metadata growth.
- Orphan file cleanup: periodically deletes data files no longer referenced by any table snapshot, preventing storage cost bloat over time.
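These behaviors can also be tuned per table through the S3 Tables maintenance-configuration API. The payload builders below are a sketch only: the field names (`targetFileSizeMB`, `minSnapshotsToKeep`, `maxSnapshotAgeHours`) are assumptions modeled on the `PutTableMaintenanceConfiguration` operation and should be verified against the current boto3 reference before use.

```python
# Sketch: building maintenance-configuration payloads for an S3 Tables table.
# Field names are assumptions; check the boto3 S3Tables docs before relying
# on them in production.

def compaction_config(target_file_size_mb: int = 512) -> dict:
    """Payload enabling automatic compaction with a target file size."""
    return {
        "type": "icebergCompaction",
        "value": {
            "status": "enabled",
            "settings": {
                "icebergCompaction": {"targetFileSizeMB": target_file_size_mb}
            },
        },
    }

def snapshot_config(min_snapshots: int = 1, max_age_hours: int = 120) -> dict:
    """Payload controlling snapshot expiration (retention window)."""
    return {
        "type": "icebergSnapshotManagement",
        "value": {
            "status": "enabled",
            "settings": {
                "icebergSnapshotManagement": {
                    "minSnapshotsToKeep": min_snapshots,
                    "maxSnapshotAgeHours": max_age_hours,
                }
            },
        },
    }
```

In practice these dictionaries would be passed, together with the table bucket ARN, namespace, and table name, to the maintenance-configuration call on a `boto3.client('s3tables')` client.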
Supported Query Engines
Because S3 Tables is built on the Apache Iceberg open standard, it works with all major analytics engines:
| Engine | Type | Primary Use Case |
|---|---|---|
| Amazon Athena | AWS Managed | Serverless SQL analytics |
| Amazon Redshift | AWS Managed | DWH integration |
| Amazon EMR | AWS Managed | Large-scale batch processing |
| Apache Spark | Open Source | ETL and ML pipelines |
| Apache Flink | Open Source | Streaming analytics |
| Trino | Open Source | Interactive SQL |
| DuckDB | Embedded OLAP | Local analytics and prototyping |
| PyIceberg | Python Library | Python workflow integration |
2025 re:Invent Feature Additions
AWS announced several major enhancements at re:Invent 2025:
- Intelligent-Tiering support: automatically moves data between storage tiers based on access patterns, enabling up to 80% cost reduction on infrequently accessed historical data.
- Cross-region and cross-account replication: replicate tables across regions for disaster recovery, or share data across AWS accounts at the table level.
- Apache Iceberg V3 support: includes improved row-level deletes and enhanced MVCC (multi-version concurrency control) for high-concurrency workloads.
- S3 Storage Lens metrics export: directly export table bucket access patterns and cost analytics to Storage Lens dashboards.
Pricing Overview
S3 Tables pricing has four main components (US East reference prices):
| Component | Approximate Price | Notes |
|---|---|---|
| Storage | ~$0.023 per GB/month | Can be reduced with Intelligent-Tiering |
| PUT requests | ~$0.005 per 1,000 | Data write operations |
| GET requests | ~$0.0004 per 1,000 | Data read operations |
| Maintenance | Based on data processed | Compaction and snapshot deletion |
Always verify current pricing on the official AWS pricing page, as rates vary by region.
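As a quick sanity check, the approximate rates in the table above can be turned into a rough monthly estimate. This is a sketch only: the maintenance component is billed on data processed and is omitted here, and real rates vary by region.

```python
# Rough monthly cost estimate using the approximate US East rates from the
# pricing table above. Maintenance (compaction/snapshot deletion) is billed
# on data processed and is deliberately excluded.

STORAGE_PER_GB = 0.023   # USD per GB-month (approximate)
PUT_PER_1K = 0.005       # USD per 1,000 PUT requests
GET_PER_1K = 0.0004      # USD per 1,000 GET requests

def estimate_monthly_cost(storage_gb: float, puts: int, gets: int) -> float:
    """Storage + request cost for one month, in USD."""
    return round(
        storage_gb * STORAGE_PER_GB
        + puts / 1_000 * PUT_PER_1K
        + gets / 1_000 * GET_PER_1K,
        2,
    )

# 500 GB stored, 2M writes, 50M reads in a month
print(estimate_monthly_cost(500, 2_000_000, 50_000_000))  # 41.5
```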
Cost Warning — Managed Maintenance Costs
An analysis by Onehouse.ai (founded by the creator of Apache Hudi) found that automatic compaction costs on S3 Tables can run up to 20x higher than expected for write-heavy tables. Key mitigation strategies include:
- Run cost modeling in AWS Cost Explorer before migrating production workloads
- Adjust compaction frequency and thresholds away from the defaults
- Combine with Intelligent-Tiering to offset storage costs
- Set budget alerts and review S3 Storage Lens reports monthly
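The mechanism behind the surprise is rewrite amplification: compaction is billed on data processed, and the same bytes can be reprocessed many times when writes arrive as streams of tiny files. The sketch below illustrates the effect with a placeholder per-GB rate (not AWS's actual price):

```python
# Why write-heavy tables surprise people: compaction is billed on data
# processed, and each ingested byte may be rewritten several times as small
# files are repeatedly merged. GB_RATE is a placeholder, not an AWS price.

GB_RATE = 0.05  # placeholder: USD per GB processed by compaction

def monthly_compaction_cost(ingest_gb_per_month: float,
                            rewrite_factor: float) -> float:
    """rewrite_factor: average times each ingested byte is reprocessed."""
    return round(ingest_gb_per_month * rewrite_factor * GB_RATE, 2)

# 300 GB/month ingested: well-sized writes vs. a stream of tiny files
print(monthly_compaction_cost(300, 1))   # 15.0  (each byte compacted once)
print(monthly_compaction_cost(300, 20))  # 300.0 (heavy rewrite amplification)
```

Keeping written files close to the compaction target size keeps the rewrite factor, and therefore the bill, low.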
Setup Guide — From Table Bucket to Athena Query
Step 1: Create a Table Bucket
```shell
aws s3tables create-table-bucket \
    --name my-analytics-table-bucket \
    --region us-east-1
```
Step 2: Define a Table with boto3
```python
import boto3

client = boto3.client('s3tables', region_name='us-east-1')

# Note: the 'analytics' namespace must already exist in the table bucket
# (create it with the client's create_namespace operation).
response = client.create_table(
    tableBucketARN='arn:aws:s3tables:us-east-1:123456789012:bucket/my-analytics-table-bucket',
    namespace='analytics',
    name='sales_events',
    format='ICEBERG'
)
print(response)
```
Step 3: Run SQL in Athena
```sql
CREATE TABLE sales_events (
  event_id BIGINT,
  user_id BIGINT,
  product_id STRING,
  amount DOUBLE,
  event_time TIMESTAMP
)
LOCATION 's3tables://my-analytics-table-bucket/analytics/sales_events'
TBLPROPERTIES ('table_type'='ICEBERG');

INSERT INTO sales_events VALUES
  (1, 101, 'P-001', 9800.0, TIMESTAMP '2026-04-01 10:00:00');

SELECT product_id, SUM(amount) AS total
FROM sales_events
WHERE event_time >= TIMESTAMP '2026-04-01 00:00:00'
GROUP BY product_id
ORDER BY total DESC;
```
Data Lakehouse Use Cases
S3 Tables excels in three core scenarios:
- Real-time analytics: combine Apache Flink for stream ingestion with S3 Tables for durable Iceberg storage, then query with Athena for near-real-time dashboards.
- ML feature store: use PyIceberg to manage versioned feature datasets with time-travel capabilities, enabling reproducible model training.
- Compliance-ready data management: row-level deletes (GDPR right to erasure) combined with snapshot-based audit trails provide a compliant data management foundation.
Migrating from Existing Data Lakes
To migrate from S3 + Glue Data Catalog to S3 Tables:
1. Export Glue definitions: export existing table schemas from the Glue Catalog as JSON.
2. Create table buckets: provision table buckets for production and staging.
3. Convert to Iceberg: run `ALTER TABLE SET TBLPROPERTIES ('table_type'='ICEBERG')`.
4. Copy data: use AWS Glue ETL to copy Parquet data into the table bucket.
5. Validate: verify row counts and aggregate values match before switching traffic.
6. Decommission: archive or delete the old general-purpose buckets after validation.
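The validation step above can be sketched as a plain comparison of row counts and aggregates pulled from each side. The stats dictionaries here are stand-ins for the results of queries against the old and new catalogs:

```python
# Sketch of migration validation: compare row counts and column aggregates
# from the source and target tables. The input dicts stand in for query
# results against each catalog.

def validate_migration(source_stats: dict, target_stats: dict,
                       tolerance: float = 1e-6) -> list[str]:
    """Return the names of mismatched metrics (empty list means match)."""
    mismatches = []
    for key, src in source_stats.items():
        dst = target_stats.get(key)
        if dst is None or abs(src - dst) > tolerance:
            mismatches.append(key)
    return mismatches

src = {"row_count": 1_000_000, "sum_amount": 52_340_100.0}
dst = {"row_count": 1_000_000, "sum_amount": 52_340_100.0}
print(validate_migration(src, dst))  # [] -> safe to switch traffic
```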
Best Practices
- Partition design: use hierarchical time partitioning (year/month/day) for timestamp columns. Avoid over-partitioning, which worsens small-file problems.
- Compression: default to Parquet with Zstd compression and target file sizes of 128 MB to 512 MB at write time.
- Maintenance frequency: high-write tables benefit from weekly compaction, while read-only tables need only monthly runs. Tune frequency carefully to control costs.
- Intelligent-Tiering: configure policies to automatically move data inactive for 90+ days to Deep Archive for up to 80% storage cost savings.
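The partition-design advice above amounts to deriving year/month/day values from each event timestamp. A minimal sketch of the mapping:

```python
# Sketch: hierarchical time partitioning. Maps an event timestamp to a
# Hive-style year/month/day partition path.

from datetime import datetime

def partition_path(ts: datetime) -> str:
    """Return the year=/month=/day= partition path for a timestamp."""
    return f"year={ts.year}/month={ts.month:02d}/day={ts.day:02d}"

print(partition_path(datetime(2026, 4, 1, 10, 0)))
# year=2026/month=04/day=01
```

With Iceberg's hidden partitioning, this mapping is declared once in the partition spec (e.g. day-transform on `event_time`) rather than computed by every writer, but the resulting file layout follows the same hierarchy.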
Frequently Asked Questions
Q1. Can I use S3 Tables with engines other than Athena?
Yes. S3 Tables is built on Apache Iceberg, so it works with Spark, Trino, Flink, DuckDB, PyIceberg, Redshift, and EMR out of the box.

Q2. How do I migrate existing S3 data to S3 Tables?
Use AWS Glue ETL to copy Parquet files into the table bucket and redefine them as Iceberg tables. The process is non-destructive and allows parallel validation.

Q3. Do S3 lifecycle policies apply to table buckets?
No. Table buckets use Iceberg snapshot retention policies instead of standard S3 lifecycle rules.

Q4. How do I control compaction costs?
Raise the minimum file size threshold for compaction triggers, reduce frequency, and ensure write workflows produce files close to 256 MB or larger.

Q5. What is the latency for cross-region replication?
The same as standard S3 cross-region replication: typically seconds to minutes. It is not suitable for real-time synchronization and is intended primarily for DR use.

Q6. What changed with Apache Iceberg V3 support?
V3 makes row-level deletes dramatically more efficient, simplifying GDPR compliance. It also improves MVCC for high-concurrency write workloads.
How Oflight Can Help
Building an S3 Tables data lakehouse, migrating from an existing data lake, or optimizing your Iceberg architecture? Oflight provides end-to-end data platform design, implementation, and cost optimization services leveraging the latest AWS capabilities. Visit our Network & Infrastructure Services page to learn more.
Feel free to contact us