S3 Cost Guardrails Runbook
Operate S3 and R2 sync safely with request budgets, inventory imports, event ingestion, dry-runs, and reconcile windows.
Vaulthalla S3 vaults apply a finite request budget by default. New S3
sync policies start with the balanced preset unless an operator changes the
policy with vh vault sync set.
Budget Presets
Use presets for predictable defaults, then override individual fields only when the workload needs it:
    vh vault sync set <vault> --s3-budget-preset conservative
    vh vault sync set <vault> --s3-budget-preset balanced
    vh vault sync set <vault> --s3-budget-preset bulk
    vh vault sync set <vault> --s3-budget-preset unlimited
Individual flags override the selected preset:
    vh vault sync set <vault> --s3-budget-preset conservative --s3-budget-get 25
Recommended starting points:
- conservative: small buckets, exploratory imports, or operators validating a new policy.
- balanced: default for normal small-to-medium buckets.
- bulk: planned high-volume imports or maintenance windows.
- unlimited: only for controlled one-off operations where S3 cost is already bounded externally.
Small-Bucket Setup
- Create the S3 vault normally. The balanced budget is applied to new S3 sync policies by default.
- Run vh vault sync info <vault> and confirm the S3 request budget.
- Trigger a normal sync with vh vault sync <vault>.
- If the event stalls with a budget reason, raise only the exhausted budget field instead of switching to unlimited.
- Leave max_remote_index_age at its default unless the bucket is updated exclusively through Vaulthalla or event ingestion is known to be reliable.
Large-Bucket Setup
Avoid starting with a full ListObjectsV2 scan on large buckets.
- Configure conservative or a custom LIST budget before the first run.
- Import S3 Inventory first when available.
- Enable event ingestion to keep the remote index warm.
- Use vh vault sync dry-run <vault> to inspect planned request pressure from the local remote index.
- Run reconcile only during a maintenance window with either an explicit LIST budget or --allow-list-scan.
- Set a remote-index freshness window that matches the event/inventory cadence:

    vh vault sync set <vault> --max-remote-index-age 24h
Import S3 Inventory First
Use S3 Inventory for the first remote index when the bucket may contain many objects:
    vh vault sync inventory <vault> --file inventory.csv
For CSVs without a header:
    vh vault sync inventory <vault> --file inventory.csv --schema bucket,key,size,last_modified_date,etag,storage_class
Inventory import indexes object metadata and publishes the Vaulthalla manifest without downloading object bodies.
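
As a rough illustration of what a headerless import involves, the sketch below maps each CSV row onto the column names from the --schema example and keeps only metadata. It is not Vaulthalla's importer: the function name parse_inventory_rows, the sample row, and the dictionary layout are hypothetical.

```python
import csv
import io

# Column order copied from the --schema example above.
SCHEMA = ["bucket", "key", "size", "last_modified_date", "etag", "storage_class"]

def parse_inventory_rows(text, schema=SCHEMA):
    """Map headerless inventory CSV rows onto schema names (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=schema)
    rows = []
    for row in reader:
        row["size"] = int(row["size"])  # CSV fields arrive as strings
        rows.append(row)
    return rows

# Made-up sample row, for illustration only.
sample = '"my-bucket","photos/a.jpg","1024","2024-01-01T00:00:00.000Z","abc123","STANDARD"\n'
index_rows = parse_inventory_rows(sample)
print(index_rows[0]["key"], index_rows[0]["size"])
```

Only metadata fields are read here, which mirrors the point above that no object bodies are downloaded during an import.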
Event-Ingestion Path
Use S3 event notifications to keep the index current after the initial import:
    vh vault sync events <vault> --file s3-events.json
ObjectCreated events upsert index rows. ObjectRemoved events delete index rows.
When S3 provides object sequencers, older events do not overwrite newer index
state. Manifest objects under .vaulthalla/ are ignored.
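
The ordering rule can be sketched with an in-memory index. This is a minimal illustration, not Vaulthalla's code: apply_event, is_newer, and the separate last_seq map are hypothetical, and the comparison (right-pad the shorter sequencer with zeros, then compare as text) follows the rule AWS describes for sequencers as best understood here.

```python
MANIFEST_PREFIX = ".vaulthalla/"

def is_newer(candidate, current):
    """Right-pad the shorter sequencer with zeros, then compare as text."""
    width = max(len(candidate), len(current))
    return candidate.ljust(width, "0").lower() > current.ljust(width, "0").lower()

def apply_event(index, last_seq, key, event_name, sequencer, meta=None):
    """Apply one event to the index; return True if it was applied."""
    if key.startswith(MANIFEST_PREFIX):
        return False  # manifest objects are ignored
    seen = last_seq.get(key)
    if sequencer and seen and not is_newer(sequencer, seen):
        return False  # older or duplicate event must not overwrite newer state
    if event_name.startswith("ObjectCreated"):
        index[key] = meta or {}   # upsert the index row
    elif event_name.startswith("ObjectRemoved"):
        index.pop(key, None)      # delete the index row
    else:
        return False              # other event types are not handled in this sketch
    if sequencer:
        last_seq[key] = sequencer  # assumption: sequencers are tracked per key
    return True

index, last_seq = {}, {}
apply_event(index, last_seq, "a.txt", "ObjectCreated:Put", "0055AED6DCD90281E5", {"size": 10})
apply_event(index, last_seq, "a.txt", "ObjectRemoved:Delete", "0055AED6DCD90281E6")
# A late copy of the first event arrives after the delete and is dropped.
late = apply_event(index, last_seq, "a.txt", "ObjectCreated:Put", "0055AED6DCD90281E5", {"size": 10})
print(late, index)
```

Keeping the last-seen sequencer outside the index row is a choice made for this sketch so that a late ObjectCreated cannot resurrect a deleted key; how Vaulthalla stores this state is not described here.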
vh vault sync info <vault> reports the remote index source, indexed time,
manifest ETag, manifest generated time, object count, and fresh/stale status.
If the local index is older than max_remote_index_age and the manifest cannot
be refreshed, sync stalls instead of silently trusting stale data.
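
The stall rule reduces to a small decision. The sketch below assumes the index age is measured from the indexed time that sync info reports; sync_decision and its parameters are hypothetical names, not part of Vaulthalla.

```python
from datetime import datetime, timedelta, timezone

def sync_decision(indexed_at, now, max_remote_index_age, manifest_refreshed):
    """Return "proceed" or "stall" for one sync attempt (hypothetical helper)."""
    index_is_fresh = (now - indexed_at) <= max_remote_index_age
    if index_is_fresh or manifest_refreshed:
        return "proceed"
    return "stall"  # stale index and no refreshed manifest: do not trust it

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
indexed_at = now - timedelta(hours=30)
print(sync_decision(indexed_at, now, timedelta(hours=24), manifest_refreshed=False))  # stall
```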
Reconcile
vh vault sync reconcile performs an explicit ListObjectsV2 pass. It requires
one of:
- a configured --s3-budget-list value on the sync policy, or
- the explicit --allow-list-scan flag.
When a prior index exists, the command prints a rough pre-run estimate of one LIST request per 1,000 indexed objects. Use this estimate to choose a LIST budget before running reconcile on a large bucket.
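
The estimate is simple enough to reproduce by hand. A minimal sketch, assuming a partial final page is rounded up to a whole request (the command's exact rounding is not stated); estimate_list_requests is a hypothetical name:

```python
def estimate_list_requests(indexed_objects, page_size=1000):
    """Rough LIST count: one request per 1,000 indexed objects, rounded up."""
    return -(-indexed_objects // page_size)  # ceiling division on integers

print(estimate_list_requests(2_500_000))  # 2500
```

A bucket with 2.5 million indexed objects would therefore need a LIST budget of roughly 2,500 requests, plus headroom for objects added since the index was built.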
Dry Run
Use dry-run before changing a budget or launching a large sync:
    vh vault sync dry-run <vault>
By default, dry-run is local-index-only: it reads local database state and does not call S3, refresh manifest state, or mutate the remote object index. It builds the next sync plan from the local files and remote index, and prints estimates for LIST, HEAD, GET, PUT, COPY, and DELETE requests, body-download bytes, upload bytes, cache/index-only objects, and archive-tier body downloads skipped.
Use an explicit refresh only when the operator is allowed to trigger sync work:
    vh vault sync dry-run <vault> --refresh-index
    vh vault sync dry-run <vault> --refresh-remote-index
--refresh-index may use S3 HEAD/GET requests to refresh the remote index
manifest before planning and requires sync trigger permission. Without a local
remote index, or when the local index is stale, default dry-run refuses to invent
a plan; import Inventory, ingest events, reconcile, or run dry-run with
--refresh-index.
Failure Modes
When a budget is exceeded, the sync event is marked stalled and
stall_reason contains the budget reason. Budget exhaustion should not appear
as a generic sync error unless another exception occurs after the budget
failure.
Downloaded-byte budgets are checked before planned body downloads and during body transfer callbacks. Cache remote-only indexing is tracked as index work and does not count as downloaded traffic.
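
The two check points can be pictured with a small counter. This is a sketch under stated assumptions, not Vaulthalla's implementation: DownloadBudget, BudgetExceeded, and the method names are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when the downloaded-byte budget would be or has been exceeded."""

class DownloadBudget:
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def check_planned(self, planned_bytes):
        # Check before starting a planned body download.
        if self.used + planned_bytes > self.limit:
            raise BudgetExceeded("planned download exceeds byte budget")

    def on_chunk(self, chunk_bytes):
        # Check again as bytes arrive, in case the object is larger than planned.
        self.used += chunk_bytes
        if self.used > self.limit:
            raise BudgetExceeded("byte budget exceeded during transfer")

budget = DownloadBudget(limit_bytes=100)
budget.check_planned(60)   # fits within the budget
budget.on_chunk(60)        # transfer callback records the bytes
try:
    budget.check_planned(60)   # a second 60-byte body would not fit
    outcome = "started"
except BudgetExceeded:
    outcome = "stalled"
print(outcome)
```

Index-only work never calls on_chunk in this sketch, which matches the statement that cache remote-only indexing does not count as downloaded traffic.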
Manifest publishes use conditional S3 PUTs when an ETag is known. If another writer wins the race, Vaulthalla reloads the latest manifest, replays the local index mutations, and retries a bounded number of times. Repeated conflicts stall the sync with a manifest conflict reason instead of silently losing updates.
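
The publish loop follows the usual optimistic-concurrency pattern. The sketch below runs against an in-memory stand-in for the bucket; FakeStore, RacyStore, publish, the attempt limit of 3, and the return strings are all hypothetical, and mutations are simplified to key upserts.

```python
class Conflict(Exception):
    """The stored ETag no longer matches the one we read."""

class FakeStore:
    """In-memory stand-in for a manifest object with conditional writes."""
    def __init__(self):
        self.body, self.etag = {}, 0

    def get(self):
        return dict(self.body), self.etag

    def put_if_match(self, body, etag):
        if etag != self.etag:
            raise Conflict()
        self.body, self.etag = dict(body), self.etag + 1

class RacyStore(FakeStore):
    """Simulates another writer winning the race after every read."""
    def get(self):
        snapshot = super().get()
        self.etag += 1
        return snapshot

def publish(store, mutations, max_attempts=3):
    for _ in range(max_attempts):
        manifest, etag = store.get()   # reload the latest manifest
        manifest.update(mutations)     # replay local index mutations
        try:
            store.put_if_match(manifest, etag)
            return "published"
        except Conflict:
            continue                   # another writer won; try again
    return "stalled: manifest conflict"

quiet = FakeStore()
print(publish(quiet, {"a.txt": "etag-1"}))
print(publish(RacyStore(), {"a.txt": "etag-1"}))
```

Replaying mutations on top of the freshly loaded manifest, rather than overwriting it, is what prevents the other writer's updates from being lost.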
