
Media Transformation Architecture

How video-resizer-2 handles media transformation on Cloudflare Workers.

Single endpoint for video resize, frame extraction, spritesheet generation, and audio extraction. Three-tier transform routing: Media binding for R2 sources, cdn-cgi/media for remote sources, FFmpeg container for oversized/advanced transforms. Results stored persistently in R2 with edge cache on top. Zero memory buffering — all streams flow through without loading into Worker memory.


Three transform tiers + four transformation modes:

| Tier | Source | Size Limit | Method | Latency | Worker Memory |
| --- | --- | --- | --- | --- | --- |
| 1 | R2 | ≤100 MiB (configurable) | env.MEDIA.input(stream) | ~2-10s | Stream only |
| 2 | Remote/Fallback | ≤100 MiB (configurable) | cdn-cgi/media URL fetch | ~3-15s | Zero |
| 3a | Any | 100-256 MiB or container-only params | FFmpeg Container DO (sync) | ~30-120s | Zero |
| 3b | Any | >256 MiB | FFmpeg Container DO (via Queue) | ~60-300s (async) | Zero |

Transformation modes:

  • Video: Resize, compress, adjust quality/playback
  • Frame: Extract stills at timestamps (jpg/png)
  • Spritesheet: Grid previews for scrubbing UIs
  • Audio: Extract m4a audio tracks

Container-only features (fps, speed, rotate, crop, bitrate, h265/vp9/av1 codecs, duration >60s) route to the FFmpeg container regardless of file size. Sync container transforms run inline; async transforms (>256 MiB) are dispatched via Cloudflare Queue for deploy safety and automatic retry.
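The routing logic above can be sketched as a pure function. This is a minimal sketch: the function name, the way codecs arrive (as a resolved `format` param), and the hard-coded thresholds are assumptions; the real handler reads its limits from config.

```typescript
// Illustrative tier decision: size thresholds plus container-only params.
type Tier = "binding" | "cdn-cgi" | "container-sync" | "container-async";

const MiB = 1024 * 1024;
const CONTAINER_ONLY = new Set(["fps", "speed", "rotate", "crop", "bitrate"]);
const CONTAINER_CODECS = new Set(["h265", "vp9", "av1"]);

function pickTier(opts: {
  sourceIsR2: boolean;
  sizeBytes: number; // from an R2 head() or an HTTP HEAD request
  params: Record<string, string>;
  durationSeconds?: number;
}): Tier {
  const containerOnly =
    Object.keys(opts.params).some((p) => CONTAINER_ONLY.has(p)) ||
    CONTAINER_CODECS.has(opts.params.format ?? "") ||
    (opts.durationSeconds ?? 0) > 60;

  if (opts.sizeBytes > 256 * MiB) return "container-async"; // queued, 202
  if (containerOnly || opts.sizeBytes > 100 * MiB) return "container-sync";
  return opts.sourceIsR2 ? "binding" : "cdn-cgi";
}
```

Container-only params win over size for the sync/async split only up to 256 MiB; beyond that, everything goes through the Queue.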


Key design changes from v1 to v2:

| v1 Pattern | v2 Pattern | Why |
| --- | --- | --- |
| Strategy per mode (5 files) | Single transform handler, mode via params | Modes are just param combinations, not separate code paths |
| Reactive error handling (9402 -> container) | Proactive size-based routing + reactive Cf-Resized parsing | HEAD check + size thresholds for proactive routing; Cf-Resized header parsing for reactive fallback on CF error codes (9402, 9404, etc.) |
| 5 singleton config managers | Single Zod 4 schema + KV hot-reload | One source of truth, validated on upload |
| Akamai translation in middleware | Pure function translateAkamaiParams() | Testable, no side effects |
| KV chunked storage (5 MiB chunks) | R2 persistent store + edge cache | No chunking needed; R2 handles any size |
| __r2src self-referencing URL | env.MEDIA.input(stream) binding | Direct R2 stream, no HTTP subrequest |
Cache layers:

| Layer | Scope | Speed | Range Support | Persistence |
| --- | --- | --- | --- | --- |
| Edge cache (caches.default) | Per data center | Workers run before cache; same local store as CDN | Native (206) | Ephemeral |
| R2 persistent store (_transformed/) | Global | Fast | Via cache.match | Permanent until bust |
| KV version registry | Global | Fast | N/A | Manual bust |

On every successful transform, the output flows sequentially with zero memory buffering:

  1. Transform output -> stream to R2 via FixedLengthStream
  2. R2 get -> stream to cache.put
  3. cache.match -> serve to client (native range request support)

Subsequent requests: edge cache HIT (fastest) or R2 HIT -> cache.put -> cache.match (cross-colo).

Every response includes:

| Header | Value | Notes |
| --- | --- | --- |
| X-R2-Cache | HIT or MISS | Whether result came from R2 persistent store |
| cf-cache-status | HIT, MISS, etc. | CF edge cache status |
| X-Transform-Source | binding, cdn-cgi, container | Which tier performed the transform |
| X-Source-Type | r2, remote, fallback | Which source provided the original |
| X-Cache-Key | video:path:w=1280:c=auto | Deterministic cache key |
| X-Request-ID | UUID | Per-request trace ID |
| X-Processing-Time-Ms | number | Transform duration |
| X-Derivative | name | Resolved derivative |
| X-Resolved-Width/Height | number | Final dimensions |
| Via | video-resizer | Loop prevention |
| Cache-Tag | comma-separated | Purge-by-tag support |
Per-mode feature matrix:

| Feature | Video | Frame | Spritesheet | Audio | Container |
| --- | --- | --- | --- | --- | --- |
| Dimensions | 10-2000px | 10-2000px | Required | - | 10-2000px |
| Fit modes | Yes | Yes | Yes | - | Yes |
| Time | 0-10m | 0-10m | 0-10m | 0-10m | Unlimited |
| Duration | 1s-60s | - | 1s-60s | 1s-60s | Unlimited |
| Format | - | jpg/png | JPEG only | m4a only | mp4/webm |
| Quality/compression | Yes | - | - | - | Yes (CRF) |
| FPS/Speed/Rotate/Crop | No | No | No | No | Yes |
| Bitrate control | No | No | No | No | Yes |
| Input size limit | 100 MiB (binding, configurable) | 100 MiB (cdn-cgi, configurable) | 100 MiB | 100 MiB | 20 GB disk |

Tier 1: Media Binding (R2 sources, ≤100 MiB)


Direct streaming from R2 into the Media Transformations binding. No HTTP subrequest, no video bytes in Worker memory.

R2 bucket.get(key) -> ReadableStream -> env.MEDIA.input(stream).transform(params).output()

Fallback: if the binding rejects the input (MediaError), falls back to container.

Tier 2: cdn-cgi/media (Remote sources, ≤100 MiB)


Constructs a cdn-cgi/media URL with transform params and lets the edge handle both fetch and transform. Zero Worker memory usage.

/cdn-cgi/media/width=1280,height=720,fit=contain/{sourceUrl}?v={version}

Fallback: if cdn-cgi returns 404/5xx, tries next source in priority order.
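A minimal sketch of the URL construction. The `/cdn-cgi/media/{options}/{url}` option syntax is Cloudflare's; the helper name `buildCdnCgiUrl` and the `?v=` version suffix (this project's cache-busting convention, described later) are illustrative.

```typescript
// Build a cdn-cgi/media URL: comma-separated options, then the source URL.
function buildCdnCgiUrl(
  sourceUrl: string,
  options: Record<string, string | number>,
  version?: number,
): string {
  const opts = Object.entries(options)
    .map(([k, v]) => `${k}=${v}`)
    .join(",");
  const suffix = version !== undefined ? `?v=${version}` : "";
  return `/cdn-cgi/media/${opts}/${sourceUrl}${suffix}`;
}
```

The Worker then simply `fetch()`es this URL on its own zone and streams the response through.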

Tier 3: FFmpeg Container (>100 MiB or container-only params)


For transforms the binding/cdn-cgi can’t handle. Two sub-modes based on size:

  • Sync (100-256 MiB): dispatches directly to Container DO, streams result back inline
  • Async (>256 MiB): returns 202 immediately, dispatches via Queue, stores result in R2
Sync: dispatch to Container DO -> stream result -> R2 put -> cache.put -> serve

Async:
  Request 1: 202 Processing (jobId + SSE URL, Retry-After: 10)
  Queue: consumer dispatches to container DO
  Container: downloads source -> ffmpeg -> stores in R2 (via outbound handler)
  Request 2: R2 HIT -> cache.put -> cache.match -> 200 video/mp4

Container-only params that trigger this tier: fps, speed, rotate, crop, bitrate, h265/vp9/av1 codecs, duration >60s.


  • FFmpegContainer extends Container Durable Object with outbound handler
  • Instance key: ffmpeg:{origin}:{path}:{paramsHash} (FNV-1a hash ensures unique DO per transform)
  • Instance type: custom { vcpu: 4, memory_mib: 12288, disk_mb: 20000 } (max available)
  • sleepAfter: 15m, enableInternet: true
  • Container-side dedup: inflightJobs Map in server.mjs tracks running async transforms by sourceUrl|params key; duplicate /transform-url dispatches return 202 { dedup: true } instead of spawning another ffmpeg
  • Queue consumer dedup: R2 check before dispatch (idempotent completion)
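The instance key derivation can be sketched with the 32-bit FNV-1a variant (the actual hash width and the helper names here are assumptions; sorting the params makes equivalent requests map to the same DO instance):

```typescript
// 32-bit FNV-1a over a string (ASCII input assumed; charCodeAt, not bytes).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash.toString(16).padStart(8, "0");
}

// ffmpeg:{origin}:{path}:{paramsHash} — params sorted for determinism.
function containerInstanceKey(
  origin: string,
  path: string,
  params: Record<string, string>,
): string {
  const paramsHash = fnv1a(
    Object.keys(params).sort().map((k) => `${k}=${params[k]}`).join("&"),
  );
  return `ffmpeg:${origin}:${path}:${paramsHash}`;
}
```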

Containers only intercept HTTP traffic (not HTTPS). FFmpegContainer.outbound intercepts all HTTP:

  • GET /internal/job-progress -> updates D1 status + percent progress
  • POST /internal/container-result -> stores transcoded output in R2, updates D1
  • GET /internal/r2-source -> serves raw R2 objects via binding (for R2-only sources)
  • GET (large files) -> proxy with source dedup (tee to R2 _source-cache/ + container)
  • Everything else -> fetch() with http->https upgrade

Source downloads use HTTPS directly (enableInternet=true, not intercepted).

Node.js 22 + ffmpeg in node:22-slim Docker image.

Container HTTP endpoints:

| Endpoint | Method | Description |
| --- | --- | --- |
| /transform | POST | Sync: stream source in, receive output |
| /transform-async | POST | Async: stream source + callbackUrl, 202 |
| /transform-url | POST | Async URL-based: container fetches source directly |
| /health | GET | Health check |
  • Dynamic threads: os.availableParallelism() (up to 4 on max instance)
  • Fast seeking: -ss before -i
  • Even dimension enforcement: odd widths/heights rounded down for libx264
  • Source streaming: pipeline() to disk (no OOM on 725MB+)
  • Output streaming: createReadStream() + explicit Content-Length from stat()
  • Spritesheet: fps=N/duration,tile=COLSxROWS filter, imageCount defaults to 20, JPEG output
  • Progress reporting: stderr time=HH:MM:SS parsing → /internal/job-progress → D1 percent update → SSE
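The progress-reporting step above can be sketched as two small pure functions (names are illustrative; ffmpeg writes `time=HH:MM:SS.cc` lines to stderr, and percent is derived from the expected output duration):

```typescript
// Extract elapsed seconds from an ffmpeg stderr chunk, or null if absent.
function parseFfmpegTime(stderrChunk: string): number | null {
  const m = /time=(\d+):(\d{2}):(\d{2}(?:\.\d+)?)/.exec(stderrChunk);
  if (!m) return null;
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
}

// Convert elapsed/total into a 0-100 percent, capped at 100.
function progressPercent(elapsedSeconds: number, totalSeconds: number): number {
  if (totalSeconds <= 0) return 0;
  return Math.min(100, Math.round((elapsedSeconds / totalSeconds) * 100));
}
```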

Note: av1 codec triggers container routing but the container currently encodes as h264 (no libaom/libsvtav1 installed). h265 (libx265) and vp9 (libvpx-vp9) are fully supported.

Quality presets:

| Preset | CRF | FFmpeg Preset |
| --- | --- | --- |
| low | 28 | fast |
| medium | 23 | medium |
| high | 18 | medium |
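As a sketch, the preset table maps to ffmpeg arguments roughly like this (the helper name and the medium fallback for unknown values are assumptions):

```typescript
// Quality preset -> libx264 CRF + encoder preset, per the table above.
const QUALITY_PRESETS: Record<string, { crf: number; preset: string }> = {
  low: { crf: 28, preset: "fast" },
  medium: { crf: 23, preset: "medium" },
  high: { crf: 18, preset: "medium" },
};

function qualityArgs(quality: string): string[] {
  const p = QUALITY_PRESETS[quality] ?? QUALITY_PRESETS.medium;
  return ["-crf", String(p.crf), "-preset", p.preset];
}
```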

Container transforms are dispatched via Cloudflare Queue for durability (messages survive deploys, automatic retry with dead letter queue).

  1. Transform handler enqueues job to TRANSFORM_QUEUE + registers in D1 transform_jobs table
  2. Returns 202 JSON with jobId + SSE URL for real-time progress
  3. Queue consumer picks up job, checks R2 for existing result (idempotent dedup)
  4. If no result: dispatches to FFmpegContainer DO via /transform-url
  5. Container downloads source via HTTPS (or R2 via /internal/r2-source), streams to disk
  6. Container runs ffmpeg, reports progress via /internal/job-progress -> D1 percent update
  7. Container streams output to callback via http:// (outbound handler intercepts)
  8. Outbound handler stores in R2 (_transformed/{cacheKey}), updates D1
  9. Queue consumer retries with exponential backoff (120s, 240s, 480s… capped at 900s), finds R2 result, acks message
  10. Container-side dedup prevents duplicate ffmpeg processes if retry arrives while transform is still running
  11. Next client request: R2 HIT -> cache.put -> cache.match -> serve
  12. DLQ consumer marks jobs as 'failed' when all 10 retries exhausted
  13. Stuck jobs can be retried/deleted via POST /admin/jobs/retry (resets D1, cleans R2, re-enqueues)

Verified: 725MB .mov -> 31MB .mp4, served from edge cache in 0.1s with range requests.

The outbound handler caches remote source downloads in R2 (_source-cache/{path}). Uses body.tee() to stream to both the container and R2. Concurrent containers transforming the same 725MB source share one download instead of each downloading independently.

For spritesheets routed to the container (oversized sources), ffmpeg uses fps=1,tile=COLSxROWS filter. imageCount defaults to 20, grid layout via ceil(sqrt(N)) columns, output as JPEG.


Canonical parameters:

| Param | Type | Range/Values | Example |
| --- | --- | --- | --- |
| width | int | 10-2000 | ?width=1280 |
| height | int | 10-2000 | ?height=720 |
| fit | enum | contain, cover, scale-down | ?fit=cover |
| mode | enum | video, frame, spritesheet, audio | ?mode=frame |
| time | string | 0s-10m | ?time=5s |
| duration | string | 1s-60s (binding), unlimited (container) | ?duration=10s |
| audio | bool | true/false | ?audio=false |
| format | enum | jpg, png (frame); m4a (audio) | ?format=png |
| filename | string | alphanumeric, max 120 | ?filename=clip |
| derivative | string | config key | ?derivative=tablet |
| quality | enum | low, medium, high, auto | ?quality=high |
| compression | enum | low, medium, high, auto | ?compression=low |
| fps | float | >0 (container) | ?fps=24 |
| speed | float | >0 (container) | ?speed=2 |
| rotate | float | any (container) | ?rotate=90 |
| crop | string | geometry (container) | ?crop=640:480:0:0 |
| bitrate | string | (container) | ?bitrate=2M |
| dpr | float | >0 | ?dpr=2 |
| imageCount | int | >0 | ?imageCount=10 |
| loop | bool | playback hint header | ?loop=true |
| autoplay | bool | playback hint header | ?autoplay=true |
| muted | bool | playback hint header | ?muted=true |
| preload | enum | none, metadata, auto | ?preload=auto |
| debug | any | view for JSON diagnostics | ?debug=view |

Full Akamai Image & Video Manager parameter translation. Explicit canonical params always win.

| Akamai Param | Canonical | Value Translation |
| --- | --- | --- |
| imwidth | width | Direct; triggers derivative matching |
| imheight | height | Direct |
| impolicy | derivative | Policy = derivative |
| imformat | format | h264 -> mp4; h265/vp9/av1 -> container |
| imdensity | dpr | Pixel density multiplier |
| imref | (consumed) | Parsed for derivative matching context |
| im-viewwidth | — | Sets Sec-CH-Viewport-Width hint |
| im-viewheight | — | Sets Viewport-Height hint |
| im-density | — | Sets Sec-CH-DPR hint |
| w, h, q, f | width, height, quality, format | Shorthands |
| obj-fit | fit | crop -> cover, fill -> contain |
| start, dur | time, duration | Shorthands |
| mute | audio | Inverted: mute=true -> audio=false |
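A partial sketch of `translateAkamaiParams()` covering a few rows of the table; the real function handles every row plus the client-hint headers, while this version only shows the shape and the "explicit canonical params win" rule:

```typescript
// Translate a subset of Akamai params into canonical ones.
function translateAkamaiParams(query: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  const fitMap: Record<string, string> = { crop: "cover", fill: "contain" };

  if (query.imwidth) out.width = query.imwidth;
  if (query.imheight) out.height = query.imheight;
  if (query.impolicy) out.derivative = query.impolicy;
  if (query["obj-fit"]) out.fit = fitMap[query["obj-fit"]] ?? query["obj-fit"];
  if (query.start) out.time = query.start;
  if (query.dur) out.duration = query.dur;
  if (query.mute) out.audio = query.mute === "true" ? "false" : "true"; // inverted

  // Explicit canonical params always win over translated ones.
  for (const k of ["width", "height", "fit", "time", "duration", "audio", "derivative"]) {
    if (query[k] !== undefined) out[k] = query[k];
  }
  return out;
}
```

Being a pure function of the query map, this is trivially unit-testable, which is the point of the v2 refactor.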

Named presets that bundle dimensions + quality + mode into a single parameter:

{
  "tablet": {
    "width": 1280,
    "height": 720,
    "fit": "contain",
    "duration": "5m"
  },
  "mobile": { "width": 854, "height": 640, "fit": "contain", "duration": "5m" },
  "thumbnail": {
    "width": 640,
    "height": 360,
    "mode": "frame",
    "format": "png",
    "time": "0s"
  }
}

Canonical invariant: derivative dimensions replace any explicit params. ?imwidth=1280 is used for derivative selection only, never for the actual transform or cache key.
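The invariant can be sketched as follows (the config shape mirrors the derivatives JSON above; `applyDerivative` and the `Params` type are illustrative names):

```typescript
type Params = Record<string, string | number>;

// Derivative values replace any explicit params; spread order enforces it.
function applyDerivative(
  params: Params,
  derivatives: Record<string, Params>,
): Params {
  const name = params.derivative as string | undefined;
  if (!name || !derivatives[name]) return params;
  return { ...params, ...derivatives[name], derivative: name };
}
```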

When no explicit dimensions are provided, dimensions are auto-sized from client signals, tried in order:

  1. Client Hints headers (Sec-CH-Viewport-Width, Sec-CH-DPR, Width)
  2. CF-Device-Type header (mobile/tablet/desktop)
  3. Breakpoint matching from config

Cache keys are deterministic, built from resolved params (after derivative resolution):

{mode}:{path}[:w={width}][:h={height}][:mode-specific-params][:e={etag}][:v={version}]

Same derivative always produces the same key regardless of trigger: ?derivative=tablet, ?impolicy=tablet, and ?imwidth=1280 (resolved via responsive to tablet) all produce identical keys.
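An illustrative key builder matching the template (field names follow the template; only width/height are shown among the mode-specific params):

```typescript
// {mode}:{path}[:w=...][:h=...][:e={etag}][:v={version}]
function buildCacheKey(opts: {
  mode: string;
  path: string;
  width?: number;
  height?: number;
  etag?: string;    // R2 sources: object etag
  version?: number; // remote sources: KV-backed version
}): string {
  let key = `${opts.mode}:${opts.path}`;
  if (opts.width !== undefined) key += `:w=${opts.width}`;
  if (opts.height !== undefined) key += `:h=${opts.height}`;
  if (opts.etag) key += `:e=${opts.etag}`;
  if (opts.version !== undefined) key += `:v=${opts.version}`;
  return key;
}
```

Because the key is built from resolved params, the three equivalent triggers above collide on purpose.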

  • R2 sources: R2 object etag in cache key — automatic bust when source changes
  • Remote sources: KV-backed version number — manual bust via POST /admin/cache/bust
  • Purge by tag: Cache-Tag header with derivative, origin, mode tags for CF purge API
Transform output
|
v (FixedLengthStream, streaming)
R2 put (_transformed/{cacheKey})
|
v (R2 get, streaming)
cache.put (edge cache)
|
v (cache.match, native range support)
Client (200 or 206 with Content-Range)

No tee(), no arrayBuffer(), no memory buffering. Sequential streaming through R2 then cache.put. The final cache.match with the original client request (which may include a Range header) provides automatic 206 + Content-Range handling for video seeking — no manual byte math needed.


Origins configured as an array with regex matcher, capture groups, and prioritized sources:

{
  "origins": [
    {
      "name": "standard",
      "matcher": "^/([^.]+)\\.(mp4|webm|mov)",
      "sources": [
        { "type": "remote", "priority": 0, "url": "https://videos.erfi.dev" },
        { "type": "r2", "priority": 1, "bucketBinding": "VIDEOS" }
      ],
      "ttl": {
        "ok": 86400,
        "redirects": 300,
        "clientError": 60,
        "serverError": 10
      }
    }
  ]
}

Sources tried in priority order. If one fails (404, 5xx, timeout), falls through to next. Last resort: raw passthrough from any source.

Source auth types:

| Type | How |
| --- | --- |
| aws-s3 | Presigned URLs via aws4fetch (cached in KV with auto-refresh) |
| bearer | Authorization: Bearer {token} from env var |
| header | Custom header name + value from env var |

Error-handling strategies:

| Strategy | When | What happens |
| --- | --- | --- |
| Cf-Resized error parsing | cdn-cgi returns 200 with err=XXXX in header | Parse CF error code, route to appropriate recovery |
| Reactive container fallback (9402) | cdn-cgi Cf-Resized: err=9402 (origin too large) | Route to FFmpeg container, return 202 |
| Source retry (9404/9407/9504) | cdn-cgi Cf-Resized: err=9404/9407/9504 | Try next source in priority order |
| Duration limit retry | Binding rejects duration | Extract max from error, retry with capped duration |
| Alternative source retry | Source 404/5xx HTTP status | Try next source in priority order |
| Binding -> container fallback | Binding MediaError | Re-fetch from R2, route to container |
| Raw passthrough | All transforms fail | Serve untransformed source |
| 202 Processing | Container async (>256 MiB) or reactive 9402 | Return JSON with jobId, SSE URL, Retry-After: 10 |

cdn-cgi/media may return HTTP 200 with an error embedded in the Cf-Resized response header (format: err=XXXX). Without parsing this header, the Worker would treat the response as a successful transform.
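The header check itself is a one-line parse; a sketch (function name is illustrative):

```typescript
// Extract the numeric error code from a Cf-Resized header value, if any.
function parseCfResizedError(header: string | null): number | null {
  if (!header) return null;
  const m = /err=(\d+)/.exec(header);
  return m ? Number(m[1]) : null;
}
```

The caller would run this on `response.headers.get("Cf-Resized")` before trusting any 200 from cdn-cgi/media.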

Known CF error codes:

| Code | Meaning | v2 Action |
| --- | --- | --- |
| 9401 | Invalid or missing transform options | Try next source |
| 9402 | Video too large or origin did not respond | Route to FFmpeg container (202) |
| 9404 | Video not found at origin | Try next source |
| 9406 | Non-HTTPS URL or URL has spaces/unescaped Unicode | Try next source |
| 9407 | DNS lookup error for origin hostname | Try next source |
| 9408 | Origin returned HTTP 4xx (access denied) | Try next source |
| 9412 | Origin returned non-video content (HTML/error page) | Try next source |
| 9419 | Non-HTTPS URL or URL has spaces/unescaped Unicode | Try next source |
| 9504 | Origin unreachable (timeout/refused) | Try next source |
| 9509 | Origin returned HTTP 5xx | Try next source |
| 9517 | Internal CF transform error | Try next source |
| 9523 | Internal CF transform error | Try next source |

format=m4a without explicit mode=audio automatically switches to audio mode, clearing irrelevant params (width, height, fit). This maintains compatibility with v1 clients that relied on format-based mode inference.
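A sketch of that inference (function name is illustrative; it clears the visual-only params as described, and leaves any explicitly set mode alone):

```typescript
// format=m4a with no explicit mode implies mode=audio.
function inferAudioMode(params: Record<string, string>): Record<string, string> {
  if (params.format !== "m4a" || params.mode !== undefined) return { ...params };
  const out: Record<string, string> = { ...params, mode: "audio" };
  // width/height/fit are irrelevant for an audio track.
  delete out.width;
  delete out.height;
  delete out.fit;
  return out;
}
```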

All errors return structured JSON:

{
  "error": {
    "code": "NO_MATCHING_ORIGIN",
    "message": "No origin matched: /path"
  }
}

Five layers prevent duplicate work:

| Layer | Scope | What it deduplicates |
| --- | --- | --- |
| Edge cache (caches.default) | Per-colo | All requests (cf-cache-status: HIT) |
| R2 persistent cache | Global | All transform results across colos |
| RequestCoalescer (signal pattern) | Per-isolate | Concurrent requests in same isolate |
| Queue consumer R2 check | Global | Re-dispatch after container already completed |
| Source cache (_source-cache/) | Global | Multiple containers downloading same source file |

Auth-gated dashboard at /admin/dashboard (Astro + React + Tailwind v4):

  • Analytics tab: stat cards (total requests, success, errors, cache hit rate), latency metrics (avg, p50, p95), breakdown tables (by status, origin, derivative, transform source), recent errors table with admin noise filter toggle
  • Jobs tab: auto-discovers container transform jobs from D1, active jobs with SSE progress bars, status filter buttons (All/Active/Complete/Failed with counts), debounced search filter, expandable detail rows with param badges, retry/delete actions on individual jobs, bulk “Clear stale” for stuck jobs
  • Debug tab: test any URL with live param resolution, origin matching, response headers, timing, cache status

Auth: HMAC-SHA256 signed session cookie (HttpOnly, Secure, SameSite=Strict, 24h expiry). Login validates against CONFIG_API_TOKEN with timing-safe comparison.

All endpoints require Authorization: Bearer {CONFIG_API_TOKEN}.

| Endpoint | Method | Description |
| --- | --- | --- |
| /admin/config | GET | Retrieve current config |
| /admin/config | POST | Upload new config (Zod 4 validated) |
| /admin/cache/bust | POST | Bump cache version for a path |
| /admin/analytics | GET | Request summary (?hours=24) |
| /admin/analytics/errors | GET | Recent errors (?hours=24&limit=50) |
| /admin/jobs | GET | List active/recent container jobs (?hours=24&filter=bunny&active=true) |
| /admin/jobs/retry | POST | Retry/delete/clear stuck jobs ({jobId}, {staleMinutes}, {jobId, delete: true}) |
| /sse/job/:id | GET | SSE stream for real-time job progress (D1 polling) |
| /admin/dashboard | GET | Dashboard UI (session auth) |

Every request is logged to D1 via waitUntil. A weekly cron drops and recreates the table to maintain a 7-day rolling window.


src/
  index.ts         # Hono app wiring (~95 lines)
  middleware/      # via, config, passthrough, auth, error
  handlers/        # admin, internal, transform, jobs (SSE), dashboard
  config/          # Zod 4 schema, KV loader
  params/          # Canonical params, Akamai translation, derivatives, responsive
  transform/       # Media binding, cdn-cgi, FFmpeg container DO, job types
  sources/         # Origin routing, auth, presigned URLs
  cache/           # Cache key, version (KV), coalescing (signal pattern)
  queue/           # Queue consumer + DLQ, D1 job registry
  analytics/       # D1 middleware, aggregation queries, schema.sql (SSOT)
container/
  Dockerfile       # node:22-slim + ffmpeg
  server.mjs       # /transform, /transform-url, /health (with progress reporting)
dashboard/
  src/components/  # Dashboard, AnalyticsTab, JobsTab, DebugTab, shared
scripts/
  smoke.ts         # 84 smoke tests with tail log capture
Bindings:

| Binding | Type | Resource |
| --- | --- | --- |
| MEDIA | Media | Media Transformations binding |
| VIDEOS | R2 | Source videos + transform cache (_transformed/) + source cache (_source-cache/) |
| CONFIG | KV | Worker config (hot-reload, 5-min TTL) |
| CACHE_VERSIONS | KV | Cache version management |
| ANALYTICS | D1 | Request analytics + job registry (transform_log + transform_jobs) |
| FFMPEG_CONTAINER | Container DO | FFmpeg container instances (4 vCPU, 12GB, 20GB disk) |
| TRANSFORM_QUEUE | Queue | Durable job dispatch (retry + DLQ) |
| ASSETS | Static Assets | Dashboard UI |
npm run test:run # 186 unit tests (vitest + workers pool)
npm run test:e2e # 92 E2E tests against live (vitest, 60s timeout)
npx tsx scripts/smoke.ts # 84 smoke tests against live
npx tsx scripts/smoke.ts --container # + container async polling (~10 min)
npm run test:browser # 22 Playwright browser tests
npm run check # TypeScript strict
npm run deploy # deploy worker + container + dashboard
npm run dashboard:build # rebuild Astro dashboard
npm run dev # local dev (requires Docker for containers)

Examples use two sources: rocky.mp4 (40MB, instant via binding/cdn-cgi) and Big Buck Bunny (725MB, container path).

Big Buck Bunny 320px (container)

?imwidth=320 — 725MB → 38MB via FFmpeg container → R2 cache

Rocky 640x360 (cdn-cgi)

?width=640&height=360&fit=cover — via cdn-cgi/media

Big Buck Bunny PNG at 30s

?mode=frame&time=30s&width=640&format=png — from R2 cache

Rocky JPEG at 4s

?mode=frame&time=4s&width=640&format=jpg — via binding

Big Buck Bunny 800x600

?mode=spritesheet&width=800&height=600 — from R2 cache

Rocky 640x480

?mode=spritesheet&width=640&height=480 — via cdn-cgi/media

Rocky audio (30s)

?mode=audio&duration=30s

Rocky auto m4a switch

?format=m4a&duration=30s — auto-switches to audio mode

Rocky tablet (1280x720)

?derivative=tablet (or ?impolicy=tablet) — instant via cdn-cgi

Rocky thumbnail (frame)

?derivative=thumbnail — 640x360 PNG frame via binding
# Resize (Akamai params)
curl -I "https://videos.erfi.io/big_buck_bunny_1080p.mov?imwidth=320"
curl -I "https://videos.erfi.io/big_buck_bunny_1080p.mov?w=640&h=360&obj-fit=crop"
# Frame + audio
curl -I "https://videos.erfi.io/big_buck_bunny_1080p.mov?mode=frame&time=30s&width=640&format=png"
curl -I "https://videos.erfi.io/big_buck_bunny_1080p.mov?format=m4a&duration=30s"
# Clip with time offset
curl -I "https://videos.erfi.io/big_buck_bunny_1080p.mov?w=320&start=2m&dur=10s&mute=true"
# Debug diagnostics
curl -s "https://videos.erfi.io/big_buck_bunny_1080p.mov?derivative=tablet&debug=view" | jq .diagnostics
# Cache + range headers
curl -I "https://videos.erfi.io/big_buck_bunny_1080p.mov?imwidth=320"
# -> X-R2-Cache: HIT, cf-cache-status: HIT
curl -I -H "Range: bytes=0-1023" "https://videos.erfi.io/big_buck_bunny_1080p.mov?imwidth=320"
# -> 206 Partial Content, Content-Range: bytes 0-1023/38924753
# Container async (uncached transform)
curl -I "https://videos.erfi.io/big_buck_bunny_1080p.mov?imwidth=480"
# -> 202 Processing (Retry-After: 10) or 200 video/mp4 (if cached)
