# Benchmarking Playbook (Render Pipeline / Node Streams)

This is the practical workflow for benchmarking and profiling render pipeline changes in this repo.

Primary tools:

- `pnpm bench:render-pipeline`
- `pnpm bench:render-pipeline:analyze`

## 1. Build-first baseline

Always rebuild `next` before benchmark runs when framework source changed.

```bash
pnpm --filter=next build
```

## 2. End-to-end benchmark (full app render path)

This measures the full request path (`renderToHTMLOrFlight`) through `bench/next-minimal-server`.

In `scenario=full` and `scenario=all`, `--capture-cpu` defaults to `true`.

Node streams only:

```bash
pnpm bench:render-pipeline \
  --scenario=full \
  --stream-mode=node \
  --build-full=true \
  --json-out=bench/render-pipeline/artifacts//results.json \
  --artifact-dir=bench/render-pipeline/artifacts/
```

Web vs Node comparison:

```bash
pnpm bench:render-pipeline \
  --scenario=full \
  --stream-mode=both \
  --build-full=true \
  --json-out=bench/render-pipeline/artifacts//results.json \
  --artifact-dir=bench/render-pipeline/artifacts/
```

## 3. Route-focused stress runs

Use this when targeting streaming-heavy behavior only.

```bash
pnpm bench:render-pipeline \
  --scenario=full \
  --stream-mode=node \
  --build-full=true \
  --routes=/streaming/heavy,/streaming/chunkstorm,/streaming/wide \
  --warmup-requests=10 \
  --serial-requests=40 \
  --load-requests=400 \
  --load-concurrency=40 \
  --json-out=bench/render-pipeline/artifacts//results.json \
  --artifact-dir=bench/render-pipeline/artifacts/
```

Default stress routes currently include:

- `/`
- `/streaming/light`
- `/streaming/medium`
- `/streaming/heavy`
- `/streaming/chunkstorm`
- `/streaming/wide`
- `/streaming/bulk`

## 4. Isolate helper-level costs (micro scenario)

Use this to quickly test helper-level changes before full runs.
```bash
pnpm bench:render-pipeline \
  --scenario=micro \
  --iterations=300 \
  --warmup=30
```

Micro benchmark output includes cases for:

- `teeNodeReadable`
- `createBufferedTransformNode`
- `createInlinedDataNodeStream`
- `continueStaticPrerender` / `continueDynamicPrerender` / `continueDynamicHTMLResume`

Flight payload mode toggles:

```bash
# Binary-heavy flight chunks
pnpm bench:render-pipeline --scenario=micro --binary-flight=true

# UTF-8-heavy flight chunks
pnpm bench:render-pipeline --scenario=micro --binary-flight=false
```

Stress payload shape:

```bash
pnpm bench:render-pipeline \
  --scenario=micro \
  --iterations=300 \
  --warmup=30 \
  --flight-chunks=128 \
  --flight-chunk-bytes=8192 \
  --html-chunks=128 \
  --html-chunk-bytes=32768
```

## 5. Capture CPU profiles and traces

```bash
pnpm bench:render-pipeline \
  --scenario=full \
  --stream-mode=node \
  --build-full=true \
  --capture-trace=true \
  --capture-next-trace=true \
  --json-out=bench/render-pipeline/artifacts//results.json \
  --artifact-dir=bench/render-pipeline/artifacts/
```

`--capture-cpu` is not passed explicitly because it defaults to `true` for `scenario=full` (see section 2).

Artifacts are written under:

- `bench/render-pipeline/artifacts//node/node.cpuprofile`
- `bench/render-pipeline/artifacts//node/node-trace-*.json`
- `bench/render-pipeline/artifacts//node/next-runtime-trace.log`
- `bench/render-pipeline/artifacts//results.json`

## 6. Analyze hotspots

```bash
pnpm bench:render-pipeline:analyze \
  --artifact-dir=bench/render-pipeline/artifacts/ \
  --top=20
```

Filter only the Node-stream-relevant hotspots:

```bash
pnpm bench:render-pipeline:analyze --artifact-dir=bench/render-pipeline/artifacts/ --top=20 > /tmp/analyze.txt
rg "use-flight-response|encodeFlightDataChunkNode|node-stream-tee|flushPending|node-stream-helpers|htmlEscapeJsonString" /tmp/analyze.txt
```

## 7. Compare two runs quickly

Pass the baseline run name first and the candidate run name second. Both deltas are signed so that a positive value means the candidate improved: higher throughput, lower p95 latency.

```bash
# Run names go before the heredoc so that the `NODE` line terminates it cleanly.
node - investigation-10-boundary-data investigation-17-profile-current <<'NODE'
const fs = require('fs')

// With the script read from stdin, process.argv is [node, '-', baseRun, candRun].
const [baseRun, candRun] = process.argv.slice(2)

const load = (name) =>
  JSON.parse(
    fs.readFileSync(`bench/render-pipeline/artifacts/${name}/results.json`, 'utf8')
  ).fullResults[0].routeResults

const base = load(baseRun)
const cand = load(candRun)

// Format a percentage with an explicit sign.
const signed = (n) => `${n >= 0 ? '+' : ''}${n.toFixed(2)}%`

for (const b of base) {
  // Match results on route and phase; skip entries missing from the candidate.
  const c = cand.find((x) => x.route === b.route && x.phase === b.phase)
  if (!c) continue
  const throughputDelta = ((c.throughputRps - b.throughputRps) / b.throughputRps) * 100
  const p95Delta = ((b.latency.p95 - c.latency.p95) / b.latency.p95) * 100
  console.log(
    `${b.route} ${b.phase} throughput ${signed(throughputDelta)} p95 ${signed(p95Delta)}`
  )
}
NODE
```

## 8. Noise control rules

Use these rules to keep measurements trustworthy:

- Build first (`pnpm --filter=next build`) after framework source changes.
- Compare runs with identical route sets and request knobs.
- Repeat suspicious runs at least once (especially if one route regresses while others improve).
- Use dedicated artifact directories per run.
- Prefer relative deltas across multiple runs over one-off absolute numbers.

## 9. Suggested iteration loop

1. Change one thing.
2. Build.
3. Run `scenario=micro` for quick signal.
4. Run focused full stress (`heavy/chunkstorm/wide`) with CPU profile.
5. Analyze hotspots and compare deltas.
6. Keep only changes that hold up across repeat runs.
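
The last step of the loop, together with the section 8 rule about preferring relative deltas across multiple runs, could be sketched as a small aggregation helper. This is a minimal sketch and not part of the benchmark tooling: `percentDelta`, `median`, and `summarizePairs` are hypothetical names, and the input shape (`route`, `phase`, `throughputRps`) is assumed from the fields the comparison script in section 7 reads. The numbers in the usage example are illustrative, not measurements.

```javascript
// Hypothetical helpers for comparing repeated baseline/candidate runs.

// Relative change of `cand` against `base`, in percent.
function percentDelta(base, cand) {
  return ((cand - base) / base) * 100
}

// Median of a non-empty array of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2
}

// `pairs` is an array of [baseRouteResults, candRouteResults], one per repeat.
// Returns a Map keyed by "route phase" with the median throughput delta and
// whether every repeat moved in the same direction.
function summarizePairs(pairs) {
  const deltas = new Map()
  for (const [base, cand] of pairs) {
    for (const b of base) {
      const c = cand.find((x) => x.route === b.route && x.phase === b.phase)
      if (!c) continue
      const key = `${b.route} ${b.phase}`
      if (!deltas.has(key)) deltas.set(key, [])
      deltas.get(key).push(percentDelta(b.throughputRps, c.throughputRps))
    }
  }

  const summary = new Map()
  for (const [key, values] of deltas) {
    summary.set(key, {
      runs: values.length,
      medianThroughputDelta: median(values),
      consistent: values.every((v) => v > 0) || values.every((v) => v < 0),
    })
  }
  return summary
}

// Illustrative input: three repeats of one route, made-up throughput values.
const row = (throughputRps) => [
  { route: '/streaming/heavy', phase: 'load', throughputRps },
]
const example = summarizePairs([
  [row(200), row(250)],
  [row(200), row(300)],
  [row(200), row(150)],
])
console.log(example.get('/streaming/heavy load'))
```

A route whose `consistent` flag is `false` moved in different directions across repeats, which is the case the noise control rules say to rerun before drawing a conclusion.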