Intermediate · ~14 min reference · UE 5.5 / 5.6

Performance Triage Decision Trees

Three flowcharts you can follow at 2 a.m. when QA reports a regression. Each starts with the symptom you can see in stat unit or a crash log, branches to the right diagnostic, and ends with a one-line fix or a tutorial deep-dive link.

Use this with PerfGuard CI tags: PerfGuard regression reports tag failures with gpu-bound:lumen, cpu-bound:tick, memory:texture-pool. Click through directly to the matching tree branch.

Tree 1: GPU Bound

Entry: stat unit shows GPU time exceeds frame budget (>16.6 ms at 60 Hz, >11.1 ms at 90 Hz, >8.3 ms at 120 Hz). Confirm: raise r.ScreenPercentage. If GPU time rises roughly in step with the rendered pixel count, you are genuinely GPU-bound. If it is unchanged, jump to Tree 2.
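The budget figures above are 1000 ms divided by the refresh rate (the entry line truncates 16.67 to 16.6), and the confirmation test amounts to checking that GPU time moves with pixel count. A minimal sketch in Python. The function names and the 0.5 tolerance are illustrative assumptions, not engine behaviour:

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Frame budget in milliseconds for a given refresh rate."""
    return 1000.0 / refresh_hz


def looks_gpu_bound(gpu_ms_low: float, gpu_ms_high: float,
                    pct_low: float, pct_high: float,
                    tolerance: float = 0.5) -> bool:
    """Heuristic: did GPU time grow roughly with pixel count?

    r.ScreenPercentage scales each axis, so pixel count grows with its
    square (100 -> 120 renders 1.44x the pixels). The tolerance is an
    assumed value: the fraction of the expected growth we require to see.
    """
    expected_growth = (pct_high / pct_low) ** 2 - 1.0
    observed_growth = gpu_ms_high / gpu_ms_low - 1.0
    return observed_growth >= tolerance * expected_growth


for hz in (60, 90, 120):
    print(f"{hz} Hz -> {frame_budget_ms(hz):.2f} ms")
```

For example, looks_gpu_bound(10.0, 14.0, 100, 120) returns True (GPU time grew 40% against an expected 44%), while a move from 10.0 ms to 10.2 ms returns False and points to Tree 2.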

Step 1 Identify the dominant pass

Run stat gpu for live text breakdown, or ProfileGPU (Ctrl+Shift+,) for a single-frame waterfall. UE 5.6 GPU Profiler 2.0 splits Graphics vs Compute queues. Look for the largest cost in: Lumen*, Nanite*, ShadowDepths / VirtualShadowMaps, Translucency, PostProcessing, Reflections, Niagara*, BasePass.
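If you copy per-pass timings out of stat gpu by hand, picking the dominant subsystem is a matter of summing by name prefix. The sketch below assumes a plain dict of pass name to milliseconds. The prefix table mirrors the list above, the sample timings are invented, and none of this parses real engine output:

```python
from collections import defaultdict

# Pass-name prefixes mapped to the Step 2 branches of this tree.
BRANCH_PREFIXES = [
    ("Lumen", "Lumen"),
    ("Nanite", "Nanite"),
    ("ShadowDepths", "VSM"),
    ("VirtualShadowMap", "VSM"),
    ("Reflections", "Reflections"),
    ("Translucency", "Translucency"),
    ("Niagara", "Niagara"),
    ("PostProcessing", "PostProcessing"),
    ("BasePass", "BasePass"),
]


def dominant_branch(pass_ms):
    """Sum timings per branch and return (branch, total_ms) for the largest.

    Expects a non-empty dict; passes with no matching prefix go to "Other".
    """
    totals = defaultdict(float)
    for name, ms in pass_ms.items():
        branch = next((b for p, b in BRANCH_PREFIXES if name.startswith(p)), "Other")
        totals[branch] += ms
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical single-frame timings in milliseconds.
sample_ms = {
    "Lumen.SceneLighting": 2.1,
    "Lumen.ScreenProbeGather": 3.4,
    "Nanite.CullRasterize": 1.8,
    "ShadowDepths": 2.9,
    "Translucency": 0.7,
    "PostProcessing": 1.5,
}

branch, total = dominant_branch(sample_ms)
print(f"{branch}: {total:.1f} ms")
```

Summing by prefix matters because a subsystem such as Lumen is spread over several passes. No single Lumen pass in the sample beats ShadowDepths by much, but together they dominate.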

Step 2 Branch by subsystem

Lumen dominant (Lumen.SceneLighting, Lumen.ScreenProbeGather, Lumen.ReflectionsHWRT):
  • Decide between hardware and software ray tracing: 5.6 HWRT is reported as 2× faster on consoles via the frustum-driven surface cache, while SWRT on AMD saves ~1.2 ms according to GPUOpen.
  • r.Lumen.ScreenProbeGather.IntegrateDownsampleFactor=2: the 5.6 default, with roughly 3× faster integration.
  • r.Lumen.ScreenProbeGather.SpatialFilterNumPasses 3 → 1 or 2.
  • r.Lumen.ScreenProbeGather.StochasticInterpolation 1 — ~30% gain on RDNA.
  • Deep dive: Lumen Performance
Nanite dominant (Nanite.BasePass, Nanite.CullRasterize):
  • Open NaniteVisualize MaterialComplexity to find heavy materials.
  • Tune r.Nanite.MinPixelsPerEdgeHW (32 on high-end, lower on weaker GPUs).
  • Audit non-Nanite static meshes that should be Nanite.
  • Deep dive: Nanite Performance
VSM dominant (ShadowDepths, VirtualShadowMapCacheUpdate):
  • Toggle r.Shadow.Virtual.UseReceiverMask 1 (off by default in 5.6).
  • Drop r.Shadow.Virtual.SMRT.SamplesPerRayLocal from 8 toward 4.
  • Audit movable lights with high resolution and WPO casters that bust the cache.
  • Deep dive: Virtual Shadow Maps
Reflections dominant:
  • Raise r.Lumen.Reflections.DownsampleFactor (1 → 2) to trace reflections at lower resolution.
  • Lower r.Lumen.Reflections.MaxRoughnessToTrace.
  • Prune small reflection captures.
  • Deep dive: Lumen Reflections
Translucency dominant:
  • Open Optimization Viewmodes → Quad Overdraw and Shader Complexity.
  • Red overdraw = fill-rate from particles/glass/foliage.
  • Switch large translucent VFX to GPU sprites; prune overlapping cards.
  • Deep dive: Translucency Overdraw + Niagara Performance
Niagara dominant:
  • Use fx.Niagara.LogParticleCounts 1 and FX Performance window.
  • Convert CPU emitters bottlenecked on tick to GPUCompute.
  • Cull by significance; raise distance-based detail cuts.
  • Deep dive: Niagara Performance
PostProcessing dominant:
  • Check Bloom, DOF, Motion Blur, AutoExposure cost.
  • Drop r.TSR.History.ScreenPercentage from 200 to 100 at Epic AA quality (saves ~1.2 ms at 4K).
  • Switch upscaler tier: TSR → FSR3/FSR2 if 1–2 ms is needed.
  • Deep dive: Upscaler Tuning

Step 3 Verify

Re-run stat unit and stat gpu. Still bound? Repeat from Step 1 — the second-largest pass is now the new bottleneck.

Tree 2: CPU Bound

Entry: stat unit shows Game or Render thread > frame budget while GPU < budget. The screen-percentage test does not change frame time.
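The entry conditions for Trees 1 and 2 can be summarized as a small classifier over the Game, Draw and GPU numbers from stat unit. This is a sketch of the decision order only. The function name and return labels are made up for illustration, and a "gpu" result still needs the screen-percentage confirmation from Tree 1:

```python
def classify_bottleneck(game_ms: float, draw_ms: float,
                        gpu_ms: float, budget_ms: float) -> str:
    """Pick the tree (and Tree 2 step) suggested by stat unit timings."""
    if max(game_ms, draw_ms, gpu_ms) <= budget_ms:
        return "within-budget"
    if gpu_ms > budget_ms:
        # Tree 1 entry; confirm with the r.ScreenPercentage test.
        return "gpu"
    # GPU is under budget, so Game or Draw is the one over it (Tree 2).
    if draw_ms > game_ms:
        return "render-thread"  # Tree 2, Step 1
    return "game-thread"        # Tree 2, Step 2 (ties land here)


# Hypothetical readings against a 60 Hz budget.
print(classify_bottleneck(game_ms=22.0, draw_ms=9.0, gpu_ms=12.0, budget_ms=16.6))
```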

Step 1 Render-thread bound (Draw > Game)

Run stat scenerendering and stat rhi. Probable causes:

Too many draw calls:
  • HISM/ISM/HLOD for repeated meshes.
  • Merge static actors.
  • Enable r.SceneCulling 1.
Too many movable shadow casters:
UE 5.6 renderer parallelization: ensure r.RDG.ParallelExecute 1.

Step 2 Game-thread bound (Game > Draw)

Run stat slow -ms=0.5 to catch any cycle counter > 0.5 ms. Then branch:

Tick cost (STAT_TickActor, STAT_TickComponent heavy):
  • dumpticks — lists every actor with registered tick. Filter to tick-enabled actors that don't need it.
  • Disable per-actor (PrimaryActorTick.bCanEverTick = false).
  • Batch with tick.AllowBatchedTicks 1 (UE 5.5+).
  • Deep dive: Tick Budgets
Animation (stat anim heavy):
  • Check STAT_AnimGameThreadTime, STAT_PoseUpdate, STAT_AnimGraphEvaluate.
  • Enable URO; use Component Use Fixed Skel Bounds.
  • Switch to fast-path AnimBP nodes.
  • Deep dive: Animation Performance
Physics (stat physics / stat chaos heavy):
  • Disable collision on cosmetic meshes.
  • Drop solver iterations.
  • Use simple primitive collision instead of per-poly.
  • Enable p.Chaos.Solver.Joint.UseSimd 1 (UE 5.5).
  • Deep dive: Chaos Physics
Garbage Collection hitches:
  • gc.MultithreadedDestructionEnabled 1.
  • Raise gc.TimeBetweenPurgingPendingKillObjects.
  • Audit churn (Spawn/Destroy loops → object pool).
  • Deep dive: GC Hitches + Object Pooling
Blueprint VM cost (STAT_BlueprintTime, STAT_ScriptVM):
  • Move hot Tick logic to C++.
  • Replace per-frame BP event chains with timers.
  • Prefer event-driven over polled.
  • Deep dive: Blueprint Tick Graphs
Streaming / async load on game thread:
  • Look for BlockTillLevelStreamingCompleted stalls (5.5 narrowed scope).
  • Check s.UseUnifiedTimeBudgetForStreaming.
  • Deep dive: Loading Times

Step 3 Verify

stat dumphitches writes any frame > t.HitchFrameTimeThreshold to log. Re-run scenario; confirm targeted hitch is gone.
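The same before/after comparison can be done offline on captured frame times. A minimal sketch, assuming you already have a list of frame times in milliseconds. The threshold parameter stands in for t.HitchFrameTimeThreshold (assumed here to be expressed in milliseconds), and the sample data is invented:

```python
def find_hitches(frame_times_ms, threshold_ms):
    """Return (frame_index, frame_ms) for every frame over the threshold."""
    return [(i, ms) for i, ms in enumerate(frame_times_ms) if ms > threshold_ms]


# Hypothetical captures of the same scenario before and after a fix.
before = [16.1, 16.4, 72.3, 16.2, 16.0, 95.8, 16.3]
after = [16.0, 16.2, 17.1, 16.1, 16.0, 16.9, 16.2]

threshold_ms = 60.0  # assumed value for illustration
print("before:", find_hitches(before, threshold_ms))
print("after: ", find_hitches(after, threshold_ms))
```

An empty list for the second capture is the condition this step asks you to confirm.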

Tree 3: Memory Exhausted

Entry signals: "TEXTURE STREAMING POOL OVER BUDGET" warning, OOM crash, GPU device removal, or stat memory shows Used Physical near platform limit.

Step 1 Categorize

Warning text references "Streaming Pool" → texture streaming path (Step 2).
OOM crash log references VRAM / GPU memory → GPU asset path (Step 3).
OOM crash log references system memory → CPU heap / level streaming path (Step 4).
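The three routing rules above can be sketched as a keyword lookup over a warning or crash-log excerpt. The keywords are taken from the rules themselves. Real log wording varies by platform and RHI, so treat the table as a starting point rather than a complete matcher:

```python
# (keyword, destination step), checked in order, case-insensitively.
ROUTES = [
    ("streaming pool", "Step 2: texture streaming pool"),
    ("vram", "Step 3: VRAM / GPU OOM"),
    ("gpu memory", "Step 3: VRAM / GPU OOM"),
    ("system memory", "Step 4: system memory OOM"),
]


def route_memory_signal(text):
    """Return the step to jump to, or None if no keyword matches."""
    lowered = text.lower()
    for keyword, step in ROUTES:
        if keyword in lowered:
            return step
    return None


print(route_memory_signal("TEXTURE STREAMING POOL OVER BUDGET"))
```

A None result means the signal needs manual categorization before you pick a step.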

Step 2 Texture streaming pool over budget

Run stat streaming — watch Streaming Pool Used, Wanted Mips, NonStreaming Mips %.

Quick triage: raise r.Streaming.PoolSize (in MB) in proportion to VRAM (3000–4000 for 8 GB GPUs, 16000+ for 16 GB). This silences the warning but masks the underlying issue.
Root-cause fix:
  • Audit textures via Texture Stats window or DumpTextureStreamingStats.
  • Look for non-power-of-two sizes, unstreamed UI/lightmap textures, 4K textures on small props.
  • Apply LODGroup, MaxTextureSize, mip bias on distant content.
  • Deep dive: Memory & VRAM
Always re-test in packaged build — editor reports inflated NonStreaming usage.

Step 3 VRAM / GPU OOM

Run memreport -full → look at RHI Resource Memory. Use stat LLM, stat LLMFULL for tagged breakdown.

VSM physical pages large → reduce r.Shadow.Virtual.PhysicalPagePoolSize.
Lumen scene / surface cache large → drop r.LumenScene.SurfaceCache.AtlasSize or r.Lumen.SceneCaptureCacheResolution.
Nanite scratch large → reduce r.Nanite.MaxNodes, r.Nanite.MaxVisibleClusters.
RT scratch large → 5.6.1 regression flagged in Epic forum; lower RT detail or pin engine version.
Mesh memory large → audit Nanite enabled on tiny meshes, LOD0 tri counts, collision data, virtual textures.

Step 4 System memory OOM

memreport -full — top sections: Object Memory by Class, Asset Memory, Audio, World/Levels. Add -llmcsv for hierarchical CSV.

Levels not unloading → audit Level Streaming volumes, World Partition cell unload, replace Open Level with sub-levels.
Audio bank inflation → mark non-essential SoundCues to stream from disk.
Asset reference leaks → soft refs for one-shot content; async loading instead of hard pointers. Deep dive: Loading Time Optimization.
Tick / spawn churn → object pooling. Deep dive: Object Pooling.

Step 5 Verify

Re-run scenario. Compare two memreports side-by-side. Confirm stat streaming pool used < pool size and no OOM after a 30-min soak.
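Once the headline numbers from two memreports are transcribed into per-category totals, the side-by-side comparison reduces to a sorted diff. The sketch below does not parse memreport files. The category names, figures and function names are hypothetical:

```python
def memory_growth(before_mb, after_mb):
    """Per-category growth in MB, largest first.

    Categories present in only one report count as 0 in the other.
    """
    keys = set(before_mb) | set(after_mb)
    deltas = {k: after_mb.get(k, 0.0) - before_mb.get(k, 0.0) for k in keys}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def soak_passed(pool_used_mb, pool_size_mb, oom_seen):
    """Verification condition: pool used below pool size and no OOM."""
    return pool_used_mb < pool_size_mb and not oom_seen


# Hypothetical totals transcribed from two reports.
start = {"Textures": 1000, "Audio": 200}
end = {"Textures": 1500, "Audio": 190, "Levels": 50}

for category, delta in memory_growth(start, end):
    print(f"{category}: {delta:+} MB")
print("soak ok:", soak_passed(pool_used_mb=2800, pool_size_mb=3000, oom_seen=False))
```

Sorting by growth rather than by absolute size puts leaks ahead of categories that are large but stable.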

When to escalate to Insights: If a tree branch ends ambiguously (e.g., the game thread is heavy but stat slow is clean), capture an Insights trace with -trace=cpu,gpu,frame,bookmark,memory_light and step into the Timing Insights view.

PerfGuard regression reports auto-tag failures with gpu-bound:lumen, cpu-bound:tick, memory:texture-pool and surface a "Triage this" button that deep-links into the matching tree branch. A future pg-triage <regression-id> CLI will print the exact branch to the terminal when CI fails.
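To illustrate how a tag of the form category:detail lines up with the trees in this reference, here is a small lookup sketch. It is not PerfGuard's implementation. The table and function name are assumptions based only on the three example tags above:

```python
# Tag category -> tree heading used in this reference.
TREES = {
    "gpu-bound": "Tree 1: GPU Bound",
    "cpu-bound": "Tree 2: CPU Bound",
    "memory": "Tree 3: Memory Exhausted",
}


def branch_for_tag(tag: str):
    """Split 'category:detail' and return (tree heading, branch detail)."""
    category, _, detail = tag.partition(":")
    tree = TREES.get(category)
    if tree is None:
        raise ValueError(f"unknown tag category: {category!r}")
    return tree, detail


for tag in ("gpu-bound:lumen", "cpu-bound:tick", "memory:texture-pool"):
    print(tag, "->", branch_for_tag(tag))
```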