Intermediate ~25 min read

Common Unreal Engine Performance Gotchas

A field guide to the twelve mistakes that cost UE5 projects the most frame time. Each gotcha is paired with a verified fix — the console variable, property, or API call that resolves it — pulled from Epic documentation, vendor performance guides, and well-known engineers in the community.

Complex Collision on Decorative Meshes

Every static mesh in Unreal has two collision representations: simple collision (a small set of primitives — boxes, capsules, convex hulls) and complex collision (the full triangle soup of the rendering geometry). Complex collision is per-triangle. The cost scales with how many triangles the query touches, even with the engine's BVH/octree acceleration.

The footgun is the Collision Complexity dropdown on the static mesh asset. When set to Use Complex Collision As Simple, every overlap query, sweep, line trace, and physics tick against that mesh runs against the full render mesh. On a 50k-triangle prop dropped a hundred times across a level, that's 5 million triangles of collision geometry (50,000 × 100) for queries to run against.
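The arithmetic is worth making explicit. A minimal sketch in plain C++ (no engine types; the function name is illustrative), assuming every placed instance exposes its full render mesh to collision:

```cpp
#include <cstdint>

// Worst-case triangles exposed to collision when every placed instance
// uses its full render mesh (Use Complex Collision As Simple).
constexpr std::int64_t CollisionTrianglePopulation(std::int64_t TrianglesPerMesh,
                                                   std::int64_t Placements)
{
    return TrianglesPerMesh * Placements;
}

// The example from the text: a 50k-triangle prop placed 100 times.
static_assert(CollisionTrianglePopulation(50'000, 100) == 5'000'000,
              "50k triangles x 100 placements = 5 million");
```

This is an upper bound on the population, not a per-query cost: a single query only touches the triangles the acceleration structure cannot reject.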

The fix:

Leave new static meshes on Project Default (simple collision for queries, complex only when explicitly requested).
For decorative props (rocks, foliage, statues, debris) that don't need a precise shape: assign a simple primitive from the Static Mesh Editor (Collision → Add Box / Sphere / Capsule / 26-DOP / 18-DOP) or, for irregular shapes, Auto Convex Collision.
If you need accurate ground or wall collision but the render mesh is heavy: author a separate low-poly collision proxy in your DCC tool and pair it with the visual mesh.
Disable collision entirely on truly decorative geometry. Set Collision Presets to NoCollision on small set-dressing pieces (loose pebbles, ceiling fans, distant background props).

🔎

How to find offenders

Open the Statistics window (Window → Statistics) and switch to the Static Mesh Lighting Info or Primitive Stats view to spot meshes with collision triangle counts that don't match their gameplay role. The console command stat collision shows per-frame collision query cost during PIE.

Event Tick Enabled on Every Actor by Default

Every AActor-derived class in Unreal can tick. By convention, Blueprint actors created in the editor have ticking enabled; many C++ classes set PrimaryActorTick.bCanEverTick = true in their constructor without thinking about it. Even an empty tick costs something — the tick manager has to schedule, dispatch, and bookkeep it every frame.

An empty tick is cheap individually, but populations multiply. Practical profiling published by community engineers puts the cost of empty Blueprint ticks at roughly a millisecond per frame for a few hundred actors — which is meaningful when your whole frame is 16.67 ms at 60 fps.
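To put that in budget terms, here is a small sketch in plain C++. The 1 ms figure is the rough community number quoted above, treated as an assumption, and the function names are illustrative:

```cpp
// Frame budget in milliseconds for a target frame rate.
constexpr double FrameBudgetMs(double TargetFps)
{
    return 1000.0 / TargetFps;
}

// Fraction of the frame budget consumed by a fixed per-frame cost.
constexpr double BudgetShare(double CostMs, double TargetFps)
{
    return CostMs / FrameBudgetMs(TargetFps);
}

// 60 fps leaves about 16.67 ms per frame.
static_assert(FrameBudgetMs(60.0) > 16.66 && FrameBudgetMs(60.0) < 16.67, "60 fps budget");

// An assumed 1 ms of empty-tick overhead is then about 6% of the frame.
static_assert(BudgetShare(1.0, 60.0) > 0.0599 && BudgetShare(1.0, 60.0) < 0.0601, "6% share");
```

At a 120 fps target the same millisecond is about 12% of the frame, which is why tick hygiene matters more as frame-rate targets rise.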

The fix — in C++:

YourActor.cpp

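// Actors that never need per-frame work: no tick function is registered at all.
AYourActor::AYourActor()
{
    PrimaryActorTick.bCanEverTick = false;
}

// Actors that tick only in some states: allow ticking, but start with it off.
// (AYourOnDemandActor and EnterActiveState() are illustrative names.)
AYourOnDemandActor::AYourOnDemandActor()
{
    PrimaryActorTick.bCanEverTick = true;           // required, or SetActorTickEnabled() does nothing
    PrimaryActorTick.bStartWithTickEnabled = false;
    // If you do need tick, throttle it. 0.1s = 10Hz, fine for most "polling" logic.
    PrimaryActorTick.TickInterval = 0.1f;
}

// Enable at runtime only when needed (e.g. when a state machine enters a tick-needing state).
void AYourOnDemandActor::EnterActiveState()
{
    SetActorTickEnabled(true);
}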

The fix in Blueprint: open the Class Defaults, scroll to Actor Tick, and uncheck Start with Tick Enabled. If the actor never needs to tick, also delete the Event Tick node from the event graph so the Blueprint does not implement a tick at all. For Blueprints that do need to poll on a heartbeat (UI updates, AI sense checks, pickup hover wiggle), set Tick Interval to 0.05–0.1 instead of running every frame.
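Conceptually, a tick interval trades per-frame execution for an accumulate-and-fire pattern. The sketch below is a standalone illustration of that idea in plain C++, not the engine's tick manager; ThrottledUpdate and CountUpdates are hypothetical names, and times are whole milliseconds to keep the arithmetic exact:

```cpp
// Runs work at a fixed interval instead of every frame.
struct ThrottledUpdate
{
    int IntervalMs;          // e.g. 100 ms for 10 Hz
    int AccumulatedMs = 0;   // time since the work last ran

    // Call once per frame; returns true when the work should run this frame.
    constexpr bool Advance(int DeltaMs)
    {
        AccumulatedMs += DeltaMs;
        if (AccumulatedMs < IntervalMs)
        {
            return false;
        }
        AccumulatedMs -= IntervalMs;   // carry the remainder so the rate does not drift
        return true;
    }
};

// Counts how many times the work runs over a number of equal-length frames.
constexpr int CountUpdates(int IntervalMs, int DeltaMs, int Frames)
{
    ThrottledUpdate Throttle{IntervalMs};
    int Updates = 0;
    for (int Frame = 0; Frame < Frames; ++Frame)
    {
        if (Throttle.Advance(DeltaMs))
        {
            ++Updates;
        }
    }
    return Updates;
}

// One second at 50 fps (20 ms frames) with a 100 ms interval:
// the work runs 10 times instead of 50.
static_assert(CountUpdates(100, 20, 50) == 10, "10 Hz polling over one second");
```

The per-frame check still runs, but the expensive body runs a fifth as often, which is the saving a tick interval is after.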

📝

Audit existing projects

Run the dumpticks console command in PIE. It prints every active tick function in the world. Anything in that list that doesn't justify its frame cost should have its tick disabled, throttled, or replaced with a Timer / event-driven update.

⚠

bCanEverTick must be set in the constructor

You can flip SetActorTickEnabled() any time, but bCanEverTick has to be set during construction. Once the actor is registered with the tick manager, you can only enable/disable an existing tick function, not create one.

Heavy Logic in Blueprint Tick Graphs

Blueprints execute on a virtual machine. Each node has a small but real per-call overhead compared to compiled C++. That overhead is invisible when you call a Blueprint function once on button press; it is very visible when a Tick graph traverses fifty nodes per frame across two hundred actors.

The myth is "Blueprints are slow." The reality is more specific: the BPVM's main meaningful cost is function-call overhead per node. A graph with one node calling an expensive native function is essentially free. A graph with dozens of pure-function chains, math ops, and branches inside a tight loop is not.

The fix — rules of thumb:

Minimise nodes inside Tick. If the graph runs every frame, every node compounds. Move per-frame logic to C++ when the graph gets dense.
Don't put pure functions inside loops. A pure Blueprint node does not cache its output: it re-evaluates for each node that reads it, so inside a loop it runs again on every iteration. Cache the value into a local variable above the loop.
Pass large structs and arrays by reference. Function inputs default to pass-by-value in BP, which copies the data. Toggle the input pin to Pass-by-Reference for non-trivial types.
Replace polling with events. If you're checking HasReachedDestination() every tick, convert it to a delegate the destination component fires once.
Profile first. Use Unreal Insights (Trace → Bookmarks → CPU) to confirm a Blueprint graph is the bottleneck before rewriting it. Most graphs don't need to be C++.
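The "cache above the loop" rule can be shown with a plain C++ analogy. This is only a sketch of the evaluation-count behaviour, not Blueprint VM code; CountingSource stands in for a pure node and every name here is hypothetical:

```cpp
// Stand-in for a pure node: returns a value and records each evaluation.
struct CountingSource
{
    int Evaluations = 0;

    constexpr int Get()
    {
        ++Evaluations;
        return 42;
    }
};

// Reads the source inside the loop: one evaluation per iteration.
constexpr int EvaluationsWhenReadInLoop(int Iterations)
{
    CountingSource Source{};
    int Sum = 0;
    for (int i = 0; i < Iterations; ++i)
    {
        Sum += Source.Get();
    }
    (void)Sum;
    return Source.Evaluations;
}

// Caches the value above the loop: one evaluation in total.
constexpr int EvaluationsWhenCached(int Iterations)
{
    CountingSource Source{};
    const int Cached = Source.Get();
    int Sum = 0;
    for (int i = 0; i < Iterations; ++i)
    {
        Sum += Cached;
    }
    (void)Sum;
    return Source.Evaluations;
}

static_assert(EvaluationsWhenReadInLoop(100) == 100, "evaluated on every iteration");
static_assert(EvaluationsWhenCached(100) == 1, "evaluated once, then reused");
```

In a Blueprint graph the equivalent move is a Set Local Variable node before the loop, with the loop body reading the variable instead of the pure node's output pin.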

💡

Casting isn't the boogeyman

"Avoid Cast nodes" is a popular but misleading rule. The cast itself is cheap. The problem is that a hard reference to a class (which a Cast pin creates) can pull that class's referenced assets into memory at load time. If you're worried about a cast, ask whether the reference should be soft, not whether the cast should be replaced with an Interface (which still casts internally).

Movable Lights Casting Dynamic Shadows

Dynamic shadow casting is the single most expensive thing most projects do per light. Epic's own documentation is blunt about the multiple: per the Stationary Lights documentation, a movable, shadow-casting light costs on average roughly 20× as much to render as a movable light with shadows disabled, with the exact figure depending on radius and the geometry it touches.

The cost scales with two things: how many meshes the light's bounding volume overlaps, and the triangle count of those meshes. A movable light with a large attenuation radius hung over a forest is the worst case — thousands of foliage triangles getting re-rasterised into a shadow depth buffer every frame.

The fix — in priority order:

Don't make a light movable unless it actually moves. Use Static for fully-baked lights and Stationary for lights that need dynamic direct lighting and dynamic shadows on moving objects while keeping baked shadows and indirect lighting for static geometry.
Disable shadow casting on fill lights. Many secondary lights exist purely to shape art; they don't need to cast shadows. Toggle Cast Shadows off on them.
Reduce attenuation radius. Aggressively trim point/spot light radii. A light with a 2000 uu radius affects a much smaller mesh count than one with a 5000 uu radius.
Use r.Shadow.RadiusThreshold to drop shadows from tiny on-screen meshes. Distant or small objects below the threshold stop casting shadows but still render normally.
Tune cascade count and dynamic shadow distance on the directional light. Each cascade is a full shadow-depth pass over the camera frustum. Two well-tuned cascades is usually better than four sloppy ones.
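The attenuation-radius advice is stronger than it looks, because a point light's influence volume grows with the cube of its radius. A minimal sketch in plain C++, assuming a spherical light and roughly uniform mesh density (spot lights are cones, so treat this as an upper-end estimate; the function name is illustrative):

```cpp
// Ratio of the volumes of two spheres. Volume grows with the cube of the
// radius, so the 4/3 * pi factor cancels out.
constexpr double SphereVolumeRatio(double LargerRadius, double SmallerRadius)
{
    const double Scale = LargerRadius / SmallerRadius;
    return Scale * Scale * Scale;
}

// 5000 uu vs 2000 uu: 2.5^3 = 15.625 times the volume.
static_assert(SphereVolumeRatio(5000.0, 2000.0) == 15.625, "2.5 cubed");
```

Under those assumptions, trimming a radius from 5000 uu to 2000 uu cuts the number of meshes the light can touch by more than an order of magnitude, not by the 2.5× the radius alone suggests.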

💡

Static point/spot lights still cache

A movable point or spot light that is not actually moving can still benefit: when nothing within its volume has moved between frames, the engine can reuse the previous frame's shadow map. Turning a "movable but stationary" light into Stationary mobility (or simply not moving it) unlocks that caching path.

Translucent Material Overdraw

The deferred renderer was built for opaque geometry. Translucent materials sit outside that pipeline and have to be shaded for every pixel they cover, including pixels behind other translucent surfaces. Stack five smoke quads on screen, and the GPU shades the same pixel five times.

This is "overdraw," and it is the most common reason a scene profiles fine on the CPU but tanks on the GPU. Particle systems with overlapping translucent quads, glass walls in front of glass walls, dirty volumetric fog volumes, decals on translucent surfaces — all classic offenders.

The fix:

Use the Shader Complexity view. Press Alt+8 in the editor (or the equivalent View Mode → Optimization Viewmodes → Shader Complexity) and walk the camera through your scene. Green = cheap. Red = expensive. White = wildly over budget. Anywhere translucents stack will glow.
Prefer Masked over Translucent when you don't need partial alpha. Masked is opaque-rendered with a clip; it benefits from depth pre-pass and early-Z.
Use opaque LODs at distance. Foliage cards in particular: have LOD0 use Masked leaves, then swap to a fully opaque, lower-quality variant for LOD2+.
Trim particle quad sizes and counts. A smoke effect that reads fine with a few well-placed quads should not be built from twenty overlapping ones. The pixel-shaded area (quad area times the number of stacked layers) is what costs you, not the per-particle CPU work.
Disable expensive material features on translucents. Reflections, refraction, and contact shadows on translucent materials multiply the per-pixel cost.
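Overdraw is easy to estimate on the back of an envelope. The sketch below, in plain C++ with illustrative names, assumes each translucent layer covers the same screen area and that nothing rejects the hidden layers (translucent surfaces do not occlude each other the way opaque ones do):

```cpp
#include <cstdint>

// Pixel-shader invocations for a stack of translucent layers that each cover
// the same number of screen pixels.
constexpr std::int64_t TranslucentShadedPixels(std::int64_t CoveredPixels, int Layers)
{
    return CoveredPixels * Layers;
}

// Five smoke quads, each covering a quarter of a 1920x1080 frame.
constexpr std::int64_t QuarterScreen = (1920 * 1080) / 4;   // 518,400 pixels

static_assert(TranslucentShadedPixels(QuarterScreen, 5) == 2'592'000,
              "five layers over a quarter of the screen");

// That is more shading work than one full-screen opaque pass (2,073,600 pixels).
static_assert(TranslucentShadedPixels(QuarterScreen, 5) > 1920 * 1080,
              "exceeds a full-screen pass");
```

A modest-looking effect in one corner of the screen can therefore cost more than shading the entire frame once, before the per-pixel cost of the material itself is counted.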

Foliage WPO Invalidating Virtual Shadow Maps

Virtual Shadow Maps (VSM) get most of their performance from caching. A page that hasn't changed since last frame doesn't need to be re-rendered. The catch: any geometry that uses World Position Offset (WPO) in its material — e.g. wind-swayed foliage, animated banners, vertex-shader displacement — invalidates the cached pages it touches every frame.

On a forest scene, this is catastrophic. Thousands of trees with wind WPO mark thousands of VSM pages dirty each frame, and the shadow renderer ends up redoing the work it was supposed to skip. Epic's own VSM documentation calls out animated WPO and a continuously-moving sun as the two biggest enemies of VSM caching.

The fix:

Set WPO Disable Distance on every foliage and static mesh that uses WPO. The property lives on the Static Mesh asset under General Settings → Nanite Settings (for Nanite meshes) or under foliage type / instance settings. Past this distance, the engine treats the mesh as if its WPO was zero — restoring shadow caching for the trees you can't see closely anyway.
Disable WPO on shadow passes if the visual cost is acceptable. Use the Evaluate World Position Offset material setting and per-component overrides to skip WPO for shadow-only renders.
Avoid using WPO for things that aren't visibly motion. Don't smuggle a static height offset through WPO; bake it into the mesh.

⚠

The instanced foliage WPO bug

There is a long-standing community-reported issue where WPO Disable Distance works in the editor viewport but stops working in PIE / packaged builds when the foliage uses World Partition's InstancedFoliageActor with the runtime grid set to MainGrid. The reported workaround is to set the runtime grid to None. Verify behaviour on your engine version; this has been worked on across 5.x point releases.

📚

Deep dive: Virtual Shadow Maps tuning

The full mechanics of VSM caching, page invalidation, and the per-component ShadowCacheInvalidationBehavior=Static mitigation Fortnite uses are covered in the Virtual Shadow Maps Performance & Tuning tutorial.

Texture Streaming Pool Bloat

The infamous yellow on-screen warning — "TEXTURE STREAMING POOL OVER BUDGET" — means the streamer cannot fit the textures the camera wants at full resolution into the GPU memory pool. The engine responds by lowering mip levels, which produces the blurry-everything look and a measurable hit to fidelity.

The wrong fix is to bump the pool size. That hides the symptom and steals VRAM that other systems need. The right fix is to remove the bloat.

The fix:

Authoring discipline. Stop authoring everything at 4K. Set Maximum Texture Size on each texture asset to its actual on-screen need (a background prop that covers roughly 256×256 pixels on screen does not need a 2048×2048 texture).
Use Texture Groups correctly. Set Mip Gen Settings to FromTextureGroup and pick the right group (UI, Character, World, WorldNormalMap, etc.). Each group has its own LOD bias and max size enforced globally.
Verify compression. Default (BC1/BC3) for albedo, NormalMap (BC5) for normals. A texture set to Uncompressed by accident is 4–8× the size of a properly compressed one.
Audit lightmaps. Lightmap textures are easy to over-provision. Lower lightmap resolution per static mesh until quality visibly suffers.
Use r.Streaming.PoolSize only as a last resort. If your scene legitimately needs more than the default, raise it — but do so knowing you're trading away VRAM that could go to render targets, geometry, and shaders.
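The size figures above follow directly from bits per pixel. A minimal sketch in plain C++ (illustrative names; top mip only, so a full mip chain adds roughly another third), using the standard rates of 4 bits per pixel for BC1, 8 for BC3 and BC5, and 32 for uncompressed RGBA8:

```cpp
#include <cstdint>

constexpr int BC1Bits = 4;     // albedo without alpha
constexpr int BC3Bits = 8;     // albedo with alpha (BC5 normals are also 8)
constexpr int RGBA8Bits = 32;  // uncompressed

// Bytes for the top mip of a texture at a given bit rate.
constexpr std::int64_t TopMipBytes(std::int64_t Width, std::int64_t Height, int BitsPerPixel)
{
    return Width * Height * BitsPerPixel / 8;
}

// A 2048x2048 texture: 2 MiB as BC1, 16 MiB uncompressed (8x larger).
static_assert(TopMipBytes(2048, 2048, BC1Bits) == 2 * 1024 * 1024, "2 MiB");
static_assert(TopMipBytes(2048, 2048, RGBA8Bits) == 16 * 1024 * 1024, "16 MiB");

// Against BC3 the uncompressed penalty is 4x, hence the 4-8x range.
static_assert(TopMipBytes(2048, 2048, RGBA8Bits) == 4 * TopMipBytes(2048, 2048, BC3Bits), "4x");

// Capping the same asset at 256x256 shrinks it by a further 64x, to 32 KiB.
static_assert(TopMipBytes(256, 256, BC1Bits) == 32 * 1024, "32 KiB");
```

Resolution dominates: fixing an accidental Uncompressed setting saves up to 8×, while dropping three mip levels saves 64×.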

🔎

Diagnostic console commands

stat streaming shows the current pool, what's in it, and what's wanted. ListStreamingTextures dumps the largest textures in the pool. r.Streaming.UseAllMips 1 temporarily forces full-res streaming so you can see the “true” cost.

Missing or Misconfigured LODs

Without LODs, the GPU rasterises a 50,000-triangle hero prop the same way whether it fills the screen or covers four pixels in the distance. With Nanite this is mostly handled automatically. Without Nanite (or for hand-authored, non-Nanite content like skeletal meshes, foliage cards, or imported assets) you must author LODs yourself.

The common mistakes:

Importing meshes with no LODs and forgetting to generate them.
Generating LODs but leaving Auto Compute LOD Distances off without manually tuning the screen sizes — LODs never trigger because the screen-size thresholds are wrong.
Authoring three LODs that all have nearly the same triangle count (a triangle reduction of 90–75–50–25% per LOD level is a reasonable starting point).
Forgetting to set a max draw distance on small set-dressing — a tiny prop visible at 200 m is wasting draw calls.

The fix:

Use auto-LOD generation in the Static Mesh Editor (LOD Settings → Number of LODs, then Apply Changes). The engine's reduction is good enough for most non-hero content.
Leave Auto Compute LOD Distances on unless you have a specific artistic reason to override it. It picks transition screen sizes based on the visual error each reduction introduces.
Drop tiny background props with Desired Max Draw Distance on the Static Mesh Component, or use a Cull Distance Volume to apply distance-cull rules to a region in bulk.
For Nanite meshes, the LOD chain is replaced by the cluster hierarchy — you don't author LODs, but you should still validate small-feature culling and Fallback Mesh behaviour for platforms or paths that don't use Nanite.
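To see why badly tuned screen sizes mean LODs never trigger, it helps to model the selection. This is a simplified sketch in plain C++, not the engine's actual selection code; SelectLod and both threshold tables are hypothetical:

```cpp
#include <cstddef>

// Picks the LOD index for a given on-screen size.
// Thresholds[i] is the screen size at or below which LOD i becomes eligible,
// listed in descending order (LOD0 first).
template <std::size_t Count>
constexpr int SelectLod(const float (&Thresholds)[Count], float ScreenSize)
{
    int Selected = 0;
    for (std::size_t Lod = 0; Lod < Count; ++Lod)
    {
        if (ScreenSize <= Thresholds[Lod])
        {
            Selected = static_cast<int>(Lod);   // keep the coarsest eligible LOD
        }
    }
    return Selected;
}

// Sensible thresholds: each LOD takes over as the mesh shrinks on screen.
constexpr float TunedThresholds[] = {1.0f, 0.5f, 0.25f, 0.1f};
static_assert(SelectLod(TunedThresholds, 0.8f) == 0, "large on screen: LOD0");
static_assert(SelectLod(TunedThresholds, 0.3f) == 1, "medium: LOD1");
static_assert(SelectLod(TunedThresholds, 0.05f) == 3, "tiny: LOD3");

// Thresholds set far too low: the same tiny mesh still renders at LOD0.
constexpr float BrokenThresholds[] = {1.0f, 0.01f, 0.005f, 0.001f};
static_assert(SelectLod(BrokenThresholds, 0.05f) == 0, "LODs exist but never trigger");
```

The second table is the failure mode described in the mistakes list: the LODs were generated, but the mesh would have to shrink to a speck before any of them is used.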

Draw Call Sprawl from Non-Instanced Static Meshes

Each unique draw submission has a fixed CPU cost on the render thread (and a smaller cost on the RHI thread). A scene with the same wall mesh placed 200 times the naive way is 200 draw submissions; the same scene with that wall as an Instanced Static Mesh is closer to 1.

Modern Unreal mitigates this in two ways. First, auto-instancing (since 4.22) collapses identical static-mesh draws at submission time when their materials match. Second, Nanite dispatches geometry as virtualised clusters — you pay roughly per material, not per actor placement. So the gotcha is sharper than it used to be, but it still bites in two scenarios:

You have lots of unique meshes — auto-instancing only collapses identical ones. Different rocks, different walls, different foliage variants will not share a draw.
You have non-Nanite content with thousands of placements — foliage, debris, modular kits, decals, particle quads.

The fix:

Use Hierarchical Instanced Static Mesh (HISM) for large populations of the same mesh. HISM splits instances into a spatial cluster tree, so the renderer can skip whole clusters that fall outside the frustum — a plain ISM with one giant bounding box can't do that.
Use Foliage Mode for grass, trees, and rocks scattered across a landscape. The Foliage system places them as HISM under the hood, with paint tools and density controls.
Reduce material variety on background geometry. Two visually-similar walls that share a master material will auto-instance; two walls with bespoke material instances will not.
Use stat rhi and stat scenerendering to track DrawPrimitive Calls over the lifetime of a level. A regression there is almost always actor placement gone wrong.
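The effect of material variety on instancing can be modelled as counting distinct (mesh, material) pairs. The sketch below is plain C++ with hypothetical names and IDs; it is a rough lower bound on draw submissions under the assumption that identical placements merge perfectly, not a model of the renderer:

```cpp
#include <cstddef>

// One placed mesh in the level, identified by mesh and material IDs.
struct Placement
{
    int MeshId;
    int MaterialId;
};

// Counts distinct (mesh, material) pairs among the placements.
template <std::size_t Count>
constexpr int CountInstancedDraws(const Placement (&Placements)[Count])
{
    int Unique = 0;
    for (std::size_t i = 0; i < Count; ++i)
    {
        bool SeenBefore = false;
        for (std::size_t j = 0; j < i; ++j)
        {
            if (Placements[j].MeshId == Placements[i].MeshId &&
                Placements[j].MaterialId == Placements[i].MaterialId)
            {
                SeenBefore = true;
                break;
            }
        }
        if (!SeenBefore)
        {
            ++Unique;
        }
    }
    return Unique;
}

constexpr Placement Level[] = {
    {1, 10}, {1, 10}, {1, 10},   // same wall, same material: merges into one draw
    {1, 11},                     // same wall, bespoke material: its own draw
    {2, 10},                     // different mesh: its own draw
};

// Five placements submitted naively, three draws once identical ones merge.
static_assert(CountInstancedDraws(Level) == 3, "three distinct mesh/material pairs");
```

Every bespoke material instance or one-off mesh variant adds a row that cannot merge, which is why consolidating materials on background geometry pays off.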

Garbage Collection Hitches

Unreal's garbage collector runs on a configurable interval (gc.TimeBetweenPurgingPendingKillObjects, roughly 60 seconds by default in current UE5; the stock engine config uses 61.1), and during its reachability analysis phase it traverses the UObject reference graph. On a project with hundreds of thousands of UObjects (typical for a streamed open-world game), this can stall the game thread for tens of milliseconds, visible as one or more dropped frames, or a chunky stutter when a level streams in.

The cost scales with the total number of UObjects in memory, not just the ones being collected. So a project with a lot of long-lived data assets and persistent objects pays a tax on every GC pass even when nothing is actually being freed.

The fix:

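Pool, don't churn. Bullets, particles, AI, hit indicators: anything that spawns and despawns hundreds of times per minute should be returned to a pool, not destroyed and re-spawned. Pooled objects are never marked for destruction, so they generate no purge work, although they still count toward the reachability pass like any other live UObject.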
Prefer plain C++ structs over UObjects for data that doesn't need replication, garbage collection, or Blueprint exposure.
Tune gc.TimeBetweenPurgingPendingKillObjects and incremental GC settings in DefaultEngine.ini. Spreading reachability over multiple frames reduces the per-frame stall at the cost of a slightly later actual free.
Watch for "GC trains": large level streaming events that release thousands of objects at once. Stagger streaming when possible, and consider forcing a GC during a loading screen or transition where a hitch is invisible.
Use Insights to attribute hitches. The CollectGarbage bookmark in Unreal Insights makes GC stalls trivially identifiable.
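The pooling idea from the first item can be sketched without any engine types. The fixed-capacity pool below is a hypothetical illustration in plain C++ (SlotPool and ObjectsCreatedWithPool are made-up names); a real actor pool would also reset state, hide the actor, and disable its tick and collision on release:

```cpp
#include <cstddef>

// Fixed-capacity pool: slots are reused instead of created and destroyed.
template <std::size_t Capacity>
struct SlotPool
{
    bool InUse[Capacity] = {};
    bool Created[Capacity] = {};
    int CreatedCount = 0;   // objects constructed over the pool's lifetime

    // Returns a free slot index, or -1 when the pool is exhausted.
    constexpr int Acquire()
    {
        for (std::size_t Slot = 0; Slot < Capacity; ++Slot)
        {
            if (!InUse[Slot])
            {
                if (!Created[Slot])
                {
                    Created[Slot] = true;   // first use is the only construction
                    ++CreatedCount;
                }
                InUse[Slot] = true;
                return static_cast<int>(Slot);
            }
        }
        return -1;
    }

    // Hands the slot back for reuse; nothing is destroyed.
    constexpr void Release(int Slot)
    {
        InUse[Slot] = false;
    }
};

// Fires bursts of three projectiles, releasing each burst before the next.
constexpr int ObjectsCreatedWithPool(int Bursts)
{
    SlotPool<8> Pool{};
    for (int Burst = 0; Burst < Bursts; ++Burst)
    {
        const int A = Pool.Acquire();
        const int B = Pool.Acquire();
        const int C = Pool.Acquire();
        Pool.Release(A);
        Pool.Release(B);
        Pool.Release(C);
    }
    return Pool.CreatedCount;
}

// 100 bursts are 300 spawns, but only 3 objects are ever constructed.
static_assert(ObjectsCreatedWithPool(100) == 3, "peak concurrency, not spawn count");
```

The number of objects created tracks peak concurrency rather than total spawns, which is what keeps churn out of the collector's purge phase.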

💵

Forcing GC at a safe moment

GEngine->ForceGarbageCollection(true) from C++ (or the obj gc console command) requests a full GC; the C++ call takes effect on the next engine tick rather than immediately. Use it during a loading screen or fade-out, when the player can't tell the frame stalled. Don't use it “just in case” mid-gameplay.

PSO / Shader Compilation Stutter

Modern GPUs don't render from raw shader bytecode — they render from Pipeline State Objects: compiled, hardware-specific permutations that bake together the shader, render state, vertex format, and target format. The first time the engine encounters a new PSO, the driver has to compile it. That compile happens on the render thread. The result is the now-infamous "first time you see an effect, the game hitches" pattern that has plagued many UE titles.

This is the single biggest source of player-visible stutter in shipped UE5 games. The good news is that the engine has explicit machinery to fix it.

The fix — PSO Precaching (UE 5.2+):

DefaultEngine.ini

[/Script/Engine.RendererSettings]
r.PSOPrecaching=1

That single line opts you into the runtime PSO precache path. As assets load, the engine speculatively compiles PSOs they're likely to need. When a mesh is about to render before its PSO is ready, the engine can either skip drawing it that frame or fall back to a default material. Either way, the result is a brief visual artefact rather than a frame-time hitch.

Useful related CVars:

r.PSOPrecache.ProxyCreationWhenPSOReady=1 — delays scene proxy creation until the PSO has compiled. Prevents the “default material flash.”
r.PSOPrecache.Validation=2 — emits diagnostic stats and Unreal Insights tracking so you can see what is and isn't being precached.

Combine with Bundled PSO Cache for the long tail. Precaching does not yet catch every PSO type (graphics-pipeline globals can still slip through). The legacy bundled cache — gameplay-recorded, packaged with your build, compiled at startup — remains useful for the cases precaching misses. Many shipped 5.x titles use both.

⚠

Driver caches matter

Driver-level shader caches mean a freshly-installed game on a fresh driver hits every PSO cold. The second launch is much faster because the GPU driver remembered the compiled programs. This is why "the game runs better after I played for an hour" is real, and why Day 0 reviewers always see the worst case.

📚

Deep dive: PSO Precaching

The full three-tier model (runtime precache, bundled .spc cache, driver cache), the NVIDIA driver-eviction trap that resurrects 100ms hitches, and the validation workflow are covered in the PSO Precaching Deep Dive.

Lumen and Hardware Ray Tracing Defaults

UE5's defaults are tuned for showcasing engine features, not for shipping titles. Lumen, Nanite, Virtual Shadow Maps, and Temporal Super Resolution are all enabled by default in a fresh project, and several of their highest-quality paths are on out of the box. For projects that don't visually need that ceiling, the cost is paid for nothing.

Two specific defaults are worth auditing on every project:

1. Hardware vs software Lumen ray tracing. Hardware ray tracing produces sharper reflections and slightly better global illumination on supported GPUs, but it usually costs more GPU time than software ray tracing, and many projects don't get a meaningful visual lift from it. AMD's UE performance guide cites a measurable per-frame difference (on the order of a millisecond) on tested scenes when forcing software RT.

DefaultEngine.ini (or scalability)

; Force software Lumen: often a win for projects that don't need HWRT-class reflections
r.Lumen.HardwareRayTracing=0

2. Mesh Distance Fields stay on even when Lumen is off. If you've disabled Lumen but not the underlying SDF generation, you're still paying the build-time and memory cost. Audit:

DefaultEngine.ini (when Lumen is fully unused)

r.DynamicGlobalIlluminationMethod=0
r.Lumen.Reflections.Allow=0
r.GenerateMeshDistanceFields=0

Other defaults worth knowing:

VSM samples. r.Shadow.Virtual.SMRT.SamplesPerRayLocal defaults to 8; reductions to 4–6 are often visually indistinguishable but measurably cheaper.
VSM page pool. r.Shadow.Virtual.MaxPhysicalPages may need to be raised on Nanite-heavy scenes to avoid page-pool-overflow corruption; the cost is GPU memory.
TSR async compute. r.TSR.AsyncCompute defaults to 1 (basic async); setting it to 2 runs more of TSR overlapped with the rest of the frame, often saving GPU time when the GPU has idle slots. Worth measuring on your project.

⚠

Don't change CVars without measuring

Every project's render path is different. The settings above are candidates to investigate, not a recommended preset. Capture a baseline, change one variable, capture again. Anything that doesn't show up as a frame-time win at your fidelity target should be reverted. This is exactly what PerfGuard is built for.

📚

Deep dive: Lumen Performance

The full breakdown of Lumen's four cost centers, screen probe gather tuning, surface cache mechanics, and how to disable Lumen properly is covered in the Lumen Performance Deep Dive.

✓

Closing the Loop with PerfGuard

Knowing the gotchas is half the work. The other half is catching them before they ship. Most of the issues above are silent — you won't see them as errors, you'll see them as a slow erosion of frame time across a hundred small commits. By the time someone notices "the game feels slower lately," the regression is buried in months of merges.

Set up a PerfGuard scenario for each of your headline levels, record a baseline of FrameTime, GPUTime, DrawCalls, and RenderThreadTime, and gate every PR against it. When a teammate accidentally re-enables complex collision on a 20k-tri prop, ships a ticking actor with a hot loop, or flips a Lumen quality CVar, you'll know which commit caused it — instead of finding out in QA two milestones later.

Quick Start — install PerfGuard and record your first baseline in 15 minutes.
Diagnosing CPU Regressions — the gotchas above mapped to game-thread vs render-thread symptoms.
Diagnosing GPU Regressions — shadow, overdraw, and Nanite/Lumen detective work from a fail report.
Performance Testing Best Practices — how to keep the regression catches signal-rich and the false positives quiet.

References & Further Reading

Epic Games — Simple versus Complex Collision in Unreal Engine (official documentation).
Epic Games — Actor Ticking in Unreal Engine (official documentation).
Epic Games — Virtual Shadow Maps in Unreal Engine and the Fortnite Chapter 4 VSM tech blog.
Epic Games — Game engines and shader stuttering: Unreal Engine’s solution (PSO precaching tech blog).
AMD GPUOpen — Unreal Engine Performance Guide (Lumen, Nanite, VSM, TSR profiling and CVar recommendations).
Tom Looman — Setting up PSO Precaching & Bundled PSOs for Unreal Engine.
Intax — Performance guideline for Blueprints and making sense of Blueprint VM.
Epic Developer Community — Primer: Debugging Garbage Collection Performance.
Iri Shinsoj — Notes on foliage in Unreal 5 (WPO and VSM caching interactions).