Advanced ~22 min read

Lumen Performance Deep Dive

Lumen is the most expensive subsystem in a default UE5 project. This tutorial covers what Lumen actually does each frame, how its four cost centers move with scene content, the CVars that buy you the most milliseconds for the least visual impact, and how to know when the right answer is to turn the whole system off.

Where Lumen's milliseconds go

"Lumen is slow" is not actionable. Lumen is four discrete passes, and they regress for different reasons. Before tuning anything you need to know which pass is the offender. Run stat gpu in PIE on a representative scene and look for these four lines:

LumenSceneUpdate — surface cache page maintenance. Cost scales with the area of cards added/updated this frame.
DiffuseIndirectAndAO — the screen probe gather + final integrate that produces global illumination. Usually the largest Lumen line on the frame.
Lumen Reflections — a separate trace budget for glossy reflections. Independent of GI; can be tuned independently.
ShortRangeAO — the bent-normal AO that fills in contact shadowing at small scales.

Lumen's overall cost is dominated by the final gather (DiffuseIndirectAndAO) on most scenes. The other three move with specific content choices: lots of dynamic geometry rotating into the cache pumps LumenSceneUpdate, glossy reflective floors balloon Lumen Reflections, and dense small geometry hits ShortRangeAO. The mental model that makes the rest of this tutorial click: Lumen is a final-gather GI system on top of a separate reflection-trace pipeline, and both rely on a cached representation of the scene that has to be kept current.
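As a minimal sketch of "find the offender first", the snippet below ranks the four Lumen lines by cost and by share of the Lumen total. The timings are made-up placeholders typed in by hand from stat gpu, and rank_lumen_passes is a hypothetical helper, not an engine API.

```python
# The four stat gpu lines this tutorial tracks.
LUMEN_PASSES = (
    "LumenSceneUpdate",
    "DiffuseIndirectAndAO",
    "Lumen Reflections",
    "ShortRangeAO",
)


def rank_lumen_passes(timings_ms):
    """Return (pass, ms, share_of_lumen_total) tuples, most expensive first.

    Non-Lumen entries in timings_ms are ignored; missing Lumen lines count as 0.
    """
    lumen = {name: timings_ms.get(name, 0.0) for name in LUMEN_PASSES}
    total = sum(lumen.values())
    ranked = sorted(lumen.items(), key=lambda item: item[1], reverse=True)
    return [(name, ms, ms / total if total else 0.0) for name, ms in ranked]


# Placeholder numbers for illustration only, not measurements.
sample = {
    "LumenSceneUpdate": 0.6,
    "DiffuseIndirectAndAO": 2.4,
    "Lumen Reflections": 1.5,
    "ShortRangeAO": 0.5,
    "Basepass": 3.0,
}

for name, ms, share in rank_lumen_passes(sample):
    print(f"{name:<22} {ms:5.2f} ms  {share:6.1%}")
```

Whichever line tops the ranking on your own capture is the section of this tutorial to read next.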

📝

Read this first if you only have ten minutes. For almost every project, the two highest-leverage changes in this tutorial are setting r.Lumen.ScreenProbeGather.StochasticInterpolation 1 (a roughly 30% gain on the screen probe pass, per AMD's UE perf guide) and dropping r.Lumen.ScreenProbeGather.SpatialFilterNumPasses from 3 to 1. Skip to section 5, "Screen Probe Gather levers, ranked by ROI", if you need that win today.

Capturing a clean Lumen baseline

You cannot tune what you have not measured, and Lumen is one of the systems where intuition is most often wrong. The minimum viable workflow is: capture an Insights trace with the GPU channel on, identify the heaviest Lumen pass, then run ProfileGPU for a single-frame breakdown.

PIE / packaged build

// Start an Insights trace targeting the GPU side of the renderer
Trace.Start gpu,rhi,frame,bookmark

// Walk through the level for ~30 seconds, then:
Trace.Stop

// For a single-frame snapshot of every pass:
ProfileGPU
// (Or press Ctrl+Shift+, in PIE)

Open the resulting trace in Unreal Insights and switch to the GPU view. Starting in UE 5.6, the GPU profiler is unified between ProfileGPU and Insights, with separate Graphics and Compute tracks (per Tom Looman's 5.6 performance roundup). The Compute track is where most of Lumen lives — final gather, screen probes, reflections.

For tracking change-over-time, capture twice: once on a stable build, once on the candidate. Diff the GPU pass timings. Anything that changed by more than the test machine's natural variance (typically ~3% on a thermally stable rig) is the real signal.
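The baseline-versus-candidate comparison can be sketched as a small filter that drops anything inside the noise floor. This assumes you have already extracted per-pass timings into dictionaries; significant_changes and the numbers are hypothetical, and the 3% default simply mirrors the variance figure quoted above.

```python
def significant_changes(baseline_ms, candidate_ms, noise=0.03):
    """Return {pass_name: relative_change} for passes that moved beyond the noise floor.

    relative_change is (candidate - baseline) / baseline, so +0.20 means 20% slower.
    Passes missing from the candidate, or with a non-positive baseline, are skipped.
    """
    changes = {}
    for name, base in baseline_ms.items():
        if name not in candidate_ms or base <= 0.0:
            continue
        relative = (candidate_ms[name] - base) / base
        if abs(relative) > noise:
            changes[name] = relative
    return changes


# Placeholder timings in milliseconds, for illustration only.
baseline = {"DiffuseIndirectAndAO": 2.40, "Lumen Reflections": 1.50, "LumenSceneUpdate": 0.60}
candidate = {"DiffuseIndirectAndAO": 2.44, "Lumen Reflections": 1.80, "LumenSceneUpdate": 0.60}

for name, relative in significant_changes(baseline, candidate).items():
    print(f"{name}: {relative:+.1%}")
```

Here the 0.04 ms move on DiffuseIndirectAndAO sits under the 3% floor and is ignored, while the reflections change is reported. Measure the variance of your own rig before trusting the default.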

⚠

The editor is not your perf target. Lumen costs differently in the editor than in PIE/standalone, and differently again in a packaged build (especially on consoles). For any tuning that's going to ship, capture in the same packaged build configuration the player will run.

Hardware vs software ray tracing — per-platform decisions

Lumen has two tracing backends: software ray tracing against a global signed-distance-field (SDF) representation of the scene, and hardware ray tracing against the GPU's BVH using DXR/Vulkan-RT. They produce noticeably different visual results — HWRT is sharper at fine geometry, less prone to leaking through thin walls, and significantly better on foliage. They also have very different costs.

AMD's official UE5 performance guide reports up to a 1.2 ms delta on an RX 7900 XTX at 4K in favor of software Lumen, with what they describe as "minor visual quality difference" on tested scenes (per GPUOpen's UE Performance Guide). On NVIDIA's side, Ada-architecture GPUs unlock Shader Execution Reordering (SER), which NVIDIA's developer blog reports makes Lumen reflections 20–30% faster in the City Sample (per NVIDIA's SER announcement). The 5.6 default is to enable SER (r.Lumen.HardwareRayTracing.ShaderExecutionReordering=1) when the GPU supports it.

The decision tree:

HWRT-Inline (Lumen's default HWRT mode) — correct first stop on RTX 30/40-series and RDNA 3+ GPUs. Modest cost, much better visual quality than SDF Lumen on detailed geometry.
HWRT-HitLighting (r.Lumen.HardwareRayTracing.LightingMode 2) — full per-hit shading. NVIDIA's own UE5 RT guide describes this as "expensive… not recommended for current-gen platforms; reserve for Epic/PC quality" (per the NVIDIA UE5 Ray Tracing Guideline 5.4).
Software (SDF) Lumen — cheaper on most hardware and the only option on GPUs without hardware ray tracing, such as RDNA 1. It is also usually the safer choice on RDNA 2 and on consoles, where hardware RT performance is variable. Acceptable visual quality for the majority of stylised content.
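One way to make that decision tree explicit is a small selection function in your build or tooling scripts. This is a sketch under stated assumptions: the Target fields, the tier names, and pick_lumen_tracing are all hypothetical, and "strong RT hardware" is a judgment call you would replace with your own device list. Only the CVar names and values come from this tutorial.

```python
from dataclasses import dataclass


@dataclass
class Target:
    has_hardware_rt: bool     # GPU exposes DXR / Vulkan ray tracing at all
    strong_rt_hardware: bool  # e.g. RTX 30/40-series or RDNA 3+ (your own call)
    is_console: bool
    quality_tier: str         # hypothetical tier names: "low", "high", "epic"


def pick_lumen_tracing(target):
    """Return the CVars to set for this target, following the decision tree above."""
    use_software = (
        not target.has_hardware_rt
        or not target.strong_rt_hardware
        or target.is_console
    )
    if use_software:
        return {"r.Lumen.HardwareRayTracing": 0}

    cvars = {"r.Lumen.HardwareRayTracing": 1}
    if target.quality_tier == "epic":
        # Hit lighting is the expensive mode; reserve it for the top PC tier.
        cvars["r.Lumen.HardwareRayTracing.LightingMode"] = 2
    return cvars


print(pick_lumen_tracing(Target(True, True, False, "high")))
print(pick_lumen_tracing(Target(True, False, True, "high")))
```

Keeping the rule in one function means a platform change is a one-line edit rather than a hunt through scalability files.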

DefaultEngine.ini

[/Script/Engine.RendererSettings]
; Force software Lumen on a per-platform basis. Most projects ship
; SW on consoles + HWRT on PC, gated by scalability.
r.Lumen.HardwareRayTracing=0

; If you ARE running HWRT, opt in to SER on supported GPUs (default in 5.6)
r.Lumen.HardwareRayTracing.ShaderExecutionReordering=1

Krzysztof Narkowicz's "Journey to Lumen" documents Epic's own optimization arc: the Matrix Awakens demo on PS5 was driven from ~8 ms to roughly 4 ms of Lumen cost across development, almost all of it through algorithmic changes that landed in the engine. That work is now the default.

Reflections are a separate budget

The most common mistake when first profiling Lumen is to attribute reflection cost to GI. They are accounted separately and tuned separately. The Lumen Reflections stat covers a distinct trace pipeline that runs alongside the screen probe gather; on roughness-rich scenes (wet streets, polished interiors) it can rival GI in cost.

Hellblade II's optimization writeup is instructive here: that team kept Lumen GI on but aggressively trimmed reflections by halving the reflection trace resolution with r.Lumen.Reflections.DownsampleFactor=2 and capping r.Lumen.Reflections.MaxRoughnessToTrace to ~0.4 (per a published technical review). Above that roughness, the engine falls back to GI/skylight contribution, which is virtually free compared to a traced sample.

The reflection-tuning ladder, in priority order:

r.Lumen.Reflections.MaxRoughnessToTrace — default 0.4. Lower this first. Roughness above the threshold falls back to the radiance cache. Most surfaces above 0.5 don't visibly benefit from a traced reflection.
r.Lumen.Reflections.DownsampleFactor — default 1. Setting to 2 halves reflection resolution; large saving with minimal visual hit on rough surfaces.
r.Lumen.Reflections.RadianceCache 1 — default on. Reuses the world radiance cache for rough reflections. Verify it's enabled if you've inherited a project.
Disable per-pixel front-layer translucency reflections on stylised projects via r.Lumen.TranslucencyReflections.FrontLayer.Allow 0.
Kill switch: r.Lumen.Reflections.Allow 0 entirely disables Lumen reflections without affecting GI. Useful as a measurement isolation tool.
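To get a feel for the first two rungs of that ladder, the sketch below does the arithmetic. It assumes the downsample factor applies per axis (so a factor of 2 traces a quarter of the pixels) and that a surface is traced when its roughness is at or below the threshold. The material names and roughness values are invented for illustration.

```python
def traced_reflection_pixels(width, height, downsample_factor):
    """Pixels in the reflection trace buffer, assuming a per-axis downsample."""
    traced_width = -(-width // downsample_factor)    # ceiling division
    traced_height = -(-height // downsample_factor)
    return traced_width * traced_height


def gets_traced_reflection(roughness, max_roughness_to_trace=0.4):
    """True if a surface at this roughness would get a traced reflection."""
    return roughness <= max_roughness_to_trace


full = traced_reflection_pixels(1920, 1080, 1)
half = traced_reflection_pixels(1920, 1080, 2)
print(f"factor 1: {full} px, factor 2: {half} px ({half / full:.0%} of full)")

# Hypothetical material roughness values.
materials = {"wet asphalt": 0.15, "polished wood": 0.35, "painted wall": 0.60, "concrete": 0.80}
for cap in (0.4, 0.3):
    traced = [name for name, rough in materials.items() if gets_traced_reflection(rough, cap)]
    print(f"MaxRoughnessToTrace={cap}: traced -> {traced}")
```

Running a pass like this over your own material library gives a quick estimate of how many surfaces a lower cap actually affects before you touch the CVar.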

💡

5.6 reflection-output format saves ~0.05 ms. In 5.6 Epic switched the reflection output to a 32-bit packed format, saving "0.02 ms in Lumen Reflections + 0.03 ms in water rendering at 1080p console" (per Tom Looman's 5.6 highlights). Free if you upgrade.

Screen Probe Gather levers, ranked by ROI

The screen probe gather is the largest line on most Lumen frames. It places probes on a tile grid, gathers radiance from each, denoises, and integrates into the final image. Every step in that pipeline has a CVar.

The order below is empirically the best ROI per AMD's UE perf guide and Tom Looman's 5.6 measurements:

1. Stochastic interpolation (highest single-CVar win). Replaces bilinear probe interpolation with random sampling. AMD reports ~30% faster screen probe gather on tested scenes, with denoising hiding the noise the random sampling introduces.

DefaultEngine.ini

[/Script/Engine.RendererSettings]
r.Lumen.ScreenProbeGather.StochasticInterpolation=1

2. Spatial filter passes — drop from 3 to 1. Default is 3. AMD recommends 1 or 2 in their UE perf guide. The first pass does most of the perceptual work; later passes have diminishing returns.

DefaultEngine.ini

r.Lumen.ScreenProbeGather.SpatialFilterNumPasses=1

3. Integrate at half resolution (5.6+, gated). r.Lumen.ScreenProbeGather.IntegrateDownsampleFactor=2 integrates the gather at half-res, with about a 3× speed-up on the integrate stage and ~0.3–0.5 ms saved at 1080p console per Tom Looman's 5.6 measurements. Default is 1 (full-res integrate); flip on after verifying no visible quality loss in your scene.

4. Probe density. r.Lumen.ScreenProbeGather.DownsampleFactor sets the pixel size of the tile each probe owns. Scalability defaults are 8 (Cinematic), 16 (Epic, quality level 3), and 32 (Low). Raise it to 32 for a low-end SKU; the visual cost shows up most on small-feature lighting (thin object highlights).

5. MaxRayIntensity firefly clamp. 5.6 tightened this from 40 to 10 by default. Aggressive clamp = fewer firefly highlights, less denoiser work, slightly less visible specular highlights on dim geometry. If you bump up to 5.6 and notice highlights flatten, this is why.

6. Reference mode for ground-truth A/B. When you're not sure whether a CVar tweak hurt quality, r.Lumen.ScreenProbeGather.ReferenceMode 1 brute-forces the gather without all the optimizations. Toggle, screenshot, toggle back, screenshot, diff — that's the only honest visual A/B you can do.
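For the probe-density lever in item 4, a rough count of probes per frame shows why the downsample factor matters so much. This is a back-of-envelope sketch: it models only a uniform tile grid and ignores any extra probes the engine may place adaptively, so treat the numbers as relative, not absolute.

```python
import math


def uniform_probe_count(width, height, downsample_factor):
    """Probes on a uniform grid where each probe owns a factor x factor pixel tile."""
    tiles_x = math.ceil(width / downsample_factor)
    tiles_y = math.ceil(height / downsample_factor)
    return tiles_x * tiles_y


for factor in (8, 16, 32):
    count = uniform_probe_count(1920, 1080, factor)
    print(f"DownsampleFactor={factor:>2}: {count} probes at 1080p")
```

Each doubling of the factor cuts the uniform probe count to roughly a quarter, which is why this lever sits above the firefly clamp in the ranking.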

⚠

Test on the actual content. Most of these CVars are content-sensitive. Stochastic interpolation looks clean in CitySample-class scenes; on a stylised animation production with high-frequency normal maps, you may see denoiser-introduced flicker. Capture both before and after on your worst-case scene before locking a value.

Taming the Surface Cache

The Surface Cache is Lumen's name for the parameterized representation of the scene that Lumen traces against. Every Nanite/static mesh contributes "cards" to a card atlas; lighting bounces are computed against those cards rather than the original geometry. The cost is in keeping the atlas current: cards get added when they enter the camera's range, updated when their lighting changes, evicted when they're stale.

The 5.6 surface cache improvements were significant. Tom Looman's 5.6 highlights cite a 2× speed-up on the surface cache pass, achieved by halving the page-update count via camera-distance-driven scheduling. Just upgrading captures most of the win for free.

Tuning levers:

Far Field tuning. Open worlds use a far-field cache for distant scene contribution. r.LumenScene.FarField.OcclusionOnly=1 in 5.6 makes the far field ~50% faster on console by skipping color computation for occlusion-only contribution. Worth it on most large outdoor scenes.
Card placement audit. Use r.Lumen.Visualize 1 (or ShowFlag.VisualizeLumen 1) to overlay where cards are landing. Cards on tiny clutter, on collision-only meshes, or on geometry the player never sees are wasted atlas space. Set Affect Distance Field Lighting to false on those meshes.
Distance field representation toggle. When running pure HWRT (no SDF tracing path), the SDFs themselves are wasted memory. 5.6 adds the option to disable them, which saves "~0.07 ms at 1080p console" per Tom Looman's measurements.

The early City Sample numbers from Krzysztof Narkowicz's "Journey to Lumen" are useful as a cost-distribution sanity check. On an RX Vega 64 at 1080p, his early Lumen prototype totaled 25.56 ms, including 3.86 ms radiosity, 8.48 ms voxel injection, 5.50 ms light card diffuse, and 5.46 ms light card reflections (23.30 ms between those four passes). Modern Lumen has compressed all of those numbers significantly, but part of the relative shape still matches what you see on reflection-heavy scenes today: reflections cost about as much as the diffuse gather.

Content-side fixes that beat any CVar

The cheapest milliseconds are the ones you don't compute. Before tuning Lumen further, audit the content:

Distance fields on hero meshes. Software Lumen relies on SDF traces. Meshes with poor SDF representations (wide thin geometry, complex undercuts) leak light or trace incorrectly. Inspect with r.AOGlobalDistanceField.Visualize 1.
Two-sided distance fields for foliage cards. Foliage that uses masked (alpha-tested) materials needs Two-Sided Distance Field Generation enabled on the static mesh. Without it, software Lumen loses the foliage volume and shadowing leaks.
Avoid translucent reflective surfaces stacked on each other. Lumen reflections sample from cards; through-translucency reflections compound the trace cost in the same way overdraw compounds shading cost.
No collision-only meshes contributing to Lumen. Anything that's not visually rendered should not contribute to the surface cache. Set Affect Distance Field Lighting off.
Audit dynamic-light-vs-Lumen interaction. Many Movable lights are silently inflating LumenSceneUpdate by invalidating cards in their range. Performance Gotcha #4 in our gotchas tutorial covers the related problem of overzealous dynamic shadows; the same audit applies here.

For a stark example of where content-side decisions dominate runtime cost: STALKER 2 ships software Lumen but with no Nanite on foliage. The result is a well-known light-leak + foliage pop-in interaction — Lumen surface cache cards land on the rendered foliage proxies rather than the actual visible meshes. GSC publicly described it as a tradeoff they accepted to ship (per a wccftech interview with the dev team).

When to disable Lumen entirely

Lumen is the right system for a lot of projects. It's not the right system for all of them. Cases where the correct answer is "off":

Performance modes targeting 60 fps on current-gen consoles. Black Myth: Wukong's Performance mode famously drops Lumen GI in favor of a baked-lighting + DFAO + SSGI fallback to hold 60 fps. The visual loss is real; the framerate gain is bigger.
Top-down/isometric games where indirect lighting is barely visible. The few-pixel contribution doesn't justify the GPU cost.
Stylized/cel-shaded projects. If the art direction calls for flat shading, Lumen is computing nuance no one will see.
Mobile and lower-end SKUs. Lumen's hardware floor is comfortably above mid-range integrated graphics.

"Disabling Lumen" is more involved than flipping the project setting. The full kill-switch sequence:

DefaultEngine.ini (when Lumen is fully unused)

[/Script/Engine.RendererSettings]
; 0 = None (was 1 = Lumen)
r.DynamicGlobalIlluminationMethod=0
; 0 = None, or set 2 for SSR (was 1 = Lumen)
r.ReflectionMethod=0
; Belt-and-suspenders: see the callout below
r.Lumen.DiffuseIndirect.Allow=0
r.Lumen.Reflections.Allow=0
; Stop paying the SDF build/storage cost
r.GenerateMeshDistanceFields=0

⚠

Lumen can stay active even when "disabled" project-wide. Individual PostProcessVolume actors carry their own Lumen GI / Reflections override checkboxes. A volume override silently re-enables the system for that volume. Epic engineer Arkiras explicitly recommends r.Lumen.DiffuseIndirect.Allow 0 as the only reliable kill switch when diagnosing a stray Screen Probe Gather cost (per this forum thread).

Mesh distance fields are easy to forget. If you've turned Lumen off but kept SDF generation on, you're paying the build-time and memory cost for nothing, unless another feature such as distance field shadows or DFAO still needs them. Set r.GenerateMeshDistanceFields=0 when neither Lumen nor those features are in use.
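Because the kill switch is five separate settings, it is easy for one to go missing in a merge. A minimal audit sketch, assuming you can read DefaultEngine.ini as text: audit_lumen_kill_switch is a hypothetical helper that flags any of the five CVars that is absent from the renderer section or still set to 1. It does not understand UE's +Key/-Key array syntax or per-platform overrides, and it cannot see PostProcessVolume overrides.

```python
KILL_SWITCH_CVARS = (
    "r.DynamicGlobalIlluminationMethod",
    "r.ReflectionMethod",
    "r.Lumen.DiffuseIndirect.Allow",
    "r.Lumen.Reflections.Allow",
    "r.GenerateMeshDistanceFields",
)


def audit_lumen_kill_switch(ini_text, section="/Script/Engine.RendererSettings"):
    """Return {cvar: value_or_None} for kill-switch CVars that are missing or set to 1."""
    current_section = None
    found = {}
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or full-line comment
        if line.startswith("["):
            current_section = line[1:line.index("]")] if "]" in line else None
            continue
        if current_section == section and "=" in line:
            key, value = line.split("=", 1)
            # Tolerate a trailing "; comment" after the value.
            found[key.strip()] = value.split(";", 1)[0].strip()
    return {
        name: found.get(name)
        for name in KILL_SWITCH_CVARS
        if found.get(name) in (None, "1")
    }


sample_ini = """[/Script/Engine.RendererSettings]
r.DynamicGlobalIlluminationMethod=0
r.ReflectionMethod=2
r.Lumen.DiffuseIndirect.Allow=1
r.GenerateMeshDistanceFields=0
"""

for name, value in audit_lumen_kill_switch(sample_ini).items():
    print(f"{name}: {'missing' if value is None else 'still ' + value}")
```

On the sample above it reports r.Lumen.DiffuseIndirect.Allow as still enabled and r.Lumen.Reflections.Allow as missing.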

Diagnosing a bad Lumen frame

A Lumen pass is regressing. You don't know why. The systematic workflow:

Identify the offending pass. stat gpu separates the four cost centers. The largest delta against your baseline is your target.
Visualize the surface cache. r.Lumen.Visualize 1 (or Show Flags → Visualize → Lumen Scene) overlays card placement, surface cache atlas pages, and screen probes. Black or stale areas indicate cache pressure; large unexpected card extents indicate poorly-bounded contributors.
Brute-force ground truth. r.Lumen.ScreenProbeGather.ReferenceMode 1 runs the un-optimized full gather. Capture before and after to know whether a quality-vs-cost knob has been pushed too far.
Bisect by isolating components. Toggle r.Lumen.Reflections.Allow 0 and re-capture: if the bad delta vanishes, the regression is in reflections, not GI. Same for r.Lumen.DiffuseIndirect.Allow 0.
Pull a RenderDoc/PIX capture. When the GPU profiler line is "DiffuseIndirectAndAO" but you need to see which sub-pass within it is hot, RenderDoc's pass list will name them. PIX is the equivalent on Windows for D3D12.

🔎

Cross-reference our regression-diagnosis tutorial. The general workflow for narrowing GPU regressions, including reading stat gpu, comparing captures, and isolating cost to a single pass, is covered in detail in Diagnosing GPU Regressions. Lumen-specific isolation is just that workflow with the four Lumen lines as your needles.

Building scalability presets that mean something

Most projects ship Lumen scalability tiers that are empty templates. The Cinematic tier sets one CVar; Low does the same. The result is that a player on a low-end SKU pays nearly the full cost of Lumen for Low-tier visual quality.

A useful scalability ladder builds from real measurements. The structure that's worked across multiple shipped UE5 titles:

Engine/Config/BaseScalability.ini (excerpt, illustrative)

; Low
[PostProcessQuality@0]
r.Lumen.ScreenProbeGather.DownsampleFactor=32
r.Lumen.ScreenProbeGather.SpatialFilterNumPasses=1
r.Lumen.ScreenProbeGather.IntegrateDownsampleFactor=2
r.Lumen.Reflections.DownsampleFactor=2
r.Lumen.Reflections.MaxRoughnessToTrace=0.3
r.Lumen.HardwareRayTracing=0

; High
[PostProcessQuality@2]
r.Lumen.ScreenProbeGather.DownsampleFactor=16
r.Lumen.ScreenProbeGather.SpatialFilterNumPasses=2
r.Lumen.Reflections.DownsampleFactor=1
r.Lumen.Reflections.MaxRoughnessToTrace=0.4

; Cinematic
[PostProcessQuality@Cine]
r.Lumen.ScreenProbeGather.DownsampleFactor=8
r.Lumen.ScreenProbeGather.SpatialFilterNumPasses=3
r.Lumen.Reflections.MaxRoughnessToTrace=0.5

The pattern: the Low tier has every cheap quality-cost knob pushed; the Cinematic tier turns them off in favor of full-quality settings. Each tier should be measured in your worst-case scene; if Low and Medium produce the same frame time, the tier doesn't earn its existence.
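The "does this tier earn its existence" check can be sketched as a comparison of adjacent tiers against the measurement noise floor. The tier names and frame times below are placeholders, and redundant_tiers is a hypothetical helper; the 3% default reuses the variance figure from the baseline section.

```python
def redundant_tiers(tier_frame_ms, noise=0.03):
    """Return tiers whose frame time is not measurably above the tier below them.

    tier_frame_ms is a list of (tier_name, frame_ms) ordered from lowest to
    highest quality, measured on the same worst-case scene.
    """
    redundant = []
    for (_, lower_ms), (name, ms) in zip(tier_frame_ms, tier_frame_ms[1:]):
        if lower_ms > 0 and (ms - lower_ms) / lower_ms <= noise:
            redundant.append(name)
    return redundant


# Placeholder measurements in milliseconds, for illustration only.
measured = [("Low", 12.0), ("Medium", 12.1), ("High", 14.5), ("Cinematic", 18.0)]
print(redundant_tiers(measured))
```

With these placeholder numbers Medium is flagged, because it costs less than 1% more than Low: either it needs cheaper settings of its own or Low is not actually cheaper.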

✓

Locking the win in CI

Lumen regressions are rarely a single dramatic event. They look like a slow drift — one PR adds 0.2 ms to the surface cache pass, another adds 0.3 ms to reflections, and three months later your console build doesn't hit 30 fps anymore. The only way to keep that drift from happening is to track the four Lumen lines on every pull request.

Set up a representative scenario for each of your Lumen-heavy levels, capture stat gpu after a few seconds of gameplay, and parse the lines for LumenSceneUpdate, DiffuseIndirectAndAO, Lumen Reflections, and ShortRangeAO. Gate every PR against a per-pass budget. When a teammate accidentally re-enables hardware ray tracing, raises probe density, or forgets a reflections-roughness cap, you'll know which commit caused it — instead of finding out two milestones into QA.
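A per-pass budget gate can be sketched in a few lines. Everything specific here is an assumption: the "Name: 1.23 ms" log format is hypothetical (adapt the pattern to whatever your capture step actually writes), and the budget values are placeholders, not recommendations.

```python
import re

# Hypothetical per-pass budgets in milliseconds.
BUDGET_MS = {
    "LumenSceneUpdate": 0.70,
    "DiffuseIndirectAndAO": 2.50,
    "Lumen Reflections": 1.50,
    "ShortRangeAO": 0.50,
}

# Matches lines such as "Lumen Reflections: 1.72 ms" (an assumed format).
LINE_PATTERN = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z ]*?)\s*[:=]\s*(?P<ms>\d+(?:\.\d+)?)\s*(?:ms)?\s*$"
)


def parse_timings(log_text):
    """Extract timings for the budgeted passes from a captured log."""
    timings = {}
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match and match.group("name") in BUDGET_MS:
            timings[match.group("name")] = float(match.group("ms"))
    return timings


def over_budget(timings):
    """Return {pass: ms_or_None} for passes that exceed budget or were not captured."""
    failures = {}
    for name, budget in BUDGET_MS.items():
        ms = timings.get(name)
        if ms is None or ms > budget:
            failures[name] = ms
    return failures


sample_log = """
Basepass: 3.02 ms
LumenSceneUpdate: 0.58 ms
DiffuseIndirectAndAO: 2.31 ms
Lumen Reflections: 1.72 ms
ShortRangeAO: 0.41 ms
"""

failures = over_budget(parse_timings(sample_log))
for name, ms in failures.items():
    print(f"FAIL {name}: {ms} ms (budget {BUDGET_MS[name]} ms)")
```

A missing line is treated as a failure on purpose, so a renamed stat or a broken capture cannot pass silently. In a real pipeline the wrapper script would exit non-zero whenever failures is non-empty.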

This is exactly what PerfGuard automates: capture the baseline, track per-stat regression on every PR, fail the build when any of the four pass-budgets blow. The CVars in this tutorial give you the dials; PerfGuard makes sure no one quietly turns them.

Virtual Shadow Maps — Lumen's neighbor in cost; the two systems share many of the same enemies (WPO, dynamic geometry).
Nanite Performance Deep Dive — Lumen's surface cache leans on Nanite for cluster-level culling; the two systems are entwined.
Gotcha #12: Lumen and Hardware Ray Tracing Defaults — the short version of this tutorial in field-guide form.