Mobile Performance Deep Dive
Mobile is the worst possible home for "feels fine on my dev kit" optimization. Twenty-two thousand Android device models ship every year, fifty percent of them are low-end, and the wrong scalability tier silently kills frame rate on a quarter of your install base. This tutorial covers the rendering paths that actually run on mobile, the GPU-architecture deltas that decide your perf shape, and the per-device-tier discipline that shipped UE5 mobile titles use to survive.
Mobile rendering paths in 5.5/5.6
UE5 mobile supports three meaningfully different render paths. The choice cascades into every other decision in this tutorial:
- Mobile Forward (default). Single-pass, lowest bandwidth on tile-based GPUs. The shipping default for any project that doesn't have a specific reason otherwise. Limited dynamic-light support, no Lumen, no VSM for non-Nanite.
- Mobile Deferred. Vulkan/Metal only; needs MobileHDR enabled to allocate the floating-point GBuffer in tile memory. Better support for many lights, more post-processing options, but doubles tile-memory pressure and is gated off OpenGL ES via r.Mobile.AllowDeferredShadingOpenGL=0.
- Desktop Forward / Deferred on high-end mobile. Vulkan-on-Snapdragon-flagship and iPhone Metal can run the desktop renderer at much higher quality. Used for photorealistic mobile content; expect ~3× the per-frame cost of the mobile path.
The bandwidth-vs-ALU trade is the most important shape difference: tile-based GPUs are bandwidth-bound, not compute-bound, so MobileHDR (which moves the tile buffer from RGBA8 to floating-point) is a much bigger cost than its instruction count suggests.
Feature levels & shading paths matrix
The CVar that gates everything is r.Mobile.ShadingPath: 0 is forward (default), 1 is deferred. The deferred path requires Vulkan or Metal and MobileHDR; if either requirement is missing, the engine silently falls back to forward.
[/Script/Engine.RendererSettings]
; Default mobile shading path
r.Mobile.ShadingPath=0
; Required for deferred / GTAO / PPR / post-process
r.MobileHDR=True
; Auto-instancing on mobile (must be set in DefaultEngine.ini, ECVF_ReadOnly)
r.Mobile.SupportGPUScene=1
; AA: 0=off 1=FXAA 2=TemporalAA 3=MSAA
r.Mobile.AntiAliasing=3
r.Mobile.SupportGPUScene is read-only. Setting it at runtime silently no-ops — it must live in DefaultEngine.ini. This is one of the most common misconfigurations on mobile-targeting projects (per the CVar wiki).
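Because a runtime set is a silent no-op, it can help to check the ini file itself in CI. The following is a minimal sketch, not an engine tool: `find_cvar` and `SAMPLE_INI` are hypothetical names introduced for illustration, and the parser handles only plain `key=value` lines with whole-line `;` comments.

```python
def find_cvar(ini_text, section, key):
    """Return the last value assigned to `key` inside `section`, or None."""
    current_section = None
    value = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and whole-line ini comments
        if line.startswith("[") and line.endswith("]"):
            current_section = line[1:-1]
            continue
        if current_section == section and "=" in line:
            name, _, val = line.partition("=")
            if name.strip().lower() == key.lower():
                value = val.strip()  # later assignments override earlier ones
    return value


# Hypothetical excerpt of a DefaultEngine.ini, used only as test input.
SAMPLE_INI = """\
[/Script/Engine.RendererSettings]
; Auto-instancing on mobile
r.Mobile.SupportGPUScene=1
r.MobileHDR=True
"""

gpu_scene = find_cvar(
    SAMPLE_INI, "/Script/Engine.RendererSettings", "r.Mobile.SupportGPUScene"
)
print("GPUScene enabled in ini:", gpu_scene == "1")
```

Pointing the same function at the project's real DefaultEngine.ini contents would flag a missing or zeroed entry before a build is packaged.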
API selection per platform: Vulkan on Android-flagship and Snapdragon-Adreno is the modern default. OpenGL ES is the legacy path for older devices. Metal on iOS/iPadOS is non-negotiable. Forced GLES on Adreno5xx is engine-default behavior: the Android_Adreno5xx_No_Vulkan profile force-disables Vulkan to avoid driver crashes.
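The selection and fallback rules in this section can be written down as a small decision function. This is a sketch that models the rules as described here, not engine code; `select_rhi`, `effective_shading_path`, and the string labels are assumptions made for illustration.

```python
def select_rhi(platform, gpu_family, supports_vulkan):
    """Pick the graphics API following the per-platform rules in the text."""
    if platform == "ios":
        return "metal"  # Metal is the only option on iOS/iPadOS
    if gpu_family == "adreno5xx" or not supports_vulkan:
        return "gles"  # Adreno5xx is forced to GLES; older devices lack Vulkan
    return "vulkan"


def effective_shading_path(requested, rhi, mobile_hdr):
    """Return 1 (deferred) only if every requirement holds, else 0 (forward)."""
    if requested == 1 and rhi in ("vulkan", "metal") and mobile_hdr:
        return 1
    return 0  # silent fallback to forward


rhi = select_rhi("android", "adreno5xx", supports_vulkan=True)
print(rhi, effective_shading_path(1, rhi, mobile_hdr=True))
```

The useful property is that the fallback is explicit: requesting deferred on an Adreno5xx device yields forward, which is the case that otherwise goes unnoticed.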
Tile-based GPUs: Mali / Adreno / Apple
Every modern mobile GPU is tile-based. The frame is split into screen-aligned tiles (typically 16×16 or 32×32 pixels), each rasterized into on-chip tile memory before being written out to main RAM. The implication: bandwidth dominates. Reducing render-target writes, using FrameBufferFetch and, on Mali, Pixel Local Storage for in-tile data flow, and avoiding mid-frame full resolves saves more than any ALU optimization.
Vendor specifics:
- Mali (Arm). 4× MSAA is "close to zero performance penalty" on Mali because the tile buffer natively supports 4 samples per pixel (per Arm's UE mobile blog). Memory residency tends to run higher on Mali than on Adreno for the same scene (forum-confirmed).
- Adreno (Qualcomm). 4× MSAA is not free here — Adreno tile memory is sized differently. Snapdragon Profiler is the diagnostic tool. Lightspeed/Tencent's Adreno Tile Memory Heap integration in Neverness to Everness (per their GDC 2025 session) is the canonical Adreno deep-dive.
- Apple. A-series GPUs are bandwidth-strong but power-bound. Xcode's Metal GPU Capture is fast and surfaces tile-memory utilization. iOS thermal throttle is aggressive; design for sustained, not peak, power.
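To make the tile-memory pressure concrete, the per-tile color footprint can be estimated from tile size, bytes per pixel, and sample count. This is a back-of-the-envelope sketch: the 8-byte RGBA16F format is an assumption standing in for "floating-point" (the engine may pick a different HDR format), and depth/stencil and extra GBuffer attachments are ignored.

```python
def tile_bytes(tile_px, bytes_per_pixel, samples=1):
    """Color storage needed for one square tile, in bytes."""
    return tile_px * tile_px * bytes_per_pixel * samples


RGBA8_BPP = 4    # 8 bits x 4 channels
RGBA16F_BPP = 8  # assumption: 16-bit float x 4 channels for the HDR case

ldr = tile_bytes(32, RGBA8_BPP)
hdr = tile_bytes(32, RGBA16F_BPP)
hdr_msaa4 = tile_bytes(32, RGBA16F_BPP, samples=4)

print("32x32 RGBA8:          ", ldr, "bytes")
print("32x32 RGBA16F:        ", hdr, "bytes")
print("32x32 RGBA16F 4x MSAA:", hdr_msaa4, "bytes")
```

Under these assumptions, switching to a floating-point target doubles the color footprint and 4× MSAA multiplies it again, which is why the same settings behave differently across vendors with different tile-memory sizes.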
Draw call & instancing budgets per tier
Epic's official guideline: under 700 draw calls per view on mobile (per Performance Guidelines for Mobile Devices). Practical tier targets that shipped projects use:
| Tier | Draw calls / view | Tris on screen | Notes |
|---|---|---|---|
| Low-end | ~250 | 250k | Snapdragon 6-series, Mali-G52, A11 Bionic and similar |
| Mid | ~450 | 500k | Snapdragon 7-series, Mali-G610, A13–A14 |
| Flagship | ~700 | 1M+ | Snapdragon 8 Gen 2/3, Mali-G715/G720, A16+ |
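The tier targets in the table can be encoded as data and checked against captured numbers. This is a minimal sketch: `TierBudget`, `TIER_BUDGETS`, and `over_budget` are hypothetical names, and the flagship "1M+" figure is treated as a 1,000,000-triangle reference point rather than a hard ceiling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierBudget:
    draw_calls: int
    triangles: int


# Values copied from the table above; flagship triangles use 1M as a reference.
TIER_BUDGETS = {
    "low": TierBudget(draw_calls=250, triangles=250_000),
    "mid": TierBudget(draw_calls=450, triangles=500_000),
    "flagship": TierBudget(draw_calls=700, triangles=1_000_000),
}


def over_budget(tier, draw_calls, triangles):
    """Return a list of human-readable budget violations (empty if none)."""
    budget = TIER_BUDGETS[tier]
    problems = []
    if draw_calls > budget.draw_calls:
        problems.append(f"draw calls {draw_calls} > {budget.draw_calls}")
    if triangles > budget.triangles:
        problems.append(f"triangles {triangles} > {budget.triangles}")
    return problems


print(over_budget("low", draw_calls=300, triangles=200_000))
```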
Auto-instancing on mobile requires r.Mobile.SupportGPUScene=1. Without it, every draw is its own submission; HISMs and merged static actors are the only path to staying under budget. Flat scenes with many unique materials are worst-case.
The single emitter beats ten emitters rule. One Niagara emitter at 1,000 particles is consistently faster than ten emitters at 100, because per-emitter tick and dispatch overhead dominate at low particle counts (per Epic's Niagara optimization tutorial).
Texture memory budgets
Mobile uses ASTC (modern Android and iOS) and ETC2 (legacy Android fallback). The streaming pool (r.Streaming.PoolSize, in MB) should be sized to 30–40% of the device's graphics memory budget; mobile GPUs share system RAM, so in practice that means the memory available to the app. For a phone with 6 GB total RAM and ~2 GB available to the app, that's roughly a 600–800 MB texture pool.
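The sizing rule is simple arithmetic, sketched below. The function name is hypothetical, and the ~2 GB app budget is taken as 2000 MB to match the rounded figures in the text.

```python
def streaming_pool_range_mb(app_budget_mb, low=0.30, high=0.40):
    """Return the (min, max) texture pool size in MB for a given app budget."""
    return round(app_budget_mb * low), round(app_budget_mb * high)


low_mb, high_mb = streaming_pool_range_mb(2000)
print(f"r.Streaming.PoolSize between {low_mb} and {high_mb} MB")
```

Running the same function per device tier gives a starting value for each profile, to be confirmed against memreport output on real hardware.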
The memreport -full workflow on a packaged build is non-negotiable for finding bloat. Mali devices in particular run memory residency higher than Qualcomm for identical scenes (documented) — budget for the worst case.
Common offenders to audit per platform:
- UI atlases too large (4K UI textures on a 1080p phone screen).
- Lightmaps not in mobile-specific groups.
- Per-character normal maps that should be shared.
- Unused Anim sequences cooking into the build (audit with obj list class=AnimSequence).
Mobile lighting reality — what works
The standard mobile render paths cannot run Lumen, cannot run Nanite, and cannot run Virtual Shadow Maps for non-Nanite content. The realistic toolset:
- One stationary directional light + Cascaded Shadow Maps for the sun. CSM Dynamic Shadow Distance ~4500 cm is a typical Fortnite-mobile reference.
- Per-component "Receive CSM Shadows" gating. Most static set-dressing should not receive CSM — sample with sky/ambient instead.
- Stationary point/spot lights with baked direct contribution + dynamic for character only.
- Sky DFAO and baked GI for indirect lighting where appropriate.
- Distance-field shadows are desktop-on-mobile only. Don't assume them on the standard mobile path.
For projects upgrading from desktop, this is usually the largest content rework. Skylight + DFAO + SSGI is a workable indirect substitute for Lumen at mobile target quality; baked GI + lightmaps is sturdier still.
Mobile-specific scalability
Lyra-style Mobile Device Profile inheritance is the canonical pattern for the Android long tail. Each device tier inherits from a base mobile profile and overrides specific sg.* values:
; Base mobile profile
[Mobile DeviceProfile]
DeviceType=Android
BaseProfileName=
+CVars=sg.EffectsQuality=2
+CVars=sg.ShadowQuality=2
+CVars=sg.ResolutionQuality=85

; Low-end Adreno (Snapdragon 6-series)
[Android_Adreno6xx DeviceProfile]
BaseProfileName=Mobile
+CVars=sg.EffectsQuality=0
+CVars=sg.ShadowQuality=1
+CVars=sg.ResolutionQuality=70

; Flagship Snapdragon 8 Gen 2
[Android_Adreno7xx DeviceProfile]
BaseProfileName=Mobile
+CVars=sg.EffectsQuality=3
+CVars=sg.ShadowQuality=3
+CVars=sg.ResolutionQuality=100
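The inheritance behavior can be illustrated with a small resolver that walks the BaseProfileName chain and lets child values override the base. This is a sketch of the idea, not the engine's implementation; the dictionary layout, `resolve_cvars`, and the `Android_Mid_Example` profile are hypothetical and exist only to show a partial override.

```python
PROFILES = {
    "Mobile": {
        "base": None,
        "cvars": {
            "sg.EffectsQuality": "2",
            "sg.ShadowQuality": "2",
            "sg.ResolutionQuality": "85",
        },
    },
    "Android_Adreno6xx": {
        "base": "Mobile",
        "cvars": {
            "sg.EffectsQuality": "0",
            "sg.ShadowQuality": "1",
            "sg.ResolutionQuality": "70",
        },
    },
    # Hypothetical profile that overrides a single value.
    "Android_Mid_Example": {
        "base": "Mobile",
        "cvars": {"sg.ResolutionQuality": "75"},
    },
}


def resolve_cvars(name, profiles):
    """Merge CVars from the root base profile down to `name`."""
    chain = []
    seen = set()
    while name:
        if name in seen:
            raise ValueError(f"cyclic BaseProfileName chain at {name}")
        seen.add(name)
        chain.append(name)
        name = profiles[name]["base"]
    resolved = {}
    for profile_name in reversed(chain):  # base first, most specific last
        resolved.update(profiles[profile_name]["cvars"])
    return resolved


print(resolve_cvars("Android_Mid_Example", PROFILES))
```

Keeping each tier's overrides small, as in the hypothetical mid profile, makes it easier to see what actually differs between tiers.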
r.MobileContentScaleFactor is the global resolution multiplier. Note: it's silently ignored on iOS in some UE5 configurations (per a tracked forum bug) — verify on packaged iOS builds.
Dynamic Resolution complements per-device profiles; on flagship devices set min 70%, max 100% of native; on low-end set min 60%, max 85% to avoid GPU spikes. TSR's compute cost can outweigh the upscale benefit on weaker mobile GPUs — FXAA + lower screen percentage often beats TSR on the lowest tier.
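The per-tier bounds amount to a clamp on whatever screen percentage the dynamic-resolution heuristic asks for. A minimal sketch follows; `DYNRES_RANGE` and `clamp_screen_percentage` are hypothetical names, and only the two tiers given in the text are included.

```python
# (min %, max %) of native resolution, taken from the paragraph above.
DYNRES_RANGE = {
    "flagship": (70, 100),
    "low": (60, 85),
}


def clamp_screen_percentage(tier, requested):
    """Clamp a requested screen percentage to the tier's allowed range."""
    lowest, highest = DYNRES_RANGE[tier]
    return max(lowest, min(highest, requested))


print(clamp_screen_percentage("low", 95))
print(clamp_screen_percentage("flagship", 50))
```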
Mobile post-process and effects
Each post-process feature is gated by MobileHDR + a quality CVar:
- GTAO via r.Mobile.AmbientOcclusionQuality (default 0 = off; 1+ enables; requires MobileHDR + Mobile Ambient Occlusion in Project Settings).
- Pixel Projected Reflections (PPR) via r.Mobile.PixelProjectedReflectionQuality (requires MobileHDR + Planar Reflection Mode = MobilePPR/MobilePPRExclusive). UE 5.7.2 dropped this Project Settings UI entry; verify per engine version (tracked).
- Bloom + Tonemap: cost in tile memory; both add fp tile-buffer load.
- Shader complexity targets: 250–300 PS instructions on flagships, 80–120 on low-end (per community measurements).
Niagara CPU sims work universally on mobile. GPU compute sims are inconsistent on Android — per Epic forums, GPUCompute sims don't display on the Mobile Previewer and behave unevenly across Android drivers. Default to CPU sims with conservative spawn rates on mobile.
Known issue: a crash in libGLES_mali.so during descriptor-set updates has been reported on Vulkan-Mali devices in UE 5.6 (per Epic forums). Until Epic ships a fix, consider 5.5 stable for Mali-targeting projects or fall back to OpenGL ES on affected SKUs.
Profiling toolchain (Snapdragon, Mali, Xcode)
Insights on a desktop dev kit doesn't tell you what runs on a real phone. The vendor profilers do:
- Snapdragon Profiler (Adreno). Render-stage trace, perf counters, bandwidth, GPU timing per pass. Required for any Adreno-targeting project.
- Arm Performance Studio / Mali Graphics Debugger. Tile timeline, FBF (FrameBufferFetch) analysis. Free download from developer.arm.com.
- Xcode Metal GPU Capture & Frame Capture (iOS). Built into Xcode; tile-memory utilization is one click away.
- RenderDoc Meta Fork. For Quest and Quest-class XR mobile.
- Unreal Insights with -tracehost. Connect from desktop to a phone running a Development build.
- Material Editor's Mali Offline Compiler / Adreno Offline Compiler. Per-material shader cycle count and register pressure, surfaced inside the editor (per Meta's docs).
Shipping checklist for mobile
Before submitting a mobile build, run through this list against a packaged Test/Shipping build on real hardware in each tier:
Pre-ship audit
- Render path verified per platform. Forward + Vulkan/Metal on flagship; OpenGL ES forward on low-end Adreno5xx.
- r.Mobile.SupportGPUScene=1 set in DefaultEngine.ini (read-only; runtime sets no-op).
- Draw calls under 700 per view on flagship target, scaled down per tier.
- MobileHDR on/off decision is intentional — not just inherited from a desktop project's defaults.
- Lyra-style Device Profile inheritance covering at minimum: low-Adreno, mid-Adreno, flagship-Adreno, low-Mali, mid-Mali, flagship-Mali, A11–A13 Apple, A14+ Apple.
- Streaming pool sized 30–40% of device VRAM budget per tier.
- Niagara emitters consolidated; CPU sims preferred on Android due to GPUCompute fragility.
- Mobile shader complexity under 300 PS instructions on flagship, 120 on low-end.
- UE 5.6 Mali Vulkan crash mitigation in place (engine pin or GLES fallback) until Epic ships the fix.
- Snapdragon Profiler + Arm Performance Studio captures saved as baseline for the next regression compare.
For continuous regression-tracking across the long tail, PerfGuard baselines per device profile bucket (low-Adreno, mid-Mali, flagship-Vulkan, Apple A-series), so a single CI run flags draw-call ceilings, texture-pool overflows, MobileHDR toggles, and 5.6-style RHI crash signatures before they ship to a quarter of your install base.
- Upscaler Tuning — TSR vs FXAA decision on mobile.
- Niagara Performance — the GPUCompute-on-mobile caveat in detail.
- VR/XR Performance — the Quest/mobile-XR overlap.