LTX 2.3 Quants Compared — GGUF, FP8, FP4, NVFP4 (2026)

LTX 2.3 has 22 billion parameters, so the full BF16 weights (16 bits each) come to 44 GB, which is too large for most consumer GPUs. The community has produced quantized variants that shrink the model to 11 GB (NVFP4) or 13 GB (GGUF Q4_K_M) without catastrophic quality loss. This guide compares the common quants (FP16, BF16, FP8, FP4, NVFP4, GGUF Q4_K_M, GGUF Q8) and tells you which to pick for your hardware.

If you'd rather not pick at all, our VirtuaVixen Studio runs the FP8 model on rented 48 GB GPUs in your browser: near-FP16 quality, no decisions. The ComfyUI Workflow Pack ships the FP8 quant by default, with optional GGUF variants for 16 GB cards. Join our Discord if you want help comparing quants.

Quants Compared

Quant	Size	VRAM	Speed (rel.)	Quality	Best GPU
FP16 / BF16	44 GB	~50 GB	Baseline	Best	A100 80 GB / H100
FP8 (e4m3fn)	22 GB	~32 GB	1.0×	Very close to FP16	5090, A6000, RTX PRO 6000
FP8 + offload	22 GB	~24 GB	0.85×	Same as FP8	4090, 3090
NVFP4	11 GB	~22 GB	1.5×	~95% FP8	RTX 5090 (needs Blackwell)
GGUF Q8	23 GB	~32 GB	0.7×	Very close to FP8	Any 24+ GB
GGUF Q4_K_M	13 GB	~22 GB	0.5×	~88% FP8	4080, 5080, 16 GB cards
GGUF Q4_0	12 GB	~20 GB	0.5×	~85% FP8	16 GB cards, tighter fit
FP4 / Q3	~9 GB	~18 GB	0.4×	Visibly degraded	12 GB cards (last resort)
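The file sizes in the table follow almost directly from parameter count × bits per weight. Below is a minimal Python sketch, assuming the 22 billion parameter count above and the block layouts of GGUF's Q8_0 and Q4_0 types (one 16-bit scale per 32 weights, which adds 0.5 bit per weight). K-quants such as Q4_K_M and NVFP4 use other layouts and are left out. The helper name is hypothetical. Actual VRAM use is higher than file size because activations, the VAE and the text encoder also need memory.

```python
# Estimate on-disk weight size from parameter count and bits per weight.
# Assumption: 22e9 parameters, as stated in this guide; real checkpoints also
# carry a little metadata, so treat the results as approximations.

PARAMS = 22e9

# Effective bits per weight, including per-block scale overhead for GGUF types.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "FP8 (e4m3fn)": 8.0,
    "GGUF Q8_0": 8.5,  # 32 x 8-bit weights + one 16-bit scale per block
    "GGUF Q4_0": 4.5,  # 32 x 4-bit weights + one 16-bit scale per block
}


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:>13}: {weight_size_gb(PARAMS, bpw):5.1f} GB")
```

These come out to 44.0, 22.0, 23.4 and 12.4 GB, in line with the BF16, FP8, Q8 and Q4_0 rows of the table.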

FP8 — The Practical Default

If you have 24 GB of VRAM or more, use FP8. It's what we run in production at the Studio. The e4m3fn variant (4 exponent bits and 3 mantissa bits; "fn" means finite, with no encoding for infinity) is the most common and the most widely supported. In our blind comparisons, quality is essentially indistinguishable from FP16: small differences show up only if you A/B the same seed at extreme resolution.

Filename: ltx-2.3-22b-distilled-fp8_e4m3fn.safetensors. Download from Lightricks/LTX-Video.
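To see what "e4m3fn" means in practice, here is a pure-Python sketch that rounds a value to the nearest e4m3fn number (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, largest finite value 448) and applies one scale per tensor. Per-tensor scaling is a common way to fit weights into FP8's narrow range, but whether a given checkpoint, including the Lightricks file, stores such a scale is an assumption here. All function names are hypothetical.

```python
import math

E4M3_MAX = 448.0     # largest finite e4m3fn value (1.75 * 2**8)
MIN_NORMAL_EXP = -6  # smallest normal exponent (bias 7); subnormals reuse its spacing
MANTISSA_BITS = 3


def round_to_e4m3fn(x: float) -> float:
    """Round x to the nearest e4m3fn value (ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    mag = min(abs(x), E4M3_MAX)
    _, e = math.frexp(mag)               # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)     # exponent of the leading bit, clamped for subnormals
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between neighbouring e4m3fn values
    q = min(round(mag / step) * step, E4M3_MAX)
    return math.copysign(q, x)


def quantize_per_tensor(weights: list[float]) -> tuple[list[float], float]:
    """Assumed scheme: scale so the largest |w| maps to 448, then round to e4m3fn."""
    amax = max((abs(w) for w in weights), default=0.0)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3fn(w / scale) for w in weights], scale


def dequantize(codes: list[float], scale: float) -> list[float]:
    """Undo the per-tensor scale to get approximate original weights."""
    return [c * scale for c in codes]
```

With only 3 mantissa bits, any normal value lands within 1/16 (6.25%) of its magnitude, which is why the scale matters: it keeps weights out of the coarse subnormal range and away from the 448 ceiling.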

NVFP4 — The Best Choice for RTX 5090

NVFP4 is a new 4-bit format (FP4 E2M1 values, with an FP8 scale shared by each block of 16 weights) with native hardware support on NVIDIA's Blackwell architecture (RTX 5090, B200). On older cards (4090, 3090) NVFP4 falls back to software emulation, which is slower than FP8. On a 5090 it's faster than FP8 at a small quality cost (about 95% of FP8 quality, per the table above).

If you have a 5090, this is the quant to use. On any other card, stick with FP8 or GGUF.
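NVFP4 stores each weight as a 4-bit E2M1 float (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6, plus a sign bit) and shares one scale across each block of 16 weights. The Python sketch below shows that block scaling in simplified form. Real NVFP4 stores the block scale as FP8 E4M3 together with a per-tensor FP32 scale, and hardware rounds ties to even; this sketch keeps scales as Python floats and breaks ties toward the smaller magnitude. Function names are hypothetical.

```python
# Magnitudes representable by FP4 E2M1 (a separate sign bit covers negatives).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive weights


def nearest_e2m1(x: float) -> float:
    """Snap x to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_MAGNITUDES, key=lambda v: abs(abs(x) - v))
    return mag if x >= 0 else -mag


def quantize_blocks(weights: list[float]) -> list[tuple[float, list[float]]]:
    """Return (scale, codes) per block; each block's largest |w| maps to 6.0."""
    blocks = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        scale = amax / 6.0 if amax > 0 else 1.0  # simplification: float, not FP8
        blocks.append((scale, [nearest_e2m1(w / scale) for w in block]))
    return blocks


def dequantize_blocks(blocks: list[tuple[float, list[float]]]) -> list[float]:
    """Multiply each code by its block's scale and flatten back to one list."""
    return [code * scale for scale, codes in blocks for code in codes]
```

Because every block carries its own scale, a large outlier only costs precision within its own 16 weights rather than across the whole tensor.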

GGUF Q4_K_M — The 16 GB Solution

GGUF is a model file format for quantized weights from the llama.cpp (GGML) project, adapted for diffusion models via City96's ComfyUI-GGUF port. Q4_K_M is the sweet spot: 4-bit weights, with mixed precision that keeps the most sensitive layers at higher bit widths. Quality drops about 10–12% relative to FP8, but in exchange you can run LTX 2.3 on a 16 GB card.

Load it with the UnetLoaderGGUF node in ComfyUI; otherwise the setup is identical to FP8.
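K-quants like Q4_K_M use a fairly involved layout (256-weight super-blocks split into 32-weight sub-blocks, each with its own 6-bit scale and minimum). As a simpler illustration, here is the Q4_0 scheme from the table's Q4_0 row, sketched in Python after GGML's reference quantizer: each block of 32 weights gets one scale d, and each weight becomes a 4-bit integer from 0 to 15. GGML stores d as FP16 and packs two codes per byte; this sketch keeps plain Python floats and lists, and the function names are hypothetical.

```python
BLOCK = 32  # Q4_0 block size


def quantize_q4_0(block: list[float]) -> tuple[float, list[int]]:
    """Quantize 32 floats to one scale d and 32 codes in 0..15."""
    assert len(block) == BLOCK
    max_val = max(block, key=abs)   # signed value with the largest magnitude
    d = max_val / -8.0              # max_val maps exactly to code 0
    inv_d = 1.0 / d if d != 0 else 0.0
    # Shift from [-8, 8] to [0, 16], round by truncating x + 0.5, clamp to 4 bits.
    return d, [min(15, int(x * inv_d + 8.5)) for x in block]


def dequantize_q4_0(d: float, codes: list[int]) -> list[float]:
    """Recover approximate weights: (code - 8) * d."""
    return [(q - 8) * d for q in codes]
```

Each block costs 32 × 4 bits plus a 16-bit scale, i.e. 144 bits, or 4.5 bits per weight.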

GGUF Q8 — Best Quality at the Cost of Speed

Q8 GGUF has essentially the same footprint as FP8 (~32 GB total system) but runs ~30% slower because GGUF weights go through a different code path: they stay packed in memory and are dequantized on the fly during sampling. It's useful if you specifically want the GGUF format for compatibility with non-NVIDIA backends. On NVIDIA GPUs, use FP8 instead.
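Q8_0 is GGML's 8-bit type: blocks of 32 weights, one scale d = max|w| / 127, and a signed 8-bit integer per weight. GGUF weights stay in this packed form and are turned back into floats during sampling, which is one plausible source of the speed gap described above. A Python sketch with hypothetical names follows; note that GGML rounds half away from zero and stores d as FP16, while Python's round() rounds half to even.

```python
BLOCK = 32  # Q8_0 block size


def quantize_q8_0(block: list[float]) -> tuple[float, list[int]]:
    """Quantize 32 floats to one scale d and 32 signed 8-bit codes."""
    assert len(block) == BLOCK
    amax = max(abs(x) for x in block)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0
    return d, [round(x * inv_d) for x in block]  # codes fall in -127..127


def dequantize_q8_0(d: float, codes: list[int]) -> list[float]:
    """The per-pass dequantize step: code * d for every weight."""
    return [q * d for q in codes]
```

Each block is 32 × 8 bits plus a 16-bit scale, or 8.5 bits per weight, which is why the Q8 file ends up slightly larger than FP8's flat 8 bits.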

FP4 / Q3 — Last Resort

Plain FP4 and 3-bit (Q3) or lower quants exist, but the quality drop is visible: outputs become blurry, motion stutters, and faces deform. Use these only on 12 GB cards, as an absolute last resort. Renting a cloud GPU for an hour is better than running Q3 locally.
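The per-quant recommendations above reduce to a short rule of thumb. The hypothetical helper below encodes one reading of the table's Best GPU column; the thresholds are this guide's approximate memory figures, not measured limits, so treat the result as a starting point.

```python
def pick_quant(vram_gb: float, blackwell: bool = False) -> str:
    """Suggest an LTX 2.3 quant from VRAM and GPU generation, per this guide's table."""
    # NVFP4 needs native FP4 (Blackwell). The guide recommends it for the RTX 5090;
    # larger Blackwell workstation cards are listed under FP8, hence the upper bound.
    if blackwell and 22 <= vram_gb < 48:
        return "NVFP4"
    if vram_gb >= 32:
        return "FP8"
    if vram_gb >= 24:
        return "FP8 + offload"
    if vram_gb >= 16:
        return "GGUF Q4_K_M"
    if vram_gb >= 12:
        return "FP4 / Q3 (last resort)"
    return "Rent a cloud GPU"
```

For example, an RTX 4090 (24 GB) gets "FP8 + offload" and an RTX 5080 (16 GB, Blackwell) gets "GGUF Q4_K_M", matching the table.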

Distilled vs Dev

Quant choice is independent of model variant. LTX 2.3 ships in two flavours:

Distilled (default for most users) — 9 sampling steps, CFG=1.0. Faster.
Dev — 25–35 steps, CFG=4–7. Slower but more controllable for advanced workflows.

Both are available in FP8, GGUF Q4, etc. We use distilled in our production workflows. Full breakdown: LTX 2.3 Dev vs Distilled.
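Part of the distilled model's speed comes from CFG=1.0. Classifier-free guidance combines a conditional and an unconditional prediction as uncond + cfg × (cond - uncond); at cfg = 1.0 that reduces to the conditional prediction alone, so samplers such as ComfyUI's can skip the unconditional pass. The rough Python sketch below counts model evaluations per generation using the step counts above, ignoring any per-step cost differences; the function names are hypothetical.

```python
def cfg_combine(cond: list[float], uncond: list[float], cfg: float) -> list[float]:
    """Classifier-free guidance: uncond + cfg * (cond - uncond), element-wise."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


def model_evals(steps: int, cfg: float) -> int:
    """Denoiser calls for one generation; cfg == 1.0 needs only the conditional pass."""
    passes = 1 if cfg == 1.0 else 2
    return steps * passes


if __name__ == "__main__":
    print("distilled:", model_evals(9, 1.0))                        # 9 steps, CFG 1.0
    print("dev:", model_evals(25, 4.0), "to", model_evals(35, 7.0))  # 25-35 steps, CFG 4-7
```

That is 9 denoiser calls for distilled versus roughly 50 to 70 for dev.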

Where to Download

Lightricks official — FP16, BF16, FP8 variants, both distilled and dev. huggingface.co/Lightricks/LTX-Video
QuantStack — community GGUF mirror. Q4_K_M, Q8, etc.
Kijai — popular community releases including NVFP4 builds.

For setup instructions, see How to Install LTX 2.3 Locally.

Skip the Decision

If you don't want to download a 22 GB file just to test the model, our Studio runs FP8 LTX 2.3 in your browser. You get 160 free tokens a day, so you can try it before committing to a download. For local power users, the Workflow Pack ships FP8 by default and includes optional GGUF Q4 variants for 16 GB cards.
