LTX 2.3 has 22 billion parameters, so its full BF16 weights come to 44 GB, too large for most consumer GPUs. The community has produced quantized variants that compress the model to 11 GB (NVFP4) or 13 GB (GGUF Q4_K_M) without catastrophic quality loss. This guide compares every quant (FP16, BF16, FP8, FP4, NVFP4, GGUF Q4_K_M, GGUF Q8) and tells you which to pick for your hardware.
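The sizes above follow from simple arithmetic: parameter count times bits per weight. A minimal sketch, assuming decimal gigabytes (1 GB = 1e9 bytes) and ignoring file metadata and any layers kept at higher precision; the function name is ours, for illustration only:

```python
PARAMS = 22e9  # parameter count stated in this guide


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal gigabytes (assumes 1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bits in [("BF16 / FP16", 16), ("FP8", 8), ("pure 4-bit", 4)]:
        print(f"{name}: {weight_size_gb(PARAMS, bits):.0f} GB")
```

Real 4-bit files can be somewhat larger than the pure 4-bit figure because they also store scaling data and may keep some layers at higher precision.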
If you'd rather not pick at all, our VirtuaVixen Studio runs the FP8 model on rented 48 GB GPUs in your browser: best-quality output, no decisions. The ComfyUI Workflow Pack ships the FP8 quant by default with optional GGUF variants for 16 GB cards. Join our Discord if you want help comparing quants.
Quants Compared
| Quant | Size | VRAM | Speed (rel.) | Quality | Best GPU |
|---|---|---|---|---|---|
| FP16 / BF16 | 44 GB | ~50 GB | Baseline | Best | A100 80 GB / H100 |
| FP8 (e4m3fn) | 22 GB | ~32 GB | 1.0× | Very close to FP16 | 5090, A6000, RTX PRO 6000 |
| FP8 + offload | 22 GB | ~24 GB | 0.85× | Same as FP8 | 4090, 3090 |
| NVFP4 | 11 GB | ~22 GB | 1.5× | ~95% FP8 | 5090 only (Blackwell) |
| GGUF Q8 | 23 GB | ~32 GB | 0.7× | Very close to FP8 | Any 24+ GB |
| GGUF Q4_K_M | 13 GB | ~22 GB | 0.5× | ~88% FP8 | 4080, 5080, 16 GB cards |
| GGUF Q4_0 | 12 GB | ~20 GB | 0.5× | ~85% FP8 | 16 GB cards, tighter fit |
| FP4 / Q3 | ~9 GB | ~18 GB | 0.4× | Visibly degraded | 12 GB cards (last resort) |
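One way to read the table is as a decision rule keyed on the "Best GPU" column. The sketch below is our own simplification: the VRAM thresholds are assumptions inferred from the cards listed in each row (for example, 32 GB for the RTX 5090), not official requirements, and `pick_quant` is a hypothetical helper.

```python
from typing import Optional


def pick_quant(vram_gb: float, blackwell: bool = False) -> Optional[str]:
    """Return the quant this guide's table suggests for a GPU, or None.

    Thresholds are assumptions derived from the "Best GPU" column.
    """
    if vram_gb >= 80:
        return "FP16 / BF16"      # A100 80 GB / H100 class
    if blackwell and vram_gb >= 32:
        return "NVFP4"            # RTX 5090: native 4-bit support
    if vram_gb >= 32:
        return "FP8"              # A6000, RTX PRO 6000
    if vram_gb >= 24:
        return "FP8 + offload"    # 4090, 3090
    if vram_gb >= 16:
        return "GGUF Q4_K_M"      # 4080, 5080, other 16 GB cards
    if vram_gb >= 12:
        return "FP4 / Q3"         # last resort, visibly degraded
    return None                   # below 12 GB: consider a cloud GPU


if __name__ == "__main__":
    print(pick_quant(32, blackwell=True))
    print(pick_quant(24))
    print(pick_quant(16))
```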
FP8 — The Practical Default
If you have 24+ GB of VRAM, use FP8 (with offloading on 24 GB cards such as the 4090 and 3090, per the table above). It's what we run in production at the Studio. The e4m3fn variant is the most common and works everywhere. In our blind comparisons, quality was essentially indistinguishable from FP16; small differences are visible only if you A/B the same seed at extreme resolution.
Filename: ltx-2.3-22b-distilled-fp8_e4m3fn.safetensors. Download from Lightricks/LTX-Video.
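If you prefer scripting the download, the `huggingface_hub` package offers `hf_hub_download`. This is a sketch under assumptions: it presumes the file sits at the root of the repository named above, and the target folder is a typical ComfyUI location that you may need to change for your install.

```python
REPO_ID = "Lightricks/LTX-Video"
FILENAME = "ltx-2.3-22b-distilled-fp8_e4m3fn.safetensors"


def download_fp8(local_dir: str = "ComfyUI/models/diffusion_models") -> str:
    """Download the FP8 checkpoint and return its local path.

    local_dir is an assumed ComfyUI path; adjust it to your setup.
    Requires `pip install huggingface_hub`.
    """
    # Imported lazily so the module loads even without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)


if __name__ == "__main__":
    # Warning: this fetches a file of roughly 22 GB.
    print(download_fp8())
```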
NVFP4 — The Best Choice for RTX 5090
NVFP4 is a new 4-bit format with native hardware support on NVIDIA's Blackwell architecture (RTX 5090, B200). On older cards (4090, 3090), NVFP4 falls back to software emulation, which is slower than FP8. On a 5090, it is faster than FP8 with negligible quality loss.
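To check which side of that line a GPU falls on, you can look at its CUDA compute capability. The sketch below assumes that Blackwell parts report a major version of 10 or higher (10.0 for B200, 12.0 for the RTX 5090), while the 4090 reports 8.9 and the 3090 reports 8.6. The helper name is hypothetical, and the check says nothing about whether your software stack actually ships NVFP4 kernels.

```python
from typing import Tuple


def has_native_nvfp4(capability: Tuple[int, int]) -> bool:
    """Guess native NVFP4 support from a (major, minor) compute capability.

    Assumption: Blackwell and later GPUs report major >= 10.
    """
    major, _minor = capability
    return major >= 10


if __name__ == "__main__":
    for name, cap in [("RTX 3090", (8, 6)), ("RTX 4090", (8, 9)), ("RTX 5090", (12, 0))]:
        print(f"{name}: native NVFP4 = {has_native_nvfp4(cap)}")
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current device, which can be passed straight in.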
If you have a 5090, this is the quant to use. On any other card, stick with FP8 or GGUF.
GGUF Q4_K_M — The 16 GB Solution
GGUF is a quantization format originally built for llama.cpp but adapted for diffusion models via City96's port. Q4_K_M is the sweet spot — 4-bit weights with mixed precision for the most sensitive layers. Quality drops about 10–12% relative to FP8, but you can run LTX 2.3 on a 16 GB card.
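The mixed-precision point shows up if you work backwards from the file sizes in the table to an effective bits-per-weight figure. A rough sketch, assuming decimal gigabytes and that the whole file is weights; the overhead above the nominal bit width presumably comes from per-block scales and layers kept at higher precision:

```python
PARAMS = 22e9  # parameter count stated in this guide


def bits_per_weight(file_gb: float, params: float = PARAMS) -> float:
    """Effective bits per parameter for a given file size (1 GB = 1e9 bytes)."""
    return file_gb * 1e9 * 8 / params


if __name__ == "__main__":
    # File sizes taken from the comparison table above.
    for name, size_gb in [("GGUF Q8", 23), ("GGUF Q4_K_M", 13), ("GGUF Q4_0", 12), ("NVFP4", 11)]:
        print(f"{name}: {bits_per_weight(size_gb):.2f} bits/weight")
```

By this estimate Q4_K_M averages closer to 4.7 bits per weight than 4, which is consistent with some layers being stored above 4 bits.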
Use the UnetLoaderGGUF node in ComfyUI to load it. Setup is identical to FP8 otherwise.
GGUF Q8 — Best Quality at the Cost of Speed
Q8 GGUF has essentially the same VRAM footprint as FP8 (~32 GB total system) but runs ~30% slower because GGUF weights are loaded through a different code path. It is useful if you specifically want the GGUF format for compatibility with non-NVIDIA backends. On NVIDIA GPUs, use FP8 instead.
FP4 / Q3 — Last Resort
Plain FP4 and 3-bit-and-below GGUF quants exist, but the quality drop is visible: outputs become blurry, motion stutters, faces deform. Use these only on 12 GB cards, as an absolute last resort. You are better off renting a cloud GPU for an hour than running Q3 locally.
Distilled vs Dev
Quant choice is independent of model variant. LTX 2.3 ships in two flavours:
- Distilled (default for most users) — 9 sampling steps, CFG=1.0. Faster.
- Dev — 25–35 steps, CFG=4–7. Slower but more controllable for advanced workflows.
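The two sets of sampler settings above can be kept as presets so a workflow does not accidentally mix them, for example by running the distilled model with dev-style CFG. A minimal sketch; the class and function names are ours, and the ranges are simply the values listed above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerPreset:
    steps_min: int
    steps_max: int
    cfg_min: float
    cfg_max: float


# Values copied from the list above; the structure is illustrative only.
PRESETS = {
    "distilled": SamplerPreset(steps_min=9, steps_max=9, cfg_min=1.0, cfg_max=1.0),
    "dev": SamplerPreset(steps_min=25, steps_max=35, cfg_min=4.0, cfg_max=7.0),
}


def settings_match(variant: str, steps: int, cfg: float) -> bool:
    """True if steps and CFG fall inside the range listed for the variant."""
    p = PRESETS[variant]
    return p.steps_min <= steps <= p.steps_max and p.cfg_min <= cfg <= p.cfg_max


if __name__ == "__main__":
    print(settings_match("distilled", 9, 1.0))
    print(settings_match("distilled", 30, 5.0))
```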
Both are available in FP8, GGUF Q4, etc. We use distilled in our production workflows. Full breakdown: LTX 2.3 Dev vs Distilled.
Where to Download
- Lightricks official — FP16, BF16, FP8 variants, both distilled and dev. huggingface.co/Lightricks/LTX-Video
- QuantStack — community GGUF mirror. Q4_K_M, Q8, etc.
- Kijai — popular community releases including NVFP4 builds.
For setup instructions, see How to Install LTX 2.3 Locally.
Skip the Decision
If you don't want to download a 22 GB file just to test the model, our Studio runs FP8 LTX 2.3 in your browser. You get 160 free tokens daily, so you can try it before committing to a download. For local power users, the Workflow Pack ships FP8 by default and includes optional GGUF Q4 variants for 16 GB cards.
