Can I run Qwen 2.5 72B (Q4_K_M, GGUF 4-bit) on the NVIDIA RTX 3090 Ti?

Fail/OOM
This GPU doesn't have enough VRAM
GPU VRAM: 24.0 GB
Required: 36.0 GB
Headroom: -12.0 GB

VRAM Usage: 100% of 24.0 GB used (requirement exceeds capacity)

Technical Analysis

The NVIDIA RTX 3090 Ti, while a powerful GPU with 10752 CUDA cores and 24GB of GDDR6X VRAM, falls short of the VRAM requirement for running Qwen 2.5 72B, even with aggressive quantization. Q4_K_M quantization reduces the model's footprint to approximately 36GB, but this still exceeds the 3090 Ti's 24GB. That 12GB deficit prevents the model from loading on the GPU at all: the weights alone, before accounting for the KV cache and activation buffers, exceed the card's capacity.
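As a sanity check on the 36GB figure, the weight footprint can be estimated from the parameter count and the nominal bits per weight. The sketch below is a back-of-the-envelope calculation, not a measurement; real Q4_K_M files mix quantization types and average slightly more than 4 bits per weight, and inference adds KV-cache and activation overhead on top.

```python
# Rough VRAM estimate for a 72B-parameter model at nominal 4-bit quantization.
# Back-of-the-envelope only: actual Q4_K_M files average somewhat more than
# 4 bits per weight, and runtime buffers add further overhead.

params = 72e9          # parameter count
bits_per_weight = 4.0  # nominal Q4 precision

weight_gb = params * bits_per_weight / 8 / 1e9
print(f"Estimated weight footprint: {weight_gb:.1f} GB")   # ~36.0 GB

gpu_vram_gb = 24.0     # RTX 3090 Ti
print(f"Headroom: {gpu_vram_gb - weight_gb:+.1f} GB")      # ~-12.0 GB
```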

Memory bandwidth, while substantial at 1.01 TB/s on the RTX 3090 Ti, is secondary to the VRAM limitation in this scenario. Even with ample bandwidth, the GPU cannot process data it cannot store. The Ampere architecture's Tensor Cores would accelerate the matrix multiplications during inference, but they remain unusable without the necessary memory capacity. The card's 450W TDP is likewise a non-issue, since the model cannot run on the GPU alone in the first place.
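For context on why bandwidth is secondary here, decode speed on a memory-bound GPU is roughly bandwidth divided by the bytes read per generated token. The sketch below is a rough upper-bound estimate under the assumption that every weight is read once per token; it would only apply if the model actually fit in VRAM.

```python
# Rough upper bound on decode throughput *if* the model fit in VRAM.
# Assumes decoding is memory-bandwidth-bound and all weights are read
# once per generated token (a common rule of thumb, not a benchmark).

bandwidth_gb_s = 1010.0   # RTX 3090 Ti memory bandwidth, ~1.01 TB/s
model_size_gb = 36.0      # Q4_K_M footprint from the analysis above

tokens_per_second = bandwidth_gb_s / model_size_gb
print(f"Theoretical ceiling: ~{tokens_per_second:.0f} tokens/s")  # ~28 tok/s
```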

Recommendation

Unfortunately, running Qwen 2.5 72B on a single RTX 3090 Ti is not feasible due to the VRAM limitation. Consider a GPU (or multi-GPU setup) with at least 36GB of VRAM, or offloading layers to system RAM. Offloading will significantly degrade performance, making it unsuitable for real-time applications. Alternatively, use a smaller model variant that fits within the available 24GB, or distribute inference across multiple GPUs if that is an option.

Recommended Settings

Batch Size: 1 (if offloading to system RAM)
Context Length: Reduce the context length to the minimum acceptable value
Other Settings: Enable CPU offloading (expect very slow performance); reduce the number of layers processed on the GPU
Inference Framework: llama.cpp (for CPU offloading); see the sketch after these settings
Quantization: No further quantization will help fit the model on this GPU
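If you do try CPU offloading, the settings above map onto llama.cpp's layer-offload control. The sketch below uses the llama-cpp-python bindings as one possible way to express them; the GGUF filename is hypothetical, and the n_gpu_layers value is only a starting point you would lower until the load stops running out of VRAM.

```python
# Sketch: partial GPU offload with the llama-cpp-python bindings.
# The GGUF filename below is hypothetical; point it at your local file.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-72b-instruct-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=40,  # offload only part of the model; lower this if loading OOMs
    n_ctx=2048,       # keep the context small to limit KV-cache memory
)

# Requests are served one at a time (effective batch size 1).
out = llm("Explain KV-cache memory in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

Expect single-digit tokens per second at best with this kind of split, since every layer left on the CPU is bottlenecked by system RAM bandwidth.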

Frequently Asked Questions

Is Qwen 2.5 72B (72B parameters) compatible with the NVIDIA RTX 3090 Ti?
No, Qwen 2.5 72B is not directly compatible with the NVIDIA RTX 3090 Ti due to insufficient VRAM.
What VRAM is needed for Qwen 2.5 72B (72B parameters)?
Qwen 2.5 72B requires at least 36GB of VRAM when quantized to Q4_K_M.
How fast will Qwen 2.5 72B (72B parameters) run on the NVIDIA RTX 3090 Ti?
Qwen 2.5 72B cannot run entirely on the RTX 3090 Ti; it will only run with CPU offloading, which degrades performance significantly. Expect very slow inference speeds.