The NVIDIA RTX 3090 Ti, while a powerful GPU with 10752 CUDA cores and 24 GB of GDDR6X VRAM, falls short of the VRAM requirement for running Qwen 2.5 72B, even with aggressive quantization. At roughly 4.85 bits per weight, the Q4_K_M quantization still occupies on the order of 44-47 GB for the weights alone, far more than the 3090 Ti's 24 GB. Because the weights, KV cache, and activation buffers cannot all fit in GPU memory, the model cannot be loaded and run entirely on the card.
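As a back-of-the-envelope check, the footprint can be estimated as parameter count times bytes per weight, plus room for the KV cache and runtime buffers. The bits-per-weight values below are approximate llama.cpp figures used as assumptions, not exact file sizes:

```python
# Back-of-the-envelope VRAM estimate: weights + KV cache + runtime overhead.
# Bits-per-weight values are approximate llama.cpp figures (assumptions).

def estimate_vram_gb(n_params_billion: float, bits_per_weight: float,
                     kv_cache_gb: float = 2.0, overhead_gb: float = 1.0) -> float:
    """Approximate total VRAM footprint in GB."""
    weights_gb = n_params_billion * bits_per_weight / 8  # billions of params -> GB
    return weights_gb + kv_cache_gb + overhead_gb

for name, bpw in [("Q8_0", 8.5), ("Q4_K_M", 4.85), ("Q2_K", 3.35)]:
    print(f"Qwen 2.5 72B @ {name}: ~{estimate_vram_gb(72.7, bpw):.0f} GB")
# Even the most aggressive option here stays well above the 3090 Ti's 24 GB.
```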
Memory bandwidth, while substantial at roughly 1.01 TB/s on the RTX 3090 Ti, is secondary to the VRAM limitation here: even with ample bandwidth, the GPU cannot process data it cannot store. The Ampere architecture's Tensor Cores would accelerate the matrix multiplications during inference, but they sit idle without the memory capacity to hold the model. Likewise, the card's 450 W TDP is moot if the model never loads.
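For context, a common rule of thumb for batch-1 decoding is that throughput is bounded by how fast the weights can be streamed from memory. The numbers below are illustrative assumptions, not benchmarks, and presume the weights fit in the stated memory pool:

```python
# Rule-of-thumb upper bound for batch-1 decoding: each new token requires
# reading (roughly) all model weights once, so tokens/s <= bandwidth / weight size.
# Figures are illustrative assumptions, not measurements.

def tokens_per_second_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    return bandwidth_gb_s / weights_gb

# Hypothetical: if ~44 GB of Q4_K_M weights fit entirely in 1010 GB/s GDDR6X.
print(f"In-VRAM bound:   ~{tokens_per_second_bound(1010, 44):.0f} tok/s")
# Same weights streamed from system RAM over PCIe 4.0 x16 (~32 GB/s).
print(f"Offloaded bound: ~{tokens_per_second_bound(32, 44):.1f} tok/s")
```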
Unfortunately, running Qwen 2.5 72B entirely on a single RTX 3090 Ti is not feasible due to the VRAM limitation. Consider a GPU (or GPUs) with at least 48 GB of VRAM, or explore alternatives such as offloading part of the model's layers to system RAM. Offloading does let the model load, but streaming weights over PCIe slows generation dramatically, making it unsuitable for real-time applications. Other options are a smaller variant that fits in 24 GB (for example, Qwen 2.5 32B at Q4_K_M is roughly 20 GB) or distributed inference across multiple GPUs, if available.
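If partial offloading is attempted anyway, a minimal sketch with llama-cpp-python might look like the following. The model path and layer count are placeholders; the right `n_gpu_layers` value depends on the quantization and context length actually used:

```python
# Minimal sketch of partial GPU offloading with llama-cpp-python.
# The GGUF filename below is a hypothetical local path, and n_gpu_layers=35 is
# only a placeholder: layers that do not fit in the 24 GB of VRAM stay in
# system RAM, which lets the model load at the cost of much lower throughput.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-72b-instruct-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=35,   # offload only as many layers as fit on the 3090 Ti
    n_ctx=4096,        # modest context window to limit KV-cache memory
)

out = llm("Summarize why a 72B model needs more than 24 GB of VRAM.", max_tokens=64)
print(out["choices"][0]["text"])
```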