The NVIDIA RTX 4090, with its 24GB of GDDR6X VRAM, falls short of the roughly 36GB required to hold Qwen 2.5 72B quantized to Q4_K_M. The full set of weights cannot be loaded onto the GPU, which prevents successful inference. While the RTX 4090 offers high memory bandwidth (1.01 TB/s) and a large complement of CUDA and Tensor cores, those specifications do not help once the model exceeds the available VRAM. Attempting to load the model in this configuration will fail with out-of-memory errors, because the GPU cannot allocate the memory needed for the model's weights and activations.
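As a rough sanity check, the 36GB figure follows directly from the parameter count and the bits per weight. The sketch below shows that arithmetic, assuming ~4 bits per weight for Q4_K_M and ignoring KV-cache and activation overhead, so real usage will be somewhat higher.

```python
# Back-of-the-envelope VRAM estimate: parameters x bits-per-weight / 8.
# Assumes ~4 bits per weight for Q4_K_M and ignores KV-cache/activation
# overhead, so actual memory use will be somewhat higher.

def estimate_weight_vram_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Approximate GB needed just to hold the quantized weights."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

needed_gb = estimate_weight_vram_gb(72.0)   # Qwen 2.5 72B at ~4 bits/weight
available_gb = 24.0                         # RTX 4090 VRAM
print(f"~{needed_gb:.0f} GB needed vs {available_gb:.0f} GB available")
print("fits" if needed_gb <= available_gb else "does not fit")
```

With these assumptions the estimate comes out to about 36 GB of weights alone, well beyond the card's 24 GB.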
Because of this VRAM limitation, running Qwen 2.5 72B directly on a single RTX 4090 is not feasible. If additional GPUs are available, consider model parallelism, which splits the model across several cards to spread the VRAM load (a sketch of one such setup follows below). Alternatively, explore more aggressive quantization, such as Q2_K or other lower-bit formats, which shrink the memory footprint at the cost of some accuracy. For single-card RTX 4090 use, the practical choice is a smaller model whose quantized weights fit within the 24GB VRAM limit.
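As one concrete route for the multi-GPU option, the sketch below uses llama-cpp-python (a CUDA-enabled build is assumed) to load a Q4_K_M GGUF file split across two GPUs. The file path, split ratios, and context size are illustrative placeholders, not tested values.

```python
# Minimal sketch: splitting a GGUF Q4_K_M model across two GPUs with
# llama-cpp-python. Assumes a CUDA-enabled build and a locally downloaded
# GGUF file; the path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-72b-instruct-q4_k_m.gguf",  # hypothetical local path
    n_gpu_layers=-1,          # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],  # distribute tensors evenly across two cards
    n_ctx=4096,               # modest context to limit KV-cache VRAM
)

out = llm("Briefly introduce yourself.", max_tokens=64)
print(out["choices"][0]["text"])
```

The tensor_split ratios control how much of the model each card holds; with two 24GB cards, the combined 48GB of VRAM leaves headroom for the roughly 36GB of weights plus the KV cache.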