The NVIDIA RTX 3090 Ti, with its 24GB of GDDR6X VRAM, faces a challenge when running the Qwen 2.5 72B model, even with quantization. While the model's original FP16 precision demands a hefty 144GB of VRAM, quantizing to q3_k_m reduces this requirement to approximately 28.8GB. That still exceeds the RTX 3090 Ti's available VRAM by 4.8GB, and the figure covers the weights alone; the KV cache and runtime overhead widen the gap further. The shortfall prevents the model from loading and running entirely on the GPU. The RTX 3090 Ti's 1.01 TB/s memory bandwidth and large CUDA and Tensor core counts would otherwise deliver solid inference speeds if the model fit within the available memory.
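The numbers above follow from a simple weights-only estimate: parameter count times bits per weight. The sketch below reproduces that arithmetic; the 3.2 bits/weight figure for q3_k_m is an approximation (k-quant GGUF files mix block formats, so real file sizes vary by a few GB), and KV cache and runtime overhead are not included.

```python
# Rough VRAM estimate for model weights: parameters * bits-per-weight / 8.
# Weights only -- KV cache and runtime overhead add several GB on top.

def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimated GB needed to hold the quantized weights alone."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

fp16_gb = weight_vram_gb(72, 16.0)    # ~144 GB at FP16
q3_k_m_gb = weight_vram_gb(72, 3.2)   # ~28.8 GB at ~3.2 bits/weight (approximation)
shortfall = q3_k_m_gb - 24            # RTX 3090 Ti provides 24 GB

print(f"FP16:               {fp16_gb:.1f} GB")
print(f"q3_k_m:             {q3_k_m_gb:.1f} GB")
print(f"Shortfall vs 24 GB: {shortfall:.1f} GB")
```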
Due to the VRAM limitation, running Qwen 2.5 72B (q3_k_m) entirely on the RTX 3090 Ti is not feasible. The most direct workaround is to offload only as many layers as fit into the 24GB of VRAM and run the remainder on the CPU via llama.cpp, accepting a significant drop in inference speed (see the sketch below). Alternatively, choose a smaller Qwen variant such as Qwen 2.5 32B, whose 4-bit quantizations fit within 24GB with limited room for context, or another model with comparable capabilities and a lower VRAM footprint. A third option is a cloud GPU service with instances that have enough VRAM to hold the full model.
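A minimal sketch of partial offload using the llama-cpp-python bindings is shown below. The model path and the `n_gpu_layers` value are placeholders, not tested settings: start with a layer count that leaves headroom in the 24GB and lower it if loading fails with an out-of-memory error.

```python
# Partial GPU offload: keep as many layers as fit in 24 GB on the GPU and
# run the rest on the CPU. File name and layer count below are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-72b-instruct-q3_k_m.gguf",  # hypothetical local GGUF file
    n_gpu_layers=60,  # offload roughly 60 of the ~80 transformer layers (assumption; tune down on OOM)
    n_ctx=4096,       # a smaller context window keeps the KV cache manageable
)

output = llm("Explain the difference between VRAM and system RAM.", max_tokens=128)
print(output["choices"][0]["text"])
```

Because every layer left on the CPU is bottlenecked by system memory bandwidth, expect throughput well below what a fully GPU-resident model would achieve.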