Can I run DeepSeek-V2.5 on AMD RX 7900 XT?

Verdict: Fail/OOM. This GPU does not have enough VRAM.
GPU VRAM: 20.0GB
Required: 472.0GB
Headroom: -452.0GB

VRAM Usage: 100% used (20.0GB of 20.0GB)

Technical Analysis

The AMD RX 7900 XT, with its 20GB of GDDR6 VRAM, falls far short of the memory requirements for running DeepSeek-V2.5. The model's 236 billion parameters need roughly 472GB of VRAM at FP16 precision (2 bytes per parameter), leaving a 452GB deficit, so the weights alone cannot be loaded onto the GPU for inference. The card's 0.8 TB/s of memory bandwidth is respectable, but it becomes irrelevant when the model exceeds the GPU's memory capacity. The RX 7900 XT also lacks dedicated matrix-multiply units comparable to NVIDIA's Tensor Cores, which would further limit throughput for tensor operations even if the model could somehow fit in memory.
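As a quick sanity check on the 472GB figure, the arithmetic below multiplies the stated 236 billion parameters by 2 bytes per FP16 weight. It is a minimal sketch that counts weights only; the KV cache and activations would add further memory on top.

```python
# Rough FP16 weight-memory estimate for DeepSeek-V2.5 (236B parameters, per the report).
# Weights only: KV cache and activations are not included.
params = 236e9          # parameter count
bytes_per_param = 2     # FP16 = 2 bytes per weight

weights_gb = params * bytes_per_param / 1e9          # decimal gigabytes
gpu_vram_gb = 20.0                                   # RX 7900 XT

print(f"FP16 weights: {weights_gb:.1f} GB")          # -> 472.0 GB
print(f"Headroom: {gpu_vram_gb - weights_gb:.1f} GB")# -> -452.0 GB
```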

Because of this VRAM shortfall, the AMD RX 7900 XT cannot run DeepSeek-V2.5 directly. Even with aggressive quantization, fitting the entire model and its working memory into the available 20GB is not realistic. Any attempt will end in an out-of-memory error or, with heavy offloading, in constant data swapping between system RAM and GPU memory that makes real-time or even near-real-time interaction impossible; the achievable tokens per second and batch size are effectively zero.
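To make the quantization point concrete, here is a back-of-the-envelope estimate. The effective bits-per-weight figures are approximations I am assuming for illustration (real quantized files also store scales and keep some layers at higher precision), but even the most aggressive option lands far above 20GB.

```python
# Approximate quantized-size estimate for a 236B-parameter model.
# Bits-per-weight values are assumed effective averages, not exact format sizes.
params = 236e9
gpu_vram_gb = 20.0

for label, bits_per_weight in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    size_gb = params * bits_per_weight / 8 / 1e9
    verdict = "fits" if size_gb <= gpu_vram_gb else "does not fit"
    print(f"{label:7s} ~{size_gb:5.0f} GB -> {verdict} in {gpu_vram_gb:.0f} GB VRAM")
```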

Recommendation

Given the VRAM limitations, running DeepSeek-V2.5 directly on the AMD RX 7900 XT is not feasible. Consider using cloud-based inference services that offer access to GPUs with sufficient VRAM, such as those provided by NelsaHost, or explore distributed inference solutions that split the model across multiple GPUs. Alternatively, focus on smaller LLMs that fit within the RX 7900 XT's VRAM capacity, or consider extreme quantization techniques like 4-bit or even 2-bit quantization coupled with CPU offloading, though this will severely impact performance and potentially the model's accuracy.

If you are determined to experiment, investigate llama.cpp with aggressive quantization to the lowest possible bit depth that retains acceptable accuracy for your use case. Be prepared for very slow inference speeds and consider this approach only for experimentation or very low-throughput applications. Prioritize optimizing for the smallest possible memory footprint and be ready to offload significant portions of the computation to the CPU.

Recommended Settings

Batch size: 1
Context length: 1024
Other settings: CPU offloading (if possible), reduce number of layers, use smaller embedding size
Inference framework: llama.cpp
Suggested quantization: q2_K or lower
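If you do attempt the llama.cpp experiment described above, the sketch below shows one way these settings might map onto the llama-cpp-python bindings. This is an assumption-laden example: the bindings are not named in the report, the GGUF filename is a placeholder, and the n_gpu_layers value is a guess at how few layers would fit in 20GB; even a q2_K build of a 236B model would still keep most weights in system RAM.

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# Assumes a hypothetical q2_K GGUF of DeepSeek-V2.5 exists locally; that file would
# still be roughly 70-80GB, so most layers remain on the CPU side.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-v2.5-q2_k.gguf",  # placeholder path (assumption)
    n_ctx=1024,       # recommended context length above
    n_batch=1,        # smallest batch to limit working memory
    n_gpu_layers=4,   # offload only as many layers as fit in 20GB VRAM (guess)
)

result = llm("Write one sentence about GPUs.", max_tokens=32)
print(result["choices"][0]["text"])
```

Expect prompt processing and generation to be dominated by system RAM bandwidth in this configuration, which is why the report estimates effectively unusable throughput.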

Frequently Asked Questions

Is DeepSeek-V2.5 compatible with AMD RX 7900 XT?
No, DeepSeek-V2.5 is not directly compatible with the AMD RX 7900 XT due to insufficient VRAM.
What VRAM is needed for DeepSeek-V2.5?
DeepSeek-V2.5 requires approximately 472GB of VRAM for FP16 precision.
How fast will DeepSeek-V2.5 run on AMD RX 7900 XT?
Due to the VRAM limitations, DeepSeek-V2.5 will run extremely slowly or not at all on the AMD RX 7900 XT; expect unusable performance and very low tokens per second, even with quantization and CPU offloading.