Can I run LLaVA 1.6 34B on AMD RX 7900 XTX?

Fail/OOM
This GPU doesn't have enough VRAM
GPU VRAM
24.0GB
Required
68.0GB
Headroom
-44.0GB

VRAM Usage

100% used (24.0GB of 24.0GB)

Technical Analysis

The primary limiting factor for running LLaVA 1.6 34B on an AMD RX 7900 XTX is VRAM capacity. In FP16 (half-precision floating point), the model's weights alone require approximately 68GB of VRAM to load and run inference, while the RX 7900 XTX has only 24GB. The model therefore cannot be loaded onto the GPU at all, hence the 'FAIL' verdict: the 44GB VRAM deficit rules out even basic inference.
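The 68GB figure follows directly from the parameter count: FP16 stores each weight in 2 bytes. A rough sketch of the arithmetic (weights only; activations, the KV cache, and the vision tower add several more GB on top):

```python
# Back-of-envelope VRAM estimate for loading LLaVA 1.6 34B in FP16.
# Weights only: KV cache, activations, and the vision encoder are extra.
params = 34e9                # ~34 billion parameters
bytes_per_param = 2          # FP16 = 2 bytes per weight
weights_gb = params * bytes_per_param / 1e9

vram_gb = 24                 # RX 7900 XTX
deficit_gb = weights_gb - vram_gb

print(f"FP16 weights: ~{weights_gb:.0f} GB")   # prints: FP16 weights: ~68 GB
print(f"VRAM deficit: {deficit_gb:.0f} GB")    # prints: VRAM deficit: 44 GB
```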

While the RX 7900 XTX offers substantial memory bandwidth (0.96 TB/s), that is moot here: bandwidth only matters once the model is resident in VRAM. Even if the model were squeezed into VRAM via aggressive quantization, the RX 7900 XTX lacks dedicated matrix-multiply hardware comparable to NVIDIA's Tensor Cores, so inference would still be noticeably slower than on comparable NVIDIA GPUs. Without optimization techniques such as quantization or offloading layers to system RAM, running this model directly on the RX 7900 XTX is not feasible; even with them, expect very slow operation given the model's size relative to the available VRAM.

Recommendation

To run LLaVA 1.6 34B on the RX 7900 XTX, aggressive quantization is essential. Consider 4-bit quantization (Q4), which cuts the FP16 footprint by roughly a factor of four at some cost in accuracy. Even then, it may be necessary to offload some layers to system RAM (CPU), which drastically slows inference.
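As a rough check on whether 4-bit quantization can close the gap, the suggested Q4_K_M format averages about 4.85 bits per weight (an approximate figure; the exact average varies with the per-layer quant mix):

```python
# Approximate Q4_K_M footprint for a 34B-parameter model.
# 4.85 bits/weight is an approximate average for Q4_K_M mixed quantization.
params = 34e9
bits_per_weight = 4.85
q4_gb = params * bits_per_weight / 8 / 1e9

print(f"Q4_K_M weights: ~{q4_gb:.1f} GB")  # prints: Q4_K_M weights: ~20.6 GB
```

At ~20.6GB the quantized weights fit inside 24GB, but leave little headroom for the KV cache, the vision encoder, and framework buffers, which is why CPU offloading of some layers may still be needed.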

Alternatively, explore a smaller model variant, such as LLaVA 1.5 7B or 13B, both of which require significantly less VRAM. If the 34B model is absolutely necessary, consider cloud-based inference services or a GPU with more VRAM, such as an NVIDIA RTX 4090 or an AMD Instinct MI250X.

Recommended Settings

Batch Size
1
Context Length
2048
Other Settings
- Offload layers to CPU (experimental and slow)
- Use a smaller model variant
- Enable memory mapping (if supported by the framework)
Inference Framework
llama.cpp
Quantization Suggested
Q4_K_M
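To pick a CPU-offload split for llama.cpp's `--n-gpu-layers` (`-ngl`) flag, a simple sketch is to divide the quantized model size evenly across transformer layers and see how many fit in the VRAM budget. The numbers below are assumptions: ~20.6GB for the Q4_K_M weights, 60 layers (the Yi-34B base), and ~4GB reserved for the KV cache, vision encoder, and buffers; embeddings and the output head are ignored for simplicity:

```python
# Sketch: estimate how many layers fit on the GPU for llama.cpp's -ngl flag.
# All constants are assumptions, not measured values.
total_gb = 20.6              # approximate Q4_K_M model size (assumption)
n_layers = 60                # LLaVA 1.6 34B / Yi-34B layer count (assumption)
vram_budget_gb = 24 - 4      # reserve ~4 GB for KV cache and buffers (assumption)

per_layer_gb = total_gb / n_layers
gpu_layers = min(n_layers, int(vram_budget_gb / per_layer_gb))

print(f"-ngl {gpu_layers}")  # prints: -ngl 58
```

In practice, start below this estimate and raise `-ngl` until you hit an out-of-memory error, since real per-layer sizes are uneven.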

Frequently Asked Questions

Is LLaVA 1.6 34B compatible with AMD RX 7900 XTX?
No, LLaVA 1.6 34B is not directly compatible with the AMD RX 7900 XTX due to insufficient VRAM. The model requires approximately 68GB of VRAM, while the RX 7900 XTX only has 24GB.
What VRAM is needed for LLaVA 1.6 34B?
LLaVA 1.6 34B requires approximately 68GB of VRAM in FP16 (half-precision floating point) format.
How fast will LLaVA 1.6 34B run on AMD RX 7900 XTX?
Without significant optimization (like quantization and CPU offloading), LLaVA 1.6 34B will likely not run at all on the AMD RX 7900 XTX. Even with aggressive optimization, expect extremely slow inference speeds.