Can I run BGE-M3 on AMD RX 7900 XTX?

Perfect
Yes, you can run this model!
GPU VRAM: 24.0GB
Required: 1.0GB
Headroom: +23.0GB

VRAM Usage

~4% of 24.0GB used (1.0GB)

Performance Estimate

Tokens/sec: ~63.0
Batch size: 32

Technical Analysis

The AMD RX 7900 XTX, equipped with 24GB of GDDR6 VRAM and built on the RDNA 3 architecture, offers far more resources than the BGE-M3 embedding model needs. With roughly 0.57B parameters, BGE-M3 requires only about 1GB of VRAM in FP16 precision, leaving around 23GB of headroom for larger batch sizes, longer context lengths, or multiple concurrent model instances. The card's 0.96 TB/s of memory bandwidth keeps the compute units well fed, further contributing to efficient inference.
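
As a rough sanity check on the 1GB figure, the FP16 weight footprint follows directly from the parameter count. The sketch below is illustrative only: the ~568M parameter count is an approximation, and activation memory and framework overhead are ignored.

```python
# Rough FP16 VRAM estimate for BGE-M3 (illustrative figures, not measured).
PARAMS = 568_000_000          # approximate BGE-M3 parameter count (assumption)
BYTES_PER_PARAM_FP16 = 2      # FP16 stores each weight in 2 bytes
GPU_VRAM_GB = 24.0            # AMD RX 7900 XTX

weights_gb = PARAMS * BYTES_PER_PARAM_FP16 / 1024**3
print(f"Weights: ~{weights_gb:.2f} GB")                 # ~1.06 GB
print(f"Headroom: ~{GPU_VRAM_GB - weights_gb:.1f} GB")  # ~22.9 GB left for activations and batching
```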

However, it is worth noting that the RX 7900 XTX has no dedicated matrix units comparable to NVIDIA's Tensor Cores; RDNA 3 accelerates matrix math through WMMA instructions running on its shader cores. The card can still perform all of the necessary computation, but throughput may be lower than on NVIDIA GPUs with otherwise similar specifications when a model is optimized specifically for Tensor Core acceleration. The estimated 63 tokens/sec is an approximation, and actual performance will depend on the inference framework used and the level of optimization achieved. Even so, the ample VRAM and high memory bandwidth make the RX 7900 XTX a viable platform for deploying BGE-M3.
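
To turn the ~63 tokens/sec estimate into a measured number on your own stack, a short timing loop is usually enough. The sketch below assumes a ROCm build of PyTorch and the FlagEmbedding package; the synthetic workload and batch size are arbitrary choices, so treat the result as indicative rather than a benchmark.

```python
# Minimal throughput check for BGE-M3 (illustrative sketch, assumes ROCm PyTorch + FlagEmbedding).
import time
from FlagEmbedding import BGEM3FlagModel
from transformers import AutoTokenizer

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)            # FP16 weights, roughly 1GB of VRAM
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

texts = ["A short example passage about GPU inference."] * 256  # synthetic workload
n_tokens = sum(len(tokenizer(t)["input_ids"]) for t in texts)

start = time.perf_counter()
_ = model.encode(texts, batch_size=32)["dense_vecs"]            # dense embeddings only
elapsed = time.perf_counter() - start

print(f"~{n_tokens / elapsed:.0f} tokens/sec at batch size 32")
```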

Recommendation

Given the ample VRAM, prioritize maximizing batch size to improve throughput, and experiment with different batch sizes to find the right balance between latency and throughput for your application. Consider an inference framework optimized for AMD GPUs, such as ONNX Runtime with the ROCm execution provider, or compiler-based routes like `torch-mlir` or `shark` that target the RDNA 3 architecture directly. FP16 precision is sufficient for BGE-M3; lower-precision options like INT8 may provide further gains with minimal impact on accuracy, but they require careful evaluation.
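
As one possible starting point for the ONNX Runtime route, the sketch below sets up a session on the ROCm execution provider. It assumes the model has already been exported to ONNX (the `bge-m3.onnx` path is a placeholder) and that the installed `onnxruntime` build includes ROCm support; the raw output is token-level hidden states, which still need pooling into a sentence embedding.

```python
# Sketch: running an ONNX export of BGE-M3 through ONNX Runtime's ROCm execution provider.
# "bge-m3.onnx" is a placeholder path; export the model first (e.g. via optimum or torch.onnx.export).
import onnxruntime as ort
from transformers import AutoTokenizer

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL  # enable graph optimization

session = ort.InferenceSession(
    "bge-m3.onnx",
    sess_options=opts,
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],  # fall back to CPU if ROCm is missing
)

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
batch = tokenizer(["what is BGE-M3?"], padding=True, return_tensors="np")

input_names = {i.name for i in session.get_inputs()}
outputs = session.run(None, {k: v for k, v in batch.items() if k in input_names})
print(outputs[0].shape)  # token-level hidden states; pool (e.g. CLS token) for the sentence embedding
```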

Recommended Settings

Batch size: 32
Context length: 8192
Other settings:
- Enable graph optimization in ONNX Runtime
- Profile performance to identify bottlenecks
- Use asynchronous data loading to overlap computation and data transfer
Inference framework: ONNX Runtime with ROCm backend
Quantization suggested: INT8 (after accuracy evaluation)
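
If INT8 is worth exploring, dynamic weight quantization of an ONNX export is one low-effort route. The file names below are placeholders, and embedding quality (for example retrieval recall) should be compared against an FP16 baseline before adopting the quantized model.

```python
# Sketch: dynamic INT8 weight quantization of an ONNX export of BGE-M3 (paths are placeholders).
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="bge-m3.onnx",        # existing ONNX export
    model_output="bge-m3-int8.onnx",  # weights stored as INT8, dequantized on the fly at inference
    weight_type=QuantType.QInt8,
)
```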

Frequently Asked Questions

Is BGE-M3 compatible with AMD RX 7900 XTX?
Yes, BGE-M3 is fully compatible with the AMD RX 7900 XTX.
What VRAM is needed for BGE-M3?
BGE-M3 requires approximately 1GB of VRAM when using FP16 precision.
How fast will BGE-M3 run on AMD RX 7900 XTX?
BGE-M3 is estimated to run at approximately 63 tokens/sec on the AMD RX 7900 XTX, but the actual speed may vary depending on the inference framework and optimization level.
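
To confirm the VRAM figure on your own system rather than relying on an arithmetic estimate, a ROCm build of PyTorch exposes the usual `torch.cuda` memory counters (they map to HIP on AMD). The sketch below reports peak VRAM after one encode pass; the workload and batch size are arbitrary, and the FlagEmbedding package is assumed.

```python
# Sketch: measure actual peak VRAM used by BGE-M3 during an encode pass on an AMD GPU.
import torch
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

torch.cuda.reset_peak_memory_stats()
model.encode(["an example passage about GPU inference"] * 128, batch_size=32)
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during encode: {peak_gb:.2f} GB")  # expected to stay in the low single-digit GB range
```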