Can I run BGE-Small-EN on AMD RX 7900 XT?

Perfect
Yes, you can run this model!
GPU VRAM: 20.0GB
Required: 0.1GB
Headroom: +19.9GB

VRAM Usage

~1% used (0.1GB of 20.0GB)

Performance Estimate

Tokens/sec: ~63.0
Batch size: 32

Technical Analysis

The AMD RX 7900 XT, with its 20GB of GDDR6 VRAM and RDNA 3 architecture, is exceptionally well-suited to the BGE-Small-EN embedding model. At roughly 33 million parameters (0.03B), BGE-Small-EN needs only about 0.1GB of VRAM in FP16 precision, leaving 19.9GB of headroom for large batches and for running multiple instances of the model concurrently without memory constraints. The card's 0.8 TB/s memory bandwidth also keeps data moving quickly between the GPU and memory, further enhancing performance.
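The VRAM and headroom figures above follow from simple arithmetic; here is a minimal sketch, assuming ~33M parameters and FP16 weights (2 bytes each), and ignoring activations and runtime overhead:

```python
# Back-of-envelope VRAM estimate for BGE-Small-EN on a 20 GB card.
# Assumption: weight storage only; activations/overhead are excluded.

PARAMS = 33_000_000          # BGE-Small-EN parameter count (~0.03B)
BYTES_PER_PARAM_FP16 = 2     # FP16 = 2 bytes per weight
GPU_VRAM_GB = 20.0           # AMD RX 7900 XT

required_gb = PARAMS * BYTES_PER_PARAM_FP16 / 1024**3
headroom_gb = GPU_VRAM_GB - required_gb

print(f"Required: {required_gb:.2f} GB")   # ~0.06 GB raw; reported as 0.1 GB with overhead
print(f"Headroom: {headroom_gb:.1f} GB")   # ~19.9 GB
```

The raw weight footprint (~0.06 GB) is even smaller than the 0.1 GB figure, which includes a margin for activations and runtime buffers.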

While the RX 7900 XT lacks the dedicated tensor cores found in NVIDIA GPUs, RDNA 3's compute units (with their WMMA matrix instructions) still handle the matrix multiplications in BGE-Small-EN efficiently. Given the model's small size, performance is bounded by memory bandwidth and compute throughput rather than VRAM capacity, so expect excellent inference speeds and the ability to embed large batches of text concurrently. The estimated 63 tokens/second reflects the raw processing power available, and this can be improved further with appropriate software and settings.

Recommendation

The AMD RX 7900 XT is an excellent choice for running BGE-Small-EN. To maximize performance, use a runtime optimized for AMD GPUs, such as ONNX Runtime with its DirectML execution provider. Experiment with different batch sizes to find the best balance between throughput and latency; since VRAM is not a limiting factor, raising the batch size to the suggested 32 or higher will likely improve overall tokens/second.
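Batch-size tuning amounts to chunking your corpus and timing throughput. A minimal sketch follows; the `encode` function here is a placeholder for the real embedding call (e.g. an ONNX Runtime session run), which depends on your chosen framework:

```python
import time

def batched(items, batch_size=32):
    """Yield successive fixed-size batches from a list of texts."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def encode(batch):
    """Placeholder for the real embedding call; BGE-Small-EN emits 384-dim vectors."""
    return [[0.0] * 384 for _ in batch]

texts = [f"document {i}" for i in range(100)]

start = time.perf_counter()
embeddings = [vec for batch in batched(texts, 32) for vec in encode(batch)]
elapsed = time.perf_counter() - start

print(len(embeddings), "embeddings")         # 100 embeddings
print(f"{len(texts) / elapsed:.0f} texts/sec")
```

Sweep `batch_size` over a few values (8, 16, 32, 64, ...) with the real encoder plugged in and keep whichever maximizes texts/sec at acceptable latency.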

Consider mixed precision (FP16, or INT8 quantization if your framework supports it) to improve performance further. While BGE-Small-EN is already small, quantization reduces the memory footprint and improves compute efficiency. Monitor GPU utilization; if the card is underutilized, increase the batch size or run multiple inference processes concurrently.
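The quantization savings are easy to estimate up front. A sketch comparing raw FP16 and INT8 weight footprints, under the same ~33M-parameter assumption as before (activations and overhead excluded):

```python
PARAMS = 33_000_000   # BGE-Small-EN (~0.03B parameters)

def weight_footprint_mb(params, bytes_per_weight):
    """Raw weight storage in MiB, excluding activations and runtime overhead."""
    return params * bytes_per_weight / 1024**2

fp16_mb = weight_footprint_mb(PARAMS, 2)   # FP16: 2 bytes per weight
int8_mb = weight_footprint_mb(PARAMS, 1)   # INT8: 1 byte per weight

print(f"FP16: {fp16_mb:.0f} MB, INT8: {int8_mb:.0f} MB")  # INT8 halves the weight storage
```

At this scale the absolute savings are tiny; the main benefit of INT8 on this card is compute efficiency at large batch sizes, not freed-up VRAM.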

Recommended Settings

Batch size: 32+
Context length: 512
Inference framework: ONNX Runtime (DirectML execution provider)
Quantization: INT8 (if supported by framework)
Other settings: optimize for AMD GPUs in your chosen framework; experiment with mixed precision (FP16)

Frequently Asked Questions

Is BGE-Small-EN compatible with AMD RX 7900 XT?
Yes, BGE-Small-EN is perfectly compatible with the AMD RX 7900 XT.
What VRAM is needed for BGE-Small-EN?
BGE-Small-EN requires approximately 0.1GB of VRAM in FP16 precision.
How fast will BGE-Small-EN run on AMD RX 7900 XT?
You can expect approximately 63 tokens per second, potentially higher with optimizations and larger batch sizes.