Can I run BGE-M3 on NVIDIA RTX 3090?

Verdict: Perfect. Yes, you can run this model!

GPU VRAM: 24.0 GB
Required: 1.0 GB
Headroom: +23.0 GB

VRAM Usage: ~1.0 GB of 24.0 GB (~4% used)

Performance Estimate

Tokens/sec: ~90
Batch size: 32

Technical Analysis

The NVIDIA RTX 3090, with its substantial 24GB of GDDR6X VRAM, is exceptionally well-suited for running the BGE-M3 embedding model. BGE-M3, at only 0.5B parameters, requires a mere 1GB of VRAM in FP16 precision, leaving a massive 23GB of headroom. This abundant VRAM allows for large batch sizes and the potential to run multiple instances of the model concurrently, significantly boosting throughput. The RTX 3090's high memory bandwidth of 0.94 TB/s ensures rapid data transfer between the GPU and memory, preventing memory bottlenecks that can hinder performance, especially during large batch inferences.
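The 1 GB requirement and 23 GB headroom fall out of simple arithmetic: FP16 stores two bytes per parameter. A back-of-envelope sketch (the 0.1 GB overhead term for CUDA context and activations is an assumed figure, not a measured one):

```python
def fp16_vram_gb(params_billion: float, overhead_gb: float = 0.1) -> float:
    """Rough FP16 weight footprint: 2 bytes per parameter, plus a small
    assumed overhead for CUDA context and activations."""
    weights_gb = params_billion * 1e9 * 2 / 1024**3
    return weights_gb + overhead_gb

gpu_vram = 24.0                      # RTX 3090
required = fp16_vram_gb(0.5)         # BGE-M3 at ~0.5B parameters
headroom = gpu_vram - required
print(f"required ~ {required:.1f} GB, headroom ~ {headroom:.1f} GB")
```

The same function makes it easy to sanity-check other models against the 3090's 24 GB budget before downloading anything.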

Recommendation

Given the RTX 3090's ample resources, prioritize maximizing batch size to fully utilize the GPU's parallel processing capabilities. Experiment with batch sizes of 32 and above, monitoring GPU utilization to find the best balance between throughput and latency. Consider inference frameworks such as `vLLM` (which supports embedding models) or Hugging Face's `text-embeddings-inference` to take advantage of optimized kernels and request scheduling; note that `text-generation-inference` targets generative LLMs rather than embedding models like BGE-M3. Quantization to INT8 can further shrink the VRAM footprint, allowing even larger batch sizes with minimal loss of embedding quality.
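The batch-size experiment above can be sketched as a small timing harness. The helper works with any `encode(sentences, batch_size=...)` callable; the commented wiring to FlagEmbedding's `BGEM3FlagModel` is an assumption about the model's official package and should be checked against its documentation:

```python
import time

def sweep_batch_sizes(encode, sentences, batch_sizes=(8, 16, 32, 64)):
    """Time an encode(sentences, batch_size=...) callable across batch
    sizes and report sentences/sec for each."""
    results = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        encode(sentences, batch_size=bs)
        elapsed = time.perf_counter() - start
        results[bs] = len(sentences) / max(elapsed, 1e-9)
    return results

# Hypothetical usage with the FlagEmbedding package (not verified here):
# from FlagEmbedding import BGEM3FlagModel
# model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
# throughput = sweep_batch_sizes(model.encode, corpus)
```

Run the sweep on a representative corpus and pick the smallest batch size past which sentences/sec stops improving, since larger batches only add latency at that point.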

Recommended Settings

Batch size: 32
Context length: 8192
Inference framework: vLLM
Quantization (suggested): INT8
Other settings:
- Enable CUDA graph capture
- Experiment with different attention mechanisms (e.g., FlashAttention)
- Monitor GPU utilization and adjust batch size accordingly
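As one concrete starting point, these settings map onto a vLLM server launch roughly as follows. This is a sketch, not a verified command: `--task embed` assumes a recent vLLM release with embedding-model support, flag names should be checked against the installed version's docs, and vLLM's built-in quantization options may not include plain INT8, in which case serve the FP16 weights as shown:

```shell
# Hypothetical vLLM launch for BGE-M3 on an RTX 3090; verify flags
# against your installed vLLM version before relying on them.
vllm serve BAAI/bge-m3 \
  --task embed \
  --dtype float16 \
  --max-model-len 8192
```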

Frequently Asked Questions

Is BGE-M3 compatible with NVIDIA RTX 3090?
Yes, BGE-M3 is perfectly compatible with the NVIDIA RTX 3090, thanks to the GPU's large VRAM capacity.
What VRAM is needed for BGE-M3?
BGE-M3 requires approximately 1GB of VRAM when using FP16 precision.
How fast will BGE-M3 run on NVIDIA RTX 3090?
You can expect excellent performance with BGE-M3 on the RTX 3090, potentially reaching around 90 tokens/sec. This can be further optimized with larger batch sizes and efficient inference frameworks.