Can I run BGE-M3 on NVIDIA H100 PCIe?

Perfect
Yes, you can run this model!
GPU VRAM: 80.0 GB
Required: 1.0 GB
Headroom: +79.0 GB

VRAM usage: ~1% of 80.0 GB

Performance Estimate

Tokens/sec: ~117
Batch size: 32

Technical Analysis

The NVIDIA H100 PCIe, with 80 GB of HBM2e VRAM and 2.0 TB/s of memory bandwidth, is exceptionally well suited to the BGE-M3 embedding model. BGE-M3 is a relatively small model, roughly 0.57 billion parameters, and needs only about 1 GB of VRAM in FP16 precision. That leaves roughly 79 GB of headroom for large batch sizes, concurrent model instances, or other AI workloads running alongside it. The H100's Hopper architecture, with 14,592 CUDA cores and 456 Tensor Cores, provides ample compute for the matrix multiplications at the heart of BGE-M3, yielding high throughput and low latency.
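The VRAM figures above follow from simple arithmetic, which can be sketched as below. The parameter count (~568M) is an assumption based on BGE-M3's XLM-RoBERTa-large backbone; the estimate covers weights only and ignores activations and framework overhead, which is why the tool rounds to 1 GB.

```python
def fp16_vram_gb(n_params: float) -> float:
    """Weight-only VRAM estimate: 2 bytes per parameter in FP16."""
    return n_params * 2 / 1e9  # decimal GB

# Assumed parameter count for BGE-M3 (~568M parameters).
BGE_M3_PARAMS = 568e6
H100_PCIE_VRAM_GB = 80.0

required = fp16_vram_gb(BGE_M3_PARAMS)    # ~1.1 GB for weights
headroom = H100_PCIE_VRAM_GB - required   # ~78.9 GB left over
print(f"required ~{required:.1f} GB, headroom ~{headroom:.1f} GB")
```

The same formula generalizes to other precisions by swapping the bytes-per-parameter factor (4 for FP32, 1 for INT8).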

Recommendation

Given the vast headroom on the H100, prioritize throughput by increasing the batch size. Experiment with batch sizes of 32 and above, monitoring VRAM usage to stay within capacity. Also consider inference frameworks such as vLLM or NVIDIA TensorRT for further optimization. BGE-M3 is already compact, so quantization (INT8 or lower) is worth considering only if latency is critical; the gains are likely to be minimal given how far the H100 exceeds the model's requirements.

Recommended Settings

Batch size: 32
Context length: 8192
Inference framework: vLLM
Quantization: None (FP16 is likely optimal)
Other settings:
- Enable CUDA graph capture for reduced latency
- Experiment with different scheduling algorithms in vLLM
- Monitor GPU utilization and adjust batch size accordingly
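If vLLM is the chosen framework, the settings above might map to a launch command like the following. This is a sketch, not a verified invocation: the `--task embed` flag and exact flag spellings vary across vLLM releases, so check your installed version's CLI help before using it.

```shell
# Serve BGE-M3 as an embedding model on a single H100 PCIe
# (flag names assumed from recent vLLM releases; verify with `vllm serve --help`)
vllm serve BAAI/bge-m3 \
  --task embed \
  --dtype float16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9
```

With this much free VRAM, `--gpu-memory-utilization` mainly controls how much memory vLLM pre-allocates for batching; lowering it leaves room for other workloads on the same GPU.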

Frequently Asked Questions

Is BGE-M3 compatible with NVIDIA H100 PCIe?
Yes, BGE-M3 is fully compatible with the NVIDIA H100 PCIe. The H100 significantly exceeds the model's requirements.
What VRAM is needed for BGE-M3?
BGE-M3 requires approximately 1GB of VRAM when using FP16 precision.
How fast will BGE-M3 run on NVIDIA H100 PCIe?
Expect very fast inference. The H100 can achieve an estimated ~117 tokens/second at batch size 32, and potentially more with larger batches or framework-level optimizations.