Can I run BGE-Large-EN on NVIDIA H100 PCIe?

Perfect
Yes, you can run this model!
GPU VRAM
80.0GB
Required
0.7GB
Headroom
+79.3GB

VRAM Usage

0.7GB of 80.0GB used (~1%)

Performance Estimate

Tokens/sec ~117.0
Batch size 32

Technical Analysis

The NVIDIA H100 PCIe, with 80GB of HBM2e VRAM and 2.0 TB/s of memory bandwidth, is exceptionally well suited to the BGE-Large-EN embedding model. At roughly 0.33 billion parameters, BGE-Large-EN needs only about 0.7GB of VRAM in FP16 precision, leaving a substantial 79.3GB of headroom for large batch sizes, concurrent model serving, or deploying other models alongside it. The H100's Hopper architecture, with 14,592 CUDA cores and 456 Tensor Cores, further accelerates the model's computations, keeping latency low and throughput high.
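The VRAM figures above can be sanity-checked with back-of-envelope arithmetic: FP16 stores two bytes per parameter, so the weights alone come to about 0.6GB, and the reported 0.7GB presumably includes activations and runtime overhead. A minimal sketch (the function name is illustrative, not from any library):

```python
def weights_vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Weight footprint in GB: parameters x bytes per parameter.

    FP16 = 2 bytes/param. Excludes activations, KV/workspace buffers,
    and framework overhead, so real usage runs slightly higher.
    """
    return params_billions * 1e9 * bytes_per_param / 1024**3

weights = weights_vram_gb(0.33)   # ~0.61 GB for BGE-Large-EN in FP16
headroom = 80.0 - weights         # vs. the H100 PCIe's 80 GB
```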

Recommendation

Given the ample VRAM available on the H100, prioritize maximizing batch size to improve overall throughput: experiment up to the estimated limit of 32 and monitor GPU utilization to find the best balance between latency and throughput. Consider a high-performance inference framework such as vLLM or NVIDIA's TensorRT for further gains. FP16 precision is sufficient for BGE-Large-EN, but lower-precision options like INT8 quantization may yield additional speedup with minimal accuracy loss.
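As a concrete starting point, the batched-FP16 setup described above might look like the following sketch. It assumes `pip install sentence-transformers`, a CUDA device, and the Hugging Face model id `BAAI/bge-large-en`; the helper and function names are illustrative, not part of any tool mentioned in this report.

```python
from typing import Iterable, List

def batches(items: List[str], size: int = 32) -> Iterable[List[str]]:
    """Yield fixed-size chunks so per-step GPU memory stays predictable."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_corpus(corpus: List[str]):
    """Embed a corpus in FP16 at the recommended batch size of 32.

    Requires a CUDA GPU and the sentence-transformers package; shown here
    as an illustrative sketch only.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BAAI/bge-large-en", device="cuda")
    model.half()  # FP16 weights, matching the ~0.7GB figure above
    out = []
    for chunk in batches(corpus, 32):
        out.extend(model.encode(chunk, batch_size=32,
                                normalize_embeddings=True))
    return out
```

With ~79GB of headroom, batch size is bounded by the throughput sweet spot rather than memory, so profiling a few sizes around 32 is worthwhile.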

Recommended Settings

Batch size
32
Context length
512
Other settings
Enable CUDA graph capture; use asynchronous data loading; optimize Tensor Core usage
Inference framework
vLLM
Suggested quantization
INT8 (optional, for further speedup)

Frequently Asked Questions

Is BGE-Large-EN compatible with NVIDIA H100 PCIe?
Yes, BGE-Large-EN is fully compatible with the NVIDIA H100 PCIe.
What VRAM is needed for BGE-Large-EN?
BGE-Large-EN requires approximately 0.7GB of VRAM when using FP16 precision.
How fast will BGE-Large-EN run on NVIDIA H100 PCIe?
You can expect approximately 117 tokens/second with optimized settings on the NVIDIA H100 PCIe.