Can I run BGE-Small-EN on NVIDIA H100 SXM?

Yes, you can run this model!
GPU VRAM: 80.0GB
Required: 0.1GB
Headroom: +79.9GB

VRAM Usage: 0.1GB of 80.0GB (<1% used)

Performance Estimate

Tokens/sec: ~135.0
Batch size: 32

Technical Analysis

The NVIDIA H100 SXM, with its massive 80GB of HBM3 VRAM and 3.35 TB/s memory bandwidth, is an exceptionally powerful GPU, making it ideally suited for a wide range of AI workloads. The BGE-Small-EN model, being a relatively small embedding model with only 0.03B parameters and a modest 0.1GB VRAM footprint in FP16 precision, presents virtually no challenge for the H100. The H100's architecture, based on the Hopper generation, includes 16896 CUDA cores and 528 Tensor cores, providing ample computational resources for rapid inference. This combination ensures that the model can be loaded entirely into the GPU's memory with a significant amount of headroom remaining for larger batch sizes or concurrent deployments of other models.
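The 0.1GB figure follows from simple arithmetic: FP16 stores each of the model's ~0.03B parameters in 2 bytes, plus a small allowance for activations and framework buffers. A minimal sketch (the 0.05GB overhead allowance is an illustrative assumption, not a measured value):

```python
def fp16_vram_gb(num_params: float, overhead_gb: float = 0.05) -> float:
    """Estimate VRAM needed to hold a model's weights in FP16.

    FP16 uses 2 bytes per parameter; overhead_gb is a rough
    (assumed) allowance for activations and framework buffers.
    """
    weights_gb = num_params * 2 / 1e9  # 2 bytes per parameter
    return weights_gb + overhead_gb

# BGE-Small-EN has ~0.03B (33M) parameters
required = fp16_vram_gb(0.03e9)   # ~0.11 GB
headroom = 80.0 - required        # ~79.9 GB left on an 80GB H100
```

This is why the headroom is essentially the full card: the weights alone occupy roughly 0.06GB, under 0.1% of the H100's 80GB.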

Recommendation

Given the substantial VRAM headroom, increase the batch size to maximize throughput: experiment with sizes larger than the estimated 32 to fully utilize the H100's processing capabilities. Also consider deploying multiple instances of BGE-Small-EN concurrently to serve more requests in parallel. While FP16 offers a good balance of speed and accuracy, applications that need even higher throughput can try INT8 quantization, which can accelerate inference further without significant loss in embedding quality. In all cases, monitor GPU utilization to confirm the gains and catch bottlenecks.
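To make the multi-instance suggestion concrete, the headroom arithmetic can be sketched as below. The 0.5GB per-replica working set and the 2GB safety reserve are illustrative assumptions (covering weights plus activations at large batch sizes), not measurements:

```python
def max_instances(total_vram_gb: float, per_instance_gb: float,
                  reserve_gb: float = 2.0) -> int:
    """How many model replicas fit in VRAM, keeping a safety reserve."""
    usable = total_vram_gb - reserve_gb
    return int(usable // per_instance_gb)

# Assumed 0.5GB working set per BGE-Small-EN replica
n = max_instances(80.0, 0.5)  # 156 replicas fit, on paper
```

In practice, compute saturation and scheduling overhead will cap useful concurrency long before VRAM does, so treat the result as an upper bound and benchmark a much smaller replica count first.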

Recommended Settings

Batch size: 32 to start; experiment with larger sizes
Context length: 512
Other settings: enable CUDA graph capture; use asynchronous batching
Inference framework: vLLM or text-embeddings-inference
Quantization: INT8 (optional)
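As a concrete starting point, BGE models can be served with Hugging Face's text-embeddings-inference server. The sketch below is illustrative only: the image tag, port mapping, volume path, and `--max-batch-tokens` value are assumptions to adapt, not verified settings.

```shell
# Serve BGE-Small-EN on the H100 via text-embeddings-inference (TEI).
# Image tag, port, volume, and batch-token limit are illustrative assumptions.
docker run --gpus all -p 8080:80 \
  -v "$PWD/tei-data:/data" \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-small-en \
  --max-batch-tokens 16384
```

Once the server is up, embeddings are available over HTTP, making it straightforward to sweep batch sizes and compare throughput against the ~135 tokens/sec estimate above.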

Frequently Asked Questions

Is BGE-Small-EN compatible with NVIDIA H100 SXM?
Yes, BGE-Small-EN is perfectly compatible with the NVIDIA H100 SXM due to its small size and the H100's vast resources.
What VRAM is needed for BGE-Small-EN?
BGE-Small-EN requires approximately 0.1GB of VRAM when using FP16 precision.
How fast will BGE-Small-EN run on NVIDIA H100 SXM?
BGE-Small-EN is estimated to achieve around 135 tokens/sec on the NVIDIA H100 SXM. This performance can be further improved by optimizing batch size and quantization.