Can I run Phi-3 Mini 3.8B on NVIDIA H100 PCIe?

Compatibility: Perfect. Yes, you can run this model!

GPU VRAM: 80.0GB
Required: 7.6GB
Headroom: +72.4GB

VRAM Usage

7.6GB of 80.0GB used (approximately 10%)

Performance Estimate

Tokens/sec: ~117
Batch size: 32
Context: 128K tokens (128,000)

Technical Analysis

The NVIDIA H100 PCIe, with its 80GB of HBM2e memory and 2.0 TB/s of memory bandwidth, is exceptionally well suited to running the Phi-3 Mini 3.8B model. In FP16 precision, the model's weights occupy approximately 7.6GB of VRAM (3.8B parameters × 2 bytes per parameter), leaving a substantial 72.4GB of headroom on the H100. That headroom accommodates large batch sizes and extended context lengths, whose growing KV cache would otherwise be the binding memory constraint. The H100's 14,592 CUDA cores and 456 Tensor Cores accelerate the matrix multiplications that dominate inference, delivering high throughput and low latency, and the Hopper architecture is optimized for transformer models like Phi-3 Mini, ensuring efficient utilization of the GPU's resources.
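
The 7.6GB figure is simple arithmetic: 3.8 billion parameters at 2 bytes each. A minimal sketch of that weights-only estimate (KV cache and activations are excluded; they scale with batch size and context length and must be budgeted from the headroom):

```python
def fp16_weight_footprint_gb(n_params_billion: float) -> float:
    """Weights-only VRAM estimate at 2 bytes per parameter (FP16/BF16).

    KV cache and activation memory are not included: they grow with
    batch size and context length and come out of the remaining headroom.
    """
    return n_params_billion * 2  # 1e9 params x 2 bytes = 2 GB per billion

print(fp16_weight_footprint_gb(3.8))         # 7.6 (GB required)
print(80.0 - fp16_weight_footprint_gb(3.8))  # 72.4 (GB of headroom)
```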

Recommendation

Given the H100's capabilities, aim for high batch sizes (32 or higher) to maximize throughput. Experiment with context lengths up to the model's 128,000-token limit to find the best balance between performance and how much context is retained. Use an inference framework optimized for NVIDIA GPUs, such as vLLM or NVIDIA's TensorRT-LLM, to further enhance performance. FP16 precision is sufficient, but bfloat16 occupies the same 2 bytes per parameter while offering a wider dynamic range, so switching to it can improve numerical stability at no memory or throughput cost.
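
As a sketch of that recommendation, loading the model in bfloat16 with vLLM is a one-argument change. This is a hedged example: the Hugging Face model ID and prompt are illustrative, and constructor arguments should be checked against your installed vLLM version.

```python
from vllm import LLM, SamplingParams

# Load Phi-3 Mini in bfloat16: same 2-byte-per-parameter footprint as FP16,
# but a wider dynamic range that can improve numerical stability.
llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",  # assumed model ID
    dtype="bfloat16",
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain the KV cache in one paragraph."], params)
print(outputs[0].outputs[0].text)
```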

Recommended Settings

Batch size: 32
Context length: 128,000 tokens
Inference framework: vLLM
Quantization: None (FP16)
Other settings:
- Enable CUDA graph capture
- Use asynchronous data loading
- Profile performance to identify bottlenecks
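
A sketch of how the settings above map onto vLLM's offline API, with a crude throughput check to support the profiling suggestion. Parameter names reflect recent vLLM releases and the model ID is assumed; verify both against your installation.

```python
import time

from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",  # assumed model ID
    dtype="float16",       # "None (FP16)": no quantization
    max_model_len=128000,  # context length from the table above
    max_num_seqs=32,       # cap concurrent sequences at the suggested batch size
    enforce_eager=False,   # leave CUDA graph capture enabled (vLLM's default)
)

# 32 illustrative prompts to fill one batch.
prompts = [f"Summarize topic {i} in two sentences." for i in range(32)]
params = SamplingParams(temperature=0.0, max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"~{generated / elapsed:.1f} generated tokens/sec across the batch")
```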

Frequently Asked Questions

Is Phi-3 Mini 3.8B compatible with NVIDIA H100 PCIe?
Yes, Phi-3 Mini 3.8B is perfectly compatible with the NVIDIA H100 PCIe, offering substantial VRAM headroom and excellent performance.
What VRAM is needed for Phi-3 Mini 3.8B?
Phi-3 Mini 3.8B requires approximately 7.6GB of VRAM in FP16 precision (3.8B parameters × 2 bytes per parameter); the KV cache adds to this at long context lengths and large batch sizes.
How fast will Phi-3 Mini 3.8B run on NVIDIA H100 PCIe?
Expect approximately 117 tokens/second with a batch size of 32, but actual performance may vary depending on the specific inference framework, context length, and other settings.