Can I run Phi-3 Small 7B (INT8, 8-bit integer) on NVIDIA H100 PCIe?

Verdict: Perfect
Yes, you can run this model!
GPU VRAM: 80.0GB
Required: 7.0GB
Headroom: +73.0GB

VRAM Usage

7.0GB of 80.0GB used (~9%)

Performance Estimate

Tokens/sec: ~117
Batch size: 32
Context: 128K tokens

Technical Analysis

The NVIDIA H100 PCIe, with 80GB of HBM2e VRAM and 2.0 TB/s of memory bandwidth, is exceptionally well suited to running Phi-3 Small 7B. In INT8 quantized form, the model's weights require only about 7GB of VRAM, leaving roughly 73GB of headroom for the KV cache and activations. That headroom is what pays for large batch sizes and very long contexts (up to 128,000 tokens); it shrinks as batch size and context length grow, but at these margins you are unlikely to hit memory limits in typical configurations. The H100's 14,592 CUDA cores and 456 fourth-generation Tensor Cores accelerate the model's matrix operations, yielding high throughput and low latency, and the Hopper architecture's improvements in tensor throughput further aid efficient execution.
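As a sanity check on the 7GB figure: INT8 stores one byte per parameter, so weight memory is roughly the parameter count in bytes. The sketch below adds a rough KV-cache term to show what actually consumes the headroom as batch size and context grow; the layer and head dimensions in it are illustrative assumptions, not Phi-3 Small's published configuration.

```python
# Back-of-envelope VRAM math for Phi-3 Small 7B at INT8.
# The KV-cache dimensions (layers, kv_heads, head_dim) are illustrative
# assumptions, not the model's published config.

PARAMS = 7.0e9           # parameter count
BYTES_PER_PARAM = 1      # INT8 stores one byte per weight

def weight_vram_gb() -> float:
    return PARAMS * BYTES_PER_PARAM / 1e9   # ~7.0 GB, matching the figure above

def kv_cache_gb(batch: int, ctx: int, layers: int = 32,
                kv_heads: int = 8, head_dim: int = 128,
                bytes_per_value: int = 2) -> float:
    # 2x for keys and values; an FP16 cache (2 bytes/value) is assumed.
    return 2 * batch * ctx * layers * kv_heads * head_dim * bytes_per_value / 1e9

if __name__ == "__main__":
    print(f"weights: {weight_vram_gb():.1f} GB")
    # The KV cache grows linearly with batch * context, so it is what
    # actually eats into the 73GB headroom at high batch sizes.
    print(f"KV cache (batch=1, ctx=128000): {kv_cache_gb(1, 128_000):.1f} GB")
```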

Recommendation

Given the H100's abundant VRAM and compute, focus on maximizing throughput with larger batch sizes: start at the suggested batch size of 32 and increase incrementally to find the best latency/throughput trade-off for your application. Techniques such as speculative decoding and continuous batching can improve performance further. Profile the model's execution to identify bottlenecks and tune the configuration accordingly, and take advantage of the full 128K context window to experiment with long-context applications. A minimal benchmarking sketch follows.
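One way to run that experiment, assuming vLLM as the serving framework: submit a batch of prompts and time the generation. vLLM performs continuous batching internally, so no explicit batching code is needed. The model ID, prompt set, and request count below are placeholders to adapt to your deployment, and loading INT8 weights depends on the quantized checkpoint you use.

```python
# Rough throughput probe with vLLM. vLLM applies continuous batching
# internally, so submitting many prompts at once is sufficient.
# Model ID and prompts are assumptions -- adjust to your setup.
import time

from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-small-128k-instruct",  # assumed checkpoint
    trust_remote_code=True,
    max_model_len=128_000,
)

prompts = ["Summarize the history of GPUs."] * 32  # one request per batch slot
params = SamplingParams(temperature=0.0, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

Rerunning this probe while varying the number of prompts (32, 64, 128) is a simple way to locate the throughput/latency knee for your workload.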

Recommended Settings

Batch size: 32 (experiment up to 64 or 128)
Context length: 128,000 tokens
Other settings: enable CUDA graph capture; use persistent memory allocators; experiment with different attention mechanisms for potential speedups
Inference framework: vLLM or NVIDIA TensorRT-LLM
Quantization: INT8 (as currently used)
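Assuming vLLM, these settings map onto engine arguments roughly as sketched below: max_num_seqs bounds the number of concurrent sequences (the batch size above), max_model_len sets the context window, and enforce_eager=False leaves CUDA graph capture enabled (vLLM's default). The model ID is again a placeholder, and how INT8 weights load depends on your checkpoint format.

```python
# Sketch: the recommended settings expressed as vLLM engine arguments.
# Values mirror the table above; the checkpoint is an assumption.
from vllm import LLM

llm = LLM(
    model="microsoft/Phi-3-small-128k-instruct",  # assumed model ID
    trust_remote_code=True,
    max_num_seqs=32,        # batch size: 32 (raise toward 64/128 while profiling)
    max_model_len=128_000,  # context length: 128K tokens
    enforce_eager=False,    # False keeps CUDA graph capture enabled (the default)
)
```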

Frequently Asked Questions

Is Phi-3 Small 7B (7.00B) compatible with NVIDIA H100 PCIe?
Yes, Phi-3 Small 7B is perfectly compatible with the NVIDIA H100 PCIe. The H100 has ample VRAM and compute to run the model efficiently.
What VRAM is needed for Phi-3 Small 7B (7.00B)?
In INT8 quantized format, Phi-3 Small 7B requires approximately 7GB of VRAM for the weights, plus additional memory for the KV cache that scales with batch size and context length.
How fast will Phi-3 Small 7B (7.00B) run on NVIDIA H100 PCIe?
You can expect an estimated throughput of around 117 tokens/second. Actual performance may vary depending on batch size, context length, and other optimization settings.