Can I run Phi-3 Mini 3.8B (INT8, 8-bit integer) on an NVIDIA RTX 3090?

Perfect: Yes, you can run this model!
GPU VRAM: 24.0 GB
Required: 3.8 GB
Headroom: +20.2 GB

VRAM Usage

3.8 GB of 24.0 GB used (~16%)
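
As a sanity check, the "Required" figure follows from simple arithmetic: INT8 stores roughly one byte per parameter. A minimal sketch, using the numbers above and ignoring activation and framework overhead:

```python
# Back-of-the-envelope VRAM estimate for INT8 weights.
# Real usage adds activations, KV cache, and framework overhead.

PARAMS = 3.8e9          # Phi-3 Mini parameter count
BYTES_PER_PARAM = 1     # INT8 = 1 byte per weight
GPU_VRAM_GB = 24.0      # RTX 3090

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
headroom_gb = GPU_VRAM_GB - weights_gb

print(f"Weights:  ~{weights_gb:.1f} GB")   # ~3.8 GB
print(f"Headroom: ~{headroom_gb:.1f} GB")  # ~20.2 GB
```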

Performance Estimate

Tokens/sec: ~90.0
Batch size: 26
Context: 128K (128,000 tokens)

Technical Analysis

The NVIDIA RTX 3090, with its 24 GB of GDDR6X VRAM and Ampere architecture, is well suited to running Phi-3 Mini 3.8B, especially when quantized to INT8. The model's weights occupy only about 3.8 GB in INT8 form, leaving roughly 20.2 GB of headroom. That headroom can be spent on larger batch sizes and longer contexts, improving throughput during inference; note, however, that the 3.8 GB figure covers weights only, and the KV cache grows linearly with both context length and batch size, so long-context, high-batch workloads will consume much of the remaining VRAM (see the sizing sketch below). The RTX 3090's memory bandwidth of roughly 0.94 TB/s (936 GB/s) keeps data moving quickly between memory and the compute units, minimizing bottlenecks and keeping the Tensor Cores fed.
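
To make the KV-cache caveat concrete, here is a rough sizing sketch. The layer count and hidden size are assumptions taken from Phi-3 Mini's published configuration, with FP16 cache entries and standard multi-head attention:

```python
# Rough KV-cache sizing sketch (batch size 1).
# ASSUMPTIONS: 32 layers, hidden size 3072 (per Phi-3 Mini's
# published config), FP16 entries, no grouped-query attention.

N_LAYERS = 32
HIDDEN = 3072
BYTES = 2  # FP16

# Each layer caches one key and one value vector per token.
kv_per_token = 2 * N_LAYERS * HIDDEN * BYTES
print(f"KV cache per token: ~{kv_per_token / 1024:.0f} KiB")

for ctx in (4_096, 32_768, 131_072):
    print(f"  {ctx:>7}-token context -> ~{kv_per_token * ctx / 1e9:.1f} GB")
```

Under these assumptions a full 128K-token FP16 cache would not fit alongside the weights, which is one reason serving frameworks cap `max_model_len` and manage the cache with techniques like vLLM's PagedAttention.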

Recommendation

Given the comfortable VRAM headroom, experiment with larger batch sizes to maximize GPU utilization and throughput. A serving framework such as `vLLM` or `text-generation-inference` can deliver significant gains through optimized memory management and kernel implementations (a starter sketch follows below). INT8 quantization offers strong performance with minimal accuracy loss; for tasks where higher precision matters, consider FP16, but budget for the larger footprint (roughly 7.6 GB for the weights alone at 2 bytes per parameter). Monitor GPU utilization and memory usage to fine-tune settings for optimal performance.
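
As a starting point, here is a minimal vLLM offline-inference sketch. The model id and argument values are illustrative assumptions; check your vLLM release's documentation for its supported INT8 quantization paths, which vary across versions:

```python
# Minimal vLLM offline-inference sketch (illustrative settings).
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",  # Hugging Face model id
    dtype="float16",
    max_model_len=32768,           # cap context to bound KV-cache memory
    gpu_memory_utilization=0.90,   # leave a little VRAM free
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain INT8 quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```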

Recommended Settings

Batch size: 26
Context length: 128,000
Other settings: enable CUDA graph capture; use PyTorch 2.0 or later; experiment with attention implementations (e.g. FlashAttention)
Inference framework: vLLM
Suggested quantization: INT8 (see the loading sketch below)
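
One common route to INT8 weights is Hugging Face `transformers` with `bitsandbytes`. A minimal sketch, assuming the 128K-context Phi-3 Mini checkpoint on the Hugging Face Hub:

```python
# Load Phi-3 Mini with LLM.int8() weight quantization via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-128k-instruct"  # assumed checkpoint
quant = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",  # place layers on the RTX 3090
)

free, total = torch.cuda.mem_get_info()  # sanity-check remaining headroom
print(f"Free VRAM: {free / 1e9:.1f} / {total / 1e9:.1f} GB")
```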

Frequently Asked Questions

Is Phi-3 Mini 3.8B (3.80B) compatible with NVIDIA RTX 3090?
Yes, Phi-3 Mini 3.8B is fully compatible with the NVIDIA RTX 3090, providing excellent performance thanks to the RTX 3090's large VRAM capacity.
What VRAM is needed for Phi-3 Mini 3.8B (3.80B)?
The INT8 quantized version of Phi-3 Mini 3.8B requires approximately 3.8 GB of VRAM for the model weights; the KV cache adds to this at longer contexts and larger batch sizes.
How fast will Phi-3 Mini 3.8B (3.80B) run on NVIDIA RTX 3090?
You can expect approximately 90 tokens per second when running Phi-3 Mini 3.8B on an RTX 3090, but this can vary based on the inference framework, batch size, and other settings.
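
Rather than taking the ~90 tokens/sec estimate on faith, you can measure throughput on your own setup. A quick sketch that reuses the `model` and `tokenizer` from the loading example above; real-world numbers depend heavily on batch size, context length, and serving framework:

```python
# Quick-and-dirty single-request throughput measurement.
import time
import torch

inputs = tokenizer("Write a haiku about GPUs.", return_tensors="pt").to(model.device)

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"~{new_tokens / elapsed:.1f} tokens/sec")
```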