Can I run Phi-3 Mini 3.8B (INT8, 8-bit integer) on the NVIDIA RTX 3090 Ti?

Perfect
Yes, you can run this model!
GPU VRAM: 24.0GB
Required: 3.8GB
Headroom: +20.2GB

VRAM Usage

3.8GB of 24.0GB used (~16%)

Performance Estimate

Tokens/sec: ~90.0
Batch size: 26
Context: 128K (128,000 tokens)

Technical Analysis

The NVIDIA RTX 3090 Ti, with its 24GB of GDDR6X VRAM, is exceptionally well suited to running the Phi-3 Mini 3.8B model. Quantized to INT8, the model needs only about 3.8GB of VRAM, leaving roughly 20.2GB of headroom; that spare capacity accommodates the KV cache for larger batch sizes and longer context lengths, which improves throughput. The card's 1.01 TB/s of memory bandwidth keeps weights streaming quickly from VRAM to the compute units, which is crucial for minimizing latency during inference, and its 10,752 CUDA cores and 336 Tensor Cores supply ample compute for the matrix multiplications at the core of transformer models like Phi-3 Mini.
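As a rough check on the 3.8GB figure, here is a minimal Python sketch that estimates weights-only VRAM from parameter count and bytes per parameter (decimal gigabytes; KV cache and activation overhead are not included, so real usage sits somewhat higher):

```python
# Weights-only VRAM estimate: parameter count x bytes per parameter,
# in decimal gigabytes. KV cache and activation overhead are NOT
# included, so real usage will be somewhat higher than these numbers.

def weight_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate VRAM (GB) needed just to hold the model weights."""
    return n_params * bytes_per_param / 1e9

PHI3_MINI_PARAMS = 3.8e9  # 3.8B parameters

print(f"INT8: {weight_vram_gb(PHI3_MINI_PARAMS, 1.0):.1f} GB")  # ~3.8 GB
print(f"INT4: {weight_vram_gb(PHI3_MINI_PARAMS, 0.5):.1f} GB")  # ~1.9 GB
print(f"FP16: {weight_vram_gb(PHI3_MINI_PARAMS, 2.0):.1f} GB")  # ~7.6 GB
```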

Recommendation

To maximize performance, use an optimized inference framework such as `llama.cpp` or `vLLM`, both of which exploit the RTX 3090 Ti's architecture efficiently. Start with a batch size of 26 and increase it until tokens/sec stops improving. Given the card's headroom, you should be able to use the full 128K (128,000-token) context length, though the KV cache grows with context, so monitor VRAM alongside GPU utilization and temperature. If you do hit VRAM limits with larger batch sizes or longer contexts, drop to INT4 quantization to reduce the memory footprint.
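For `llama.cpp`, the llama-cpp-python bindings expose these knobs directly. The sketch below is one plausible way to apply the settings above, not a definitive setup: the GGUF path is an assumed local filename (Q8_0 is llama.cpp's INT8-class quantization), and `n_batch` is llama.cpp's prompt-processing batch, not the concurrent-request batch size quoted in the estimate.

```python
# Hypothetical llama-cpp-python setup matching the suggested settings.
# The GGUF path below is an assumed local file; point it at your
# actual quantized Phi-3 Mini checkpoint.
from llama_cpp import Llama

llm = Llama(
    model_path="./phi-3-mini-128k-instruct-q8_0.gguf",  # assumed filename
    n_gpu_layers=-1,   # offload every layer to the RTX 3090 Ti
    n_ctx=128_000,     # full 128K context; the KV cache grows with this,
                       # so lower it if VRAM runs out
    n_batch=512,       # prompt-processing batch (not the request batch above)
)

out = llm("Explain KV caching in one short paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```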

Recommended Settings

Batch size: 26 (start here and adjust based on VRAM usage)
Context length: 128,000 tokens
Other settings: enable CUDA graph capture for reduced latency; use pinned memory for faster host-to-device transfers; experiment with different attention implementations for additional gains
Inference framework: llama.cpp or vLLM
Quantization suggested: INT8 (default); consider INT4 for larger batch sizes or longer contexts
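If you prefer `vLLM`, a roughly equivalent offline setup might look like the sketch below. The model ID is the public Hugging Face checkpoint (assumed here to be `microsoft/Phi-3-mini-128k-instruct`); vLLM loads it in FP16/BF16 by default (~7.6GB of weights), which still fits easily in 24GB, and it batches concurrent requests automatically, so there is no explicit "batch size 26" knob to set.

```python
# Rough vLLM equivalent (offline API). Requests are batched and
# scheduled automatically up to the GPU memory budget.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",
    max_model_len=128_000,        # cap the context; lower if the KV cache is too large
    gpu_memory_utilization=0.90,  # leave ~10% of the 24GB as safety margin
    trust_remote_code=True,       # some vLLM versions need this for Phi-3
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize the trade-offs of INT8 vs INT4 quantization."], params)
print(outputs[0].outputs[0].text)
```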

Frequently Asked Questions

Is Phi-3 Mini 3.8B compatible with NVIDIA RTX 3090 Ti?
Yes, the Phi-3 Mini 3.8B model is fully compatible with the NVIDIA RTX 3090 Ti. The GPU has sufficient VRAM and compute power to run the model effectively.
What VRAM is needed for Phi-3 Mini 3.8B?
The Phi-3 Mini 3.8B model requires approximately 3.8GB of VRAM when quantized to INT8.
How fast will Phi-3 Mini 3.8B run on NVIDIA RTX 3090 Ti?
You can expect approximately 90 tokens per second on the NVIDIA RTX 3090 Ti, but this can vary depending on the specific inference framework, batch size, and other optimization settings.
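As a sanity check on that figure: single-stream decoding is usually memory-bandwidth-bound, since every generated token streams the full weight set from VRAM. A rough upper bound is therefore bandwidth divided by weight bytes, and the quoted ~90 tokens/sec sits well below that ceiling, which is expected once kernel launch overhead, KV-cache reads, and framework scheduling are accounted for. A minimal sketch of the arithmetic:

```python
# Bandwidth-bound ceiling for single-stream decode throughput.
BANDWIDTH_GB_S = 1010  # RTX 3090 Ti memory bandwidth, ~1.01 TB/s
WEIGHTS_GB = 3.8       # Phi-3 Mini 3.8B at INT8

ceiling = BANDWIDTH_GB_S / WEIGHTS_GB
print(f"bandwidth-bound ceiling: ~{ceiling:.0f} tokens/sec")  # ~266 tokens/sec
print("quoted estimate:         ~90 tokens/sec")
```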