The NVIDIA RTX 3090 Ti, with its substantial 24GB of GDDR6X VRAM, is exceptionally well-suited for running the Phi-3 Mini 3.8B model. Quantized to INT8, the model's weights occupy roughly 3.8GB of VRAM, leaving about 20GB of headroom for the KV cache, activations, and framework overhead. That headroom is what permits larger batch sizes and longer context lengths, improving throughput. Furthermore, the RTX 3090 Ti's memory bandwidth of 1.01 TB/s ensures rapid transfer between the GPU's compute units and VRAM, which matters because token generation is typically memory-bandwidth-bound. The 10752 CUDA cores and 336 Tensor Cores provide ample compute for the matrix multiplications at the heart of transformer models like Phi-3 Mini.
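As a rough sanity check on these numbers, here is a minimal back-of-the-envelope sketch of the VRAM budget. The architecture constants (32 layers, 32 KV heads, head dimension 96, no grouped-query attention) are assumptions based on the published Phi-3 Mini configuration; verify them against the `config.json` of the exact checkpoint you download.

```python
# Back-of-the-envelope VRAM budget for Phi-3 Mini 3.8B on a 24GB card.
# The architecture constants below are assumptions taken from the published
# Phi-3 Mini config (32 layers, 32 KV heads, head dim 96); adjust if your
# checkpoint's config.json differs.

GB = 1e9  # decimal gigabytes, matching the marketing "24GB" figure

PARAMS = 3.8e9       # parameter count
WEIGHT_BYTES = 1     # bytes per weight at INT8
N_LAYERS = 32
N_KV_HEADS = 32      # Phi-3 Mini uses full multi-head attention (no GQA)
HEAD_DIM = 96
KV_BYTES = 2         # bytes per KV-cache element at FP16

weight_gb = PARAMS * WEIGHT_BYTES / GB                                 # ~3.8 GB
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES   # K and V, all layers

def kv_cache_gb(context_len, batch_size=1):
    """KV-cache size in GB for a given context length and batch size."""
    return context_len * batch_size * kv_bytes_per_token / GB

print(f"INT8 weights:        {weight_gb:.1f} GB")
print(f"KV cache @ 8K ctx:   {kv_cache_gb(8_192):.1f} GB")
print(f"KV cache @ 32K ctx:  {kv_cache_gb(32_768):.1f} GB")
print(f"KV cache @ 128K ctx: {kv_cache_gb(128_000):.1f} GB  (exceeds 24 GB on its own)")
```

The takeaway is that the quantized weights are a small fraction of the 24GB; it is the KV cache that ultimately limits how far batch size and context length can be pushed.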
To maximize performance, use an optimized inference framework such as `llama.cpp` or `vLLM`, both of which are designed to exploit the RTX 3090 Ti's Ampere architecture efficiently. Start with a batch size of 26 and increase it until tokens/sec shows diminishing returns. Long contexts are practical on this card, but as the KV-cache estimate above shows, the full 128000-token window of the 128K variant would exceed the remaining ~20GB on its own, so cap the maximum context length or quantize the KV cache if you need to push toward it. Monitor GPU utilization and temperature to confirm the card is not throttling. If you still hit VRAM limits with larger batch sizes or longer contexts, consider quantizing the weights further to INT4, which roughly halves their footprint relative to INT8.
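For vLLM specifically, a minimal serving sketch might look like the following. The Hugging Face repository id `microsoft/Phi-3-mini-128k-instruct` is the 128K-context variant, and the `max_model_len` and `gpu_memory_utilization` values are illustrative starting points rather than tuned settings.

```python
# Minimal vLLM serving sketch for Phi-3 Mini on a single RTX 3090 Ti.
# Values below are starting points, not tuned settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",  # 128K-context variant on Hugging Face
    dtype="float16",              # FP16 weights (~7.6 GB) also fit comfortably in 24 GB;
                                  # INT8/INT4 requires a pre-quantized checkpoint (e.g. GPTQ/AWQ)
    max_model_len=32_768,         # cap the context so the KV cache stays within VRAM
    gpu_memory_utilization=0.90,  # leave a little headroom for CUDA overhead
    trust_remote_code=True,       # older vLLM releases need this for Phi-3
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

prompts = ["Explain why memory bandwidth matters for LLM inference."]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```

Note that vLLM batches requests with its continuous-batching scheduler, so the batch-size sweep described above amounts to submitting more concurrent prompts (or raising `max_num_seqs`) and watching aggregate tokens/sec; with `llama.cpp` you would instead load a quantized GGUF file and tune its batch-size and context-length options.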