The NVIDIA RTX 3090, with its 24 GB of GDDR6X VRAM, is well suited for running the Phi-3 Medium 14B model, especially with INT8 quantization. Quantizing the weights to 8 bits brings the memory footprint down to roughly 14 GB, leaving about 10 GB of VRAM headroom for the KV cache, larger batch sizes, and longer context lengths without running into out-of-memory errors. The RTX 3090's memory bandwidth of roughly 936 GB/s (0.94 TB/s) keeps data moving quickly between VRAM and the compute units, which is crucial for maintaining high inference speeds. Its 10,496 CUDA cores and 328 Tensor Cores provide ample computational power for the matrix multiplications and other operations inherent in transformer-based models like Phi-3.
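To make the headroom figure concrete, here is a minimal back-of-the-envelope sketch of the arithmetic behind it. The parameter count and byte-per-weight figures are the usual rough assumptions for INT8 weights, not measured values for any particular checkpoint:

```python
# Rough VRAM estimate for Phi-3 Medium 14B on a 24 GB RTX 3090.
# All figures are approximate assumptions, not measurements.

PARAMS = 14e9              # ~14 billion parameters
BYTES_PER_PARAM_INT8 = 1   # INT8 stores one byte per weight
GPU_VRAM_GB = 24           # RTX 3090

weights_gb = PARAMS * BYTES_PER_PARAM_INT8 / 1e9
headroom_gb = GPU_VRAM_GB - weights_gb

print(f"Quantized weights: ~{weights_gb:.0f} GB")   # ~14 GB
print(f"Headroom for KV cache, activations, batching: ~{headroom_gb:.0f} GB")  # ~10 GB
```

In practice the KV cache and activation buffers eat into that headroom as batch size and context length grow, which is why the margin matters.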
For optimal performance with Phi-3 Medium 14B on the RTX 3090, start with an efficient inference framework such as `llama.cpp` or `vLLM`. Experiment with different batch sizes to balance throughput against latency: a small batch of around 3 requests is a reasonable starting point, and increasing it can significantly improve tokens/sec if your application tolerates higher latency. Likewise, configure a context length shorter than the 128K maximum if you don't need the full window, since shorter contexts reduce KV-cache memory and generally speed up processing. Monitor GPU utilization and VRAM usage to fine-tune these parameters for your workload, profile your application, and consider optimized attention implementations such as FlashAttention or paged attention to squeeze out further gains; an example setup follows below.
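The sketch below shows one way these recommendations might look with vLLM's Python API. The model ID, context length, sampling settings, and batch size are illustrative assumptions; adjust them to the quantized checkpoint and workload you actually use:

```python
# Minimal sketch: serving Phi-3 Medium with vLLM on a single RTX 3090.
# Model ID and all parameter values are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-3-medium-128k-instruct",  # assumed Hugging Face model ID
    max_model_len=8192,            # well below the 128K max -> smaller KV cache, faster
    gpu_memory_utilization=0.90,   # leave a little VRAM free to avoid OOM spikes
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

# Small batch of prompts as a starting point; raise it if latency is less critical.
prompts = ["Explain INT8 quantization in one paragraph."] * 3
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.outputs[0].text)
```

vLLM's paged attention handles KV-cache management automatically; with `llama.cpp` the equivalent levers are the GGUF quantization level, the context size, and the number of layers offloaded to the GPU.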