Can I run Phi-3 Medium 14B on NVIDIA RTX 3090?

Fail/OOM: this GPU doesn't have enough VRAM.

GPU VRAM: 24.0 GB
Required: 28.0 GB
Headroom: -4.0 GB

VRAM usage: 24.0 GB of 24.0 GB (100% used)

Technical Analysis

The NVIDIA RTX 3090, with its 24GB of GDDR6X VRAM, falls short of the roughly 28GB required to load Phi-3 Medium 14B in FP16 precision: at 2 bytes per parameter, the 14 billion FP16 weights alone occupy about 28GB before the KV cache and activations are counted. This 4GB shortfall means the model cannot be loaded onto the GPU in its full FP16 format. While the RTX 3090 offers high memory bandwidth (0.94 TB/s) and a large number of CUDA and Tensor cores (10,496 and 328, respectively), these specifications are secondary when the model exceeds the available VRAM: attempting to load it will simply fail with out-of-memory errors, because the weights and activations cannot be stored on the GPU. Memory bandwidth speeds up data transfer during inference but cannot compensate for a fundamental lack of memory capacity.
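
As a rough illustration of where the 28GB figure comes from, the sketch below applies the usual 2-bytes-per-parameter rule for FP16 weights and adds an approximate KV-cache term. The layer count, KV-head count, and head dimension are approximate Phi-3 Medium values assumed for illustration, not figures taken from this report.

```python
# Back-of-envelope FP16 memory estimate for a 14B-parameter model.
# The 2 bytes/parameter figure is exact for FP16 weights; the KV-cache
# term uses assumed, approximate model dimensions for illustration only.

PARAMS = 14e9          # Phi-3 Medium parameter count
BYTES_PER_PARAM = 2    # FP16 = 2 bytes per weight

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"FP16 weights alone: {weights_gb:.1f} GB")        # ~28 GB

# Assumed model shape for the KV cache (approximate Phi-3 Medium values):
layers, kv_heads, head_dim, ctx = 40, 10, 128, 4096
kv_cache_gb = 2 * layers * kv_heads * head_dim * ctx * BYTES_PER_PARAM / 1e9
print(f"KV cache at 4096 ctx: {kv_cache_gb:.2f} GB")

print(f"Estimated total: {weights_gb + kv_cache_gb:.1f} GB vs 24 GB available")
```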

Recommendation

To run Phi-3 Medium 14B on an RTX 3090, quantization is essential. Consider Q4_K_M or lower quantization levels via llama.cpp or a similar framework: a Q4_K_M build shrinks the weights to roughly 8-9GB, comfortably within the RTX 3090's 24GB VRAM limit. Experiment with different quantization levels to balance memory usage against acceptable quality degradation. Alternatively, offload some layers to system RAM, although this substantially reduces inference speed. If feasible, upgrade to a GPU with more VRAM or split the model across multiple GPUs using model parallelism.
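
A minimal sketch of the quantized route using the llama-cpp-python bindings is shown below. The GGUF filename is a placeholder, and `n_gpu_layers=-1` assumes the Q4_K_M model fits entirely in the 3090's 24GB of VRAM.

```python
# Minimal sketch: running a Q4_K_M GGUF of Phi-3 Medium with llama-cpp-python.
# The model path is a placeholder; obtain a quantized GGUF file separately.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3-medium-Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,   # offload all layers to the GPU (assumes Q4_K_M fits in 24 GB)
    n_ctx=4096,        # recommended context length from the settings below
    n_batch=512,       # prompt-processing batch size; decode batch size stays 1
)

out = llm(
    "Explain the difference between FP16 and Q4_K_M quantization.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```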

Recommended Settings

Batch size: 1
Context length: 4096
Inference framework: llama.cpp
Suggested quantization: Q4_K_M

Other settings:
- Reduce the context length to the minimum required for your task to minimize VRAM usage.
- Experiment with different quantization methods to find the best balance between performance and memory usage.
- Monitor VRAM usage closely during inference to ensure you are not exceeding the available memory (see the monitoring sketch below).
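
For the monitoring suggestion above, a small sketch using the NVIDIA Management Library bindings could look like the following; the use of the pynvml package and the 95% warning threshold are assumptions about your tooling, not part of the recommended settings.

```python
# Sketch: check GPU memory usage during inference via the pynvml bindings.
# Assumes the pynvml (nvidia-ml-py) package is installed and an NVIDIA driver is present.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU (the RTX 3090)

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
used_gb = mem.used / 1e9
total_gb = mem.total / 1e9
print(f"VRAM: {used_gb:.1f} / {total_gb:.1f} GB used")

# Assumed threshold: warn when usage approaches the 24 GB limit.
if mem.used / mem.total > 0.95:
    print("Warning: near the VRAM limit; reduce context length or quantize further.")

pynvml.nvmlShutdown()
```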

Frequently Asked Questions

Is Phi-3 Medium 14B (14.00B) compatible with NVIDIA RTX 3090?
Not directly. The RTX 3090's 24GB VRAM is insufficient to load the Phi-3 Medium 14B model in FP16. Quantization is required.
What VRAM is needed for Phi-3 Medium 14B (14.00B)?
The Phi-3 Medium 14B model requires approximately 28GB of VRAM in FP16 precision.
How fast will Phi-3 Medium 14B (14.00B) run on NVIDIA RTX 3090?
Performance depends heavily on the quantization level and on how much of the model fits in VRAM. With a Q4_K_M quantization fully offloaded to the RTX 3090, decoding is largely memory-bandwidth-bound and typically reaches tens of tokens per second; any layers spilled to system RAM will cut throughput sharply. Context length and batch size also affect speed, so some experimentation is needed to find the optimal settings.
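
As a rough ceiling, single-stream decode speed is limited by how fast the weights can be streamed from VRAM, so memory bandwidth divided by model size gives an upper bound on tokens per second. The ~8.5GB Q4_K_M size below is an assumed approximate figure, and real-world throughput will be noticeably lower than these ceilings.

```python
# Rough, bandwidth-bound upper estimate for single-stream decode speed.
# Real throughput is lower due to compute, KV-cache reads, and framework overhead.

bandwidth_gb_s = 936          # RTX 3090 memory bandwidth (~0.94 TB/s)
model_size_gb_q4 = 8.5        # assumed approximate Q4_K_M size of a 14B model
model_size_gb_fp16 = 28.0     # FP16 size (does not fit in 24 GB)

print(f"Q4_K_M ceiling: ~{bandwidth_gb_s / model_size_gb_q4:.0f} tokens/s")
print(f"FP16 ceiling:   ~{bandwidth_gb_s / model_size_gb_fp16:.0f} tokens/s (if it fit)")
```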