Can I run Mistral Large 2 on NVIDIA RTX 3090 Ti?

Fail/OOM
This GPU doesn't have enough VRAM
GPU VRAM: 24.0GB
Required: 246.0GB
Headroom: -222.0GB

VRAM Usage: 100% used (24.0GB of 24.0GB)

Technical Analysis

The NVIDIA RTX 3090 Ti, while a powerful GPU, falls well short of the VRAM required to run Mistral Large 2. With 123 billion parameters at FP16 precision (2 bytes per parameter), the model's weights alone occupy approximately 246GB. The RTX 3090 Ti offers only 24GB of VRAM, a deficit of 222GB, so the model cannot be loaded onto the GPU for inference. The card's memory bandwidth, substantial at 1.01 TB/s, is irrelevant when the model cannot fit in memory at all.
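The 246GB figure follows directly from parameter count times bytes per parameter; a minimal sketch of that arithmetic (weights only, before KV cache and activation overhead):

```python
# Estimate FP16 VRAM needed for model weights alone (excludes KV cache
# and activation overhead, which add more on top).
PARAMS = 123e9          # Mistral Large 2 parameter count
BYTES_PER_PARAM = 2     # FP16 = 2 bytes per parameter

required_gb = PARAMS * BYTES_PER_PARAM / 1e9
available_gb = 24.0     # RTX 3090 Ti VRAM
headroom_gb = available_gb - required_gb

print(f"Required: {required_gb:.1f}GB, headroom: {headroom_gb:.1f}GB")
```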

Due to the VRAM limitation, directly running Mistral Large 2 on the RTX 3090 Ti is infeasible without employing advanced techniques like model quantization or offloading layers to system RAM. Even with these strategies, performance will be significantly degraded. Expect extremely low tokens per second and severely limited batch sizes, making real-time or interactive applications impractical. The Ampere architecture of the RTX 3090 Ti, while offering substantial CUDA and Tensor cores, cannot overcome the fundamental memory constraint in this scenario.

Recommendation

Given the substantial VRAM deficit, running Mistral Large 2 directly on the RTX 3090 Ti is not recommended. Quantization shrinks the footprint considerably (roughly 123GB at 8-bit and 62GB at 4-bit for the weights alone), but even 4-bit quantization leaves the model far beyond 24GB, so some layers must also be offloaded to system RAM, which drastically reduces inference speed. Frameworks like `llama.cpp` support both quantized models and CPU offloading. For usable performance, consider a multi-GPU setup with sufficient combined VRAM, or a cloud-based inference service designed for large language models.
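To see why quantization alone cannot close the gap, the same weight-size arithmetic can be extended to common bit widths (the figures are approximations; real quantized files vary slightly because mixed-precision schemes keep some tensors at higher precision):

```python
# Approximate weight sizes for Mistral Large 2 (123B parameters) at
# common quantization levels, compared against a 24GB card.
PARAMS = 123e9
VRAM_GB = 24.0  # RTX 3090 Ti

for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size_gb = PARAMS * bits / 8 / 1e9
    verdict = "fits" if size_gb <= VRAM_GB else "does not fit"
    print(f"{name}: ~{size_gb:.0f}GB -> {verdict} in {VRAM_GB}GB")
```

Even the smallest of these, ~62GB at 4-bit, is more than two and a half times the card's VRAM.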

If you proceed with quantization and offloading, start with a very small batch size (e.g., 1) and a reduced context length to minimize VRAM usage. Closely monitor VRAM usage and adjust settings accordingly. Be prepared for significantly slower inference speeds compared to GPUs with sufficient VRAM. As an alternative, consider using smaller models that fit within the RTX 3090 Ti's VRAM capacity, even if they are less powerful than Mistral Large 2.
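If you do try offloading, a rough way to pick how many transformer layers to keep on the GPU is to divide a VRAM budget by the per-layer weight size. The figures below are illustrative assumptions, not measurements: Mistral Large 2 is reported to have 88 transformer layers (verify against your model's metadata), and the 20GB budget leaves headroom for KV cache and CUDA overhead:

```python
# Rough estimate of how many layers of a 4-bit quantized Mistral Large 2
# could be kept on a 24GB card. All figures are approximations.
MODEL_GB = 61.5       # ~4-bit weights for 123B parameters
N_LAYERS = 88         # assumed transformer layer count -- check your model
BUDGET_GB = 20.0      # VRAM budget, leaving ~4GB for KV cache/overhead

per_layer_gb = MODEL_GB / N_LAYERS
gpu_layers = int(BUDGET_GB / per_layer_gb)
print(f"~{per_layer_gb:.2f}GB per layer; try offloading {gpu_layers} layers to GPU")
```

This puts only about a third of the model on the GPU; the remaining layers run from system RAM, which is why throughput collapses.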

Recommended Settings

Batch Size: 1
Context Length: 2048
Inference Framework: llama.cpp
Suggested Quantization: Q4_K_M or lower
Other Settings:
- Use --threads to adjust CPU usage for offloading
- Experiment with different quantization methods to find the best balance between VRAM usage and performance
- Monitor VRAM usage closely during inference

Frequently Asked Questions

Is Mistral Large 2 compatible with NVIDIA RTX 3090 Ti?
No, the RTX 3090 Ti does not have enough VRAM to run Mistral Large 2 without significant modifications and performance degradation.
What VRAM is needed for Mistral Large 2?
Mistral Large 2 requires approximately 246GB of VRAM when using FP16 precision.
How fast will Mistral Large 2 run on NVIDIA RTX 3090 Ti?
Due to VRAM limitations, expect extremely slow inference speeds (very low tokens per second) and a very small batch size, making it impractical for most real-time applications. Performance will be severely limited even with quantization and offloading techniques.