The NVIDIA RTX 3090 Ti, while a powerful GPU, falls far short of the VRAM requirements for running Mistral Large 2. With 123 billion parameters at FP16 precision (2 bytes per parameter), the model's weights alone require approximately 246GB of VRAM. The RTX 3090 Ti offers only 24GB, a deficit of 222GB, so the entire model cannot be loaded onto the GPU for inference. Memory bandwidth, while substantial at 1.01 TB/s on the 3090 Ti, is irrelevant when the model cannot fit in memory at all.
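The 246GB figure follows directly from the parameter count; a minimal sketch of the arithmetic (weights only, ignoring KV cache and runtime overhead):

```python
# Estimate VRAM needed to hold a model's weights at a given precision.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB (decimal) = parameters * bytes per parameter."""
    return params_billion * 1e9 * bytes_per_param / 1e9

PARAMS_B = 123       # Mistral Large 2 parameter count
GPU_VRAM_GB = 24     # RTX 3090 Ti

fp16 = weight_vram_gb(PARAMS_B, 2.0)  # FP16 = 2 bytes per parameter
print(f"FP16 weights: {fp16:.0f} GB, deficit: {fp16 - GPU_VRAM_GB:.0f} GB")
# → FP16 weights: 246 GB, deficit: 222 GB
```

Note this counts weights only; the KV cache and activations need additional memory on top.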
Due to this VRAM limitation, running Mistral Large 2 directly on the RTX 3090 Ti is infeasible without techniques such as model quantization or offloading layers to system RAM. Even with these strategies, performance will be severely degraded: expect very low tokens per second and minimal batch sizes, making real-time or interactive applications impractical. The RTX 3090 Ti's Ampere architecture, despite its ample CUDA and Tensor cores, cannot overcome this fundamental memory constraint.
Given the shortfall, running Mistral Large 2 on a single RTX 3090 Ti is not recommended. Quantization to 8-bit or 4-bit substantially reduces the VRAM footprint (to roughly 123GB and 62GB respectively), but even 4-bit weights exceed the card's 24GB, so some layers must additionally be offloaded to system RAM; frameworks like `llama.cpp` support both quantized models and layer offloading, though offloading drastically reduces inference speed. For workable performance, consider a multi-GPU setup with sufficient combined VRAM or a cloud-based inference service designed for large language models.
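As a rough weight-only check (again ignoring KV cache and runtime overhead), even aggressive quantization leaves the model larger than the card's 24GB:

```python
# Weight-only footprint at common quantization levels.
PARAMS_B = 123       # Mistral Large 2 parameter count
GPU_VRAM_GB = 24     # RTX 3090 Ti

for name, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("4-bit", 0.5)]:
    gb = PARAMS_B * bytes_per_param   # billions of params * bytes each = GB
    verdict = "fits" if gb <= GPU_VRAM_GB else "does not fit"
    print(f"{name}: ~{gb:.1f} GB -> {verdict} in {GPU_VRAM_GB} GB")
```

Even at 4-bit (~61.5GB of weights), roughly 60% of the model would still have to live in system RAM.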
If you do proceed with quantization and offloading, start with a batch size of 1 and a reduced context length to minimize VRAM usage, monitor VRAM closely, and adjust settings from there. Be prepared for inference far slower than on hardware with sufficient VRAM. As an alternative, consider smaller models that fit entirely within the RTX 3090 Ti's 24GB, even if they are less capable than Mistral Large 2.
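To gauge how much would end up offloaded, a rough GPU/CPU layer split can be computed from a per-layer weight estimate. The figures below are illustrative assumptions (a uniform per-layer size, an assumed 88-layer depth, and 2GB reserved for KV cache and runtime overhead), not published specifications:

```python
# Rough GPU/CPU layer split for offloaded inference.
# ASSUMPTIONS: uniform layer sizes, 88 layers (illustrative),
# 2 GB of VRAM reserved for KV cache and runtime overhead.
def layer_split(model_gb: float, n_layers: int, vram_gb: float,
                reserve_gb: float = 2.0) -> tuple[int, int]:
    per_layer = model_gb / n_layers                      # GB per layer
    gpu_layers = int((vram_gb - reserve_gb) // per_layer)
    gpu_layers = max(0, min(gpu_layers, n_layers))       # clamp to valid range
    return gpu_layers, n_layers - gpu_layers

# 4-bit weights: ~61.5 GB total on a 24 GB card.
gpu, cpu = layer_split(model_gb=61.5, n_layers=88, vram_gb=24)
print(f"GPU layers: {gpu}, offloaded to system RAM: {cpu}")
# → GPU layers: 31, offloaded to system RAM: 57
```

The GPU-layer count is the kind of value you would hand to llama.cpp's `--n-gpu-layers` option; with most layers traversing the PCIe bus each token, throughput drops accordingly.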