Can I run Mistral Large 2 on NVIDIA H100 PCIe?

Fail/OOM
This GPU doesn't have enough VRAM
GPU VRAM: 80.0GB
Required: 246.0GB
Headroom: -166.0GB

VRAM Usage

80.0GB of 80.0GB (100% used)

Technical Analysis

The NVIDIA H100 PCIe, while a powerful GPU, falls short of the VRAM required to run Mistral Large 2 in FP16 precision. At 2 bytes per parameter, the model's 123 billion parameters need roughly 246GB of VRAM for the weights alone, while the H100 PCIe offers only 80GB of HBM2e memory. This 166GB deficit means the full model cannot be loaded onto the GPU, so direct inference is impossible without significant optimization. The H100's robust 2.0 TB/s memory bandwidth is moot when the binding constraint is memory capacity, and the model's 128,000-token context length adds further memory pressure from the KV cache during inference.
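The weights-only arithmetic above can be sketched in a few lines (this estimates only the memory for model weights; KV cache, activations, and framework overhead come on top):

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Estimate GPU memory needed just to hold the model weights."""
    return num_params * bytes_per_param / 1e9

# FP16 stores 2 bytes per parameter, so a 123B-parameter model needs:
print(weight_memory_gb(123e9, 2.0))  # 246.0 GB -- far beyond the H100 PCIe's 80 GB
```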

Recommendation

To run Mistral Large 2 on this GPU, you'll need to combine several optimization strategies. Quantization is essential: 4-bit or 8-bit weight quantization (e.g., AWQ, GPTQ, or bitsandbytes NF4) drastically reduces the model's memory footprint. Even with quantization, offloading some layers to system RAM (CPU) may be necessary, which will significantly slow inference. Alternatively, distribute inference across multiple GPUs if available. If none of these options is viable, consider a smaller model or a cloud-based inference service with sufficient GPU resources; cloud providers often offer optimized environments and managed infrastructure for large language model inference.
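As a rough, weights-only comparison (KV cache and activations add more on top), the footprint at each precision shows why 4-bit quantization is the only single-GPU option here:

```python
PARAMS = 123e9        # Mistral Large 2 parameter count
H100_PCIE_GB = 80.0   # H100 PCIe VRAM

for name, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    verdict = "fits" if gb < H100_PCIE_GB else "does not fit"
    print(f"{name}: {gb:.1f} GB of weights -> {verdict} in {H100_PCIE_GB:.0f} GB")
# FP16: 246.0 GB -> does not fit
# INT8: 123.0 GB -> does not fit
# INT4:  61.5 GB -> fits (weights only; KV cache still competes for the remainder)
```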

Recommended Settings

Batch Size
1 (start with the lowest possible batch size and …
Context Length
Reduce context length to the bare minimum require…
Other Settings
Enable CPU offloading if necessary; use a smaller model if performance is critical; explore techniques like model parallelism if using multiple GPUs
Inference Framework
vLLM or text-generation-inference (for optimized …
Quantization Suggested
4-bit (e.g., AWQ, GPTQ, or NF4) or 8-bit
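The context-length setting matters because the KV cache grows linearly with the number of tokens. A minimal sketch, using illustrative architecture values (layer count, KV-head count, and head dimension below are assumptions for demonstration, not confirmed Mistral Large 2 specs):

```python
def kv_cache_gb(tokens: int, layers: int = 88, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim bytes per token."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_val
    return tokens * per_token_bytes / 1e9

print(kv_cache_gb(131072))  # ~47 GB at the full 128K context
print(kv_cache_gb(4096))    # ~1.5 GB at a reduced 4K context
```

With these assumed values, running at the full 128K context would consume most of the 80GB card on KV cache alone, which is why trimming context length is listed alongside quantization.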

Frequently Asked Questions

Is Mistral Large 2 compatible with NVIDIA H100 PCIe?
No, the NVIDIA H100 PCIe does not have enough VRAM to directly run Mistral Large 2 without significant optimization.
What VRAM is needed for Mistral Large 2?
Mistral Large 2 requires approximately 246GB of VRAM when using FP16 precision.
How fast will Mistral Large 2 run on NVIDIA H100 PCIe?
Without optimization, it won't run. With aggressive quantization and CPU offloading, expect significantly reduced tokens/second compared to optimal hardware. Performance will be highly dependent on the specific optimization techniques employed.